Re: [SparkR] gapply with strings with arrow

2020-10-10 Thread Hyukjin Kwon
If it works without Arrow optimization, it's likely a bug. Please feel free
to file a JIRA for that.
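
For anyone wanting to run that comparison, here is a minimal sketch of the same gapply call with Arrow optimization off and then on. It assumes Spark 3.0 or later, where the SparkR switch is `spark.sql.execution.arrow.sparkr.enabled`. The input `df` and the helper `run_gapply` are hypothetical stand-ins, not the original poster's data or code, and `stringsAsFactors` is spelled out in full here.

```r
library(SparkR)

# Start a session with Arrow optimization for SparkR switched off.
sparkR.session(sparkConfig = list(
  spark.sql.execution.arrow.sparkr.enabled = "false"
))

# Hypothetical stand-in for the `df` used in the original post.
df <- createDataFrame(data.frame(ColumnA = c(1L, 1L, 2L)))

# Hypothetical helper wrapping the gapply call from the original post.
run_gapply <- function(df) {
  collect(gapply(
    df,
    c("ColumnA"),
    function(key, x) {
      data.frame(out = c("dfs"), stringsAsFactors = FALSE)
    },
    "out string"
  ))
}

# Baseline: run without Arrow.
without_arrow <- run_gapply(df)

# Switch Arrow on for the running session and repeat the same call.
sql("SET spark.sql.execution.arrow.sparkr.enabled=true")
with_arrow <- run_gapply(df)
```

If the first call succeeds and only the second one fails, that is the kind of difference worth attaching to the JIRA.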

On Wed, 7 Oct 2020, 22:44 Jacek Pliszka wrote:

> Hi!
>
> Is there anywhere I can find information on how to use gapply with Arrow?
>
> I've tried something very simple:
>
> collect(gapply(
>   df,
>   c("ColumnA"),
>   function(key, x){
>   data.frame(out=c("dfs"), stringAsFactors=FALSE)
>   },
>   "out String"
> ))
>
> But it fails, while similar code with integers or doubles works fine.
>
> [Fetched stdout timeout] Error in readBin(con, raw(),
> as.integer(dataLen), endian = "big") : invalid 'n' argument
>
> java.lang.UnsupportedOperationException
>   at org.apache.spark.sql.vectorized.ArrowColumnVector$ArrowVectorAccessor.getUTF8String(ArrowColumnVector.java:233)
>   at org.apache.spark.sql.vectorized.ArrowColumnVector.getUTF8String(ArrowColumnVector.java:109)
>   at org.apache.spark.sql.vectorized.ColumnarBatchRow.getUTF8String(ColumnarBatch.java:220)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>  ...
>
> When I looked at the source code there, the methods are all stubs.
>
> Is there a proper way to use Arrow with gapply in SparkR?
>
> BR,
>
> Jacek
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


[SparkR] gapply with strings with arrow

2020-10-07 Thread Jacek Pliszka
Hi!

Is there anywhere I can find information on how to use gapply with Arrow?

I've tried something very simple:

collect(gapply(
  df,
  c("ColumnA"),
  function(key, x){
  data.frame(out=c("dfs"), stringAsFactors=FALSE)
  },
  "out String"
))

But it fails, while similar code with integers or doubles works fine.
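
A side note on the snippet above, offered as something to rule out rather than a confirmed cause: it passes `stringAsFactors` (no "s" after "string"). Base R's `data.frame()` does not reject unknown named arguments; it treats them as extra columns. The result therefore has two columns, which does not match the one-column schema "out String". The sketch below uses plain R only, no Spark, and the names `bad` and `good` are purely illustrative.

```r
# Misspelled argument: data.frame() takes it as a second column,
# so the result has two columns instead of one.
bad <- data.frame(out = c("dfs"), stringAsFactors = FALSE)
print(names(bad))

# Correct spelling: a single character column named "out".
good <- data.frame(out = c("dfs"), stringsAsFactors = FALSE)
print(names(good))
print(class(good$out))
```

On R versions before 4.0 the default was `stringsAsFactors = TRUE`, so with the misspelling the `out` column would also be a factor rather than a character vector. Whether either difference explains the Arrow failure is not established in this thread.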

[Fetched stdout timeout] Error in readBin(con, raw(),
as.integer(dataLen), endian = "big") : invalid 'n' argument

java.lang.UnsupportedOperationException
  at org.apache.spark.sql.vectorized.ArrowColumnVector$ArrowVectorAccessor.getUTF8String(ArrowColumnVector.java:233)
  at org.apache.spark.sql.vectorized.ArrowColumnVector.getUTF8String(ArrowColumnVector.java:109)
  at org.apache.spark.sql.vectorized.ColumnarBatchRow.getUTF8String(ColumnarBatch.java:220)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
 ...

When I looked at the source code there, the methods are all stubs.

Is there a proper way to use Arrow with gapply in SparkR?

BR,

Jacek