If it works without Arrow optimization, it's likely a bug. Please feel free
to file a JIRA for that.
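
For anyone who wants to check this before filing, here is a minimal sketch that runs the same kind of gapply call with Arrow optimization switched off. It assumes Spark 3.0 or later, where the SparkR setting is spark.sql.execution.arrow.sparkr.enabled. The toy data frame stands in for the df from the original message, and its values are made up. Note that stringsAsFactors has to be spelled in full, because data.frame treats a misspelled argument name as an extra column, which would no longer match a one-column schema.

```r
library(SparkR)

# Start a session with Arrow optimization disabled.
# Change "false" to "true" (and install the R 'arrow' package) to compare.
sparkR.session(
  sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "false")
)

# Hypothetical stand-in for the 'df' in the original message.
df <- createDataFrame(
  data.frame(ColumnA = c("a", "b", "a"), stringsAsFactors = FALSE)
)

# Same shape as the failing call: one string column returned per group.
result <- collect(gapply(
  df,
  c("ColumnA"),
  function(key, x) {
    data.frame(out = c("dfs"), stringsAsFactors = FALSE)
  },
  "out string"
))

print(result)

sparkR.session.stop()
```

If this succeeds with the setting at "false" and fails with it at "true", that difference is the useful detail to include in the JIRA.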
On Wed, 7 Oct 2020, 22:44 Jacek Pliszka wrote:
> Hi!
>
> Is there any place where I can find information on how to use gapply with Arrow?
>
> I've tried something very simple:
>
> collect(gapply(
>   df,
>   c("ColumnA"),
>   function(key, x) {
>     data.frame(out = c("dfs"), stringsAsFactors = FALSE)
>   },
>   "out String"
> ))
>
> But it fails, although similar code with integers or doubles works fine.
>
> [Fetched stdout timeout] Error in readBin(con, raw(),
> as.integer(dataLen), endian = "big") : invalid 'n' argument
>
> java.lang.UnsupportedOperationException
>   at org.apache.spark.sql.vectorized.ArrowColumnVector$ArrowVectorAccessor.getUTF8String(ArrowColumnVector.java:233)
>   at org.apache.spark.sql.vectorized.ArrowColumnVector.getUTF8String(ArrowColumnVector.java:109)
>   at org.apache.spark.sql.vectorized.ColumnarBatchRow.getUTF8String(ColumnarBatch.java:220)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   ...
>
> When I looked at the source code there, it was all stubs.
>
> Is there a proper way to use Arrow with gapply in SparkR?
>
> BR,
>
> Jacek