Neil McQuarrie created SPARK-21727:
--------------------------------------

             Summary: Operating on an ArrayType in a SparkR DataFrame throws error
                 Key: SPARK-21727
                 URL: https://issues.apache.org/jira/browse/SPARK-21727
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 2.2.0
            Reporter: Neil McQuarrie


Previously [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements] this as a Stack Overflow question, but it appears to be a bug.

If I have an R data.frame where one of the columns is a list column -- i.e., each element of the column embeds an entire R vector of numbers -- then I can convert the data.frame to a SparkR DataFrame just fine; SparkR treats the column as ArrayType(DoubleType). However, any subsequent operation on this DataFrame appears to throw an error.

Create an example R data.frame:
{code}
indices <- 1:4
myDf <- data.frame(indices)
myDf$data <- list(rep(0, 20))
{code}
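A possibly relevant detail (my addition, not from the original report): on the R side, each cell of the {{data}} column holds a plain numeric (double) vector, which would explain why SparkR infers ArrayType(DoubleType) rather than an integer array:
{code}
str(myDf$data[[1]])    # num [1:20] 0 0 0 0 ...
class(myDf$data[[1]])  # "numeric" -- doubles, not integers
{code}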

Convert it to a SparkR DataFrame:
{code}
library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib"))
sparkR.session(master = "local[*]")
mySparkDf <- as.DataFrame(myDf)
{code}

Examine the DataFrame schema; the list column was successfully converted to 
ArrayType:
{code}
> schema(mySparkDf)
StructType
|-name = "indices", type = "IntegerType", nullable = TRUE
|-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE
{code}

However, operating on the SparkR DataFrame throws an error:
{code}
> collect(mySparkDf)
17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 
(TID 1)
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
java.lang.Double is not a valid external type for schema of array<double>
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
else validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0
... long stack trace ...
{code}

Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10.
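As a possible workaround sketch (untested -- it may hit the same encoder error), one could try passing an explicit schema to {{createDataFrame}} instead of relying on schema inference; {{customSchema}} is a name I am introducing here for illustration:
{code}
customSchema <- structType(
  structField("indices", "integer"),
  structField("data", "array<double>"))
mySparkDf <- createDataFrame(myDf, schema = customSchema)
{code}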




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
