[ https://issues.apache.org/jira/browse/SPARK-11046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613459#comment-15613459 ]
Sammie Durugo commented on SPARK-11046:
---------------------------------------

I'm not sure anyone has noticed that a nested schema cannot be passed to dapply, and that an array type cannot be declared the way you would declare "integer", "string", or "double" when defining a schema with structType. It would be useful to be able to declare array types when using dapply, since most R outputs take the form of an R list object. For example, suppose the R output takes the following form:

output = list(bd = array(..., dim = c(d1, d2, d3)), dd = matrix(..., nr, nc), cp = list(a = matrix(..., nr, nc), b = vector(...)))

To define a schema to pass to dapply in this context, one should be free to write it in the following form (if possible):

schema = structType(structField("bd", "array"), structField("dd", "array"), structField("cp", structType(structField("a", "array"), structField("b", "double"))))

which might print like this (if possible):

StructType
|-name = "bd", type = "ArrayType", nullable = TRUE
|-name = "dd", type = "ArrayType", nullable = TRUE
|-name = "cp", type = "StructType", nullable = TRUE
  |-name = "a", type = "ArrayType", nullable = TRUE
  |-name = "b", type = "double", nullable = TRUE

At the moment, only a character string is accepted for the data type parameter of structField. Relaxing this condition so that a structType can be passed inside an existing structType would accommodate the structure above easily. R list objects, which are close to R arrays by design, should also be mappable to Spark's ArrayType. Falling back to the default 'schema = NULL' in dapply, which leaves the output as raw bytes, should be the very last resort.

Thank you for your help with this.
Sammie.
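For concreteness, here is how the nested schema requested above could be spelled in the JSON layout Spark uses for schemas (a top-level `{"type": "struct", "fields": [...]}` object, where each field carries `name`, `type`, `nullable`, and `metadata`, and array types carry `elementType` and `containsNull`). That JSON form is what SPARK-11046 proposes to pass from R to the JVM. This is a minimal sketch using only the Python standard library. The helper names `array_type`, `struct_field`, and `struct_type` are hypothetical, and using `double` elements and encoding R matrices and 3-D arrays as nested arrays are assumptions, since the comment leaves the contents as `...`.

```python
import json

# Hypothetical helpers that emit dicts in Spark's JSON schema layout.
def array_type(element_type, contains_null=True):
    """An array type: element type plus whether elements may be null."""
    return {"type": "array", "elementType": element_type, "containsNull": contains_null}

def struct_field(name, data_type, nullable=True):
    """A named field; data_type may be a type name string or a nested dict."""
    return {"name": name, "type": data_type, "nullable": nullable, "metadata": {}}

def struct_type(*fields):
    """A struct made of the given fields, in order."""
    return {"type": "struct", "fields": list(fields)}

# Assumption: all numeric contents are doubles; a 3-D array becomes three levels
# of nested arrays and a matrix becomes two levels.
schema = struct_type(
    struct_field("bd", array_type(array_type(array_type("double")))),
    struct_field("dd", array_type(array_type("double"))),
    # "cp" is itself a struct, which a flat type string cannot express easily.
    struct_field("cp", struct_type(
        struct_field("a", array_type(array_type("double"))),
        struct_field("b", "double"),
    )),
)

# Serialize to the JSON string that would cross from R to the JVM.
schema_json = json.dumps(schema)
print(schema_json)
```

Because the nesting lives in the JSON structure itself, no type-string matching is needed to recover it; on the JVM side, Spark's `DataType.fromJson` reads this layout.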
> Pass schema from R to JVM using JSON format
> -------------------------------------------
>
>                 Key: SPARK-11046
>                 URL: https://issues.apache.org/jira/browse/SPARK-11046
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>    Affects Versions: 1.5.1
>            Reporter: Sun Rui
>            Priority: Minor
>
> Currently, SparkR passes a DataFrame schema from R to the JVM backend using
> regular expressions. However, Spark now supports schemas in JSON format,
> so SparkR should be enhanced to use the JSON schema format.
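The issue does not show the regular expressions SparkR used at the time, so the sketch below illustrates only the general problem and is not SparkR's code. A hypothetical flat string encoding such as `struct<a:array<double>,b:double>`, parsed by naively splitting on commas, works for a flat struct but breaks once a struct is nested. The same schema in JSON form is recovered intact by a standard parser. It uses only the Python standard library, and `naive_fields` is a made-up name.

```python
import json

def naive_fields(type_str):
    """Hypothetical parser: strip the outer 'struct<...>' and split on commas.

    Illustration only; this is not SparkR's actual implementation.
    """
    inner = type_str[len("struct<"):-1]
    return inner.split(",")

# A flat struct splits into the expected two fields.
flat = "struct<a:array<double>,b:double>"
print(naive_fields(flat))    # ['a:array<double>', 'b:double']

# A nested struct gets split inside the inner struct, so the fields come out wrong.
nested = "struct<cp:struct<a:array<double>,b:double>>"
print(naive_fields(nested))  # ['cp:struct<a:array<double>', 'b:double>']

# The same nested schema in Spark's JSON layout: nesting is explicit,
# so a standard JSON parser recovers it without any string splitting.
nested_json = json.dumps({
    "type": "struct",
    "fields": [{
        "name": "cp",
        "type": {
            "type": "struct",
            "fields": [
                {"name": "a",
                 "type": {"type": "array", "elementType": "double", "containsNull": True},
                 "nullable": True, "metadata": {}},
                {"name": "b", "type": "double", "nullable": True, "metadata": {}},
            ],
        },
        "nullable": True,
        "metadata": {},
    }],
})
parsed = json.loads(nested_json)
inner_names = [f["name"] for f in parsed["fields"][0]["type"]["fields"]]
print(inner_names)           # ['a', 'b']
```

A real string-based parser could track bracket depth instead of splitting blindly, but every such parser has to be written and kept in sync by hand. The JSON form avoids that by reusing a format both sides already understand.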