Is there a workaround? My dataset contains billions of rows, and it would be
nice to ignore/exclude the few lines that are badly formatted.
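Spark's JSON data source has a "mode" option (PERMISSIVE, DROPMALFORMED, FAILFAST); setting it to DROPMALFORMED makes the reader silently skip records it cannot parse against the supplied schema, which may be all that is needed here. As a language-neutral sketch of the same "skip the bad lines" idea, here is a plain-Python pre-filter; the function name and sample lines are hypothetical, not from the thread:

```python
import json

def matches_schema(line: str) -> bool:
    """Return True when the record has the expected nested shape:
    {"request": {"user": {"id": <int>}}} (the shape from this thread)."""
    try:
        rec = json.loads(line)
    except ValueError:
        return False  # not valid JSON at all
    req = rec.get("request") if isinstance(rec, dict) else None
    user = req.get("user") if isinstance(req, dict) else None
    # The bad records define "user" as an array instead of an object.
    return isinstance(user, dict) and isinstance(user.get("id"), int)

lines = [
    '{"request": {"user": {"id": 123}}}',    # well-formed: keep
    '{"request": {"user": [{"id": 123}]}}',  # "user" is an array: skip
    'not json at all',                       # unparseable: skip
]
good = [l for l in lines if matches_schema(l)]
```

In Spark itself the equivalent would be a pre-filter on the raw text RDD/Dataset before applying the schema, or the DROPMALFORMED reader mode mentioned above.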
--
I have found why the exception is raised.
I have defined a JSON schema, using org.apache.spark.sql.types.StructType,
that expects this kind of record:

{
  "request": {
    "user": {
      "id": 123
    }
  }
}
There's a bad record in my dataset that defines the field "user" as an array
instead of an object, which triggers:

java.lang.ClassCastException:
org.apache.spark.sql.types.GenericArrayData cannot be cast to
org.apache.spark.sql.catalyst.InternalRow
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getStruct(rows.scala:50)
    at ...
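For reference, a record of the shape that would trigger this cast, with "user" supplied as an array rather than an object, might look like this (a hypothetical example, not taken from the actual dataset):

```json
{
  "request": {
    "user": [
      { "id": 123 }
    ]
  }
}
```

When the schema declares "user" as a StructType, Spark materializes an array here as GenericArrayData, which cannot be cast to the InternalRow that a struct field requires.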