Is there a workaround? My dataset contains billions of rows, and it would be
nice to ignore/exclude the few lines that are badly formatted.
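Spark's JSON data source has a "mode" option (PERMISSIVE, DROPMALFORMED, FAILFAST); setting it to DROPMALFORMED makes the reader silently skip records it cannot parse against the supplied schema, which may be all that is needed here. As a language-neutral sketch of the same "skip the bad lines" idea, here is a plain-Python pre-filter; the function name and sample lines are hypothetical, not from the thread:

```python
import json

def matches_schema(line: str) -> bool:
    """Return True when the record has the expected nested shape:
    {"request": {"user": {"id": <int>}}} (the shape from this thread)."""
    try:
        rec = json.loads(line)
    except ValueError:
        return False  # not valid JSON at all
    req = rec.get("request") if isinstance(rec, dict) else None
    user = req.get("user") if isinstance(req, dict) else None
    # The bad records define "user" as an array instead of an object.
    return isinstance(user, dict) and isinstance(user.get("id"), int)

lines = [
    '{"request": {"user": {"id": 123}}}',    # well-formed: keep
    '{"request": {"user": [{"id": 123}]}}',  # "user" is an array: skip
    'not json at all',                       # unparseable: skip
]
good = [l for l in lines if matches_schema(l)]
```

In Spark itself the equivalent would be a pre-filter on the raw text RDD/Dataset before applying the schema, or the DROPMALFORMED reader mode mentioned above.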
--
I have found why the exception is raised.
I have defined a JSON schema, using org.apache.spark.sql.types.StructType,
that expects this kind of record:

{
  "request": {
    "user": {
      "id": 123
    }
  }
}
There's a bad record in my dataset that defines the field "user" as an array
instead of an object, which triggers:

java.lang.ClassCastException:
org.apache.spark.sql.types.GenericArrayData cannot be cast to
org.apache.spark.sql.catalyst.InternalRow
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getStruct(rows.scala:50)
    at ...
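For reference, a record of the shape that would trigger this cast, with "user" supplied as an array rather than an object, might look like this (a hypothetical example, not taken from the actual dataset):

```json
{
  "request": {
    "user": [
      { "id": 123 }
    ]
  }
}
```

When the schema declares "user" as a StructType, Spark materializes an array here as GenericArrayData, which cannot be cast to the InternalRow that a struct field requires.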