I have some code that recovers a complex structured row from a dataset. The row contains several ARRAY fields (mostly ArrayType(IntegerType)), which I populate with Array[java.lang.Integer], since that seems to be the only representation the Spark row serializer will accept.
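Here is roughly what the population code looks like (schema, field names, and the local SparkSession are simplified for illustration):

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("values", ArrayType(IntegerType), nullable = true)))

    // Boxed Array[java.lang.Integer] is the only element representation
    // the row serializer seems to accept for ArrayType(IntegerType).
    val rows = Seq(
      Row(1, Array[java.lang.Integer](1, 2, 3)),
      Row(2, null))  // this ARRAY field is assigned null

    val ds = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)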
If the dataset is written out to a file (Parquet in this case) and then read back in, Row.getList() (from either Scala or Java) works fine and I get a List. But if I simply feed the freshly created dataset into another dataset iterator, Row.getList() throws:

    java.lang.ClassCastException: [Ljava.lang.Integer; cannot be cast to scala.collection.Seq

On top of that, the array fields that were assigned null show up as non-null empty arrays in the direct path, yet after the write/read round trip they come back as actual nulls.

Why isn't the behavior consistent between the two paths? And why is there no Row.getArray()? Will any of this nonsense be fixed in 3.0?
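P.S. A condensed sketch of the two paths I'm comparing, continuing from the snippet above (the output path is just for illustration, and the getAs[AnyRef] match is the only workaround I've found so far):

    // Path 1: iterate the freshly created dataset directly.
    // This is where row.getList[Integer](1) blows up with the
    // ClassCastException for me, and where the null field shows
    // up as an empty array.
    ds.collect().foreach { row =>
      val xs = row.getAs[AnyRef]("values") match {
        case null        => null
        case a: Array[_] => a.toSeq  // raw Java array (direct path)
        case s: Seq[_]   => s        // Seq (after a file round trip)
        case other       => other    // anything else, pass through
      }
      println(s"id=${row.getInt(0)} values=$xs")
    }

    // Path 2: round-trip through Parquet. Here getList() works,
    // and the null ARRAY field comes back as an actual null.
    ds.write.mode("overwrite").parquet("/tmp/int_array_repro")
    val reread = spark.read.parquet("/tmp/int_array_repro")
    reread.collect().foreach { row =>
      val xs = if (row.isNullAt(1)) null else row.getList[Integer](1)
      println(s"id=${row.getInt(0)} values=$xs")
    }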