Github user ajacques commented on the issue:

    https://github.com/apache/spark/pull/21889

@mallman: I've rebased on top of your changes and pushed. I'm seeing the following.

Given the following schema:

```
root
 |-- id: integer (nullable = true)
 |-- name: struct (nullable = true)
 |    |-- first: string (nullable = true)
 |    |-- middle: string (nullable = true)
 |    |-- last: string (nullable = true)
 |-- address: string (nullable = true)
 |-- pets: integer (nullable = true)
 |-- friends: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- first: string (nullable = true)
 |    |    |-- middle: string (nullable = true)
 |    |    |-- last: string (nullable = true)
 |-- relatives: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- first: string (nullable = true)
 |    |    |-- middle: string (nullable = true)
 |    |    |-- last: string (nullable = true)
 |-- p: integer (nullable = true)
```

the query `select name.middle, address from temp` throws:

```
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/private/var/folders/ss/cw601dzn59b2nygs8k1bs78x75lhr0/T/spark-cab140ca-cbba-4dc1-9fe5-6ae739dab70a/contacts/p=2/part-00000-91d2abf5-625f-4080-b34c-e373b89c9895-c000.snappy.parquet
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
	at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186)
	... 20 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at java.util.ArrayList.rangeCheck(ArrayList.java:657)
	at java.util.ArrayList.get(ArrayList.java:433)
	at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
	at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
	at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:97)
	at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:92)
	at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:278)
	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
	at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
	... 25 more
```

No root cause yet, but I noticed this while working with the unit tests.
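For anyone trying to reproduce: a minimal spark-shell sketch along these lines hits it for me. The data values, case class names, and output path below are placeholders rather than the actual test fixtures, and it assumes the nested schema pruning flag is enabled:

```scala
// Hypothetical repro sketch -- names, values, and /tmp path are placeholders.
import spark.implicits._

// Assumes the pruning feature under review here is switched on.
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")

case class Name(first: String, middle: String, last: String)
case class Contact(id: Int, name: Name, address: String, pets: Int,
                   friends: Seq[Name], relatives: Map[String, Name], p: Int)

// Write a partitioned Parquet dataset matching the schema above (partitioned by p).
Seq(Contact(1, Name("Jane", "Q", "Doe"), "123 Main St", 2,
            Seq(Name("John", null, "Doe")),
            Map("mom" -> Name("Mary", null, "Doe")), 2))
  .toDF()
  .write.partitionBy("p").parquet("/tmp/contacts")

spark.read.parquet("/tmp/contacts").createOrReplaceTempView("temp")

// Selecting one nested field plus a top-level column is what triggers
// the ParquetDecodingException in the trace above.
spark.sql("select name.middle, address from temp").show()
```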