[ https://issues.apache.org/jira/browse/SPARK-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust resolved SPARK-1913.
-------------------------------------

    Resolution: Fixed

> Parquet table column pruning error caused by filter pushdown
> ------------------------------------------------------------
>
>                 Key: SPARK-1913
>                 URL: https://issues.apache.org/jira/browse/SPARK-1913
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.0
>         Environment: mac os 10.9.2
>            Reporter: Chen Chao
>            Assignee: Cheng Lian
>
> When scanning Parquet tables, attributes referenced only in predicates that
> are pushed down are not passed to the `ParquetTableScan` operator, which
> causes an exception. Verified in the {{sbt hive/console}}:
> {code}
> loadTestTable("src")
> table("src").saveAsParquetFile("src.parquet")
> parquetFile("src.parquet").registerAsTable("src_parquet")
> hql("SELECT value FROM src_parquet WHERE key < 10").collect().foreach(println)
> {code}
> Exception
> {code}
> parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/scratch/rxin/spark/src.parquet/part-r-2.parquet
>     at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
>     at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
>     at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>     at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>     at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>     at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>     at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>     at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>     at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>     at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>     at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>     at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:717)
>     at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:717)
>     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
>     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IllegalArgumentException: Column key does not exist.
>     at parquet.filter.ColumnRecordFilter$1.bind(ColumnRecordFilter.java:51)
>     at org.apache.spark.sql.parquet.ComparisonFilter.bind(ParquetFilters.scala:306)
>     at parquet.io.FilteredRecordReader.<init>(FilteredRecordReader.java:46)
>     at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:74)
>     at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:110)
>     at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
>     ... 28 more
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
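
The failure mode described in the issue can be illustrated with a small, self-contained model. This is only a sketch and not Spark's planner code or the fix that was committed: `PruningSketch`, `LessThan`, `columnsToRead`, and `bind` are hypothetical names invented for illustration. It assumes that a scan requests a list of column names and that a pushed-down filter can only bind to a column that was requested.

```scala
// Hypothetical, simplified model of column pruning with filter pushdown.
// None of these names come from Spark or Parquet.
object PruningSketch {
  // A pushed-down comparison such as `key < 10`.
  final case class LessThan(column: String, bound: Int)

  // Columns the scan should request: the projected columns plus every column
  // referenced by a pushed-down predicate, without duplicates.
  def columnsToRead(projection: Seq[String], pushed: Seq[LessThan]): Seq[String] =
    (projection ++ pushed.map(_.column)).distinct

  // Mimics binding a filter against the requested columns. It throws an
  // IllegalArgumentException when the filter's column was pruned away.
  def bind(filter: LessThan, requested: Seq[String]): Unit =
    require(requested.contains(filter.column),
      s"Column ${filter.column} does not exist.")
}
```

For the query in the report, the projection is only `value` while the pushed-down predicate refers to `key`. In this model, calling `bind` against `Seq("value")` fails, whereas binding against `columnsToRead(Seq("value"), Seq(LessThan("key", 10)))` succeeds because `key` is included in the requested columns.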