[jira] [Updated] (SPARK-9442) java.lang.ArithmeticException: / by zero when reading Parquet
[ https://issues.apache.org/jira/browse/SPARK-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-9442:
--------------------------------
    Labels: bulk-closed  (was: )

> java.lang.ArithmeticException: / by zero when reading Parquet
> --------------------------------------------------------------
>
>                 Key: SPARK-9442
>                 URL: https://issues.apache.org/jira/browse/SPARK-9442
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>            Reporter: DB Tsai
>            Priority: Major
>              Labels: bulk-closed
>
> I am counting how many records are in my nested Parquet files, which have this schema:
> {code}
> scala> u1aTesting.printSchema
> root
>  |-- profileId: long (nullable = true)
>  |-- country: string (nullable = true)
>  |-- data: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- videoId: long (nullable = true)
>  |    |    |-- date: long (nullable = true)
>  |    |    |-- label: double (nullable = true)
>  |    |    |-- weight: double (nullable = true)
>  |    |    |-- features: vector (nullable = true)
> {code}
> The number of records in the nested data array is around 10k, each Parquet file is around 600MB, and the total size is around 120GB.
> I am doing a simple count:
> {code}
> scala> u1aTesting.count
> parquet.io.ParquetDecodingException: Can not read value at 100 in block 0 in file hdfs://compute-1.amazonaws.com:9000/users/dbtsai/testing/u1old/20150721/part-r-00115-d70c946b-b0f0-45fe-9965-b9f062b9ec6d.gz.parquet
>     at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
>     at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
>     at org.apache.spark.sql.sources.SqlNewHadoopRDD$$anon$1.hasNext(SqlNewHadoopRDD.scala:163)
>     at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>     at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$6.apply(Aggregate.scala:129)
>     at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$6.apply(Aggregate.scala:126)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:70)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArithmeticException: / by zero
>     at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:109)
>     at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:193)
>     ... 21 more
> {code}
> BTW, not all of the tasks fail; some of them succeed.
> Another note: by explicitly looping through the data to count, it works:
> {code}
> sqlContext.read.load(hdfsPath + s"/testing/u1snappy/${date}/").map(x => 1L).reduce((x, y) => x + y)
> {code}
> I think some metadata in the Parquet files may be corrupted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
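Editor's note: since the reporter suspects corrupted Parquet metadata and only some tasks fail, one way to narrow this down is to count each part-file individually and dump its footer row-group counts, so a genuinely bad file can be separated from a general reader problem. The sketch below is not from the original report; it assumes a spark-shell session on a Spark 1.4-era cluster (so `sc` and `sqlContext` exist), the pre-1.8 `parquet.hadoop` package seen in the stack trace, and the `hdfsPath`/`date` directory layout used in the reporter's snippet.

{code}
// Hedged sketch: find the part-file(s) that fail to decode and inspect their
// footer metadata. `hdfsPath` and `date` are assumed to be defined as in the
// reporter's snippet; adjust to the actual layout.
import scala.collection.JavaConverters._
import scala.util.{Failure, Success, Try}
import org.apache.hadoop.fs.{FileSystem, Path}
import parquet.hadoop.ParquetFileReader  // pre-1.8 parquet-mr package, matching the stack trace

val conf = sc.hadoopConfiguration
val fs   = FileSystem.get(conf)
val dir  = new Path(hdfsPath + s"/testing/u1snappy/${date}/")

fs.listStatus(dir)
  .map(_.getPath)
  .filter(_.getName.endsWith(".parquet"))
  .foreach { p =>
    // 1) Can Spark's Parquet reader count this single file on its own?
    val sparkCount = Try(sqlContext.read.parquet(p.toString).count())

    // 2) What row counts do the footer's row groups claim? A row group that
    //    reports 0 rows would be consistent with the / by zero in checkRead,
    //    but that is only a hypothesis here.
    val footerRows = Try {
      ParquetFileReader.readFooter(conf, p).getBlocks.asScala.map(_.getRowCount)
    }

    (sparkCount, footerRows) match {
      case (Failure(e), _)             => println(s"FAILED $p : ${e.getMessage}")
      case (Success(n), Success(rows)) => println(s"ok     $p : count=$n, rowGroups=${rows.mkString(",")}")
      case (Success(n), Failure(e))    => println(s"ok     $p : count=$n, footer unreadable: ${e.getMessage}")
    }
  }
{code}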