Re: NPE while reading ORC file using Spark 1.4 API
I think this is fixed in 1.5 (releasing soon) by https://github.com/apache/spark/pull/8407

On Tue, Sep 8, 2015 at 11:39 AM, unk1102 wrote:

> Hi, I read many ORC files in Spark and process them; those files are
> basically Hive partitions. Most of the time processing goes well, but for
> a few files I get the following exception and I don't know why. The same
> files work fine in Hive queries. Please guide. Thanks in advance.
>
> DataFrame df = hiveContext.read().format("orc").load("/path/in/hdfs");
>
> java.lang.NullPointerException
>     at org.apache.spark.sql.hive.HiveInspectors$class.unwrapperFor(HiveInspectors.scala:402)
>     at org.apache.spark.sql.hive.orc.OrcTableScan.unwrapperFor(OrcRelation.scala:206)
> [rest of stack trace snipped; the full trace appears in the original post]
Re: NPE while reading ORC file using Spark 1.4 API
Hi Zhan, thanks for the reply. Yes, the schema should be the same: I am reading Hive table partitions as ORC into Spark, so I believe it matches. I am new to Hive, so I don't know whether the schema can differ across partitions of a Hive partitioned table.

On Wed, Sep 9, 2015 at 12:16 AM, Zhan Zhang wrote:

> Does your directory include some ORC files that do not have the same
> schema as the table? Please refer to
> https://issues.apache.org/jira/browse/SPARK-10304
>
> Thanks.
>
> Zhan Zhang
>
> On Sep 8, 2015, at 11:39 AM, unk1102 wrote:
>
>> Hi, I read many ORC files in Spark and process them; those files are
>> basically Hive partitions. Most of the time processing goes well, but
>> for a few files I get the following exception and I don't know why.
>>
>> DataFrame df = hiveContext.read().format("orc").load("/path/in/hdfs");
>>
>> java.lang.NullPointerException
>> [rest of quoted message and stack trace snipped]
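If schema drift across partition files is the suspect (per SPARK-10304), one practical check is to compare each file's column list against the table's reference schema before pointing Spark at the whole directory. The sketch below is plain Java, not Spark: schemas are modeled as "name:type" lists, and the reference schema, file paths, and per-file schemas are hypothetical stand-ins for what `DESCRIBE <table>` in Hive and an ORC schema dump of each file would report.

```java
import java.util.*;

public class SchemaCheck {
    // Returns the paths whose column list differs from the reference schema.
    static List<String> mismatches(List<String> reference,
                                   Map<String, List<String>> fileSchemas) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : fileSchemas.entrySet()) {
            if (!e.getValue().equals(reference)) bad.add(e.getKey());
        }
        return bad;
    }

    public static void main(String[] args) {
        // Reference schema, e.g. what `DESCRIBE <table>` reports in Hive
        // (columns here are made up for illustration).
        List<String> reference = Arrays.asList("id:bigint", "name:string");

        // Per-file column lists, e.g. recovered by dumping each ORC file's schema.
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("/tbl/dt=1/part-0000", Arrays.asList("id:bigint", "name:string"));
        files.put("/tbl/dt=2/part-0000", Arrays.asList("id:int", "name:string")); // drifted type

        System.out.println("mismatched files: " + mismatches(reference, files));
    }
}
```

Any file flagged this way is a candidate for the one tripping the ORC reader, since a column whose type disagrees with the table schema can leave the object inspector lookup with nothing to unwrap.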
NPE while reading ORC file using Spark 1.4 API
Hi, I read many ORC files in Spark and process them; those files are basically Hive partitions. Most of the time processing goes well, but for a few files I get the following exception and I don't know why. The same files work fine in Hive queries. Please guide. Thanks in advance.

DataFrame df = hiveContext.read().format("orc").load("/path/in/hdfs");

java.lang.NullPointerException
    at org.apache.spark.sql.hive.HiveInspectors$class.unwrapperFor(HiveInspectors.scala:402)
    at org.apache.spark.sql.hive.orc.OrcTableScan.unwrapperFor(OrcRelation.scala:206)
    at org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$8.apply(OrcRelation.scala:238)
    at org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$8.apply(OrcRelation.scala:238)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.hive.orc.OrcTableScan.org$apache$spark$sql$hive$orc$OrcTableScan$$fillObject(OrcRelation.scala:238)
    at org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$execute$3.apply(OrcRelation.scala:290)
    at org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$execute$3.apply(OrcRelation.scala:288)
    at org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.compute(HadoopRDD.scala:380)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/NPE-while-reading-ORC-file-using-Spark-1-4-API-tp24609.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
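When only a few of many partition files trigger the exception, loading each partition directory separately and catching the failure can isolate the culprits before inspecting the files themselves. In Spark this would mean calling hiveContext.read().format("orc").load(dir) per directory inside a try/catch. The sketch below substitutes a fake loader for the Spark call so the isolation loop itself is runnable; the paths and the failure condition are made up for illustration.

```java
import java.util.*;

public class IsolateBadPartitions {
    // Stand-in for hiveContext.read().format("orc").load(dir) plus an action:
    // here it simply pretends one partition contains the problematic file.
    static void load(String dir) {
        if (dir.endsWith("dt=2015-09-02")) {
            throw new NullPointerException("unwrapperFor failed for " + dir);
        }
    }

    // Try each partition directory on its own; collect the ones that fail.
    static List<String> failingPartitions(List<String> dirs) {
        List<String> bad = new ArrayList<>();
        for (String dir : dirs) {
            try {
                load(dir);
            } catch (NullPointerException e) {
                bad.add(dir);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList(
                "/path/in/hdfs/dt=2015-09-01",
                "/path/in/hdfs/dt=2015-09-02",
                "/path/in/hdfs/dt=2015-09-03");
        System.out.println("failing partitions: " + failingPartitions(dirs));
    }
}
```

Once the failing directories are known, their files can be compared against the table schema (or simply re-written from Hive) instead of debugging the whole table at once.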