Re: NPE while reading ORC file using Spark 1.4 API

2015-09-08 Thread Davies Liu
I think this is fixed in 1.5 (releasing soon) by
https://github.com/apache/spark/pull/8407
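
Until 1.5 is available, one possible workaround (just a sketch; the table name
"my_orc_table" and the "dt" partition column are placeholders, assuming the
partitions are registered in the Hive metastore) is to read through the Hive
table with HiveContext instead of loading the ORC files directly, which takes
a different scan path and may avoid this NPE:

// Hypothetical sketch: read the partitioned ORC data via the metastore
// table rather than hiveContext.read().format("orc").load(...).
DataFrame df = hiveContext.table("my_orc_table");
// or via SQL with a partition filter, so only some partitions are scanned:
DataFrame part = hiveContext.sql(
    "SELECT * FROM my_orc_table WHERE dt = '2015-09-08'");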

On Tue, Sep 8, 2015 at 11:39 AM, unk1102  wrote:
> Hi, I read many ORC files in Spark and process them; these files are basically
> Hive partitions. Most of the time processing goes well, but for a few files I
> get the following exception and I don't know why. These files work fine in
> Hive using Hive queries. Please guide. Thanks in advance.
>
> DataFrame df = hiveContext.read().format("orc").load("/path/in/hdfs");
>
> java.lang.NullPointerException
> at
> org.apache.spark.sql.hive.HiveInspectors$class.unwrapperFor(HiveInspectors.scala:402)
> at
> org.apache.spark.sql.hive.orc.OrcTableScan.unwrapperFor(OrcRelation.scala:206)
> at
> org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$8.apply(OrcRelation.scala:238)
> at
> org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$8.apply(OrcRelation.scala:238)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at
> org.apache.spark.sql.hive.orc.OrcTableScan.org$apache$spark$sql$hive$orc$OrcTableScan$$fillObject(OrcRelation.scala:238)
> at
> org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$execute$3.apply(OrcRelation.scala:290)
> at
> org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$execute$3.apply(OrcRelation.scala:288)
> at
> org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.compute(HadoopRDD.scala:380)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
>
>




Re: NPE while reading ORC file using Spark 1.4 API

2015-09-08 Thread Umesh Kacha
Hi Zhan, thanks for the reply. Yes, the schema should be the same; I am
reading Hive table partitions as ORC into Spark, so I believe it should
match. I am new to Hive, so I don't know whether the schema can differ
across partitions of a Hive partitioned table.
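
One quick way to check (a rough sketch; the partition paths below are
placeholders) is to load each partition directory on its own and print its
schema, so a partition whose ORC files disagree with the rest stands out:

// Hypothetical sketch: compare schemas partition by partition.
String[] partitionDirs = {
    "/path/in/hdfs/dt=2015-09-07",
    "/path/in/hdfs/dt=2015-09-08"
};
for (String dir : partitionDirs) {
    System.out.println("Schema for " + dir + ":");
    hiveContext.read().format("orc").load(dir).printSchema();
}

The ORC footers can also be inspected on the Hive side with
hive --orcfiledump <file>, if that is easier.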

On Wed, Sep 9, 2015 at 12:16 AM, Zhan Zhang  wrote:

> Does your directory include some ORC files that do not have the same schema
> as the table? Please refer to
> https://issues.apache.org/jira/browse/SPARK-10304
>
> Thanks.
>
> Zhan Zhang
>
> On Sep 8, 2015, at 11:39 AM, unk1102  wrote:
>
> Hi, I read many ORC files in Spark and process them; these files are basically
> Hive partitions. Most of the time processing goes well, but for a few files I
> get the following exception and I don't know why. These files work fine in
> Hive using Hive queries. Please guide. Thanks in advance.
>
> DataFrame df = hiveContext.read().format("orc").load("/path/in/hdfs");
>
> java.lang.NullPointerException
>at
>
> org.apache.spark.sql.hive.HiveInspectors$class.unwrapperFor(HiveInspectors.scala:402)
>at
>
> org.apache.spark.sql.hive.orc.OrcTableScan.unwrapperFor(OrcRelation.scala:206)
>at
>
> org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$8.apply(OrcRelation.scala:238)
>at
>
> org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$8.apply(OrcRelation.scala:238)
>at
>
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>at
>
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>at scala.collection.immutable.List.foreach(List.scala:318)
>at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>at
> org.apache.spark.sql.hive.orc.OrcTableScan.org$apache$spark$sql$hive$orc$OrcTableScan$$fillObject(OrcRelation.scala:238)
>at
>
> org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$execute$3.apply(OrcRelation.scala:290)
>at
>
> org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$execute$3.apply(OrcRelation.scala:288)
>at
>
> org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.compute(HadoopRDD.scala:380)
>at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
>at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>at org.apache.spark.scheduler.Task.run(Task.scala:70)
>at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:744)
>
>
>
>
>
>


NPE while reading ORC file using Spark 1.4 API

2015-09-08 Thread unk1102
Hi, I read many ORC files in Spark and process them; these files are basically
Hive partitions. Most of the time processing goes well, but for a few files I
get the following exception and I don't know why. These files work fine in
Hive using Hive queries. Please guide. Thanks in advance.

DataFrame df = hiveContext.read().format("orc").load("/path/in/hdfs");

java.lang.NullPointerException
at
org.apache.spark.sql.hive.HiveInspectors$class.unwrapperFor(HiveInspectors.scala:402)
at
org.apache.spark.sql.hive.orc.OrcTableScan.unwrapperFor(OrcRelation.scala:206)
at
org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$8.apply(OrcRelation.scala:238)
at
org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$8.apply(OrcRelation.scala:238)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at
org.apache.spark.sql.hive.orc.OrcTableScan.org$apache$spark$sql$hive$orc$OrcTableScan$$fillObject(OrcRelation.scala:238)
at
org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$execute$3.apply(OrcRelation.scala:290)
at
org.apache.spark.sql.hive.orc.OrcTableScan$$anonfun$execute$3.apply(OrcRelation.scala:288)
at
org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.compute(HadoopRDD.scala:380)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/NPE-while-reading-ORC-file-using-Spark-1-4-API-tp24609.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
