[jira] [Comment Edited] (SPARK-20880) When spark SQL is used with Avro-backed HIVE tables, NPE from org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories.

2019-10-22 Thread Benjamyn Ward (Jira)


[ https://issues.apache.org/jira/browse/SPARK-20880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957491#comment-16957491 ]

Benjamyn Ward edited comment on SPARK-20880 at 10/23/19 2:05 AM:
-----------------------------------------------------------------

Gentle ping. While the description states that the issue is fixed in Hive 2.2, 
the Hive Jira indicates that it was actually fixed in version 2.3.0:
 * https://issues.apache.org/jira/browse/HIVE-16175

I am also running into this issue. I am going to try to work around it by 
using **extraClassPath** to pull in the Hive SerDe 2.3.x jars, but I'm not 
sure whether this will work. A much better solution would be to upgrade 
Spark's library dependencies.
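
Concretely, I expect the attempt to look something like the submit-time 
configuration below. This is untested and only a sketch: the jar path and 
application jar are placeholders, and the SerDe jar must exist at the same 
path on every node for the executor setting to take effect.

    # Placeholder paths: substitute a real hive-serde 2.3.x jar and your app jar.
    spark-submit \
      --conf spark.driver.extraClassPath=/opt/jars/hive-serde-2.3.6.jar \
      --conf spark.executor.extraClassPath=/opt/jars/hive-serde-2.3.6.jar \
      my-app.jar

Whether the 2.3.x SerDe classes will actually take precedence over the 
bundled 1.2.1 ones is exactly the part I am unsure about.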


was (Author: errorsandglitches):
Gentle ping. While the description states that the issue is fixed in Hive 2.2, 
the Hive Jira indicates that it was actually fixed in version 2.3.

* https://issues.apache.org/jira/browse/HIVE-16175

I am also running into this issue. I am going to try to work around it by 
using **extraClassPath** to pull in the Hive SerDe 2.3.x jars, but I'm not 
sure whether this will work. A much better solution would be to upgrade 
Spark's library dependencies.

> When spark SQL is used with  Avro-backed HIVE tables,  NPE from 
> org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories.
> 
>
> Key: SPARK-20880
> URL: https://issues.apache.org/jira/browse/SPARK-20880
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Vinod KC
> Priority: Minor
>
> When Spark SQL is used with Avro-backed Hive tables, an NPE is intermittently 
> thrown from 
> org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories.
> The root cause is a race condition in the Hive 1.2.1 jar used by Spark SQL.
> This issue has been fixed in Hive 2.2 (Hive JIRA: 
> https://issues.apache.org/jira/browse/HIVE-16175); since Spark still uses 
> the Hive 1.2.1 jars, we still run into the race condition.
> One workaround is to run Spark with a single task per executor, but that 
> slows down the jobs.
> Exception stack trace:
> 13/05/07 09:18:39 WARN scheduler.TaskSetManager: Lost task 18.0 in stage 0.0 (TID 18, aiyhyashu.dxc.com): java.lang.NullPointerException
> at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories(AvroObjectInspectorGenerator.java:142)
> at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:91)
> at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
> at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104)
> at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:120)
> at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspector(AvroObjectInspectorGenerator.java:83)
> at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.<init>(AvroObjectInspectorGenerator.java:56)
> at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:124)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$10.apply(TableReader.scala:251)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$10.apply(TableReader.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache
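
For reference, the single-task-per-executor workaround mentioned in the quoted 
description would amount to something like the settings below. This is only a 
sketch: with spark.task.cpus equal to spark.executor.cores, each executor JVM 
runs at most one task at a time, so the unsynchronized Hive 1.2.1 code paths 
are never exercised concurrently, at the cost of per-executor parallelism. The 
application jar is a placeholder.

    # Sketch: force one concurrent task per executor JVM.
    spark-submit \
      --conf spark.executor.cores=1 \
      --conf spark.task.cpus=1 \
      my-app.jar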
