Created a pull request: https://github.com/apache/spark/pull/11066
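The pull request addresses the initialization-order problem Ted describes below: `SessionState.get()` is read before `SessionState.start(state)` has run on that thread, so the thread-local comes back null. A stripped-down sketch of that failure mode (hypothetical class names, not Spark's or Hive's actual code):

```java
// Hypothetical, simplified analogue of Hive's SessionState handling,
// illustrating why calling get().getConf() before start() throws an NPE.
public class SessionStateDemo {
    static class HiveConf {}

    static class SessionState {
        private static final ThreadLocal<SessionState> TL = new ThreadLocal<>();
        private final HiveConf conf = new HiveConf();

        static void start(SessionState s) { TL.set(s); }
        static SessionState get() { return TL.get(); } // null before start()
        HiveConf getConf() { return conf; }
    }

    public static void main(String[] args) {
        try {
            // Mirrors ClientWrapper.conf running ahead of SessionState.start:
            // get() returns null, so .getConf() throws NullPointerException.
            HiveConf c = SessionState.get().getConf();
            System.out.println("got conf: " + c);
        } catch (NullPointerException e) {
            System.out.println("NPE: SessionState.get() returned null");
        }

        SessionState.start(new SessionState());
        System.out.println(SessionState.get().getConf() != null); // prints "true"
    }
}
```

Once `start()` has run on the calling thread, the same `get().getConf()` call succeeds, which is why the fix is about ordering rather than configuration.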
FYI

On Wed, Feb 3, 2016 at 1:27 PM, Shipper, Jay [USA] <[email protected]> wrote:

> It was just renamed recently: https://github.com/apache/spark/pull/10981
>
> As SessionState is entirely managed by Spark’s code, it still seems like
> this is a bug with Spark 1.6.0, and not with how our application is using
> HiveContext. But I’d feel more confident filing a bug if someone else
> could confirm they’re having this issue with Spark 1.6.0. Ideally, we
> should also have some simple proof of concept that can be posted with the
> bug.
>
> From: Ted Yu <[email protected]>
> Date: Wednesday, February 3, 2016 at 3:57 PM
> To: Jay Shipper <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: [External] Re: Spark 1.6.0 HiveContext NPE
>
> In ClientWrapper.scala, the SessionState.get().getConf call might have
> been executed ahead of SessionState.start(state) at line 194.
>
> This was the JIRA:
>
> [SPARK-10810] [SPARK-10902] [SQL] Improve session management in SQL
>
> In the master branch, there is no more ClientWrapper.scala.
>
> FYI
>
> On Wed, Feb 3, 2016 at 11:15 AM, Shipper, Jay [USA] <[email protected]> wrote:
>
>> One quick update on this: the NPE is not happening with Spark 1.5.2, so
>> this problem seems specific to Spark 1.6.0.
>>
>> From: Jay Shipper <[email protected]>
>> Date: Wednesday, February 3, 2016 at 12:06 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: [External] Re: Spark 1.6.0 HiveContext NPE
>>
>> Right, I could already tell that from the stack trace and from looking at
>> Spark’s code. What I’m trying to determine is why that’s coming back as
>> null now, just from upgrading Spark to 1.6.0.
>>
>> From: Ted Yu <[email protected]>
>> Date: Wednesday, February 3, 2016 at 12:04 PM
>> To: Jay Shipper <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Subject: [External] Re: Spark 1.6.0 HiveContext NPE
>>
>> Looks like the NPE came from this line:
>>
>>     def conf: HiveConf = SessionState.get().getConf
>>
>> Meaning SessionState.get() returned null.
>>
>> On Wed, Feb 3, 2016 at 8:33 AM, Shipper, Jay [USA] <[email protected]> wrote:
>>
>>> I’m upgrading an application from Spark 1.4.1 to Spark 1.6.0, and I’m
>>> getting a NullPointerException from HiveContext. It’s happening while it
>>> tries to load some tables via JDBC from an external database (not Hive),
>>> using context.read().jdbc():
>>>
>>> —
>>> java.lang.NullPointerException
>>>   at org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205)
>>>   at org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
>>>   at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
>>>   at org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538)
>>>   at org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537)
>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>   at scala.collection.immutable.List.foreach(List.scala:318)
>>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>   at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537)
>>>   at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>>>   at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>>>   at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:457)
>>>   at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457)
>>>   at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456)
>>>   at org.apache.spark.sql.hive.HiveContext$$anon$3.<init>(HiveContext.scala:473)
>>>   at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473)
>>>   at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472)
>>>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>>>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
>>>   at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
>>>   at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442)
>>>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223)
>>>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
>>> —
>>>
>>> Even though the application is not using Hive, HiveContext is used
>>> instead of SQLContext for the additional functionality it provides.
>>> There’s no hive-site.xml for the application, but this did not cause an
>>> issue with Spark 1.4.1.
>>>
>>> Does anyone have an idea of what changed from 1.4.1 to 1.6.0 that could
>>> explain this NPE? The only obvious change I’ve noticed in HiveContext is
>>> that the default warehouse location is different (1.4.1: the current
>>> directory; 1.6.0: /user/hive/warehouse), but I verified that this NPE
>>> happens even when /user/hive/warehouse exists and is readable/writable
>>> by the application. In terms of changes to the application to work with
>>> Spark 1.6.0, the only one that might be relevant to this issue is the
>>> upgrade of the Hadoop dependencies to match what Spark 1.6.0 uses
>>> (2.6.0-cdh5.7.0-SNAPSHOT).
>>>
>>> Thanks,
>>> Jay
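For the "simple proof of concept" Jay asks for above, a minimal driver along these lines might serve as a starting point. This is a sketch only, not tested here: it needs the Spark 1.6.0 core and spark-hive artifacts plus a JDBC driver on the classpath, and the JDBC URL, table name, and credentials below are placeholders.

```java
import java.util.Properties;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveContextNpeRepro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("HiveContextNpeRepro")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // No hive-site.xml on the classpath, matching the failing setup.
        HiveContext context = new HiveContext(sc.sc());

        Properties props = new Properties();
        props.setProperty("user", "test");      // placeholder credentials
        props.setProperty("password", "test");

        // On 1.6.0 this reportedly throws the NullPointerException from
        // ClientWrapper.conf; on 1.4.1 and 1.5.2 it returns a DataFrame.
        DataFrame df = context.read().jdbc(
                "jdbc:postgresql://localhost/somedb",  // placeholder URL
                "some_table",                          // placeholder table
                props);
        df.show();

        sc.stop();
    }
}
```

Run against any reachable non-Hive JDBC database; if the trace above is reproducible, the failure should occur inside `context.read().jdbc(...)` before any rows are fetched.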
