Created a pull request: https://github.com/apache/spark/pull/11066
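The pull request addresses the initialization-order problem Ted describes below: `SessionState.get()` is read before `SessionState.start(state)` has run on that thread, so the thread-local comes back null. A stripped-down sketch of that failure mode (hypothetical class names, not Spark's or Hive's actual code):

```java
// Hypothetical, simplified analogue of Hive's SessionState handling,
// illustrating why calling get().getConf() before start() throws an NPE.
public class SessionStateDemo {
    static class HiveConf {}

    static class SessionState {
        private static final ThreadLocal<SessionState> TL = new ThreadLocal<>();
        private final HiveConf conf = new HiveConf();

        static void start(SessionState s) { TL.set(s); }
        static SessionState get() { return TL.get(); } // null before start()
        HiveConf getConf() { return conf; }
    }

    public static void main(String[] args) {
        try {
            // Mirrors ClientWrapper.conf running ahead of SessionState.start:
            // get() returns null, so .getConf() throws NullPointerException.
            HiveConf c = SessionState.get().getConf();
            System.out.println("got conf: " + c);
        } catch (NullPointerException e) {
            System.out.println("NPE: SessionState.get() returned null");
        }

        SessionState.start(new SessionState());
        System.out.println(SessionState.get().getConf() != null); // prints "true"
    }
}
```

Once `start()` has run on the calling thread, the same `get().getConf()` call succeeds, which is why the fix is about ordering rather than configuration.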
FYI

On Wed, Feb 3, 2016 at 1:27 PM, Shipper, Jay [USA] <[email protected]> wrote:

> It was just renamed recently: https://github.com/apache/spark/pull/10981
>
> As SessionState is entirely managed by Spark’s code, it still seems like
> this is a bug with Spark 1.6.0, and not with how our application is using
> HiveContext. But I’d feel more confident filing a bug if someone else
> could confirm they’re having this issue with Spark 1.6.0. Ideally, we
> should also have some simple proof of concept that can be posted with the
> bug.
>
> From: Ted Yu <[email protected]>
> Date: Wednesday, February 3, 2016 at 3:57 PM
> To: Jay Shipper <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: [External] Re: Spark 1.6.0 HiveContext NPE
>
> In ClientWrapper.scala, the SessionState.get().getConf call might have
> been executed ahead of SessionState.start(state) at line 194.
>
> This was the JIRA:
>
> [SPARK-10810] [SPARK-10902] [SQL] Improve session management in SQL
>
> In the master branch, there is no more ClientWrapper.scala.
>
> FYI
>
> On Wed, Feb 3, 2016 at 11:15 AM, Shipper, Jay [USA] <[email protected]> wrote:
>
>> One quick update on this: the NPE is not happening with Spark 1.5.2, so
>> this problem seems specific to Spark 1.6.0.
>>
>> From: Jay Shipper <[email protected]>
>> Date: Wednesday, February 3, 2016 at 12:06 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: [External] Re: Spark 1.6.0 HiveContext NPE
>>
>> Right, I could already tell that from the stack trace and from looking at
>> Spark’s code. What I’m trying to determine is why that’s coming back as
>> null now, just from upgrading Spark to 1.6.0.
>>
>> From: Ted Yu <[email protected]>
>> Date: Wednesday, February 3, 2016 at 12:04 PM
>> To: Jay Shipper <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Subject: [External] Re: Spark 1.6.0 HiveContext NPE
>>
>> Looks like the NPE came from this line:
>>
>>     def conf: HiveConf = SessionState.get().getConf
>>
>> Meaning SessionState.get() returned null.
>>
>> On Wed, Feb 3, 2016 at 8:33 AM, Shipper, Jay [USA] <[email protected]> wrote:
>>
>>> I’m upgrading an application from Spark 1.4.1 to Spark 1.6.0, and I’m
>>> getting a NullPointerException from HiveContext. It’s happening while it
>>> tries to load some tables via JDBC from an external database (not Hive),
>>> using context.read().jdbc():
>>>
>>> —
>>> java.lang.NullPointerException
>>>   at org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205)
>>>   at org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
>>>   at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
>>>   at org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538)
>>>   at org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537)
>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>   at scala.collection.immutable.List.foreach(List.scala:318)
>>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>   at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537)
>>>   at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>>>   at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>>>   at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:457)
>>>   at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457)
>>>   at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456)
>>>   at org.apache.spark.sql.hive.HiveContext$$anon$3.<init>(HiveContext.scala:473)
>>>   at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473)
>>>   at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472)
>>>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>>>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
>>>   at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
>>>   at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442)
>>>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223)
>>>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
>>> —
>>>
>>> Even though the application is not using Hive, HiveContext is used
>>> instead of SQLContext for the additional functionality it provides.
>>> There’s no hive-site.xml for the application, but this did not cause an
>>> issue with Spark 1.4.1.
>>>
>>> Does anyone have an idea of what changed from 1.4.1 to 1.6.0 that could
>>> explain this NPE? The only obvious change I’ve noticed in HiveContext is
>>> that the default warehouse location is different (1.4.1: the current
>>> directory; 1.6.0: /user/hive/warehouse), but I verified that this NPE
>>> happens even when /user/hive/warehouse exists and is readable/writable
>>> by the application. In terms of changes to the application to work with
>>> Spark 1.6.0, the only one that might be relevant to this issue is the
>>> upgrade of the Hadoop dependencies to match what Spark 1.6.0 uses
>>> (2.6.0-cdh5.7.0-SNAPSHOT).
>>>
>>> Thanks,
>>> Jay
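For the "simple proof of concept" Jay asks for above, a minimal driver along these lines might serve as a starting point. This is a sketch only, not tested here: it needs the Spark 1.6.0 core and spark-hive artifacts plus a JDBC driver on the classpath, and the JDBC URL, table name, and credentials below are placeholders.

```java
import java.util.Properties;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveContextNpeRepro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("HiveContextNpeRepro")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // No hive-site.xml on the classpath, matching the failing setup.
        HiveContext context = new HiveContext(sc.sc());

        Properties props = new Properties();
        props.setProperty("user", "test");      // placeholder credentials
        props.setProperty("password", "test");

        // On 1.6.0 this reportedly throws the NullPointerException from
        // ClientWrapper.conf; on 1.4.1 and 1.5.2 it returns a DataFrame.
        DataFrame df = context.read().jdbc(
                "jdbc:postgresql://localhost/somedb",  // placeholder URL
                "some_table",                          // placeholder table
                props);
        df.show();

        sc.stop();
    }
}
```

Run against any reachable non-Hive JDBC database; if the trace above is reproducible, the failure should occur inside `context.read().jdbc(...)` before any rows are fetched.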
