Hi Robert,

I'm glad you've found a solution that works for you.

An attempt to answer your questions:


*1. What is the difference between "--jars" and*
*"spark.driver.extraClassPath"? What does each one do?*
As I understand it, the 'extraClassPath' setting makes the JARs available
to Spark's bootstrap class loader, whereas JARs passed via '--jars' are
only visible to the runtime class loader. There were a number of issues
with JDBC drivers not loading properly when provided via '--jars';
however, it's possible that's been fixed in more recent versions of Spark.
The workaround is to ensure those classes are available to the bootstrap
class loader via 'extraClassPath'.
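
To make that concrete, here's a minimal sketch of the two mechanisms side
by side (the JAR path is just a placeholder for your environment):

# Runtime flag: the JAR is shipped to the cluster and added to the
# runtime class loader of the driver and executors
spark-shell --jars /path/to/phoenix-client.jar

# spark-defaults.conf: the JAR must already exist at this path on every
# node; it's prepended to the JVM classpath at launch, so it's visible
# to the bootstrap/system class loader
spark.driver.extraClassPath /path/to/phoenix-client.jar
spark.executor.extraClassPath /path/to/phoenix-client.jar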


*2. How do I replicate the "--jars" switch in a config file?*
Again, as I understand it, '--jars' is a runtime flag passed to
spark-submit. You may have some luck with a wrapper script, but I'm not
sure there's an equivalent config file setting.
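
If a wrapper is acceptable, something along these lines would bake the
flag in (a hypothetical script; the JAR path is the one from your
message, so adjust for your install):

#!/bin/sh
# spark-shell-phoenix: launch spark-shell with the Phoenix client JAR.
# Any extra arguments are passed straight through to spark-shell.
PHOENIX_JAR=/usr/hdp/current/phoenix-client/phoenix-client.jar
exec spark-shell --jars "$PHOENIX_JAR" "$@"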

*3. Is the "--jars" solution stable, or will some minor tweak break it?
Given that this config issue seems to be close to voodoo, I'm hesitant to
commit to Phoenix, even though I can get it working, until I'm confident
that we'll be able to continue doing so.*
You're definitely in uncharted territory here, and I suggest you continue
to follow up with your vendor. However, if you're able to successfully load
and save data across a few nodes, you're probably in the clear.
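
As a sanity check, a round trip like the one below should exercise both
the read and write paths. This is only a sketch using the phoenix-spark
calls from your own snippets; EMAIL_ENRON_COPY is a hypothetical target
table you'd need to create first (e.g. via sqlline), and zkUrl should
point at your actual quorum:

import org.apache.phoenix.spark._

// Read a Phoenix table into an RDD and force evaluation
val rdd = sc.phoenixTableAsRDD("EMAIL_ENRON", Seq("MAIL_FROM", "MAIL_TO"),
  zkUrl = Some("localhost:2181"))
println(s"read ${rdd.count()} rows")

// Write a couple of rows back out to a table with a matching schema
sc.parallelize(Seq(("a@example.com", "b@example.com")))
  .saveToPhoenix("EMAIL_ENRON_COPY", Seq("MAIL_FROM", "MAIL_TO"),
    zkUrl = Some("localhost:2181"))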

*4. What is the metastore, and what are the errors it's showing? Does
that have to do with Phoenix? Does Phoenix somehow interfere with SLF4J?*

The phoenix-client.jar provided in that version has some upstream
dependency conflicts when Spark has a HiveContext (which is the default
in HDP), causing those runtime errors. The phoenix-client-spark JAR has
those conflicting dependencies stripped out, and in Phoenix 4.8.0 the
regular client JAR is properly shaded to avoid those issues going forward.
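
If your distribution ships it, pointing both classpath settings at the
phoenix-client-spark JAR is the usual fix. The path below is a guess
based on your layout, so check what's actually on disk:

spark.driver.extraClassPath /usr/hdp/current/phoenix-client/phoenix-client-spark.jar
spark.executor.extraClassPath /usr/hdp/current/phoenix-client/phoenix-client-spark.jar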

Good luck,

Josh


On Tue, Jul 5, 2016 at 6:46 PM, Robert James <srobertja...@gmail.com> wrote:

> I've found a (rather perplexing) partial solution.
>
> If I leave the spark.driver.extraClassPath out completely, and instead
> do "spark-shell --jars
> /usr/hdp/current/phoenix-client/phoenix-client.jar", it seems to work
> perfectly! Note that the jar there is the phoenix-client.jar as
> shipped with HDP (is that a backport of a jar made in a later version
> of Phoenix)?
>
> Note that I haven't been able to get the "--jars" method to work by
> specifying other jars, only that one.
>
> What's perplexing is that if I use the exact same jar in the
> spark.driver.extraClassPath config directive (only, no
> executorExtraClassPath), I get errors about the metastore (see below
> for the trace).
>
> This raises several questions:
> 1. What is the difference between "--jars" and
> "spark.driver.extraClassPath"? What does each one do?
> 2. How do I replicate the "--jars" switch in a config file?
> 3. Is the "--jars" solution stable, or will some minor tweak break it?
> Given that this config issue seems to be close to voodoo, I'm hesitant
> to commit to Phoenix, even though I can get it working, until I'm
> confident that we'll be able to continue doing so.
> 4. What is the metastore, and what are the errors it's showing? Does
> that have to do with Phoenix? Does Phoenix somehow interfere with
> SLF4J?
>
> Here are the errors when I set the spark.driver.extraClassPath to
> /usr/hdp/current/phoenix-client/phoenix-client.jar.  Note that we get
> them just running spark-shell, even before entering any code.
>
> [root@sandbox ~]# spark-shell
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/usr/hdp/2.4.0.0-169/phoenix/phoenix-4.4.0.2.4.0.0-169
> -client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>
> [jar:file:/usr/hdp/2.4.0.0-169/spark/lib/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> 16/07/05 21:56:42 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where
> applicable
> ....
> 16/07/05 21:57:18 INFO HiveMetaStore: Added admin role in metastore
> 16/07/05 21:57:18 INFO HiveMetaStore: Added public role in metastore
> 16/07/05 21:57:18 WARN Hive: Failed to access metastore. This class
> should not accessed in runtime.
> org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.RuntimeException: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
>         at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
>         at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
> ...
>         ... 73 more
> ...
> <console>:16: error: not found: value sqlContext
>          import sqlContext.implicits._
>                 ^
> <console>:16: error: not found: value sqlContext
>          import sqlContext.sql
>
>
>
>
> On 7/5/16, Josh Mahonin <jmaho...@gmail.com> wrote:
> > Hi Robert,
> >
> > I recommend following up with HDP on this issue.
> >
> > The underlying problem is that the 'phoenix-spark-4.4.0.2.4.0.0-169.jar'
> > they've provided isn't actually a fat client JAR, it's missing many of
> the
> > required dependencies. They might be able to provide the correct JAR for
> > you, but you'd have to check with them. It may also be possible for you
> to
> > manually include all of the necessary JARs on the Spark classpath to
> mimic
> > the fat jar, but that's fairly ugly and time consuming.
> >
> > FWIW, the HDP 2.5 Tech Preview seems to include the correct JAR, though I
> > haven't personally tested it out yet.
> >
> > Good luck,
> >
> > Josh
> >
> > On Tue, Jul 5, 2016 at 2:00 AM, Robert James <srobertja...@gmail.com>
> > wrote:
> >
> >> I'm trying to use Phoenix on Spark, and can't get around this error:
> >>
> >> java.lang.NoClassDefFoundError:
> >> org/apache/hadoop/hbase/HBaseConfiguration
> >>         at
> >>
> org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)
> >>
> >> DETAILS:
> >> 1. I'm running HDP 2.4.0.0-169
> >> 2. Using phoenix-sqlline, I can access Phoenix perfectly
> >> 3. Using hbase shell, I can access HBase perfectly
> >> 4. I added the following lines to /etc/spark/conf/spark-defaults.conf
> >>
> >> spark.driver.extraClassPath
> >> /usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar
> >> spark.executor.extraClassPath
> >> /usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar
> >>
> >> 5. Steps to reproduce the error:
> >> # spark-shell
> >> ...
> >> scala> import org.apache.phoenix.spark._
> >> import org.apache.phoenix.spark._
> >>
> >> scala> sqlContext.load("org.apache.phoenix.spark", Map("table" ->
> >> "EMAIL_ENRON", "zkUrl" -> "localhost:2181"))
> >> warning: there were 1 deprecation warning(s); re-run with -deprecation
> >> for details
> >> java.lang.NoClassDefFoundError:
> >> org/apache/hadoop/hbase/HBaseConfiguration
> >>         at
> >>
> org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)
> >>
> >> // Or, this gets the same error
> >> scala> val rdd = sc.phoenixTableAsRDD("EMAIL_ENRON", Seq("MAIL_FROM",
> >> "MAIL_TO"), zkUrl=Some("localhost"))
> >> java.lang.NoClassDefFoundError:
> >> org/apache/hadoop/hbase/HBaseConfiguration
> >>         at
> >>
> org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)
> >>         at
> >>
> org.apache.phoenix.spark.PhoenixRDD.phoenixConf$lzycompute(PhoenixRDD.scala:38)
> >>
> >> 6. I've tried every permutation I can think of, and also spent hours
> >> Googling.  Some times I can get different errors, but always errors.
> >> Interestingly, if I manage to load the HBaseConfiguration class
> >> manually (by specifying classpaths and then import), I get a
> >> "phoenixTableAsRDD is not a member of SparkContext" error.
> >>
> >> How can I use Phoenix from within Spark?  I'm really eager to do so,
> >> but haven't been able to.
> >>
> >> Also: Can someone give me some background on the underlying issues
> >> here? Trial-and-error-plus-google is not exactly high quality
> >> engineering; I'd like to understand the problem better.
> >>
> >
>
