I am trying to get Spark up and running with Phoenix, but the installation
instructions are not clear to me, or there is something else wrong. I'm using
Spark 1.5.2, HBase 1.1.2, and Phoenix 4.6.0 in a standalone install (no HDFS
or cluster) on Debian Linux 8 (Jessie) x64, with Java 1.8.0_40.

The instructions state:

1. Ensure that all requisite Phoenix / HBase platform dependencies are
   available on the classpath for the Spark executors and drivers

2. One method is to add the phoenix-4.4.0-client.jar to 'SPARK_CLASSPATH'
   in spark-env.sh, or setting both 'spark.executor.extraClassPath' and
   'spark.driver.extraClassPath' in spark-defaults.conf
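
For reference, this is my reading of the spark-env.sh variant of step 2 (the
jar name and path reflect my 4.6.0 install under /usr/local/phoenix; adjust as
needed):

    # spark-env.sh -- my interpretation of step 2, not taken from the docs
    export SPARK_CLASSPATH=/usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar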

First off, what are "all requisite Phoenix / HBase platform dependencies"?
Step 2 suggests that all I need to do is add 'phoenix-4.6.0-HBase-1.1-client.jar'
to Spark's classpath. But what about 'phoenix-spark-4.6.0-HBase-1.1.jar' or
'phoenix-core-4.6.0-HBase-1.1.jar'? Does either of these (or anything else)
need to be added to Spark's classpath?

Secondly, if I follow the instructions exactly and add only
'phoenix-4.6.0-HBase-1.1-client.jar' to 'spark-defaults.conf':

    spark.executor.extraClassPath   /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar
    spark.driver.extraClassPath     /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar
Then I get the following error when starting the interactive Spark shell with 
'spark-shell':
    15/12/08 18:38:05 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    15/12/08 18:38:05 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    15/12/08 18:38:05 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
    org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
                    at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
    ...

    <console>:10: error: not found: value sqlContext
           import sqlContext.implicits._
                  ^
    <console>:10: error: not found: value sqlContext
           import sqlContext.sql
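
For completeness, my understanding is that the same classpath settings can
also be supplied when launching the shell, though I have been working from the
conf file (paths again assume my install location):

    spark-shell \
      --driver-class-path /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar \
      --conf spark.executor.extraClassPath=/usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar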

On the other hand, if I include all three of the aforementioned JARs, I get the
same error. However, if I include only 'phoenix-spark-4.6.0-HBase-1.1.jar',
spark-shell seems to launch without error. Nevertheless, if I then try the
simple tutorial commands in spark-shell, I get the following:
Spark output: SQL context available as sqlContext.

    scala> import org.apache.spark.SparkContext
    scala> import org.apache.spark.sql.SQLContext
    scala> import org.apache.phoenix.spark._

    scala> val sqlContext = new SQLContext(sc)

    scala> val df = sqlContext.load("org.apache.phoenix.spark",
             Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181"))

Spark error:

    java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
        at org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:71)
        at org.apache.phoenix.spark.PhoenixRDD.phoenixConf$lzycompute(PhoenixRDD.scala:39)
        at org.apache.phoenix.spark.PhoenixRDD.phoenixConf(PhoenixRDD.scala:38)
        at org.apache.phoenix.spark.PhoenixRDD.<init>(PhoenixRDD.scala:42)
        at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:50)
        at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
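
One diagnostic that might narrow this down is to check whether the client jar
actually bundles the missing class (assuming a standard unzip install):

    unzip -l /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar | grep HBaseConfiguration

If the class is listed there, this would seem to be a classpath or classloader
issue rather than a genuinely missing dependency, but I don't know how to
confirm that.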

This final error seems similar to the one in the mailing list post
"Phoenix-spark: NoClassDefFoundError: HBaseConfiguration"
<http://mail-archives.apache.org/mod_mbox/phoenix-user/201511.mbox/ajax/%3CCAKwwsRSEJHkotiF28kzumDZM6kgBVeTJNGUoJnZcLiuEGCTjHQ%40mail.gmail.com%3E>.
But that question does not seem to have been answered satisfactorily. Also note
that if I include all three JARs, as that poster did, I get an error when
launching spark-shell.
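
For reference, the "all three JARs" configuration I mentioned above looks like
this in my spark-defaults.conf (colon-separated entries; paths reflect my
install under /usr/local/phoenix):

    spark.executor.extraClassPath   /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar:/usr/local/phoenix/phoenix-spark-4.6.0-HBase-1.1.jar:/usr/local/phoenix/phoenix-core-4.6.0-HBase-1.1.jar
    spark.driver.extraClassPath     /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar:/usr/local/phoenix/phoenix-spark-4.6.0-HBase-1.1.jar:/usr/local/phoenix/phoenix-core-4.6.0-HBase-1.1.jar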

Can you please clarify the proper way to install and configure Phoenix with
Spark?

Sincerely,
Jonathan
