RE: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-29 Thread Andrew Lee
Hi Patrick,
I manually hardcoded the Hive version to 0.13.1a and it works. It turns out that, 
for some reason, 0.13.1 was being picked up from Maven instead of the 0.13.1a version.
So my solution was: hardcode hive.version to 0.13.1a. Since I am building against 
Hive 0.13 only, the pom.xml was hardcoded with that version string, and the final 
JAR is now working with hive-exec 0.13.1a embedded.
Possible reason why it didn't work? I suspect our internal environment was picking 
up 0.13.1, since we use our own Maven repository as a proxy and cache. 0.13.1a did 
appear in our repo (replicated from Maven Central), but during the build Maven 
picked up 0.13.1 instead of 0.13.1a.
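
For anyone hitting the same thing, a couple of quick checks along these lines should 
confirm which hive-exec was picked up (illustrative commands only; the local-repository 
and assembly paths assume a default Maven build and may need adjusting for your setup):

# Which hive-exec artifacts did Maven actually resolve into the local repository?
find ~/.m2/repository -path '*hive-exec*' -name '*.jar'
# Does the assembly embed the relocated (shaded) Kryo/Objenesis classes that
# hive-exec 0.13.1a expects?
jar tf assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.1.jar \
  | grep 'com/esotericsoftware/shaded/org/objenesis'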

 Date: Wed, 10 Dec 2014 12:23:08 -0800
 Subject: Re: Build Spark 1.2.0-rc1 encounter exceptions when running 
 HiveContext - Caused by: java.lang.ClassNotFoundException: 
 com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy
 From: pwend...@gmail.com
 To: alee...@hotmail.com
 CC: dev@spark.apache.org
 
 Hi Andrew,
 
 It looks like somehow you are including jars from the upstream Apache
 Hive 0.13 project on your classpath. For Spark 1.2 Hive 0.13 support,
 we had to modify Hive to use a different version of Kryo that was
 compatible with Spark's Kryo version.
 
 https://github.com/pwendell/hive/commit/5b582f242946312e353cfce92fc3f3fa472aedf3
 
 I would look through the actual classpath and make sure you aren't
 including your own hive-exec jar somehow.
 
 - Patrick
 
 On Wed, Dec 10, 2014 at 9:48 AM, Andrew Lee alee...@hotmail.com wrote:
  Apologies for the format; somehow it got messed up and the linefeeds were 
  removed. Here's a reformatted version.
  Hi All,
  I tried to include the necessary libraries in SPARK_CLASSPATH in spark-env.sh, 
  adding auxiliary JARs and the datanucleus*.jar files from Hive; however, when 
  I run HiveContext, it gives me the following error:
 
  Caused by: java.lang.ClassNotFoundException: 
  com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy
 
  I checked the JARs with jar tf, and it looks like this class is already included 
  (shaded) in the assembly JAR (spark-assembly-1.2.0-hadoop2.4.1.jar), which is 
  already on the system classpath. I couldn't figure out what is going on with 
  the shading of the esotericsoftware JARs here. Any help is appreciated.
 
 
  How to reproduce the problem?
  Run the following 3 statements in spark-shell. (This is how I launched my 
  spark-shell: cd /opt/spark; ./bin/spark-shell --master yarn --deploy-mode 
  client --queue research --driver-memory 1024M)
 
  import org.apache.spark.SparkContext
  val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
  hiveContext.hql("CREATE TABLE IF NOT EXISTS spark_hive_test_table (key INT, value STRING)")
 
 
 
  For reference, my environment:
  Apache Hadoop 2.4.1
  Apache Hive 0.13.1
  Apache Spark branch-1.2 (installed under /opt/spark/, and config under 
  /etc/spark/)
  Maven build command:
 
  mvn -U -X -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.4.1 
  -Dyarn.version=2.4.1 -Dhive.version=0.13.1 -DskipTests install
 
  Source Code commit label: eb4d457a870f7a281dc0267db72715cd00245e82
 
  My spark-env.sh had the following contents when I executed spark-shell:
  HADOOP_HOME=/opt/hadoop/
  HIVE_HOME=/opt/hive/
  HADOOP_CONF_DIR=/etc/hadoop/
  YARN_CONF_DIR=/etc/hadoop/
  HIVE_CONF_DIR=/etc/hive/
  HADOOP_SNAPPY_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name snappy-java-*.jar)
  HADOOP_LZO_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name hadoop-lzo-*.jar)
  SPARK_YARN_DIST_FILES=/user/spark/libs/spark-assembly-1.2.0-hadoop2.4.1.jar
  export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
  export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
  export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_SNAPPY_JAR:$HADOOP_LZO_JAR:$HIVE_CONF_DIR:/opt/hive/lib/datanucleus-api-jdo-3.2.6.jar:/opt/hive/lib/datanucleus-core-3.2.10.jar:/opt/hive/lib/datanucleus-rdbms-3.2.9.jar
 
 
  Here's what I see from my stack trace.
  warning: there were 1 deprecation warning(s); re-run with -deprecation for 
  details
  Hive history 
  file=/home/hive/log/alti-test-01/hive_job_log_b5db9539-4736-44b3-a601-04fa77cb6730_1220828461.txt
  java.lang.NoClassDefFoundError: com/esotericsoftware/shaded/org/objenesis/strategy/InstantiatorStrategy
    at org.apache.hadoop.hive.ql.exec.Utilities.<clinit>(Utilities.java:925)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9718)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9712)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
    at org.apache.hadoop.hive.ql.Driver.compileInternal

RE: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-10 Thread Andrew Lee
Apologies for the format; somehow it got messed up and the linefeeds were removed. 
Here's a reformatted version.
Hi All,
I tried to include the necessary libraries in SPARK_CLASSPATH in spark-env.sh, 
adding auxiliary JARs and the datanucleus*.jar files from Hive; however, when I run 
HiveContext, it gives me the following error:

Caused by: java.lang.ClassNotFoundException: 
com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

I checked the JARs with jar tf, and it looks like this class is already included 
(shaded) in the assembly JAR (spark-assembly-1.2.0-hadoop2.4.1.jar), which is 
already on the system classpath. I couldn't figure out what is going on with the 
shading of the esotericsoftware JARs here. Any help is appreciated.
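
Roughly, the check looked like this (the assembly path here is just where my copy 
lives; adjust as needed):

jar tf /opt/spark/lib/spark-assembly-1.2.0-hadoop2.4.1.jar \
  | grep 'com/esotericsoftware/shaded/org/objenesis/strategy/InstantiatorStrategy'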


How to reproduce the problem?
Run the following 3 statements in spark-shell. (This is how I launched my 
spark-shell: cd /opt/spark; ./bin/spark-shell --master yarn --deploy-mode 
client --queue research --driver-memory 1024M)

import org.apache.spark.SparkContext
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("CREATE TABLE IF NOT EXISTS spark_hive_test_table (key INT, value STRING)")



For reference, my environment:
Apache Hadoop 2.4.1
Apache Hive 0.13.1
Apache Spark branch-1.2 (installed under /opt/spark/, and config under 
/etc/spark/)
Maven build command:

mvn -U -X -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.4.1 
-Dyarn.version=2.4.1 -Dhive.version=0.13.1 -DskipTests install

Source Code commit label: eb4d457a870f7a281dc0267db72715cd00245e82

My spark-env.sh had the following contents when I executed spark-shell:
 HADOOP_HOME=/opt/hadoop/
 HIVE_HOME=/opt/hive/
 HADOOP_CONF_DIR=/etc/hadoop/
 YARN_CONF_DIR=/etc/hadoop/
 HIVE_CONF_DIR=/etc/hive/
 HADOOP_SNAPPY_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name snappy-java-*.jar)
 HADOOP_LZO_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name hadoop-lzo-*.jar)
 SPARK_YARN_DIST_FILES=/user/spark/libs/spark-assembly-1.2.0-hadoop2.4.1.jar
 export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
 export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
 export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_SNAPPY_JAR:$HADOOP_LZO_JAR:$HIVE_CONF_DIR:/opt/hive/lib/datanucleus-api-jdo-3.2.6.jar:/opt/hive/lib/datanucleus-core-3.2.10.jar:/opt/hive/lib/datanucleus-rdbms-3.2.9.jar


 Here's what I see from my stack trace.
 warning: there were 1 deprecation warning(s); re-run with -deprecation for 
 details
 Hive history 
 file=/home/hive/log/alti-test-01/hive_job_log_b5db9539-4736-44b3-a601-04fa77cb6730_1220828461.txt
 java.lang.NoClassDefFoundError: com/esotericsoftware/shaded/org/objenesis/strategy/InstantiatorStrategy
   at org.apache.hadoop.hive.ql.exec.Utilities.<clinit>(Utilities.java:925)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9718)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9712)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
   at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
   at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
   at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
   at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
   at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
   at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
   at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
   at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
   at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:102)
   at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:106)
   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:16)
   at $iwC$$iwC$$iwC.<init>(<console>:21)
   at $iwC$$iwC.<init>(<console>:23)
   at $iwC.<init>(<console>:25)
   at <init>(<console>:27)
   at .<init>(<console>:31)
   at .<clinit>(<console>)
   at .<init>(<console>:7)
   at .<clinit>(<console>)
   at $print(<console>)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 

Re: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-10 Thread Patrick Wendell
Hi Andrew,

It looks like somehow you are including jars from the upstream Apache
Hive 0.13 project on your classpath. For Spark 1.2 Hive 0.13 support,
we had to modify Hive to use a different version of Kryo that was
compatible with Spark's Kryo version.

https://github.com/pwendell/hive/commit/5b582f242946312e353cfce92fc3f3fa472aedf3

I would look through the actual classpath and make sure you aren't
including your own hive-exec jar somehow.
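
For example, something along these lines can surface a stray hive-exec entry 
(a rough sketch only; adjust the variables and paths to wherever your extra jars 
actually come from):

echo "$SPARK_CLASSPATH" | tr ':' '\n' | grep -i hive
for j in $(echo "$SPARK_CLASSPATH" | tr ':' '\n'); do
  [ -f "$j" ] && jar tf "$j" 2>/dev/null \
    | grep -q 'org/apache/hadoop/hive/ql/exec/Utilities.class' && echo "$j"
done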

- Patrick

On Wed, Dec 10, 2014 at 9:48 AM, Andrew Lee alee...@hotmail.com wrote:
 Apologies for the format; somehow it got messed up and the linefeeds were removed. 
 Here's a reformatted version.
 Hi All,
 I tried to include the necessary libraries in SPARK_CLASSPATH in spark-env.sh, 
 adding auxiliary JARs and the datanucleus*.jar files from Hive; however, when I run 
 HiveContext, it gives me the following error:

 Caused by: java.lang.ClassNotFoundException: 
 com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

 I checked the JARs with jar tf, and it looks like this class is already included 
 (shaded) in the assembly JAR (spark-assembly-1.2.0-hadoop2.4.1.jar), which is 
 already on the system classpath. I couldn't figure out what is going on with the 
 shading of the esotericsoftware JARs here. Any help is appreciated.


 How to reproduce the problem?
 Run the following 3 statements in spark-shell. (This is how I launched my 
 spark-shell: cd /opt/spark; ./bin/spark-shell --master yarn --deploy-mode 
 client --queue research --driver-memory 1024M)

 import org.apache.spark.SparkContext
 val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
 hiveContext.hql("CREATE TABLE IF NOT EXISTS spark_hive_test_table (key INT, value STRING)")



 For reference, my environment:
 Apache Hadoop 2.4.1
 Apache Hive 0.13.1
 Apache Spark branch-1.2 (installed under /opt/spark/, and config under 
 /etc/spark/)
 Maven build command:

 mvn -U -X -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.4.1 
 -Dyarn.version=2.4.1 -Dhive.version=0.13.1 -DskipTests install

 Source Code commit label: eb4d457a870f7a281dc0267db72715cd00245e82

 My spark-env.sh had the following contents when I executed spark-shell:
 HADOOP_HOME=/opt/hadoop/
 HIVE_HOME=/opt/hive/
 HADOOP_CONF_DIR=/etc/hadoop/
 YARN_CONF_DIR=/etc/hadoop/
 HIVE_CONF_DIR=/etc/hive/
 HADOOP_SNAPPY_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name snappy-java-*.jar)
 HADOOP_LZO_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name hadoop-lzo-*.jar)
 SPARK_YARN_DIST_FILES=/user/spark/libs/spark-assembly-1.2.0-hadoop2.4.1.jar
 export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
 export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
 export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_SNAPPY_JAR:$HADOOP_LZO_JAR:$HIVE_CONF_DIR:/opt/hive/lib/datanucleus-api-jdo-3.2.6.jar:/opt/hive/lib/datanucleus-core-3.2.10.jar:/opt/hive/lib/datanucleus-rdbms-3.2.9.jar


 Here's what I see from my stack trace.
 warning: there were 1 deprecation warning(s); re-run with -deprecation for 
 details
 Hive history 
 file=/home/hive/log/alti-test-01/hive_job_log_b5db9539-4736-44b3-a601-04fa77cb6730_1220828461.txt
 java.lang.NoClassDefFoundError: com/esotericsoftware/shaded/org/objenesis/strategy/InstantiatorStrategy
   at org.apache.hadoop.hive.ql.exec.Utilities.<clinit>(Utilities.java:925)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9718)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9712)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
   at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
   at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
   at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
   at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
   at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
   at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
   at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
   at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
   at