Ok, I modified as per your suggestions:

export SPARK_HOME=/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4
export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.3.0-hadoop2.4.0.jar
export HADOOP_CONF_DIR=/apache/hadoop/conf
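Aside (editor's sketch): spark-sql's --driver-class-path and SPARK_CLASSPATH take a colon-separated list, while --jars takes a comma-separated one; mixing the two separators is an easy way to silently drop a jar such as the MySQL connector. A minimal Python helper (the jar paths are illustrative, not prescriptive) that builds both forms from a single list:

```python
import os

# Illustrative jar locations; substitute your real paths.
jars = [
    "/apache/hadoop/lib/hadoop-lzo-0.6.0.jar",
    "/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar",
]

# --driver-class-path / SPARK_CLASSPATH: colon-separated on Linux (os.pathsep)
driver_class_path = os.pathsep.join(jars)

# --jars: comma-separated
jars_flag = ",".join(jars)

print(driver_class_path)
print(jars_flag)
```

Generating both strings from one list keeps the two invocation styles in sync.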
cd $SPARK_HOME
./bin/spark-sql -v --driver-class-path /apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar:/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar

At the spark-sql prompt I ran "show tables" and "desc dw_bid". Each throws the exception below.

spark-sql> desc dw_bid;
15/03/26 23:10:14 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/03/26 23:10:14 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
15/03/26 23:10:14 INFO parse.ParseDriver: Parsing command: desc dw_bid
15/03/26 23:10:14 INFO parse.ParseDriver: Parse Completed
15/03/26 23:10:15 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=dw_bid
15/03/26 23:10:15 INFO HiveMetaStore.audit: ugi=dvasthi...@corp.ebay.com ip=unknown-ip-addr cmd=get_table : db=default tbl=dw_bid
15/03/26 23:10:15 INFO spark.SparkContext: Starting job: collect at SparkPlan.scala:83
15/03/26 23:10:15 INFO scheduler.DAGScheduler: Got job 0 (collect at SparkPlan.scala:83) with 1 output partitions (allowLocal=false)
15/03/26 23:10:15 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at SparkPlan.scala:83)
15/03/26 23:10:15 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/03/26 23:10:15 INFO scheduler.DAGScheduler: Missing parents: List()
15/03/26 23:10:15 INFO scheduler.DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[1] at map at SparkPlan.scala:83), which has no missing parents
15/03/26 23:10:16 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
15/03/26 23:10:16 INFO scheduler.DAGScheduler: Job 0 failed: collect at SparkPlan.scala:83, took 0.078101 s
15/03/26 23:10:16 ERROR thriftserver.SparkSQLDriver: Failed in [desc dw_bid]
org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.reflect.InvocationTargetException
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:526)
org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:79)
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
org.apache.spark.SparkContext.broadcast(SparkContext.scala:1051)
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:839)
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778)
org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:847)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

(The identical "Task serialization failed" stack trace is then printed twice more, once as a bare SparkException and once by CliDriver.)

spark-sql> 15/03/26 23:10:16 INFO scheduler.StatsReportListener: Finished stage: org.apache.spark.scheduler.StageInfo@2c3fa686

Also, how do I know from the log messages that Spark SQL has connected to Hive?

On Fri, Mar 27, 2015 at 9:36 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

> Hey Deepak,
>
> It seems that your hive-site.xml says your Hive metastore setup is using
> MySQL. If that's not the case, you need to adjust your hive-site.xml
> configuration. As for the version of the MySQL driver, it should match the
> MySQL server.
>
> Cheng
>
> On 3/27/15 11:07 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>
> I do not use MySQL; I want to read Hive tables from Spark SQL and
> transform them in Spark SQL. Why do I need a MySQL driver? If I still need
> it, which version should I use?
>
> Assuming I need it, I downloaded the latest version from
> http://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.34 and
> ran the following commands. I no longer see the above exception; however, I
> see a new one.
> export SPARK_HOME=/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4
> export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.3.0-hadoop2.4.0.jar
> export SPARK_CLASSPATH=/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar:/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
> export HADOOP_CONF_DIR=/apache/hadoop/conf
> cd $SPARK_HOME
> ./bin/spark-sql
> Spark assembly has been built with Hive, including Datanucleus jars on classpath
> ...
> spark-sql> show tables;
> 15/03/26 20:03:57 INFO metastore.HiveMetaStore: 0: get_tables: db=default pat=.*
> 15/03/26 20:03:57 INFO HiveMetaStore.audit: ugi=dvasthi...@corp.ebay.com ip=unknown-ip-addr cmd=get_tables: db=default pat=.*
> 15/03/26 20:03:58 INFO spark.SparkContext: Starting job: collect at SparkPlan.scala:83
> 15/03/26 20:03:58 INFO scheduler.DAGScheduler: Got job 1 (collect at SparkPlan.scala:83) with 1 output partitions (allowLocal=false)
> 15/03/26 20:03:58 INFO scheduler.DAGScheduler: Final stage: Stage 1(collect at SparkPlan.scala:83)
> 15/03/26 20:03:58 INFO scheduler.DAGScheduler: Parents of final stage: List()
> 15/03/26 20:03:58 INFO scheduler.DAGScheduler: Missing parents: List()
> 15/03/26 20:03:58 INFO scheduler.DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[3] at map at SparkPlan.scala:83), which has no missing parents
> 15/03/26 20:03:58 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
> 15/03/26 20:03:58 INFO scheduler.StatsReportListener: Finished stage:
> org.apache.spark.scheduler.StageInfo@2bfd9c4d
> 15/03/26 20:03:58 INFO scheduler.DAGScheduler: Job 1 failed: collect at SparkPlan.scala:83, took 0.005163 s
> 15/03/26 20:03:58 ERROR thriftserver.SparkSQLDriver: Failed in [show tables]
> org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.reflect.InvocationTargetException
>
> (The same "Task serialization failed" stack trace as in the desc dw_bid run, again printed three times.)
>
> Regards,
> Deepak
>
> On Fri, Mar 27, 2015 at 8:33 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
>> As the exception suggests, you don't have the MySQL JDBC driver on your
>> classpath.
>>
>> On 3/27/15 10:45 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>
>> I am unable to run spark-sql from the command line.
I attempted the following:
>>
>> 1)
>> export SPARK_HOME=/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4
>> export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.3.0-hadoop2.4.0.jar
>> export SPARK_CLASSPATH=/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
>> cd $SPARK_HOME
>> ./bin/spark-sql
>>
>> 2)
>> export SPARK_HOME=/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4
>> export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.3.0-hadoop2.4.0.jar
>> export SPARK_CLASSPATH=/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
>> cd $SPARK_HOME
>> ./bin/spark-sql --jars /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar
>>
>> 3)
>> export SPARK_HOME=/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4
>> export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.3.0-hadoop2.4.0.jar
>> export SPARK_CLASSPATH=/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar:/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar
>> export HADOOP_CONF_DIR=/apache/hadoop/conf
>> cd $SPARK_HOME
>> ./bin/spark-sql
>>
>> Each time I get the below exception:
>>
>> Spark assembly has been built with Hive, including Datanucleus jars on classpath
>> 15/03/26 19:43:49 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
>> 15/03/26 19:43:49 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
>> 15/03/26 19:43:49 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>> 15/03/26 19:43:49 INFO metastore.ObjectStore: ObjectStore, initialize called
>> 15/03/26 19:43:50 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
>> 15/03/26 19:43:50 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
>> Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
>> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:101)
>> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>> at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
>> at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
>> at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
>> at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
>> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
>> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
>> ... 11 more
>> Caused by: java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
>> ...
16 more
>> Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
>> NestedThrowables:
>> java.lang.reflect.InvocationTargetException
>> at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
>> at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788)
>> at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
>> at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
>> at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
>> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>> at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
>> at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339)
>> at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248)
>> at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
>> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
>> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>> at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
>> at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:497)
>> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:475)
>> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523)
>> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397)
>> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356)
>> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
>> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
>> at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944)
>> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)
>> ... 21 more
>> Caused by: java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>> at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
>> at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
>> at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
>> at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>> at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
>> at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
>> at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
>> at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
>> ... 50 more
>> Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
>> at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:259)
>> at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>> at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
>> ... 68 more
>> Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
>> at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58)
>> at org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>> at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>> ... 70 more
>>
>> --
>> Deepak
>
> --
> Deepak

--
Deepak