Ted, I suspect I've hit this issue: https://issues.apache.org/jira/browse/SPARK-11818
Could you take a look at the issue and verify that it makes sense?
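If SPARK-11818 is the culprit, more than one hbase-default.xml should be visible on the classpath. One way to confirm this with plain JDK calls, no Spark or HBase needed (the class name here is just for illustration):

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class ClasspathCheck {
    // Collect every URL from which the current classloader can resolve the
    // given resource name. More than one hit for "hbase-default.xml" means
    // conflicting HBase jars are on the classpath.
    static List<URL> listResources(String name) throws Exception {
        Enumeration<URL> urls =
                Thread.currentThread().getContextClassLoader().getResources(name);
        List<URL> out = new ArrayList<>();
        while (urls.hasMoreElements()) {
            out.add(urls.nextElement());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (URL url : listResources("hbase-default.xml")) {
            System.out.println(url);
        }
    }
}
```

Running the same lookup inside a Spark task (e.g. in a map over a small RDD, collecting the results) shows what the executor-side classloader resolves, which is where the failure actually occurs.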
Thanks,
Jungtaek Lim (HeartSaVioR)

2015-11-18 20:32 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:

> Here is the related code:
>
>   private static void checkDefaultsVersion(Configuration conf) {
>     if (conf.getBoolean("hbase.defaults.for.version.skip", Boolean.FALSE)) return;
>     String defaultsVersion = conf.get("hbase.defaults.for.version");
>     String thisVersion = VersionInfo.getVersion();
>     if (!thisVersion.equals(defaultsVersion)) {
>       throw new RuntimeException(
>           "hbase-default.xml file seems to be for an older version of HBase (" +
>           defaultsVersion + "), this version is " + thisVersion);
>     }
>   }
>
> null means that "hbase.defaults.for.version" was not set in the other
> hbase-default.xml.
>
> Can you retrieve the classpath of the Spark task so that we have more
> clues?
>
> Cheers
>
> On Tue, Nov 17, 2015 at 10:06 PM, 임정택 <kabh...@gmail.com> wrote:
>
>> Ted,
>>
>> Thanks for the reply.
>>
>> My fat jar's only Spark-related dependency is spark-core, marked as
>> "provided".
>> It seems Spark itself only adds hbase-common 0.98.7-hadoop2, in its
>> spark-examples module.
>>
>> And if there are two hbase-default.xml files in the classpath, shouldn't
>> one of them be loaded, instead of showing (null)?
>>
>> Best,
>> Jungtaek Lim (HeartSaVioR)
>>
>> 2015-11-18 13:50 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:
>>
>>> Looks like there are two hbase-default.xml files in the classpath: one
>>> for 0.98.6 and another for 0.98.7-hadoop2 (used by Spark).
>>>
>>> You can specify hbase.defaults.for.version.skip as true in your
>>> hbase-site.xml.
>>>
>>> Cheers
>>>
>>> On Tue, Nov 17, 2015 at 1:01 AM, 임정택 <kabh...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm evaluating Zeppelin to run a driver which interacts with HBase.
>>>> I use a fat jar to include the HBase dependencies, and see failures at
>>>> the executor level.
>>>> I thought it was a Zeppelin issue, but it fails on spark-shell, too.
>>>>
>>>> I loaded the fat jar via the --jars option,
>>>>
>>>> > ./bin/spark-shell --jars hbase-included-assembled.jar
>>>>
>>>> ran the driver code using the provided SparkContext instance, and saw
>>>> failures in the spark-shell console and the executor logs.
>>>>
>>>> Below are the stack traces:
>>>>
>>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 55
>>>> in stage 0.0 failed 4 times, most recent failure: Lost task 55.3 in stage
>>>> 0.0 (TID 281, <svr hostname>): java.lang.NoClassDefFoundError: Could not
>>>> initialize class org.apache.hadoop.hbase.client.HConnectionManager
>>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
>>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
>>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
>>>>   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
>>>>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
>>>>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
>>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>>>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> Driver stacktrace:
>>>>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
>>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
>>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
>>>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
>>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>>>   at scala.Option.foreach(Option.scala:236)
>>>>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>>>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
>>>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
>>>>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>>>
>>>> 15/11/16 18:59:57 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14)
>>>> java.lang.ExceptionInInitializerError
>>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
>>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
>>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
>>>>   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
>>>>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
>>>>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
>>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>>>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>   at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be
>>>> for and old version of HBase (null), this version is 0.98.6-cdh5.2.0
>>>>   at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)
>>>>   at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:105)
>>>>   at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:116)
>>>>   at org.apache.hadoop.hbase.client.HConnectionManager.<clinit>(HConnectionManager.java:222)
>>>>   ... 18 more
>>>>
>>>> Please note that it runs smoothly with spark-submit.
>>>>
>>>> By the way, if the issue is that hbase-default.xml is not properly loaded
>>>> (maybe because of the classloader), it nevertheless seems to work at the
>>>> driver level:
>>>>
>>>> import org.apache.hadoop.hbase.HBaseConfiguration
>>>> val conf = HBaseConfiguration.create()
>>>> println(conf.get("hbase.defaults.for.version"))
>>>>
>>>> It prints "0.98.6-cdh5.2.0".
>>>>
>>>> I'm using Spark 1.4.1 (hadoop-2.4 binary), Zeppelin 0.5.5, and HBase
>>>> 0.98.6-cdh5.2.0.
>>>>
>>>> Thanks in advance!
>>>>
>>>> Best,
>>>> Jungtaek Lim (HeartSaVioR)
>>>
>>
>>
>> --
>> Name : 임 정택
>> Blog : http://www.heartsavior.net / http://dev.heartsavior.net
>> Twitter : http://twitter.com/heartsavior
>> LinkedIn : http://www.linkedin.com/in/heartsavior
>

--
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
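For reference, the checkDefaultsVersion logic quoted earlier in the thread can be sketched in plain Java to show why an hbase-default.xml from a mismatched jar surfaces as "(null)". This is a simplified transliteration for illustration, not HBase's actual class:

```java
public class VersionCheck {
    // Rough transliteration (class and method names hypothetical) of the
    // checkDefaultsVersion logic quoted by Ted earlier in the thread.
    static void check(String defaultsVersion, String thisVersion, boolean skip) {
        if (skip) {
            return; // hbase.defaults.for.version.skip=true bypasses the check
        }
        // When hbase-default.xml is resolved from a jar that does not define
        // hbase.defaults.for.version at all, defaultsVersion is null, so the
        // comparison fails and the message renders "(null)" -- exactly the
        // error reported in this thread.
        if (!thisVersion.equals(defaultsVersion)) {
            throw new RuntimeException(
                    "hbase-default.xml file seems to be for an older version of HBase ("
                            + defaultsVersion + "), this version is " + thisVersion);
        }
    }

    public static void main(String[] args) {
        try {
            check(null, "0.98.6-cdh5.2.0", false);
        } catch (RuntimeException e) {
            // prints: hbase-default.xml file seems to be for an older version
            // of HBase (null), this version is 0.98.6-cdh5.2.0
            System.out.println(e.getMessage());
        }
    }
}
```

Setting hbase.defaults.for.version.skip to true in hbase-site.xml, as suggested above, takes the skip branch and bypasses the comparison entirely; removing the conflicting HBase jar from the classpath is the cleaner fix.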