It's clear that the problem is SPARK-11818 <https://issues.apache.org/jira/browse/SPARK-11818>: the executor doesn't look up resources properly when the REPL class URI config value is presented to the executor. (I filed a related PR <https://github.com/apache/spark/pull/9812>, which is under review.)
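To see the broken lookup directly, here is a minimal sketch you can run from spark-shell (assuming the fat jar is on --jars); it asks each executor where hbase-default.xml resolves, and with this bug present it prints null for the executors even though the same call works on the driver:

// sketch: report where hbase-default.xml resolves on each executor;
// with SPARK-11818, the REPL-backed executor classloader returns null here
sc.parallelize(1 to 4, 4).map { _ =>
  val url = Thread.currentThread().getContextClassLoader
    .getResource("hbase-default.xml")
  s"${java.net.InetAddress.getLocalHost.getHostName} -> $url"
}.collect().foreach(println)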
I applied the PR to Spark 1.4.1 and made a custom build, and now my code runs smoothly in Zeppelin.

Best,
Jungtaek Lim (HeartSaVioR)

2015-11-19 0:59 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:

> Interesting.
>
> I will be watching your PR.
>
> On Wed, Nov 18, 2015 at 7:51 AM, 임정택 <kabh...@gmail.com> wrote:
>
>> Ted,
>>
>> I suspect I hit this issue:
>> https://issues.apache.org/jira/browse/SPARK-11818
>> Could you take a look at the issue and verify that it makes sense?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> 2015-11-18 20:32 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:
>>
>>> Here is the related code:
>>>
>>> private static void checkDefaultsVersion(Configuration conf) {
>>>   if (conf.getBoolean("hbase.defaults.for.version.skip", Boolean.FALSE)) {
>>>     return;
>>>   }
>>>   String defaultsVersion = conf.get("hbase.defaults.for.version");
>>>   String thisVersion = VersionInfo.getVersion();
>>>   if (!thisVersion.equals(defaultsVersion)) {
>>>     throw new RuntimeException(
>>>       "hbase-default.xml file seems to be for an older version of HBase (" +
>>>       defaultsVersion + "), this version is " + thisVersion);
>>>   }
>>> }
>>>
>>> null means that "hbase.defaults.for.version" was not set in the other
>>> hbase-default.xml.
>>>
>>> Can you retrieve the classpath of the Spark task so that we can get more
>>> clues?
>>>
>>> Cheers
>>>
>>> On Tue, Nov 17, 2015 at 10:06 PM, 임정택 <kabh...@gmail.com> wrote:
>>>
>>>> Ted,
>>>>
>>>> Thanks for the reply.
>>>>
>>>> My fat jar's only Spark-related dependency is spark-core, marked as
>>>> "provided".
>>>> It seems Spark only pulls in hbase-common 0.98.7-hadoop2, and only in its
>>>> spark-examples module.
>>>>
>>>> And if there are two hbase-default.xml files in the classpath, shouldn't
>>>> one of them still be loaded, instead of showing (null)?
>>>>
>>>> Best,
>>>> Jungtaek Lim (HeartSaVioR)
>>>>
>>>> 2015-11-18 13:50 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:
>>>>
>>>>> Looks like there are two hbase-default.xml files in the classpath: one
>>>>> for 0.98.6 and another for 0.98.7-hadoop2 (used by Spark).
>>>>>
>>>>> You can specify hbase.defaults.for.version.skip as true in your
>>>>> hbase-site.xml.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, Nov 17, 2015 at 1:01 AM, 임정택 <kabh...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm evaluating Zeppelin to run a driver which interacts with HBase.
>>>>>> I use a fat jar to include the HBase dependencies, and I see failures
>>>>>> at the executor level.
>>>>>> I thought it was a Zeppelin issue, but it fails on spark-shell, too.
>>>>>>
>>>>>> I loaded the fat jar via the --jars option,
>>>>>>
>>>>>> > ./bin/spark-shell --jars hbase-included-assembled.jar
>>>>>>
>>>>>> ran the driver code using the provided SparkContext instance, and saw
>>>>>> failures in the spark-shell console and the executor logs.
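>>>>>>
>>>>>> The driver code boils down to a standard TableInputFormat scan; a
>>>>>> minimal sketch ("my_table" is just a placeholder for the real table
>>>>>> name):
>>>>>>
>>>>>> import org.apache.hadoop.hbase.HBaseConfiguration
>>>>>> import org.apache.hadoop.hbase.client.Result
>>>>>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable
>>>>>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>>>>>
>>>>>> // scan an HBase table as an RDD; "my_table" is a placeholder
>>>>>> val hbaseConf = HBaseConfiguration.create()
>>>>>> hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")
>>>>>> val rows = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
>>>>>>   classOf[ImmutableBytesWritable], classOf[Result])
>>>>>> println(rows.count())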
>>>>>>
>>>>>> Below are the stack traces:
>>>>>>
>>>>>> org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>>> Task 55 in stage 0.0 failed 4 times, most recent failure: Lost task
>>>>>> 55.3 in stage 0.0 (TID 281, <svr hostname>): java.lang.NoClassDefFoundError:
>>>>>> Could not initialize class org.apache.hadoop.hbase.client.HConnectionManager
>>>>>> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
>>>>>> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
>>>>>> at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
>>>>>> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
>>>>>> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
>>>>>> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
>>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>>>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> Driver stacktrace:
>>>>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
>>>>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>>>>> at scala.Option.foreach(Option.scala:236)
>>>>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
>>>>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>>>>>
>>>>>> And from the executor logs:
>>>>>>
>>>>>> 15/11/16 18:59:57 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14)
>>>>>> java.lang.ExceptionInInitializerError
>>>>>> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
>>>>>> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
>>>>>> at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
>>>>>> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
>>>>>> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
>>>>>> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
>>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>>>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> Caused by: java.lang.RuntimeException: hbase-default.xml file seems to
>>>>>> be for and old version of HBase (null), this version is 0.98.6-cdh5.2.0
>>>>>> at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)
>>>>>> at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:105)
>>>>>> at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:116)
>>>>>> at org.apache.hadoop.hbase.client.HConnectionManager.<clinit>(HConnectionManager.java:222)
>>>>>> ... 18 more
>>>>>>
>>>>>> Please note that it runs smoothly via spark-submit.
>>>>>>
>>>>>> By the way, if the issue is that hbase-default.xml is not properly loaded
>>>>>> (maybe because of the classloader), it nevertheless seems to load properly
>>>>>> at the driver level:
>>>>>>
>>>>>> import org.apache.hadoop.hbase.HBaseConfiguration
>>>>>> val conf = HBaseConfiguration.create()
>>>>>> println(conf.get("hbase.defaults.for.version"))
>>>>>>
>>>>>> This prints "0.98.6-cdh5.2.0".
>>>>>>
>>>>>> I'm using Spark 1.4.1 (hadoop-2.4 binary), Zeppelin 0.5.5, and HBase
>>>>>> 0.98.6-CDH5.2.0.
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>> Best,
>>>>>> Jungtaek Lim (HeartSaVioR)

--
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
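For reference, Ted's workaround above means adding the following property to the hbase-site.xml that ships on the application classpath (for example, inside the fat jar). This is the standard Hadoop/HBase configuration format, and it simply disables the version check shown in checkDefaultsVersion; it does not address the underlying SPARK-11818 resource-lookup bug:

<property>
  <name>hbase.defaults.for.version.skip</name>
  <value>true</value>
</property>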