I believe the root cause is fixed in a recent JIRA issue and patch that will go into a later release.

In the meantime, two solutions. First, you can look at which class definitions are missing, locate their containing JARs, and copy them into your Spark jars folder. For me, this meant moving jar files from /opt/hbase/lib to /opt/kylin/spark/jars.

Second, I took an alternate approach which was easier at the time: I moved -everything- from hbase/lib to spark/jars and then resolved class conflicts as the error messages came up. For me, that meant removing an extra *netty* jar (the one that came over from the HBase libs) that Spark conflicted with, and after that I got a successful Spark/Kylin build. I'd say the second approach is extremely sub-optimal, but I'm working in a test-lab setup and it unblocked the issue (got Spark builds working) and let me move forward.
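Roughly, the jar shuffling looked something like this (paths are from my lab setup; the class and jar names here are just examples, so check your own error messages for what is actually missing):

  # one way to find which HBase jar provides a missing class (HFile in this case)
  cd /opt/hbase/lib
  for j in *.jar; do unzip -l "$j" | grep -q 'io/hfile/HFile.class' && echo "$j"; done
  # copy only the jars that loop prints into Spark's jar folder
  cp <jars-found-above> /opt/kylin/spark/jars/

  # or, the blunt second approach: copy everything, then remove whatever conflicts
  cp /opt/hbase/lib/*.jar /opt/kylin/spark/jars/
  # for me the conflict was a duplicate netty jar that came over from hbase/lib;
  # list the netty jars and delete the HBase copy, keeping Spark's own
  ls /opt/kylin/spark/jars/ | grep -i netty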
Also, around the same time I got errors about kylin.properties not being found. Since this lab setup is installed from TAR downloads, I needed to copy my Kylin configuration to each node (same path/directory structure). I'm not sure whether this came before or after the jar item above, but the HBase jars plus distributing the conf files got Kylin/Spark working for my small data set; I'm working on optimizing the medium data set now.
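Distributing the config was nothing fancy, just something like the following (assuming Kylin is unpacked at /opt/kylin on every node; the hostnames are placeholders for your own):

  # copy the Kylin conf directory to each of the other nodes, keeping the same path
  scp -r /opt/kylin/conf node2:/opt/kylin/
  scp -r /opt/kylin/conf node3:/opt/kylin/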
Best of luck!
J

On Tue, Dec 18, 2018 at 9:30 AM smallsuperman <[email protected]> wrote:

> Hello all,
> I used Apache Spark to replace MapReduce in the build cube step, as documented at http://kylin.apache.org/docs/tutorial/cube_spark.html. But the build job failed at step 8, Convert Cuboid Data to HFile, and the log file output is
>
> OS command error exit with return code: 1, error message:
> 18/12/18 23:31:53 INFO client.RMProxy: Connecting to ResourceManager at iap12m6/10.8.245.41:8032
> 18/12/18 23:31:53 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
> 18/12/18 23:31:53 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
> 18/12/18 23:31:53 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
> 18/12/18 23:31:53 INFO yarn.Client: Setting up container launch context for our AM
> 18/12/18 23:31:53 INFO yarn.Client: Setting up the launch environment for our AM container
> 18/12/18 23:31:53 INFO yarn.Client: Preparing resources for our AM container
> 18/12/18 23:31:54 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
>
> I also checked the error log in YARN:
>
> Diagnostics:
> User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.storage.hbase.steps.SparkCubeHFile.
> Root cause: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 15, iap12m8, executor 3): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.hfile.HFile
> at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.getNewWriter(HFileOutputFormat2.java:305)
> at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:229)
> at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:167)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1125)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
> at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1353)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1131)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
>
> I think
>
> java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.hfile.HFile
> at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.getNewWriter(HFileOutputFormat2.java:305)
>
> is the most important info, but I have no idea and have found few useful suggestions on the Internet…
>
> Here is my environment:
>
> hadoop-2.7.3
> hbase-1.4.9
> hive-1.2.1
> kylin-2.5.2-bin-hbase1x
> jdk1.8.0_144
> spark-2.2.0
>
> Hoping for your help, thanks.
