I believe the root cause is fixed by a recent JIRA issue and patch that
will go into a later release.

Two solutions:

First, you can check which class definitions are missing, locate the JARs
that contain them, and copy those JARs into your Spark jars folder.

For me, this meant moving jar files from /opt/hbase/lib to
/opt/kylin/spark/jars.
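
In case it helps, here is a rough Python sketch of that first approach:
scan the HBase lib jars for the class named in the stack trace and copy any
matches into the Spark jars folder. The paths and the HFile class come from
this thread; everything else is just illustrative, not anything Kylin ships.

#!/usr/bin/env python3
# Rough helper (not part of Kylin): find which HBase jars contain the
# missing class and copy them into the Spark jars folder.
import glob
import shutil
import zipfile

MISSING_CLASS = "org/apache/hadoop/hbase/io/hfile/HFile.class"
HBASE_LIB = "/opt/hbase/lib"
SPARK_JARS = "/opt/kylin/spark/jars"

for jar in sorted(glob.glob(HBASE_LIB + "/*.jar")):
    try:
        with zipfile.ZipFile(jar) as zf:
            if MISSING_CLASS in zf.namelist():
                print("copying", jar)
                shutil.copy(jar, SPARK_JARS)
    except zipfile.BadZipFile:
        print("skipping unreadable jar:", jar)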

Second, I took an alternative approach that was easier at the time: I moved
-everything- from hbase/lib to spark/jars and then resolved class conflicts
as the error messages came up.

For me, this meant removing an extra *netty* jar that Spark had a conflict
with, after which I got a successful Spark/Kylin build. (Remove the netty
jar that came over from the HBase libs into the Spark jars.)
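
If you want to script that second approach, a sketch like the one below
copies every HBase jar into the Spark jars folder and then lists any
duplicate netty jars so you can delete the extra copy. The two paths are the
ones mentioned above; the rest is illustrative only.

# Rough sketch of the "copy everything, then resolve conflicts" approach.
import glob
import os
import shutil

HBASE_LIB = "/opt/hbase/lib"
SPARK_JARS = "/opt/kylin/spark/jars"

for jar in glob.glob(HBASE_LIB + "/*.jar"):
    dest = os.path.join(SPARK_JARS, os.path.basename(jar))
    if not os.path.exists(dest):
        shutil.copy(jar, dest)

# Spark already has netty on its classpath (hence the conflict above); if
# more than one netty jar shows up, remove the one that came from hbase/lib.
netty_jars = sorted(glob.glob(SPARK_JARS + "/*netty*.jar"))
if len(netty_jars) > 1:
    print("multiple netty jars found; remove the one from hbase/lib:")
    for jar in netty_jars:
        print(" ", jar)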

I'd say the second approach is extremely sub-optimal, but I'm working in a
test-lab setup and it unblocked the issue (got Spark builds working) and let
me move forward.

Also ....

At the same time I got errors about kylin.properties not being found.
Since this lab setup was installed from TAR downloads, I needed to copy my
Kylin configuration to each node (same path/directory structure).
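
For what it's worth, a quick sketch of distributing the conf directory to
every node at the same path; the hostnames and the /opt/kylin path below are
placeholders, and it assumes passwordless scp between nodes.

# Push the Kylin conf directory to each node at the same path.
import subprocess

NODES = ["node1", "node2", "node3"]   # placeholder hostnames
KYLIN_CONF = "/opt/kylin/conf"        # assumed install path, same on each node

for node in NODES:
    # equivalent of: scp -r /opt/kylin/conf node:/opt/kylin/
    subprocess.run(["scp", "-r", KYLIN_CONF, node + ":/opt/kylin/"], check=True)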

I'm not sure whether this was before or after the item above, but the HBase
jars plus distributing the conf files got Kylin/Spark working for my small
data set; I'm working on optimizing the medium data set now.

Best of luck! J

On Tue, Dec 18, 2018 at 9:30 AM smallsuperman <[email protected]>
wrote:

> Hello all,
> I used Apache Spark to replace MapReduce in the build cube step, as
> documented at http://kylin.apache.org/docs/tutorial/cube_spark.html . But
> the build job failed at step 8, named Convert Cuboid Data to HFile, and
> the log file output is
>
> OS command error exit with return code: 1, error message: 18/12/18
> 23:31:53 INFO client.RMProxy: Connecting to ResourceManager at iap12m6/
> 10.8.245.41:8032
> 18/12/18 23:31:53 INFO yarn.Client: Requesting a new application from
> cluster with 3 NodeManagers
> 18/12/18 23:31:53 INFO yarn.Client: Verifying our application has not
> requested more than the maximum memory capability of the cluster (8192 MB
> per container)
> 18/12/18 23:31:53 INFO yarn.Client: Will allocate AM container, with 1408
> MB memory including 384 MB overhead
> 18/12/18 23:31:53 INFO yarn.Client: Setting up container launch context
> for our AM
> 18/12/18 23:31:53 INFO yarn.Client: Setting up the launch environment for
> our AM container
> 18/12/18 23:31:53 INFO yarn.Client: Preparing resources for our AM
> container
> 18/12/18 23:31:54 WARN yarn.Client: Neither spark.yarn.jars nor
> spark.yarn.archive is set, falling back to uploading libraries under
> SPARK_HOME.
> I also checked the error log in YARN
>
> Diagnostics:
> User class threw exception: java.lang.RuntimeException: error execute
> org.apache.kylin.storage.hbase.steps.SparkCubeHFile. Root cause: Job
> aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most
> recent failure: Lost task 1.3 in stage 1.0 (TID 15, iap12m8, executor 3):
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.hadoop.hbase.io.hfile.HFile
> at
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.getNewWriter(HFileOutputFormat2.java:305)
> at
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:229)
> at
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:167)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1125)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
> at
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1353)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1131)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> and I think java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.hadoop.hbase.io.hfile.HFile
> at
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.getNewWriter(HFileOutputFormat2.java:305)
> is the most important info; however, I have no idea and have found few
> useful suggestions on the Internet…
>
> Here is my environment:
>
> hadoop-2.7.3
> hbase-1.4.9
> hive-1.2.1
> kylin-2.5.2-bin-hbase1x
> jdk1.8.0_144
> spark-2.2.0
> Hope you can help, thanks.
>
