On 2018/12/18 17:10:45, Jon Shoberg <[email protected]> wrote: 
> I believe the root cause is fixed in a recent JIRA issue, and the patch
> will go into a later release.
> 
> Two solutions:
> 
> First, you can look to see which class definitions are missing, locate
> their containing JARs, and copy them into your Spark jars folder.
> 
> For me, this meant moving jar files from /opt/hbase/lib to
> /opt/kylin/spark/jars.
> 
> Second, I took an alternate approach which was easier at the time: I moved
> -everything- from hbase/lib to spark/jars and then resolved class conflicts
> as error messages appeared.
> 
> For me, this meant removing an extra *netty* jar that conflicted with
> Spark's own (i.e. the netty jar that came in from the HBase libs); after
> that I got a successful Spark/Kylin build.
> 
> I'd say the second approach is extremely sub-optimal, but I'm working in a
> test-lab setup, and it unblocked the issue (got Spark builds working) and
> let me move forward.
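The copy-everything approach can be sketched like this (paths assumed from this message; `merge_hbase_jars` is my name for it, and using `cp -n` so Spark's own jars are kept is my tweak — the poster simply copied over everything and cleaned up afterwards):

```shell
# merge_hbase_jars SRC_DIR DEST_DIR
# Copies every jar from SRC_DIR into DEST_DIR without overwriting files that
# already exist there, then lists any netty jars now in DEST_DIR -- if more
# than one shows up, delete the copy that came from HBase, as described above.
merge_hbase_jars() {
  local src="$1" dest="$2" j
  for j in "$src"/*.jar; do
    [ -f "$j" ] || continue
    cp -n "$j" "$dest/"
  done
  ls "$dest" | grep -i netty || true        # surviving netty jars, if any
}

# merge_hbase_jars /opt/hbase/lib /opt/kylin/spark/jars
```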
> 
> Also ....
> 
> At the same time I got errors about kylin.properties not being found.
> Since this lab setup was established from TAR downloads, I needed to copy
> my Kylin configuration to each node (same path/directory structure).
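A sketch of that distribution step. The hostnames are hypothetical (only iap12m8 appears in the logs below), and the helper just prints one rsync command per node so you can inspect before running:

```shell
# print_sync_cmds CONF_DIR NODE...
# Emits one rsync command per node, mirroring CONF_DIR to the same absolute
# path on that node. Pipe the output to `sh` to actually run the copies.
print_sync_cmds() {
  local conf="$1" n
  shift
  for n in "$@"; do
    echo "rsync -a $conf/ $n:$conf/"
  done
}

# Hypothetical worker hostnames:
print_sync_cmds /opt/kylin/conf iap12m7 iap12m8
# rsync -a /opt/kylin/conf/ iap12m7:/opt/kylin/conf/
# rsync -a /opt/kylin/conf/ iap12m8:/opt/kylin/conf/
```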
> 
> Not sure if this was before or after the above item, but the HBase jars and
> distributing the conf files got Kylin/Spark working for my small data set;
> I'm working on optimizing the medium data set now.
> 
> Best of luck! J
> 
> On Tue, Dec 18, 2018 at 9:30 AM smallsuperman <[email protected]>
> wrote:
> 
> > Hello all,
> > I used Apache Spark to replace MapReduce in the build-cube step, as
> > documented at http://kylin.apache.org/docs/tutorial/cube_spark.html . But
> > the build job failed at step 8, "Convert Cuboid Data to HFile", and the
> > log file output is
> >
> > OS command error exit with return code: 1, error message: 18/12/18
> > 23:31:53 INFO client.RMProxy: Connecting to ResourceManager at iap12m6/
> > 10.8.245.41:8032
> > 18/12/18 23:31:53 INFO yarn.Client: Requesting a new application from
> > cluster with 3 NodeManagers
> > 18/12/18 23:31:53 INFO yarn.Client: Verifying our application has not
> > requested more than the maximum memory capability of the cluster (8192 MB
> > per container)
> > 18/12/18 23:31:53 INFO yarn.Client: Will allocate AM container, with 1408
> > MB memory including 384 MB overhead
> > 18/12/18 23:31:53 INFO yarn.Client: Setting up container launch context
> > for our AM
> > 18/12/18 23:31:53 INFO yarn.Client: Setting up the launch environment for
> > our AM container
> > 18/12/18 23:31:53 INFO yarn.Client: Preparing resources for our AM
> > container
> > 18/12/18 23:31:54 WARN yarn.Client: Neither spark.yarn.jars nor
> > spark.yarn.archive is set, falling back to uploading libraries under
> > SPARK_HOME.
> > I also checked the error log in YARN:
> >
> > Diagnostics:
> > User class threw exception: java.lang.RuntimeException: error execute
> > org.apache.kylin.storage.hbase.steps.SparkCubeHFile. Root cause: Job
> > aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most
> > recent failure: Lost task 1.3 in stage 1.0 (TID 15, iap12m8, executor 3):
> > java.lang.NoClassDefFoundError: Could not initialize class
> > org.apache.hadoop.hbase.io.hfile.HFile
> > at
> > org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.getNewWriter(HFileOutputFormat2.java:305)
> > at
> > org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:229)
> > at
> > org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:167)
> > at
> > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1125)
> > at
> > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
> > at
> > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
> > at
> > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1353)
> > at
> > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1131)
> > at
> > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
> > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> > at org.apache.spark.scheduler.Task.run(Task.scala:99)
> > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
> > at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> > Driver stacktrace:
> > and I think
> > java.lang.NoClassDefFoundError: Could not initialize class
> > org.apache.hadoop.hbase.io.hfile.HFile
> > at
> > org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.getNewWriter(HFileOutputFormat2.java:305)
> > is the most important info, but I have no idea what it means and have
> > found few useful suggestions on the Internet…
> >
> > here is my environment
> >
> > hadoop-2.7.3
> > hbase-1.4.9
> > hive-1.2.1
> > kylin-2.5.2-bin-hbase1x
> > jdk1.8.0_144
> > spark-2.2.0
> > Hoping for your help, thanks.
> >
> 


thanks for your reply .. I just followed your second suggestion, copying 
$HBASE_HOME/lib/* to $KYLIN_HOME/spark/jars and overwriting the existing jars 
... It works for me !!!
Thanks a lot..
um.... I have another doubt: Kylin embeds a Spark binary (v2.1.0) in 
$KYLIN_HOME/spark. I had tried replacing $KYLIN_HOME/spark with spark-2.2.0 
before copying $HBASE_HOME/lib/* to $KYLIN_HOME/spark/jars.. when I built my 
cube there was a different error.
I will try copying $HBASE_HOME/lib/* into the new Spark to check whether it 
works with Spark 2.2.0.
