Glad it seems to be working, but I think with a future JIRA fix in a release
after 2.5.2 these suggestions should not be needed.

Just make sure any further errors do not come from JAR conflicts; for
example, there was an extra netty jar from HBase that I had to remove.
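
For a quick sanity check on that kind of duplicate, something like this works
(paths assume the default $KYLIN_HOME / $HBASE_HOME layout on your nodes):

  # list the netty jars in both places; more than one version ending up in
  # spark/jars usually means a conflict
  ls $KYLIN_HOME/spark/jars/ | grep -i netty
  ls $HBASE_HOME/lib/ | grep -i netty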

My Spark is 2.1.3, which comes with Kylin, and so far it works for my small
data case, but I'm trying to tune it for my medium data case now.

Best of luck! J

On Tue, Dec 18, 2018 at 7:29 PM feng wang <[email protected]>
wrote:

> Thanks for your reply. I just followed your second suggestion to copy
> $HBASE_HOME/lib/* to $KYLIN_HOME/spark/jars and overwrite the existing jars
> ... it works for me!!!
> Thanks a lot.
> Um... I have another doubt: Kylin embeds a Spark binary (v2.1.0) in
> $KYLIN_HOME/spark. I had tried to replace $KYLIN_HOME/spark with
> spark-2.2.0 before copying $HBASE_HOME/lib/* to $KYLIN_HOME/spark/jars; when
> I built my cube there was a different error.
> I will try copying $HBASE_HOME/lib/* into the new Spark to check whether it
> works with Spark 2.2.0.
>
> On Wed, Dec 19, 2018 at 1:10 AM, Jon Shoberg <[email protected]> wrote:
>
>> I believe the root cause is fixed in a recent JIRA issue and patch that
>> will go into a later release.
>>
>> Two solutions:
>>
>> First, you can look and see which class definitions are missing, locate
>> their containing JARs, and move them to your spark jars folder.
>>
>> For me, this meant moving jar files from /opt/hbase/lib to
>> /opt/kylin/spark/jars.
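>>
>> A rough way to find which HBase jar provides a missing class is something
>> like this (my lab paths, adjust to yours; the class path below is just the
>> HFile example from your stack trace):
>>
>>   # turn the class name into a path, e.g.
>>   # org.apache.hadoop.hbase.io.hfile.HFile -> org/apache/hadoop/hbase/io/hfile/HFile.class
>>   for j in /opt/hbase/lib/*.jar; do
>>     if unzip -l "$j" 2>/dev/null | grep -q 'org/apache/hadoop/hbase/io/hfile/HFile.class'; then
>>       echo "$j"
>>     fi
>>   done
>>   # then copy the jar(s) printed above into /opt/kylin/spark/jars/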
>>
>> Second, I took an alternate approach which was easier at the time. I
>> moved -everything- from hbase/lib to spark/jars and then resolved class
>> conflicts when I got an error message.
>>
>> For me, this meant removing an extra *netty* jar that conflicted with the
>> one Spark ships; after that I got a successful Spark/Kylin build. (Remove
>> the netty jar that came over from the HBase libs into the Spark jars.)
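>>
>> In command form it was roughly this (a sketch from my lab setup; the exact
>> netty file name will differ on yours):
>>
>>   # copy everything from HBase's lib into Spark's jars folder ...
>>   cp /opt/hbase/lib/*.jar /opt/kylin/spark/jars/
>>   # ... then, when Spark complains about netty, list the duplicates
>>   ls /opt/kylin/spark/jars/ | grep -i netty
>>   # and remove the one that came from HBase, keeping the version Spark
>>   # shipped with (example file name only)
>>   rm /opt/kylin/spark/jars/netty-all-4.0.23.Final.jar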
>>
>> I'd say the second approach is extremely sub-optimal but I'm working in a
>> test-lab setup and it unblocked an issue (got spark builds working) and let
>> me move forward.
>>
>> Also ....
>>
>> At the same time I got errors about kylin.properties not being found.
>> Since this lab setup was installed from TAR downloads, I needed to copy my
>> Kylin configuration to each node (same path/directory structure).
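>>
>> That distribution was just a copy to the same path on every node, roughly
>> like this (hostnames here are made up):
>>
>>   # push the Kylin conf directory to each node, keeping the same path
>>   for host in node1 node2 node3; do
>>     scp -r /opt/kylin/conf "$host":/opt/kylin/
>>   done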
>>
>> Not sure if this was before or after the above item, but the HBase jars
>> and distributing the conf files got Kylin/Spark working for my small data
>> set; working on optimizing the medium data set now.
>>
>> Best of luck! J
>>
>> On Tue, Dec 18, 2018 at 9:30 AM smallsuperman <[email protected]>
>> wrote:
>>
>>> Hello all,
>>> I used Apache Spark to replace MapReduce in the build cube step, as
>>> documented at http://kylin.apache.org/docs/tutorial/cube_spark.html . But
>>> the build job failed at step 8, named Convert Cuboid Data to HFile, and
>>> the log file output is
>>>
>>> OS command error exit with return code: 1, error message: 18/12/18
>>> 23:31:53 INFO client.RMProxy: Connecting to ResourceManager at iap12m6/
>>> 10.8.245.41:8032
>>> 18/12/18 23:31:53 INFO yarn.Client: Requesting a new application from
>>> cluster with 3 NodeManagers
>>> 18/12/18 23:31:53 INFO yarn.Client: Verifying our application has not
>>> requested more than the maximum memory capability of the cluster (8192 MB
>>> per container)
>>> 18/12/18 23:31:53 INFO yarn.Client: Will allocate AM container, with
>>> 1408 MB memory including 384 MB overhead
>>> 18/12/18 23:31:53 INFO yarn.Client: Setting up container launch context
>>> for our AM
>>> 18/12/18 23:31:53 INFO yarn.Client: Setting up the launch environment
>>> for our AM container
>>> 18/12/18 23:31:53 INFO yarn.Client: Preparing resources for our AM
>>> container
>>> 18/12/18 23:31:54 WARN yarn.Client: Neither spark.yarn.jars nor
>>> spark.yarn.archive is set, falling back to uploading libraries under
>>> SPARK_HOME.
>>> I also checked the error log in YARN:
>>>
>>> Diagnostics:
>>> User class threw exception: java.lang.RuntimeException: error execute
>>> org.apache.kylin.storage.hbase.steps.SparkCubeHFile. Root cause: Job
>>> aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most
>>> recent failure: Lost task 1.3 in stage 1.0 (TID 15, iap12m8, executor 3):
>>> java.lang.NoClassDefFoundError: Could not initialize class
>>> org.apache.hadoop.hbase.io.hfile.HFile
>>> at
>>> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.getNewWriter(HFileOutputFormat2.java:305)
>>> at
>>> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:229)
>>> at
>>> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:167)
>>> at
>>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1125)
>>> at
>>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
>>> at
>>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1123)
>>> at
>>> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1353)
>>> at
>>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1131)
>>> at
>>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:99)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>> at java.lang.Thread.run(Thread.java:748)
>>> Driver stacktrace:
>>> and I think
>>> java.lang.NoClassDefFoundError: Could not initialize class
>>> org.apache.hadoop.hbase.io.hfile.HFile
>>> at
>>> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.getNewWriter(HFileOutputFormat2.java:305)
>>> is the most important info; however, I have no idea what to do and have
>>> found few useful suggestions on the Internet…
>>>
>>> Here is my environment:
>>>
>>> hadoop-2.7.3
>>> hbase-1.4.9
>>> hive-1.2.1
>>> kylin-2.5.2-bin-hbase1x
>>> jdk1.8.0_144
>>> spark-2.2.0
>>> Hoping for your help, thanks.
>>>
>>
