[jira] [Created] (KYLIN-3667) ArrayIndexOutOfBoundsException in NDCuboidBuilder

2018-11-05 Thread Hubert STEFANI (JIRA)
Hubert STEFANI created KYLIN-3667:
---------------------------------

 Summary: ArrayIndexOutOfBoundsException in NDCuboidBuilder
 Key: KYLIN-3667
 URL: https://issues.apache.org/jira/browse/KYLIN-3667
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v2.5.0
 Environment: AWS EMR 
Reporter: Hubert STEFANI


The earlier errors

https://issues.apache.org/jira/browse/KYLIN-3115

and

https://issues.apache.org/jira/browse/KYLIN-1768

still remain in the SparkCubingByLayer step.

We encounter the following error:

java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
    at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey2(NDCuboidBuilder.java:87)
    at org.apache.kylin.engine.spark.SparkCubingByLayer$CuboidFlatMap.call(SparkCubingByLayer.java:425)
    at org.apache.kylin.engine.spark.SparkCubingByLayer$CuboidFlatMap.call(SparkCubingByLayer.java:370)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

 

Do we have to (painfully) change the dimension sizes, or should this be fixed
through a patch?
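
For background: System.arraycopy throws ArrayIndexOutOfBoundsException when the
requested copy would run past either array, so the trace above is consistent with
one dimension's encoded bytes being longer than the space reserved for them in
the rowkey buffer. A minimal sketch of that failure mode (hypothetical sizes,
not Kylin code):

// Minimal sketch of the failure mode (hypothetical sizes, not Kylin source).
// The destination buffer reserves a fixed number of bytes for the encoded
// dimensions; if an encoded value is longer than its reservation, the copy
// runs past the end of the buffer and arraycopy throws
// java.lang.ArrayIndexOutOfBoundsException, as in the trace above.
public class ArrayCopyOverflowDemo {
    public static void main(String[] args) {
        byte[] rowkey = new byte[8];      // space budgeted for the dimensions
        byte[] encodedDim = new byte[10]; // actual encoded value: 10 bytes

        // 10 bytes into an 8-byte buffer -> ArrayIndexOutOfBoundsException
        System.arraycopy(encodedDim, 0, rowkey, 0, encodedDim.length);
    }
}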

 





Re: ArrayIndexOutOfBoundsException for NDCuboidBuilder

2018-10-29 Thread ShaoFeng Shi
The root cause is:

java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
    at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey2(NDCuboidBuilder.java:87)

Normally it won't occur. Could you please share how the cube is designed,
especially the encoding of the dimensions?
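
For orientation, buildKeyInternal assembles a child cuboid's rowkey by
arraycopy-ing each retained dimension's encoded bytes out of the parent rowkey.
Below is a simplified paraphrase of that pattern (illustrative names and sizes,
not the actual Kylin source) showing how a mismatch between a dimension's
segment length and the space budgeted for the child key surfaces as exactly
this exception:

import java.util.Arrays;

// Simplified paraphrase of the NDCuboid key-building pattern (illustrative
// names; not the actual Kylin source). The parent rowkey is split into
// per-dimension byte segments; the child key is assembled by copying the
// segments of the dimensions the child cuboid keeps. If a segment's length
// disagrees with the space budgeted in the child buffer, the arraycopy
// below overruns it and throws ArrayIndexOutOfBoundsException.
public class ChildKeySketch {

    static byte[] buildChildKey(byte[][] parentSegments, int[] keptDims, int childKeySize) {
        byte[] childKey = new byte[childKeySize];
        int offset = 0;
        for (int dim : keptDims) {
            byte[] seg = parentSegments[dim];
            // Overruns childKey if offset + seg.length > childKeySize.
            System.arraycopy(seg, 0, childKey, offset, seg.length);
            offset += seg.length;
        }
        return childKey;
    }

    public static void main(String[] args) {
        byte[][] segments = { {1, 2}, {3, 4, 5}, {6} };
        // Correctly budgeted: dims 0 and 2 need 2 + 1 = 3 bytes.
        System.out.println(Arrays.toString(buildChildKey(segments, new int[] {0, 2}, 3)));
        // Under-budgeted buffer: dims 0 and 1 need 5 bytes, only 4 allocated.
        buildChildKey(segments, new int[] {0, 1}, 4); // -> ArrayIndexOutOfBoundsException
    }
}

If a dimension's declared encoding (for example a fixed_length shorter than the
real values) does not match the bytes actually produced, the lengths and the
budget can disagree in this way, which is why the dimension encodings are the
first thing to check.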

liuzhixin wrote on Monday, October 29, 2018 at 8:18 PM:

> Hello Kylin team:
> When I test kylin2.5.0-hadoop3.1-hbase2 for HDP 3.0, I select Spark as the
> engine type. Sometimes there are ArrayIndexOutOfBoundsExceptions, and the
> error info is below:
> [...]

ArrayIndexOutOfBoundsException for NDCuboidBuilder

2018-10-29 Thread liuzhixin
Hello Kylin team:

When I test kylin2.5.0-hadoop3.1-hbase2 for HDP 3.0, I select Spark as the
engine type. Sometimes there are ArrayIndexOutOfBoundsExceptions, and the
error info is below:
[2018-10-29 11:48:26][INFO][Driver][DAGScheduler:54] Job 1 failed: runJob at SparkHadoopWriter.scala:78, took 0.425416 s
[2018-10-29 11:48:26][ERROR][Driver][SparkHadoopWriter:91] Aborting job job_20181029114826_0008.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 5, ip-172-31-40-100.ec2.internal, executor 1): java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
    at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey2(NDCuboidBuilder.java:87)
    at org.apache.kylin.engine.spark.SparkCubingByLayer$CuboidFlatMap.call(SparkCubingByLayer.java:432)
    at org.apache.kylin.engine.spark.SparkCubingByLayer$CuboidFlatMap.call(SparkCubingByLayer.java:376)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1602)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1590)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1589)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1589)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1823)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
    at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)
    at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopDataset(JavaPairRDD.scala:831)
...skipping...
[2018-10-29 11:48:26][ERROR][Driver][ApplicationMaster:91] User class threw exception: java.lang.RuntimeException: error execute