[jira] [Created] (KYLIN-3667) ArrayIndexOutOfBoundsException in NDCuboidBuilder
Hubert STEFANI created KYLIN-3667: - Summary: ArrayIndexOutOfBoundsException in NDCuboidBuilder Key: KYLIN-3667 URL: https://issues.apache.org/jira/browse/KYLIN-3667 Project: Kylin Issue Type: Bug Components: Spark Engine Affects Versions: v2.5.0 Environment: AWS EMR Reporter: Hubert STEFANI The former errors https://issues.apache.org/jira/browse/KYLIN-3115 and https://issues.apache.org/jira/browse/KYLIN-1768 still remains in step SparkCubingByLayer. we encounter the following error : java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey2(NDCuboidBuilder.java:87) at org.apache.kylin.engine.spark.SparkCubingByLayer$CuboidFlatMap.call(SparkCubingByLayer.java:425) at org.apache.kylin.engine.spark.SparkCubingByLayer$CuboidFlatMap.call(SparkCubingByLayer.java:370) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143) at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) Do we have to (painfully) change dimensions size or should it be fixed through a patch ? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: ArrayIndexOutOfBoundsException for NDCuboidBuilder
The root cause is: .ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.kylin.engine.mr .common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) at org.apache.kylin.engine.mr .common.NDCuboidBuilder.buildKey2(NDCuboidBuilder.java:87) Normally it won't occur; Could you please share how the cube designed? Especially the encoding for the dimensions. liuzhixin 于2018年10月29日周一 下午8:18写道: > Hello kylin team: > # > When I test the kylin2.5.0-hadoop3.1-hbase2 for hdp3.0, > I select spark as the engine type, > Sometimes there are some ArrayIndexOutOfBoundsExceptions, > And the error info is blow: > # > # > [2018-10-29 11:48:26][INFO][Driver][DAGScheduler:54] Job 1 failed: runJob > at SparkHadoopWriter.scala:78, took 0.425416 s > [2018-10-29 11:48:26][ERROR][Driver][SparkHadoopWriter:91] Aborting job > job_20181029114826_0008. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 > in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage > 3.0 (TID 5, ip-172-31-40-100.ec2.internal, executor 1): > java.lang.ArrayIndexOutOfBoundsException > at java.lang.System.arraycopy(Native Method) > at org.apache.kylin.engine.mr > .common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) > at org.apache.kylin.engine.mr > .common.NDCuboidBuilder.buildKey2(NDCuboidBuilder.java:87) > at > org.apache.kylin.engine.spark.SparkCubingByLayer$CuboidFlatMap.call(SparkCubingByLayer.java:432) > at > org.apache.kylin.engine.spark.SparkCubingByLayer$CuboidFlatMap.call(SparkCubingByLayer.java:376) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143) > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > Driver stacktrace: > at org.apache.spark.scheduler.DAGScheduler.org > $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1602) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1590) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1589) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at > scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1589) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1823) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087) > at org.apache.spark.internal.io > .SparkHadoopWriter$.write(SparkHadoopWriter.scala:78) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081) > at
ArrayIndexOutOfBoundsException for NDCuboidBuilder
Hello kylin team: # When I test the kylin2.5.0-hadoop3.1-hbase2 for hdp3.0, I select spark as the engine type, Sometimes there are some ArrayIndexOutOfBoundsExceptions, And the error info is blow: # # [2018-10-29 11:48:26][INFO][Driver][DAGScheduler:54] Job 1 failed: runJob at SparkHadoopWriter.scala:78, took 0.425416 s [2018-10-29 11:48:26][ERROR][Driver][SparkHadoopWriter:91] Aborting job job_20181029114826_0008. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 5, ip-172-31-40-100.ec2.internal, executor 1): java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106) at org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey2(NDCuboidBuilder.java:87) at org.apache.kylin.engine.spark.SparkCubingByLayer$CuboidFlatMap.call(SparkCubingByLayer.java:432) at org.apache.kylin.engine.spark.SparkCubingByLayer$CuboidFlatMap.call(SparkCubingByLayer.java:376) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143) at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1602) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1590) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1589) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1589) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1823) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087) at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081) at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopDataset(JavaPairRDD.scala:831) ...skipping... [2018-10-29 11:48:26][ERROR][Driver][ApplicationMaster:91] User class threw exception: java.lang.RuntimeException: error execute