yangping wu created SPARK-3656:
----------------------------------

             Summary: IllegalArgumentException when using sort-based shuffle
                 Key: SPARK-3656
                 URL: https://issues.apache.org/jira/browse/SPARK-3656
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 1.1.0
            Reporter: yangping wu

The code works fine with hash-based shuffle.
{code}
sc.textFile("file:///export1/spark/zookeeper.out").flatMap(l => l.split(" ")).map(w=>(w,1)).reduceByKey(_ + _).collect
{code}
But when I run the same program with sort-based shuffle, it fails with an error:
{code}
scala> sc.textFile("file:///export1/spark/zookeeper.out").flatMap(l => l.split(" ")).map(w=>(w,1)).reduceByKey(_ + _).collect
org.apache.spark.SparkException: Job aborted due to stage failure: Task 22 in stage 1.0 failed 1 times, most recent failure: Lost task 22.0 in stage 1.0 (TID 22, localhost): java.lang.IllegalArgumentException: Comparison method violates its general contract!
        org.apache.spark.util.collection.Sorter$SortState.mergeHi(Sorter.java:876)
        org.apache.spark.util.collection.Sorter$SortState.mergeAt(Sorter.java:495)
        org.apache.spark.util.collection.Sorter$SortState.mergeForceCollapse(Sorter.java:436)
        org.apache.spark.util.collection.Sorter$SortState.access$300(Sorter.java:294)
        org.apache.spark.util.collection.Sorter.sort(Sorter.java:137)
        org.apache.spark.util.collection.AppendOnlyMap.destructiveSortedIterator(AppendOnlyMap.scala:271)
        org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:323)
        org.apache.spark.util.collection.ExternalSorter.spill(ExternalSorter.scala:271)
        org.apache.spark.util.collection.ExternalSorter.maybeSpill(ExternalSorter.scala:249)
        org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:212)
        org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:67)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        java.lang.Thread.run(Thread.java:619)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
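
For readers unfamiliar with the message "Comparison method violates its general contract!": merge-based sorts raise it when they detect that the comparator they were given orders elements inconsistently. The report above does not identify which comparator is at fault or why, so the following is only a generic, minimal sketch of one well-known way a comparator can break the contract (integer overflow in a subtraction-based compare). The object and method names (ComparatorContractDemo, bySubtraction, safe, breaksTransitivity) are hypothetical and are not taken from Spark.

```scala
import java.util.Comparator

// Hypothetical illustration only; none of these names come from Spark.
object ComparatorContractDemo {
  // Subtraction-based comparison: the result overflows when the
  // operands are more than Int.MaxValue apart, flipping its sign.
  val bySubtraction: Comparator[Int] = new Comparator[Int] {
    override def compare(a: Int, b: Int): Int = a - b
  }

  // Overflow-safe comparison using explicit branches.
  val safe: Comparator[Int] = new Comparator[Int] {
    override def compare(a: Int, b: Int): Int =
      if (a < b) -1 else if (a == b) 0 else 1
  }

  // True when cmp reports a < b and b < c but fails to report a < c,
  // which is a transitivity violation.
  def breaksTransitivity(cmp: Comparator[Int], a: Int, b: Int, c: Int): Boolean =
    cmp.compare(a, b) < 0 && cmp.compare(b, c) < 0 && cmp.compare(a, c) >= 0

  def main(args: Array[String]): Unit = {
    val (a, b, c) = (Int.MinValue, 0, Int.MaxValue)
    println(breaksTransitivity(bySubtraction, a, b, c)) // prints "true"
    println(breaksTransitivity(safe, a, b, c))          // prints "false"
  }
}
```

With the subtraction-based comparator, Int.MinValue - Int.MaxValue wraps around to 1, so the smallest value is reported as greater than the largest. A sort that relies on a consistent ordering can then fail in the way the trace shows. Whether this particular pattern is involved in the failure reported here is an open question that the report itself does not settle.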