[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15603509 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15603895 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockManager.scala --- @@ -91,6 +97,20 @@ class ShuffleBlockManager(blockManager: BlockManager)

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15604053 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15605438 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50668930 Thanks everyone, I think I addressed all the comments. Anything else before we merge this? I'd like to merge it fairly soon because there are a few other issues I'd like

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50669058 Jenkins, test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50669721 QA tests have started for PR 1499. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17472/consoleFull

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50678947 QA results for PR 1499: - This patch FAILED unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50679871 QA results for PR 1499: - This patch PASSES unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50694691 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15620594 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,662 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15620640 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,662 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15620674 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,662 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15621714 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala --- @@ -0,0 +1,566 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50700437 test this please!

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50700807 QA tests have started for PR 1499. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17526/consoleFull

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50701304 I took another pass over the patch and the changes look ready to me. I also tested this locally and verified that the shuffle files were actually cleaned up. There is

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50701589 Ok I merged this in master.

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1499

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50704737 QA results for PR 1499: - This patch FAILED unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15508024 --- Diff: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala --- @@ -43,10 +44,10 @@ import org.apache.spark.{Logging, RangePartitioner}

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15543949 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15544043 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15544196 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15544259 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -54,12 +55,16 @@ private[spark] class DiskBlockManager(shuffleManager:

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50522967 @mateiz please refer to the changes here: https://github.com/apache/spark/pull/1609/files#diff-10 They should be relevant to this PR too

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r1237 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r1233 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r1382 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r1675 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r1693 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15556144 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15556374 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15557253 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -54,12 +55,16 @@ private[spark] class

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15557939 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockManager.scala --- @@ -91,6 +97,20 @@ class ShuffleBlockManager(blockManager:

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15560379 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15560508 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15560695 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15560798 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15560817 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15560994 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15562065 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15562204 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15562761 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15562851 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15562901 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15562939 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15563349 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15563638 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,667 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50303404 Jenkins, test this please

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50303714 QA tests have started for PR 1499. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17274/consoleFull

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50306524 QA results for PR 1499: - This patch PASSES unit tests. - This patch merges cleanly - This patch adds the following public classes (experimental): class

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15493355 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15494109 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15494358 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -54,12 +55,16 @@ private[spark] class DiskBlockManager(shuffleManager:

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15494423 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -54,12 +55,16 @@ private[spark] class DiskBlockManager(shuffleManager:

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15494991 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15495007 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -54,12 +55,16 @@ private[spark] class DiskBlockManager(shuffleManager:

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15495220 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockManager.scala --- @@ -91,6 +97,20 @@ class ShuffleBlockManager(blockManager:

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15497588 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala --- @@ -0,0 +1,566 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15501936 --- Diff: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala --- @@ -43,10 +44,10 @@ import org.apache.spark.{Logging, RangePartitioner}

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-27 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50292919 I've now rebased this on top of the SizeTracker class in #1165 -- should be ready to go in. There is one issue left with both the ExternalSorter and

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-27 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50297730 Are map tasks spilling by any chance? There is one issue in this right now, which is that if your map task spills to disk, you need to spill multiple times with the

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-27 Thread colorant
Github user colorant commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50301634 @mateiz, yep, the map tasks did spill, and that seems to contribute most to the increased processing time, though in my case only about 400K of data was spilled to disk per task.