[GitHub] spark pull request: [SPARK-3495] Block replication fails continuou...

2014-09-12 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2366#issuecomment-55472830 What happens when there is recomputation which results in same blockId getting regenerated (unpersist followed by recomputation/persist or block drop followed

[GitHub] spark pull request: [SPARK-3495] Block replication fails continuou...

2014-09-21 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2366#issuecomment-56293724 @tdas handling (1) deterministically will make (2) in line with what we currently have. And that should be sufficient imo. (3) was not in context

[GitHub] spark pull request: [SPARK-3495] Block replication fails continuou...

2014-09-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/2366#discussion_r17833363 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -787,31 +789,88 @@ private[spark] class BlockManager

[GitHub] spark pull request: [SPARK-3495] Block replication fails continuou...

2014-09-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/2366#discussion_r17833383 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -787,31 +789,88 @@ private[spark] class BlockManager

[GitHub] spark pull request: [SPARK-3495] Block replication fails continuou...

2014-09-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/2366#discussion_r17833419 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -787,31 +789,88 @@ private[spark] class BlockManager

[GitHub] spark pull request: [SPARK-3495] Block replication fails continuou...

2014-09-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/2366#discussion_r17833483 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -787,31 +789,88 @@ private[spark] class BlockManager

[GitHub] spark pull request: SPARK-3561 - Pluggable strategy to facilitate ...

2014-09-22 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2422#issuecomment-56373277 Is there an example of how this is going to be leveraged ? The default case is the simple version delegating to existing spark - would be good to see how this is used

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56480392 Are we proposing to introduce hdfs caching tags/idioms directly into TaskSetManager in this pr ? That does not look right. We need to generalize this so that any rdd

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56506066 @pwendell This is not hadoop RDD specific functionality - it is a general requirement which can be leveraged by any RDD in spark - and hadoop RDD currently happens

[GitHub] spark pull request: [SPARK-3495] Block replication fails continuou...

2014-09-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/2366#discussion_r17927862 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -787,31 +790,111 @@ private[spark] class BlockManager

[GitHub] spark pull request: [SPARK-3495] Block replication fails continuou...

2014-09-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2366#issuecomment-56566367 @tdas In case I did not mention it before :-) this is definitely a great improvement over what existed earlier ! I would love it if we could (sometime soon I hope

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-22 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49804353 We saw a bunch of EOF Exceptions from SpillReader. java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.peekByte

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15259118 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,573 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15259190 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,573 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-22 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49833579 I had pulled about 20 mins after I mailed you ... I have elaborated on why this occurs inline in the code - we can ignore it for now though, since it happens even

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15274240 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,649 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-07-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-49855865 Instead of a ConcurrentHashMap, we should actually move it to a disk backed Map - the cleanup of this data structure is painful, and it can become extremely large

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15288486 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,649 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49949511 @mateiz The total memory overhead actually goes much higher than num_streams right ? It should be order of num_streams + num_values for this key. For fairly

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50115353 Running tests with export SPARK_JAVA_OPTS=-Dspark.shuffle.manager=org.apache.spark.shuffle.sort.SortShuffleManager causes : ''' - sorting using mutable

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50115453 BTW, this is one of 5 failures from core. I hope there are no merge issues though, --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50116492 ah, thanks ! rerunning with 9c29957. can't pull the pr - and manual merge is painful, hence delays in testing :-)

[GitHub] spark pull request: [SPARK-2671] BlockObjectWriter should create p...

2014-07-26 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1580#issuecomment-50246267 Actually we have also seen this happen multiple times. A few of them have been fixed, but not all have been identified. For example, there is incorrect DCL
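Assuming "DCL" in the comment above means double-checked locking, the usual correct form of the idiom on the JVM can be sketched as follows. The snippet is truncated, so which Spark code was affected is not known here; `LazyHolder` and its members are hypothetical names used only for illustration.

```java
// Hypothetical lazily initialized holder; none of these names come from Spark.
final class LazyHolder {
    // volatile is what makes double-checked locking safe under the
    // Java memory model (Java 5 and later); without it, another thread
    // may observe a partially constructed object.
    private volatile Object instance;

    // Counts how many times initialization actually ran (for illustration).
    private int initCount = 0;

    Object get() {
        Object local = instance;            // first check, without the lock
        if (local == null) {
            synchronized (this) {
                local = instance;           // second check, under the lock
                if (local == null) {
                    local = new Object();   // stand-in for expensive setup
                    initCount++;
                    instance = local;       // publish via the volatile write
                }
            }
        }
        return local;
    }

    // Read under the same lock that guards the counter's writes.
    synchronized int initCount() {
        return initCount;
    }
}
```

A common way to get this wrong is to drop `volatile` or to skip the second check inside the synchronized block; either can lead to repeated or unsafe initialization.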

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50257258 Since all process local tasks are also node, rack and any : we will incur node local delay also. On 27-Jul-2014 11:09 am, Matei Zaharia notificati...@github.com

[GitHub] spark pull request: [SPARK-2532] WIP Consolidated shuffle fixes

2014-07-27 Thread mridulm
GitHub user mridulm opened a pull request: https://github.com/apache/spark/pull/1609 [SPARK-2532] WIP Consolidated shuffle fixes Status of the PR - [X] Cherry pick and merge changes from internal branch to spark master - [X] Remove WIP comments and 2G branch references

[GitHub] spark pull request: [SPARK-2532] WIP Consolidated shuffle fixes

2014-07-27 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15442779 --- Diff: core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala --- @@ -0,0 +1,296 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2532] WIP Consolidated shuffle fixes

2014-07-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15448803 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -935,15 +941,22 @@ private[spark] object Utils extends Logging { * Currently

[GitHub] spark pull request: [SPARK-2532] WIP Consolidated shuffle fixes

2014-07-28 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1609#issuecomment-50306648 Accidental close, apologies !

[GitHub] spark pull request: [SPARK-2532] WIP Consolidated shuffle fixes

2014-07-28 Thread mridulm
Github user mridulm closed the pull request at: https://github.com/apache/spark/pull/1609

[GitHub] spark pull request: [SPARK-2532] WIP Consolidated shuffle fixes

2014-07-28 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1609#issuecomment-50306633 @witgo I did not understand the space issue : stylecheck seems to run fine. Regarding the actual issues : the JIRA lists some of them - unfortunately

[GitHub] spark pull request: [SPARK-2532] WIP Consolidated shuffle fixes

2014-07-28 Thread mridulm
GitHub user mridulm reopened a pull request: https://github.com/apache/spark/pull/1609 [SPARK-2532] WIP Consolidated shuffle fixes Status of the PR - [X] Cherry pick and merge changes from internal branch to spark master - [X] Remove WIP comments and 2G branch references

[GitHub] spark pull request: [SPARK-2532] WIP Consolidated shuffle fixes

2014-07-28 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1609#issuecomment-50307155 Jenkins, test this please

[GitHub] spark pull request: [SPARK-2532] WIP Consolidated shuffle fixes

2014-07-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15449433 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -947,6 +958,34 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1609#issuecomment-50455483 All pending fix work is done. I don't think there are any pieces missing in the merge from internal branch to master. Open for review, thanks !

[GitHub] spark pull request: SPARK-2638 MapOutputTracker concurrency improv...

2014-07-29 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1542#issuecomment-50488319 @pwendell @mateiz was this PR really merged into spark ?

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15537366 --- Diff: core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala --- @@ -40,7 +40,7 @@ private[spark] class JavaSerializationStream(out

[GitHub] spark pull request: SPARK-2638 MapOutputTracker concurrency improv...

2014-07-29 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1542#issuecomment-50509704 That was super scary ! Thanks for clarifying @aarondav

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15540734 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleWriter.scala --- @@ -116,8 +118,13 @@ class HashShuffleWriter[K, V]( private

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15540782 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleWriter.scala --- @@ -71,7 +72,8 @@ class HashShuffleWriter[K, V]( try

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15540934 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -107,68 +109,296 @@ private[spark] class DiskBlockObjectWriter

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15541003 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -107,68 +109,296 @@ private[spark] class DiskBlockObjectWriter

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15541065 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -188,6 +425,39 @@ private[spark] class DiskBlockObjectWriter

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15541257 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockManager.scala --- @@ -236,31 +241,61 @@ object ShuffleBlockManager { new

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15541435 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -353,26 +368,53 @@ class ExternalAppendOnlyMap[K, V, C

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15542308 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -418,7 +459,25 @@ class ExternalAppendOnlyMap[K, V, C

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1609#issuecomment-50517754 I have added some comments to the PR in the hopes that it will aid in the review. I am sure it is still an involved process in spite of this, so please do feel free

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-29 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-50522967 @mateiz please refer to changes here : https://github.com/apache/spark/pull/1609/files#diff-10 They should be relevant to this PR too

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15565447 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -107,68 +109,296 @@ private[spark] class DiskBlockObjectWriter

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15565486 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -107,68 +109,296 @@ private[spark] class DiskBlockObjectWriter

[GitHub] spark pull request: [SPARK-2532] Consolidated shuffle fixes

2014-07-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1609#discussion_r15565552 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -947,6 +958,34 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request: SPARK-2532: Minimal shuffle consolidation fixe...

2014-07-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1678#discussion_r15682389 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleWriter.scala --- @@ -120,8 +121,7 @@ private[spark] class HashShuffleWriter[K, V

[GitHub] spark pull request: SPARK-2532: Minimal shuffle consolidation fixe...

2014-07-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1678#discussion_r15682412 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -147,28 +147,36 @@ private[spark] class DiskBlockObjectWriter

[GitHub] spark pull request: SPARK-2532: Minimal shuffle consolidation fixe...

2014-07-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1678#discussion_r15682457 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -147,28 +147,36 @@ private[spark] class DiskBlockObjectWriter

[GitHub] spark pull request: SPARK-2532: Minimal shuffle consolidation fixe...

2014-07-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1678#discussion_r15683205 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -147,28 +147,36 @@ private[spark] class DiskBlockObjectWriter

[GitHub] spark pull request: SPARK-2532: Minimal shuffle consolidation fixe...

2014-07-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1678#discussion_r15683224 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -147,28 +147,36 @@ private[spark] class DiskBlockObjectWriter

[GitHub] spark pull request: [SPARK-2033] Automatically cleanup checkpoint

2014-08-01 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/855#issuecomment-50895949 This definitely is much better, thanks for the PR !

[GitHub] spark pull request: SPARK-2532: Minimal shuffle consolidation fixe...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1678#discussion_r15701250 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -147,28 +147,36 @@ private[spark] class DiskBlockObjectWriter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r15725601 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -243,10 +244,23 @@ class HadoopRDD[K, V]( new

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r15725610 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -216,6 +216,7 @@ abstract class RDD[T: ClassTag]( getPreferredLocations(split

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1722#discussion_r15725631 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -215,16 +218,28 @@ class ExternalAppendOnlyMap[K, V, C

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1722#discussion_r15725641 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -389,27 +404,51 @@ class ExternalAppendOnlyMap[K, V, C

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1722#discussion_r15725667 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -389,27 +404,51 @@ class ExternalAppendOnlyMap[K, V, C

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1722#discussion_r15725700 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -455,7 +495,25 @@ class ExternalAppendOnlyMap[K, V, C

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1722#discussion_r15725724 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala --- @@ -30,8 +30,19 @@ class ExternalAppendOnlyMapSuite

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1722#discussion_r15725858 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala --- @@ -30,8 +30,19 @@ class ExternalAppendOnlyMapSuite

[GitHub] spark pull request: [SPARK-2635] Fix race condition at SchedulerBa...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1525#discussion_r15725875 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -47,19 +47,19 @@ class

[GitHub] spark pull request: [SPARK-2635] Fix race condition at SchedulerBa...

2014-08-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1525#discussion_r15725931 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -47,19 +47,19 @@ class

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-02 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1722#discussion_r15728047 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala --- @@ -30,8 +30,19 @@ class ExternalAppendOnlyMapSuite

[GitHub] spark pull request: [Minor] Fixes on top of #1679

2014-08-02 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1736#discussion_r15728056 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala --- @@ -46,9 +46,8 @@ private[spark] class BlockManagerSource(val

[GitHub] spark pull request: [Minor] Fixes on top of #1679

2014-08-02 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1736#discussion_r15732438 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala --- @@ -46,9 +46,8 @@ private[spark] class BlockManagerSource(val

[GitHub] spark pull request: [Minor] Fixes on top of #1679

2014-08-02 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1736#discussion_r15732470 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala --- @@ -46,9 +46,8 @@ private[spark] class BlockManagerSource(val

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-03 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1722#issuecomment-50992153 LGTM, thanks Matei ! On 03-Aug-2014 12:13 pm, Matei Zaharia notificati...@github.com wrote: @aarondav https://github.com/aarondav / @mridulm https

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-03 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1722#issuecomment-50992283 Oh wait, is the Java serializer change also ported ? Else the tests won't do what we want them to do. On 03-Aug-2014 8:11 pm, Mridul Muralidharan mri...@gmail.com

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-03 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1722#issuecomment-51003282 LGTM ! Though I would prefer if @aarondav also took a look at it - since this is based on my earlier work, I might be too close to it to see potential issues

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1722#discussion_r15750212 --- Diff: core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala --- @@ -35,16 +35,15 @@ private[spark] class JavaSerializationStream(out

[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...

2014-08-04 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1722#issuecomment-51047651 LGTM !

[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...

2014-08-05 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1780#issuecomment-51168402 IIRC if Kryo can't hold the entire serialized object in the buffer, it throws : we saw issues with it being as high as 256 kb for some of our jobs : though we were using

[GitHub] spark pull request: [SPARK-2503] Lower shuffle output buffer (spar...

2014-08-05 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1781#issuecomment-51169641 We are running this with 8k or so :-)

[GitHub] spark pull request: Expose aplication ID in ApplicationStart event...

2014-08-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1218#discussion_r15827428 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1531,18 +1532,6 @@ object SparkContext extends Logging { throw new

[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...

2014-08-05 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1780#issuecomment-51269524 Hi @pwendell, my observation about buffer size was not in context of spark ... we saw issues which looked like buffer overflow when the serialized object graph was large

[GitHub] spark pull request: Turn UpdateBlockInfo into case class.

2014-08-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1872#issuecomment-51710872 If case class then does it still need to be Externalizable ?

[GitHub] spark pull request: [SPARK-2952] Enable logging actor messages at ...

2014-08-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1870#issuecomment-51850204 Just saw this as part of the close, sorry for the late comment. Also, some of the INFO messages which are useful have now become DEBUG ? Makes it slightly harder

[GitHub] spark pull request: [SPARK-2931] In TaskSetManager, reset currentL...

2014-08-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1896#issuecomment-51850469 It is just a modification of the test above it :-) Maybe some copy paste error ? That line is not required for this test btw - just the last line validates the issue

[GitHub] spark pull request: [SPARK-2952] Enable logging actor messages at ...

2014-08-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1870#issuecomment-51850828 Unfortunately, in most cases, we won't know what the issue is other than bug hunting in the logs. So debug logging gets enabled for a wide swathe of packages

[GitHub] spark pull request: [SPARK-2931] In TaskSetManager, reset currentL...

2014-08-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1896#issuecomment-51852483 It has to do with FakeRackUtil used in that class. I guess not all tests clean it up properly after assigning to it : which is why host2 (or host1 depending on order

[GitHub] spark pull request: [SPARK-2931] In TaskSetManager, reset currentL...

2014-08-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1896#issuecomment-51852593 To be deterministic, you can add FakeRackUtil.cleanup at the beginning of this test too. Though ideally we should do it to tests which add hosts to rack

[GitHub] spark pull request: [SPARK-3875] Add TEMP DIRECTORY configuration

2014-10-09 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2729#issuecomment-58473957 At least for yarn, this will create issues if overridden from default. Not sure about mesos. Why not use std java property and define it for local

[GitHub] spark pull request: [SPARK-3875] Add TEMP DIRECTORY configuration

2014-10-09 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2729#issuecomment-58479810 There is a java property which controls this ... java.io.tmpdir On 09-Oct-2014 1:22 pm, 刘钰帆 notificati...@github.com wrote: @mridulm https://github.com
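
As a rough illustration of the comment above: the standard JVM system property `java.io.tmpdir` decides where temporary files land, and `Files.createTempFile` without an explicit directory uses that location. The class and method names below (`TmpDirDemo`, `jvmTempDir`, `createScratchFile`) are hypothetical and are not part of Spark or of the pull request; this is only a sketch of how the property behaves.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class; not taken from Spark or from the pull request.
public class TmpDirDemo {

    // Returns the directory named by the standard java.io.tmpdir property.
    static Path jvmTempDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Creates a scratch file in the JVM's default temporary directory.
    // No directory argument is passed, so the JVM default location is used.
    static Path createScratchFile() {
        try {
            Path file = Files.createTempFile("demo-", ".tmp");
            file.toFile().deleteOnExit(); // clean up when the JVM exits
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("java.io.tmpdir = " + jvmTempDir());
        System.out.println("scratch file   = " + createScratchFile());
    }
}
```

The location can be changed per process at launch, for example `java -Djava.io.tmpdir=/path/to/scratch TmpDirDemo`, where the path is a placeholder.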

[GitHub] spark pull request: [SPARK-3889] Attempt to avoid SIGBUS by not mm...

2014-10-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2742#issuecomment-58724312 This needs to be configurable ... IIRC 1.1 had this customizable. Different limits exist for vm vs heap memory in yarn (for example).

[GitHub] spark pull request: [SPARK-3889] Attempt to avoid SIGBUS by not mm...

2014-10-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2742#issuecomment-58728241 With 1.1, in expts, we have done both : depending on whether our user code is mmap'ing too much data (and so we pull things into heap .. using libraries not in our

[GitHub] spark pull request: [SPARK-3889] Attempt to avoid SIGBUS by not mm...

2014-10-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/2742#issuecomment-58728319 Note: this is required since there are heap and vm limits enforced, so we juggle available memory around so that jobs can run to completion! On 11-Oct-2014 4:56 am

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13596860 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -153,8 +153,8 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13597675 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ private[spark] class TaskSetManager( val

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13598180 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ private[spark] class TaskSetManager( val

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13601836 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ private[spark] class TaskSetManager( val

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45732793 Just wanted to drop a quick note (since I might not be able to get to this until late next week). I think the proposal should work : though I might be missing

[GitHub] spark pull request: [SPARK-1946] Submit stage after (configured ra...

2014-06-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/900#issuecomment-45779752 This one slipped off my radar, my apologies. @tgravescs In #892, if there is even a single executor which is process local with any partition, then we start waiting

[GitHub] spark pull request: [SPARK-1946] Submit stage after (configured ra...

2014-06-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/900#issuecomment-45780405 Hit submit by mistake, to continue ... The side effects of not having sufficient executors are different from #892. For example, a) the default parallelism in yarn

[GitHub] spark pull request: Just a POC for having compression for every RD...

2014-06-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1091#issuecomment-46125063 Use spark.rdd.compress = true for compressing serialized RDD.

[GitHub] spark pull request: Compression should be a setting for individual...

2014-06-18 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1091#issuecomment-46481279 Misread the PR and confused it with another pull request, ignore my earlier comment.
