[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-25 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-5603 This seems fine to me, actually, so I'm going to re-run the tests then merge it. We might want to do some other cleanup in this file, but that can wait for a separate

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-56889065 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/155/consoleFull) for PR 1541 at commit

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-56895776 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/155/consoleFull) for PR 1541 at commit

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-25 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1541 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-25 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-56909022 Thank you @JoshRosen --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-17 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-55985572 @JoshRosen do you think it's OK? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-01 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-54067191 ping @JoshRosen, could you help take a look at this one? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-01 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-54098544 Thanks for the reminder. @kayousterhout I looked over @zsxwing's example and I agree that there's a thread-safety issue here.We can definitely have

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-01 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-54098674 Actually, it looks like the `fetchedStatuses` vs `statuses` synchronization is correct, since it's guarding against modification to that statuses array while reading

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-01 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-54103100 the output status for a particular map task should be immutable once written. But `mapStatuses` is mutable. Therefore, some thread is putting an item to

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-01 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-54103317 I agree that `mapStatuses` itself is mutable. I was just observing that the values stored in `mapStatuses` (the `Array[MapStatus]`es) aren't modified after they're

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-01 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-54103752 You remind me that even if `Array[MapStatus]` won't be modified, according to java memory model, fetchedStatuses

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-09-01 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-54104158 ConcurrentHashMap should fix all these issues I found. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-07-24 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-50073250 When is this accessed concurrently? I looked quickly and can only find updates from the (single-threaded) DAGScheduler event loop. Is the issue that it can be

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-07-24 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-50097007 When is this accessed concurrently? For example, HashShuffleReader.read - object BlockStoreShuffleFetcher.fetch - MapOutputTracker.getServerStatuses. Different

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-07-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-49855865 Instead of a ConcurrentHashMap, we should actually move it to a disk backed Map - the cleanup of this datastructure is painful - which it can become extremely large;

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-07-23 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-49865627 Instead of a ConcurrentHashMap, we should actually move it to a disk backed Map Agree. Is there a PR ready? I think this is a critical bug and hope it can be

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-07-22 Thread zsxwing
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/1541 SPARK-2634: Change MapOutputTrackerWorker.mapStatuses to ConcurrentHashMap MapOutputTrackerWorker.mapStatuses is used concurrently, it should be thread-safe. This bug has already been fixed in

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-49829141 QA tests have started for PR 1541. This patch merges cleanly. brView progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17011/consoleFull ---

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-49829180 QA results for PR 1541:br- This patch FAILED unit tests.br- This patch merges cleanlybr- This patch adds no public classesbrbrFor more information see test

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-49829579 QA tests have started for PR 1541. This patch merges cleanly. brView progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17012/consoleFull ---

[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-49834008 QA results for PR 1541:br- This patch PASSES unit tests.br- This patch merges cleanlybr- This patch adds no public classesbrbrFor more information see test