Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-5603
This seems fine to me, actually, so I'm going to re-run the tests then
merge it. We might want to do some other cleanup in this file, but that can
wait for a separate
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-56889065
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/155/consoleFull)
for PR 1541 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-56895776
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/155/consoleFull)
for PR 1541 at commit
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1541
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-56909022
Thank you @JoshRosen
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-55985572
@JoshRosen do you think it's OK?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-54067191
ping @JoshRosen, could you help take a look at this one?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-54098544
Thanks for the reminder.
@kayousterhout I looked over @zsxwing's example and I agree that there's a
thread-safety issue here.We can definitely have
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-54098674
Actually, it looks like the `fetchedStatuses` vs `statuses` synchronization
is correct, since it's guarding against modification to that statuses array
while reading
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-54103100
the output status for a particular map task should be immutable once
written.
But `mapStatuses` is mutable. Therefore, some thread is putting an item to
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-54103317
I agree that `mapStatuses` itself is mutable. I was just observing that
the values stored in `mapStatuses` (the `Array[MapStatus]`es) aren't modified
after they're
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-54103752
You remind me that even if `Array[MapStatus]` won't be modified, according
to java memory model, fetchedStatuses
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-54104158
ConcurrentHashMap should fix all these issues I found.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-50073250
When is this accessed concurrently? I looked quickly and can only find
updates from the (single-threaded) DAGScheduler event loop. Is the issue that
it can be
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-50097007
When is this accessed concurrently?
For example, HashShuffleReader.read - object
BlockStoreShuffleFetcher.fetch - MapOutputTracker.getServerStatuses. Different
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-49855865
Instead of a ConcurrentHashMap, we should actually move it to a disk backed
Map - the cleanup of this datastructure is painful - which it can become
extremely large;
Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-49865627
Instead of a ConcurrentHashMap, we should actually move it to a disk
backed Map
Agree. Is there a PR ready? I think this is a critical bug and hope it can
be
GitHub user zsxwing opened a pull request:
https://github.com/apache/spark/pull/1541
SPARK-2634: Change MapOutputTrackerWorker.mapStatuses to ConcurrentHashMap
MapOutputTrackerWorker.mapStatuses is used concurrently, it should be
thread-safe. This bug has already been fixed in
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-49829141
QA tests have started for PR 1541. This patch merges cleanly. brView
progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17011/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-49829180
QA results for PR 1541:br- This patch FAILED unit tests.br- This patch
merges cleanlybr- This patch adds no public classesbrbrFor more
information see test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-49829579
QA tests have started for PR 1541. This patch merges cleanly. brView
progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17012/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1541#issuecomment-49834008
QA results for PR 1541:br- This patch PASSES unit tests.br- This patch
merges cleanlybr- This patch adds no public classesbrbrFor more
information see test
22 matches
Mail list logo