Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60877916
[Test build #22430 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22430/consoleFull)
for PR 2944 at commit
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60877993
I've rewritten this patch so that thread dumps are triggered on-demand
using a new driver - executor RPC channel. There are a few hacks involved in
setting this up,
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60878081
[Test build #22430 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22430/consoleFull)
for PR 2944 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60878083
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2944#discussion_r19521370
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala ---
@@ -412,6 +415,17 @@ class BlockManagerMasterActor(val isLocal:
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2944#discussion_r19521449
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ThreadDumpPage.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60878689
[Test build #22433 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22433/consoleFull)
for PR 2944 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60878899
[Test build #22433 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22433/consoleFull)
for PR 2944 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60878901
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-61012331
[Test build #22482 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22482/consoleFull)
for PR 2944 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-61021043
[Test build #22482 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22482/consoleFull)
for PR 2944 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-61021051
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60818509
Wow, awesome!!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60818581
This is even easier to read than the raw jstack output
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60818745
@JoshRosen This is super awesome!
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60834858
It looks like executorIds are assigned by the cluster manager, so in
principle they could be arbitrary strings but in practice they seem to not
contain special
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60837003
Do you know how large the threadDump is typically? I'm concerned this
might make the heartbeat too large
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60837888
The other idea I had was that we could just open a port on the executor and
have a web UI on it. This could also display the executor's stderr (which is
very painful to
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60842701
@shivaram That's a good point RE: the size of the thread dumps. I can now
imagine problems where a thread-leak in an executor causes the heartbeat to
become huge and
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60843263
[Test build #22383 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22383/consoleFull)
for PR 2944 at commit
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60845020
I like the idea of running a separate UI server on the executor, but this
seems like a much more involved change that will take a lot more design review.
For example,
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60846384
Yes - I think having a separate RPC sounds good for now.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60846448
Upon closer inspection, there's not a general driver - executor RPC path
that I can use to send arbitrary Akka messages to executors. To keep this PR
simple and
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60852209
[Test build #22383 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22383/consoleFull)
for PR 2944 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60852214
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60856220
[Test build #22402 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22402/consoleFull)
for PR 2944 at commit
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60856334
Alright, I've updated this to send the dumps as part of a separate
fire-and-forget RPC and removed the new code from the heartbeat code paths
(which should make things
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60857056
That sounds good -- actually, could we make this a request-reply pattern?
i.e., we only fetch the stack traces if somebody clicks on the link?
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60857353
I've thought about that, but it looks like we don't actually create
addressable actors on the executors, so there's no path for the driver to send
an RPC to the
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60857677
Another subtlety: when the web UI receives a request for a thread-dump, it
would need to issue an RPC to the executor to fetch that request. Ideally, we
wouldn't block
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60858892
Hmm okay - I agree that we don't really have a request-reply route from
the web UI (maybe this is also worth investigating if / when we have an executor
web UI).
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60859702
What if we had a 1000-node cluster, though, and kept the default heartbeat
interval of 10 seconds? In that case, we'd be sending a huge flood of data to
the driver,
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60862614
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60862610
[Test build #22402 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22402/consoleFull)
for PR 2944 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60650825
[Test build #22301 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22301/consoleFull)
for PR 2944 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60664728
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user shaneknapp commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60664970
jenkins, test this
Github user shaneknapp commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60668583
jenkins, test this please
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60668699
[Test build #22305 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22305/consoleFull)
for PR 2944 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60680949
[Test build #22305 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22305/consoleFull)
for PR 2944 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60680961
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/2944
[SPARK-611] [WIP] Display executor thread dumps in web UI
This patch allows executor thread dumps to be viewed in the Spark web UI.
Thread dumps obtained from Thread.getAllStackTraces()
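
For readers unfamiliar with the JVM call named in the description, here is a minimal sketch of collecting a thread dump with Thread.getAllStackTraces(). It is not the code from this patch: the object name ThreadDumpSketch, the case class ThreadStackTrace, and the method getThreadDump are hypothetical names chosen for illustration.

```scala
import scala.collection.JavaConverters._ // on Scala 2.13+, scala.jdk.CollectionConverters._ also works

object ThreadDumpSketch {
  // Hypothetical holder for one thread's dump; not a class taken from the patch.
  case class ThreadStackTrace(
      threadId: Long,
      threadName: String,
      threadState: Thread.State,
      stackTrace: String)

  // Snapshot every live thread in this JVM and return the dumps ordered by thread id.
  def getThreadDump(): Seq[ThreadStackTrace] = {
    Thread.getAllStackTraces.asScala.toSeq
      .map { case (thread, frames) =>
        ThreadStackTrace(thread.getId, thread.getName, thread.getState, frames.mkString("\n"))
      }
      .sortBy(_.threadId)
  }

  def main(args: Array[String]): Unit = {
    getThreadDump().foreach { t =>
      println(s"Thread ${t.threadId}: ${t.threadName} (${t.threadState})")
      println(t.stackTrace)
      println()
    }
  }
}
```

Sorting by thread id only keeps the output stable between calls; the map returned by the JVM has no guaranteed order.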
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2944#discussion_r19378809
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala ---
@@ -47,7 +47,7 @@ private[spark] class LocalActor(
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60505076
[Test build #8 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/8/consoleFull)
for PR 2944 at commit
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60505192
One subtle issue that I've run into is that the driver always runs a block
manager but only runs an Executor in local mode. So, the executors tab in
the web UI is
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60505272
Executor IDs are strings, so I should probably check whether they'll need
to be url-encoded; I guess this depends on which components create these
strings.
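
To illustrate the concern, a minimal sketch of encoding an executor ID before placing it in a query string, assuming IDs may be arbitrary strings. The object name ExecutorIdUrlSketch, both method names, and the path "/executors/threadDump/" are hypothetical and are not taken from the patch; only java.net.URLEncoder and java.net.URLDecoder are real JDK APIs.

```scala
import java.net.{URLDecoder, URLEncoder}

object ExecutorIdUrlSketch {
  // Hypothetical link builder: the path is an assumption made for illustration.
  def threadDumpPath(executorId: String): String =
    "/executors/threadDump/?executorId=" + URLEncoder.encode(executorId, "UTF-8")

  // Reverse step for a raw (still encoded) query-parameter value.
  def parseExecutorId(rawValue: String): String =
    URLDecoder.decode(rawValue, "UTF-8")

  def main(args: Array[String]): Unit = {
    // A purely numeric ID passes through unchanged.
    println(threadDumpPath("3"))
    // An ID with special characters is escaped so it cannot break the query string.
    println(threadDumpPath("exec 1/a&b"))
  }
}
```

URLEncoder applies form encoding (a space becomes "+"), which suits query parameters; encoding an ID for use as a path segment would need a different approach.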
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60505576
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2944#issuecomment-60505573
[Test build #8 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/8/consoleFull)
for PR 2944 at commit