Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
okay, closing the PR.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/17297
Should we temporarily close the PR and wait for the design doc to be
finalized? @sitalkedia
---
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
@kayousterhout, @squito - Since we need more discussion on this change over
a design doc, I have put out a temporary change
(https://github.com/apache/spark/pull/17485) to kill the running tasks
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75339/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75339 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75339/testReport)**
for PR 17297 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75339 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75339/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75332/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75332 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75332/testReport)**
for PR 17297 at commit
Github user markhamstra commented on the issue:
https://github.com/apache/spark/pull/17297
Agreed. Let's establish what we want to do before trying to discuss the
details of how we are going to do it.
On Tue, Mar 28, 2017 at 8:17 AM, Imran Rashid
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
@squito - Sounds good to me, let me compile the list of pain points related
to fetch failure we are seeing and also a design doc to have better handling of
the issues.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75332 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75332/testReport)**
for PR 17297 at commit
Github user kayousterhout commented on the issue:
https://github.com/apache/spark/pull/17297
Agreed, sounds good!
---
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/17297
Sounds good to me.
---
Github user squito commented on the issue:
https://github.com/apache/spark/pull/17297
@sitalkedia This change is pretty contentious; there are a lot of questions
about whether or not this is a good change. I don't think discussing this here
in GitHub comments on a PR is the best form.
Github user squito commented on the issue:
https://github.com/apache/spark/pull/17297
btw I filed https://issues.apache.org/jira/browse/SPARK-20128 for the test
timeout -- fwiw I don't think it's a problem w/ the test but a potential real
issue with the metrics system, though I don't
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
@kayousterhout - Both scenarios A and B that you described above are likely
(it totally depends on the nature of the job and available cluster resources)
and you are right that in case of scenario
Github user kayousterhout commented on the issue:
https://github.com/apache/spark/pull/17297
@sitalkedia they're in core/target/unit-tests.log
Sometimes it's easier to move the logs to the tests (so they show up
in-line), which you can do by changing
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
@squito - I am able to reproduce the issue by running `./build/sbt
"test-only org.apache.spark.InternalAccumulatorSuite"`, however test case logs
are not being printed on the console, do you
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75287 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75287/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75287/
Test FAILed.
---
Github user squito commented on the issue:
https://github.com/apache/spark/pull/17297
@sitalkedia how are you trying to run the test? Works fine for me on my
laptop on master. Note that the test is referencing a var which is only
defined if "spark.testing" is a system property:
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
@squito - I am not able to reproduce this issue locally.
The test fails with a different issue -
``None.get
java.util.NoSuchElementException: None.get
at
Github user kayousterhout commented on the issue:
https://github.com/apache/spark/pull/17297
To recap the issue that Imran and I discussed here, I think it can be
summarized as follows:
- A Fetch Failure happens at some time t and indicates that the map output
on machine M
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75287 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75287/testReport)**
for PR 17297 at commit
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
@squito - Thanks, that helps a lot. I will fix the issue and submit a patch
soon.
---
Github user squito commented on the issue:
https://github.com/apache/spark/pull/17297
@sitalkedia I took a closer look -- I think this is from
"o.a.s.InternalAccumulatorSuite: 'internal accumulators in resubmitted
stages'". From the console output on jenkins, that was the last test
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
@kayousterhout - It seems like the test timeout might be related to the
change. But I am not able to find the culprit test case from the build log. Any
idea what is wrong?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75176/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75176 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75176/testReport)**
for PR 17297 at commit
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
@kayousterhout - Sure, I will file a JIRA in the future. The latest test
failed, and I am not sure if this is the same issue -
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75151/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75176 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75176/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75151/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Build finished. Test FAILed.
---
Github user kayousterhout commented on the issue:
https://github.com/apache/spark/pull/17297
@sitalkedia can you file a JIRA in the future when you see flaky test
failures? In this case I updated an existing JIRA
(https://issues.apache.org/jira/browse/SPARK-19612) but please do this
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75151 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75151/testReport)**
for PR 17297 at commit
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
Jenkins retest this please.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75127 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75127/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75127/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75124/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75124 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75124/testReport)**
for PR 17297 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75127 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75127/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75126 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75126/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75126/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75126 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75126/testReport)**
for PR 17297 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75124 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75124/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75030/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75030 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75030/testReport)**
for PR 17297 at commit
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
Thanks @markhamstra for the review comments; addressed. I also found an issue
with my previous implementation, namely that we do not allow task commits from
old stage attempts; I fixed that issue as well.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75029 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75029/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75029/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #75029 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75029/testReport)**
for PR 17297 at commit
Github user squito commented on the issue:
https://github.com/apache/spark/pull/17297
> when the stage fails because of fetch failure, we remove the stage from
the output committer. So if any task completes between the time of first fetch
failure and the time stage is resubmitted,
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
>> I don't think it's true that it relaunches all tasks that hadn't
completed when the fetch failure occurred. it relaunches all the tasks that
haven't completed, by the time the stage gets
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
Thanks a lot @squito for taking a look at it and for your feedback.
>> this is already true. when there is a fetch failure, the TaskSetManager
is marked as zombie, and the DAGScheduler
Github user squito commented on the issue:
https://github.com/apache/spark/pull/17297
I'm a bit confused by the description:
> 1. When a fetch failure happens, the task set manager asks the dag
scheduler to abort all the non-running tasks. However, the running tasks in the
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #74631 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74631/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74631/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #74631 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74631/testReport)**
for PR 17297 at commit
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
@kayousterhout - I understand your concern and I agree that canceling the
running tasks is definitely a simpler approach, but this is very inefficient
for large jobs where tasks can run for
Github user kayousterhout commented on the issue:
https://github.com/apache/spark/pull/17297
@sitalkedia I won't have time to review this in detail for at least a few
weeks, just so you know (although others may have time to review / merge it).
At a very high level, I'm
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74566/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #74566 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74566/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74562/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #74562 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74562/testReport)**
for PR 17297 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #74566 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74566/testReport)**
for PR 17297 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #74562 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74562/testReport)**
for PR 17297 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #74560 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74560/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74560/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #74560 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74560/testReport)**
for PR 17297 at commit
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17297
cc - @kayousterhout - Addressed your earlier comment about
https://github.com/apache/spark/pull/12436 ignoring fetch failure from stale
map output. I have addressed this issue by adding epoch
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #74558 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74558/testReport)**
for PR 17297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74558/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17297
**[Test build #74558 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74558/testReport)**
for PR 17297 at commit