Github user squito commented on the pull request:
https://github.com/apache/spark/pull/5636#issuecomment-97323086
Thanks for the update @ilganeli! My comments are mostly minor. The only
thing that is bugging me is that the tests don't really show how the stage
failure gets pushed up to the user code. E.g., do they get a `SparkException`
with a good message, or does the DAGScheduler end up in some weird state
where it stops running any additional jobs? I think it should work, but the
DAGScheduler code is hairy enough that I'd really prefer a test. I can't
come up with a good way to write a unit test, though (or to test it manually,
for that matter). Maybe something like this test in `ShuffleSuite`?
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ShuffleSuite.scala#L264
The problem is that you don't have a good way to delete the shuffle files
between stage attempts, but maybe we could swap in a different
`diskBlockManager` that always fails to find the files, or something like
that. I'll think about it a little more.
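
To make the two things I'd want to assert a bit more concrete, here is a rough sketch with no Spark dependency. Everything in it is a hypothetical stand-in, not a Spark API: `ToyScheduler` plays the role of the DAGScheduler, `BlockStore` the role of the `diskBlockManager`, `AlwaysMissingStore` is the swapped-in version that never finds the files, and `JobAborted` stands in for the `SparkException` the user would see. The retry limit is an arbitrary constructor argument, not a real setting.

```scala
import scala.util.{Failure, Success, Try}

// Toy model only: none of these types are Spark classes.
object StageFailureSketch {
  final class FetchFailed(msg: String) extends Exception(msg)
  final class JobAborted(msg: String, cause: Throwable) extends Exception(msg, cause)

  // Hypothetical stand-in for the component that locates shuffle files.
  trait BlockStore { def getFile(blockId: String): Array[Byte] }

  // The swapped-in store: every lookup fails, as if the files had been deleted.
  object AlwaysMissingStore extends BlockStore {
    def getFile(blockId: String): Array[Byte] =
      throw new FetchFailed(s"missing shuffle file for $blockId")
  }

  // A healthy store, used to check that later jobs still run.
  object WorkingStore extends BlockStore {
    def getFile(blockId: String): Array[Byte] = Array[Byte](1, 2, 3)
  }

  final class ToyScheduler(maxStageAttempts: Int) {
    // Number of stage attempts made by the most recent job.
    var attemptsForLastJob = 0

    // Retries the "stage" on fetch failure, then surfaces the failure to the caller.
    def runJob(store: BlockStore, blockId: String): Int = {
      @annotation.tailrec
      def attempt(n: Int): Int = {
        attemptsForLastJob = n
        Try(store.getFile(blockId)) match {
          case Success(bytes) => bytes.length
          case Failure(_: FetchFailed) if n < maxStageAttempts => attempt(n + 1)
          case Failure(e) =>
            throw new JobAborted(s"stage aborted after $n failed attempts", e)
        }
      }
      attempt(1)
    }
  }
}
```

A real test would then check (a) that a job run against the always-failing store throws an exception with a useful message and the fetch failure as its cause, and (b) that a second job on the same scheduler with a healthy store still succeeds.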