Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/4155#issuecomment-71956823
> The users on my side have been able to reproduce the missing files issue
> reliably, so we may just have to live with an empirical verification and be
> done with it.

Given that the actual bug is non-deterministic, I think we could be okay
without a regression test that reliably reproduces this issue. There might be
some value in a non-deterministic regression test, though, as long as it
detects the bug with sufficiently high probability, since we'd eventually catch
any regression by noticing that the test had become flaky in Jenkins.
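
As a rough illustration of "sufficiently high probability": if one run of a
flaky test catches the bug with probability p, and runs are assumed to be
independent, then n runs catch it at least once with probability
1 - (1 - p)^n. The sketch below is only that arithmetic; the function names
and the example numbers are hypothetical and are not measurements from this
PR or from Jenkins.

```python
import math


def detection_probability(p: float, n: int) -> float:
    """Chance that at least one of n independent runs detects the bug,
    given a per-run detection probability p (assumed constant)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def runs_needed(p: float, target: float) -> int:
    """Smallest number of independent runs whose combined detection
    probability reaches at least `target`."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    if target == 0.0:
        return 0
    if p == 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # Hypothetical numbers: a test that fails on 20% of runs when the bug is present.
    print(runs_needed(0.2, 0.99))
    print(detection_probability(0.2, 21))
```

Under these assumptions, even a test that only trips occasionally would show
up as flakiness after a modest number of builds, which is the sense in which a
regression would eventually be noticed.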

Unless we can come up with a better test, in the immediate term I'm okay
with having unit tests for the individual components and an empirical
verification using your reproduction. Even though they aren't regression
tests for this specific bug, the new tests added here will help prevent
regressions if anyone changes the OutputCommitCoordinator logic.