GitHub user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/11115#issuecomment-202639815
I just talked with @rxin and we decided that this flag isn't quite ready to
be exposed yet. The problem is that the default value (false) has surprising
semantics on stage retries: if you're counting the number of rows in your data
and your stage was run twice, you might get 2X the actual number of rows. This
is what #11105 is looking at, and we might proceed from there.
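To make the failure mode concrete, here's a minimal sketch of how a retried
stage can double-count. It uses the Spark 1.x accumulator API; the job itself
is made up for illustration and is not from this PR:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RetryDoubleCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("retry-double-count").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Accumulator updated inside a transformation (not an action).
    val rowCount = sc.accumulator(0L, "rowCount")

    sc.parallelize(1 to 1000, 4)
      .map { x =>
        rowCount += 1L // applied once per task *attempt*, not per task
        x
      }
      .count()

    // If any task or stage is re-executed (e.g. after a fetch failure),
    // its accumulator updates are applied again, so rowCount.value can
    // exceed the true row count -- up to 2X if the whole stage runs twice.
    println(s"rowCount = ${rowCount.value}")

    sc.stop()
  }
}
```

Accumulators updated inside transformations have no exactly-once guarantee
across retries, which is exactly the semantics the proposed flag's default
would expose to users.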
Either way I think we should try to fix the default behavior before
exposing a flag that we won't be able to change in the future. Let's keep the
issue open but close this PR for now.