Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/17540
@srowen, agreed. Closely related, but not the same code paths. The question
is: when should `withNewExecutionId` be called?
I'm running the test suite now, and this patch causes test failures when
`withNewExecutionId` is called twice: once in `DataFrameWriter` and once in
`InsertIntoHadoopFsRelationCommand`. It looks like the call has been
scattered across the codebase (e.g. in `InsertIntoHadoopFsRelationCommand`
and other execution nodes) to fix this problem for certain operations, so we
should decide where it should be used and fix the tests around that.
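To make the failure mode concrete, here is a minimal sketch (not Spark's
actual `SQLExecution` implementation) of the kind of re-entrancy guard that
makes the double call blow up: the execution id lives in a thread-local
property, so a nested call finds it already set and throws. The names
`SQLExecutionSketch` and `localProps` are illustrative.

```scala
object SQLExecutionSketch {
  val EXECUTION_ID_KEY = "spark.sql.execution.id"

  // Stand-in for per-thread local properties on the SparkContext.
  private val localProps = new ThreadLocal[java.util.HashMap[String, String]] {
    override def initialValue(): java.util.HashMap[String, String] =
      new java.util.HashMap[String, String]()
  }

  def withNewExecutionId[T](body: => T): T = {
    val props = localProps.get()
    if (props.get(EXECUTION_ID_KEY) != null) {
      // This is the double-call case: e.g. DataFrameWriter and
      // InsertIntoHadoopFsRelationCommand both wrapping the same execution.
      throw new IllegalArgumentException(s"$EXECUTION_ID_KEY is already set")
    }
    props.put(EXECUTION_ID_KEY, java.util.UUID.randomUUID().toString)
    try body finally props.remove(EXECUTION_ID_KEY)
  }
}
```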
The reason I added it to `DataFrameWriter` is that it is already called in
`Dataset` actions, and it makes sense to call it once at the point where an
action starts. I think it belongs in action methods, like `Dataset#collect`
or `DataFrameWriter#insertInto`, to minimize the number of places where we
need to add it; I don't think this is a concern that the execution plan
should address. A sketch of that approach follows.
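Reusing the `SQLExecutionSketch` guard above, this hypothetical sketch shows
the single-entry-point idea: the action method establishes the execution id
once, and downstream execution nodes never open their own. The
`ActionBoundaryDemo`, `insertInto`, and `runInsertCommand` names are
illustrative, not the actual `DataFrameWriter` internals.

```scala
object ActionBoundaryDemo {
  // Stand-in for an execution node like InsertIntoHadoopFsRelationCommand;
  // with this approach it does NOT call withNewExecutionId itself.
  def runInsertCommand(table: String): Unit =
    println(s"insert into $table executed")

  // Stand-in for the DataFrameWriter#insertInto action: the one place where
  // the execution id is established.
  def insertInto(table: String): Unit =
    SQLExecutionSketch.withNewExecutionId {
      runInsertCommand(table)
    }

  def main(args: Array[String]): Unit = insertInto("t")
}
```

Because only the action boundary enters the guard, a nested command runs
inside the existing execution id instead of tripping the "already set" check.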