Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/17540
@cloud-fan, the one from this PR is the correct plan to show. There are two
problems affecting this query. First, `withNewExecutionId` should be called
where the execution is started, in either `DataFrameWriter` or `Dataset`, not
in the leaf nodes of the Spark plan. That way, we don't have to add calls to it
everywhere and risk code paths that never show up in the SQL tab.
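To illustrate the idea (not Spark's actual API — `ExecutionTracking`, `runQuery`, and the demo objects below are hypothetical names), here is a toy sketch of why the execution-id wrapper belongs at the entry point: every code path that runs a query passes through one place, so every execution gets tracked without per-node calls.

```scala
// Toy stand-in for SQLExecution.withNewExecutionId: assigns an id to
// each execution that passes through it. Illustrative only.
object ExecutionTracking {
  private var nextId = 0
  private val seenIds = scala.collection.mutable.Buffer.empty[Int]

  def withNewExecutionId[T](body: => T): T = {
    nextId += 1
    seenIds += nextId
    body
  }

  def trackedExecutions: Seq[Int] = seenIds.toSeq
}

object EntryPointDemo {
  // Entry point, analogous to DataFrameWriter.save or Dataset.collect:
  // wrapping here covers every execution path below it, so no leaf node
  // needs its own call.
  def runQuery(plan: () => Int): Int =
    ExecutionTracking.withNewExecutionId { plan() }
}
```

Because the wrapper sits at the single entry point, both a "write" path and a "scan" path are tracked with no extra calls anywhere else.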
The second problem is one that I discovered in tests: **queries like this
have two physical plans** that are computed at different times because
`ExecutedCommandExec` links a logical plan into the physical plan.
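A toy sketch of that split (illustrative class names, not Spark's real implementations): the outer physical plan contains a command node, and the command carries its own *logical* plan that is only planned, separately and later, into a second physical plan.

```scala
object TwoPlanDemo {
  sealed trait LogicalPlan
  case class LocalRelation(rows: Seq[Int]) extends LogicalPlan

  sealed trait PhysicalPlan { def execute(): Seq[Int] }
  case class LocalTableScanExec(rows: Seq[Int]) extends PhysicalPlan {
    def execute(): Seq[Int] = rows
  }

  // Planning step: in the real problem this runs once for the outer plan
  // and again, at run time, for the logical plan inside the command.
  def planOf(logical: LogicalPlan): PhysicalPlan = logical match {
    case LocalRelation(rows) => LocalTableScanExec(rows)
  }

  // Stand-in for a RunnableCommand holding a logical query.
  case class InsertCommand(query: LogicalPlan) {
    // A second, independent physical plan is created here.
    def run(): Seq[Int] = planOf(query).execute()
  }

  // Stand-in for ExecutedCommandExec: a physical node wrapping the command,
  // which hides the inner plan from whoever looks at the outer one.
  case class ExecutedCommandExec(cmd: InsertCommand) extends PhysicalPlan {
    def execute(): Seq[Int] = cmd.run()
  }

  def run(): Seq[Int] =
    ExecutedCommandExec(InsertCommand(LocalRelation(Seq(1, 2, 3)))).execute()
}
```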
The result of both problems is that the SQL tab currently shows only the
second/inner physical plan, the `LocalTableScan`, without showing that its
output is being written; the write operation is crucial information that is
missing today. The solution is to call `withNewExecutionId` at the right
time (what this PR fixes) *and* to update queries like this one to have a
single physical plan. With both fixes in place, the SQL tab would show that
the write operation reads its input using a `LocalTableScan`.
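As a toy sketch of the single-plan direction (again with hypothetical class names), the write becomes an ordinary physical operator whose child is the scan, so one plan tree describes the whole query:

```scala
object SinglePlanDemo {
  sealed trait PhysicalPlan { def execute(): Seq[Int] }
  case class LocalTableScanExec(rows: Seq[Int]) extends PhysicalPlan {
    def execute(): Seq[Int] = rows
  }

  // Hypothetical write operator that is part of the same plan tree as its
  // input, so a UI walking this tree sees both the write and the scan.
  case class WriteExec(child: PhysicalPlan,
                       sink: scala.collection.mutable.Buffer[Int]) extends PhysicalPlan {
    def execute(): Seq[Int] = {
      val rows = child.execute()
      sink ++= rows // "write" the rows to the sink
      rows
    }
  }

  def run(sink: scala.collection.mutable.Buffer[Int]): Seq[Int] =
    WriteExec(LocalTableScanExec(Seq(1, 2, 3)), sink).execute()
}
```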
I think the plan with the write operation is the right one to show, and
this fixes the queries that currently don't show up at all. Once this goes in,
I'll start a discussion on the dev list about how we are going to fix the
queries that are split into two physical plans. That's a larger fix than this
PR should handle.