Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/17540
@cloud-fan, the one from this PR is the correct plan to show. There are two
problems affecting this query. First, `withNewExecutionId` should be called
where the execution is started, in either `DataFrameWriter` or `Dataset`, not
in the leaf nodes of the Spark plan. That way, we don't have to add calls to it
everywhere and risk code paths that never show up in the SQL tab.
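To illustrate the idea (not Spark's actual API — `ExecutionTracking`, `runQuery`, and the demo objects below are hypothetical names), here is a toy sketch of why the execution-id wrapper belongs at the entry point: every code path that runs a query passes through one place, so every execution gets tracked without per-node calls.

```scala
// Toy stand-in for SQLExecution.withNewExecutionId: assigns an id to
// each execution that passes through it. Illustrative only.
object ExecutionTracking {
  private var nextId = 0
  private val seenIds = scala.collection.mutable.Buffer.empty[Int]

  def withNewExecutionId[T](body: => T): T = {
    nextId += 1
    seenIds += nextId
    body
  }

  def trackedExecutions: Seq[Int] = seenIds.toSeq
}

object EntryPointDemo {
  // Entry point, analogous to DataFrameWriter.save or Dataset.collect:
  // wrapping here covers every execution path below it, so no leaf node
  // needs its own call.
  def runQuery(plan: () => Int): Int =
    ExecutionTracking.withNewExecutionId { plan() }
}
```

Because the wrapper sits at the single entry point, both a "write" path and a "scan" path are tracked with no extra calls anywhere else.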
The second problem is one that I discovered in tests: **queries like this
have two physical plans** that are computed at different times because
`ExecutedCommandExec` links a logical plan into the physical plan.
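A toy sketch of that split (illustrative class names, not Spark's real implementations): the outer physical plan contains a command node, and the command carries its own *logical* plan that is only planned, separately and later, into a second physical plan.

```scala
object TwoPlanDemo {
  sealed trait LogicalPlan
  case class LocalRelation(rows: Seq[Int]) extends LogicalPlan

  sealed trait PhysicalPlan { def execute(): Seq[Int] }
  case class LocalTableScanExec(rows: Seq[Int]) extends PhysicalPlan {
    def execute(): Seq[Int] = rows
  }

  // Planning step: in the real problem this runs once for the outer plan
  // and again, at run time, for the logical plan inside the command.
  def planOf(logical: LogicalPlan): PhysicalPlan = logical match {
    case LocalRelation(rows) => LocalTableScanExec(rows)
  }

  // Stand-in for a RunnableCommand holding a logical query.
  case class InsertCommand(query: LogicalPlan) {
    // A second, independent physical plan is created here.
    def run(): Seq[Int] = planOf(query).execute()
  }

  // Stand-in for ExecutedCommandExec: a physical node wrapping the command,
  // which hides the inner plan from whoever looks at the outer one.
  case class ExecutedCommandExec(cmd: InsertCommand) extends PhysicalPlan {
    def execute(): Seq[Int] = cmd.run()
  }

  def run(): Seq[Int] =
    ExecutedCommandExec(InsertCommand(LocalRelation(Seq(1, 2, 3)))).execute()
}
```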
The result of both problems is that the SQL tab currently shows only the
second/inner physical plan, the `LocalTableScan`, without showing that its
output is being written; the write operation is crucial information that is
missing today. The solution is to call `withNewExecutionId` at the right
time (what this PR fixes) *and* to update queries like this one to have a
single physical plan. With both fixes in place, the SQL tab would show that
the write operation reads its input using a `LocalTableScan`.
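As a toy sketch of the single-plan direction (again with hypothetical class names), the write becomes an ordinary physical operator whose child is the scan, so one plan tree describes the whole query:

```scala
object SinglePlanDemo {
  sealed trait PhysicalPlan { def execute(): Seq[Int] }
  case class LocalTableScanExec(rows: Seq[Int]) extends PhysicalPlan {
    def execute(): Seq[Int] = rows
  }

  // Hypothetical write operator that is part of the same plan tree as its
  // input, so a UI walking this tree sees both the write and the scan.
  case class WriteExec(child: PhysicalPlan,
                       sink: scala.collection.mutable.Buffer[Int]) extends PhysicalPlan {
    def execute(): Seq[Int] = {
      val rows = child.execute()
      sink ++= rows // "write" the rows to the sink
      rows
    }
  }

  def run(sink: scala.collection.mutable.Buffer[Int]): Seq[Int] =
    WriteExec(LocalTableScanExec(Seq(1, 2, 3)), sink).execute()
}
```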
I think the plan with the write operation is the right one to show, and
this fixes the queries that currently don't show up at all. Once this goes in,
I'll start a discussion on the dev list about how we are going to fix the
queries that are split into two physical plans. That's a larger fix than this
PR should handle.