rednaxelafx opened a new pull request #23579: [WIP][SQL] Fix duplicate 
cmd.nodeName in the explain output of DataWritingCommandExec
URL: https://github.com/apache/spark/pull/23579
 
 
   ## What changes were proposed in this pull request?
   
   `DataWritingCommandExec` prints `cmd.nodeName` twice in its explain 
output. For example, running `spark.sql("create table foo stored as 
parquet as select id, id % 10 as cat1, id % 20 as cat2 from range(10)")` 
produces:
   ```
   Execute OptimizedCreateHiveTableAsSelectCommand OptimizedCreateHiveTableAsSelectCommand [Database:default, TableName: foo, InsertIntoHiveTable]
   +- *(1) Project [id#2L, (id#2L % 10) AS cat1#0L, (id#2L % 20) AS cat2#1L]
      +- *(1) Range (0, 10, step=1, splits=8)
   ```
   After the fix, the command name appears only once:
   ```
   Execute OptimizedCreateHiveTableAsSelectCommand [Database:default, TableName: foo, InsertIntoHiveTable]
   +- *(1) Project [id#2L, (id#2L % 10) AS cat1#0L, (id#2L % 20) AS cat2#1L]
      +- *(1) Range (0, 10, step=1, splits=8)
   ```
   
   This duplication was introduced when the specialized 
`DataWritingCommandExec` was created in place of `ExecutedCommandExec`.
   
   `DataWritingCommandExec` is a `UnaryExecNode` whose `children` include the 
physical plan of the query, and its `cmd` is picked up via 
`TreeNode.stringArgs` into the argument string. The duplication arises because 
`DataWritingCommandExec.nodeName` is `s"Execute ${cmd.nodeName}"`, while the 
argument string is `cmd.simpleString()`, which also includes `cmd.nodeName`.
   
   `ExecutedCommandExec` didn't have that problem because it's a `LeafExecNode` 
with no children, and it declares the `cmd` as part of `innerChildren`, 
which are excluded from the argument string.
   
   ## How was this patch tested?
   
   Manual testing of running the example above in a local Spark Shell.
