rednaxelafx opened a new pull request #27955: [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId
URL: https://github.com/apache/spark/pull/27955
 
 
   ### What changes were proposed in this pull request?
   
   Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code to help with debugging. One way to get the generated code is through `df.queryExecution.debug.codegen`, or the SQL `EXPLAIN CODEGEN` statement.
   
   The generated code is currently printed in no particular order, which can make debugging a bit annoying. This PR makes a minor improvement: it sorts the codegen dump by `codegenStageId`, in ascending order.
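As a rough, self-contained illustration of the ordering itself, the sketch below sorts a list of per-stage dump records by stage id. `StageDump` and `CodegenDumpOrdering` are hypothetical names invented for this example, not Spark classes; the actual change works on Spark's `WholeStageCodegenExec` nodes and their `codegenStageId`, and its code may look quite different.

```scala
// Hypothetical record standing in for one whole-stage codegen subtree:
// its codegen stage id, the plan subtree string, and the generated source.
final case class StageDump(codegenStageId: Int, plan: String, code: String)

object CodegenDumpOrdering {
  // Returns the dumps sorted by codegenStageId, ascending.
  // Scala's sortBy is stable, so dumps sharing an id keep their relative order.
  def sorted(dumps: Seq[StageDump]): Seq[StageDump] =
    dumps.sortBy(_.codegenStageId)

  // Like codegenToSeq.map(_._1): only the plan strings, in stage order.
  def planStrings(dumps: Seq[StageDump]): Seq[String] =
    sorted(dumps).map(_.plan)
}
```

Sorting on an explicit key, rather than relying on the order in which subtrees happen to be collected, is what makes the output order deterministic from run to run.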
   
   After this change, the following query:
   ```scala
   spark.range(10).agg(sum('id)).queryExecution.debug.codegen
   ```
   will always dump the generated code in a natural, stable order. A variant of this example with shorter output, printing only the plan subtree of each codegen stage, is:
   ```
   scala> spark.range(10).agg(sum('id)).queryExecution.debug.codegenToSeq.map(_._1).foreach(println)
   *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L])
   +- *(1) Range (0, 10, step=1, splits=16)
   
   *(2) HashAggregate(keys=[], functions=[sum(id#8L)], output=[sum(id)#12L])
   +- Exchange SinglePartition, true, [id=#30]
      +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L])
         +- *(1) Range (0, 10, step=1, splits=16)
   ```
   
   The number of codegen stages within a single SQL query tends to be very small, most likely fewer than 50, so the overhead of the added sort should be negligible.
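A back-of-the-envelope check of that claim, assuming a comparison sort that performs on the order of n log2 n comparisons and taking n = 50 as the upper end of the estimate above:

```latex
n \log_2 n \,\Big|_{n = 50} = 50 \log_2 50 \approx 50 \times 5.64 \approx 282 \text{ comparisons}
```

A few hundred integer comparisons on stage ids is tiny next to the work of producing the code dump itself.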
   
   
   ### Why are the changes needed?
   
   Minor improvement to aid WSCG debugging.
   
   ### Does this PR introduce any user-facing change?
   
   No user-facing change for end users; this is a minor change for developers who debug WSCG-generated code.
   
   ### How was this patch tested?
   
   Manually tested the output; all other tests still pass.
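The manual check could plausibly be automated along these lines. This is a hypothetical helper, not part of this PR or of Spark: it assumes each plan string begins with the `*(n)` marker that Spark prints for whole-stage codegen stages, as in the sample output above, extracts `n`, and verifies that the ids ascend.

```scala
object CodegenOrderCheck {
  // Matches a leading "*(n)" marker; without the MULTILINE flag, ^ anchors
  // to the start of the whole string, i.e. the first line of the plan.
  private val StageMarker = """^\*\((\d+)\)""".r

  // Extracts the stage id from the start of a plan string, if present.
  def stageId(plan: String): Option[Int] =
    StageMarker.findFirstMatchIn(plan).map(_.group(1).toInt)

  // True when every plan string carries a marker and the ids strictly ascend
  // (each codegen subtree is assumed to have a distinct stage id).
  def isSortedByStageId(plans: Seq[String]): Boolean = {
    val ids = plans.map(stageId)
    ids.forall(_.isDefined) && ids.flatten.sliding(2).forall {
      case Seq(a, b) => a < b
      case _         => true // zero or one element is trivially sorted
    }
  }
}
```

Checking only the leading marker keeps the helper independent of the rest of the plan text, which contains run-specific details such as expression ids (`id#8L`) and exchange ids.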

----------------------------------------------------------------
This is an automated message from the Apache Git Service.