peter-toth opened a new pull request #28083: [WIP][SPARK-30564][SQL] Use 
comment placeholders in HashAggregateExec
URL: https://github.com/apache/spark/pull/28083
 
 
   ### What changes were proposed in this pull request?
   SPARK-21870 (cb0cddf#diff-06dc5de6163687b7810aa76e7e152a76R146-R149) caused a 
significant performance regression when the generated source code is fairly 
large, because `HashAggregateExec` uses `Block.length` to decide whether to 
split the code. The change to `length` makes sense, as comments and extra 
newlines shouldn't count when deciding on splitting, but the 
regular-expression-based approach is very slow and adds a large relative 
overhead when execution is otherwise quick (a small number of rows).
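
   To illustrate the trade-off described above, here is a minimal sketch that contrasts a regex-based "stripped" length with a plain string length. The object name `LengthSketch` and both patterns are hypothetical simplifications for illustration; they are not the patterns or helpers Spark actually uses.

```scala
object LengthSketch {
  // Hypothetical, simplified patterns (NOT Spark's actual ones):
  // block comments or line comments, and runs of blank lines.
  private val commentRegex = """(?s)/\*.*?\*/|//[^\n]*""".r
  private val blankLinesRegex = """\n\s*\n""".r

  // Regex-based length: scans the whole string twice and allocates
  // intermediate strings on every call.
  def strippedLength(code: String): Int = {
    val noComments = commentRegex.replaceAllIn(code, "")
    blankLinesRegex.replaceAllIn(noComments, "\n").length
  }

  // Plain length: a constant-time field read.
  def plainLength(code: String): Int = code.length

  def main(args: Array[String]): Unit = {
    val code = "// sum\nint a = 1;\n\n\n/* agg */\nint b = 2;\n"
    println(s"plain=${plainLength(code)} stripped=${strippedLength(code)}")
  }
}
```

   The stripped length is the more meaningful number for a splitting decision, but computing it costs time proportional to the code size on every call, which is where the relative overhead on short-running queries would come from.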
   This PR:
   - registers comments in `HashAggregateExec` with 
`CodegenContext.registerComment`, so that they appear only when comments are 
enabled (`spark.sql.codegen.comments=true`), and 
   - restores `Block.length` to its original form
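
   The placeholder idea in the list above can be sketched roughly as follows. `ToyContext`, its `registerComment`, and `resolve` are hypothetical stand-ins written for this illustration; they only mimic the general shape of the approach and are not Spark's `CodegenContext` API.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a codegen context; NOT Spark's CodegenContext.
final class ToyContext(commentsEnabled: Boolean) {
  private val comments = mutable.LinkedHashMap.empty[String, String]
  private var counter = 0

  // Returns a short placeholder when comments are enabled, else nothing.
  // `text` is by-name, so it is never evaluated when comments are disabled.
  def registerComment(text: => String): String =
    if (commentsEnabled) {
      val name = s"c_$counter"
      counter += 1
      comments(name) = s"// $text"
      s"/*$name*/"
    } else ""

  // Substitutes the real comment text for each placeholder at the end.
  def resolve(code: String): String =
    comments.foldLeft(code) { case (acc, (name, comment)) =>
      acc.replace(s"/*$name*/", comment)
    }
}

object PlaceholderSketch {
  // Builds a tiny code fragment whose comment goes through the context.
  def build(ctx: ToyContext): String =
    ctx.registerComment("do aggregate") + "\nint a = 1;"

  def main(args: Array[String]): Unit = {
    val on = new ToyContext(commentsEnabled = true)
    println(on.resolve(build(on)))
    val off = new ToyContext(commentsEnabled = false)
    println(build(off).length)
  }
}
```

   With comments disabled, the fragment contains no comment text at all, so a plain length already reflects only real code and no regex stripping is needed.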
   
   ### Why are the changes needed?
   To fix the performance regression.
   
   ### Does this PR introduce any user-facing change?
   No.
   
   ### How was this patch tested?
   Existing UTs.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
