maropu commented on issue #20965: [SPARK-21870][SQL] Split aggregation code 
into small functions
URL: https://github.com/apache/spark/pull/20965#issuecomment-524121633
 
 
   > Besides the inline comments, one question I have for this PR is: when the 
conf is turned on, it unconditionally tries to perform aggregate expression 
codegen splitting, right?
   This would generate extra preparation code for loading the aggregate buffer 
slots to local variables, but when 
CodeGenerator.isValidParamLength(paramLength) is false, this PR gives up on 
splitting and goes back to generating code inline, which means in this case it 
can generate worse (bigger) code than the current master.
   > If we get rid of the extra code for loading aggregate buffers into locals, 
it might avoid the code size regression.
   
   Yea, right. As you said, the code size regression could happen when we hit 
the JVM limit in parameter lengths. So, I rethought the design and I fixed the 
code so that we could fall back into the current (non-split) generated code 
when we hit the limit: 
https://github.com/apache/spark/pull/20965/files#diff-2eb948516b5beaeb746aadac27fbd5b4R284
   Could you check again? 
   
   > But one more question remains: in the normal case where splitting worked, 
have we verified whether or not there's a performance regression due to the 
extra function invocations from the splitting?
   
   Ur, it is difficult to prove that there is no performance regression in all 
workloads. I checked that there was no regression in TPCDS (and [there was the 
big performance gain in 
Q66](https://github.com/apache/spark/pull/19082#issuecomment-325847309)), but 
I'm not sure about that of the other workloads, e.g., ETL/ML.  So, this pr 
turns off this mode by default.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to