maropu commented on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions URL: https://github.com/apache/spark/pull/20965#issuecomment-524121633 > Besides the inline comments, one question I have for this PR is: when the conf is turned on, it unconditionally tries to perform aggregate expression codegen splitting, right? This would generate extra preparation code for loading the aggregate buffer slots to local variables, but when CodeGenerator.isValidParamLength(paramLength) is false, this PR gives up on splitting and goes back to generating code inline, which means in this case it can generate worse (bigger) code than the current master. > If we get rid of the extra code for loading aggregate buffers into locals, it might avoid the code size regression. Yea, right. As you said, the code size regression could happen when we hit the JVM limit in parameter lengths. So, I rethought the design and I fixed the code so that we could fall back into the current (non-split) generated code when we hit the limit: https://github.com/apache/spark/pull/20965/files#diff-2eb948516b5beaeb746aadac27fbd5b4R284 Could you check again? > But one more question remains: in the normal case where splitting worked, have we verified whether or not there's a performance regression due to the extra function invocations from the splitting? Ur, it is difficult to prove that there is no performance regression in all workloads. I checked that there was no regression in TPCDS (and [there was the big performance gain in Q66](https://github.com/apache/spark/pull/19082#issuecomment-325847309)), but I'm not sure about that of the other workloads, e.g., ETL/ML. So, this pr turns off this mode by default.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
