peter-toth commented on issue #28041: [SPARK-30564][SQL] Improved extra new line and comment remove URL: https://github.com/apache/spark/pull/28041#issuecomment-612643258 I looked into this issue a bit deeper. Let me share my thoughts: - I still don't think reverting the comment removal chage that @maropu added to `Block.length` would be a good idea for 2 reasons: - Pretty big amount of the source code we currently generate is comment or extra whitespace/newline. I admit that if we slowly update all comments to use `CodegenContext.registerComment` can help to reduce code size, but why don't we remove the comment removal logic only when we reach that desired state? - `CodegenContext.registerComment` has some issue here as it also adds some comment placeholders to the source code. If we don't remove those comment placeholders in `Block.length` then method splitting can be different when `spark.sql.codegen.comments` is enabled or disabled, which can be very confusing. - Since `Block.length` is still a costly operation in this PR, this commit: https://github.com/apache/spark/pull/28041/commits/f7fbe1777651cd93c1f8a33547854c5ba61963ee helped to further reduce run time: ``` deeply nested struct field r/w: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ 250 deep x 400 rows (read in-mem) 137 164 27 0.7 1367.6 0.2X ```
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
