[GitHub] [spark] JoshRosen edited a comment on issue #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
JoshRosen edited a comment on issue #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements URL: https://github.com/apache/spark/pull/27246#issuecomment-577483354 Regarding the "Implement lazy initialization of UnsafeSorterSpillReader" portion of the changes: It looks like @liutang123 previously proposed a similar change in 2018 in #20184 / [SPARK-22987](https://issues.apache.org/jira/browse/SPARK-22987). I stumbled across that PR earlier this year and [left some comments](https://github.com/apache/spark/pull/20184#issuecomment-511240279): overall, I think that lazy initialization of the individual spill readers makes sense. In order to enable incremental review and merge of these changes, do you think it could make sense to split the lazy spill reader initialization changes into a separate PR? I haven't compared the implementations in detail, so I haven't formed any opinion on the approach here vs. the one in #20184. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] JoshRosen edited a comment on issue #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements
JoshRosen edited a comment on issue #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements URL: https://github.com/apache/spark/pull/27246#issuecomment-577483354 Regarding the "Implement lazy initialization of UnsafeSorterSpillReader" portion of the changes: It looks like @liutang123 previously proposed a similar change in 2018 in #20184 / [SPARK-22987](https://issues.apache.org/jira/browse/SPARK-22987). I stumbled across that PR earlier this year and [left some comments](https://github.com/apache/spark/pull/20184#issuecomment-511240279): overall, I think that lazy initialization makes sense here. In order to enable incremental review and merge of these changes, do you think it could make sense to split the lazy spill reader initialization changes into a separate PR? I haven't compared the implementations in detail, so I haven't formed any opinion on the approach here vs. the one in #20184. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org