siknezevic commented on issue #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements URL: https://github.com/apache/spark/pull/27246#issuecomment-577499526 > Regarding the "Implement lazy initialization of UnsafeSorterSpillReader" portion of the changes: > > It looks like @liutang123 previously proposed a similar change in 2018 in #20184 / [SPARK-22987](https://issues.apache.org/jira/browse/SPARK-22987). I stumbled across that PR earlier this year and [left some comments](https://github.com/apache/spark/pull/20184#issuecomment-511240279): overall, I think that lazy initialization of the individual spill readers makes sense. > > In order to enable incremental review and merge of these changes, do you think it could make sense to split the lazy spill reader initialization changes into a separate PR? I haven't compared the implementations in detail, so I haven't formed any opinion on the approach here vs. the one in #20184. Thank you for the comments. Yes. I am OK with submitting new PR only for lazy spill reader initialization. I looked #20184 and I can see there is still I/O in the constructor Line 69: numRecords = numRecordsRemaining = dis.readInt(); But, I am open for suggestions. Please let me know which approach would make more sense.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
