Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19862#discussion_r154563897
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
---
@@ -674,8 +674,9 @@ private[joins] class SortMergeJoinScanner(
private[this] val bufferedMatches =
new ExternalAppendOnlyUnsafeRowArray(inMemoryThreshold, spillThreshold)
- // Initialization (note: do _not_ want to advance streamed here).
- advancedBufferedToRowWithNullFreeJoinKey()
+ // Initialization (note: do _not_ want to advance streamed here). This is made lazy to prevent
+ // unnecessary trigger of calculation.
+ private lazy val advancedBufferedIterRes = advancedBufferedToRowWithNullFreeJoinKey()
--- End diff --
This function must be called (to try to set `BufferedRow`) before
`BufferedRow` is checked, and it must be called only once. That requirement
comes from the original logic. Given that constraint, I think a `lazy val` is
the best way to add this optimization.
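
To illustrate the "called before the check, and only once" point, here is a minimal sketch of how a `lazy val` gives both guarantees. It is not the Spark code: `LazyScannerSketch`, `advanceCalls`, `initialAdvance`, and `currentBufferedKey` are hypothetical names, and an `Iterator[Option[Int]]` stands in for the buffered row iterator, with `None` playing the role of a null join key.

```scala
// Hypothetical stand-in for the scanner; NOT the real SortMergeJoinScanner.
class LazyScannerSketch(buffered: Iterator[Option[Int]]) {
  // Counts how often the advance routine runs (for illustration only).
  var advanceCalls = 0

  // Current buffered key; None until an advance finds a non-null key.
  private var bufferedRow: Option[Int] = None

  // Skips entries with a null (None) key; returns true if a row was found.
  private def advanceToRowWithNullFreeKey(): Boolean = {
    advanceCalls += 1
    bufferedRow = None
    while (bufferedRow.isEmpty && buffered.hasNext) {
      bufferedRow = buffered.next()
    }
    bufferedRow.isDefined
  }

  // Evaluated on first access only; later accesses reuse the cached result.
  private lazy val initialAdvance: Boolean = advanceToRowWithNullFreeKey()

  // Forces the one-time initialization before bufferedRow is read.
  def currentBufferedKey: Option[Int] =
    if (initialAdvance) bufferedRow else None
}

object LazyScannerDemo {
  def main(args: Array[String]): Unit = {
    val scanner = new LazyScannerSketch(Iterator(None, Some(7), Some(9)))
    println(scanner.advanceCalls)       // nothing has run yet
    println(scanner.currentBufferedKey) // first access triggers the advance
    println(scanner.currentBufferedKey) // cached; no second advance
    println(scanner.advanceCalls)
  }
}
```

Constructing the scanner does no work, so a join that never reads the buffered side never pays for the advance. The first reader triggers it, and every later reader sees the cached result.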
---