erenavsarogullari commented on PR #44646: URL: https://github.com/apache/spark/pull/44646#issuecomment-1892961944
> Adding statistical information is good, but I'm not sure if it's worth it.

Currently, the `WindowExec` physical operator does not expose any SQLMetrics except `spillSize` (in-memory). This PR aims to make `WindowExec` runtime behavior observable, for example:
- how many window partitions are created (`numOfWindowPartitions`), in addition to `numOfPartitions`,
- how many rows are processed (`numOfOutputRows`),
- how many rows are spilled to disk,
- what the spilled size on disk is.

For example, `WindowExec` spilling behavior depends on multiple factors and is hard to track without metrics:

**1-** `WindowExec` creates an `ExternalAppendOnlyUnsafeRowArray` (internal `ArrayBuffer`) per task (a.k.a. child RDD partition).

**2-** When the `ExternalAppendOnlyUnsafeRowArray` size exceeds `spark.sql.windowExec.buffer.in.memory.threshold=4096`, it switches to an `UnsafeExternalSorter` as `spillableArray`: all buffered rows are moved into the `UnsafeExternalSorter`, and the internal `ArrayBuffer` is cleared. From this point on, `WindowExec` writes into the `UnsafeExternalSorter`'s buffer (a.k.a. `UnsafeInMemorySorter`).

**3-** An `UnsafeExternalSorter` is created per window partition. When its buffer size exceeds `spark.sql.windowExec.buffer.spill.threshold=Integer.MAX_VALUE`, it writes to disk and clears its entire buffer (the `UnsafeInMemorySorter`). It then continues to buffer subsequent records until `spark.sql.windowExec.buffer.spill.threshold` is exceeded again.
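The two-threshold hand-off described above can be sketched as a toy model (plain Java, not Spark code; the class and field names are made up for illustration, and the counters mirror the kind of metrics this PR proposes):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two-level buffering: an in-memory array that hands its
// contents to a "spillable" buffer once one threshold is exceeded, which in
// turn "spills" (here: just counts) once a second threshold is exceeded.
public class TieredBufferSketch {
    static class TieredBuffer {
        private final int inMemoryThreshold;  // models ...buffer.in.memory.threshold
        private final int spillThreshold;     // models ...buffer.spill.threshold
        private final List<long[]> array = new ArrayList<>(); // models the internal ArrayBuffer
        private List<long[]> sorterBuffer = null;             // models UnsafeInMemorySorter
        long spilledRows = 0;  // metric: rows spilled to "disk"
        long spillCount = 0;   // metric: number of spill events

        TieredBuffer(int inMemoryThreshold, int spillThreshold) {
            this.inMemoryThreshold = inMemoryThreshold;
            this.spillThreshold = spillThreshold;
        }

        void add(long[] row) {
            if (sorterBuffer == null) {
                array.add(row);
                if (array.size() > inMemoryThreshold) {
                    // Step 2: move all buffered rows into the spillable buffer, clear the array.
                    sorterBuffer = new ArrayList<>(array);
                    array.clear();
                }
            } else {
                sorterBuffer.add(row);
                if (sorterBuffer.size() > spillThreshold) {
                    // Step 3: spill the entire buffer, then keep buffering new records.
                    spilledRows += sorterBuffer.size();
                    spillCount++;
                    sorterBuffer.clear();
                }
            }
        }
    }

    public static void main(String[] args) {
        TieredBuffer buf = new TieredBuffer(4, 6);
        for (long i = 0; i < 20; i++) {
            buf.add(new long[]{i});
        }
        // prints spilledRows=14 spillCount=2
        System.out.println("spilledRows=" + buf.spilledRows + " spillCount=" + buf.spillCount);
    }
}
```

Without counters like `spilledRows` and `spillCount`, none of these transitions are visible to the user, which is the motivation for the proposed metrics.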
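To exercise this spill path on a small dataset, both thresholds can be lowered. A hedged config sketch, assuming a `SparkSession` named `spark` is already in scope (the config keys and defaults are the ones quoted above; the values 4 and 8 are arbitrary small choices):

```java
// Lower both thresholds so even small window partitions take the spill path.
spark.conf().set("spark.sql.windowExec.buffer.in.memory.threshold", "4");
spark.conf().set("spark.sql.windowExec.buffer.spill.threshold", "8");
```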
