erenavsarogullari commented on PR #44646:
URL: https://github.com/apache/spark/pull/44646#issuecomment-1892961944

   > Adding statistical information is good, but I'm not sure if it's worth it.
   
   Currently, `WindowExec` Physical Operator does not have any SQLMetrics 
except `spillSize` (in-memory). This PR aims to be understood WindowExec 
runtime behavior such as:
   - what is created `numOfWindowPartitions` in addition `numOfPartitions`?, 
   - what is processed `numOfOutputRows`?, 
   - `how many rows are spilled` into disk?,
   -  what is the spilled size on disk`?. 
   For example, WindowExec spilling behavior depends on multiple factor and it 
is hard to track without metrics  such as:
   
   **1-** WindowExec creates `ExternalAppendOnlyUnsafeRowArray` (internal 
ArrayBuffer) per `task` (a.k.a child RDD partition)
   **2-** When ExternalAppendOnlyUnsafeRowArray size exceeds 
`spark.sql.windowExec.buffer.in.memory.threshold=4096`, 
ExternalAppendOnlyUnsafeRowArray switches to `UnsafeExternalSorter` as 
`spillableArray` by moving its all buffered rows into UnsafeExternalSorter and 
ExternalAppendOnlyUnsafeRowArray (internal ArrayBuffer) is cleared. In this 
case, WindowExec starts to write UnsafeExternalSorter' s buffer (a.k.a 
`UnsafeInMemorySorter`).
   **3-** UnsafeExternalSorter is being created per `window partition`. When 
UnsafeExternalSorter' buffer size exceeds 
`spark.sql.windowExec.buffer.spill.threshold=Integer.MAX_VALUE`, it starts to 
write to disk and get cleared all buffer (a.k.a UnsafeInMemorySorter) content. 
In this case, UnsafeExternalSorter will continue to buffer next records until 
exceeding spark.sql.windowExec.buffer.spill.threshold. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to