viirya edited a comment on pull request #35854:
URL: https://github.com/apache/spark/pull/35854#issuecomment-1067516577


   We know `SessionWindowStateStoreSaveExec` runs after 
`SessionWindowStateStoreRestoreExec` in the operator order. So if input rows 
are dropped by `SessionWindowStateStoreRestoreExec`, later operators such as 
`SessionWindowStateStoreSaveExec` never see them.
   
   That's why we observed that some rows seem to be dropped by the watermark, 
but we don't see any `numRowsDroppedByWatermark` reported for them.
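   To make the operator-order argument concrete, here is a toy Python simulation, not Spark code. `Row`, `restore_step`, `save_step`, and `run_pipeline` are hypothetical stand-ins for the two operators, and treating a row as late when `event_time <= watermark` is a simplification of the real watermark predicate. It shows that a row filtered out upstream never reaches the operator that owns the metric, so the metric stays at zero.

```python
from dataclasses import dataclass


@dataclass
class Row:
    key: str
    event_time: int  # illustrative event-time value


def restore_step(rows, watermark):
    """Hypothetical stand-in for SessionWindowStateStoreRestoreExec:
    drops late rows by the watermark predicate but records no metric."""
    return [r for r in rows if r.event_time > watermark]


def save_step(rows, watermark, metrics):
    """Hypothetical stand-in for SessionWindowStateStoreSaveExec:
    counts the late rows it sees in numRowsDroppedByWatermark."""
    kept = []
    for r in rows:
        if r.event_time <= watermark:
            metrics["numRowsDroppedByWatermark"] += 1
        else:
            kept.append(r)
    return kept


def run_pipeline(rows, watermark):
    metrics = {"numRowsDroppedByWatermark": 0}
    restored = restore_step(rows, watermark)          # late rows vanish here
    saved = save_step(restored, watermark, metrics)   # sees no late rows
    return saved, metrics
```

With rows at event times 5, 12, and 20 and a watermark of 10, the row at 5 is dropped by `restore_step`, yet `numRowsDroppedByWatermark` remains 0.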
   
   `SessionWindowStateStoreRestoreExec` is not a state store writer, so it 
doesn't have the `numRowsDroppedByWatermark` metric, yet it actually drops input 
rows by the watermark predicate. This is confusing to end users, as they cannot 
accurately measure the number of rows dropped by the watermark.
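   For contrast, a hypothetical sketch (again plain Python, not Spark code) of what accurate accounting would look like if the restore step counted the rows it filters. `filter_late_rows` and `run_pipeline` are illustrative names, each operator keeps its own metrics map, and the `event_time <= watermark` test is a simplification of the real predicate.

```python
def filter_late_rows(rows, watermark, metrics):
    """Drop (key, event_time) rows at or below the watermark and count
    every drop in metrics["numRowsDroppedByWatermark"]."""
    kept = []
    for key, event_time in rows:
        if event_time <= watermark:
            metrics["numRowsDroppedByWatermark"] += 1
        else:
            kept.append((key, event_time))
    return kept


def run_pipeline(rows, watermark):
    # Separate metrics per operator, as each physical operator reports its own.
    restore_metrics = {"numRowsDroppedByWatermark": 0}
    save_metrics = {"numRowsDroppedByWatermark": 0}
    # Upstream (restore) step now reports the rows it drops.
    restored = filter_late_rows(rows, watermark, restore_metrics)
    # Downstream (save) step sees no late rows, so its count stays 0.
    saved = filter_late_rows(restored, watermark, save_metrics)
    return saved, restore_metrics, save_metrics
```

Here the drop shows up on the restore step instead of disappearing, so summing both operators' counts gives the true number of rows dropped by the watermark.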
   
   Does it make sense to you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



