viirya commented on pull request #35854: URL: https://github.com/apache/spark/pull/35854#issuecomment-1067516577
We know `SessionWindowStateStoreSaveExec` comes after `SessionWindowStateStoreRestoreExec` in the operator order. So if input rows are dropped by `SessionWindowStateStoreRestoreExec`, later operators such as `SessionWindowStateStoreSaveExec` never see them. That's why we observed that some rows seem to be dropped by the watermark, yet `numRowsDroppedByWatermark` shows none. `SessionWindowStateStoreRestoreExec` is not a state store writer, so it doesn't have the `numRowsDroppedByWatermark` metric, but it does drop input rows using the watermark predicate. This is confusing to end users because they cannot accurately measure the number of rows dropped by the watermark.
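To make the accounting gap concrete, here is a toy sketch in Python. It is not Spark code: `Row`, `RestoreOp`, `SaveOp`, and `run_pipeline` are hypothetical names, and the predicate `event_time <= watermark` is a simplifying assumption standing in for the real watermark predicate. It only models the ordering described above, where the first operator filters late rows without a counter and the second operator owns the counter.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Row:
    key: str
    event_time: int


class RestoreOp:
    """Toy stand-in for the restore operator: drops late rows, has no metric."""

    def __init__(self, watermark: int) -> None:
        self.watermark = watermark

    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            # Late rows are silently filtered out here; nothing is counted.
            if row.event_time > self.watermark:
                yield row


class SaveOp:
    """Toy stand-in for the save operator: the only place the metric lives."""

    def __init__(self, watermark: int) -> None:
        self.watermark = watermark
        self.num_rows_dropped_by_watermark = 0

    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            if row.event_time <= self.watermark:
                # Only rows that actually reach this operator can be counted.
                self.num_rows_dropped_by_watermark += 1
            else:
                yield row


def run_pipeline(rows: List[Row], watermark: int) -> Tuple[List[Row], int]:
    """Chain restore -> save and return (surviving rows, reported metric)."""
    restore = RestoreOp(watermark)
    save = SaveOp(watermark)
    kept = list(save.process(restore.process(rows)))
    return kept, save.num_rows_dropped_by_watermark


if __name__ == "__main__":
    rows = [Row("a", 5), Row("a", 15), Row("b", 8), Row("b", 20)]
    kept, reported = run_pipeline(rows, watermark=10)
    print(f"input={len(rows)} kept={len(kept)} reported_dropped={reported}")
```

In this sketch two of the four input rows are late and get filtered by the first operator, so the second operator's counter stays at zero even though rows were dropped. Counting in the first operator as well would close that gap.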
