HeartSaVioR commented on PR #53911: URL: https://github.com/apache/spark/pull/53911#issuecomment-3797401679
Let me clarify the intention. I'd argue that event time should have been treated as a first-class concept when the state store was originally designed. We didn't do that, so any operation involving event time is not performant, even though event time is exactly how an operator produces output in append mode and evicts state. I'd rather say I'm trying to fix that.

The only exception to the above is TWS, where we separate the data from the timer, and the data is not coupled with event time since the timer handles it. Someone may argue that this is a better design because it separates concerns, but I believe it doesn't perform as well as the proposal.

So, to me, attempting to generalize this by removing the concept of event time and replacing it with a long type (or something similar) goes against the direction of the proposal. If we had a case where data should be ordered by an integer type, should we likewise expose an API for that, and so on? I don't think there is sufficient motivation for it. The motivation for making event time first class is that it is one of the core concepts of the streaming engine, and we have been ignoring it in the state store.
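For illustration only (this is not the StateStore API proposed in this PR, and every name below is hypothetical): a minimal Scala sketch of the performance argument above. When event time is not part of the physical layout of the state, evicting rows below the watermark requires scanning every entry; when event time is an ordered prefix of the key, eviction is a bounded range scan that stops at the watermark.

```scala
import scala.collection.mutable

object EvictionSketch {
  // Case 1: state keyed only by the grouping key, event time buried in the value.
  // Eviction below the watermark must scan every entry -- O(n) per micro-batch.
  def evictByScan(state: mutable.Map[String, Long], watermarkMs: Long): Seq[String] = {
    val expired = state.collect { case (k, eventTimeMs) if eventTimeMs < watermarkMs => k }.toSeq
    expired.foreach(state.remove)
    expired
  }

  // Case 2: event time is a first-class, ordered prefix of the key.
  // The sorted iterator only touches entries below the watermark, so the cost is
  // proportional to the number of rows actually evicted.
  def evictByRange(state: mutable.TreeMap[(Long, String), Unit],
                   watermarkMs: Long): Seq[(Long, String)] = {
    val expired = state.iterator
      .takeWhile { case ((eventTimeMs, _), _) => eventTimeMs < watermarkMs }
      .map(_._1)
      .toSeq
    expired.foreach(state.remove)
    expired
  }

  def main(args: Array[String]): Unit = {
    val ordered = mutable.TreeMap.empty[(Long, String), Unit]
    ordered ++= Seq((1000L, "a") -> (), (2000L, "b") -> (), (5000L, "c") -> ())
    // Only the two entries below the watermark (3000 ms) are visited and removed.
    println(evictByRange(ordered, watermarkMs = 3000L)) // List((1000,a), (2000,b))
  }
}
```

This is just a toy model of the trade-off being discussed, not the actual state store layout; the real proposal concerns how RocksDB-backed state is organized, but the asymptotic difference between "scan everything" and "range-scan up to the watermark" is the point.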
