Github user jose-torres commented on the issue:
https://github.com/apache/spark/pull/20675
It's not semantically wrong that the attempt number is never reset; it just
means that, for very long-running streams, the allowed task restarts will
eventually be exhausted (sketched below). You make a good point that in
high-parallelism cases we might need to be able to restart only a single
task, although I think we'd still need
query-level restart on top of that. But if you're worried that the current
implementation of task restart will become incorrect as more complex scenarios
are supported, I'd definitely lean towards deferring it until continuous
processing is more feature-complete.
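
To make the exhaustion point concrete, here is a minimal Scala sketch of the behavior I mean; `RestartExhaustionSketch`, `TaskState`, and `onTaskFailure` are hypothetical names for illustration (the limit plays the same role as `spark.task.maxFailures`), not the actual scheduler internals:

```scala
// Hypothetical sketch: a never-resetting attempt counter eventually
// exhausts retries on a long-running continuous query, even when every
// individual failure was transient. Names are illustrative only.
object RestartExhaustionSketch {
  final case class TaskState(attemptNumber: Int)

  val maxTaskFailures = 4 // analogous to spark.task.maxFailures

  // Each failure bumps the attempt number; since it is never reset, the
  // query dies after maxTaskFailures failures over its whole lifetime.
  def onTaskFailure(state: TaskState): Either[String, TaskState] = {
    val next = state.attemptNumber + 1
    if (next >= maxTaskFailures) Left(s"query failed after $next attempts")
    else Right(state.copy(attemptNumber = next))
  }

  def main(args: Array[String]): Unit = {
    val start: Either[String, TaskState] = Right(TaskState(0))
    // Five transient failures spread over months still kill the query.
    val result = (1 to 5).foldLeft(start) {
      case (Right(s), _) => onTaskFailure(s)
      case (failed, _)   => failed
    }
    println(result) // Left(query failed after 4 attempts)
  }
}
```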
I've been working on getting basic aggregation running, and I think we'll
definitely need some kind of setOffset-like functionality (a rough sketch of
what I mean is below). Do you want to
spin that off into a separate PR? (I can handle it otherwise.)
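
Roughly what I have in mind by setOffset-like functionality; the names `SketchOffset` and `RestartableContinuousReader` are hypothetical, not the existing DataSourceV2 interfaces:

```scala
// Hypothetical sketch of a setOffset-style hook for a continuous reader.
// Trait and method names are assumptions, not the actual DataSourceV2 API.
trait SketchOffset extends Serializable

trait RestartableContinuousReader {
  /** Position the reader at `start` before any tasks launch; None means
   *  "read from the beginning". An aggregation restoring state from a
   *  checkpoint would call this to replay from the matching offset. */
  def setOffset(start: Option[SketchOffset]): Unit

  /** The offset the reader is currently positioned at. */
  def currentOffset: SketchOffset
}
```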