gaborgsomogyi commented on pull request #31944: URL: https://github.com/apache/spark/pull/31944#issuecomment-808166003
> Take Kafka as an example: we can apply a read limit while consuming the offsets, so we only consume a certain number of offsets even though more data is available in Kafka. The same applies to all the other streaming sources. Some users want to know through the listener whether they are falling behind, so they can adjust the cluster size accordingly.

If I understand correctly, the plan is to monitor whether Spark is falling behind in Kafka processing. If that's true, there is an existing solution for this which works like a charm: the user can commit the offsets back to Kafka with a listener, and the delta between the available and the committed offsets can be monitored. If this were the only use case, I'm not 100% sure it's worth the ~300-line change set in the Kafka part. I'd like to emphasize that I'm not against making this better; I'd just like to see a bit more from a use-case perspective.
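
For reference, here is a minimal sketch of the listener pattern I mean (the class name, group id, and the JSON parsing helper are illustrative assumptions, not part of this PR): a `StreamingQueryListener` that commits the offsets Spark has finished processing back to Kafka under a dedicated consumer group, so lag can be watched with standard Kafka tooling.

```scala
import java.util.Properties
import scala.collection.JavaConverters._

import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._
import org.json4s._
import org.json4s.jackson.JsonMethods

// Commits the offsets Spark has processed back to Kafka under a dedicated
// consumer group, so the delta between available and committed offsets can
// be monitored externally. Class name and group id are illustrative.
class OffsetCommitListener(bootstrapServers: String, groupId: String)
    extends StreamingQueryListener {

  private lazy val consumer = {
    val props = new Properties()
    props.put("bootstrap.servers", bootstrapServers)
    props.put("group.id", groupId)
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    new KafkaConsumer[Array[Byte], Array[Byte]](props)
  }

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // Assumes every source in the query is a Kafka source; a production
    // listener should check source.description before parsing.
    event.progress.sources.foreach { source =>
      val offsets = Option(source.endOffset)
        .map(parseEndOffset)
        .getOrElse(Map.empty)
      if (offsets.nonEmpty) consumer.commitSync(offsets.asJava)
    }
  }

  // The Kafka source reports endOffset as JSON like {"topic":{"0":42}};
  // turn it into TopicPartition -> OffsetAndMetadata for the commit.
  private def parseEndOffset(json: String): Map[TopicPartition, OffsetAndMetadata] =
    JsonMethods.parse(json) match {
      case JObject(topics) =>
        topics.flatMap {
          case (topic, JObject(partitions)) =>
            partitions.collect { case (partition, JInt(offset)) =>
              new TopicPartition(topic, partition.toInt) ->
                new OffsetAndMetadata(offset.longValue)
            }
          case _ => Nil
        }.toMap
      case _ => Map.empty
    }
}
```

Registering it is a one-liner (`spark.streams.addListener(new OffsetCommitListener("broker:9092", "spark-lag-monitor"))`), and the lag then shows up in any standard tool, e.g. `kafka-consumer-groups.sh --bootstrap-server broker:9092 --describe --group spark-lag-monitor`, without any change on the Spark side.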
