Hello list, we encountered the error in subject yesterday for the first time and are looking for an explanation. We can't find any hints of what might have happened. The broker logs contain nothing unusual and all other metrics (hardware and software) were "as always" at the time of the incident. Can anyone give some hints as to the possible cause of this ? Google is no help either.
For context, we have a cluster made of 3 brokers (version 2.2.1) and one topic that is consumed by 3 spark streaming jobs. This setup has been up and running for almost a year now. All three jobs crashed simultaneously and logged the given error. The crash went unnoticed for several hours but the jobs seem to work fine after restart. We always start the jobs at the "latest" offset though. I'd gladly provide more details if needed. Thank you for any help you can provide, R.