Github user sidhavratha commented on the issue:
https://github.com/apache/spark/pull/21685
And yes, both applications were tested on the same dataset; the only
differences are the added buffer logic and a different consumer group id.
In the "before" case, the scheduling delay increases because of high Kafka
poll time during a few batches, which is exactly the problem I have tried to
solve in this PR.
We are still trying to figure out the reason for the frequent high poll times.
However, we should remove the dependency of batch processing time on the Kafka
poll, which is possible because each Kafka partition sticks to the same executor.
This will help even when the Kafka poll time is only a few seconds.
Also, the configuration I am adding is optional; without it the job will
behave as usual (blocking on poll).
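
To illustrate the general idea of decoupling processing from poll, here is a
minimal sketch, not the code in this PR. It assumes a background thread that
keeps polling into a bounded buffer, so the processing side only reads records
that have already been fetched. `PrefetchBuffer`, `take`, and the `poll`
supplier are hypothetical names used for illustration; the supplier stands in
for a possibly slow consumer poll and is not a Kafka API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch: a background thread polls into a bounded buffer. */
final class PrefetchBuffer<T> implements AutoCloseable {
    private final BlockingQueue<T> buffer;
    private final Thread fetcher;
    private volatile boolean running = true;

    /** `poll` is an assumed stand-in for a slow fetch; `capacity` bounds memory use. */
    PrefetchBuffer(Supplier<List<T>> poll, int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.fetcher = new Thread(() -> {
            try {
                while (running) {
                    for (T record : poll.get()) {
                        // Blocks while the buffer is full, which throttles fetching.
                        buffer.put(record);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "prefetch");
        fetcher.setDaemon(true);
        fetcher.start();
    }

    /** Returns up to `max` buffered records, waiting at most `timeoutMs` for the first. */
    List<T> take(int max, long timeoutMs) {
        List<T> out = new ArrayList<>();
        try {
            T first = buffer.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (first != null) {
                out.add(first);
                // Grab whatever else is already buffered, without waiting.
                buffer.drainTo(out, max - 1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    /** Stops the background fetcher. */
    @Override
    public void close() {
        running = false;
        fetcher.interrupt();
    }
}
```

With a buffer like this, a batch only waits on the fetch when the buffer is
empty, so a single slow poll does not translate directly into batch processing
time as long as earlier polls have filled the buffer.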