Github user sidhavratha commented on the issue:
https://github.com/apache/spark/pull/21685
Thanks a lot for looking into this. Please find my comments in [] below each
point.
- You're trying to commit something into 2.4, but the test results I see are
from version 2.1.0. Have you tested it with 2.4? This part of the code has
changed significantly. Results with that version would be better.
[We do not have a 2.4.0 cluster handy. We will try to spin up a 2.4.0 cluster
and run the same test.]
- In the before case, the input rate was at first approximately the same as
the constant rate in the after case. After the initial good performance,
something went wrong and the rate dropped significantly. What exactly happened
there? Maybe memory filled up and it could not poll without GC (just guessing)?
[A Kafka poll usually brings in more records than one batch can process. In my
case it brings ~500 records. Those records stay in the buffer for 4-5 batches,
after which the next poll happens, resulting in increased processing time.
Also, not every Kafka poll takes a long time. We have raised the issue with
our Kafka team, but it is inconclusive so far.]
[I looked at GC time on the executors (through the Spark UI), and it was
insignificant. I will enable GC logs and run the job again.]
- Have you considered/tested what happens when the driver/receiver dies?
Guarantees are quite important.
[I will test this scenario. Basically, am I supposed to test that if the
driver dies, it starts from the same place when it comes back up?]
- Have you tested it with receivers? Some results would be excellent.
[I will get results with receivers as well.]
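
To make the buffering pattern from the second point concrete, here is a
minimal Python sketch of it. It is a toy model, not Spark or Kafka code: the
function name and all numbers except the ~500 records per poll are
assumptions chosen for illustration (100 records per batch, so one poll
covers 5 batches; 200 ms processing time; 1500 ms for a slow poll).

```python
def simulate_batch_times(num_batches, records_per_poll=500,
                         records_per_batch=100, process_ms=200, poll_ms=1500):
    """Return per-batch durations (ms) when an empty buffer forces a blocking poll.

    All parameter defaults are hypothetical; only records_per_poll (~500)
    comes from the observation described above.
    """
    buffered = 0
    durations = []
    for _ in range(num_batches):
        elapsed = process_ms
        if buffered < records_per_batch:
            # The buffer cannot cover this batch, so the batch blocks on a poll.
            buffered += records_per_poll
            elapsed += poll_ms
        buffered -= records_per_batch
        durations.append(elapsed)
    return durations


if __name__ == "__main__":
    for i, ms in enumerate(simulate_batch_times(10)):
        print(f"batch {i}: {ms} ms")
```

In this model every fifth batch pays the poll latency and the batches in
between are served from the buffer, which is the periodic spike in processing
time described above. It treats every poll as slow, whereas in our runs only
some polls take long.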