[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

sidhavratha Mon, 02 Jul 2018 06:54:53 -0700

Github user sidhavratha commented on the issue:

    https://github.com/apache/spark/pull/21685
  
    And yes, both applications were tested on the same dataset; the only 
differences are the additional buffer logic and a changed consumer group-id.
    
    In the "before" case, scheduling delay keeps increasing because of high 
Kafka poll time during a few batches, which is exactly the problem I have 
tried to solve in this PR.
    
    We are still trying to figure out the reason for the frequent high poll 
times. Regardless, we should remove the dependency of a batch's processing 
time on the Kafka poll, which is possible because of executor stickiness for 
each Kafka partition. This provides a benefit even when the Kafka poll time is 
only a few seconds.
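    
    To illustrate the idea (this is not the code in this PR), here is a 
rough single-process sketch in Python. A background thread polls ahead into a 
bounded buffer, so the batch reads from memory instead of blocking on the 
poll. `slow_poll` and `PrefetchBuffer` are hypothetical names made up for the 
example, and the poll is simulated with a sleep rather than a real Kafka 
consumer call.

```python
import queue
import threading
import time

# Sentinel that marks the end of the prefetched range.
_DONE = object()


def slow_poll(offset):
    """Hypothetical stand-in for a Kafka poll; no real client is used."""
    time.sleep(0.05)  # simulated poll latency
    return [f"record-{offset}"]


class PrefetchBuffer:
    """Polls in a background thread and hands records out from memory."""

    def __init__(self, poll_fn, start_offset, end_offset, capacity=16):
        # Bounded queue: the poller blocks when the buffer is full.
        self._queue = queue.Queue(maxsize=capacity)
        self._thread = threading.Thread(
            target=self._fill,
            args=(poll_fn, start_offset, end_offset),
            daemon=True,
        )
        self._thread.start()

    def _fill(self, poll_fn, start_offset, end_offset):
        # Runs off the batch's critical path.
        for offset in range(start_offset, end_offset):
            for record in poll_fn(offset):
                self._queue.put(record)
        self._queue.put(_DONE)

    def __iter__(self):
        # Yields records in poll order until the range is exhausted.
        while True:
            item = self._queue.get()
            if item is _DONE:
                return
            yield item


if __name__ == "__main__":
    buffer = PrefetchBuffer(slow_poll, 0, 5)
    time.sleep(0.3)  # stands in for processing the previous batch
    for record in buffer:
        print(record)
```

    While the previous batch is being processed, the buffer fills in the 
background, so the next batch mostly finds its records already waiting. This 
only pays off if the same executor keeps handling the same partition, which 
is the stickiness mentioned above.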
    
    Also, the configuration I am adding is optional; without it, the job 
behaves as usual (blocking on poll).



