GitHub user arzt opened a pull request:

    https://github.com/apache/spark/pull/17774

    [SPARK-18371][Streaming] Spark Streaming backpressure generates batch with 
large number of records

    ## What changes were proposed in this pull request?
    
    Omit rounding of the backpressure rate. Effects:
    - no batch with a large number of records is created when the rate from the PID estimator is small
    - the number of records per batch and partition is more fine-grained, improving backpressure accuracy
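
    The effect described above can be sketched numerically. This is an illustration only, not the actual Spark code: the function name and the numbers (rate, partition count, batch interval) are hypothetical, chosen to show how rounding a small per-partition rate early collapses it to zero, while keeping the fractional rate preserves the intended limit.

    ```python
    # Illustration (not the actual Spark implementation) of why rounding
    # the backpressure rate early distorts per-batch record limits.
    # Hypothetical numbers: PID-estimated rate, 4 partitions, 10s batches.

    def records_per_partition(rate, partitions, batch_secs, round_early):
        per_partition = rate / partitions          # records/sec per partition
        if round_early:
            per_partition = round(per_partition)   # lossy integer rounding
        return per_partition * batch_secs          # records allowed per batch

    rate = 1.8  # small rate estimated by the PID controller (records/sec)

    # Rounding early collapses 0.45 records/sec to 0 records per batch:
    records_per_partition(rate, 4, 10, round_early=True)   # -> 0
    # Keeping the fractional rate preserves the intended limit:
    records_per_partition(rate, 4, 10, round_early=False)  # -> 4.5
    ```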
    
    ## How was this patch tested?
    
    This was tested by running:
    - `mvn test -pl external/kafka-0-8`
    - `mvn test -pl external/kafka-0-10`
    - a streaming application which was suffering from the issue
    
    @JasonMWhite
    
    The contribution is my original work and I license the work to the project 
under the project’s open source license.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/arzt/spark kafka-back-pressure

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17774.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17774
    
----
commit a7a5cbca2d453a1691da9ddc1c6f74eba78b6289
Author: Sebastian Arzt <[email protected]>
Date:   2017-04-26T13:39:46Z

    no rounding of backpressure rate

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---