GitHub user arzt opened a pull request:
https://github.com/apache/spark/pull/17774
[SPARK-18371][Streaming] Spark Streaming backpressure generates batch with
large number of records
## What changes were proposed in this pull request?
Omit rounding of the backpressure rate. Effects:
- no batch with an excessively large number of records is created when the rate from the PID estimator is small
- the number of records per batch and partition is more fine-grained, improving backpressure accuracy
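To illustrate the effect described above, here is a minimal sketch (in Python, with made-up numbers; this is not Spark's actual Scala code). The idea: the PID estimator's global rate is split across Kafka partitions, and if the per-partition rate is rounded to a whole number of records, a small rate rounds down to 0, which a rate limiter typically treats as "no limit" — so the next batch pulls in a huge number of records. Skipping the rounding keeps the cap fine-grained.

```python
# Hypothetical illustration of per-partition record caps with and without
# rounding the backpressure rate. Numbers and function are illustrative only.

def per_partition_cap(rate_per_sec, batch_secs, num_partitions, rounded):
    """Records allowed per partition for one batch interval."""
    per_partition_rate = rate_per_sec / num_partitions
    if rounded:
        # Old behaviour being removed: round to whole records per second.
        per_partition_rate = round(per_partition_rate)
    return per_partition_rate * batch_secs

# A small global rate spread over many partitions:
old = per_partition_cap(rate_per_sec=40, batch_secs=5, num_partitions=100, rounded=True)
new = per_partition_cap(rate_per_sec=40, batch_secs=5, num_partitions=100, rounded=False)

print(old)  # 0   -> a cap of 0 reads as "unlimited", yielding a huge batch
print(new)  # 2.0 -> fine-grained cap of 2 records per partition per batch
```

With rounding, any per-partition rate below 0.5 records/second collapses to zero; without it, the fractional rate still produces a meaningful per-batch cap.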
## How was this patch tested?
This was tested by running:
- `mvn test -pl external/kafka-0-8`
- `mvn test -pl external/kafka-0-10`
- a streaming application that was suffering from the issue
@JasonMWhite
The contribution is my original work and I license the work to the project
under the project's open source license.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/arzt/spark kafka-back-pressure
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17774.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17774
----
commit a7a5cbca2d453a1691da9ddc1c6f74eba78b6289
Author: Sebastian Arzt <[email protected]>
Date: 2017-04-26T13:39:46Z
no rounding of backpressure rate
----