GitHub user brkyvz opened a pull request:
https://github.com/apache/spark/pull/9143
[STREAMING] Batch ReceivedBlockTrackerLogEvents for WAL writes
When S3 is used as the directory for write-ahead logs (WALs), each write takes
a long time. The driver easily becomes a bottleneck when multiple receivers
send AddBlock events to the ReceiverTracker. This PR batches events in the
ReceivedBlockTracker so that receivers are not blocked by the driver for too
long.
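
The batching idea can be illustrated with a small, language-neutral sketch.
This is not the code in this PR (which is Scala, inside Spark); the class name
BatchedLogWriter and the write_batch callback are hypothetical names chosen for
illustration. The assumption is that one call to write_batch stands for one
slow WAL write (for example, to S3), and that a single background thread
drains everything that queued up while the previous write was in flight.

```python
import queue
import threading
from concurrent.futures import Future


class BatchedLogWriter:
    """Hypothetical sketch: group queued records into one slow write call."""

    def __init__(self, write_batch):
        # write_batch is an assumed callback: one invocation == one WAL write.
        self._write_batch = write_batch
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, record):
        """Enqueue a record; the returned Future completes once it is written."""
        done = Future()
        self._pending.put((record, done))
        return done

    def close(self):
        """Ask the worker to stop after writing what is already queued."""
        self._pending.put(None)  # sentinel
        self._worker.join()

    def _run(self):
        stopping = False
        while not stopping:
            first = self._pending.get()  # block until there is work
            if first is None:
                break
            batch = [first]
            # Drain whatever else arrived while the previous write was running.
            while True:
                try:
                    item = self._pending.get_nowait()
                except queue.Empty:
                    break
                if item is None:
                    stopping = True
                    break
                batch.append(item)
            try:
                # One slow write covers every record in the batch.
                self._write_batch([record for record, _ in batch])
                for _, done in batch:
                    done.set_result(True)
            except Exception as exc:
                # A failed write fails every caller waiting on this batch.
                for _, done in batch:
                    done.set_exception(exc)
```

In this sketch a caller still waits for its own record to be written (by
calling result() on the returned Future), but the cost of one slow write is
shared by every record that arrived in the meantime, rather than paid once per
record.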
cc @zsxwing @tdas
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/brkyvz/spark batch-wal-writes
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9143.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9143
----
commit 8f92f10f4aa17b1a19a72e1c257273bb26080bb5
Author: Burak Yavuz
Date: 2015-10-13T20:40:37Z
ready for testing
commit 78c6069477422d4984b7107db435245c811dbab9
Author: Burak Yavuz
Date: 2015-10-14T15:21:10Z
save changes
commit ee36f8968354ecb14f1153b9bd7fb8f0d4bb9e1e
Author: Burak Yavuz
Date: 2015-10-16T00:21:56Z
add more tests
----