GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/16690
[SPARK-19347] ReceiverSupervisorImpl can add block to ReceiverTracker
multiple times because of askWithRetry.
## What changes were proposed in this pull request?
`ReceiverSupervisorImpl` on the executor side reports block metadata back to
`ReceiverTracker` on the driver side. The current code uses `askWithRetry` for
this. However, `ReceiverTracker` does not handle `AddBlock` idempotently, so a
resent message may be processed multiple times.
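The failure mode described above can be illustrated with a toy model. This is only a sketch, not Spark code: `ToyTracker`, `askWithRetrySketch`, and the "first reply is too slow" rule are hypothetical stand-ins for the driver-side handler and the retrying ask.

```scala
import scala.collection.mutable.ArrayBuffer

object RetryDuplicateDemo {
  // Hypothetical message type, named after the one in the description.
  final case class AddBlock(blockId: String)

  // Hypothetical stand-in for a non-idempotent handler: it records every
  // AddBlock it receives and never checks for duplicates.
  class ToyTracker {
    val blocks: ArrayBuffer[String] = ArrayBuffer.empty[String]
    private var firstCall = true

    // Returns true if the reply is considered to have arrived in time.
    def receive(msg: AddBlock): Boolean = {
      blocks += msg.blockId           // the side effect happens either way
      val repliedInTime = !firstCall  // pretend the first reply is too slow
      firstCall = false
      repliedInTime
    }
  }

  // Simplified retry loop: resend the same message until a reply is in time.
  def askWithRetrySketch(tracker: ToyTracker, msg: AddBlock, maxRetries: Int): Boolean = {
    var attempts = 0
    while (attempts < maxRetries) {
      attempts += 1
      if (tracker.receive(msg)) return true
    }
    false
  }

  // Sends one AddBlock through the retry loop and returns what was recorded.
  def run(): Seq[String] = {
    val tracker = new ToyTracker
    askWithRetrySketch(tracker, AddBlock("input-0-1"), maxRetries = 3)
    tracker.blocks.toSeq
  }

  def main(args: Array[String]): Unit =
    println(run()) // the same block id is recorded twice
}
```

The handler did its work on the first attempt; only the reply was late. The retry then repeats the side effect, which is why a retrying ask is only safe against idempotent handlers.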
*To reproduce*:
1. In `ReceiverTracker`, check whether this is the first `AddBlock` received;
if so, sleep long enough (say 200 seconds) that the first RPC call times out
in `askWithRetry` and `AddBlock` is resent.
2. Rebuild Spark and run the following job:
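```
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// masterUrl is the master URL of the cluster under test, defined elsewhere.
// The job expects a text server listening on localhost:1234.
def streamProcessing(): Unit = {
  val conf = new SparkConf()
    .setAppName("StreamingTest")
    .setMaster(masterUrl)
  val ssc = new StreamingContext(conf, Seconds(200))
  val stream = ssc.socketTextStream("localhost", 1234)
  stream.print()
  ssc.start()
  ssc.awaitTermination()
}
```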
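Step 1 above can be sketched as a small guard that delays only the first call. This is a hypothetical helper for illustration, not the code that was actually patched into `ReceiverTracker`; the name `FirstCallDelay` and the `delayMs` parameter are assumptions.

```scala
import java.util.concurrent.atomic.AtomicBoolean

object FirstCallDelay {
  // Flips from false to true exactly once, even under concurrent calls.
  private val seen = new AtomicBoolean(false)

  // Sleeps for delayMs on the first invocation only.
  // Returns true if this call was the delayed (first) one.
  def delayIfFirst(delayMs: Long): Boolean = {
    val first = seen.compareAndSet(false, true)
    if (first) Thread.sleep(delayMs)
    first
  }
}
```

Calling `FirstCallDelay.delayIfFirst(200000L)` at the top of the `AddBlock` handler would hold the first reply back for about 200 seconds, long enough for the caller to time out and resend.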
To fix:
It makes sense to provide a blocking version of `ask` in `RpcEndpointRef`, as
mentioned in SPARK-18113
(https://github.com/apache/spark/pull/16503#event-927953218), because the Netty
RPC layer does not drop messages. `askWithRetry` is a leftover from the Akka
days. It imposes restrictions on the caller (e.g. idempotency) and other things
that people generally don't pay much attention to when using it.
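As a rough illustration of the proposed behaviour, the sketch below sends a message once and blocks for the reply, surfacing a timeout as an exception instead of resending. It is built on plain Scala futures, not on `RpcEndpointRef`; `handle`, `askOnce`, and the timings are hypothetical.

```scala
import java.util.concurrent.TimeoutException
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SingleAskDemo {
  // Counts how many times the (hypothetical) handler ran its side effect.
  val processed = new AtomicInteger(0)

  // Hypothetical slow, non-idempotent handler.
  def handle(blockId: String): Future[Boolean] = Future {
    Thread.sleep(200)
    processed.incrementAndGet()
    true
  }

  // Send once and block for the reply; a timeout is thrown, never retried.
  def askOnce(blockId: String, timeout: FiniteDuration): Boolean =
    Await.result(handle(blockId), timeout)

  // Returns how many times the handler ran after one timed-out ask.
  def run(): Int = {
    try {
      askOnce("input-0-1", 50.millis)
    } catch {
      case _: TimeoutException => () // caller decides what to do; no resend
    }
    Thread.sleep(500) // give the handler time to finish
    processed.get()
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

Even though the caller gives up waiting, the handler runs exactly once, so the caller no longer needs the handler to be idempotent.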
## How was this patch tested?
Tested manually. The scenario described above no longer occurs with this patch.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jinxing64/spark SPARK-19347
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16690.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16690
----
commit c5bcccf227446f5d044f8fb0518caa12cfef7421
Author: jinxing <[email protected]>
Date: 2017-01-24T09:33:23Z
[SPARK-19347] ReceiverSupervisorImpl can add block to ReceiverTracker
multiple times because of askWithRetry
----