GitHub user zsxwing opened a pull request:
https://github.com/apache/spark/pull/8877
[SPARK-10769][Streaming][Tests]Fix
o.a.s.streaming.CheckpointSuite.maintains rate controller
Fixed the following failure in
https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1787/testReport/junit/org.apache.spark.streaming/CheckpointSuite/recovery_maintains_rate_controller/
```
sbt.ForkMain$ForkError: The code passed to eventually never returned
normally. Attempted 660 times over 10.000044392000001 seconds. Last failure
message: 9223372036854775807 did not equal 200.
at
org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
at
org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
at
org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:336)
at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
at
org.apache.spark.streaming.CheckpointSuite$$anonfun$15.apply$mcV$sp(CheckpointSuite.scala:413)
at
org.apache.spark.streaming.CheckpointSuite$$anonfun$15.apply(CheckpointSuite.scala:396)
at
org.apache.spark.streaming.CheckpointSuite$$anonfun$15.apply(CheckpointSuite.scala:396)
at
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
```
In this test, it calls `advanceTimeWithRealDelay(ssc, 2)` to run two batch
jobs. However, one race condition is these two jobs can finish before the
receiver is registered. Then `UpdateRateLimit` won't be sent to the receiver
and `getDefaultBlockGeneratorRateLimit` cannot be updated.
Here are the logs related to this issue:
```
15/09/22 19:28:26.154 pool-1-thread-1-ScalaTest-running-CheckpointSuite
INFO CheckpointSuite: Manual clock before advancing = 2500
15/09/22 19:28:26.869 JobScheduler INFO JobScheduler: Finished job
streaming job 3000 ms.0 from job set of time 3000 ms
15/09/22 19:28:26.869 JobScheduler INFO JobScheduler: Total delay:
1442975303.869 s for time 3000 ms (execution: 0.711 s)
15/09/22 19:28:26.873 JobScheduler INFO JobScheduler: Finished job
streaming job 3500 ms.0 from job set of time 3500 ms
15/09/22 19:28:26.873 JobScheduler INFO JobScheduler: Total delay:
1442975303.373 s for time 3500 ms (execution: 0.004 s)
15/09/22 19:28:26.879 sparkDriver-akka.actor.default-dispatcher-3 INFO
ReceiverTracker: Registered receiver for stream 0 from localhost:57749
15/09/22 19:28:27.154 pool-1-thread-1-ScalaTest-running-CheckpointSuite
INFO CheckpointSuite: Manual clock after advancing = 3500
```
`advanceTimeWithRealDelay(ssc, 2)` triggered job 3000ms and 3500ms but the
receiver was registered after job 3000ms and 3500ms finished.
So we should make sure the receiver online before running
`advanceTimeWithRealDelay(ssc, 2)`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zsxwing/spark SPARK-10769
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8877.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8877
----
commit c4ee299ddb187fe6479c971c3641ee0fa42bd35e
Author: zsxwing <[email protected]>
Date: 2015-09-23T05:37:02Z
Fix o.a.s.streaming.CheckpointSuite.maintains rate controller
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]