GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/8087
[WIP] [SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _ssc_wait_checked for
ml streaming pyspark tests
Recently, the PySpark ML streaming tests have been flaky, most likely because
batches are not being processed in time. Proposal: Replace _ssc_wait (which
always waits for a fixed amount of time) with a method that waits up to a
fixed timeout but can return early once a termination condition method is
satisfied. With this, we can extend the waiting period (to make tests less
flaky) but also stop early when possible (making tests faster on average).
CC: @mengxr @tdas @freeman-lab If this looks reasonable, I'll update the
remaining uses of _ssc_wait.
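For readers who want a concrete picture of the proposal, here is a minimal
sketch of a wait that has an upper time bound but returns early once a
condition holds. It is not the code in this pull request: the name
wait_checked, its parameters, and the default values are all hypothetical,
and it uses only the Python standard library rather than a StreamingContext.

```python
import threading
import time


def wait_checked(term_check, timeout=30.0, poll_interval=0.01):
    """Poll term_check() until it returns True or `timeout` seconds pass.

    Hypothetical helper for illustration only (not the PR's implementation).
    Returns True if the condition was met in time, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check first so an already-satisfied condition returns immediately.
        if term_check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulate a batch result arriving 0.1 s after the test starts waiting.
    received = []
    threading.Timer(0.1, received.append, args=("batch",)).start()
    # The timeout is generous, but the call returns soon after the append.
    print(wait_checked(lambda: len(received) > 0, timeout=5.0))
```

The point of this shape is that the timeout can be made generous without
slowing down the common case, since the loop exits as soon as the check
passes.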
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark streaming-ml-tests
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8087.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8087
----
commit 421e68da3f8a121fee05aa5c903406e66e70eb20
Author: Joseph K. Bradley <[email protected]>
Date: 2015-08-10T23:27:42Z
added _ssc_wait_checked for ml streaming tests
commit 3c171b0aced0b7c18ff406e10b763dc3d4609598
Author: Joseph K. Bradley <[email protected]>
Date: 2015-08-10T23:33:43Z
reverted small fix to make wip review easier
----