Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8087#discussion_r36698469
  
    --- Diff: python/pyspark/mllib/tests.py ---
    @@ -1041,10 +1054,12 @@ def test_trainOn_model(self):
             self.ssc.start()
     
             # Give enough time to train the model.
    -        self._ssc_wait(t, 6.0, 0.01)
    -        finalModel = stkm.latestModel()
    -        self.assertTrue(all(finalModel.centers == array(initCenters)))
    -        self.assertEquals(finalModel.clusterWeights, [5.0, 5.0, 5.0, 5.0])
    +        def termCheck():
    +            finalModel = stkm.latestModel()
    +            return all(finalModel.centers == array(initCenters)) and \
    +                finalModel.clusterWeights == [5.0, 5.0, 5.0, 5.0]
    +        self._ssc_wait_checked(t, 20.0, termCheck)
    +        self.assertTrue(termCheck())
    --- End diff --
    
    There is still a slight chance that another batch gets processed between
    the last call to termCheck() inside _ssc_wait_checked and the next call
    to it in this method. That would fail the test unnecessarily. A better
    approach is for _ssc_wait_checked to return True if termCheck() succeeded
    within the timeout, and False otherwise. Then there is no need to call
    termCheck() again.
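A minimal sketch of the suggested contract, written as a standalone function instead of the test-class method. The name `ssc_wait_checked`, the `poll_interval` parameter, and its 0.01-second default (borrowed from the old `_ssc_wait(t, 6.0, 0.01)` call) are assumptions for illustration, not the PR's actual helper. It also assumes `start_time` comes from `time.time()`:

```python
import time


def ssc_wait_checked(start_time, timeout, term_check, poll_interval=0.01):
    """Hypothetical helper: poll term_check() until it succeeds or time runs out.

    Returns True the moment term_check() returns a truthy value, and False
    if `timeout` seconds (measured from `start_time`) elapse first. Because
    the success is reported directly, the caller never has to re-run
    term_check(), so a later batch cannot change the outcome.
    """
    while True:
        # Check first, so term_check() runs at least once even if the
        # deadline has already passed.
        if term_check():
            return True
        if time.time() - start_time > timeout:
            return False  # deadline passed without a successful check
        time.sleep(poll_interval)
```

With this contract the test would assert on the helper's return value, e.g. `self.assertTrue(self._ssc_wait_checked(t, 20.0, termCheck))`. This relies on termCheck() returning its boolean result.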