GitHub user nicklavers commented on the issue:

    https://github.com/apache/spark/pull/14551
  
    Hey @yanboliang, my change fails `test_gmm` in `python/pyspark/mllib/tests.py`, 
and I'm fairly certain that's because the test relies on the bug I'm fixing.
    
    I saw you worked on `GaussianMixture` in 
`python/pyspark/mllib/clustering.py`, which is the subject of the failing test 
and which I don't understand intimately.
    
    `GaussianMixture` uses `callMLlibFunc` to call 
`mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala`, 
which in turn calls `RDD.takeSample`, which uses `Utils.randomizeInPlace`, 
which I've changed.
    
    `randomizeInPlace` is supposed to shuffle an array, but as currently written, 
no element can ever end up in its original position. This is most evident for 
small arrays: a two-element array, for example, is ALWAYS reversed by 
`randomizeInPlace`, whereas it should be reversed only 50% of the time.
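
    To make the failure mode concrete, here is a small Python model of the two 
shuffles. It is a sketch, not Spark's Scala code. It assumes that the current 
`randomizeInPlace` draws the swap index `j` from `[0, i)` (which is Sattolo's 
algorithm and yields only cyclic permutations) and that the fix draws it from 
`[0, i]` (a standard Fisher-Yates shuffle). The function names are hypothetical.

```python
import random


def shuffle_buggy(arr, rng):
    """Models the current behaviour: j is drawn from [0, i), so the element
    at position i is always swapped away (Sattolo's algorithm)."""
    for i in range(len(arr) - 1, 0, -1):
        j = rng.randrange(i)          # 0 <= j < i, never i itself
        arr[i], arr[j] = arr[j], arr[i]
    return arr


def shuffle_fixed(arr, rng):
    """Fisher-Yates: j is drawn from [0, i], so every permutation is reachable."""
    for i in range(len(arr) - 1, 0, -1):
        j = rng.randrange(i + 1)      # 0 <= j <= i
        arr[i], arr[j] = arr[j], arr[i]
    return arr


def reachable(shuffle, n, trials=2000, seed=0):
    """Collect the distinct permutations of range(n) produced over many trials."""
    rng = random.Random(seed)
    return {tuple(shuffle(list(range(n)), rng)) for _ in range(trials)}


if __name__ == "__main__":
    # A two-element array is always reversed by the buggy version.
    print(sorted(reachable(shuffle_buggy, 2)))   # [(1, 0)]
    print(sorted(reachable(shuffle_fixed, 2)))   # [(0, 1), (1, 0)]
    # With three elements, the buggy version reaches only the 2 cyclic
    # permutations, while the fixed version reaches all 3! = 6.
    print(len(reachable(shuffle_buggy, 3)), len(reachable(shuffle_fixed, 3)))  # 2 6
```

    The only difference between the two functions is the upper bound passed to 
`randrange`. That single off-by-one is what rules out permutations with a fixed 
point, including the identity.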
    
    `test_gmm` uses a very small test array, so my guess is that a permutation 
of that array that was previously impossible now occurs and causes the test to 
fail.
    
    Can you help me figure this out?

