[jira] [Commented] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-10-05 Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193691#comment-16193691 ] Ilya Matiach commented on SPARK-21742: -- [~podongfeng] The test was just validating that the edge

2017-08-22 zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137773#comment-16137773 ] zhengruifeng commented on SPARK-21742: -- [~srowen] Yes, if we cache the input dataset in the test suite,

2017-08-20 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134326#comment-16134326 ] Sean Owen commented on SPARK-21742: --- I haven't noticed test failures in Jenkins. Is it something that

2017-08-17 zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130045#comment-16130045 ] zhengruifeng commented on SPARK-21742: -- [~srowen] When I cache the dataset in that test, it fails.

2017-08-17 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130032#comment-16130032 ] Sean Owen commented on SPARK-21742: --- OK, so there's no difference attributable only to persisting?

2017-08-16 zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128623#comment-16128623 ] zhengruifeng commented on SPARK-21742: -- [~srowen] I created {{random}} and {{rdd}} twice in the REPL with

2017-08-16 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128580#comment-16128580 ] Sean Owen commented on SPARK-21742: --- Fixing the seed still doesn't mean that the two cases get the same
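
The point that a fixed seed alone does not make two runs identical can be illustrated with a toy simulation. This is a minimal sketch, not Spark's BisectingKMeans code: the hypothetical `pick_initial_centers` simply draws k points with a fixed-seed `random.Random`, assuming (for illustration only) that the algorithm's random choices are made over rows in whatever order they arrive. If the rows reach the sampler in a different order, the same seed selects different points.

```python
import random


def pick_initial_centers(points, k, seed):
    """Hypothetical seeded selection (NOT Spark's actual code):
    draw k points with a fixed-seed RNG from `points` in the order given."""
    rng = random.Random(seed)
    return rng.sample(points, k)


points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
# Same rows, different order (standing in for a recomputed or re-partitioned input).
reordered = [points[i] for i in (3, 0, 4, 1, 5, 2)]

a = pick_initial_centers(points, 2, seed=42)
b = pick_initial_centers(reordered, 2, seed=42)
print(a, b)
```

The seed fixes which positions are drawn, not which values sit at those positions, so `a` and `b` differ even though both runs use `seed=42` on the same multiset of points.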

2017-08-16 zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128535#comment-16128535 ] zhengruifeng commented on SPARK-21742: -- [~mlnick] The seed is already fixed. It looks like if we use

2017-08-16 Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128512#comment-16128512 ] Nick Pentreath commented on SPARK-21742: Isn't the solution to set a fixed seed for the randomly

2017-08-16 zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128509#comment-16128509 ] zhengruifeng commented on SPARK-21742: -- [~srowen] you are right. When I create the same dataset in a

2017-08-16 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128455#comment-16128455 ] Sean Owen commented on SPARK-21742: --- Can you demonstrate this with a data set that isn't randomly

2017-08-16 zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128452#comment-16128452 ] zhengruifeng commented on SPARK-21742: -- [~srowen] I retested it in a different spark-shell, and the

2017-08-16 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128433#comment-16128433 ] Sean Owen commented on SPARK-21742: --- You're defining a source DataFrame that's non-deterministic, and
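
The concern about a non-deterministic source DataFrame can be sketched with a toy, lazily evaluated dataset. `LazySource` is hypothetical and is not a Spark API; it only mimics the behaviour under discussion, assuming the source is built from an unseeded random generator: without caching, every action re-runs the generator, so two passes over the same "dataset" see different values. Caching materializes a single draw, and seeding makes recomputation repeatable.

```python
import random


class LazySource:
    """Hypothetical toy model of a lazily evaluated dataset (not a Spark API).

    Each action re-runs the generator unless the result has been cached,
    loosely mimicking recomputation of an uncached lineage that calls rand().
    """

    def __init__(self, n, seed=None):
        self.n = n
        self.seed = seed
        self._cached = None

    def _compute(self):
        # seed=None: random.Random seeds itself from OS entropy on every call.
        rng = random.Random(self.seed)
        return [rng.random() for _ in range(self.n)]

    def cache(self):
        # Materialize one draw and reuse it for all later actions.
        self._cached = self._compute()
        return self

    def collect(self):
        if self._cached is not None:
            return list(self._cached)
        return self._compute()


unseeded = LazySource(5)
first, second = unseeded.collect(), unseeded.collect()

cached = LazySource(5).cache()
seeded = LazySource(5, seed=7)
```

In this toy model, comparing a model trained on `cached` with one trained on an uncached unseeded source compares two different datasets, which is the kind of difference the thread attributes to the test setup rather than to caching itself.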

2017-08-16 zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128427#comment-16128427 ] zhengruifeng commented on SPARK-21742: -- [~srowen] I set the seed for generating the dataset and for training

2017-08-16 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128393#comment-16128393 ] Sean Owen commented on SPARK-21742: --- Is that a bug? Isn't it stochastic and dependent on the data order