[jira] [Commented] (SPARK-17836) Use cross validation to determine the number of clusters for EM or KMeans algorithms
[ https://issues.apache.org/jira/browse/SPARK-17836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669182#comment-15669182 ]

Lei Wang commented on SPARK-17836:

Yes, of course. Do you have the same need?

> Use cross validation to determine the number of clusters for EM or KMeans algorithms
>
> Key: SPARK-17836
> URL: https://issues.apache.org/jira/browse/SPARK-17836
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Lei Wang
> Priority: Minor
>
> Sometimes it's not easy for users to determine the number of clusters.
> It would be very useful if Spark ML could support this.
> There are several methods to do this according to Wikipedia:
> https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
> Weka uses cross validation.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17934) Support percentile scale in ml.feature
Lei Wang created SPARK-17934:

Summary: Support percentile scale in ml.feature
Key: SPARK-17934
URL: https://issues.apache.org/jira/browse/SPARK-17934
Project: Spark
Issue Type: New Feature
Components: ML
Reporter: Lei Wang

Percentile scaling is often used for feature scaling, and I need such a scaler in my project.
Unlike MinMaxScaler, a PercentileScaler would not produce unstable results when the data contain anomalously large values.
For background on percentile scaling, see https://en.wikipedia.org/wiki/Percentile_rank
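The contrast drawn in the issue above can be illustrated with a small, self-contained sketch. It is plain Python, not Spark code: `min_max_scale` and `percentile_rank_scale` are hypothetical helper names, and the percentile rank uses the common definition (count below + half the count equal) / n, rescaled to [0, 1]. It is only meant to show why a single very large value distorts min-max scaling but not rank-based scaling.

```python
from bisect import bisect_left, bisect_right


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; extreme values stretch the range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def percentile_rank_scale(values):
    """Map each value to its percentile rank in [0, 1].

    Rank = (number of values below + 0.5 * number of equal values) / n.
    """
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)
        equal = bisect_right(ordered, v) - below
        ranks.append((below + 0.5 * equal) / n)
    return ranks


# Hypothetical data: four ordinary values and one anomalously large one.
data = [1.0, 2.0, 3.0, 4.0, 1000.0]
print(min_max_scale(data))          # ordinary values are squeezed near 0
print(percentile_rank_scale(data))  # ordinary values stay evenly spread
```

With min-max scaling the four ordinary values all land below 0.01, while their percentile ranks remain evenly spaced regardless of how large the outlier is.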
[jira] [Commented] (SPARK-14272) Evaluate GaussianMixtureModel with LogLikelihood
[ https://issues.apache.org/jira/browse/SPARK-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564986#comment-15564986 ]

Lei Wang commented on SPARK-14272:

Is this still in progress?

> Evaluate GaussianMixtureModel with LogLikelihood
>
> Key: SPARK-14272
> URL: https://issues.apache.org/jira/browse/SPARK-14272
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: zhengruifeng
> Priority: Minor
>
> GMM uses EM to maximize the likelihood of the data, so the likelihood can be a useful
> metric for evaluating a GaussianMixtureModel.
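As a rough illustration of the metric discussed above, the log likelihood of a data set under a Gaussian mixture is the sum over points of the log of the weighted component densities. The sketch below is plain Python for the one-dimensional case; `gmm_log_likelihood` is a hypothetical helper, not a Spark API, and the mixture parameters are hand-picked for illustration rather than fitted by EM.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def gmm_log_likelihood(data, weights, means, variances):
    """Sum over points of log(sum_k w_k * N(x | mean_k, variance_k)).

    Assumes every point has non-zero density under the mixture; a more
    robust version would work in log space to avoid underflow.
    """
    total = 0.0
    for x in data:
        density = sum(
            w * gaussian_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(density)
    return total


# Hypothetical data with two obvious groups.
data = [-2.1, -1.9, 2.0, 2.2]

# Hand-picked parameters (assumptions for illustration, not EM output).
one_component = gmm_log_likelihood(data, [1.0], [0.05], [4.2])
two_components = gmm_log_likelihood(data, [0.5, 0.5], [-2.0, 2.1], [0.01, 0.01])

print(one_component, two_components)
```

The two-component mixture scores higher on this data. Note that the likelihood measured on the training data generally does not decrease as components are added, so it is usually combined with held-out data or a complexity penalty when used to compare models.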
[jira] [Updated] (SPARK-17836) Use cross validation to determine the number of clusters for EM or KMeans algorithms
[ https://issues.apache.org/jira/browse/SPARK-17836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Wang updated SPARK-17836:

Description:
Sometimes it's not easy for users to determine the number of clusters.
It would be very useful if Spark ML could support this.
There are several methods to do this according to Wikipedia:
https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
Weka uses cross validation.

was:
Sometimes it's not easy for users to determine the number of clusters.
It would be very useful if Spark ML could support this.
There are several methods to do this according to Wikipedia:
https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
Weka uses crossing validation.

> Use cross validation to determine the number of clusters for EM or KMeans algorithms
>
> Key: SPARK-17836
> URL: https://issues.apache.org/jira/browse/SPARK-17836
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Lei Wang
> Priority: Minor
>
> Sometimes it's not easy for users to determine the number of clusters.
> It would be very useful if Spark ML could support this.
> There are several methods to do this according to Wikipedia:
> https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
> Weka uses cross validation.
[jira] [Updated] (SPARK-17836) Use cross validation to determine the number of clusters for EM or KMeans algorithms
[ https://issues.apache.org/jira/browse/SPARK-17836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Wang updated SPARK-17836:

Issue Type: New Feature (was: Bug)

> Use cross validation to determine the number of clusters for EM or KMeans algorithms
>
> Key: SPARK-17836
> URL: https://issues.apache.org/jira/browse/SPARK-17836
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Lei Wang
>
> Sometimes it's not easy for users to determine the number of clusters.
> It would be very useful if Spark ML could support this.
> There are several methods to do this according to Wikipedia:
> https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
> Weka uses crossing validation.
[jira] [Created] (SPARK-17836) Use cross validation to determine the number of clusters for EM or KMeans algorithms
Lei Wang created SPARK-17836:

Summary: Use cross validation to determine the number of clusters for EM or KMeans algorithms
Key: SPARK-17836
URL: https://issues.apache.org/jira/browse/SPARK-17836
Project: Spark
Issue Type: Bug
Components: ML
Reporter: Lei Wang

Sometimes it's not easy for users to determine the number of clusters.
It would be very useful if Spark ML could support this.
There are several methods to do this according to Wikipedia:
https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
Weka uses crossing validation.
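One way to read the request above is sketched below: fit a mixture with EM on the training folds for each candidate number of clusters, then score each candidate by its mean log likelihood on the held-out fold. This is a minimal, plain-Python sketch for one-dimensional data under stated assumptions. It is not Spark code and not necessarily the exact procedure Weka follows; `fit_gmm_1d`, `mean_log_likelihood`, and `cross_validated_scores` are hypothetical names, and the data, fold assignment, and initialization are chosen only to keep the example deterministic.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def component_log_terms(x, weights, means, variances):
    """log(w_k * N(x | mean_k, variance_k)) for each component k."""
    return [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
        for w, m, v in zip(weights, means, variances)
    ]


def fit_gmm_1d(data, k, iterations=50, min_variance=1e-3):
    """Fit a 1-D Gaussian mixture with EM, starting from quantile-based means."""
    ordered = sorted(data)
    n = len(ordered)
    mean_all = sum(ordered) / n
    var_all = max(sum((x - mean_all) ** 2 for x in ordered) / n, min_variance)
    weights = [1.0 / k] * k
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [var_all] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            terms = component_log_terms(x, weights, means, variances)
            norm = log_sum_exp(terms)
            resp.append([math.exp(t - norm) for t in terms])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            n_j = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = n_j / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data))
            variances[j] = max(spread / n_j, min_variance)
    return weights, means, variances


def mean_log_likelihood(data, params):
    """Average per-point log likelihood of data under a fitted mixture."""
    weights, means, variances = params
    total = sum(
        log_sum_exp(component_log_terms(x, weights, means, variances)) for x in data
    )
    return total / len(data)


def cross_validated_scores(data, candidate_ks, n_folds=2):
    """Mean held-out log likelihood per candidate k (folds assigned by index)."""
    scores = {}
    for k in candidate_ks:
        fold_scores = []
        for fold in range(n_folds):
            train = [x for i, x in enumerate(data) if i % n_folds != fold]
            held_out = [x for i, x in enumerate(data) if i % n_folds == fold]
            fold_scores.append(mean_log_likelihood(held_out, fit_gmm_1d(train, k)))
        scores[k] = sum(fold_scores) / n_folds
    return scores


# Hypothetical data: two well-separated groups around -5 and +5.
data = [-5.0 + 0.1 * i for i in range(-5, 6)] + [5.0 + 0.1 * i for i in range(-5, 6)]
scores = cross_validated_scores(data, candidate_ks=[1, 2, 3])
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Scoring on held-out data rather than training data is the point of the design: the training likelihood tends to keep improving as clusters are added, whereas the held-out score penalizes components that only fit noise.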
[jira] [Commented] (SPARK-17825) Expose log likelihood of EM algorithm in mllib
[ https://issues.apache.org/jira/browse/SPARK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557181#comment-15557181 ]

Lei Wang commented on SPARK-17825:

That's good. May I take part in this work? By the way, are you planning to replace mllib with ml in the future?

> Expose log likelihood of EM algorithm in mllib
>
> Key: SPARK-17825
> URL: https://issues.apache.org/jira/browse/SPARK-17825
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Lei Wang
>
> Users sometimes need the log likelihood computed by the EM algorithm.
> For example, one might use this value to choose an appropriate number of clusters.
[jira] [Updated] (SPARK-17825) Expose log likelihood of EM algorithm in mllib
[ https://issues.apache.org/jira/browse/SPARK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Wang updated SPARK-17825:

Description:
Users sometimes need the log likelihood computed by the EM algorithm.
For example, one might use this value to choose an appropriate number of clusters.

> Expose log likelihood of EM algorithm in mllib
>
> Key: SPARK-17825
> URL: https://issues.apache.org/jira/browse/SPARK-17825
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Lei Wang
>
> Users sometimes need the log likelihood computed by the EM algorithm.
> For example, one might use this value to choose an appropriate number of clusters.
[jira] [Closed] (SPARK-17826) Expose log likelihood of EM algorithm in mllib
[ https://issues.apache.org/jira/browse/SPARK-17826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Wang closed SPARK-17826.

Resolution: Duplicate

> Expose log likelihood of EM algorithm in mllib
>
> Key: SPARK-17826
> URL: https://issues.apache.org/jira/browse/SPARK-17826
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Lei Wang
[jira] [Created] (SPARK-17826) Expose log likelihood of EM algorithm in mllib
Lei Wang created SPARK-17826:

Summary: Expose log likelihood of EM algorithm in mllib
Key: SPARK-17826
URL: https://issues.apache.org/jira/browse/SPARK-17826
Project: Spark
Issue Type: New Feature
Components: MLlib
Reporter: Lei Wang
[jira] [Created] (SPARK-17825) Expose log likelihood of EM algorithm in mllib
Lei Wang created SPARK-17825:

Summary: Expose log likelihood of EM algorithm in mllib
Key: SPARK-17825
URL: https://issues.apache.org/jira/browse/SPARK-17825
Project: Spark
Issue Type: New Feature
Components: MLlib
Reporter: Lei Wang