[jira] [Commented] (SPARK-17836) Use cross validation to determine the number of clusters for EM or KMeans algorithms

2016-11-15 Thread Lei Wang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669182#comment-15669182 ]

Lei Wang commented on SPARK-17836:
--

Yes, of course.
Do you have the same need?

> Use cross validation to determine the number of clusters for EM or KMeans algorithms
> 
>
> Key: SPARK-17836
> URL: https://issues.apache.org/jira/browse/SPARK-17836
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Lei Wang
>Priority: Minor
>
> Sometimes it's not easy for users to determine the number of clusters.
> It would be very useful if Spark ML could support this.
> According to Wikipedia, there are several methods for doing this:
> https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
> Weka uses cross validation.
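To make the proposal concrete, here is a minimal pure-Python sketch of choosing the number of clusters by cross validation. It assumes 1-D data and uses the mean held-out log-likelihood of a Gaussian mixture fitted by EM as the score; this is not Weka's code and not a Spark API, and every function name (`fit_gmm_1d`, `cv_score`, `choose_k`) is hypothetical.

```python
import math
import random


def normal_logpdf(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm_1d(data, k, iters=50, seed=0, floor=1e-6):
    """Fit a 1-D, k-component Gaussian mixture with plain EM (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    var0 = max(sum((x - mean) ** 2 for x in data) / n, floor)
    weights, means, variances = [1.0 / k] * k, rng.sample(data, k), [var0] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = [math.log(w) + normal_logpdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, 1e-12)  # floor keeps log(weight) finite
            if nj > 1e-9:
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
                variances[j] = max(
                    sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                    floor)
    return weights, means, variances


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of data under a fitted mixture."""
    return sum(logsumexp([math.log(w) + normal_logpdf(x, m, v)
                          for w, m, v in zip(weights, means, variances)])
               for x in data)


def cv_score(data, k, folds=5, seed=0):
    """Mean held-out log-likelihood per point, using `folds`-fold cross validation."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    total = 0.0
    for f in range(folds):
        test = shuffled[f::folds]  # every point lands in exactly one test fold
        train = [x for i, x in enumerate(shuffled) if i % folds != f]
        total += log_likelihood(test, *fit_gmm_1d(train, k, seed=seed))
    return total / len(shuffled)


def choose_k(data, candidates, folds=5, seed=0):
    """Return the candidate k with the best cross-validated score, plus all scores."""
    scores = {k: cv_score(data, k, folds, seed) for k in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(-5, 1) for _ in range(100)] + [rng.gauss(5, 1) for _ in range(100)]
    best_k, scores = choose_k(data, [1, 2, 3])
    print(best_k, scores)
```

The held-out likelihood is used as the score because, on the training data alone, a mixture with more components tends to fit at least as well, so it cannot pick k by itself. A KMeans variant would need a different score, since held-out within-cluster cost also tends to keep falling as k grows.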



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17934) Support percentile scale in ml.feature

2016-10-14 Thread Lei Wang (JIRA)
Lei Wang created SPARK-17934:


 Summary: Support percentile scale in ml.feature
 Key: SPARK-17934
 URL: https://issues.apache.org/jira/browse/SPARK-17934
 Project: Spark
  Issue Type: New Feature
  Components: ML
Reporter: Lei Wang


Percentile scaling is a common feature-scaling technique.
I need this scaler in my project.
Unlike MinMaxScaler, a PercentileScaler would not produce unstable results when the
data contain anomalously large values (outliers).

For background on percentile ranks, see https://en.wikipedia.org/wiki/Percentile_rank
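A minimal pure-Python sketch of the idea, assuming one common percentile-rank definition, PR = (CF - 0.5 * F) / N, where CF counts fitted values at or below x and F counts values equal to x, returned as a fraction in [0, 1] rather than a percentage. `PercentileScaler` is only the name proposed in this issue; it is not an existing Spark class, and `min_max_scale` is a hypothetical helper included for comparison.

```python
from bisect import bisect_left, bisect_right


class PercentileScaler:
    """Hypothetical scaler: maps each value to its percentile rank among the fitted values."""

    def fit(self, values):
        self.sorted_ = sorted(values)
        return self

    def transform(self, values):
        n = len(self.sorted_)
        out = []
        for v in values:
            below = bisect_left(self.sorted_, v)               # fitted values strictly below v
            equal = bisect_right(self.sorted_, v) - below      # fitted values equal to v
            out.append((below + 0.5 * equal) / n)              # (CF - 0.5 * F) / N
        return out


def min_max_scale(values):
    """Plain min-max scaling, shown for comparison."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 1000]
    print(min_max_scale(data))                       # the outlier squeezes 1..4 towards 0
    print(PercentileScaler().fit(data).transform(data))
```

With the outlier 1000 present, min-max scaling pushes 1 through 4 into a tiny interval near 0, while the percentile ranks stay evenly spread. On a cluster, sorting every column exactly would be expensive, so approximate quantiles (for example `DataFrame.stat.approxQuantile`) would be the natural substitute.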






[jira] [Commented] (SPARK-14272) Evaluate GaussianMixtureModel with LogLikelihood

2016-10-11 Thread Lei Wang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564986#comment-15564986 ]

Lei Wang commented on SPARK-14272:
--

Is this still in progress? 

> Evaluate GaussianMixtureModel with LogLikelihood
> 
>
> Key: SPARK-14272
> URL: https://issues.apache.org/jira/browse/SPARK-14272
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: zhengruifeng
>Priority: Minor
>
> GMM uses EM to maximize the likelihood of the data, so the likelihood can be a
> useful metric for evaluating a GaussianMixtureModel.
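For reference, the quantity in question is the standard mixture log-likelihood. The notation below is assumed here rather than taken from the issue: n data points x_1, ..., x_n and K components with weights pi_k, means mu_k and covariances Sigma_k.

```latex
\ell(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x_i \mid \mu_k, \Sigma_k\right),
\qquad \theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}, \qquad \sum_{k=1}^{K} \pi_k = 1
```

Each EM iteration never decreases this value, which is why it is a natural convergence and evaluation metric for a fitted GaussianMixtureModel.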






[jira] [Updated] (SPARK-17836) Use cross validation to determine the number of clusters for EM or KMeans algorithms

2016-10-08 Thread Lei Wang (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-17836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Wang updated SPARK-17836:
-
Description: 
Sometimes it's not easy for users to determine the number of clusters.
It would be very useful if Spark ML could support this.
According to Wikipedia, there are several methods for doing this:
https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
Weka uses cross validation.

  was:
Sometimes it's not easy for users to determine the number of clusters.
It would be very useful if Spark ML could support this.
According to Wikipedia, there are several methods for doing this:
https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
Weka uses crossing validation.


> Use cross validation to determine the number of clusters for EM or KMeans algorithms
> 
>
> Key: SPARK-17836
> URL: https://issues.apache.org/jira/browse/SPARK-17836
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Lei Wang
>Priority: Minor
>
> Sometimes it's not easy for users to determine the number of clusters.
> It would be very useful if Spark ML could support this.
> According to Wikipedia, there are several methods for doing this:
> https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
> Weka uses cross validation.






[jira] [Updated] (SPARK-17836) Use cross validation to determine the number of clusters for EM or KMeans algorithms

2016-10-08 Thread Lei Wang (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-17836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Wang updated SPARK-17836:
-
Issue Type: New Feature  (was: Bug)

> Use cross validation to determine the number of clusters for EM or KMeans algorithms
> 
>
> Key: SPARK-17836
> URL: https://issues.apache.org/jira/browse/SPARK-17836
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Lei Wang
>
> Sometimes it's not easy for users to determine the number of clusters.
> It would be very useful if Spark ML could support this.
> According to Wikipedia, there are several methods for doing this:
> https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
> Weka uses crossing validation.






[jira] [Created] (SPARK-17836) Use cross validation to determine the number of clusters for EM or KMeans algorithms

2016-10-08 Thread Lei Wang (JIRA)
Lei Wang created SPARK-17836:


 Summary: Use cross validation to determine the number of clusters for EM or KMeans algorithms
 Key: SPARK-17836
 URL: https://issues.apache.org/jira/browse/SPARK-17836
 Project: Spark
  Issue Type: Bug
  Components: ML
Reporter: Lei Wang


Sometimes it's not easy for users to determine the number of clusters.
It would be very useful if Spark ML could support this.
According to Wikipedia, there are several methods for doing this:
https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
Weka uses crossing validation.






[jira] [Commented] (SPARK-17825) Expose log likelihood of EM algorithm in mllib

2016-10-07 Thread Lei Wang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557181#comment-15557181 ]

Lei Wang commented on SPARK-17825:
--

That's good. May I take part in this work?
By the way, are you planning to replace mllib with ml in the future?



> Expose log likelihood of EM algorithm in mllib
> --
>
> Key: SPARK-17825
> URL: https://issues.apache.org/jira/browse/SPARK-17825
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Lei Wang
>
> Users sometimes need to get the log likelihood of the EM algorithm.
> For example, one might use this value to choose an appropriate number of clusters.
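A minimal sketch of how an exposed log-likelihood could feed into choosing the number of clusters, assuming 1-D data and already-fitted mixture parameters. Because the raw log-likelihood tends to grow with the number of components, the sketch adds a penalty; BIC is a swapped-in criterion chosen for illustration, not something this issue specifies, and both function names are hypothetical.

```python
import math


def gmm_log_likelihood(data, weights, means, variances):
    """Log-likelihood of 1-D data under a Gaussian mixture, using log-sum-exp for stability."""
    total = 0.0
    for x in data:
        # log(pi_k) + log N(x | mu_k, var_k) for every component k
        terms = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def bic(log_lik, k, n):
    """Bayesian information criterion (lower is better) for a 1-D k-component mixture."""
    # Free parameters: (k - 1) mixing weights + k means + k variances.
    p = 3 * k - 1
    return p * math.log(n) - 2 * log_lik
```

In use, one would fit a mixture for each candidate number of clusters, compute its log-likelihood on the data, and keep the candidate with the lowest BIC.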






[jira] [Updated] (SPARK-17825) Expose log likelihood of EM algorithm in mllib

2016-10-07 Thread Lei Wang (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Wang updated SPARK-17825:
-
Description: 
Users sometimes need to get the log likelihood of the EM algorithm.
For example, one might use this value to choose an appropriate number of clusters.

> Expose log likelihood of EM algorithm in mllib
> --
>
> Key: SPARK-17825
> URL: https://issues.apache.org/jira/browse/SPARK-17825
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Lei Wang
>
> Users sometimes need to get the log likelihood of the EM algorithm.
> For example, one might use this value to choose an appropriate number of clusters.






[jira] [Closed] (SPARK-17826) Expose log likelihood of EM algorithm in mllib

2016-10-07 Thread Lei Wang (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-17826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Wang closed SPARK-17826.

Resolution: Duplicate

> Expose log likelihood of EM algorithm in mllib
> --
>
> Key: SPARK-17826
> URL: https://issues.apache.org/jira/browse/SPARK-17826
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Lei Wang
>







[jira] [Created] (SPARK-17826) Expose log likelihood of EM algorithm in mllib

2016-10-07 Thread Lei Wang (JIRA)
Lei Wang created SPARK-17826:


 Summary: Expose log likelihood of EM algorithm in mllib
 Key: SPARK-17826
 URL: https://issues.apache.org/jira/browse/SPARK-17826
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Lei Wang









[jira] [Created] (SPARK-17825) Expose log likelihood of EM algorithm in mllib

2016-10-07 Thread Lei Wang (JIRA)
Lei Wang created SPARK-17825:


 Summary: Expose log likelihood of EM algorithm in mllib
 Key: SPARK-17825
 URL: https://issues.apache.org/jira/browse/SPARK-17825
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Lei Wang





