[jira] [Commented] (SPARK-12703) Spark KMeans Documentation Python Api

2016-01-14 Thread Anton (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098247#comment-15098247
 ] 

Anton commented on SPARK-12703:
---

The result is correct now, thanks!

> Spark KMeans Documentation Python Api
> -------------------------------------
>
> Key: SPARK-12703
> URL: https://issues.apache.org/jira/browse/SPARK-12703
> Project: Spark
> Issue Type: Documentation
> Components: MLlib
> Reporter: Anton
> Assignee: Joseph K. Bradley
> Priority: Minor
> Fix For: 2.0.0
>
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> In the Python API documentation for Spark's KMeans:
> http://spark.apache.org/docs/latest/mllib-clustering.html#k-means
> the cost of the final result is calculated using the 'error()' function, which
> returns:
> {quote}
> return sqrt(sum([x**2 for x in (point - center)]))
> {quote}
> As I understand it, using sqrt() here is wrong and it should be omitted:
> {quote} return sum([x**2 for x in (point - center)]){quote}
> Please refer to:
> https://en.wikipedia.org/wiki/K-means_clustering#Description
> where you can see that the objective uses the squared norm, so the square
> cancels the square root.
> What do you think? It's minor, but it cost me a few minutes to understand why
> the result wasn't what I was expecting.
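
To illustrate the difference discussed in this issue, here is a minimal sketch in plain Python (no Spark required). The function names `squared_error`, `euclidean_distance`, and `wssse`, and the sample points and centers, are hypothetical and chosen only for this example; this is not the code from the documentation or from the pull request. It assumes the cost in question is the within-set sum of squared errors, as in the Wikipedia description linked above.

```python
from math import sqrt


def squared_error(point, center):
    # Squared Euclidean distance: no sqrt, matching the k-means objective.
    return sum((p - c) ** 2 for p, c in zip(point, center))


def euclidean_distance(point, center):
    # Plain Euclidean distance: what the sqrt() version computes per point.
    return sqrt(squared_error(point, center))


def wssse(points, centers):
    # Within-set sum of squared errors: each point contributes its
    # squared distance to the nearest center.
    total = 0.0
    for point in points:
        nearest = min(centers, key=lambda c: squared_error(point, c))
        total += squared_error(point, nearest)
    return total


# Hypothetical data: two small clusters and their centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 1.0), (10.0, 2.0)]

cost = wssse(points, centers)
sum_of_distances = sum(
    min(euclidean_distance(p, c) for c in centers) for p in points
)
```

With this data, `cost` is 10.0 while `sum_of_distances` is 6.0, so summing per-point values that include the sqrt() gives a different number from the sum of squared errors.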



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12703) Spark KMeans Documentation Python Api

2016-01-11 Thread Imran Younus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092695#comment-15092695
 ] 

Imran Younus commented on SPARK-12703:
--

PR stands for pull request.



[jira] [Commented] (SPARK-12703) Spark KMeans Documentation Python Api

2016-01-11 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092699#comment-15092699
 ] 

Joseph K. Bradley commented on SPARK-12703:
---

Oh no problem.  I just sent a Pull Request (PR) which you can view here: 
[https://github.com/apache/spark/pull/10707/files]
Could you please check it out and make sure the updated code works for you?  
Thanks!

If you do want to get involved in contributing more, you can find (a lot) more 
info here: 
[https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark]



[jira] [Commented] (SPARK-12703) Spark KMeans Documentation Python Api

2016-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092698#comment-15092698
 ] 

Apache Spark commented on SPARK-12703:
--

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/10707



[jira] [Commented] (SPARK-12703) Spark KMeans Documentation Python Api

2016-01-08 Thread Anton (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089853#comment-15089853
 ] 

Anton commented on SPARK-12703:
---

I'm new to this whole system. What's a PR?



[jira] [Commented] (SPARK-12703) Spark KMeans Documentation Python Api

2016-01-08 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089844#comment-15089844
 ] 

Joseph K. Bradley commented on SPARK-12703:
---

You're right that it shouldn't be computing sqrt.  Would you mind sending a 
little PR to fix it?  Thanks!
