[jira] [Commented] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-06-17 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335956#comment-15335956
 ] 

Nick Pentreath commented on SPARK-15447:


I have finalized the results in the linked Google sheet and posted the raw 
results in two linked Google docs.

[~mengxr] I didn't manage to run 1 billion ratings, but I did run 250 million 
(30 million users, 10 million items). I saw no performance regressions from 
the checkpointing changes (comparing the RDD-based APIs in 2.0.0 and 1.6.1) or 
from the DF changes (comparing the DF-based APIs in 2.0.0 and 1.6.1). I'm 
resolving this ticket, but let me know if you have any questions or concerns.
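
To put the size of that largest run in perspective, the following is a minimal 
sketch of the arithmetic, using only the three counts stated above. The 
variable names are illustrative and are not taken from the test harness.

```python
# Counts for the largest run reported in this comment.
num_users = 30_000_000
num_items = 10_000_000
num_ratings = 250_000_000

# Fraction of the user x item matrix that actually holds a rating.
density = num_ratings / (num_users * num_items)

# Average number of ratings per user and per item.
avg_ratings_per_user = num_ratings / num_users
avg_ratings_per_item = num_ratings / num_items

# How far this run got toward the 1 billion ratings target.
fraction_of_target = num_ratings / 1_000_000_000

print(f"density: {density:.3e}")
print(f"ratings per user: {avg_ratings_per_user:.2f}")
print(f"ratings per item: {avg_ratings_per_item:.2f}")
print(f"fraction of 1b target: {fraction_of_target:.2f}")
```

These are averages only; they say nothing about how the ratings were actually 
distributed across users and items in the test data.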

> Performance test for ALS in Spark 2.0
> -------------------------------------
>
>              Key: SPARK-15447
>              URL: https://issues.apache.org/jira/browse/SPARK-15447
>          Project: Spark
>       Issue Type: Task
>       Components: ML
> Affects Versions: 2.0.0
>         Reporter: Xiangrui Meng
>         Assignee: Nick Pentreath
>         Priority: Critical
>           Labels: QA
>
> We made several changes to ALS in 2.0. It is necessary to run some tests to 
> avoid performance regression. We should test (synthetic) datasets from 1 
> million ratings to 1 billion ratings.
> cc [~mlnick] [~holdenk] Do you have time to run some large-scale performance 
> tests?
> Links:
> [Results spreadsheet|https://docs.google.com/spreadsheets/d/1iX5LisfXcZSTCHp8VPoo5z-eCO85A5VsZDtZ5e475ks/edit?usp=sharing]
> [Raw results for SPARK-14891|https://docs.google.com/document/d/1tlWFCv8zWJuxv_gfAhd-57TKURVyrYkF9v4FLl4Lpn0/edit?usp=sharing]
> [Raw results for SPARK-6717|https://docs.google.com/document/d/12qLLX84Dg-XJAgoSQzmb0-bSncjTHhg7A_JJcQneDiE/edit?usp=sharing]
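
The description asks for synthetic datasets from 1 million to 1 billion 
ratings. As a rough illustration of what that means, here is a minimal sketch 
of a uniform random ratings generator and a ladder of dataset sizes. It is not 
the {{spark-perf}} data generator; the function name, its parameters, the 
1.0 to 5.0 rating range, and the powers-of-ten ladder are all assumptions made 
for this example.

```python
import random


def synthetic_ratings(num_users, num_items, num_ratings, seed=42):
    """Yield (user, item, rating) triples drawn uniformly at random.

    Hypothetical helper: duplicate (user, item) pairs are possible and
    are not filtered out.
    """
    rng = random.Random(seed)
    for _ in range(num_ratings):
        user = rng.randrange(num_users)
        item = rng.randrange(num_items)
        rating = rng.uniform(1.0, 5.0)
        yield (user, item, rating)


# Assumed size ladder: powers of ten from 1 million to 1 billion ratings.
rating_counts = [10 ** k for k in range(6, 10)]

# A tiny sample, far smaller than any size on the ladder, to show the shape.
sample = list(synthetic_ratings(num_users=1000, num_items=500, num_ratings=10_000))
```

A fixed seed keeps runs comparable across Spark versions, since both versions 
would then train on the same ratings.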



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-06-16 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1530#comment-1530
 ] 

Nick Pentreath commented on SPARK-15447:


Almost there. I should be able to close this off by Friday.










[jira] [Commented] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-06-15 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332578#comment-15332578
 ] 

Reynold Xin commented on SPARK-15447:

We can close this one now, can't we?








[jira] [Commented] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-06-03 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314441#comment-15314441
 ] 

Nick Pentreath commented on SPARK-15447:


I added a second tab to the sheet comparing the DF-based API in 2.0.0-SNAPSHOT 
against 1.6.1 for SPARK-14891. Again, 2.0 is faster, with no performance 
regression.







[jira] [Commented] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-05-31 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308797#comment-15308797
 ] 

Nick Pentreath commented on SPARK-15447:


Created a Google sheet with initial results: 
https://docs.google.com/spreadsheets/d/1iX5LisfXcZSTCHp8VPoo5z-eCO85A5VsZDtZ5e475ks/edit?usp=sharing

So far for SPARK-6717 I've just used {{spark-perf}} to compare the RDD-based 
APIs (the checkpointing change only affects the RDD-based {{train}} method). 
These results show no red flags, and 2.0 is generally faster than 1.6. 
Checkpointing does add a minor overhead, but that overhead is consistent 
across the versions, and 2.0 again comes out ahead.
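
The comparison described above boils down to two ratios per dataset size: the 
runtime change between versions, and the overhead that checkpointing adds 
within a version. A minimal sketch of that bookkeeping follows. The helper 
names and the 5% tolerance are assumptions for this example, and the timings 
are placeholders, not values from the spreadsheet.

```python
def relative_change(baseline_s, candidate_s):
    """Fractional runtime change from baseline to candidate (negative means faster)."""
    if baseline_s <= 0:
        raise ValueError("baseline runtime must be positive")
    return (candidate_s - baseline_s) / baseline_s


def is_regression(baseline_s, candidate_s, tolerance=0.05):
    """Flag a regression when the candidate is slower than baseline by more than tolerance."""
    return relative_change(baseline_s, candidate_s) > tolerance


# PLACEHOLDER timings in seconds, invented for illustration only.
timings = {
    "1.6.1": {"no_checkpoint": 100.0, "checkpoint": 110.0},
    "2.0.0": {"no_checkpoint": 80.0, "checkpoint": 86.0},
}

# Checkpointing overhead within each version.
overheads = {
    version: relative_change(t["no_checkpoint"], t["checkpoint"])
    for version, t in timings.items()
}

# Version-to-version change for each configuration, with 1.6.1 as the baseline.
version_changes = {
    config: relative_change(timings["1.6.1"][config], timings["2.0.0"][config])
    for config in ("no_checkpoint", "checkpoint")
}

regressions = [
    config
    for config in ("no_checkpoint", "checkpoint")
    if is_regression(timings["1.6.1"][config], timings["2.0.0"][config])
]
```

Using a relative tolerance rather than an absolute one keeps the same check 
usable across dataset sizes whose runtimes differ by orders of magnitude.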

The 1.6 results for the 10m ratings case look a little odd, and I'm not sure 
what's going on there; I've rerun it a few times with the same result.

Also, I haven't managed to get to 1b ratings yet due to cluster size; I will 
keep working on it.







[jira] [Commented] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-05-20 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294116#comment-15294116
 ] 

Nick Pentreath commented on SPARK-15447:


[~mengxr] Yes, I will aim to run some tests early next week.



