[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2018-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17894
  
gentle ping @VinceShieh for @WeichenXu123's comment.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17894
  
Build finished. Test PASSed.


---




[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2018-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17894
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/46/
Test PASSed.


---




[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/17894
  
I am also interested in an implementation using level-3 BLAS. Can you post a 
design doc first?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2017-06-01 Thread VinceShieh
Github user VinceShieh commented on the issue:

https://github.com/apache/spark/pull/17894
  
@sethah Yes, for single-node testing we used only 100 samples and trained 
for 3 iterations; `numClasses` is 20 in our test dataset.
I also believe level 3 BLAS would give a better result if it is possible to 
use it; please let me know how I can help with that! Some constraints will 
still apply, though, such as memory shortage causing GC issues.


---



[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2017-06-01 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17894
  
@VinceShieh Thanks for posting your results. You tested these on datasets 
with only 100 samples, correct? That's probably not representative of a 
normal workload... Also, how many classes (i.e. `numClasses`) did you use? 

I've actually been looking at using level 3 BLAS operations in the logistic 
aggregator, and initial results showed close to 10x speedups in some cases. I 
am holding off submitting any code because it would require a fairly 
significant refactoring of the code, which will be made much easier after 
https://github.com/apache/spark/pull/17094 is merged. Using level 2 BLAS is a 
less invasive change, but the test results you show provide rather small 
speedups.

My preference is to wait a bit and submit a change that incorporates level 
3 BLAS in logistic regression. We should get @dbtsai's opinion too.
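
For readers unfamiliar with the BLAS levels being compared here, the following is a minimal sketch in plain Python, not Spark's aggregator code. It computes the per-class margins of a multinomial logistic model three ways: with level 1 style dot products, with a level 2 style matrix-vector product per instance, and with a single level 3 style matrix-matrix product over a block of instances. All function names (`dot`, `gemv`, `gemm`, `margins_level1`, ...) and the toy data are hypothetical; a real implementation would call a BLAS library instead of these loops.

```python
# Illustrative only: plain-Python stand-ins for BLAS level 1/2/3 calls.
# None of these names come from Spark; they are hypothetical.

def dot(x, y):
    """Level 1 (vector-vector): inner product, like BLAS ddot."""
    return sum(a * b for a, b in zip(x, y))

def gemv(A, x):
    """Level 2 (matrix-vector): y = A x, like BLAS dgemv."""
    return [dot(row, x) for row in A]

def gemm(A, B):
    """Level 3 (matrix-matrix): C = A B, like BLAS dgemm."""
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]

def margins_level1(coef, rows):
    # One dot product per (instance, class) pair.
    return [[dot(w, x) for w in coef] for x in rows]

def margins_level2(coef, rows):
    # One matrix-vector product per instance.
    return [gemv(coef, x) for x in rows]

def margins_level3(coef, rows):
    # One matrix-matrix product per block of instances:
    # X (n x d) times coef^T (d x k) gives an n x k margin matrix.
    coef_t = [list(c) for c in zip(*coef)]
    return gemm(rows, coef_t)

coef = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]  # 2 classes, 3 features (toy values)
rows = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]  # 2 instances (toy values)
```

All three functions return the same margins; what changes is how much work is handed to a single library call. Level 3 needs the instances stacked into a block first, which is where the refactoring cost and extra memory come from.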


---



[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2017-06-01 Thread VinceShieh
Github user VinceShieh commented on the issue:

https://github.com/apache/spark/pull/17894
  
Forgot to mention: we observed a nearly 2x performance gain with native 
BLAS (MKL), without any fine-tuning. If we can also make the F2J version 
run faster than the current design on a distributed cluster, it would truly 
be a good PR for the community. :)


![image](https://cloud.githubusercontent.com/assets/2673819/26686368/47cefb12-471f-11e7-815d-afb28c7e983d.png)



---



[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2017-06-01 Thread VinceShieh
Github user VinceShieh commented on the issue:

https://github.com/apache/spark/pull/17894
  
Sorry for the late update!
We tested this PR against the current implementation with both dense and 
sparse (0.95 sparsity) data:

![image](https://cloud.githubusercontent.com/assets/2673819/26685356/75984dc6-471c-11e7-8c75-c5c739f8a323.png)

![image](https://cloud.githubusercontent.com/assets/2673819/26685361/795f3686-471c-11e7-9a2b-a818b8b28244.png)

![image](https://cloud.githubusercontent.com/assets/2673819/26685323/528d6ec4-471c-11e7-8f4e-1f5a91e77a21.png)

The single-machine test was run on 100 samples at each feature-set scale. 
We see a performance gain (less training time) on both dense and sparse 
datasets. In the distributed case, we can also achieve good performance 
with fine-tuning (num_cores, data partitions, etc.), but this change 
inevitably puts more pressure on memory and will cause GC problems if not 
enough memory is available on the worker nodes. For sparse datasets on a 
distributed cluster, we are still unable to get a good result, so maybe we 
should bypass this change for the sparse case. Before making such a change, 
I'd like to hear your thoughts on the current test results; maybe we can 
make it a better PR with your input :)

Thanks.
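
As a rough illustration of the memory pressure mentioned above, here is a back-of-the-envelope sketch. It assumes, and this thread does not confirm it, that rows are stacked into a dense block of doubles before the BLAS call, and that a sparse row costs one 8-byte value plus one 4-byte index per non-zero. The function names and the 1000-feature figure are hypothetical; only the 100 samples and the 0.95 sparsity come from the test description.

```python
DOUBLE_BYTES = 8  # one 64-bit floating-point value
INT_BYTES = 4     # one 32-bit index (assumed sparse layout)

def dense_block_bytes(num_rows, num_features):
    """Bytes needed to hold num_rows instances as one dense block."""
    return num_rows * num_features * DOUBLE_BYTES

def sparse_rows_bytes(num_rows, num_features, sparsity):
    """Bytes for the same rows kept sparse: one (index, value) pair per non-zero."""
    nnz_per_row = round(num_features * (1.0 - sparsity))
    return num_rows * nnz_per_row * (DOUBLE_BYTES + INT_BYTES)

# 100 samples at 0.95 sparsity (from the test above), 1000 features (hypothetical).
dense = dense_block_bytes(100, 1000)
sparse = sparse_rows_bytes(100, 1000, 0.95)
```

Under these assumptions the dense block is more than ten times larger than the sparse representation of the same rows, which is one possible reason a densifying approach would put extra load on memory and the GC for sparse inputs.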



---



[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...

2017-05-16 Thread VinceShieh
Github user VinceShieh commented on the issue:

https://github.com/apache/spark/pull/17894
  
@sethah Sorry for the late response. Setting as WIP. We have performance 
data for dense features; data for sparse features will be ready soon. Thanks.


---