Github user VinceShieh commented on the issue:
https://github.com/apache/spark/pull/17894
Sorry for the late update!
We tested this PR against the current implementation on both dense and
sparse (0.95 sparsity) datasets:



The single-machine test was run with 100 samples at each feature-set
scale, and we see a performance gain (less training time) on both the dense
and the sparse dataset. In the distributed case we can also achieve good
performance with some fine tuning (num_cores, data partitions, etc.), but
this change inevitably puts more pressure on memory and will cause GC
problems if a worker node does not have enough memory. For the sparse
dataset on a distributed cluster we are still unable to get a good result,
so maybe we should bypass this change for the sparse case. Before making
such a change, though, I'd like to hear your thoughts on the current test
results; maybe we can make this a better PR with your input :)
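For context, here is a minimal sketch of the kind of timing comparison described above (dense vs. sparse training with a configurable partition count). The file paths, the choice of LogisticRegression, and the numbers are illustrative assumptions, not the actual benchmark setup used for these results:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession

object TrainTimeBenchmark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("train-time-benchmark").getOrCreate()

    // Tune together with executor cores and memory; too few partitions on a
    // memory-hungry change can trigger the GC pressure mentioned above.
    val numPartitions = 200

    // Hypothetical LibSVM inputs; "sparse.libsvm" would hold ~0.95 sparsity features.
    val dense  = spark.read.format("libsvm").load("data/dense.libsvm")
      .repartition(numPartitions).cache()
    val sparse = spark.read.format("libsvm").load("data/sparse.libsvm")
      .repartition(numPartitions).cache()
    dense.count(); sparse.count()  // materialize the caches before timing

    val lr = new LogisticRegression().setMaxIter(20)

    // Simple wall-clock timer around each fit.
    def time[T](label: String)(f: => T): T = {
      val start = System.nanoTime()
      val result = f
      println(s"$label: ${(System.nanoTime() - start) / 1e9} s")
      result
    }

    time("dense training")(lr.fit(dense))
    time("sparse training")(lr.fit(sparse))

    spark.stop()
  }
}
```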
Thanks.