zhengruifeng opened a new pull request #28458:
URL: https://github.com/apache/spark/pull/28458


   ### What changes were proposed in this pull request?
   1, reorganize the `fit` method in LR into several blocks (`createModel`, 
`createBounds`, `createOptimizer`, `createInitCoefWithInterceptMatrix`);
   2, add a new param `blockSize`;
   3, if `blockSize == 1`, keep the original behavior (code path `trainOnRows`);
   4, if `blockSize > 1`, standardize the input vectors and stack them into 
blocks (like ALS/MLP) (code path `trainOnBlocks`).
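
A minimal sketch of the "standardize and stack" step from item 4, written in plain Python purely for illustration. It is not the code in this PR: the helper names (`standardize`, `stack_into_blocks`), the per-feature `std` vector, and the choice to scale by `1/std` without centering are all assumptions made for the example.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence

Vector = Sequence[float]
Block = List[List[float]]  # row-major matrix, one inner list per instance


def standardize(row: Vector, std: Vector) -> List[float]:
    """Scale each feature by 1/std (assumed scheme); zero-std features map to 0.0."""
    return [x / s if s != 0.0 else 0.0 for x, s in zip(row, std)]


def stack_into_blocks(
    rows: Iterable[Vector], std: Vector, block_size: int
) -> Iterator[Block]:
    """Group standardized rows into blocks of at most block_size rows."""
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    it = iter(rows)
    while True:
        block = [standardize(r, std) for r in islice(it, block_size)]
        if not block:
            return
        yield block


# Hypothetical data: three instances, two features.
rows = [[2.0, 4.0], [4.0, 8.0], [6.0, 0.0]]
std = [2.0, 4.0]
blocks = list(stack_into_blocks(rows, std, block_size=2))
# The last block is smaller because 3 rows do not divide evenly into blocks of 2.
```

With `block_size=1` every block holds a single row, which corresponds conceptually to the row-wise path; larger values trade more rows per block for fewer, larger objects.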
   
   ### Why are the changes needed?
   On the dense dataset `epsilon_normalized.t`, this change:
   1, reduces the RAM needed to persist the training dataset (saves about 40% RAM);
   2, uses Level-2 BLAS routines (4x ~ 5x faster).
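
To illustrate why blocks help, the sketch below contrasts one dot product per instance (Level-1 style) with a single matrix-vector product over a block stored in one contiguous buffer (Level-2, `gemv`-style). It is plain Python for readability and does not call BLAS, so it shows only the shape of the computation and the memory layout, not the speedup. `DenseBlock`, `margins_on_rows`, and `margins_on_block` are hypothetical names, not Spark APIs.

```python
from array import array
from typing import List, Sequence


def dot(x: Sequence[float], w: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, w))


def margins_on_rows(rows, coef, intercept) -> List[float]:
    """Row-wise path: one dot product per instance."""
    return [dot(r, coef) + intercept for r in rows]


class DenseBlock:
    """Row-major matrix kept in a single contiguous buffer of doubles."""

    def __init__(self, rows: Sequence[Sequence[float]]):
        self.num_rows = len(rows)
        self.num_cols = len(rows[0]) if rows else 0
        self.values = array("d", (x for r in rows for x in r))


def margins_on_block(block: DenseBlock, coef, intercept) -> List[float]:
    """Block path: y = A * coef + intercept, the operation a gemv call performs."""
    n = block.num_cols
    out = []
    for i in range(block.num_rows):
        acc = intercept
        start = i * n
        for j in range(n):
            acc += block.values[start + j] * coef[j]
        out.append(acc)
    return out


# Hypothetical data: two instances, two features.
rows = [[1.0, 2.0], [3.0, 4.0]]
coef = [0.5, -1.0]
intercept = 0.25
by_rows = margins_on_rows(rows, coef, intercept)
by_block = margins_on_block(DenseBlock(rows), coef, intercept)
```

Both paths produce the same margins; the block form stores one buffer per block instead of one vector object per instance, which is the kind of layout that lets a native BLAS routine process many rows per call.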
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, a new param `blockSize` is added.
   
   ### How was this patch tested?
   Existing and newly added test suites.




