Hey Everyone,
So I have a request that may have been fielded already but I felt compelled to 
inquire...

Logistic regression is a popular tool for classification.  However, in modern 
problems where we are still left with 10^6-10^7 features after initial feature 
selection, plain (unpenalized) LR is inappropriate, since the solution most 
likely lies in a sparse linear subspace.

L1 logistic regression adds an L1-norm penalty on the regression coefficients. 
When the penalty is large enough, many coefficients are driven exactly to zero, 
resulting in a simpler classifier that tends to generalize to out-of-sample 
test data better than a fully fit model.
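
To make the "thresholded to zero" behaviour concrete, here is a minimal sketch 
in plain Python. Note that it uses proximal gradient descent with soft 
thresholding (ISTA), which is a much simpler technique than the interior point 
method linked below, and it works on small dense lists only. The names 
l1_logreg_ista, lam, step and iters are my own for illustration and are not 
part of any library.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within [-threshold, threshold] becomes 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def l1_logreg_ista(rows, labels, lam, step=0.1, iters=500):
    """Hypothetical helper: rows are dense feature lists, labels are 0/1,
    lam is the L1 penalty weight. No intercept term is fitted."""
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the average logistic loss.
        grad = [0.0] * d
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
            for j in range(d):
                grad[j] += err * x[j] / n
        # Gradient step followed by the L1 proximal step (soft thresholding).
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w
```

The soft-thresholding step is what produces exact zeros: a coefficient whose 
gradient never exceeds lam in magnitude never leaves zero.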

http://www.stanford.edu/~boyd/l1_logreg/

The interior-point method of Boyd et al., which approximates the Newton step 
using preconditioned conjugate gradients, is what I would call the "state of 
the art" for L1 LR.

It also accepts data matrices in the sparse Matrix Market format (a common 
exchange format for sparse matrices), whereas the Mahout LR implementation 
works on a dense matrix.
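
For anyone who has not seen the format, here is a rough sketch of a reader for 
the sparse "coordinate real general" flavour of Matrix Market, in plain Python. 
The function name parse_matrix_market and the sample data are made up for 
illustration; this is not the parser used by l1_logreg or by Mahout, and it 
ignores the other variants (pattern, symmetric, dense array).

```python
def parse_matrix_market(text):
    """Parse a 'coordinate real general' Matrix Market string.

    Returns (n_rows, n_cols, entries) where entries maps 0-based
    (row, col) pairs to float values. Illustrative sketch only.
    """
    lines = text.strip().splitlines()
    header = lines[0].lower().split()
    if header[:3] != ["%%matrixmarket", "matrix", "coordinate"]:
        raise ValueError("only the sparse coordinate layout is handled in this sketch")
    # Comment lines start with '%'; the first remaining line holds the dimensions.
    body = [ln for ln in lines[1:] if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, n_nonzero = (int(tok) for tok in body[0].split())
    entries = {}
    for ln in body[1:]:
        i, j, value = ln.split()
        entries[(int(i) - 1, int(j) - 1)] = float(value)  # file indices are 1-based
    if len(entries) != n_nonzero:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries

# Made-up sample: 3 examples, 5 features, 4 stored nonzeros.
sample = """%%MatrixMarket matrix coordinate real general
% rows are examples, columns are features
3 5 4
1 1 1.0
1 4 2.5
2 2 -1.0
3 5 0.5
"""
n_rows, n_cols, entries = parse_matrix_market(sample)
```

Only the nonzero entries are stored, so memory scales with the number of 
nonzeros rather than with rows times columns.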

Is there any push to implement an L1 logistic regression solver in the Mahout 
libraries?  LR training is iterative and so largely serial in nature, but 
certain operations within the Newton step approximation, for instance, can be 
parallelized.
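
As one example of what I mean, the gradient of the logistic loss is a sum over 
training examples, so it can be computed per shard and then added up, 
map-reduce style. Below is a minimal sketch in plain Python, assuming sparse 
rows stored as {feature_index: value} dicts. The names partial_gradient, merge 
and parallel_gradient are hypothetical, and the thread pool only illustrates 
the structure (CPython threads will not give a real speedup here); it is not 
how Mahout or Hadoop would actually schedule the work.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def partial_gradient(w, shard):
    """Map step: unnormalized logistic-loss gradient over one shard.

    w is a sparse weight dict; shard is a list of (sparse_row, label) pairs.
    """
    grad = {}
    for row, y in shard:
        err = sigmoid(sum(w.get(j, 0.0) * v for j, v in row.items())) - y
        for j, v in row.items():
            grad[j] = grad.get(j, 0.0) + err * v
    return grad

def merge(total, part):
    """Reduce step: add one partial gradient into the running total."""
    for j, g in part.items():
        total[j] = total.get(j, 0.0) + g
    return total

def parallel_gradient(w, shards, workers=4):
    """Compute per-shard gradients concurrently, then sum them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda shard: partial_gradient(w, shard), shards))
    total = {}
    for part in parts:
        merge(total, part)
    return total
```

Hessian-vector products have the same sum-over-examples structure, so the same 
split-and-sum pattern would apply to them.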

Any thoughts or visions on moving in this direction are welcome.

Very Best,
Patrick Harrington




Patrick Harrington, Ph.D. | Sr. Data Scientist | OneRiot
1050 Walnut Street, Suite 202 | Boulder, CO 80302
303.938.3071 Direct | 517.881.0628 Cell  | 303.938.3060 Fax
[email protected]



