Hey everyone, I have a request that may have been fielded already, but I felt compelled to ask.
Logistic regression (LR) is obviously a popular tool for classification. However, in modern problems where we still have 10^6-10^7 features after initial feature selection, plain LR is inappropriate because the solution most likely lies in a sparse linear subspace. L1 logistic regression adds an L1-norm penalty on the regression coefficients so that, when the penalty is large "enough", many coefficients are "thresholded to zero". The result is a simpler classifier that tends to generalize to out-of-sample test data better than a fully fit model.

http://www.stanford.edu/~boyd/l1_logreg/

The interior-point method of Boyd et al., which approximates the Newton step with preconditioned conjugate gradients, is what I would call the "state of the art" for L1 LR. It also accepts data matrices in the sparse MatrixMarket format (a widely used sparse matrix format), whereas the Mahout LR implementation works on a dense matrix.

Are there any pushes to implement an L1 logistic regression solver in the Mahout libraries? Obviously any form of LR is serial in nature, but certain operations, for instance those within the Newton step approximation, can be parallelized.

Any thoughts or visions on moving in this direction are welcome.

Very Best,
Patrick Harrington

Patrick Harrington, Ph.D. | Sr. Data Scientist | OneRiot
1050 Walnut Street, Suite 202 | Boulder, CO 80302
303.938.3071 Direct | 517.881.0628 Cell | 303.938.3060 Fax
[email protected]
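
P.S. For anyone who wants to see the "thresholded to zero" behaviour concretely, here is a minimal Python sketch. It is not the interior-point method from the link above: it swaps in plain proximal gradient descent (ISTA) because that fits in a few lines. The objective is the average logistic loss plus lam times the L1 norm of the weights. The function names, the dict-per-row sparse layout, and the toy data generator are all assumptions made for illustration, not anything taken from l1_logreg or Mahout.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logreg(rows, labels, n_features, lam, step=0.5, iters=300):
    """Minimize (1/m) * sum(log(1 + exp(-y * (w.x + b)))) + lam * ||w||_1.

    rows   : list of {column_index: value} dicts (hypothetical sparse layout)
    labels : list of +1 / -1 class labels
    """
    m = len(rows)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(iters):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            score = b + sum(v * w[j] for j, v in x.items())
            g = -y * sigmoid(-y * score) / m  # d(loss_i)/d(score), averaged
            grad_b += g
            for j, v in x.items():  # only touch the non-zero entries
                grad_w[j] += g * v
        # Gradient step on the smooth loss, then soft-threshold each weight.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
        b -= step * grad_b  # the intercept is not penalized
    return w, b


def make_toy_data(m=200, n_features=50, seed=0):
    """Toy problem: column 0 equals the label, other columns are sparse noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(m):
        y = 1 if i % 2 == 0 else -1
        x = {0: float(y)}
        for j in rng.sample(range(1, n_features), 3):
            x[j] = rng.uniform(-1.0, 1.0)
        rows.append(x)
        labels.append(y)
    return rows, labels
```

Calling fit_l1_logreg(rows, labels, 50, lam=0.05) on the output of make_toy_data() and counting the non-zero entries of the returned weights shows the sparsity effect; raising lam far enough shrinks every weight to exactly zero. The per-row gradient accumulation is also the kind of operation that could be split across workers.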
