Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-19 Thread Vlad Niculae
I finally found a desk and some focus. I addressed Mathieu's suggestions and added some timings on real data (with a lot of concessions so that it would run reasonably quickly on my machine). Here are the results: http://nbviewer.ipython.org/7224672 It becomes clear that `tol` still means different

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Olivier Grisel
2013/11/7 Mathieu Blondel math...@mblondel.org: On Fri, Nov 8, 2013 at 12:28 AM, Vlad Niculae zephy...@gmail.com wrote: I feel like this would go against explicit is better than implicit, but without it grid search would indeed be awkward. Maybe: if self.alpha_coef == 'same':

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Olivier Grisel
About the LBFGS-B residuals (non-)issue: I was probably confused by the overlapping curves on the plot and misinterpreted the location of the PG-l1 and PG-l2 curves. -- Olivier

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Vlad Niculae
Re: the discussion we had at PyCon.fr, I noticed that the internal elastic net coordinate descent functions are parametrized with `l1_reg` and `l2_reg`, but the exposed classes and functions have `alpha` and `l1_ratio`. Only yesterday there was somebody on IRC who couldn't match Ridge with
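For readers hitting the same confusion, here is a minimal sketch of the mapping I understand the exposed classes to apply before calling the internal coordinate-descent code; the exact n_samples scaling is my reading of coordinate_descent.py and may differ between versions, so treat it as an assumption:

    def enet_internal_params(alpha, l1_ratio, n_samples):
        # Convert the public (alpha, l1_ratio) pair into the internal
        # (l1_reg, l2_reg) pair passed to cd_fast.  The n_samples factor
        # compensates for the 1 / (2 * n_samples) scaling of the squared loss.
        l1_reg = alpha * l1_ratio * n_samples
        l2_reg = alpha * (1.0 - l1_ratio) * n_samples
        return l1_reg, l2_reg

    # l1_ratio=1 recovers the Lasso penalty, l1_ratio=0 a Ridge-like penalty.
    print(enet_internal_params(alpha=0.5, l1_ratio=0.15, n_samples=100))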

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Mathieu Blondel
And lambda is a reserved keyword in Python ;-) On Fri, Nov 8, 2013 at 4:59 PM, Olivier Grisel olivier.gri...@ensta.org wrote: 2013/11/7 Mathieu Blondel math...@mblondel.org: On Fri, Nov 8, 2013 at 12:28 AM, Vlad Niculae zephy...@gmail.com wrote: I feel like this would go against

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Thomas Unterthiner
Just my $0.02 as a user: I was also confused/put off by `alpha` and `l1_ratio` when I first explored SGDClassifier; I found those names to be pretty inconsistent --- plus I tend to call my regularization parameters `lambda` and use `alpha` for learning rates. I'm sure other people associate

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Peter Prettenhofer
SGDClassifier adopted the parameter names of ElasticNet (which has been around in sklearn for longer) for consistency reasons. I agree that we should strive for concise and intuitive parameter names such as ``l1_ratio``. Naming in sklearn is actually quite unfortunate since the popular R package

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Olivier Grisel
We cannot use lambda as a parameter name because it is a reserved keyword of the Python language (for defining anonymous functions). This is why we used alpha instead of lambda for the ElasticNet / Lasso model initially, and then this notation was reused in more recently implemented estimators such as

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Alexandre Gramfort
Just a remark: in LogisticRegression you can use L1 and L2 reg and there is a single param that is alpha. It's not trivial to have consistent naming for the regularization param. In SVC it is C, as it's the common naming... but it corresponds to 1/l2_reg with what you suggest... Alex
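As a rough illustration of that inversion (an assumption about how the grids line up, not an exact equivalence for every loss), a grid over alpha-style regularization strengths can be translated into a grid over C:

    import numpy as np

    n_samples = 500
    alphas = np.logspace(-4, 0, 5)
    # C acts as an inverse regularization strength, so larger alpha -> smaller C.
    # The 1 / (alpha * n_samples) form mirrors the SGDClassifier-vs-SVM comparison
    # in the docs and is only a heuristic here.
    Cs = 1.0 / (alphas * n_samples)
    print(Cs)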

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Gael Varoquaux
On Fri, Nov 08, 2013 at 11:56:24AM +0100, Olivier Grisel wrote: In retrospect I would have preferred it named something explicit like regularization or l2_reg instead of alpha. Agreed. Still, I like the (alpha, l1_ratio) parameterization better than the (l2_reg, l1_reg) parameter set

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Olivier Grisel
A quick remark: Instead of: %pylab inline --no-import-all you can just do: %matplotlib inline -- Olivier

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Olivier Grisel
2013/11/7 Vlad Niculae zephy...@gmail.com: Hi everybody, I just updated the gist quite a lot, please take a look: http://nbviewer.ipython.org/7224672 I'll go to sleep and interpret it with a fresh eye tomorrow, but what's interesting at the moment is: KKT's performance is quite constant,

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
The regularization is the same; I think the higher residuals come from the fact that the gradient is raveled, so compared to `n_targets` independent problems, it will take different steps. I don't think there are any convergence issues, because I made the solvers print a warning in case they don't
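For context, a minimal sketch of the raveled formulation being described, assuming the benchmark minimizes 0.5 * ||X W - Y||_F^2 over W >= 0 with scipy's L-BFGS-B (this mirrors the idea, not the notebook's exact code):

    import numpy as np
    from scipy.optimize import fmin_l_bfgs_b

    def nnls_lbfgsb(X, Y):
        n_features, n_targets = X.shape[1], Y.shape[1]

        def f_and_grad(w):
            W = w.reshape(n_features, n_targets)
            R = np.dot(X, W) - Y              # residual shared across all targets
            grad = np.dot(X.T, R)             # gradient of 0.5 * ||R||_F^2
            return 0.5 * np.sum(R ** 2), grad.ravel()

        w0 = np.zeros(n_features * n_targets)
        bounds = [(0, None)] * w0.size        # non-negativity on every coefficient
        w, _, _ = fmin_l_bfgs_b(f_and_grad, w0, bounds=bounds)
        return w.reshape(n_features, n_targets)

Because the line search sees one big raveled vector, its steps differ from those of n_targets column-by-column solves even though the objective is separable.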

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
Come to think of it, Olivier, what do you mean when you say L-BFGS-B has higher residuals? I fail to see this trend; what I see is that L1 > L2 > no reg. in terms of residuals, with different methods coming very close to one another for the same regularisation objective. Could you be more specific?

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
Also I found this pretty big difference in timing when computing elementwise norms and products. In [1]: X = np.random.randn(1000, 900) In [2]: %timeit np.linalg.norm(X, 'fro') 100 loops, best of 3: 4.8 ms per loop In [3]: %timeit np.sqrt(np.sum(X ** 2)) 100 loops, best of 3: 4.5 ms per loop
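The same comparison can be reproduced outside IPython with a small script; the numbers depend heavily on the BLAS build, so these are not the thread's exact timings:

    import timeit
    import numpy as np

    X = np.random.randn(1000, 900)

    # Two ways of computing the Frobenius norm.
    t_fro = timeit.timeit(lambda: np.linalg.norm(X, 'fro'), number=100)
    t_sqrt = timeit.timeit(lambda: np.sqrt(np.sum(X ** 2)), number=100)
    print(t_fro / 100, t_sqrt / 100)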

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
In reply to Olivier's previous comment, as it's not at all obvious from the plots, I chose a case where lbfgsb-l1 seems very far away and printed the residuals of it and of pg-l1: In [227]: tall_med[tall_med['solver'] == 'lbfgsb-l1']['residual'] Out[227]: 2580.9370832 2650.9405044 272

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Lars Buitinck
2013/11/7, Vlad Niculae zephy...@gmail.com: Also I found this pretty big difference in timing when computing elementwise norms and products. This is a known problem with np.linalg.norm, and so is the memory consumption. You should use sklearn.utils.extmath.norm for the Frobenius norm. Also
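A minimal usage sketch, assuming the sklearn version discussed in this thread (sklearn.utils.extmath.norm was a thin wrapper around BLAS nrm2 and has since been removed):

    import numpy as np
    from sklearn.utils.extmath import norm

    X = np.random.randn(1000, 900)
    print(norm(X))                    # Frobenius norm via BLAS nrm2, no temporaries
    print(np.linalg.norm(X, 'fro'))   # should agree up to floating-point error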

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
This is a known problem with np.linalg.norm, and so is the memory consumption. You should use sklearn.utils.extmath.norm for the Frobenius norm. Hmm. Indeed I missed that, but still, this is a bit odd. sklearn.utils.extmath.norm is slower than raveling on my anaconda with MKL accelerate setup:

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Mathieu Blondel
Thanks for the awesome work Vlad! It's nice to see good progress. On Thu, Nov 7, 2013 at 7:12 PM, Vlad Niculae zephy...@gmail.com wrote: The regularization is the same, I think the higher residuals come from the fact that the gradient is raveled, so compared to `n_targets` independent

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Lars Buitinck
2013/11/7 Vlad Niculae zephy...@gmail.com: This is a known problem with np.linalg.norm, and so is the memory consumption. You should use sklearn.utils.extmath.norm for the Frobenius norm. Hmm. Indeed I missed that, but still, this is a bit odd. sklearn.utils.extmath.norm is slower than

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Lars Buitinck
2013/11/7 Mathieu Blondel math...@mblondel.org: Do we need two different regularization parameters for coefficients and components? MiniBatchDictionaryLearning seems to have only one alpha. For reproducing results from literature this is useful. E.g. Hoyer only regularizes one of the matrices.

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Mathieu Blondel
On Thu, Nov 7, 2013 at 11:57 PM, Lars Buitinck larsm...@gmail.com wrote: For reproducing results from literature this is useful. E.g. Hoyer only regularizes one of the matrices. For efficient grid-search with shared values, we could do this: if self.alpha_comp is None and self.alpha_coef is

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
I feel like this would go against explicit is better than implicit, but without it grid search would indeed be awkward. Maybe: if self.alpha_coef == 'same': alpha_coef = self.alpha_comp ? On Thu, Nov 7, 2013 at 4:19 PM, Mathieu Blondel math...@mblondel.org wrote: On Thu, Nov 7, 2013 at
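A minimal sketch of the proposed convention, with hypothetical parameter names alpha_comp and alpha_coef as used in this thread:

    class NMFLikeEstimator(object):
        # Hypothetical skeleton: alpha_coef='same' means "reuse alpha_comp",
        # which keeps grid search over a single shared value straightforward.
        def __init__(self, alpha_comp=0.0, alpha_coef='same'):
            self.alpha_comp = alpha_comp
            self.alpha_coef = alpha_coef

        def fit(self, X):
            alpha_coef = (self.alpha_comp if self.alpha_coef == 'same'
                          else self.alpha_coef)
            # ... run the solver with alpha_comp for the components and
            # alpha_coef for the coefficients ...
            return self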

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Mathieu Blondel
On Fri, Nov 8, 2013 at 12:28 AM, Vlad Niculae zephy...@gmail.com wrote: I feel like this would go against explicit is better than implicit, but without it grid search would indeed be awkward. Maybe: if self.alpha_coef == 'same': alpha_coef = self.alpha_comp ? Sounds good to me!

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-06 Thread Vlad Niculae
Hi everybody, I just updated the gist quite a lot, please take a look: http://nbviewer.ipython.org/7224672 I'll go to sleep and interpret it with a fresh eye tomorrow, but what's interesting at the moment is: KKT's performance is quite constant, PG with sparsity penalties (the new, simpler

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-06 Thread Alexandre Gramfort
I'd love to add non-negative lasso to this mix. However, I noticed that cd_fast.pyx is missing the positive=True option in multitask lasso (as well as the sparse variant). Is there any other reason for this or just that nobody needed it? indeed nobody needed it :) thanks for looking into

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Alexandre Gramfort
By the way, the MiniBatchDictLearning can be trivially modified to do this: do a non-negative Lasso instead of a Lasso. This is discussed in the original paper. If somebody has some time to add a positive option to LassoLars, like the one available in Lasso, that would be great. It would then be
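For reference, the coordinate-descent Lasso already exposes the option, so a non-negative lasso can be sketched like this (LassoLars did not have positive at the time of this thread):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)
    X = rng.randn(100, 20)
    y = rng.randn(100)

    nn_lasso = Lasso(alpha=0.1, positive=True)   # constrain coefficients to be >= 0
    nn_lasso.fit(X, y)
    print((nn_lasso.coef_ >= 0).all())           # True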

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Olivier Grisel
Interesting. Also note that the current nls_kkt implementation is using a sequential for loop over the columns. This loop could probably be embarrassingly parallelized with very low overhead with threads, as scipy is probably releasing the GIL. This is another potential motivation for me to work on a
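As a sketch of the idea (using scipy.optimize.nnls as a stand-in for the notebook's per-column solver; whether the wrapped Fortran routine actually releases the GIL is an assumption, as Olivier's "probably" suggests):

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor
    from scipy.optimize import nnls

    def nnls_columns_threaded(X, Y, n_threads=4):
        # Solve one independent NNLS problem per column of Y in a thread pool.
        # Threads only help if the underlying solver releases the GIL.
        def solve_col(y):
            w, _ = nnls(X, y)
            return w

        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            cols = list(pool.map(solve_col, Y.T))
        return np.column_stack(cols)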

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Olivier Grisel
Does anyone have an explanation for the discrepancy in the residuals for the lbfgs-b and nnls_kkt? If nnls_kkt can stay so close to zero, unregularized lbfgs-b should be able to reach the same training set MSE, no? -- Olivier

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Vlad Niculae
I guess it's just a bug in how the solvers return residuals; I'll add some unit tests with manually-computed residuals to check. On Wed, Oct 30, 2013 at 9:48 AM, Olivier Grisel olivier.gri...@ensta.org wrote: Does anyone have an explanation for the discrepancy in the residuals for the lbfgs
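Such a test could look roughly like the sketch below; the solver interface and the 0.5 * ||Xw - y||^2 convention are assumptions, to be adapted to whatever the notebook actually returns:

    import numpy as np

    def check_reported_residual(solver, X, y, rtol=1e-6):
        # The residual a solver reports should match the one recomputed
        # from the coefficients it returns.
        w, reported = solver(X, y)                      # assumed (coef, residual) pair
        recomputed = 0.5 * np.sum((np.dot(X, w) - y) ** 2)
        assert np.isclose(reported, recomputed, rtol=rtol)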

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Mathieu Blondel
I think MiniBatchDictLearning supports only dense arrays, though. Mathieu PS: Very nice notebooks, Vlad and Olivier. On Wed, Oct 30, 2013 at 5:44 PM, Olivier Grisel olivier.gri...@ensta.org wrote: Interesting. Also note that the current nls_kkt implementation is using a sequential for loop

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Olivier Grisel
2013/10/30 Mathieu Blondel math...@mblondel.org: I think MiniBatchDictLearning supports only dense arrays, though. Mathieu PS: Very nice notebooks, Vlad and Olivier. This is all Vlad's work here. -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Vlad Niculae
Thanks Mathieu, well part of it comes from your gist (I added an attribution now) ;) Non-negative lasso is really interesting, I forgot about it but I think it would be very interesting to compare qualitatively. Vlad On Wed, Oct 30, 2013 at 10:15 AM, Olivier Grisel olivier.gri...@ensta.org

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-29 Thread Gael Varoquaux
On Wed, Oct 30, 2013 at 12:49:49AM +0100, Vlad Niculae wrote: Adding L1 (elementwise) regularization makes L-BFGS-B converge much quicker. This is cool because for NMF such a penalty has other advantages. By the way, the MiniBatchDictLearning can be trivially modified to do this: do a