In reply to Olivier's earlier comment: since it's not at all obvious from the plots, I picked a case where lbfgsb-l1 seems very far off from the others and printed its residuals next to those of pg-l1:
In [227]: tall_med[tall_med['solver'] == 'lbfgsb-l1']['residual']
Out[227]:
258    0.9370832
265    0.9405044
272    0.9342741
279    0.9336801
286    0.9299868
293    0.9296223
300    0.9261801
307    0.9273685
314    0.9274671
Name: residual, dtype: object

In [228]: tall_med[tall_med['solver'] == 'pg-l1']['residual']
Out[228]:
255     0.936736
262     0.940665
269    0.9343853
276    0.9337552
283    0.9300757
290    0.9297058
297    0.9262745
304    0.9274619
311    0.9275654
Name: residual, dtype: object

It looks spot on. Note that the tolerance is 1e-3. Any idea how to make the difference visible in the plot when two lines are this close?

On Thu, Nov 7, 2013 at 12:26 PM, Vlad Niculae <zephy...@gmail.com> wrote:
> Also, I found a pretty big difference in timing when computing
> elementwise norms and products.
>
> In [1]: X = np.random.randn(1000, 900)
>
> In [2]: %timeit np.linalg.norm(X, 'fro')
> 100 loops, best of 3: 4.8 ms per loop
>
> In [3]: %timeit np.sqrt(np.sum(X ** 2))
> 100 loops, best of 3: 4.5 ms per loop
>
> In [4]: %timeit np.sqrt(np.dot(X.ravel(), X.ravel()))
> 1000 loops, best of 3: 552 µs per loop
>
> In [5]: print np.linalg.norm(X, 'fro') - np.sqrt(np.dot(X.ravel(), X.ravel()))
> 3.52429196937e-12
>
> And if I'm doing it right, it's also better in terms of memory than
> np.linalg.norm (but not better than the sum-of-squares approach):
>
> Filename: fro.py
>
> Line #    Mem usage    Increment   Line Contents
> ================================================
>      7     42.7 MiB      0.0 MiB   def sumsq(X):
>      8     42.7 MiB      0.0 MiB       return np.sqrt(np.sum(X ** 2))
>
> Filename: fro.py
>
> Line #    Mem usage    Increment   Line Contents
> ================================================
>     10     42.7 MiB      0.0 MiB   def raveled(X):
>     11     44.4 MiB      1.7 MiB       return np.sqrt(np.dot(X.ravel(), X.ravel()))
>
> Filename: fro.py
>
> Line #    Mem usage    Increment   Line Contents
> ================================================
>      4     35.7 MiB      0.0 MiB   def linalg(X):
>      5     42.7 MiB      7.0 MiB       return np.linalg.norm(X, 'fro')
>
> On Thu, Nov 7, 2013 at 11:46 AM, Vlad Niculae <zephy...@gmail.com> wrote:
>> Come to think of it, Olivier, what do you mean when you say L-BFGS-B
>> has higher residuals? I fail to see this trend; what I see is that
>> L1 > L2 > no reg. in terms of residuals, with different methods coming
>> very close to one another for the same regularisation objective.
>> Could you be more specific?
>>
>> On Thu, Nov 7, 2013 at 11:12 AM, Vlad Niculae <zephy...@gmail.com> wrote:
>>> The regularization is the same; I think the higher residuals come from
>>> the fact that the gradient is raveled, so compared to `n_targets`
>>> independent problems it will take different steps.
>>>
>>> I don't think there are any convergence issues, because I made the
>>> solvers print a warning in case they don't converge (I had a bug in
>>> the regularized projected gradient implementation, because an L2
>>> penalty changes the Hessian too).
>>>
>>> Indeed, when I said lasso I meant elastic net. Somebody would need to
>>> code it, though; I could try a looped version similar to the KKT one
>>> for now. WDYT?
>>>
>>> I think I will update the notebook to use real data (Y = 20newsgroups,
>>> X = NMF learned components, so we would effectively be benchmarking
>>> the NMF transform task). Any suggestions for data in a different
>>> regime to add? Faces, maybe?
>>>
>>> After this change I think I can blog it and postpone multitask elastic
>>> net for a later update.
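
[Aside, re the norm timings quoted above: a small helper along these lines would make the dot-product trick easy to reuse. Just a sketch; `fast_frobenius` is a placeholder name, not an existing function.]

import numpy as np

def fast_frobenius(X):
    # Frobenius norm via a flat dot product: matches np.linalg.norm(X, 'fro')
    # up to rounding error, but roughly 8x faster in the timings quoted above.
    x = np.ravel(X)  # a view when X is contiguous, so usually no extra copy
    return np.sqrt(np.dot(x, x))

# Quick check, mirroring the numbers above:
rng = np.random.RandomState(0)
X = rng.randn(1000, 900)
print(fast_frobenius(X) - np.linalg.norm(X, 'fro'))  # ~1e-12
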
>>> As for what happens to NMF, I think the way to go is to refactor the
>>> projected gradient solver, add the kind of regularization that is in
>>> the notebook, and rename some of the variables to make it more
>>> readable, now that I understand it better. Then we can deprecate or
>>> completely remove the `sparseness=..., beta=..., eta=...` parameters
>>> of ProjectedGradientNMF and replace them with `components_l1_reg,
>>> components_l2_reg, repr_l1_reg, repr_l2_reg` (or, alternatively, an
>>> alpha/l1_ratio parametrization).
>>>
>>> This can be done before the release.
>>>
>>> Cheers,
>>> Vlad
>>>
>>> On Thu, Nov 7, 2013 at 9:51 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>>>> 2013/11/7 Vlad Niculae <zephy...@gmail.com>:
>>>>> Hi everybody,
>>>>>
>>>>> I just updated the gist quite a lot, please take a look:
>>>>> http://nbviewer.ipython.org/7224672
>>>>>
>>>>> I'll go to sleep and interpret it with a fresh eye tomorrow, but
>>>>> what's interesting at the moment is:
>>>>>
>>>>> KKT's performance is quite constant.
>>>>> PG with sparsity penalties (the new, simpler ones, not the
>>>>> implementation in current master, also with a fixed stopping
>>>>> condition) is quite fast!
>>>>> The residual calculation is fixed and suggests that the solvers work well.
>>>>
>>>> It seems that the L-BFGS-B residuals can still be significantly higher
>>>> than the others. Is the regularization the same for this optimizer? Or
>>>> is the convergence criterion too lax?
>>>>
>>>> Thanks for this evaluation, Vlad. Indeed, adding multitask lasso and
>>>> maybe elastic net would be great.
>>>>
>>>> --
>>>> Olivier
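
P.S. To make the alpha/l1_ratio option quoted above a bit more concrete, here is roughly how I'd picture it mapping onto the four explicit penalties, following the ElasticNet convention. Only a sketch; neither the function nor the parameter split exists anywhere yet.

def penalties_from_alpha(alpha, l1_ratio):
    # ElasticNet-style split of a single regularization strength into
    # separate L1 and L2 weights (placeholder name, nothing official).
    l1_reg = alpha * l1_ratio
    l2_reg = alpha * (1.0 - l1_ratio)
    return l1_reg, l2_reg

# The same split could then be applied to each factor, e.g.:
# components_l1_reg, components_l2_reg = penalties_from_alpha(alpha, l1_ratio)
# repr_l1_reg, repr_l2_reg = penalties_from_alpha(alpha, l1_ratio)
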