Also, I found a pretty big difference in timing when computing elementwise norms and products.
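For reference, this is roughly what the fro.py script behind the memory numbers further down could look like. It is only a sketch: the @profile decorator comes from the memory_profiler package, and the driver block at the bottom is my assumption (it is not shown in the output).

# fro.py -- sketch of the profiled script; with the explicit import below
# it can be run directly as `python fro.py` (requires memory_profiler).
import numpy as np
from memory_profiler import profile


@profile
def linalg(X):
    return np.linalg.norm(X, 'fro')


@profile
def sumsq(X):
    return np.sqrt(np.sum(X ** 2))


@profile
def raveled(X):
    return np.sqrt(np.dot(X.ravel(), X.ravel()))


if __name__ == '__main__':
    X = np.random.randn(1000, 900)
    linalg(X)
    sumsq(X)
    raveled(X)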
In [1]: X = np.random.randn(1000, 900)

In [2]: %timeit np.linalg.norm(X, 'fro')
100 loops, best of 3: 4.8 ms per loop

In [3]: %timeit np.sqrt(np.sum(X ** 2))
100 loops, best of 3: 4.5 ms per loop

In [4]: %timeit np.sqrt(np.dot(X.ravel(), X.ravel()))
1000 loops, best of 3: 552 µs per loop

In [5]: print np.linalg.norm(X, 'fro') - np.sqrt(np.dot(X.ravel(), X.ravel()))
3.52429196937e-12

And if I'm doing it right, it's also better in terms of memory (but not better than the sum-of-squares approach):

Filename: fro.py

Line #    Mem usage    Increment   Line Contents
================================================
     7     42.7 MiB      0.0 MiB   def sumsq(X):
     8     42.7 MiB      0.0 MiB       return np.sqrt(np.sum(X ** 2))

Filename: fro.py

Line #    Mem usage    Increment   Line Contents
================================================
    10     42.7 MiB      0.0 MiB   def raveled(X):
    11     44.4 MiB      1.7 MiB       return np.sqrt(np.dot(X.ravel(), X.ravel()))

Filename: fro.py

Line #    Mem usage    Increment   Line Contents
================================================
     4     35.7 MiB      0.0 MiB   def linalg(X):
     5     42.7 MiB      7.0 MiB       return np.linalg.norm(X, 'fro')

On Thu, Nov 7, 2013 at 11:46 AM, Vlad Niculae <zephy...@gmail.com> wrote:
> Come to think of it, Olivier, what do you mean when you say L-BFGS-B
> has higher residuals? I fail to see this trend; what I see is that
> L1 >> L2 > no reg. in terms of residuals, with different methods coming
> very close to one another for the same regularisation objective.
> Could you be more specific?
>
> On Thu, Nov 7, 2013 at 11:12 AM, Vlad Niculae <zephy...@gmail.com> wrote:
>> The regularization is the same; I think the higher residuals come from
>> the fact that the gradient is raveled, so compared to `n_targets`
>> independent problems, it will take different steps.
>>
>> I don't think there are any convergence issues, because I made the
>> solvers print a warning in case they don't converge (I had a bug in
>> the projected gradient regularized implementation, because an L2
>> penalty changes the Hessian too).
>>
>> Indeed, when I said lasso I meant elastic net. Somebody would need to
>> code it though, but I could try a looped version similar to the KKT
>> one for now. WDYT?
>> I think I will update the notebook to use real data (Y = 20newsgroups,
>> X = NMF learned components, so we would effectively be benchmarking
>> the NMF transform task).
>> Any suggestion of data in a different regime to add, faces maybe?
>>
>> After this change I think I can blog it and postpone multitask elastic
>> net for a later update.
>>
>> As for what happens to NMF, I think the way to go is to refactor the
>> projected gradient solver, add the kind of regularization that is in
>> the notebook, and rename some of the variables to make it more
>> readable, now that I understand it better. Then we can deprecate or
>> completely remove the `sparseness=..., beta=..., eta=...` parameters
>> of ProjectedGradientNMF and replace them with `components_l1_reg,
>> components_l2_reg, repr_l1_reg, repr_l2_reg` (or, alternatively, an
>> alpha/l1_ratio parametrization).
>>
>> This can be done before the release.
>>
>> Cheers,
>> Vlad
>>
>>
>> On Thu, Nov 7, 2013 at 9:51 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>>> 2013/11/7 Vlad Niculae <zephy...@gmail.com>:
>>>> Hi everybody,
>>>>
>>>> I just updated the gist quite a lot, please take a look:
>>>> http://nbviewer.ipython.org/7224672
>>>>
>>>> I'll go to sleep and interpret it with a fresh eye tomorrow, but
>>>> what's interesting at the moment is:
>>>>
>>>> KKT's performance is quite constant,
>>>> PG with sparsity penalties (the new, simpler ones, not the
>>>> implementation in current master, also with fixed stopping condition)
>>>> is quite fast!
>>>> Residual calculation is fixed and suggests that the solvers work well.
>>>
>>> It seems that the L-BFGS-B residuals can still be significantly higher
>>> than the others. Is the regularization the same for this optimizer? Or
>>> is the convergence criterion too lax?
>>>
>>> Thanks for this evaluation Vlad. Indeed adding multitask lasso and
>>> maybe elastic net would be great.
>>>
>>> --
>>> Olivier
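Regarding the raveled gradient discussed above (why L-BFGS-B can take different steps than `n_targets` independent problems): here is a minimal sketch, not the notebook's actual code, of solving the nonnegative transform problem for all targets jointly with scipy.optimize.fmin_l_bfgs_b on a raveled W. The function name, the zero initialization and the optional l2_reg term are my assumptions.

import warnings

import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def nnls_lbfgsb(X, H, l2_reg=0.0):
    # Minimize 0.5 * ||X - W H||_F^2 + 0.5 * l2_reg * ||W||_F^2 over W >= 0,
    # with W raveled into one flat vector, so all rows of W (the "targets")
    # are optimized jointly rather than as independent problems.
    n_samples, n_components = X.shape[0], H.shape[0]

    def obj_grad(w):
        W = w.reshape(n_samples, n_components)
        R = np.dot(W, H) - X                       # residual W H - X
        obj = 0.5 * np.sum(R ** 2) + 0.5 * l2_reg * np.sum(W ** 2)
        grad = np.dot(R, H.T) + l2_reg * W         # gradient w.r.t. W
        return obj, grad.ravel()                   # L-BFGS-B sees one flat vector

    w0 = np.zeros(n_samples * n_components)
    w, _, info = fmin_l_bfgs_b(obj_grad, w0, bounds=[(0, None)] * w0.size)
    if info['warnflag'] != 0:                      # surface non-convergence
        warnings.warn("L-BFGS-B did not converge")
    return w.reshape(n_samples, n_components)

Looping fmin_l_bfgs_b over the columns instead would reproduce the `n_targets` independent problems, which is presumably where the small residual differences come from.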
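And on the `components_l1_reg, components_l2_reg, repr_l1_reg, repr_l2_reg` vs. alpha/l1_ratio question for ProjectedGradientNMF: just to make the alternative concrete, a hypothetical helper (names are mine, not an existing scikit-learn API) that splits a single (alpha, l1_ratio) pair into the separate L1/L2 weights, elastic-net style.

def l1_l2_from_alpha(alpha, l1_ratio):
    # Hypothetical mapping: one (alpha, l1_ratio) pair -> the separate
    # L1 and L2 penalty weights that explicit *_l1_reg / *_l2_reg
    # parameters would expose directly.
    l1_reg = alpha * l1_ratio
    l2_reg = alpha * (1.0 - l1_ratio)
    return l1_reg, l2_reg

# e.g. applied once per factor:
# components_l1_reg, components_l2_reg = l1_l2_from_alpha(alpha, l1_ratio)
# repr_l1_reg, repr_l2_reg = l1_l2_from_alpha(alpha, l1_ratio)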