Also, I found a pretty big difference in timing when computing elementwise norms and products.
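For reference, this is roughly what the fro.py script behind the memory numbers further down could look like. It is only a sketch: the @profile decorator comes from the memory_profiler package, and the driver block at the bottom is my assumption (it is not shown in the output).

# fro.py -- sketch of the profiled script; with the explicit import below
# it can be run directly as `python fro.py` (requires memory_profiler).
import numpy as np
from memory_profiler import profile


@profile
def linalg(X):
    return np.linalg.norm(X, 'fro')


@profile
def sumsq(X):
    return np.sqrt(np.sum(X ** 2))


@profile
def raveled(X):
    return np.sqrt(np.dot(X.ravel(), X.ravel()))


if __name__ == '__main__':
    X = np.random.randn(1000, 900)
    linalg(X)
    sumsq(X)
    raveled(X)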
In [1]: X = np.random.randn(1000, 900)

In [2]: %timeit np.linalg.norm(X, 'fro')
100 loops, best of 3: 4.8 ms per loop

In [3]: %timeit np.sqrt(np.sum(X ** 2))
100 loops, best of 3: 4.5 ms per loop

In [4]: %timeit np.sqrt(np.dot(X.ravel(), X.ravel()))
1000 loops, best of 3: 552 µs per loop

In [5]: print np.linalg.norm(X, 'fro') - np.sqrt(np.dot(X.ravel(), X.ravel()))
3.52429196937e-12

And if I'm doing it right, it's also better in terms of memory (but not better than the sum-of-squares approach):

Filename: fro.py

Line #    Mem usage    Increment   Line Contents
================================================
     7     42.7 MiB      0.0 MiB   def sumsq(X):
     8     42.7 MiB      0.0 MiB       return np.sqrt(np.sum(X ** 2))

Filename: fro.py

Line #    Mem usage    Increment   Line Contents
================================================
    10     42.7 MiB      0.0 MiB   def raveled(X):
    11     44.4 MiB      1.7 MiB       return np.sqrt(np.dot(X.ravel(), X.ravel()))

Filename: fro.py

Line #    Mem usage    Increment   Line Contents
================================================
     4     35.7 MiB      0.0 MiB   def linalg(X):
     5     42.7 MiB      7.0 MiB       return np.linalg.norm(X, 'fro')

On Thu, Nov 7, 2013 at 11:46 AM, Vlad Niculae <zephy...@gmail.com> wrote:
> Come to think of it, Olivier, what do you mean when you say L-BFGS-B
> has higher residuals? I fail to see this trend; what I see is that
> L1 >> L2 > no reg. in terms of residuals, with different methods coming
> very close to one another for the same regularisation objective.
> Could you be more specific?
>
> On Thu, Nov 7, 2013 at 11:12 AM, Vlad Niculae <zephy...@gmail.com> wrote:
>> The regularization is the same; I think the higher residuals come from
>> the fact that the gradient is raveled, so compared to `n_targets`
>> independent problems, it will take different steps.
>>
>> I don't think there are any convergence issues, because I made the
>> solvers print a warning in case they don't converge (I had a bug in
>> the projected gradient regularized implementation, because an L2
>> penalty changes the Hessian too).
>>
>> Indeed, when I said lasso I meant elastic net. Somebody would need to
>> code it though, but I could try a looped version similar to the KKT
>> one for now. WDYT?
>> I think I will update the notebook to use real data (Y = 20newsgroups,
>> X = NMF learned components, so we would effectively be benchmarking
>> the NMF transform task).
>> Any suggestion of data in a different regime to add, faces maybe?
>>
>> After this change I think I can blog it and postpone multitask elastic
>> net for a later update.
>>
>> As for what happens to NMF, I think the way to go is to refactor the
>> projected gradient solver, add the kind of regularization that is in
>> the notebook, and rename some of the variables to make it more
>> readable, now that I understand it better. Then we can deprecate or
>> completely remove the `sparseness=..., beta=..., eta=...` parameters
>> of ProjectedGradientNMF and replace them with `components_l1_reg,
>> components_l2_reg, repr_l1_reg, repr_l2_reg` (or, alternatively, an
>> alpha/l1_ratio parametrization).
>>
>> This can be done before the release.
>>
>> Cheers,
>> Vlad
>>
>>
>> On Thu, Nov 7, 2013 at 9:51 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>>> 2013/11/7 Vlad Niculae <zephy...@gmail.com>:
>>>> Hi everybody,
>>>>
>>>> I just updated the gist quite a lot, please take a look:
>>>> http://nbviewer.ipython.org/7224672
>>>>
>>>> I'll go to sleep and interpret it with a fresh eye tomorrow, but
>>>> what's interesting at the moment is:
>>>>
>>>> KKT's performance is quite constant,
>>>> PG with sparsity penalties (the new, simpler ones, not the
>>>> implementation in current master, also with fixed stopping condition)
>>>> is quite fast!
>>>> Residual calculation is fixed and suggests that the solvers work well.
>>>
>>> It seems that the L-BFGS-B residuals can still be significantly higher
>>> than the others. Is the regularization the same for this optimizer? Or
>>> is the convergence criterion too lax?
>>>
>>> Thanks for this evaluation Vlad. Indeed adding multitask lasso and
>>> maybe elastic net would be great.
>>>
>>> --
>>> Olivier
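Regarding the raveled gradient discussed above (why L-BFGS-B can take different steps than `n_targets` independent problems): here is a minimal sketch, not the notebook's actual code, of solving the nonnegative transform problem for all targets jointly with scipy.optimize.fmin_l_bfgs_b on a raveled W. The function name, the zero initialization and the optional l2_reg term are my assumptions.

import warnings

import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def nnls_lbfgsb(X, H, l2_reg=0.0):
    # Minimize 0.5 * ||X - W H||_F^2 + 0.5 * l2_reg * ||W||_F^2 over W >= 0,
    # with W raveled into one flat vector, so all rows of W (the "targets")
    # are optimized jointly rather than as independent problems.
    n_samples, n_components = X.shape[0], H.shape[0]

    def obj_grad(w):
        W = w.reshape(n_samples, n_components)
        R = np.dot(W, H) - X                       # residual W H - X
        obj = 0.5 * np.sum(R ** 2) + 0.5 * l2_reg * np.sum(W ** 2)
        grad = np.dot(R, H.T) + l2_reg * W         # gradient w.r.t. W
        return obj, grad.ravel()                   # L-BFGS-B sees one flat vector

    w0 = np.zeros(n_samples * n_components)
    w, _, info = fmin_l_bfgs_b(obj_grad, w0, bounds=[(0, None)] * w0.size)
    if info['warnflag'] != 0:                      # surface non-convergence
        warnings.warn("L-BFGS-B did not converge")
    return w.reshape(n_samples, n_components)

Looping fmin_l_bfgs_b over the columns instead would reproduce the `n_targets` independent problems, which is presumably where the small residual differences come from.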
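And on the `components_l1_reg, components_l2_reg, repr_l1_reg, repr_l2_reg` vs. alpha/l1_ratio question for ProjectedGradientNMF: just to make the alternative concrete, a hypothetical helper (names are mine, not an existing scikit-learn API) that splits a single (alpha, l1_ratio) pair into the separate L1/L2 weights, elastic-net style.

def l1_l2_from_alpha(alpha, l1_ratio):
    # Hypothetical mapping: one (alpha, l1_ratio) pair -> the separate
    # L1 and L2 penalty weights that explicit *_l1_reg / *_l2_reg
    # parameters would expose directly.
    l1_reg = alpha * l1_ratio
    l2_reg = alpha * (1.0 - l1_ratio)
    return l1_reg, l2_reg

# e.g. applied once per factor:
# components_l1_reg, components_l2_reg = l1_l2_from_alpha(alpha, l1_ratio)
# repr_l1_reg, repr_l2_reg = l1_l2_from_alpha(alpha, l1_ratio)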