In reply to Olivier's earlier comment: since it's not at all obvious from the plots, I picked a case where lbfgsb-l1 seems very far off from the others and printed its residuals next to those of pg-l1:
In [227]: tall_med[tall_med['solver'] == 'lbfgsb-l1']['residual']
Out[227]:
258    0.9370832
265    0.9405044
272    0.9342741
279    0.9336801
286    0.9299868
293    0.9296223
300    0.9261801
307    0.9273685
314    0.9274671
Name: residual, dtype: object

In [228]: tall_med[tall_med['solver'] == 'pg-l1']['residual']
Out[228]:
255     0.936736
262     0.940665
269    0.9343853
276    0.9337552
283    0.9300757
290    0.9297058
297    0.9262745
304    0.9274619
311    0.9275654
Name: residual, dtype: object

It looks spot on. Note that the tolerance is 1e-3. Any idea how to make the difference visible in the plot when two lines are this close?

On Thu, Nov 7, 2013 at 12:26 PM, Vlad Niculae <zephy...@gmail.com> wrote:
> Also, I found a pretty big difference in timing when computing
> elementwise norms and products.
>
> In [1]: X = np.random.randn(1000, 900)
>
> In [2]: %timeit np.linalg.norm(X, 'fro')
> 100 loops, best of 3: 4.8 ms per loop
>
> In [3]: %timeit np.sqrt(np.sum(X ** 2))
> 100 loops, best of 3: 4.5 ms per loop
>
> In [4]: %timeit np.sqrt(np.dot(X.ravel(), X.ravel()))
> 1000 loops, best of 3: 552 µs per loop
>
> In [5]: print np.linalg.norm(X, 'fro') - np.sqrt(np.dot(X.ravel(), X.ravel()))
> 3.52429196937e-12
>
> And if I'm doing it right, it's also better in terms of memory than
> np.linalg.norm (but not better than the sum-of-squares approach):
>
> Filename: fro.py
>
> Line #    Mem usage    Increment   Line Contents
> ================================================
>      7     42.7 MiB      0.0 MiB   def sumsq(X):
>      8     42.7 MiB      0.0 MiB       return np.sqrt(np.sum(X ** 2))
>
> Filename: fro.py
>
> Line #    Mem usage    Increment   Line Contents
> ================================================
>     10     42.7 MiB      0.0 MiB   def raveled(X):
>     11     44.4 MiB      1.7 MiB       return np.sqrt(np.dot(X.ravel(), X.ravel()))
>
> Filename: fro.py
>
> Line #    Mem usage    Increment   Line Contents
> ================================================
>      4     35.7 MiB      0.0 MiB   def linalg(X):
>      5     42.7 MiB      7.0 MiB       return np.linalg.norm(X, 'fro')
>
> On Thu, Nov 7, 2013 at 11:46 AM, Vlad Niculae <zephy...@gmail.com> wrote:
>> Come to think of it, Olivier, what do you mean when you say L-BFGS-B
>> has higher residuals? I fail to see this trend; what I see is that
>> L1 > L2 > no reg. in terms of residuals, with different methods coming
>> very close to one another for the same regularisation objective.
>> Could you be more specific?
>>
>> On Thu, Nov 7, 2013 at 11:12 AM, Vlad Niculae <zephy...@gmail.com> wrote:
>>> The regularization is the same; I think the higher residuals come from
>>> the fact that the gradient is raveled, so compared to `n_targets`
>>> independent problems it will take different steps.
>>>
>>> I don't think there are any convergence issues, because I made the
>>> solvers print a warning in case they don't converge (I had a bug in
>>> the regularized projected gradient implementation, because an L2
>>> penalty changes the Hessian too).
>>>
>>> Indeed, when I said lasso I meant elastic net. Somebody would need to
>>> code it, though; I could try a looped version similar to the KKT one
>>> for now. WDYT?
>>>
>>> I think I will update the notebook to use real data (Y = 20newsgroups,
>>> X = NMF learned components, so we would effectively be benchmarking
>>> the NMF transform task). Any suggestions for data in a different
>>> regime to add? Faces, maybe?
>>>
>>> After this change I think I can blog it and postpone multitask elastic
>>> net for a later update.
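
[Aside, re the norm timings quoted above: a small helper along these lines would make the dot-product trick easy to reuse. Just a sketch; `fast_frobenius` is a placeholder name, not an existing function.]

import numpy as np

def fast_frobenius(X):
    # Frobenius norm via a flat dot product: matches np.linalg.norm(X, 'fro')
    # up to rounding error, but roughly 8x faster in the timings quoted above.
    x = np.ravel(X)  # a view when X is contiguous, so usually no extra copy
    return np.sqrt(np.dot(x, x))

# Quick check, mirroring the numbers above:
rng = np.random.RandomState(0)
X = rng.randn(1000, 900)
print(fast_frobenius(X) - np.linalg.norm(X, 'fro'))  # ~1e-12
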
>>> As for what happens to NMF, I think the way to go is to refactor the
>>> projected gradient solver, add the kind of regularization that is in
>>> the notebook, and rename some of the variables to make it more
>>> readable, now that I understand it better. Then we can deprecate or
>>> completely remove the `sparseness=..., beta=..., eta=...` parameters
>>> of ProjectedGradientNMF and replace them with `components_l1_reg,
>>> components_l2_reg, repr_l1_reg, repr_l2_reg` (or, alternatively, an
>>> alpha/l1_ratio parametrization).
>>>
>>> This can be done before the release.
>>>
>>> Cheers,
>>> Vlad
>>>
>>> On Thu, Nov 7, 2013 at 9:51 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>>>> 2013/11/7 Vlad Niculae <zephy...@gmail.com>:
>>>>> Hi everybody,
>>>>>
>>>>> I just updated the gist quite a lot, please take a look:
>>>>> http://nbviewer.ipython.org/7224672
>>>>>
>>>>> I'll go to sleep and interpret it with a fresh eye tomorrow, but
>>>>> what's interesting at the moment is:
>>>>>
>>>>> KKT's performance is quite constant.
>>>>> PG with sparsity penalties (the new, simpler ones, not the
>>>>> implementation in current master, also with a fixed stopping
>>>>> condition) is quite fast!
>>>>> The residual calculation is fixed and suggests that the solvers work well.
>>>>
>>>> It seems that the L-BFGS-B residuals can still be significantly higher
>>>> than the others. Is the regularization the same for this optimizer? Or
>>>> is the convergence criterion too lax?
>>>>
>>>> Thanks for this evaluation, Vlad. Indeed, adding multitask lasso and
>>>> maybe elastic net would be great.
>>>>
>>>> --
>>>> Olivier
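
P.S. To make the alpha/l1_ratio option quoted above a bit more concrete, here is roughly how I'd picture it mapping onto the four explicit penalties, following the ElasticNet convention. Only a sketch; neither the function nor the parameter split exists anywhere yet.

def penalties_from_alpha(alpha, l1_ratio):
    # ElasticNet-style split of a single regularization strength into
    # separate L1 and L2 weights (placeholder name, nothing official).
    l1_reg = alpha * l1_ratio
    l2_reg = alpha * (1.0 - l1_ratio)
    return l1_reg, l2_reg

# The same split could then be applied to each factor, e.g.:
# components_l1_reg, components_l2_reg = penalties_from_alpha(alpha, l1_ratio)
# repr_l1_reg, repr_l2_reg = penalties_from_alpha(alpha, l1_ratio)
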