Thanks for the awesome work Vlad! It's nice to see good progress.

On Thu, Nov 7, 2013 at 7:12 PM, Vlad Niculae <zephy...@gmail.com> wrote:

> The regularization is the same, I think the higher residuals come from
> the fact that the gradient is raveled, so compared to `n_targets`
> independent problems, it will take different steps.
>

Intuitively, the step size should be smaller if you update the entire
matrix (if you update block by block, you can choose a step size which is
tailored to each block). But projected gradient chooses a single step size
for the entire matrix too, so L-BFGS-B is not different in this regard.
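
To make the ravelling point concrete, here is a rough sketch (toy data,
placeholder shapes) contrasting one L-BFGS-B run on the raveled W with one
run per target column. For the unregularized problem both should reach the
same minimizer, but the intermediate steps differ:

    import numpy as np
    from scipy.optimize import fmin_l_bfgs_b

    rng = np.random.RandomState(0)
    X, Y = rng.rand(50, 10), rng.rand(50, 5)  # toy data, shapes are placeholders
    n_features, n_targets = X.shape[1], Y.shape[1]

    def loss_grad(w, X, Y):
        # squared loss and gradient for a raveled coefficient matrix
        W = w.reshape(n_features, -1)
        R = np.dot(X, W) - Y
        return 0.5 * (R ** 2).sum(), np.dot(X.T, R).ravel()

    # joint problem: one raveled variable, a single line search shared by all targets
    w_joint = fmin_l_bfgs_b(loss_grad, np.zeros(n_features * n_targets),
                            args=(X, Y),
                            bounds=[(0, None)] * (n_features * n_targets))[0]

    # independent problems: each column gets its own step sizes / line searches
    W_cols = np.column_stack([
        fmin_l_bfgs_b(loss_grad, np.zeros(n_features),
                      args=(X, Y[:, [j]]),
                      bounds=[(0, None)] * n_features)[0]
        for j in range(n_targets)])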

The tolerance constant in the stopping criterion will also affect the
results. What I would do to check for the correctness of each solver is to
choose a solver which we know for sure is correct, run it for many
iterations and regard its solution as the optimal W*. Then I would plot the
difference over time between each solver's current solution and W*. If the
solvers are correctly implemented, the difference should go to zero.
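
As a sketch of what I mean (the `solvers` dict, `reference_solver` and the
per-iteration hook below are all hypothetical, whatever the notebook exposes
would do):

    import numpy as np
    import matplotlib.pyplot as plt

    # run a solver we trust for a very long time and keep its solution as W*
    W_star = reference_solver(X, Y, max_iter=100000, tol=1e-12)  # hypothetical signature

    for name, solver in solvers.items():  # `solvers`: {name: solver}, hypothetical
        errors = [np.linalg.norm(W_t - W_star)
                  for W_t in solver.iterates(X, Y)]  # intermediate solutions, hypothetical API
        plt.semilogy(errors, label=name)

    plt.xlabel("iteration")
    plt.ylabel("||W_t - W*||_F")
    plt.legend()
    plt.show()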


>
> I don't think there are any convergence issues because I made the
> solvers print a warning in case they don't converge (I had a bug in
> the projected gradient regularized implementation, because an L2
> penalty changes the hessian too).
>
> Indeed when I say lasso I meant elastic net. Somebody would need to
> code it though, but I could try a looped version similar to the kkt
> one for now. WDYT?
>

Do you think that the KKT solver in SciPy (I think a better name would be
"active-set solver") is realistic? Even with parallel computations, I feel
that it will be very slow to iterate over each target (in the topic
modeling case). The main advantage of solving the whole problem at once is
that you can take advantage of the sparsity when computing the gradient.
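
Concretely, the per-target route is essentially the loop below
(scipy.optimize.nnls is the Lawson-Hanson active-set solver; the shapes are
placeholders and the joblib parallelization is just a sketch). In the topic
modeling case it has to run once per document, which is what worries me:

    import numpy as np
    from scipy.optimize import nnls
    from joblib import Parallel, delayed

    rng = np.random.RandomState(0)
    X = rng.rand(100, 20)     # e.g. fixed components, placeholder shape
    Y = rng.rand(100, 1000)   # e.g. one column per document, placeholder shape

    def nnls_single(X, y):
        # one active-set solve per target column
        w, _ = nnls(X, y)
        return w

    W = np.column_stack(
        Parallel(n_jobs=-1)(
            delayed(nnls_single)(X, Y[:, j]) for j in range(Y.shape[1])))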


> I think I will update the notebook to use real data (Y = 20newsgroups,
> X = NMF learned components, so we would effectively be benchmarking
> the NMF transform task).
> Any suggestion of data in a different regime to add, faces maybe?
>
> After this change I think I can blog it and postpone multitask elastic
> net for a later update.
>
> As for what happens to NMF I think the way to go is to refactor the
> projected gradient solver, add the kind of regularization that is in
> the notebook and rename some of the variables to make it more
> readable, now that I understand it better.  Then we can deprecate or
> remove completely the `sparseness=..., beta=..., eta=...` parameters
>

+1


> of ProjectedGradientNMF and replace them with `components_l1_reg,
> components_l2_reg, repr_l1_reg, repr_l2_reg` (or alternatively, an
> alpha/l1_ratio parametrization).
>

Do we need two different regularization parameters for coefficients and
components? MiniBatchDictionaryLearning seems to have only one "alpha".
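
As for the alpha/l1_ratio option, that is just a reparametrization of each
(l1_reg, l2_reg) pair, along the lines of what ElasticNet does (the exact
scaling factors below are my assumption, to be checked against the actual
objective):

    def split_penalty(alpha, l1_ratio):
        # ElasticNet-style split (scaling assumed, check against the real objective):
        #   alpha * l1_ratio * ||w||_1  +  0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
        l1_reg = alpha * l1_ratio
        l2_reg = 0.5 * alpha * (1.0 - l1_ratio)
        return l1_reg, l2_reg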

I still think that a linear_model.nnls module would be a good idea
(with both functions and classes). The NMF code would only import the
functions.
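
Something like the layout below is what I have in mind; all names and
signatures are placeholders, not a proposal for the final API:

    # sklearn/linear_model/nnls.py -- hypothetical layout, names are placeholders
    from sklearn.base import BaseEstimator, RegressorMixin

    def nnls_projected_gradient(X, y, l1_reg=0.0, l2_reg=0.0, tol=1e-4, max_iter=200):
        """Solve argmin_{w >= 0} 0.5 * ||Xw - y||^2 + l1_reg * ||w||_1 + l2_reg * ||w||^2."""
        ...  # solver body

    def nnls_lbfgs(X, y, l1_reg=0.0, l2_reg=0.0, tol=1e-4, max_iter=200):
        """Same problem, solved with box-constrained L-BFGS-B."""
        ...

    class NonNegativeLinearRegression(BaseEstimator, RegressorMixin):
        """Thin estimator wrapper; the NMF code would import only the functions."""
        def __init__(self, solver="pg", l1_reg=0.0, l2_reg=0.0):
            self.solver = solver
            self.l1_reg = l1_reg
            self.l2_reg = l2_reg

        def fit(self, X, y):
            solve = nnls_projected_gradient if self.solver == "pg" else nnls_lbfgs
            self.coef_ = solve(X, y, self.l1_reg, self.l2_reg)
            return self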

Mathieu


> On Thu, Nov 7, 2013 at 9:51 AM, Olivier Grisel <olivier.gri...@ensta.org>
> wrote:
> > 2013/11/7 Vlad Niculae <zephy...@gmail.com>:
> >> Hi everybody,
> >>
> >> I just updated the gist quite a lot, please take a look:
> >> http://nbviewer.ipython.org/7224672
> >>
> >> I'll go to sleep and interpret it with a fresh eye tomorrow, but
> >> what's interesting at the moment is:
> >>
> >> KKT's performance is quite constant,
> >> PG with sparsity penalties (the new, simpler ones, not the
> >> implementation in current master, also with fixed stopping condition)
> >> is quite fast!
> >> Residual calculation is fixed and suggests that the solvers work well.
> >
> > It seems that the L-BFGS-B residuals can still be significantly higher
> > than the others. Is the regularization the same for this optimizer? Or
> > is the convergence criterion too lax?
> >
> > Thanks for this evaluation Vlad. Indeed adding multitask lasso and
> > maybe elastic net would be great.
> >
> > --
> > Olivier