I finally found a desk and some focus. I addressed Mathieu's
suggestions and added some timings on real data (with a lot of
concessions so that it would run reasonably quickly on my machine).
Here are the results: http://nbviewer.ipython.org/7224672
It becomes clear that `tol` still means different things for the different solvers.
2013/11/7 Mathieu Blondel math...@mblondel.org:
On Fri, Nov 8, 2013 at 12:28 AM, Vlad Niculae zephy...@gmail.com wrote:
I feel like this would go against explicit is better than implicit,
but without it grid search would indeed be awkward. Maybe:
if self.alpha_coef == 'same':
About the LBFGS-B residuals (non-)issue I was probably confused by the
overlapping curves on the plot and misinterpreted the location of the PG-l1
and PG-l2 curves.
--
Olivier
Re: the discussion we had at PyCon.fr, I noticed that the internal
elastic net coordinate descent functions are parametrized with
`l1_reg` and `l2_reg`, but the exposed classes and functions have
`alpha` and `l1_ratio`. Only yesterday there was somebody on IRC who
couldn't match Ridge with
And lambda is a reserved keyword in Python ;-)
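For reference, a minimal sketch of the mapping between the user-facing
(alpha, l1_ratio) pair and the internal (l1_reg, l2_reg) pair. The
n_samples scaling is an assumption taken from the coordinate descent
convention and is worth double-checking against coordinate_descent.py:

    def enet_penalties(alpha, l1_ratio, n_samples):
        # Convert the exposed (alpha, l1_ratio) parameters into the
        # internal (l1_reg, l2_reg) pair passed to the CD functions.
        l1_reg = alpha * l1_ratio * n_samples
        l2_reg = alpha * (1.0 - l1_ratio) * n_samples
        return l1_reg, l2_reg

    # e.g. alpha=1.0, l1_ratio=0.5 on 100 samples splits the penalty evenly:
    print(enet_penalties(1.0, 0.5, 100))  # (50.0, 50.0)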
On Fri, Nov 8, 2013 at 4:59 PM, Olivier Grisel olivier.gri...@ensta.org wrote:
2013/11/7 Mathieu Blondel math...@mblondel.org:
On Fri, Nov 8, 2013 at 12:28 AM, Vlad Niculae zephy...@gmail.com
wrote:
I feel like this would go against
Just my $0.02 as a user: I was also a bit confused/put off by `alpha` and
`l1_ratio` when I first explored SGDClassifier; I found those names to
be pretty inconsistent, plus I tend to call my regularization
parameters `lambda` and use `alpha` for learning rates. I'm sure other
people associate
SGDClassifier adopted the parameter names of ElasticNet (which has been
around in sklearn for longer) for consistency reasons.
I agree that we should strive for concise and intuitive parameter names
such as `l1_ratio`.
Naming in sklearn is actually quite unfortunate since the popular R package
We cannot use lambda as a parameter name because it is a reserved
keyword of the Python language (for defining anonymous functions).
This is why we used alpha instead of lambda for the ElasticNet /
Lasso model initially and then this notation was reused in more
recently implemented estimators such as
Just a remark: in LogisticRegression you can use L1 and L2 regularization
and there is a single param, which is alpha.
It's not trivial to have consistent naming for the regularization param.
In SVC it is C as it's the common
naming... but it corresponds to 1/l2_reg with what you suggest...
Alex
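For the record, a rough sketch of how the two conventions relate, assuming
one objective averages the loss and scales the penalty by alpha while the
other (SVC-style) sums the loss and scales it by C; the exact constants
differ from estimator to estimator:

    def alpha_from_C(C, n_samples):
        # alpha giving roughly the same effective regularization as an
        # SVC-style C on n_samples training points (hypothetical helper)
        return 1.0 / (C * n_samples)

    def C_from_alpha(alpha, n_samples):
        return 1.0 / (alpha * n_samples)

    print(alpha_from_C(1.0, 1000))      # 0.001
    print(C_from_alpha(0.0001, 1000))   # 10.0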
On Fri, Nov 08, 2013 at 11:56:24AM +0100, Olivier Grisel wrote:
In retrospect I would have preferred it to be named something explicit like
regularization or l2_reg instead of alpha.
Agreed.
Still, I like the (alpha, l1_ratio) parameterization better than the
(l2_reg, l1_reg) parameter set.
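One practical reason is grid search: l1_ratio is bounded in [0, 1] while
alpha varies on a log scale, so the grid is easy to specify. A minimal
sketch (using current import paths; GridSearchCV lived in
sklearn.grid_search back then):

    import numpy as np
    from sklearn.linear_model import ElasticNet
    from sklearn.model_selection import GridSearchCV

    rng = np.random.RandomState(0)
    X = rng.randn(50, 20)
    y = np.dot(X, rng.randn(20)) + 0.1 * rng.randn(50)

    # one bounded knob for the l1/l2 mix, one log-scaled knob for the strength
    param_grid = {
        "alpha": np.logspace(-4, 0, 5),
        "l1_ratio": [0.1, 0.5, 0.9],
    }
    search = GridSearchCV(ElasticNet(max_iter=10000), param_grid, cv=3)
    search.fit(X, y)
    print(search.best_params_)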
A quick remark:
Instead of:
%pylab inline --no-import-all
you can just do:
%matplotlib inline
--
Olivier
2013/11/7 Vlad Niculae zephy...@gmail.com:
Hi everybody,
I just updated the gist quite a lot, please take a look:
http://nbviewer.ipython.org/7224672
I'll go to sleep and interpret it with a fresh eye tomorrow, but
what's interesting at the moment is:
KKT's performance is quite constant,
The regularization is the same, I think the higher residuals come from
the fact that the gradient is raveled, so compared to `n_targets`
independent problems, it will take different steps.
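A minimal sketch (not the notebook's code) of the raveling in question: all
targets share one flat parameter vector, so L-BFGS-B takes joint steps
instead of solving n_targets independent problems:

    import numpy as np
    from scipy.optimize import fmin_l_bfgs_b

    def multi_target_ls(w_flat, X, Y):
        # unregularized multi-target least squares with a raveled gradient
        W = w_flat.reshape(X.shape[1], Y.shape[1])
        R = np.dot(X, W) - Y
        loss = 0.5 * np.sum(R ** 2)
        grad = np.dot(X.T, R)        # shape (n_features, n_targets)
        return loss, grad.ravel()    # L-BFGS-B wants a flat gradient

    rng = np.random.RandomState(0)
    X, Y = rng.randn(30, 10), rng.randn(30, 3)
    w0 = np.zeros(10 * 3)
    w_opt, f_opt, info = fmin_l_bfgs_b(multi_target_ls, w0, args=(X, Y))
    print(f_opt, info["warnflag"])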
I don't think there are any convergence issues because I made the
solvers print a warning in case they don't converge.
Come to think of it, Olivier, what do you mean when you say L-BFGS-B
has higher residuals? I fail to see this trend; what I see is a consistent
ordering of L1, L2 and no regularization in terms of residuals, with different methods coming
very close to one another for the same regularisation objective.
Could you be more specific?
Also I found this pretty big difference in timing when computing
elementwise norms and products.
In [1]: X = np.random.randn(1000, 900)
In [2]: %timeit np.linalg.norm(X, 'fro')
100 loops, best of 3: 4.8 ms per loop
In [3]: %timeit np.sqrt(np.sum(X ** 2))
100 loops, best of 3: 4.5 ms per loop
In reply to Olivier's previous comment, as it's not at all obvious
from the plots, I chose a case where lbfgsb-l1 seems very far away and
printed the residuals of it and of pg-l1:
In [227]:
tall_med[tall_med['solver'] == 'lbfgsb-l1']['residual']
Out[227]:
2580.9370832
2650.9405044
272
2013/11/7, Vlad Niculae zephy...@gmail.com:
Also I found this pretty big difference in timing when computing
elementwise norms and products.
This is a known problem with np.linalg.norm, and so is the memory
consumption. You should use sklearn.utils.extmath.norm for the
Frobenius norm.
Also
This is a known problem with np.linalg.norm, and so is the memory
consumption. You should use sklearn.utils.extmath.norm for the
Frobenius norm.
Hmm. Indeed I missed that, but still, this is a bit odd:
sklearn.utils.extmath.norm is slower than raveling on my Anaconda setup
with MKL/Accelerate.
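For what it's worth, a sketch in the spirit of extmath.norm: a Frobenius
norm computed with BLAS nrm2 on the raveled array, avoiding the temporary
squared array behind the memory consumption mentioned above (the details
here are an assumption, not the exact sklearn code):

    import numpy as np
    from scipy import linalg

    def fro_norm(X):
        x = np.asarray(X).ravel()
        nrm2, = linalg.get_blas_funcs(['nrm2'], (x,))
        return nrm2(x)

    X = np.random.randn(1000, 900)
    print(np.allclose(fro_norm(X), np.sqrt(np.sum(X ** 2))))  # True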
Thanks for the awesome work Vlad! It's nice to see good progress.
On Thu, Nov 7, 2013 at 7:12 PM, Vlad Niculae zephy...@gmail.com wrote:
The regularization is the same, I think the higher residuals come from
the fact that the gradient is raveled, so compared to `n_targets`
independent
2013/11/7 Vlad Niculae zephy...@gmail.com:
This is a known problem with np.linalg.norm, and so is the memory
consumption. You should use sklearn.utils.extmath.norm for the
Frobenius norm.
Hmm. Indeed I missed that, but still, this is a bit odd.
sklearn.utils.extmath.norm is slower than
2013/11/7 Mathieu Blondel math...@mblondel.org:
Do we need two different regularization parameters for coefficients and
components? MiniBatchDictionaryLearning seems to have only one alpha.
For reproducing results from literature this is useful. E.g. Hoyer
only regularizes one of the matrices.
On Thu, Nov 7, 2013 at 11:57 PM, Lars Buitinck larsm...@gmail.com wrote:
For reproducing results from literature this is useful. E.g. Hoyer
only regularizes one of the matrices.
For efficient grid-search with shared values, we could do this:
if self.alpha_comp is None and self.alpha_coef is
I feel like this would go against explicit is better than implicit,
but without it grid search would indeed be awkward. Maybe:
if self.alpha_coef == 'same':
alpha_coef = self.alpha_comp
?
On Thu, Nov 7, 2013 at 4:19 PM, Mathieu Blondel math...@mblondel.org wrote:
On Thu, Nov 7, 2013 at
On Fri, Nov 8, 2013 at 12:28 AM, Vlad Niculae zephy...@gmail.com wrote:
I feel like this would go against explicit is better than implicit,
but without it grid search would indeed be awkward. Maybe:
if self.alpha_coef == 'same':
alpha_coef = self.alpha_comp
?
Sounds good to me!
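A minimal sketch of how the agreed convention could look inside an
estimator; everything except the alpha_comp / alpha_coef names from the
discussion is hypothetical:

    class SomeNMF(object):
        def __init__(self, alpha_comp=0.1, alpha_coef='same'):
            self.alpha_comp = alpha_comp
            self.alpha_coef = alpha_coef

        def fit(self, X):
            # Resolve the shared value at fit time and keep the constructor
            # parameters untouched, so cloning and grid search still work.
            alpha_coef = (self.alpha_comp if self.alpha_coef == 'same'
                          else self.alpha_coef)
            # ... run the solver with self.alpha_comp and alpha_coef ...
            return self

    # Grid search then only needs to vary alpha_comp:
    # GridSearchCV(SomeNMF(), {'alpha_comp': [0.01, 0.1, 1.0]})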
Hi everybody,
I just updated the gist quite a lot, please take a look:
http://nbviewer.ipython.org/7224672
I'll go to sleep and interpret it with a fresh eye tomorrow, but
what's interesting at the moment is:
KKT's performance is quite constant,
PG with sparsity penalties (the new, simpler
I'd love to add non-negative lasso to this mix. However, I noticed
that cd_fast.pyx is missing the positive=True option in multitask
lasso (as well as the sparse variant). Is there any other reason for
this or just that nobody needed it?
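In the meantime, one possible per-target workaround is to loop over the
columns with the single-output Lasso, which does have positive=True; just
a sketch, and it loses the joint row-sparsity structure (and speed) of
MultiTaskLasso:

    import numpy as np
    from sklearn.linear_model import Lasso

    def nn_lasso_multi(X, Y, alpha=0.1):
        # one non-negative Lasso per target column of Y
        W = np.zeros((X.shape[1], Y.shape[1]))
        for k in range(Y.shape[1]):
            est = Lasso(alpha=alpha, positive=True, max_iter=10000)
            W[:, k] = est.fit(X, Y[:, k]).coef_
        return W

    rng = np.random.RandomState(0)
    X, Y = np.abs(rng.randn(40, 15)), np.abs(rng.randn(40, 4))
    W = nn_lasso_multi(X, Y)
    print(W.shape, (W >= 0).all())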
indeed nobody needed it :)
thanks for looking into
By the way, the MiniBatchDictLearning can be trivially modified to do
this: do a non-negative Lasso, instead of a Lasso. This is discussed in
the original paper.
If somebody has some time to add a positive option to LassoLars, like the
one available in Lasso, that would be great. It would then be
Interesting. Also note that the current nls_kkt implementation is using
a sequential for loop over the columns. This loop could probably be
embarrassingly parallelized with very low overhead using threads, as scipy
is probably releasing the GIL.
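A sketch of what that thread-based parallelization could look like, for
instance with per-column scipy.optimize.nnls calls; whether it actually
pays off rests on the assumption above that the GIL is released inside
the solver:

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor
    from scipy.optimize import nnls

    def nnls_columns(A, B, n_threads=4):
        # solve min ||A w - b||, w >= 0 for each column b of B in parallel
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            cols = list(pool.map(lambda b: nnls(A, b)[0], B.T))
        return np.column_stack(cols)

    rng = np.random.RandomState(0)
    A, B = rng.randn(50, 10), rng.randn(50, 8)
    W = nnls_columns(A, B)
    print(W.shape, (W >= 0).all())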
This is another potential motivation for me to work on a
Does anyone have an explanation for the discrepancy in the residuals
for the lbfgs-b and nnls_kkt? If nnls_kkt can stay so close to zero,
unregularized lbfgs-b should be able to reach the same training
set MSE, no?
--
Olivier
I guess it's just a bug in how the solvers return residuals; I'll add
some unit tests with manually-computed residuals to check.
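Something along these lines, perhaps (the solver signature is hypothetical,
matching the notebook rather than any released API): recompute the residual
from the returned factors and compare it with what the solver reports:

    import numpy as np

    def check_reported_residual(solver, X, n_components, rtol=1e-5):
        W, H, reported = solver(X, n_components)   # hypothetical signature
        manual = np.linalg.norm(X - np.dot(W, H))  # Frobenius residual
        assert np.isclose(reported, manual, rtol=rtol), (reported, manual)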
On Wed, Oct 30, 2013 at 9:48 AM, Olivier Grisel
olivier.gri...@ensta.org wrote:
Does anyone have an explanation for the discrepancy in the residuals
for the lbfgs-b
I think MiniBatchDictLearning supports only dense arrays, though.
Mathieu
PS: Very nice notebooks, Vlad and Olivier.
On Wed, Oct 30, 2013 at 5:44 PM, Olivier Grisel olivier.gri...@ensta.org wrote:
Interesting. Also note that the current nls_kkt implementation is using
a sequential for loop
2013/10/30 Mathieu Blondel math...@mblondel.org:
I think MiniBatchDictLearning supports only dense arrays, though.
Mathieu
PS: Very nice notebooks, Vlad and Olivier.
This is all Vlad's work here.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Thanks Mathieu, well part of it comes from your gist (I added an
attribution now) ;)
Non-negative lasso is really interesting; I had forgotten about it, but I
think it would be worthwhile to compare qualitatively.
Vlad
On Wed, Oct 30, 2013 at 10:15 AM, Olivier Grisel
olivier.gri...@ensta.org
On Wed, Oct 30, 2013 at 12:49:49AM +0100, Vlad Niculae wrote:
Adding L1 (elementwise) regularization makes L-BFGS-B converge much
quicker. This is cool because for NMF such a penalty has other
advantages.
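A sketch of why the L1 term is cheap here: with the non-negativity enforced
through L-BFGS-B's bounds, the elementwise L1 penalty on W reduces to a plain
sum, so the penalized objective stays smooth (fixed H, solving for W; the
names are illustrative, not the notebook's code):

    import numpy as np
    from scipy.optimize import fmin_l_bfgs_b

    def l1_nn_objective(w_flat, X, H, l1_reg):
        W = w_flat.reshape(X.shape[0], H.shape[0])
        R = X - np.dot(W, H)
        loss = 0.5 * np.sum(R ** 2) + l1_reg * np.sum(W)   # ||W||_1 == sum(W) for W >= 0
        grad = -np.dot(R, H.T) + l1_reg
        return loss, grad.ravel()

    rng = np.random.RandomState(0)
    X, H = np.abs(rng.randn(30, 20)), np.abs(rng.randn(5, 20))
    w0 = np.abs(rng.randn(30 * 5))
    bounds = [(0, None)] * w0.size                         # keep W non-negative
    w_opt, f_opt, info = fmin_l_bfgs_b(l1_nn_objective, w0,
                                       args=(X, H, 0.1), bounds=bounds)
    print(f_opt, info["warnflag"])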
By the way, the MiniBatchDictLearning can be trivially modified to do
this: do a