Hi Andy,
On Mon, Sep 17, 2012 at 10:30 AM, Andreas Mueller
<[email protected]>wrote:
> Hi Ariel.
>
> I think there is some confusion about what rho means, see
> here: https://github.com/scikit-learn/scikit-learn/issues/1139
> and here:
> https://github.com/scikit-learn/scikit-learn/commit/9987b61cf87aa8eeecd9c8e2fae6c29599892613
>
> Atm, rho=1 is L1, and rho=0 is L2 in ElasticNet but the other way around
> in SGDClassifier ....
> Unfortunately, the docs were / are not very clear on this.
>
>
> Can you give a reference for your understanding of rho?
>
>
Thanks for your response and for pointing me to that commit - I didn't even
know this was being actively discussed! I was looking at exactly the part
of the docstring you refer to in that commit. Now (in Alex's commit) the
docstring is very strange: it reads as though rho doesn't matter at all -
rho=0 is L1 and rho=1 is L1 too! My understanding is that 'rho' should
serve the purpose of what is referred to as '1-\alpha' in equation 3.54 in
chapter 3 of the Elements of Statistical Learning (this one:
http://www-stat.stanford.edu/~tibs/ElemStatLearn/). That is, when alpha is
high the L2 norm is penalized heavily, and when alpha is low the L1 norm is
penalized heavily.
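To make that concrete, here is a small numpy sketch of the penalty as I read
eq. 3.54, with rho standing in for 1 - alpha (the function name and the 0.5
factor on the L2 term are my own choices for illustration, not scikit-learn's
code):

```python
import numpy as np

def elastic_net_penalty(w, alpha_total, rho):
    """Elastic-net penalty in the ESL eq. 3.54 sense, with rho = 1 - alpha:
    rho weights the L1 term, (1 - rho) weights the L2 term.
    (Hypothetical helper, not the sklearn implementation.)"""
    l1 = np.sum(np.abs(w))   # L1 norm of the coefficients
    l2 = np.sum(w ** 2)      # squared L2 norm
    return alpha_total * (rho * l1 + 0.5 * (1.0 - rho) * l2)

w = np.array([1.0, -2.0, 3.0])
# rho = 1 -> pure L1 penalty; rho = 0 -> pure (halved) L2 penalty
print(elastic_net_penalty(w, 1.0, 1.0))  # 6.0  (= |1| + |-2| + |3|)
print(elastic_net_penalty(w, 1.0, 0.0))  # 7.0  (= 0.5 * (1 + 4 + 9))
```

So under this reading, rho interpolates from a pure ridge penalty at rho=0
to a pure lasso penalty at rho=1.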
The code itself:
https://github.com/scikit-learn/scikit-learn/blob/9987b61cf87aa8eeecd9c8e2fae6c29599892613/sklearn/linear_model/coordinate_descent.py#L190
also suggests that high rho leads to a strong L1 penalty and low rho leads
to a strong L2 penalty, which is consistent with what I get in my
computation. So I am slowly coming to the conclusion that the docstring
should read (e.g. on line 63):
"[...rho=0...] the penalty is an L2 penalty. For rho = 1 it is an L1
penalty."
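Just to sanity-check the direction: for a single coefficient, the
coordinate-descent update I believe that line implements has a closed form.
This is a hypothetical toy reimplementation (enet_1d is my own name, not the
actual Cython code), assuming the objective alpha*rho*|w| +
0.5*alpha*(1-rho)*w**2 on top of the squared error:

```python
import numpy as np

def enet_1d(y, alpha, rho):
    # Closed-form minimizer of
    #   0.5 * (y - w)**2 + alpha * rho * |w| + 0.5 * alpha * (1 - rho) * w**2
    # i.e. soft-thresholding by the L1 weight, then ridge-style shrinkage
    # by the L2 weight. (Toy sketch of my reading of that line of code.)
    return np.sign(y) * max(abs(y) - alpha * rho, 0.0) / (1.0 + alpha * (1.0 - rho))

y = 1.0
print(enet_1d(y, 0.5, 1.0))  # rho=1: pure soft-thresholding (L1) -> 0.5
print(enet_1d(y, 0.5, 0.0))  # rho=0: pure ridge shrinkage (L2) -> 1/1.5
```

At rho=1 the coefficient gets thresholded (which is what drives sparsity),
and at rho=0 it only gets scaled down, which again says high rho = L1.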
Does that make sense to you?
Cheers,
Ariel
> Cheers,
> Andy
>
>
>
> On 09/17/2012 05:56 PM, Ariel Rokem wrote:
>
> Hi everyone,
>
> I am using the sklearn.linear_model.ElasticNet class to fit some data.
> The structure of the data is y = Xw, and I am trying to solve for w where
> y.shape is (150,) and X.shape is (150,150), with a non-negativity
> constraint. Both y and each column of X are mean-removed. Some of the
> columns of X are quite correlated with each other. I have been playing
> around a bit with different settings of inputs to the initialization of
> ElasticNet and I am running into the following issue understanding alpha
> and rho: for a given value of alpha (rather small, alpha=0.0075), when I
> change rho from 0 to 0.5 to 1, I get smaller L1 norm (np.sum(w)) and a
> larger L2 norm (np.sum(w**2)). This defies my intuition that larger values
> of rho should make ElasticNet more and more averse to growing L2 norm and
> less and less averse to growing L1 norm, so I was expecting the exact
> opposite. What is the explanation for this behavior?
>
> Thanks!
>
> Ariel
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general