hi,
try this:
---
from sklearn import datasets, linear_model
d = datasets.load_diabetes()
print(linear_model.Lasso(normalize=True).fit(d['data'], d['target']).coef_)
print(linear_model.Lasso(normalize=False).fit(2. * d['data'], d['target']).coef_)
returns:
[ 0. -0.
Alexandre, my bad completely, and I apologize for taking up your time.
I was mixing up normalize with standardize, which is why none of it made
sense.
Thanks.
Best Regards.
--
I haven't seen any patch for this precisely, though it's a known issue
(even if it doesn't seem to be explicitly ticketed; it's closest to
https://github.com/scikit-learn/scikit-learn/issues/1179). There are
various tricky cases not currently supported for which it's easiest to roll
your own
I've been playing around with Lasso and Lars, but there's something that
bothers me about standardization.
If I don't standardize to N(0, 1), these procedures indicate that a certain
set of variables is the most important. Yet if I standardize, I get a
completely different set of variables. As
Hi,
The main question is, what is your definition of an important variable?
Gilles
On 1 June 2013 14:22, o m oda...@gmail.com wrote:
I've been playing around with Lasso and Lars, but there's something that
bothers me about standardization.
If I don't standardize to N(0, 1), these procedures
On 06/01/2013 01:03 PM, Joel Nothman wrote:
I haven't seen any patch for this precisely, though it's a known issue
(even if it doesn't seem to be explicitly ticketed; it's closest to
https://github.com/scikit-learn/scikit-learn/issues/1179). There are
various tricky cases not currently
Hi,
Unfortunately, statistics is not magic, and there are many situations in
which l1 recovery is not guaranteed to work.
I cannot give magic answers, and I suggest that you think a lot about how
you can validate any findings using external sources. That said, I would
suggest, in general, to
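One classic case where l1 recovery breaks down can be shown in a few lines (a made-up illustration, not from Gaël's message): when two columns are almost perfectly correlated, the penalty has no reason to prefer one over the other, so the "selected" variable should not be over-interpreted.

```python
# Illustration: with two nearly identical columns, Lasso distributes the
# shared weight arbitrarily between them, so the recovered support is
# unstable even though the fit itself is fine.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
x = rng.randn(100)
X = np.column_stack([x, x + 1e-3 * rng.randn(100)])  # near-duplicate features
y = X[:, 0] + X[:, 1]  # the true weight is spread equally across both

coef = Lasso(alpha=0.1).fit(X, y).coef_
# the total weight stays close to 2 (minus shrinkage), but how it splits
# between the two twins is essentially arbitrary
print(coef)
```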
The main question is, what is your definition of an important variable?
Gilles
That's a good question;-) Seriously.
I would define it - with many closely related variables - as a member
of a set that gives you the best predictability.
LARS and LASSO with cross validation provide a good story
On 06/01/2013 07:51 PM, o m wrote:
The main question is, what is your definition of an important variable?
Gilles
That's a good question;-) Seriously.
I would define it - with many closely related variables - as a member of a
set that gives you the best predictability.
LARS and LASSO
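One way to make "best predictability" concrete is to let cross-validation pick the Lasso penalty and read off which variables survive. A minimal sketch, reusing the diabetes data from the first message (standardized, per the earlier discussion):

```python
# Cross-validated Lasso: the penalty strength alpha is chosen by CV, and
# the "important" variables are the ones with nonzero coefficients.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

d = load_diabetes()
X = StandardScaler().fit_transform(d.data)
model = LassoCV(cv=5).fit(X, d.target)

selected = np.flatnonzero(model.coef_)
print("alpha chosen by CV:", model.alpha_)
print("selected feature indices:", selected)
```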
Hi,
I asked a (perhaps too vague?) question about the use of Random
Forests with a mix of categorical and lexical features on two ML
forums (stats.SE and MetaOp), but since it has received no attention,
I figured that it might work better on this list (I'm using sklearn's
RF of course):
I'm
Hi Christian,
Some time ago I had similar problems, i.e., I wanted to use additional
features on top of my lexical features, and simple concatenation didn't
work that well for me, even though both feature sets performed pretty
well on their own.
You can follow the discussion about my problem here [1]
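For what it's worth, plain concatenation can be sketched like this (a toy illustration with made-up data, not the approach from the linked discussion): vectorize the text, one-hot encode the category, and stack the two sparse blocks side by side before fitting the forest.

```python
# Toy sketch: bag-of-words features next to one-hot categorical features,
# combined into one sparse matrix for a RandomForest.
import numpy as np
from scipy.sparse import hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

texts = ["cheap flights to paris", "meeting agenda attached",
         "win a free prize now", "quarterly report draft"]
cats = np.array([["promo"], ["work"], ["promo"], ["work"]])
y = [1, 0, 1, 0]

X_lex = CountVectorizer().fit_transform(texts)  # lexical block
X_cat = OneHotEncoder().fit_transform(cats)     # categorical block
X = hstack([X_lex, X_cat]).tocsr()              # side-by-side concatenation

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X).shape)
```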
On 06/01/2013 08:30 PM, Christian Jauvin wrote:
Hi,
I asked a (perhaps too vague?) question about the use of Random
Forests with a mix of categorical and lexical features on two ML
forums (stats.SE and MetaOp), but since it has received no attention,
I figured that it might work better on
Hi all,
I was wondering if anyone can point me to a tutorial on clustering text
documents, but then also displaying the results in a graph ? I see some
examples on clustering text documents, but I'd like to be able to visualize
the clusters.
Any help would be appreciated!
Thank you,
Harold
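I don't know of one tutorial that covers both steps, but the basic recipe is short enough to sketch (made-up documents; the plotting call is left commented out): cluster TF-IDF vectors with k-means, then project to 2-D for the scatter plot.

```python
# Sketch: cluster a handful of documents on TF-IDF features, then reduce
# to two dimensions so the clusters can be drawn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stock markets fell today", "investors sold shares",
        "the dog chased the cat"]

X = TfidfVectorizer().fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

for doc, lab, (cx, cy) in zip(docs, labels, coords):
    print(lab, round(cx, 2), round(cy, 2), doc)

# to visualize, e.g. with matplotlib:
# import matplotlib.pyplot as plt
# plt.scatter(coords[:, 0], coords[:, 1], c=labels); plt.show()
```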
Andy, on reading your tip, and reflecting on what I do, I'm tempted to claim
that standardization is very important, regardless ...
Assume x0 is very important but has a tiny range (-1/100, 1/100) - all
other
variables being significantly larger in range.
Lars/Lasso will drop x0 until the end,
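That effect can be reproduced in a couple of lines (synthetic data, made up for illustration): the same Lasso penalty kills the tiny-range x0 on the raw data but keeps it after standardization.

```python
# A strongly predictive feature with a tiny scale is penalized away by
# Lasso unless the data are standardized first.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n = 200
x0 = rng.uniform(-0.01, 0.01, n)     # important, but tiny range
x1 = rng.uniform(-10, 10, n)         # noise, large range
X = np.column_stack([x0, x1])
y = 100.0 * x0 + 0.1 * rng.randn(n)  # y is driven almost entirely by x0

raw = Lasso(alpha=0.1).fit(X, y).coef_
std = Lasso(alpha=0.1).fit(StandardScaler().fit_transform(X), y).coef_

print(raw)  # x0's coefficient is shrunk to (near) zero
print(std)  # after scaling, x0 carries the weight
```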
Ahh and I'd forgotten that 1574 included support in grid search. I should
perhaps take a look at that.
On Sun, Jun 2, 2013 at 1:10 AM, Andreas Mueller amuel...@ais.uni-bonn.de wrote:
On 06/01/2013 01:03 PM, Joel Nothman wrote:
I haven't seen any patch for this precisely, though it's a known
TL;DR: a list of `namedtuple`s is a poor solution for parameter search
results; here I suggest better alternatives.
I would like to draw some attention to #1787 which proposes that structured
arrays be used to return parameter search (e.g. GridSearchCV) results. A
few proposals have sought
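For concreteness, the structured-array representation I have in mind looks roughly like this (field names invented for illustration; #1787 has the actual proposal):

```python
# One record per parameter setting, with named fields for the parameters
# and the score, instead of a list of namedtuples.
import numpy as np

results = np.array(
    [(0.1, 'rbf', 0.81),
     (1.0, 'rbf', 0.92),
     (1.0, 'linear', 0.88)],
    dtype=[('C', float), ('kernel', 'U10'), ('mean_score', float)])

# named-field access replaces digging through tuples by position
best = results[np.argmax(results['mean_score'])]
print(best['C'], best['kernel'])
```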
On Sun, Jun 2, 2013 at 1:35 PM, Mathieu Blondel math...@mblondel.org wrote:
Sorry for the late answer. It's hard for me to keep track of all the
design-related discussions lately.
No worries. Thanks for the reply!
For me, the advantages of the sequences of sequences format are:
- they are