date:20130601

Re: [Scikit-learn-general] normalize makes no difference to Lasso

2013-06-01 Thread Alexandre Gramfort

hi, try this:

    from sklearn import datasets, linear_model
    d = datasets.load_diabetes()
    print linear_model.Lasso(normalize=True).fit(d['data'], d['target']).coef_
    print linear_model.Lasso(normalize=False).fit(2. * d['data'], d['target']).coef_

returns: [ 0. -0.

[Scikit-learn-general] normalize makes no difference to Lasso

2013-06-01 Thread o m

Alexandre, my bad completely, and I apologize for taking up your time. I was mixing up normalize with standardize, which is why none of it made sense. Thanks. Best Regards.
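The mix-up between normalizing and standardizing can be made concrete with a small sketch. It assumes that "normalize" means centering a column and dividing by its L2 norm (my reading of what `normalize=True` did in scikit-learn at the time; the thread itself does not spell this out), while "standardize" means centering and dividing by the standard deviation. The helper names below are hypothetical and use only the standard library.

```python
import math

def center(column):
    """Subtract the column mean from every value."""
    mean = sum(column) / len(column)
    return [v - mean for v in column]

def standardize(column):
    """Center, then divide by the (population) standard deviation."""
    centered = center(column)
    std = math.sqrt(sum(v * v for v in centered) / len(centered))
    return [v / std for v in centered]

def normalize_l2(column):
    """Center, then divide by the L2 norm of the centered column.

    Assumption: this mirrors the old normalize=True behaviour.
    """
    centered = center(column)
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

column = [1.0, 2.0, 3.0, 4.0, 5.0]
z = standardize(column)   # unit variance
u = normalize_l2(column)  # unit L2 norm

# The two scalings differ by a constant factor of sqrt(n_samples).
ratio = z[0] / u[0]
```

Under these assumptions the two results differ only by the factor sqrt(n_samples), so the same penalty strength acts very differently on the two versions of the data.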

Re: [Scikit-learn-general] GridSearch with sample_weights

2013-06-01 Thread Joel Nothman

I haven't seen any patch for this precisely, though it's a known issue (even if it doesn't seem to be explicitly ticketed; it's closest to https://github.com/scikit-learn/scikit-learn/issues/1179). There are various tricky cases not currently supported for which it's easiest to roll your own
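As a rough illustration of what "rolling your own" might look like, here is a minimal sketch of a grid search that slices the sample weights per fold and uses them for both fitting and scoring. Everything in it is hypothetical: `ShrunkenMean` is a toy estimator invented for the example, and `grid_search_weighted`, `k_fold_indices`, and `weighted_mse` are not scikit-learn functions.

```python
from itertools import product

class ShrunkenMean:
    """Toy estimator (hypothetical): predicts a weighted mean shrunk toward zero."""
    def __init__(self, shrink=0.0):
        self.shrink = shrink

    def fit(self, X, y, sample_weight=None):
        w = sample_weight if sample_weight is not None else [1.0] * len(y)
        weighted_mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        self.mean_ = (1.0 - self.shrink) * weighted_mean
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

def weighted_mse(y_true, y_pred, weights):
    """Mean squared error where each sample counts according to its weight."""
    total = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return total / sum(weights)

def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for interleaved folds."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test

def grid_search_weighted(make_estimator, param_grid, X, y, sample_weight, k=3):
    """Return (best_params, best_score), minimizing weighted MSE over folds."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, test in k_fold_indices(len(y), k):
            est = make_estimator(**params)
            # Weights are sliced with the same indices as X and y.
            est.fit([X[i] for i in train], [y[i] for i in train],
                    sample_weight=[sample_weight[i] for i in train])
            pred = est.predict([X[i] for i in test])
            fold_scores.append(weighted_mse([y[i] for i in test], pred,
                                            [sample_weight[i] for i in test]))
        score = sum(fold_scores) / len(fold_scores)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

X = [[0.0]] * 6
y = [2.0] * 6
weights = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
best_params, best_score = grid_search_weighted(
    ShrunkenMean, {"shrink": [0.0, 0.5]}, X, y, weights)
```

The point of the sketch is only that the weights travel with the rows: whatever indexing splits `X` and `y` must also split `sample_weight`.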

[Scikit-learn-general] To standardize is the question ...

2013-06-01 Thread o m

I've been playing around with Lasso and Lars, but there's something that bothers me about standardization. If I don't standardize to N(0, 1), these procedures indicate that a certain set of variables are the most important. Yet, if I standardize, I get a completely different set of variables. As

Re: [Scikit-learn-general] To standardize is the question ...

2013-06-01 Thread Gilles Louppe

Hi, The main question is, what is your definition of an important variable? Gilles

On 1 June 2013 14:22, o m oda...@gmail.com wrote:
> I've been playing around with Lasso and Lars, but there's something that bothers me about standardization. If I don't standardize to N(0, 1), these procedures

Re: [Scikit-learn-general] GridSearch with sample_weights

2013-06-01 Thread Andreas Mueller

On 06/01/2013 01:03 PM, Joel Nothman wrote:
> I haven't seen any patch for this precisely, though it's a known issue (even if it doesn't seem to be explicitly ticketed; it's closest to https://github.com/scikit-learn/scikit-learn/issues/1179). There are various tricky cases not currently

Re: [Scikit-learn-general] To standardize is the question ...

2013-06-01 Thread Gael Varoquaux

Hi, Unfortunately, statistics is not magic, and there are many situations in which l1 recovery is not guaranteed to work. I cannot give magic answers, and I suggest that you think a lot about how you can validate any findings using external sources. That said, I would suggest, in general, to

[Scikit-learn-general] To standardize is the question ...

2013-06-01 Thread o m

> The main question is, what is your definition of an important variable? Gilles

That's a good question ;-) Seriously. I would define it - with many closely related variables - as a member of a set that gives you the best predictability. LARS and LASSO with cross validation provide a good story

Re: [Scikit-learn-general] To standardize is the question ...

2013-06-01 Thread Andreas Mueller

On 06/01/2013 07:51 PM, o m wrote:
>> The main question is, what is your definition of an important variable? Gilles
> That's a good question ;-) Seriously. I would define it - with many closely related variables - as a member of a set that gives you the best predictability. LARS and LASSO

[Scikit-learn-general] Random Forest with a mix of categorical and lexical features

2013-06-01 Thread Christian Jauvin

Hi, I asked a (perhaps too vague?) question about the use of Random Forests with a mix of categorical and lexical features on two ML forums (stats.SE and MetaOp), but since it has received no attention, I figured that it might work better on this list (I'm using sklearn's RF of course): I'm

Re: [Scikit-learn-general] Random Forest with a mix of categorical and lexical features

2013-06-01 Thread Philipp Singer

Hi Christian, Some time ago I had similar problems. I.e., I wanted to use additional features alongside my lexical features, and simple concatenation didn't work that well for me even though both feature sets on their own performed pretty well. You can follow the discussion about my problem here [1]

Re: [Scikit-learn-general] Random Forest with a mix of categorical and lexical features

2013-06-01 Thread Andreas Mueller

On 06/01/2013 08:30 PM, Christian Jauvin wrote:
> Hi, I asked a (perhaps too vague?) question about the use of Random Forests with a mix of categorical and lexical features on two ML forums (stats.SE and MetaOp), but since it has received no attention, I figured that it might work better on

[Scikit-learn-general] Clustering of Text Documents

2013-06-01 Thread Harold Nguyen

Hi all, I was wondering if anyone can point me to a tutorial on clustering text documents and then displaying the results in a graph? I see some examples on clustering text documents, but I'd like to be able to visualize the clusters. Any help would be appreciated! Thank you, Harold

Re: [Scikit-learn-general] To standardize is the question ...

2013-06-01 Thread o m

Andy, on reading your tip, and reflecting on what I do, I'm tempted to claim that standardization is very important, regardless ... Assume x0 is very important but has a tiny range (-1/100, 1/100) - all other variables being significantly larger in range. Lars/Lasso will drop x0 until the end,
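The scale effect described here can be checked with a toy calculation. The sketch below relies on a standard property of the Lasso path: the first variable to become active is the one whose column has the largest absolute inner product with the centered response. The data are made up for illustration (x0 has a tiny range but determines y exactly, x1 has a large range and is nearly unrelated), and the helper names are hypothetical.

```python
import math

def center(column):
    mean = sum(column) / len(column)
    return [v - mean for v in column]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def standardize(column):
    """Center and scale a column to unit (population) variance."""
    centered = center(column)
    std = math.sqrt(dot(centered, centered) / len(centered))
    return [v / std for v in centered]

def first_entering(columns, y):
    """Index of the column with the largest |<x_j, y - mean(y)>|."""
    y_centered = center(y)
    scores = [abs(dot(center(col), y_centered)) for col in columns]
    return scores.index(max(scores))

# Made-up data: x0 is tiny in range but fully determines y.
x0 = [-0.01, -0.005, 0.0, 0.005, 0.01]
x1 = [10.0, -20.0, 5.0, 15.0, -10.0]
y = [100.0 * v for v in x0]

raw_first = first_entering([x0, x1], y)
scaled_first = first_entering([standardize(x0), standardize(x1)], y)
```

On the raw columns the large-range variable x1 enters first; after standardization x0 does, which is the behaviour the message is worried about.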

Re: [Scikit-learn-general] GridSearch with sample_weights

2013-06-01 Thread Joel Nothman

Ahh and I'd forgotten that 1574 included support in grid search. I should perhaps take a look at that.

On Sun, Jun 2, 2013 at 1:10 AM, Andreas Mueller amuel...@ais.uni-bonn.de wrote:
> On 06/01/2013 01:03 PM, Joel Nothman wrote:
>> I haven't seen any patch for this precisely, though it's a known

[Scikit-learn-general] How to present parameter search results

2013-06-01 Thread Joel Nothman

TL;DR: a list of `namedtuple`s is a poor solution for parameter search results; here I suggest better alternatives. I would like to draw some attention to #1787 which proposes that structured arrays be used to return parameter search (e.g. GridSearchCV) results. A few proposals have sought
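To illustrate the row-oriented versus column-oriented trade-off, here is a small standard-library sketch. It is not the structured-array proposal from #1787; a plain dict of lists stands in for a columnar layout, and the record type, field names, and scores are all made up for the example.

```python
from collections import namedtuple

# Hypothetical record type; field names are illustrative only.
SearchResult = namedtuple("SearchResult", ["params", "mean_score", "fold_scores"])

# Made-up toy results, not output from any real search.
results = [
    SearchResult({"alpha": 0.1}, 0.80, [0.78, 0.82]),
    SearchResult({"alpha": 1.0}, 0.85, [0.84, 0.86]),
    SearchResult({"alpha": 10.0}, 0.70, [0.65, 0.75]),
]

# Row-oriented access: every per-column question needs a loop over records.
mean_scores_from_rows = [r.mean_score for r in results]

def to_columns(records):
    """Transpose a list of namedtuples into a dict of equal-length lists."""
    fields = records[0]._fields
    return {name: [getattr(r, name) for r in records] for name in fields}

# Column-oriented access: one lookup per field, aligned by position.
columns = to_columns(results)
best_index = max(range(len(results)), key=columns["mean_score"].__getitem__)
best_params = columns["params"][best_index]
```

A columnar layout makes "give me all the mean scores" a single lookup, which is the kind of access pattern that sorting, plotting, and tabulating search results tend to need.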

Re: [Scikit-learn-general] Multilabel sequences of sequences considered harmful

2013-06-01 Thread Joel Nothman

On Sun, Jun 2, 2013 at 1:35 PM, Mathieu Blondel math...@mblondel.org wrote: Sorry for the late answer. It's hard for me to keep track of all the design-related discussions lately. No worries. Thanks for the reply! For me, the advantages of the sequences of sequences format are: - they are

17 matches
