On Sun, Jun 2, 2013 at 1:44 PM, Joel Nothman
wrote:
>
> From the sounds of things, it would be easier and probably more efficient
> to just always convert to dense binarized matrices, unless we have a good
> case for requiring sparse handling of labels. In particular, scipy.sparse
> does not curre
On Sun, Jun 2, 2013 at 1:35 PM, Mathieu Blondel wrote:
> Sorry for the late answer. It's hard for me to keep track of all the
> design-related discussions lately.
>
No worries. Thanks for the reply!
> For me, the advantages of the sequences of sequences format are:
> - they are quite natural fr
Hi Joel,
Sorry for the late answer. It's hard for me to keep track of all the
design-related discussions lately.
For me, the advantages of the sequences of sequences format are:
- they are quite natural from a user point of view (although, as you said,
an array of sets would be technically better
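For readers who haven't met the format, here is a minimal sketch (plain NumPy, not scikit-learn API; the label names are invented) of how sequences of sequences map onto a dense binary indicator matrix:

import numpy as np

# Sequences-of-sequences multilabel format: one iterable of labels per sample.
y_seq = [["news", "politics"], ["sports"], [], ["news"]]

# Equivalent dense binary indicator matrix: one column per distinct label.
classes = sorted({label for labels in y_seq for label in labels})
col = {label: j for j, label in enumerate(classes)}
Y = np.zeros((len(y_seq), len(classes)), dtype=int)
for i, labels in enumerate(y_seq):
    for label in labels:
        Y[i, col[label]] = 1

print(classes)  # ['news', 'politics', 'sports']
print(Y)        # rows are samples, columns are labels, entries are 0/1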
TL;DR: a list of `namedtuple`s is a poor solution for parameter search
results; here I suggest better alternatives.
I would like to draw some attention to #1787 which proposes that structured
arrays be used to return parameter search (e.g. GridSearchCV) results. A
few proposals have sought additio
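To make the comparison concrete, a rough sketch (not the actual #1787 proposal; the field names are invented) of search results stored as a list of namedtuples versus a NumPy structured array:

from collections import namedtuple
import numpy as np

# List of namedtuples: readable per row, but per-field access needs a loop.
Result = namedtuple("Result", ["C", "gamma", "mean_score"])
as_tuples = [Result(1.0, 0.1, 0.91), Result(10.0, 0.1, 0.93)]
print(max(r.mean_score for r in as_tuples))

# Structured array: a whole column is addressable in a single slice.
as_struct = np.array(
    [(1.0, 0.1, 0.91), (10.0, 0.1, 0.93)],
    dtype=[("C", float), ("gamma", float), ("mean_score", float)],
)
print(as_struct["mean_score"].max())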
Ahh and I'd forgotten that 1574 included support in grid search. I should
perhaps take a look at that.
On Sun, Jun 2, 2013 at 1:10 AM, Andreas Mueller wrote:
> On 06/01/2013 01:03 PM, Joel Nothman wrote:
> > I haven't seen any patch for this precisely, though it's a known issue
> > (even if it d
Andy, on reading your tip, and reflecting on what I do, I'm tempted to claim
that standardization is very important, regardless ...
Assume x0 is very important but has a tiny range (-1/100, 1/100), all other
variables being significantly larger in range.
Lars/Lasso will drop x0 until the end, a
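A small synthetic illustration of that point, with made-up data and an arbitrary alpha; on the raw data the tiny-range feature typically gets a zero coefficient, while after standardization it does not:

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n = 200
x0 = rng.uniform(-0.01, 0.01, n)   # very informative, but tiny range
x1 = rng.uniform(-10.0, 10.0, n)   # weakly informative, large range
X = np.column_stack([x0, x1])
y = 500.0 * x0 + 0.1 * x1 + rng.normal(scale=0.1, size=n)

alpha = 0.5
# Raw features: the penalty makes the huge coefficient needed for x0 too costly.
print(Lasso(alpha=alpha).fit(X, y).coef_)
# Standardized features: both variables compete on a comparable scale.
print(Lasso(alpha=alpha).fit(StandardScaler().fit_transform(X), y).coef_)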
Hi all,
I was wondering if anyone can point me to a tutorial on clustering text
documents, but then also displaying the results in a graph ? I see some
examples on clustering text documents, but I'd like to be able to visualize
the clusters.
Any help would be appreciated!
Thank you,
Harold
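One common recipe, sketched here on a toy corpus (the vectorizer, KMeans, and the TruncatedSVD projection are illustrative choices rather than the only way to do it):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors worry about the markets",
]

# Vectorize and cluster the documents.
X = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, random_state=0).fit_predict(X)

# Project the sparse TF-IDF matrix to 2-D just for display, coloured by cluster.
coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
plt.scatter(coords[:, 0], coords[:, 1], c=labels)
plt.show()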
On 06/01/2013 08:30 PM, Christian Jauvin wrote:
> Hi,
>
> I asked a (perhaps too vague?) question about the use of Random
> Forests with a mix of categorical and lexical features on two ML
> forums (stats.SE and MetaOp), but since it has received no attention,
> I figured that it might work better
Hi Christian,
Some time ago I had similar problems: I wanted to add further features on
top of my lexical features, and simple concatenation didn't work that well
for me, even though both feature sets performed pretty well on their own.
You can follow the discussion about my problem here [1] i
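For reference, a rough sketch of plain concatenation of TF-IDF text features with one-hot-encoded categorical features feeding a random forest (the data, column names, and settings are all invented):

import numpy as np
from scipy.sparse import hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["great phone, love the screen",
         "battery died after two days",
         "excellent camera and build",
         "screen cracked, very fragile"]
categorical = [{"brand": "acme", "store": "web"},
               {"brand": "acme", "store": "retail"},
               {"brand": "zeta", "store": "web"},
               {"brand": "zeta", "store": "retail"}]
y = np.array([1, 0, 1, 0])

X_text = TfidfVectorizer().fit_transform(texts)       # lexical features
X_cat = DictVectorizer().fit_transform(categorical)   # one-hot categoricals

# Simple concatenation of the two blocks; densified only to keep the example
# independent of which estimators accept sparse input.
X = hstack([X_text, X_cat]).toarray()
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:1]))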
Hi,
I asked a (perhaps too vague?) question about the use of Random
Forests with a mix of categorical and lexical features on two ML
forums (stats.SE and MetaOp), but since it has received no attention,
I figured that it might work better on this list (I'm using sklearn's
RF of course):
"I'm work
On 06/01/2013 07:51 PM, o m wrote:
> > The main question is, what is your definition of an "important" variable?
> >
> > Gilles
> That's a good question;-) Seriously.
>
> I would define it - with many closely related variables - as a member of a
> set that gives you the best predictability.
> LARS
> The main question is, what is your definition of an "important" variable?
>
> Gilles
That's a good question;-) Seriously.
I would define it - with many closely related variables - as a member
of a set that gives you the best predictability.
LARS and LASSO with cross validation provide a good s
Hi,
Unfortunately, statistics is not magic, and there are many situations in
which l1 recovery is not guaranteed to work.
I cannot give magic answers, and I suggest that you think a lot about how
you can validate any findings using external sources. That said, I would
suggest, in general, to standar
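One way to act on that advice is to standardize inside a pipeline, so that cross-validation applies exactly the same preprocessing in every fold; a minimal sketch on synthetic data (all names and values are illustrative):

from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem, purely for illustration.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

# Standardize, then let LassoCV pick its regularization by cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5)).fit(X, y)
print(model.named_steps["lassocv"].coef_)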
On 06/01/2013 01:03 PM, Joel Nothman wrote:
> I haven't seen any patch for this precisely, though it's a known issue
> (even if it doesn't seem to be explicitly ticketed; it's closest to
> https://github.com/scikit-learn/scikit-learn/issues/1179). There are
> various tricky cases not currently s
Hi,
The main question is, what is your definition of an "important" variable?
Gilles
On 1 June 2013 14:22, o m wrote:
> I've been playing around with Lasso and Lars, but there's something that
> bothers me about standardization.
>
> If I don't standardize to N(0, 1), these procedures indicate t
I've been playing around with Lasso and Lars, but there's something that
bothers me about standardization.
If I don't standardize to N(0, 1), these procedures indicate that a certain
set of variables are the most important. Yet, if I standardize, I get a
completely different set of variables. As e
Updated, new link at:
https://docs.google.com/file/d/0B8FUzd86yYa1SWJXTlkyUF9idlU/edit?usp=sharing
Only the updates here have been changed.
On 27 May 2013 01:03, Lars Buitinck wrote:
> 2013/5/26 Robert Layton
>
>> I've updated the slides for my talk at pycon AU and put them on my Google
>> Dr
I haven't seen any patch for this precisely, though it's a known issue
(even if it doesn't seem to be explicitly ticketed; it's closest to
https://github.com/scikit-learn/scikit-learn/issues/1179). There are
various tricky cases not currently supported for which it's easiest to roll
your own search
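For the "roll your own" case, a minimal sketch of a hand-written loop over a parameter grid around cross_val_score (the estimator, grid values, and module paths follow current scikit-learn and are placeholders only):

from itertools import product

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Try every parameter combination and keep the best mean cross-validated score.
best_score, best_params = -np.inf, None
for C, gamma in product([0.1, 1.0, 10.0], [0.01, 0.1, 1.0]):
    mean = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
    if mean > best_score:
        best_score, best_params = mean, {"C": C, "gamma": gamma}

print(best_params)
print(best_score)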
Alexandre, my bad completely, and I apologize for taking up your time.
I was mixing up normalize with standardize, which is why none of it made
sense.
Thanks.
Best Regards.
hi,
try this:
---
from sklearn import datasets, linear_model

d = datasets.load_diabetes()
# Coefficients with Lasso's built-in normalization, on the original data:
print linear_model.Lasso(normalize=True).fit(d['data'], d['target']).coef_
# Coefficients without normalization, on the same data scaled by 2:
print linear_model.Lasso(normalize=False).fit(2. * d['data'], d['target']).coef_
returns:
[ 0. -0. 36