Re: [Scikit-learn-general] Combining Random Forests

2013-01-10 Thread Olivier Grisel
2013/1/10 Gael Varoquaux gael.varoqu...@normalesup.org: On Thu, Jan 10, 2013 at 03:57:23PM +1100, Juan Nunez-Iglesias wrote: More precisely, I think David wants a function that will take a set of RFs and return a new classifier object that does all the weighted averaging Andy suggested for

Re: [Scikit-learn-general] set sample weights in Pipeline?

2013-01-10 Thread Jaques Grobler
Hi there. I'm not sure if you have been answered yet, so perhaps I can help. MultinomialNB has a parameter called `class_weight` which you can set at initialization.
    class_weight : array-like, size=[n_classes,]
        Prior probabilities of the classes. If specified the priors are not

Re: [Scikit-learn-general] set sample weights in Pipeline?

2013-01-10 Thread Gilles Louppe
... or more simply: `pipeline.fit(X, y, nb__sample_weight=sample_weight)` On 10 January 2013 15:20, Gilles Louppe g.lou...@gmail.com wrote: Hi, I don't know how it interfaces with NLTK's SklearnClassifier, but if you can work your way using only Scikit-Learn for training, then can you pass
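A minimal sketch of the call suggested above, in case it helps. The step names (`tfidf`, `nb`), the choice of `TfidfTransformer` as the first step, and the toy count data are assumptions made for illustration; only the `nb__sample_weight` keyword comes from the message. The `<step name>__<parameter>` prefix tells the pipeline which step's `fit` should receive the argument.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy count data and labels (made up for this illustration).
X = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 1, 4]])
y = np.array([0, 1, 0, 1])
sample_weight = np.array([1.0, 2.0, 1.0, 0.5])

# The final step is named "nb", so "nb__sample_weight" is forwarded
# to MultinomialNB.fit as sample_weight.
pipeline = Pipeline([("tfidf", TfidfTransformer()), ("nb", MultinomialNB())])
pipeline.fit(X, y, nb__sample_weight=sample_weight)

# Equivalent manual version, for comparison.
X_tfidf = TfidfTransformer().fit_transform(X)
nb_manual = MultinomialNB().fit(X_tfidf, y, sample_weight=sample_weight)
```

Both routes should give the same fitted classifier; the pipeline form just keeps the transformation and the weighted fit in one object.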

[Scikit-learn-general] PCA: first component too dominant?

2013-01-10 Thread Paul . Czodrowski
Dear SciKitters, when running a PCA on a rather small dataset, I end up in the situation that the first principal component is predominant. My dataset contains 694 samples with 177 features each. Here comes my code:
    X = dataDescrs_array
    y = dataActs_array
    target_names = ['inactive','active']

Re: [Scikit-learn-general] PCA: first component too dominant?

2013-01-10 Thread Gael Varoquaux
On Thu, Jan 10, 2013 at 03:25:55PM +0100, paul.czodrow...@merckgroup.com wrote: I fear that I mixed up my syntax... Syntax looks good. If there is one largely predominant component in the data, you should be able to see it with your naked eye: all the features should have series that look

Re: [Scikit-learn-general] PCA: first component too dominant?

2013-01-10 Thread Paul . Czodrowski
Sorry for the confusion, guys. But I did not scale my features - they contain a wild mixture of values:
- floats ranging from 0 to 1200
- floats ranging from 0 to 60
- integers between 0 and 25
and so on... My fault! BTW, I tried to re-run the IRIS example (

[Scikit-learn-general] table showing time complexity of algorithms implemented in scikit-learn?

2013-01-10 Thread Andrew Winterman
Is such a table available some place in the docs? Ideally it would have time complexity as a function of both number of samples and features per sample. Thank you, -- Andrew Winterman 714 362 6823

Re: [Scikit-learn-general] Multivariate Adaptive Regression Splines (MARS, aka earth)

2013-01-10 Thread Peter Prettenhofer
2013/1/10 Lars Buitinck l.j.buiti...@uva.nl: 2013/1/10 Jason Rudy ja...@clinicast.net I'm working on an implementation of MARS [1] that I'd like to share, and it seems like sklearn would be a good place for it. The MARS algorithm is currently available as part of the R package earth and is

Re: [Scikit-learn-general] table showing time complexity of algorithms implemented in scikit-learn?

2013-01-10 Thread Olivier Grisel
2013/1/10 Andrew Winterman andywinter...@gmail.com: Is such a table available some place in the docs? Ideally it would have time complexity as a function of both number of samples and features per sample. Nope. That would be a great contribution! -- Olivier http://twitter.com/ogrisel -

Re: [Scikit-learn-general] table showing time complexity of algorithms implemented in scikit-learn?

2013-01-10 Thread Ronnie Ghose
+1 for the contribution. I was looking for this quite frequently. On Thu, Jan 10, 2013 at 12:55 PM, Olivier Grisel olivier.gri...@ensta.org wrote: 2013/1/10 Andrew Winterman andywinter...@gmail.com: Is such a table available some place in the docs? Ideally it would have time complexity as

Re: [Scikit-learn-general] table showing time complexity of algorithms implemented in scikit-learn?

2013-01-10 Thread Mathieu Blondel
On Fri, Jan 11, 2013 at 2:40 AM, Andrew Winterman andywinter...@gmail.com wrote: Is such a table available some place in the docs? Ideally it would have time complexity as a function of both number of samples and features per sample. I think it would fit in this PR:

Re: [Scikit-learn-general] Combining Random Forests

2013-01-10 Thread David Broyles
Thanks for the help, guys. Indeed it's easy enough to implement a class for combining the classifiers in a model-specific way. Thanks for the note on the oob-score! On Thu, Jan 10, 2013 at 2:28 AM, Olivier Grisel olivier.gri...@ensta.org wrote: 2013/1/10 Gael Varoquaux
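For readers following the thread, here is one way such a combining class might look. This is only a sketch under assumptions: the class name `AveragedForests` is hypothetical, and weighting each forest by its number of trees is a guess at what the "weighted averaging" meant, since the messages are truncated here. It is not an existing scikit-learn API.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


class AveragedForests:
    """Hypothetical wrapper: weighted average of several forests' predict_proba."""

    def __init__(self, forests, weights=None):
        self.forests = list(forests)
        # Assumption: weight each forest by its number of trees by default.
        if weights is None:
            weights = [len(f.estimators_) for f in self.forests]
        self.weights = np.asarray(weights, dtype=float)
        self.classes_ = self.forests[0].classes_
        for f in self.forests[1:]:
            if not np.array_equal(f.classes_, self.classes_):
                raise ValueError("all forests must be fit on the same classes")

    def predict_proba(self, X):
        # Stack to shape (n_forests, n_samples, n_classes), then average.
        probas = np.stack([f.predict_proba(X) for f in self.forests])
        return np.average(probas, axis=0, weights=self.weights)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Two forests trained on different halves of a synthetic dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
rf_a = RandomForestClassifier(n_estimators=10, random_state=0).fit(X[:200], y[:200])
rf_b = RandomForestClassifier(n_estimators=30, random_state=1).fit(X[200:], y[200:])

combined = AveragedForests([rf_a, rf_b])
proba = combined.predict_proba(X)
```

Note that the combined object has no meaningful out-of-bag score of its own, since each forest only saw its own training subset.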

Re: [Scikit-learn-general] table showing time complexity of algorithms implemented in scikit-learn?

2013-01-10 Thread Andrew Winterman
I agree, I'll fork that and do some work on it if I have time this weekend. Should the classifiers docstrings also note their time complexity? Seems like something you'd want to know... On Thu, Jan 10, 2013 at 10:12 AM, Mathieu Blondel math...@mblondel.org wrote: On Fri, Jan 11, 2013 at 2:40

Re: [Scikit-learn-general] table showing time complexity of algorithms implemented in scikit-learn?

2013-01-10 Thread Ronnie Ghose
Yes please. I was looking all over the place for these the last week or so. On Thu, Jan 10, 2013 at 1:33 PM, Andrew Winterman andywinter...@gmail.com wrote: I agree, I'll fork that and do some work on it if I have time this weekend. Should the classifiers docstrings also note their time

Re: [Scikit-learn-general] table showing time complexity of algorithms implemented in scikit-learn?

2013-01-10 Thread Vlad Niculae
PR #804 had some comments about generating the tables automatically, which would be nice. How about a consistently structured `Complexity` section to the docstrings, and use it to populate the table? On Thu, Jan 10, 2013 at 6:38 PM, Ronnie Ghose ronnie.gh...@gmail.com wrote: Yes please. I was
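To make the proposal concrete, a rough sketch of how a `Complexity` docstring section could be collected into a table. Everything here is hypothetical: the helper names, the `ToyEstimator` class, and the complexity it states are invented for illustration, and the parsing is plain string handling rather than anything the documentation toolchain provides.

```python
import inspect


def complexity_section(obj, title="Complexity"):
    """Return the body of an underlined docstring section, or None if absent."""
    lines = (inspect.getdoc(obj) or "").splitlines()
    for i in range(len(lines) - 1):
        # A section is a title line followed by a line made only of dashes.
        if lines[i].strip() == title and set(lines[i + 1].strip()) == {"-"}:
            body = []
            for j in range(i + 2, len(lines)):
                # Stop when the next line underlines a new section title.
                if j + 1 < len(lines) and set(lines[j + 1].strip()) == {"-"}:
                    break
                body.append(lines[j])
            return "\n".join(body).strip()
    return None


class ToyEstimator:
    """Toy estimator used only for this illustration.

    Complexity
    ----------
    fit: O(n_samples * n_features)

    Notes
    -----
    Not a real scikit-learn class; the complexity above is made up.
    """


def complexity_table(objects):
    """Build a plain-text table with one row per object."""
    rows = [(obj.__name__, complexity_section(obj) or "n/a") for obj in objects]
    return "\n".join(f"{name:<15} {text}" for name, text in rows)


table = complexity_table([ToyEstimator, dict])
print(table)
```

Objects without the section simply show up as `n/a`, which would make gaps in coverage easy to spot.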

Re: [Scikit-learn-general] table showing time complexity of algorithms implemented in scikit-learn?

2013-01-10 Thread Andrew Winterman
That seems to make sense to me, especially since we'll want to analyze the algorithm as written. On Thu, Jan 10, 2013 at 10:46 AM, Vlad Niculae zephy...@gmail.com wrote: PR #804 had some comments about generating the tables automatically, which would be nice. How about a consistently

Re: [Scikit-learn-general] PCA: first component too dominant?

2013-01-10 Thread Andreas Mueller
This is a general problem if the features are not in the same units. As you saw, PCA assumes that features all have equal importance. If you want all to have the same weight, you have to rescale (using StandardScaler for example). The problem is: it is not clear whether this is the right thing
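A small sketch of the effect described in this thread, assuming synthetic data: three independent features whose ranges mimic the ones Paul listed (0 to 1200, 0 to 60, integers 0 to 25). The data is made up; only the use of `StandardScaler` comes from the message above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n_samples = 694

# Synthetic, independent features on very different scales (made-up data).
X = np.column_stack([
    rng.uniform(0, 1200, n_samples),
    rng.uniform(0, 60, n_samples),
    rng.randint(0, 26, n_samples),
])

# Without scaling, the feature with the largest variance dominates.
raw_ratio = PCA(n_components=3).fit(X).explained_variance_ratio_

# After standardizing, every feature has unit variance.
X_scaled = StandardScaler().fit_transform(X)
scaled_ratio = PCA(n_components=3).fit(X_scaled).explained_variance_ratio_

print("unscaled:", raw_ratio.round(3))
print("scaled:  ", scaled_ratio.round(3))
```

On the unscaled data the first component should carry nearly all the variance, simply because one feature has a much larger range; after standardizing, the three independent features should share it roughly evenly. Whether rescaling is appropriate still depends on the data, as the message notes.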

Re: [Scikit-learn-general] Multivariate Adaptive Regression Splines (MARS, aka earth)

2013-01-10 Thread Andreas Mueller
Hi Jason. Thanks for wanting to contribute MARS to sklearn. There is even an issue requesting the feature ;) https://github.com/scikit-learn/scikit-learn/issues/845 I think it would be a great addition. You should be aware of the fact that contributing to sklearn is a bit more than just

Re: [Scikit-learn-general] table showing time complexity of algorithms implemented in scikit-learn?

2013-01-10 Thread Andreas Mueller
On 01/10/2013 07:46 PM, Vlad Niculae wrote: PR #804 had some comments about generating the tables automatically, which would be nice. How about a consistently structured `Complexity` section to the docstrings, and use it to populate the table? -1 That would mean hacking the numpy

[Scikit-learn-general] Roadmap / Scope

2013-01-10 Thread Andreas Mueller
Hi everybody. Long and general mail coming on. TL;DR version: do we want to plan for the future? Today I read this blog post on the scope of open source projects: http://brianegranger.com/?p=249 It made me dig up an old mail draft I wrote after reading a post by Gael:

Re: [Scikit-learn-general] Roadmap / Scope

2013-01-10 Thread Andrew Winterman
I am +1 on a plan, since it's helpful for newbies like myself in orienting themselves, and helps focus developer effort. That said, the breadth of this project is pretty amazing, and it's probably a good idea to keep classifiers which are up-and-coming in academia available. I guess I'm voting

Re: [Scikit-learn-general] Roadmap / Scope

2013-01-10 Thread Robert Layton
On 11 January 2013 10:21, Lars Buitinck l.j.buiti...@uva.nl wrote: 2013/1/10 Andreas Mueller amuel...@ais.uni-bonn.de: I wanted to ask: should we try to make plans? We get a lot of PRs and have more and more contributors and I think it might be nice if we had some form of road map to give

Re: [Scikit-learn-general] Roadmap / Scope

2013-01-10 Thread Olivier Grisel
2013/1/11 Lars Buitinck l.j.buiti...@uva.nl: 2013/1/10 Andreas Mueller amuel...@ais.uni-bonn.de: I wanted to ask: should we try to make plans? We get a lot of PRs and have more and more contributors and I think it might be nice if we had some form of road map to give everything a bit more

Re: [Scikit-learn-general] Roadmap / Scope

2013-01-10 Thread Jake Vanderplas
Hi all, One component of a good roadmap would be to make sure we emphasize good implementations of fundamental ML algorithms. One area I'd like to work on is density estimation: KDE in particular is an important component of a wide variety of algorithms, and there is not (to my knowledge) a

Re: [Scikit-learn-general] Roadmap / Scope

2013-01-10 Thread Olivier Grisel
2013/1/11 Vlad Niculae zephy...@gmail.com: I completely agree with everyone regarding 1.0 and I really think we should make a clear list of issues for this (just saying API is pretty vague). However there is life after the 1.0, and I think Andy's message was more about that kind of long-term

Re: [Scikit-learn-general] table showing time complexity of algorithms implemented in scikit-learn?

2013-01-10 Thread Gael Varoquaux
On Thu, Jan 10, 2013 at 10:33:05AM -0800, Andrew Winterman wrote: Should the classifiers docstrings also note their time complexity? I think that it would be good. Thanks, G