[Scikit-learn-general] My personal suggestion regarding topics for GSoC (and my official application :-) )

2015-03-06 Thread Luca Puggini
Thanks a lot for the material on randomized PCA and random forests; it will certainly help me in my research. I talked with my supervisor, and he said that I am free to apply for this summer project. I have used GAMs quite a lot, and I did some work related to high-dimensional fault detection

[Scikit-learn-general] My personal suggestion regarding topics for GSoC

2015-03-06 Thread Luca Puggini
Hi, thanks a lot; I was not aware of the randomized PCA. Regarding random forests, is there any paper or resource that you can suggest? I tried to use the forest with max_features=1, but it was still biased. I did not try with a limited depth. Thanks a lot, Luca

Re: [Scikit-learn-general] My personal suggestion regarding topics for GSoC

2015-03-06 Thread Gilles Louppe
Hi Luca, On 6 March 2015 at 11:09, Luca Puggini lucapug...@gmail.com wrote: Hi, It seems to me that you are discussing topics that can be introduced in sklearn with GSoC. I use sklearn quite a lot and there are a couple of things that I really miss in this library: 1- NIPALS PCA. The

[Scikit-learn-general] My personal suggestion regarding topics for GSoC

2015-03-06 Thread Luca Puggini
After a small simulation study, I agree with the previous comment. With the extra-trees classifier it is possible to reduce the bias, but the result is still biased. Here is the sample code: http://jpst.it/x9Mv Here is a possible reference: http://www.biomedcentral.com/1471-2105/8/25 Please

Re: [Scikit-learn-general] My personal suggestion regarding topics for GSoC

2015-03-06 Thread Gilles Louppe
Yes, in fact I did something similar in my thesis. See section 7.2 for a discussion about this. Figure 7.5 is similar to what you describe in your sample code. By varying the depth, you can basically control the bias. http://orbi.ulg.ac.be/bitstream/2268/170309/1/thesis.pdf On 6 March 2015 at

Re: [Scikit-learn-general] feature names after OneHotEncoder

2015-03-06 Thread Andreas Mueller
I thought you just wanted to mask some features, but I guess that was not your intent. You could make your code robust to future changes by using the feature_indices_ attribute, while assuming that the result first has all categorical, and then all numerical values. Btw, you might have an easier

Re: [Scikit-learn-general] GSoC2015 topics

2015-03-06 Thread Andreas Mueller
Thanks for trying to make some time :) On 03/06/2015 03:42 AM, Arnaud Joly wrote: Hi, Sadly this year, I won’t have time for mentoring. However, I will try to find some spare time for reviewing! Best regards, Arnaud On 05 Mar 2015, at 22:43, Andreas Mueller t3k...@gmail.com

Re: [Scikit-learn-general] My personal suggestion regarding topics for GSoC

2015-03-06 Thread Michael Eickenberg
On Fri, Mar 6, 2015 at 11:09 AM, Luca Puggini lucapug...@gmail.com wrote: Hi, It seems to me that you are discussing topics that can be introduced in sklearn with GSoC. I use sklearn quite a lot and there are a couple of things that I really miss in this library: 1- NIPALS PCA. The

Re: [Scikit-learn-general] feature names after OneHotEncoder

2015-03-06 Thread Eustache DIEMERT
Well, after a bit of tinkering it seems that OneHotEncoder has simple rules for assigning columns in the output: 1) first the categorical features, in the order given by the argument, creating columns as needed by the values 2) then the numerical ones. So a piece of code like this seems to work: fn = [] fc =
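
The two rules described above can be sketched in plain Python. This is only an illustration of the column ordering reported in this message, not the code from the thread (which is cut off here) and not a scikit-learn API: the function name `encoded_feature_names`, its arguments, and the `name=value` label format are all hypothetical, and the values of each categorical column are assumed to be listed in the order the encoder emits them.

```python
def encoded_feature_names(names, categorical, categories):
    """Build output column labels for a one-hot encoded matrix.

    names       -- original column names, in input order
    categorical -- indices of the categorical columns, in the order
                   they were passed to the encoder
    categories  -- dict: column index -> list of values seen in that
                   column (assumed to be in the encoder's output order)
    """
    out = []
    # Rule 1: categorical columns first, one output column per value.
    for idx in categorical:
        for value in categories[idx]:
            out.append(f"{names[idx]}={value}")
    # Rule 2: the remaining (numerical) columns follow, unchanged.
    categorical_set = set(categorical)
    out.extend(n for i, n in enumerate(names) if i not in categorical_set)
    return out


# Hypothetical example: columns 0 and 2 are categorical, column 1 is numerical.
labels = encoded_feature_names(
    names=["color", "age", "size"],
    categorical=[0, 2],
    categories={0: [0, 1, 2], 2: [0, 1]},
)
print(labels)
```

Whether a given scikit-learn version follows exactly this layout should be checked against its own documentation before relying on the labels.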

[Scikit-learn-general] My personal suggestion regarding topics for GSoC

2015-03-06 Thread Luca Puggini
Hi, It seems to me that you are discussing topics that can be introduced in sklearn with GSoC. I use sklearn quite a lot and there are a couple of things that I really miss in this library: 1- NIPALS PCA. The current version of PCA is too slow for high-dimensional datasets. Suppose to have p=1
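
For readers unfamiliar with NIPALS, here is a minimal sketch of how it extracts the first principal component by alternating between loadings and scores. It is written in plain Python for illustration only; it is not code from this thread and not part of scikit-learn. The function name `nipals_first_component`, its parameters, and the stopping rule are assumptions, and the input is assumed to be column-centered and not all zeros.

```python
import math


def nipals_first_component(X, n_iter=500, tol=1e-12):
    """Return (scores, loadings) of the first principal component.

    X is a list of rows, assumed column-centered and not all zeros.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Initialise the scores with the column of largest sum of squares.
    col_ss = [sum(X[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    j0 = max(range(n_cols), key=col_ss.__getitem__)
    t = [X[i][j0] for i in range(n_rows)]
    for _ in range(n_iter):
        tt = sum(v * v for v in t)
        # Loadings: p = X^T t / (t^T t), then scaled to unit length.
        p = [sum(X[i][j] * t[i] for i in range(n_rows)) / tt
             for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: t = X p.
        t_new = [sum(X[i][j] * p[j] for j in range(n_cols))
                 for i in range(n_rows)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        # Stop once the scores no longer change (relative criterion).
        if change <= tol * sum(v * v for v in t):
            break
    return t, p


# Small centered example whose variance lies mostly along the first axis.
scores, loadings = nipals_first_component(
    [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
)
print(loadings)
```

Further components would be obtained by subtracting the outer product of scores and loadings from X and repeating, so the cost grows with the number of components requested rather than with a full decomposition.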

Re: [Scikit-learn-general] feature names after OneHotEncoder

2015-03-06 Thread Eustache DIEMERT
2015-03-05 16:57 GMT+01:00 Andy t3k...@gmail.com: Well, the columns after the OneHotEncoder correspond to feature values, not feature names, right? Well, for the categorical ones this is right, except that not all my features are categorical (hence the categorical_features=...) and they are

Re: [Scikit-learn-general] GSoC2015 topics

2015-03-06 Thread Arnaud Joly
Hi, Sadly this year, I won’t have time for mentoring. However, I will try to find some spare time for reviewing! Best regards, Arnaud On 05 Mar 2015, at 22:43, Andreas Mueller t3k...@gmail.com wrote: Hi Wei Xue. Thanks for your interest. For the GMM project being familiar with DPGMM and