Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2016-01-31 Thread Daniel Homola
Dear all, I migrated my Python implementation of the Boruta algorithm to: https://github.com/danielhomola/boruta_py I also implemented three mutual-information-based feature selection methods (JMI, JMIM, MRMR) and wrapped them up in a scikit-learn-like interface:
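For illustration, here is a minimal sketch of the fit/transform selector interface being described, using scikit-learn's own SelectFromModel as a stand-in estimator (the boruta_py class itself would be used the same way, but is not shown here):

# Sketch of the scikit-learn-style fit/transform selector interface described
# above, with SelectFromModel standing in for the Boruta implementation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
selector.fit(X, y)                  # learn which features to keep
X_selected = selector.transform(X)  # reduced feature matrix
print(X_selected.shape)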

Re: [Scikit-learn-general] Contributing to Scikit-Learn(GSOC)

2016-01-11 Thread Andy
Hi Imaculate. We have found in recent years that we are quite limited in terms of mentoring resources. Many of the core devs are very busy, and we already have many contributions waiting for review. If you are interested in working on scikit-learn as part of GSoC, I suggest you start

Re: [Scikit-learn-general] Contributing to scikit-learn

2016-01-10 Thread Raghav R V
Hi Antoine, Welcome to scikit-learn! Please see if you find this issue interesting to start with - https://github.com/scikit-learn/scikit-learn/issues/6149 Thanks On Sat, Jan 9, 2016 at 6:42 PM, WENDLINGER Antoine < antoinewendlin...@gmail.com> wrote: > Hi everyone, > > Let me introduce

[Scikit-learn-general] Contributing to scikit-learn

2016-01-09 Thread WENDLINGER Antoine
Hi everyone, Let me introduce myself: my name is Antoine, I'm a 21-year-old French student in Computer Science, and I would love to contribute to scikit-learn. This would be my first contribution to an open-source project, so I'm a bit lost and do not really know where to start. I read the pages

[Scikit-learn-general] Contributing to Scikit-Learn(GSOC)

2016-01-09 Thread Imaculate Mosha
Hi all, I would like to contribute to scikit-learn, even better if for Google Summer of Code. I'm a third-year undergrad student. I did an introductory course on Machine Learning, but after learning scikit-learn I realised we only scratched the surface: we did neural networks, reinforcement learning

Re: [Scikit-learn-general] Contributing to scikit-learn

2015-09-10 Thread Rohit Shinde
Hi Gael, Heeding your advice, I was looking over the possible bugs and I have decided to solve this one: https://github.com/scikit-learn/scikit-learn/issues/5229. Any pointers on how to approach this one? Thanks, Rohit. On Thu, Sep 10, 2015 at 10:27 AM, Gael Varoquaux <

Re: [Scikit-learn-general] Contributing to scikit-learn

2015-09-09 Thread Rohit Shinde
Hello everyone, I have built scikit-learn and I am ready to start coding. Can I get some pointers on how I could start contributing to the projects I mentioned in the earlier mail? Thanks, Rohit. On Mon, Sep 7, 2015 at 11:50 AM, Rohit Shinde wrote: > Hi Jacob, > >

Re: [Scikit-learn-general] Contributing to scikit-learn

2015-09-09 Thread Gael Varoquaux
I would strongly recommend starting with something easier, like issues labelled 'easy'. Starting with such a big project is most likely going to lead to you approaching the project in a way that is not well adapted to scikit-learn, and thus to code that does not get merged. Cheers, Gaël On Thu,

Re: [Scikit-learn-general] Contributing to scikit-learn

2015-09-07 Thread Rohit Shinde
Hi Jacob, I am interested in Global optimization based hyperparameter optimization and Generalised Additive Models. However, I don't know what kind of background would be needed and if mine would be sufficient for it. I would like to know the prerequisites for it. On Sun, Sep 6, 2015 at 9:58 PM,

Re: [Scikit-learn-general] Contributing to scikit-learn

2015-09-06 Thread Jacob Schreiber
Hi Rohit I'm glad you want to contribute to scikit-learn! Which idea were you interested in working on? The metric learning and GMM code is currently being worked on by GSOC students AFAIK. Jacob On Sun, Sep 6, 2015 at 8:18 AM, Rohit Shinde wrote: > Hello

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-05-08 Thread Andreas Mueller
Btw, an example that compares this against existing feature selection methods and explains the differences and advantages would help users and convince us to merge ;) On 05/08/2015 02:34 PM, Daniel Homola wrote: Hi all, I wrote a couple of weeks ago about implementing the Boruta all-relevant

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-05-08 Thread Andreas Mueller
Hi Daniel. That looks cool. Can you do a github pull request? See the contributor docs: http://scikit-learn.org/dev/developers/index.html Thanks, Andy On 05/08/2015 02:34 PM, Daniel Homola wrote: Hi all, I wrote a couple of weeks ago about implementing the Boruta all-relevant feature

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-05-08 Thread Daniel Homola
Hi Andy, Thanks! Will definitely do a GitHub pull request once Miron has confirmed he benchmarked my implementation by running it on the datasets the method was published with. I wrote a blog post about it, which explains the differences, but in a quite casual and non-rigorous way:

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-05-08 Thread Daniel Homola
Hi all, I wrote a couple of weeks ago about implementing the Boruta all-relevant feature selection algorithm in Python. I think it's ready to go now. I wrote fit, transform and fit_transform methods for it to make it sklearn-like. Here it is:

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-05-08 Thread Andreas Mueller
It doesn't need to be super technical, and we try to keep the user guide easy to understand. No bonus points for unnecessary LaTeX ;) The example should be as illustrative and fair as possible, and built-in datasets are preferred. It shouldn't be too heavyweight, though. If you like, you can

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-04-17 Thread Gilles Louppe
Hi, In general, I agree that we should at least add a way to compute feature importances using permutations. This is an alternative, yet standard, way to do it in comparison to what we do (mean decrease of impurity, which is also standard). Assuming we provide permutation importances as a
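The permutation idea Gilles describes can be sketched in a few lines: shuffle one column at a time and measure how much the model's score drops. The manual loop below is for illustration; recent scikit-learn versions also expose sklearn.inspection.permutation_importance:

# Minimal sketch of permutation importance: shuffle one column at a time and
# record the drop in held-out score relative to the unshuffled baseline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
baseline = rf.score(X_te, y_te)

rng = np.random.RandomState(0)
importances = []
for j in range(X_te.shape[1]):
    X_perm = X_te.copy()
    rng.shuffle(X_perm[:, j])          # destroy the information in column j
    importances.append(baseline - rf.score(X_perm, y_te))
print(importances)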

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-04-15 Thread Andreas Mueller
Hi Daniel. That sounds potentially interesting. Is there a widely cited paper for this? I didn't read the paper, but it looks very similar to RFE(RandomForestClassifier()). Is it qualitatively different from that? Does it use a different feature importance? btw: your mail is flagged as spam
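For readers weighing Andy's comparison, this is what the existing RFE + random forest combination he refers to looks like (the parameters are illustrative only):

# Recursive feature elimination driven by random forest importances, i.e. the
# existing RFE(RandomForestClassifier()) combination mentioned above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0), n_features_to_select=5)
rfe.fit(X, y)
print(rfe.support_)   # boolean mask of the features RFE kept
print(rfe.ranking_)   # 1 = selected; higher ranks were eliminated earlier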

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-04-15 Thread Daniel Homola
Hi Andy, This is the paper: http://www.jstatsoft.org/v36/i11/ which was cited 79 times according to Google Scholar. Regarding your second point, the first three questions of the FAQ on the Boruta website answer it, I guess: https://m2.icm.edu.pl/boruta/ 1. *So, what's so special about

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-04-15 Thread Satrajit Ghosh
hi andy and dan, i've been using a similar heuristic with extra trees quite effectively. i have to look at the details of this R package and the paper, but in my case i add a feature that has very low correlation with my target class/value (depending on the problem) and choose features that have
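A sketch of that random-probe heuristic: append a pure-noise column, fit extra trees, and keep the features whose importance beats the probe's. Since the message is truncated, the threshold rule at the end is one plausible reading, not a faithful reproduction:

# "Random probe" heuristic: add a feature with essentially no relation to the
# target and keep only real features whose importance exceeds the probe's.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

probe = rng.normal(size=(X.shape[0], 1))   # noise column, uncorrelated with y
X_aug = np.hstack([X, probe])

et = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_aug, y)
probe_importance = et.feature_importances_[-1]
selected = np.where(et.feature_importances_[:-1] > probe_importance)[0]
print(selected)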

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-04-15 Thread Andreas Mueller
Hi Dan. I saw that paper, but it is not well cited. My question is more how different this is from what we already have. So it looks like some (5) random control features are added and the feature importances are compared against the control. The question is whether the feature importance

Re: [Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-04-15 Thread Daniel Homola
Hi Andy, So at each iteration the x predictor matrix (n by m) is practically copied and each column is shuffled in the copied version. This shuffled matrix is then copied next to the original (n by 2m) and fed into the RF, to get the feature importances. Also at the start of the method, a
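The step described here can be sketched directly in numpy. This is only the single-iteration core of Boruta, without the statistical test that the full method applies over repeated iterations:

# One iteration of the shadow-feature step: shuffle a copy of every column,
# stack it next to the original matrix (n x 2m), fit a random forest, and
# compare each real feature's importance against the best shadow importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=0)

shadows = X.copy()
for j in range(shadows.shape[1]):
    rng.shuffle(shadows[:, j])            # break each column's link to y

X_aug = np.hstack([X, shadows])           # n x 2m
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_aug, y)

m = X.shape[1]
real_imp = rf.feature_importances_[:m]
shadow_max = rf.feature_importances_[m:].max()
hits = real_imp > shadow_max              # features beating the best shadow
print(hits)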

[Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

2015-04-15 Thread Daniel Homola
Hi all, I needed a multivariate feature selection method for my work. As I'm working with biological/medical data, where n ≈ p or even n ≪ p, I started to read up on Random Forest based methods, as in my limited understanding RF copes pretty well with this suboptimal situation. I came across

[Scikit-learn-general] Contributing to Scikit

2014-02-02 Thread Jitesh Khandelwal
Hi, I have used scikit-learn for academic purposes and I like it very much. I want to contribute to it. I have gone through the developers' documentation and set up my local working directory. As suggested in the developers' documentation, I did look for some EASY-tagged issues in the issue

Re: [Scikit-learn-general] Contributing to Scikit

2014-02-02 Thread Olivier Grisel
2014/2/2 Jitesh Khandelwal jk231...@gmail.com: Hi, I have used scikit-learn for academic purposes and I like it very much. I want to contribute to it. I have gone through the developers' documentation and set up my local working directory. As suggested in the developers' documentation, I did

Re: [Scikit-learn-general] Contributing to Scikit

2014-02-02 Thread Andy
On 02/02/2014 12:06 PM, Olivier Grisel wrote: Note: the name of the project is scikit-learn, not scikit or SciKit nor sci-kit learn. Cheers, I should make this my signature from now on. Also including pronunciation (sy-kit learn)

Re: [Scikit-learn-general] contributing to scikit

2014-02-02 Thread Andy
On 02/01/2014 10:42 PM, Robert Layton wrote: Finally, when choosing classifiers, it's our preference to focus on heavily used classifiers, rather than state of the art. Many of the core devs (and myself) have coded classifiers that are scikit-learn compatible, but not in the library

Re: [Scikit-learn-general] Contributing to Scikit

2014-02-02 Thread Vlad Niculae
I've heard stchee-kit once, along with stchee-pee and num-pee. Vlad On Sun Feb 2 18:39:58 2014, Hadayat Seddiqi wrote: i always said skikit On Sun, Feb 2, 2014 at 12:20 PM, Andy t3k...@gmail.com mailto:t3k...@gmail.com wrote: On 02/02/2014 12:06 PM, Olivier Grisel wrote: Note:

Re: [Scikit-learn-general] Contributing to Scikit

2014-02-02 Thread Andy
On 02/02/2014 06:39 PM, Hadayat Seddiqi wrote: i always said skikit Many people do ;) sci as in science =)

Re: [Scikit-learn-general] Contributing to Scikit

2014-02-02 Thread Andy
On 02/02/2014 07:41 PM, Vlad Niculae wrote: I've heard stchee-kit once, along with stchee-pee and num-pee. We should have an FAQ. It should include What is the project name? scikit-learn, not scikit or SciKit nor sci-kit learn. How do you pronounce the project name? sy-kit learn. sci stands

Re: [Scikit-learn-general] Contributing to Scikit

2014-02-02 Thread Juan Nunez-Iglesias
On Mon, Feb 3, 2014 at 5:49 AM, Andy t3k...@gmail.com wrote: We should have an FAQ. It should include What is the project name? scikit-learn, not scikit or SciKit nor sci-kit learn. How do you pronounce the project name? sy-kit learn. sci stands for science! Do you want to add this

Re: [Scikit-learn-general] contributing to scikit

2014-02-01 Thread Joseph Perla
Is this the right place to ask? I'm just going to send in a pull request if nobody has any suggestions. j On Fri, Jan 31, 2014 at 7:10 PM, Joseph Perla jos...@jperla.com wrote: I love SciKit and I'm going to contribute an SGD classifier for semi-supervised problems. I already read through all

Re: [Scikit-learn-general] contributing to scikit

2014-02-01 Thread Robert Layton
Hi Joseph, In theory, you should be able to take any classifier in sklearn and base your implementation off it. That said, there are a few caveats. Some classifiers are older, from before coding conventions were more formalised. Others have a lot of Cython code hooks and can be difficult to read. That all said,
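The estimator conventions Robert is pointing at can be shown with a minimal skeleton (a sketch only, not a real semi-supervised SGD classifier): the constructor only stores parameters, fit learns attributes ending in an underscore and returns self, and predict uses them:

# Minimal scikit-learn-compatible classifier skeleton illustrating the API
# conventions; the predictor itself is deliberately trivial (majority class).
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MajorityClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, demo_param=1.0):
        self.demo_param = demo_param       # no validation or computation here

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_, counts = np.unique(y, return_counts=True)
        self.majority_ = self.classes_[np.argmax(counts)]
        return self                        # fit always returns self

    def predict(self, X):
        X = np.asarray(X)
        return np.full(X.shape[0], self.majority_)

# Usage: clf = MajorityClassifier().fit(X_train, y_train); clf.predict(X_test)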

[Scikit-learn-general] contributing to scikit

2014-01-31 Thread Joseph Perla
I love SciKit and I'm going to contribute an SGD classifier for semi-supervised problems. I already read through all the contributor documentation and I've read many of the docs. I'm asking the list if I should model my code off of the style/quality of the SGDClassifier class or if there is a

Re: [Scikit-learn-general] Contributing to scikit-learn

2013-10-14 Thread Olivier Grisel
Please have a look at the contributors guide: http://scikit-learn.org/stable/developers/#contributing-code In particular this doc mentions [Easy] tagged issues: https://github.com/scikit-learn/scikit-learn/issues?labels=Easy But in general the best way to contribute is to actually use the

[Scikit-learn-general] Contributing to Scikit-Learn

2013-10-02 Thread Manoj Kumar
Hi, I am Manoj Kumar, a junior undergrad from Birla Institute of Technology and Science. I've just completed my Google Summer of Code under SymPy. So I have a good programming background in Python. Regarding my Machine Learning background, I've taken an informal Coursera course, under Andrew

Re: [Scikit-learn-general] Contributing to Scikit-Learn

2013-10-02 Thread Olivier Grisel
2013/10/2 Manoj Kumar manojkumarsivaraj...@gmail.com: Hi, I am Manoj Kumar, a junior undergrad from Birla Institute of Technology and Science. I've just completed my Google Summer of Code under SymPy. So I have a good programming background in Python. Regarding my Machine Learning

[Scikit-learn-general] contributing to scikit-learn

2013-08-01 Thread Eustache DIEMERT
Hi list, Not so long ago I had my first PR merged into sklearn. Overall it was a very cool experience, thanks to many of you :) Here is a little post that tells the story : http://stochastics.komodo.re/posts/contributing-to-sklearn.html Cheers, Eustache

Re: [Scikit-learn-general] contributing to scikit-learn

2013-08-01 Thread Gael Varoquaux
On Thu, Aug 01, 2013 at 03:40:05PM +0200, Eustache DIEMERT wrote: Here is a little post that tells the story: http://stochastics.komodo.re/posts/contributing-to-sklearn.html Cool! Glad you enjoyed it. I tweeted you :) https://twitter.com/GaelVaroquaux/status/362934648302616576 Thanks a lot

Re: [Scikit-learn-general] contributing to scikit-learn

2013-08-01 Thread Andreas Mueller
Hey Eustache. Nice write-up. So who are the tinkerers and who are the prophets? ;) Cheers, Andy On 08/01/2013 03:40 PM, Eustache DIEMERT wrote: Hi list, Not so long ago I had my first PR merged into sklearn. Overall it was a very cool experience, thanks to many of you :) Here is a little

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-07 Thread Andreas Mueller
not a priority. Ad 2) Good idea David -- Forwarded message -- From: Vandana Bachani vandana@gmail.com Date: Tue, Jun 5, 2012 at 6:59 PM Subject: Re: [Scikit-learn-general] Contributing to scikit-learn To: h4wk...@gmail.com

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-07 Thread LI Wei
that it's not a priority. Ad 2) Good idea David -- Forwarded message -- From: Vandana Bachani vandana@gmail.com Date: Tue, Jun 5, 2012 at 6:59 PM Subject: Re: [Scikit-learn-general] Contributing to scikit-learn To: h4wk...@gmail.com Hi David, I think we can add

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-07 Thread David Warde-Farley
On Thu, Jun 07, 2012 at 03:09:11PM +, LI Wei wrote: Intuitively maybe we can set the missing values using the average over the nearest neighbors calculated using these existing features? Not sure whether it is the correct way to do it :-) That's known as imputation (or in a particular
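The nearest-neighbour averaging described here is k-NN imputation; a short sketch using KNNImputer, which ships with recent scikit-learn versions (no imputer existed in the library at the time of this thread):

# Fill missing values with the average over the nearest rows, measured on the
# features that are present (k-NN imputation).
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

imputer = KNNImputer(n_neighbors=2)       # average over the 2 nearest rows
X_filled = imputer.fit_transform(X)
print(X_filled)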

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-07 Thread Vandana Bachani
is one hidden layer, it's certainly possible to add the possibility, but I think we decided that it's not a priority. Ad 2) Good idea David -- Forwarded message -- From: Vandana Bachani vandana@gmail.com Date: Tue, Jun 5, 2012 at 6:59 PM Subject: Re: [Scikit-learn-general

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-07 Thread David Warde-Farley
On Thu, Jun 07, 2012 at 10:40:32AM -0700, Vandana Bachani wrote: Hi Andreas, I agree missing data is not specific to MLP. We dealt with it pretty simply, as you mentioned, by taking the mean over the dataset for continuous-valued attributes. Another thing that I feel is not adequately explored in

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-07 Thread Vandana Bachani
Hi David, Yes I use one-hot encoding, but my understanding of one-hot encoding says that each discrete attribute can be represented as a bit pattern. So the node corresponding to that input attribute is actually a set of nodes representing that bit pattern. An unknown just means that the bit for
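The encoding Vandana describes, where each discrete attribute becomes a group of indicator bits and an unknown value maps to all zeros, is what OneHotEncoder with handle_unknown='ignore' produces; a small sketch of the idea:

# Each categorical attribute becomes a group of indicator columns; an unseen
# ("unknown") category is encoded as all zeros in that group.
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X_train = np.array([['red'], ['green'], ['blue']])
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(X_train)

print(enc.transform([['green']]).toarray())    # one bit set
print(enc.transform([['purple']]).toarray())   # unseen category -> all zeros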

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-06 Thread David Warde-Farley
On 2012-06-05, at 1:51 PM, David Marek h4wk...@gmail.com wrote: 1) Afaik all you need is one hidden layer. The universal approximation theorem says that any continuous function can be approximated arbitrarily well if you have one hidden layer with enough hidden units, but it says nothing about
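A small sketch of a single-hidden-layer network of the kind being discussed, using MLPClassifier from current scikit-learn releases (the class was not in the library when this thread was written):

# One hidden layer with 50 units; width, not just depth, controls capacity.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))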

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-06 Thread xinfan meng
The deep learning literature says that the more layers you have, the fewer hidden nodes per layer you need. But I agree one hidden layer would be sufficient for now. On Thu, Jun 7, 2012 at 11:12 AM, David Warde-Farley warde...@iro.umontreal.ca wrote: On 2012-06-05, at 1:51 PM, David Marek

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-05 Thread Shreyas Karkhedkar
Hi Gael, Thanks for the response. Vandana and I are really excited about contributing to scikits. I will go through the GMM code and will put in suggestions for refactoring - and if possible implement some new features. Once again, on behalf of Vandana and I, thanks for the reply. Looking

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-05 Thread Andreas Mueller
Hi Shreyas. In particular, the VBGMM and DPGMM might need some attention. Once you are a bit familiar with the GMM code, you could have a look at issue 393 https://github.com/scikit-learn/scikit-learn/issues/393. Any help would be much appreciated :) Cheers, Andy Am 05.06.2012 08:07, schrieb

Re: [Scikit-learn-general] Contributing to scikit-learn

2012-06-04 Thread Gael Varoquaux
Hi Vandana and Shreyas, Welcome and thanks for the interest, With regards to MLP (multi-layer perceptrons), David Marek is right now working on such feature: https://github.com/davidmarek/scikit-learn/tree/gsoc_mlp you can probably pitch in with him: 4 eyes are always better than only 2. With