Dear all,
I migrated my Python implementation of the Boruta algorithm to:
https://github.com/danielhomola/boruta_py
I also implemented three mutual-information-based feature selection methods
(JMI, JMIM, MRMR) and wrapped them in a scikit-learn-like interface:
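For readers wondering what "scikit-learn-like interface" means here, a minimal sketch of the fit/transform convention such selectors follow. The class name and its correlation-based scoring are purely illustrative stand-ins (the actual packages score features with mutual information or shadow features):

```python
import numpy as np

class TopKSelector:
    """Toy feature selector following the scikit-learn fit/transform
    convention. Plain |correlation| with the target stands in for the
    real scoring (JMI/JMIM/MRMR), purely to show the interface shape."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Score each column by absolute Pearson correlation with y.
        scores = np.abs([np.corrcoef(X[:, j], y)[0, 1]
                         for j in range(X.shape[1])])
        # Keep the indices of the k best-scoring features.
        self.support_ = np.sort(np.argsort(scores)[::-1][:self.k])
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.support_]

    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 1] - 2 * X[:, 4] + rng.normal(scale=0.1, size=100)
sel = TopKSelector(k=2)
X_sel = sel.fit_transform(X, y)
print(sel.support_)  # columns 1 and 4 carry the signal
```

The point is only the API contract: `fit(X, y)` learns `support_`, `transform(X)` subsets columns, and `fit_transform` chains them, so the selector slots into sklearn-style pipelines.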
Hi Imaculate.
We have found in recent years that we are quite limited in terms of
mentoring resources.
Many of the core-devs are very busy, and we already have many
contributions waiting for reviews.
If you are interested in working on scikit-learn as part of GSoC, I
suggest you start
Hi Antoine,
Welcome to scikit-learn! Please see if you find this issue interesting to
start with - https://github.com/scikit-learn/scikit-learn/issues/6149
Thanks
On Sat, Jan 9, 2016 at 6:42 PM, WENDLINGER Antoine <
antoinewendlin...@gmail.com> wrote:
> Hi everyone,
>
> Let me introduce
Hi everyone,
Let me introduce myself: my name is Antoine, I'm a 21-year-old French
student in Computer Science, and would love to contribute to scikit-learn.
This would be my first contribution to an open-source project so I'm a bit
lost and do not really know where to start. I read the pages
Hi all,
I would like to contribute to scikit-learn, or even better, as part of Google
Summer of Code. I'm a third-year undergrad student. I did an introductory
course on Machine Learning, but after learning scikit-learn I realised we only
scratched the surface; we did neural networks, reinforcement learning
Hi Gael,
Heeding your advice, I was looking over the possible bugs and I have
decided to solve this one:
https://github.com/scikit-learn/scikit-learn/issues/5229.
Any pointers on how to approach this one?
Thanks,
Rohit.
On Thu, Sep 10, 2015 at 10:27 AM, Gael Varoquaux <
Hello everyone,
I have built scikit-learn and I am ready to start coding. Can I get some
pointers on how I could start contributing to the projects I mentioned in
the earlier mail?
Thanks,
Rohit.
On Mon, Sep 7, 2015 at 11:50 AM, Rohit Shinde
wrote:
> Hi Jacob,
>
>
I would strongly recommend starting with something easier, like issues
labelled 'easy'. Starting with such a big project is most likely going to
lead to you approaching the project in a way that is not well adapted to
scikit-learn, and thus code that does not get merged.
Cheers,
Gaël
On Thu,
Hi Jacob,
I am interested in Global optimization based hyperparameter optimization
and Generalised Additive Models. However, I don't know what kind of
background would be needed and if mine would be sufficient for it. I would
like to know the prerequisites for it.
On Sun, Sep 6, 2015 at 9:58 PM,
Hi Rohit
I'm glad you want to contribute to scikit-learn! Which idea were you
interested in working on? The metric learning and GMM code is currently
being worked on by GSOC students AFAIK.
Jacob
On Sun, Sep 6, 2015 at 8:18 AM, Rohit Shinde
wrote:
> Hello
Btw, an example that compares this against existing feature selection
methods that explains differences and advantages would help users and
convince us to merge ;)
On 05/08/2015 02:34 PM, Daniel Homola wrote:
Hi all,
I wrote a couple of weeks ago about implementing the Boruta
all-relevant
Hi Daniel.
That looks cool.
Can you do a github pull request?
See the contributor docs:
http://scikit-learn.org/dev/developers/index.html
Thanks,
Andy
On 05/08/2015 02:34 PM, Daniel Homola wrote:
Hi all,
I wrote a couple of weeks ago about implementing the Boruta
all-relevant feature
Hi Andy,
Thanks! Will definitely do a github pull request once Miron confirms he has
benchmarked my implementation by running it on the datasets the method
was published with.
I wrote a blog post about it, which explains the differences, but in a
quite casual and non-rigorous way:
Hi all,
I wrote a couple of weeks ago about implementing the Boruta all-relevant
feature selection algorithm in Python.
I think it's ready to go now. I wrote fit, transform and fit_transform
methods for it to make it sklearn-like.
Here it is:
It doesn't need to be super technical, and we try to keep the user guide
easy to understand. No bonus points for unnecessary latex ;)
The example should be as illustrative and fair as possible, and built-in
datasets are preferred. It shouldn't be too heavy-weight, though.
If you like, you can
Hi,
In general, I agree that we should at least add a way to compute feature
importances using permutations. This is an alternative, yet standard, way
to do it in comparison to what we do (mean decrease of impurity, which is
also standard).
Assuming we provide permutation importances as a
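Permutation importance as described can be computed independently of any library; a toy sketch (the helper name and the trivial threshold "model" are made up for illustration) that measures the drop in accuracy after shuffling each column in turn:

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Importance of feature j = mean drop in accuracy after shuffling
    column j, which breaks that feature's link with the target."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # permute only column j in place
            drops.append(baseline - np.mean(predict(Xp) == y))
        importances[j] = np.mean(drops)
    return importances

# Toy "model": threshold on feature 0; feature 1 is pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)
predict = lambda X: (X[:, 0] > 0).astype(int)

imp = permutation_importance(predict, X, y)
print(imp)  # feature 0 gets a large importance, feature 1 near zero
```

Because the model only looks at feature 0, shuffling it roughly halves the accuracy, while shuffling the noise column changes nothing; that accuracy gap is the permutation importance.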
Hi Daniel.
That sounds potentially interesting.
Is there a widely cited paper for this?
I didn't read the paper, but it looks very similar to
RFE(RandomForestClassifier()).
Is it qualitatively different from that? Does it use a different feature
importance?
btw: your mail is flagged as spam
Hi Andy,
This is the paper: http://www.jstatsoft.org/v36/i11/ which was cited 79
times according to Google Scholar.
Regarding your second point, the first three questions of the FAQ on the
Boruta website answer it, I guess: https://m2.icm.edu.pl/boruta/
1. *So, what's so special about
hi andy and dan,
i've been using a similar heuristic with extra trees quite effectively. i
have to look at the details of this R package and the paper, but in my case
i add a feature that has very low correlation with my target class/value
(depending on the problem) and choose features that have
Hi Dan.
I saw that paper, but it is not well-cited.
My question is more how different this is from what we already have.
So it looks like some (5) random control features are added and the
features importances are compared against the control.
The question is whether the feature importance
Hi Andy,
So at each iteration the x predictor matrix (n by m) is practically
copied and each column is shuffled in the copied version. This shuffled
matrix is then copied next to the original (n by 2m) and fed into the
RF, to get the feature importances.
Also at the start of the method, a
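The copy-shuffle-concatenate step Daniel describes can be sketched in a few lines. This is only the shadow-feature construction, not the full Boruta loop; in the real method the resulting n x 2m matrix is fed to a random forest and each real feature's importance is compared against the shadow importances:

```python
import numpy as np

def add_shadow_features(X, rng):
    """Boruta-style step: copy X, shuffle each copied column
    independently (destroying any relation to the target), then stack
    the shuffled 'shadow' columns next to the originals: n x 2m."""
    X_shadow = X.copy()
    for j in range(X_shadow.shape[1]):
        rng.shuffle(X_shadow[:, j])  # in-place shuffle of column j
    return np.hstack([X, X_shadow])

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X_ext = add_shadow_features(X, rng)
print(X_ext.shape)  # (50, 6)
```

Each shadow column contains exactly the same values as its original, just in random order, so it has the same marginal distribution but no predictive signal.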
Hi all,
I needed a multivariate feature selection method for my work. As I'm
working with biological/medical data, where n < p or even n << p, I
started to read up on Random Forest based methods, as in my limited
understanding RF copes pretty well with this suboptimal situation.
I came across
Hi,
I have used scikit-learn for academic purposes and I like it very much.
I want to contribute to it. I have gone through the developers
documentation and setup my local working directory.
As suggested in the developers documentation, I did look for some EASY
tagged issues in the issue
2014/2/2 Jitesh Khandelwal jk231...@gmail.com:
Hi,
I have used scikit-learn for academic purposes and I like it very much.
I want to contribute to it. I have gone through the developers documentation
and setup my local working directory.
As suggested in the developers documentation, I did
On 02/02/2014 12:06 PM, Olivier Grisel wrote:
Note: the name of the project is scikit-learn, not scikit or SciKit
nor sci-kit learn. Cheers,
I should make this my signature from now on. Also including the
pronunciation (sy-kit learn)
On 02/01/2014 10:42 PM, Robert Layton wrote:
Finally, when choosing classifiers, it's our preference to focus on
heavily used classifiers, rather than state of the art. Many of the
core devs (and myself) have coded classifiers that are scikit-learn
compatible, but not in the library
I've heard stchee-kit once, along with stchee-pee and num-pee.
Vlad
On Sun Feb 2 18:39:58 2014, Hadayat Seddiqi wrote:
i always said skikit
On Sun, Feb 2, 2014 at 12:20 PM, Andy t3k...@gmail.com
mailto:t3k...@gmail.com wrote:
On 02/02/2014 12:06 PM, Olivier Grisel wrote:
Note:
On 02/02/2014 06:39 PM, Hadayat Seddiqi wrote:
i always said skikit
Many people do ;)
sci as in science =)
On 02/02/2014 07:41 PM, Vlad Niculae wrote:
I've heard stchee-kit once, along with stchee-pee and num-pee.
We should have an FAQ.
It should include
What is the project name? scikit-learn, not scikit or SciKit nor sci-kit
learn.
How do you pronounce the project name? sy-kit learn. sci stands
On Mon, Feb 3, 2014 at 5:49 AM, Andy t3k...@gmail.com wrote:
We should have an FAQ.
It should include
What is the project name? scikit-learn, not scikit or SciKit nor sci-kit
learn.
How do you pronounce the project name? sy-kit learn. sci stands for
science!
Do you want to add this
Is this the right place to ask? I'm just going to send in a pull
request if nobody has any suggestions.
j
On Fri, Jan 31, 2014 at 7:10 PM, Joseph Perla jos...@jperla.com wrote:
I love SciKit and I'm going to contribute an SGD classifier for
semi-supervised problems.
I already read through all
Hi Joseph,
In theory, you should be able to take any classifier in sklearn and base
your implementation off that. That said, there are a few caveats. Some
classifiers are older, before coding was more formalised. Others have a lot
of cython code hooks, and can be difficult to read. That all said,
I love SciKit and I'm going to contribute an SGD classifier for
semi-supervised problems.
I already read through all the contributor documentation and I've read
many of the docs.
I'm asking the list if I should model my code off of the style/quality
of the SGDClassifier class or if there is a
Please have a look at the contributors guide:
http://scikit-learn.org/stable/developers/#contributing-code
In particular this doc mentions [Easy] tagged issues:
https://github.com/scikit-learn/scikit-learn/issues?labels=Easy
But in general the best way to contribute is to actually use the
Hi,
I am Manoj Kumar, a junior undergrad from Birla Institute of Technology and
Science.
I've just completed my Google Summer of Code under SymPy. So I have a good
programming background in Python.
Regarding my Machine Learning background, I've taken an informal Coursera
course, under Andrew
2013/10/2 Manoj Kumar manojkumarsivaraj...@gmail.com:
Hi,
I am Manoj Kumar, a junior undergrad from Birla Institute of Technology and
Science.
I've just completed my Google Summer of Code under SymPy. So I have a good
programming background in Python.
Regarding my Machine Learning
Hi list,
Not so long ago I had my first PR merged into sklearn.
Overall it was a very cool experience, thanks to many of you :)
Here is a little post that tells the story :
http://stochastics.komodo.re/posts/contributing-to-sklearn.html
Cheers,
Eustache
On Thu, Aug 01, 2013 at 03:40:05PM +0200, Eustache DIEMERT wrote:
Here is a little post that tells the story :
http://stochastics.komodo.re/posts/contributing-to-sklearn.html
Cool! Glad you enjoyed it. I tweeted you :)
https://twitter.com/GaelVaroquaux/status/362934648302616576
Thanks a lot
Hey Eustache.
Nice write-up.
So who are the tinkerers and who are the prophets ? ;)
Cheers,
Andy
On 08/01/2013 03:40 PM, Eustache DIEMERT wrote:
Hi list,
Not so long ago I had my first PR merged into sklearn.
Overall it was a very cool experience, thanks to many of you :)
Here is a little
Hi David,
I think we can add
On Thu, Jun 07, 2012 at 03:09:11PM +, LI Wei wrote:
Intuitively maybe we can set the missing values using the average over the
nearest neighbors calculated using these existing features? Not sure
whether it is the correct way to do it :-)
That's known as imputation (or in a particular
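A minimal sketch of that nearest-neighbour imputation idea, assuming Euclidean distance computed only over the features observed in both rows (the helper name is made up):

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs in each row with the mean of the k nearest complete
    donor rows; distances use only coordinates observed in both rows."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for i in range(X.shape[0]):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        dists = []
        for j in range(X.shape[0]):
            # Skip self and donors that lack the values we need.
            if j == i or np.isnan(X[j][missing]).any():
                continue
            shared = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if not shared.any():
                continue
            dists.append((np.linalg.norm(X[i, shared] - X[j, shared]), j))
        neighbours = [j for _, j in sorted(dists)[:k]]
        out[i, missing] = X[neighbours][:, missing].mean(axis=0)
    return out

X = np.array([[1.0, 2.0],
              [1.1, 2.1],
              [1.0, np.nan],
              [9.0, 9.0]])
print(knn_impute(X, k=2))  # row 2 gets the mean of its two closest rows
```

Row 2 is closest to rows 0 and 1 on the observed feature, so its missing value is filled with (2.0 + 2.1) / 2 = 2.05 rather than being dragged toward the outlier row.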
is one hidden layer, it's certainly possible to
add the possibility, but I think we decided that it's not a priority.
Ad 2) Good idea
David
-- Forwarded message --
From: Vandana Bachani vandana@gmail.com
Date: Tue, Jun 5, 2012 at 6:59 PM
Subject: Re: [Scikit-learn-general
On Thu, Jun 07, 2012 at 10:40:32AM -0700, Vandana Bachani wrote:
Hi Andreas,
I agree missing data is not specific to MLP.
We dealt with it pretty simply, as you mentioned, by taking the mean over
the dataset for continuous-valued attributes.
Another thing that I feel is not adequately explored in
Hi David,
Yes I use one-hot encoding, but my understanding of one-hot encoding says
that each discrete attribute can be represented as a bit pattern. So the
node corresponding to that input attribute is actually a set of nodes
representing that bit pattern. An unknown just means that the bit for
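The bit-pattern view of one-hot encoding can be made concrete. The all-zeros pattern for unknown values below is just one possible convention, not necessarily what the poster intended:

```python
import numpy as np

def one_hot(values, categories):
    """Map each discrete value to a bit pattern with a single 1;
    values outside `categories` (unknowns) become all zeros."""
    codes = np.zeros((len(values), len(categories)), dtype=int)
    index = {c: i for i, c in enumerate(categories)}
    for row, v in enumerate(values):
        if v in index:
            codes[row, index[v]] = 1
    return codes

print(one_hot(["red", "blue", "??"], ["red", "green", "blue"]))
# [[1 0 0]
#  [0 0 1]
#  [0 0 0]]
```

Each discrete attribute thus expands into a group of input nodes, one per category, with exactly one bit set for a known value.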
On 2012-06-05, at 1:51 PM, David Marek h4wk...@gmail.com wrote:
1) Afaik all you need is one hidden layer,
The universal approximator theorem says that any continuous function can be
approximated arbitrarily well if you have one hidden layer with enough hidden
units, but it says nothing about
The deep learning literature says that the more layers you have, the fewer
hidden nodes you need in each layer. But I agree one hidden layer would be
sufficient for now.
On Thu, Jun 7, 2012 at 11:12 AM, David Warde-Farley
warde...@iro.umontreal.ca wrote:
On 2012-06-05, at 1:51 PM, David Marek
Hi Gael,
Thanks for the response. Vandana and I are really excited about
contributing to scikits.
I will go through the GMM code and will put in suggestions for refactoring
- and if possible implement some new features.
Once again, on behalf of Vandana and myself, thanks for the reply.
Looking
Hi Shreyas.
In particular, the VBGMM and DPGMM might need some attention.
Once you are a bit familiar with the GMM code, you could have a look
at issue 393 https://github.com/scikit-learn/scikit-learn/issues/393.
Any help would be much appreciated :)
Cheers,
Andy
Am 05.06.2012 08:07, schrieb
Hi Vandana and Shreyas,
Welcome and thanks for the interest,
With regards to MLP (multi-layer perceptrons), David Marek is right now
working on such feature:
https://github.com/davidmarek/scikit-learn/tree/gsoc_mlp
you can probably pitch in with him: 4 eyes are always better than only 2.
With