Dear all,

I migrated my Python implementation of the Boruta algorithm to:
https://github.com/danielhomola/boruta_py

I also implemented three mutual-information-based feature selection methods (JMI, JMIM, MRMR) and wrapped them in a scikit-learn-like interface:
https://github.com/danielhomola/mifs

Could you please have a look at them? I'm writing a blog post demonstrating their strengths against existing methods. Is there anything else you would need in order to consider including these in the next release?

Thanks a lot,
Daniel

On 05/08/2015 08:22 PM, Andreas Mueller wrote:
It doesn't need to be super technical, and we try to keep the user guide "easy to understand". No bonus points for unnecessary LaTeX ;) The example should be as illustrative and fair as possible, and built-in datasets are preferred. It shouldn't be too heavyweight, though. If you like, you can show off some plots in the PR; that is always very welcome.


On 05/08/2015 03:15 PM, Daniel Homola wrote:
Hi Andy,

Thanks! I will definitely open a GitHub pull request once Miron has confirmed that he has benchmarked my implementation by running it on the datasets the method was published with.

I wrote a blog post about it, which explains the differences, but in a rather casual and non-rigorous way:
http://danielhomola.com/2015/05/08/borutapy-an-all-relevant-feature-selection-method/

I guess a more technical write-up using one of the built-in datasets would be more useful for the sklearn audience. I'm happy to do it if Miron says everything looks good.

Cheers,
Daniel

On 08/05/15 21:02, Andreas Mueller wrote:
Btw, an example that compares this against existing feature selection methods and explains the differences and advantages would help users and convince us to merge ;)


On 05/08/2015 02:34 PM, Daniel Homola wrote:
Hi all,

I wrote a couple of weeks ago about implementing the Boruta all-relevant feature selection algorithm in Python.

I think it's ready to go now. I wrote fit, transform and fit_transform methods for it to make it sklearn-like.

Here it is:
https://bitbucket.org/danielhomola/boruta_py
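
For readers unfamiliar with the convention, the following is a minimal sketch of what an sklearn-like fit / transform / fit_transform interface typically looks like. It is not the boruta_py code: the class name ToyVarianceSelector and its variance cut-off rule are hypothetical stand-ins chosen only to keep the example short and dependency-free.

```python
# Toy illustration of the fit / transform / fit_transform convention.
# ToyVarianceSelector is a hypothetical name; the selection rule is a
# simple variance cut-off, NOT the Boruta algorithm.
from statistics import pvariance


class ToyVarianceSelector:
    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # X is a list of rows; zip(*X) iterates over its columns.
        columns = list(zip(*X))
        # Learned state gets a trailing underscore, as in scikit-learn.
        self.support_ = [pvariance(col) > self.threshold for col in columns]
        return self  # returning self allows call chaining

    def transform(self, X):
        # Keep only the columns marked True during fit.
        return [
            [value for value, keep in zip(row, self.support_) if keep]
            for row in X
        ]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 0.0],
]
selector = ToyVarianceSelector()
X_selected = selector.fit_transform(X)  # the constant middle column is dropped
```

In this sketch fit stores a boolean mask of the selected features and transform applies it, so the same fitted selector can be reused on new data with the same columns.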

Let me know what you think. If anyone thinks this might be worth adding to the feature selection module, the original author Miron is happy to give his blessing, and I'm happy to work on it further.

Cheers,
Daniel

On 15/04/15 11:03, Daniel Homola wrote:
Hi all,

I needed a multivariate feature selection method for my work. As I'm working with biological/medical data, where n < p or even n << p, I started to read up on Random Forest based methods, since in my limited understanding RF copes pretty well with this suboptimal situation.

I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/

After reading the paper and checking some of the pretty impressive citations I thought I'd try it, but it was really slow. So I decided to reimplement it in Python, because I hoped (based on this: http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn) that it would be faster. And it is :) I mean a LOT faster.

I was wondering if this would be something that you would consider incorporating into the feature selection module of scikit-learn?

If yes, do you have a tutorial or some sort of guidance on how I should prepare the code, which conventions I should follow, etc.?

Cheers,

Daniel Homola

STRATiGRAD PhD Programme
Imperial College London




_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


