You might want to try cascading a high-precision linear classifier (tuning its intercept_ attribute based on the PR curve) that trims most of the majority class, followed by a second-stage classifier, as described in this paper by Google: http://research.google.com/pubs/pub37195.html
I have never tried it myself yet, but it sounds interesting and should be doable using sklearn models as building blocks.
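Here is a minimal, untested-in-production sketch of such a two-stage cascade built from sklearn parts. Several things are my own assumptions and not taken from the paper. The data is synthetic (make_classification, with class 1 as the rare class). The target_recall value of 0.99 is arbitrary. The random forest second stage is just one possible choice. The stage-1 threshold is picked on a held-out validation split so that stage 1 almost never rejects a minority sample, which means its rejections of the majority class are highly precise. Samples rejected by stage 1 are labeled as the majority class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): class 0 is ~95% of samples, class 1 is rare.
X, y = make_classification(n_samples=20000, n_features=20, n_informative=5,
                           weights=[0.95, 0.05], flip_y=0, random_state=0)
# Split into train (60%), validation (20%, used to pick the threshold) and test (20%).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

# Stage 1: a cheap linear classifier.
stage1 = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Pick the highest decision threshold that still keeps >= target_recall of the
# minority class on the validation set (target_recall is an arbitrary choice).
target_recall = 0.99
_, recall, thresholds = precision_recall_curve(y_val, stage1.decision_function(X_val))
# recall[i] corresponds to thresholds[i]; recall is non-increasing in i.
idx = np.flatnonzero(recall[:-1] >= target_recall)[-1]
threshold = thresholds[idx]

# Fold the threshold into the model by shifting intercept_, so that
# decision_function(X) >= 0 now means "pass to stage 2".
stage1.intercept_ -= threshold

def stage1_keep(X):
    # >= 0 mirrors precision_recall_curve's "score >= threshold" convention.
    return stage1.decision_function(X) >= 0

# Stage 2: a more expensive model trained only on what stage 1 lets through.
keep_train = stage1_keep(X_train)
stage2 = RandomForestClassifier(n_estimators=100, random_state=0)
stage2.fit(X_train[keep_train], y_train[keep_train])

def cascade_predict(X):
    # Samples rejected by stage 1 are assigned the majority class (0).
    y_pred = np.zeros(X.shape[0], dtype=y.dtype)
    keep = stage1_keep(X)
    if keep.any():
        y_pred[keep] = stage2.predict(X[keep])
    return y_pred

keep_test = stage1_keep(X_test)
y_pred = cascade_predict(X_test)
print(f"fraction of test samples sent to stage 2: {keep_test.mean():.2f}")
print(classification_report(y_test, y_pred, digits=3))
```

Shifting intercept_ is equivalent to comparing decision_function against `threshold` directly. Keeping the threshold inside the model just means stage 1 can be passed around as a single estimator. Note that LogisticRegression.predict uses a strict `> 0`, which is why the sketch uses its own `>= 0` mask instead.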