2012/1/3 Lars Buitinck <[email protected]>:
>
>> We probably need to extend the sklearn.feature_extraction.text package
>> to make it more user friendly to work with pure categorical
>> features occurrences:
>
> I'm not sure this belongs in feature_extraction.text; it's much more
> broadly applicable.
>
> If you poke around my branches on GitHub, you'll find some preliminary
> work on both a one-hot transformer and an ARFF (Weka format) reader. I
> think the latter would be very convenient for those wanting mixed
> numerical/categorical data sets.

Noted. I don't plan to work on this in the short term, but I'll make sure to check your work on ARFF if I ever change my mind. Indeed, such a generic mixed numerical/categorical feature extractor would be a very useful contribution to the scikit.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
