On Jan 3, 2012, at 17:02, Olivier Grisel wrote:

> 2012/1/3 Lars Buitinck <[email protected]>:
>>
>>> We probably need to extend the sklearn.feature_extraction.text package
>>> to make it more user friendly to work with pure categorical
>>> features occurrences:
>>
>> I'm not sure this belongs in feature_extraction.text; it's much more
>> broadly applicable.
>>
>> If you poke around my branches on GitHub, you'll find some preliminary
>> work on both a one-hot transformer and an ARFF (Weka format) reader. I
>> think the latter would be very convenient for those wanting mixed
>> numerical/categorical data sets.
>
> Noted. I don't plan to work on this in the short term but I'll make
> sure to check your work on ARFF if I ever change my mind.
> Indeed such a generic mixed numerical / categorical feature extractor
> would be a very useful contrib to the scikit.

Isn't this the kind of data that one might store as a pandas DataFrame with some categorical columns? Maybe we should be able to load from this format as well?

> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
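
To make the idea of a mixed numerical / categorical feature extractor concrete, here is a minimal sketch in plain Python. It is an illustration only, not the one-hot transformer from the branches mentioned in the thread and not an existing scikit-learn API. The function name `one_hot_records`, the `column=value` naming scheme, and the toy data are all assumptions made for this example. String values are treated as categorical and one-hot encoded; everything else is assumed to be numeric and passed through.

```python
def one_hot_records(records):
    """Encode a list of dicts with mixed numeric/string values as dense rows.

    Hypothetical helper for illustration: string values are treated as
    categorical and expanded to "column=value" indicator features, while
    all other values are assumed numeric and copied through as floats.
    """
    # First pass: collect every feature name that occurs in the data.
    feature_names = set()
    for rec in records:
        for key, value in rec.items():
            if isinstance(value, str):
                feature_names.add("%s=%s" % (key, value))
            else:
                feature_names.add(key)
    feature_names = sorted(feature_names)
    index = {name: i for i, name in enumerate(feature_names)}

    # Second pass: fill one dense row per record.
    rows = []
    for rec in records:
        row = [0.0] * len(feature_names)
        for key, value in rec.items():
            if isinstance(value, str):
                row[index["%s=%s" % (key, value)]] = 1.0
            else:
                row[index[key]] = float(value)
        rows.append(row)
    return feature_names, rows


# Toy data: one categorical column ("city") and one numerical ("temperature").
records = [
    {"city": "Paris", "temperature": 12.0},
    {"city": "London", "temperature": 8.5},
    {"city": "Paris", "temperature": 15.0},
]
names, X = one_hot_records(records)
print(names)
for row in X:
    print(row)
```

Rows stored in a pandas DataFrame could be fed to a function like this through `DataFrame.to_dict("records")`, which yields one dict per row. A real implementation would more likely produce a sparse matrix, since one-hot encoding columns with many categories gives mostly-zero rows.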
