Re: [Scikit-learn-general] using string features for classification

Olivier Grisel Tue, 03 Jan 2012 07:03:21 -0800

2012/1/3 Lars Buitinck <[email protected]>:
>
>> We probably need to extend the sklearn.feature_extraction.text package
>> to make it more user-friendly to work with pure categorical
>> feature occurrences:
>
> I'm not sure this belongs in feature_extraction.text; it's much more
> broadly applicable.
>
> If you poke around my branches on GitHub, you'll find some preliminary
> work on both a one-hot transformer and an ARFF (Weka format) reader. I
> think the latter would be very convenient for those wanting mixed
> numerical/categorical data sets.


Noted. I don't plan to work on this in the short term, but I'll make
sure to check your work on ARFF if I ever change my mind.
Indeed, such a generic mixed numerical/categorical feature extractor
would be a very useful contribution to the scikit.
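
To make the idea concrete, here is a minimal pure-Python sketch of what
a mixed numerical/categorical extractor could do. It is not the code in
Lars's branches and not an existing scikit API: the names
fit_vocabulary and transform, the dict-per-sample input format, and the
choice to silently ignore unseen categories are all assumptions made
for illustration.

```python
def fit_vocabulary(records):
    """Assign a column index to each feature (hypothetical helper).

    A numerical feature gets one column keyed by its name. A string
    feature gets one column per distinct (name, value) pair, which is
    the one-hot encoding.
    """
    vocabulary = {}
    for record in records:
        for name, value in record.items():
            key = (name, value) if isinstance(value, str) else name
            vocabulary.setdefault(key, len(vocabulary))
    return vocabulary


def transform(records, vocabulary):
    """Turn dict samples into dense rows of floats (hypothetical helper)."""
    rows = []
    for record in records:
        row = [0.0] * len(vocabulary)
        for name, value in record.items():
            is_categorical = isinstance(value, str)
            key = (name, value) if is_categorical else name
            # Assumption: features not seen during fitting are ignored.
            if key in vocabulary:
                row[vocabulary[key]] = 1.0 if is_categorical else float(value)
        rows.append(row)
    return rows


records = [
    {"color": "red", "size": 1.5},
    {"color": "blue", "size": 3.0},
]
vocabulary = fit_vocabulary(records)
X = transform(records, vocabulary)
```

Columns are assigned in order of first appearance, so here they are
color=red, size, color=blue. A real implementation would probably emit
a scipy.sparse matrix rather than dense lists, since one-hot columns
are mostly zeros.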

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

