I did a piece of that in the Titanic examples from the SciPy tutorial, but the topic could definitely use a clearer, more thorough example. My version could probably be simplified and streamlined: much of my preprocessing was done with straight numpy, and I am 90% sure there is a more "sklearn approved" way to do it using FeatureUnion, etc.
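For what it's worth, here is a rough sketch of what the FeatureUnion route might look like. It is not the code from the tutorial. The toy data, the column names, and the two `select_*` helpers are made up for illustration, and it assumes a reasonably recent scikit-learn in which `OneHotEncoder` accepts string columns directly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Hypothetical toy data in the spirit of the Titanic example (values are made up).
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "sex": ["male", "female", "female", "female", "male", "male"],
    "embarked": ["S", "C", "S", "S", "S", "Q"],
})
y = np.array([0, 1, 1, 1, 0, 0])

numeric_cols = ["age", "fare"]
categorical_cols = ["sex", "embarked"]


def select_numeric(frame):
    """Illustrative helper: keep only the continuous columns."""
    return frame[numeric_cols]


def select_categorical(frame):
    """Illustrative helper: keep only the categorical columns."""
    return frame[categorical_cols]


# Each branch picks its columns and transforms them; FeatureUnion then
# stacks the branch outputs side by side into one feature matrix.
features = FeatureUnion([
    ("numeric", make_pipeline(
        FunctionTransformer(select_numeric, validate=False),
        StandardScaler(),
    )),
    ("categorical", make_pipeline(
        FunctionTransformer(select_categorical, validate=False),
        OneHotEncoder(handle_unknown="ignore"),
    )),
])

# 2 scaled numeric columns + 2 one-hot columns for sex + 3 for embarked.
X = features.fit_transform(df)

# The same union can sit in front of an estimator, so the DataFrame
# goes in directly and the encoding is refit inside the pipeline.
model = make_pipeline(features, LogisticRegression())
model.fit(df, y)
predictions = model.predict(df)
```

The point is only that the categorical encoding lives inside the pipeline instead of being done by hand in numpy beforehand, so the same steps get applied to training and test data.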
Kyle

On Mon, Oct 5, 2015 at 2:25 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>
>
> On 09/30/2015 05:53 PM, KAB wrote:
>> s. And this is due to the special way scikit-learn requires the data
>> to be presented to its objects. Last time I checked (I really don't
>> know if there has been any change since then) one had to do some
>> wrangling with pandas' data frames, however subtle that might be, to
>> get scikit-learn to understand them. And there was quite an effort to
>> be done regarding how to encode categorical factors and how to
>> represent them in a fashion that scikit-learn understands.
> The part about categorical variables is true and not covered in the docs
> as well as I'd like to.
> Having all the features be continuous is the only requirement, though.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general