I did a piece of that in the Titanic examples from the SciPy tutorial,
but it could definitely use a more thorough and clearer example. My
version could probably be streamlined: much of the preprocessing was
done in plain numpy, and I am 90% sure there is a more "sklearn
approved" way to do it using FeatureUnion, etc.
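
For reference, a minimal sketch of the categorical-encoding step being
discussed. It assumes pandas' get_dummies for the one-hot encoding; it
is not the code from the tutorial, and the rows and column names below
are made up for illustration.

```python
import pandas as pd

# Invented rows in the spirit of the Titanic data; not the real dataset.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", "S", "Q"],
})

# One-hot encode the categorical columns; numeric columns pass through
# unchanged and stay at the front of the frame.
encoded = pd.get_dummies(df, columns=["sex", "embarked"])

# Convert to a plain float array, the all-numeric form that
# scikit-learn estimators expect for X.
X = encoded.to_numpy(dtype=float)
feature_names = list(encoded.columns)
```

Each category becomes its own 0/1 column, so X can be handed to an
estimator's fit() along with the target, and feature_names records
which column is which.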

Kyle

On Mon, Oct 5, 2015 at 2:25 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>
>
> On 09/30/2015 05:53 PM, KAB wrote:
>> s. And this is due to the special way scikit-learn requires the data
>> to be presented to its objects. Last time I checked (I really don't
>> know if there has been any change since then) one had to do some
>> wrangling with pandas' data frames, however subtle that might be, to
>> get scikit-learn to understand them. And there was quite an effort to
>> be done regarding how to encode categorical factors and how to
>> represent them in a fashion that scikit-learn understands.
> The part about categorical variables is true, and it is not covered in
> the docs as well as I'd like.
> Having all the features be continuous is the only requirement, though.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
