2013/8/20 abhishek <[email protected]>:
> hi,
>
> how does sklearn handle datasets with mixed attributes, e.g. datasets
> which have numbers as some features and sentences as others?

All models in scikit-learn assume a homogeneous input data
representation: the input data is always a matrix (possibly sparse) of
real-valued features with shape (n_samples, n_features).

You have to write your own feature extraction logic to produce a data
representation that meets this assumption. How to do that is
problem-specific. We provide some building blocks for common tasks here:

  http://scikit-learn.org/stable/modules/feature_extraction.html
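As a minimal sketch of what this can look like for numbers plus
sentences (the toy data and variable names below are made up for
illustration, and stacking the blocks with scipy.sparse.hstack is only
one possible way to do it):

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy dataset: one sentence and two numbers per sample.
sentences = ["the cat sat", "the dog barked", "a cat and a dog"]
numbers = np.array([[1.0, 0.5], [2.0, 0.1], [3.0, 0.9]])

# Turn the sentences into a sparse bag-of-words matrix.
vectorizer = CountVectorizer()
X_text = vectorizer.fit_transform(sentences)

# Stack the text block and the numeric block side by side, giving a
# single sparse matrix of shape (n_samples, n_features).
X = sparse.hstack([X_text, sparse.csr_matrix(numbers)]).tocsr()
print(X.shape)
```

The resulting X can be passed to any estimator that accepts sparse
input. In practice the numeric columns may need scaling so that they
are comparable to the text features.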

You can also use the FeatureUnion tool to combine representations:

  http://scikit-learn.org/stable/modules/pipeline.html#featureunion-combining-feature-extractors
  http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html
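A rough sketch of FeatureUnion on mixed data, assuming a recent
scikit-learn release that ships FunctionTransformer. The records, the
field names ("text", "price", "rating") and the two helper functions
are hypothetical. FeatureUnion feeds the same input to every branch,
so each branch first selects the part it needs:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical samples mixing a sentence with numeric attributes.
records = [
    {"text": "great product, works well", "price": 10.0, "rating": 4.5},
    {"text": "stopped working after a week", "price": 25.0, "rating": 1.0},
    {"text": "decent value for the price", "price": 15.0, "rating": 3.5},
]

def get_text(rows):
    # Select the sentence of each sample.
    return [row["text"] for row in rows]

def get_numbers(rows):
    # Select the numeric attributes as an (n_samples, 2) array.
    return np.array([[row["price"], row["rating"]] for row in rows])

union = FeatureUnion([
    # Text branch: pick the sentences, then compute TF-IDF features.
    ("text", make_pipeline(
        FunctionTransformer(get_text, validate=False),
        TfidfVectorizer())),
    # Numeric branch: pick the numbers, then standardize them.
    ("numeric", make_pipeline(
        FunctionTransformer(get_numbers, validate=False),
        StandardScaler())),
])

# The branch outputs are concatenated column-wise into one matrix.
X = union.fit_transform(records)
print(X.shape)
```

The union can itself be used as the first step of a Pipeline that
ends with a classifier or regressor.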

Note that if you need to model some structure in the prediction target
itself (for instance, predicting a sequence of labels), you might
prefer another library such as pystruct:
http://pystruct.github.io/

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
