2013/8/20 abhishek <[email protected]>:
> hi,
>
> how does sklearn handle datasets with mixed attributes. e.g. the datasets
> which have numbers as some features and sentences as others?

All models in scikit-learn assume a homogeneous input data representation:
the input data is always a matrix (possibly sparse) of real-valued features
with shape (n_samples, n_features).

You have to write your own feature extraction logic to produce a data
representation that meets this assumption. How to do that is problem
specific. We provide some building blocks for common tasks here:

http://scikit-learn.org/stable/modules/feature_extraction.html

You can also use the FeatureUnion tool to combine representations:

http://scikit-learn.org/stable/modules/pipeline.html#featureunion-combining-feature-extractors
http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html

Note that if you have to model some structure in the prediction target
itself (for instance, predicting a sequence of labels), you might prefer
another library such as pystruct:

http://pystruct.github.io/

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
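
For illustration, here is a minimal sketch of how FeatureUnion might be used to combine a text representation with a numeric one. It is only one possible arrangement, not the canonical recipe. The toy data, the "text" and "nums" keys, and the helper functions get_text and get_numbers are all hypothetical and chosen for this example. It also assumes a scikit-learn version that ships FunctionTransformer.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical toy data: each sample mixes one sentence with two numbers.
samples = [
    {"text": "fast delivery and great quality", "nums": [4.5, 12.0]},
    {"text": "terrible support, slow delivery", "nums": [1.0, 3.0]},
    {"text": "great value, works well", "nums": [5.0, 20.0]},
    {"text": "broken on arrival, terrible", "nums": [1.5, 1.0]},
]
labels = [1, 0, 1, 0]


def get_text(rows):
    """Return the sentence of every sample as a list of strings."""
    return [row["text"] for row in rows]


def get_numbers(rows):
    """Return the numeric attributes as a dense (n_samples, 2) array."""
    return np.asarray([row["nums"] for row in rows], dtype=float)


# Each branch turns one kind of attribute into real-valued columns;
# FeatureUnion stacks the branch outputs side by side into one matrix.
features = FeatureUnion([
    ("text", Pipeline([
        ("select", FunctionTransformer(get_text, validate=False)),
        ("tfidf", TfidfVectorizer()),
    ])),
    ("numbers", Pipeline([
        ("select", FunctionTransformer(get_numbers, validate=False)),
        ("scale", StandardScaler()),
    ])),
])

# The combined matrix has shape (n_samples, n_text_features + 2).
X = features.fit_transform(samples)
print(X.shape)

# The union can be used as the first step of a model pipeline.
model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(samples, labels)
predictions = model.predict(samples)
print(predictions)
```

The result is the homogeneous (n_samples, n_features) matrix described above: the TF-IDF columns come first, followed by the two scaled numeric columns. Because one branch yields a sparse matrix, the combined output is sparse as well.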
