2014-06-18 19:05 GMT+02:00 Abijith Kp <[email protected]>:
> I would like to use a Json file from which I take the dataset for my
> clustering algorithm. The Json would be in the form of a nested dictionary.
> It would be great if someone could show me the correct direction regarding
> how to load this file (for usage purpose) using scikit-learn module, to use
> the resulting object in any clustering algorithm, preferably mean shift and
> k-means.

Try sklearn.feature_extraction.DictVectorizer. You'll have to flatten your
dictionaries first, so that they're string->string or string->number
mappings (see the docstring for details). Basic usage, for clustering, is

    v = DictVectorizer()
    X = v.fit_transform(flatten(x) for x in your_data_points)

where flatten is some data-dependent function that you'll have to write.

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
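
For illustration, here is a minimal sketch of the approach suggested in the reply, under several assumptions. The flatten helper below is only one possible implementation (it joins nested keys with dots); the sample records, the variable raw, the choice of two clusters, and the mean shift bandwidth of 2.0 are all made up for the example and are not from the original thread. DictVectorizer is created with sparse=False here because MeanShift expects a dense array.

```python
import json

from sklearn.cluster import KMeans, MeanShift
from sklearn.feature_extraction import DictVectorizer


def flatten(d, parent_key="", sep="."):
    """Collapse a nested dict into a one-level dict with dotted keys.

    Hypothetical helper: the right flattening depends on your data.
    """
    flat = {}
    for key, value in d.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the key prefix
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Made-up records standing in for the contents of the JSON file;
# with a real file you would use json.load(open_file_object) instead.
raw = """
[
  {"size": {"w": 1.0, "h": 2.0}, "kind": "a"},
  {"size": {"w": 1.1, "h": 2.1}, "kind": "a"},
  {"size": {"w": 8.0, "h": 9.0}, "kind": "b"},
  {"size": {"w": 8.2, "h": 9.1}, "kind": "b"}
]
"""
your_data_points = json.loads(raw)

# String values become one-hot columns ("kind=a"), numbers are kept as-is
v = DictVectorizer(sparse=False)
X = v.fit_transform([flatten(x) for x in your_data_points])

# n_clusters and bandwidth are assumptions chosen for this toy data
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
mean_shift_labels = MeanShift(bandwidth=2.0).fit_predict(X)

print(v.feature_names_)
print(kmeans_labels, mean_shift_labels)
```

In practice the features would probably need scaling before clustering, since both k-means and mean shift are distance-based and the one-hot columns share a space with the numeric ones.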
