2014-06-18 19:05 GMT+02:00 Abijith Kp <[email protected]>:
> I would like to use a JSON file as the source of the dataset for my
> clustering algorithm. The JSON is in the form of a nested dictionary.
> It would be great if someone could point me in the right direction on
> how to load this file with scikit-learn, so that I can use the
> resulting object in a clustering algorithm, preferably mean shift or
> k-means.

Try sklearn.feature_extraction.DictVectorizer. You'll have to flatten
your dictionaries first, so that they're string->string or
string->number mappings (see the docstring for details). Basic usage,
for clustering, is

from sklearn.feature_extraction import DictVectorizer

v = DictVectorizer()
X = v.fit_transform(flatten(x) for x in your_data_points)

where flatten is some data-dependent function that you'll have to write.
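
As a rough sketch of what such a flatten function might look like, the
example below joins nested keys with dots and then feeds the result to
k-means. The sample JSON, the dotted-key scheme, and the choice of two
clusters are assumptions made for illustration; your data will likely
need its own handling (lists, missing values, feature scaling).

```python
import json

from sklearn.cluster import KMeans
from sklearn.feature_extraction import DictVectorizer


def flatten(d, parent_key="", sep="."):
    """Flatten a nested dict into a single-level dict with dotted keys.

    Illustrative helper, not part of scikit-learn. Only nested dicts are
    expanded; any other value is kept as-is.
    """
    flat = {}
    for key, value in d.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical JSON input: one nested dictionary per data point.
raw = """
[
  {"size": {"w": 1.0, "h": 2.0}, "color": "red"},
  {"size": {"w": 1.1, "h": 2.1}, "color": "red"},
  {"size": {"w": 8.0, "h": 9.0}, "color": "blue"},
  {"size": {"w": 8.2, "h": 9.1}, "color": "blue"}
]
"""
your_data_points = json.loads(raw)

# String values are one-hot encoded; numeric values are passed through.
v = DictVectorizer(sparse=False)
X = v.fit_transform([flatten(x) for x in your_data_points])

# n_clusters=2 is an assumption that fits this toy data set.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
```

sklearn.cluster.MeanShift exposes the same fit_predict interface, so it
can be tried in place of KMeans. On a data set this small you would
probably want to pass an explicit bandwidth.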

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
