Re: [Scikit-learn-general] Using json files as dataset for clustering

Lars Buitinck Wed, 18 Jun 2014 10:57:12 -0700

2014-06-18 19:05 GMT+02:00 Abijith Kp <[email protected]>:
> I would like to use a JSON file as the dataset for my clustering
> algorithm. The JSON would be in the form of a nested dictionary.
> It would be great if someone could point me in the right direction on
> how to load this file with scikit-learn, so that I can use the
> resulting object in any clustering algorithm, preferably mean shift
> and k-means.


Try sklearn.feature_extraction.DictVectorizer. You'll have to flatten
your dictionaries first, so that they're string->string or
string->number mappings (see the docstring for details). Basic usage,
for clustering, is

from sklearn.feature_extraction import DictVectorizer

v = DictVectorizer()
X = v.fit_transform(flatten(x) for x in your_data_points)

where flatten is some data-dependent function that you'll have to write.
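
As a rough illustration, here is one way flatten could look, together with
the clustering step. This is only a sketch under assumptions: the records,
the field names and the dotted-key convention are invented for the example,
and it assumes every leaf value is a string or a number (lists would need
their own handling). KMeans stands in for whichever estimator you end up
using.

```python
import json
from collections.abc import Mapping

from sklearn.cluster import KMeans
from sklearn.feature_extraction import DictVectorizer


def flatten(d, parent_key="", sep="."):
    """Flatten a nested dict into one level, joining keys with `sep`."""
    flat = {}
    for key, value in d.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, Mapping):
            # Recurse into nested dictionaries, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Leaf value: assumed to be a string or a number.
            flat[full_key] = value
    return flat


# Hypothetical JSON content; in practice this would come from json.load(f).
raw = """
[
  {"user": {"age": 25, "country": "IN"}, "visits": 3},
  {"user": {"age": 27, "country": "IN"}, "visits": 4},
  {"user": {"age": 61, "country": "DE"}, "visits": 40},
  {"user": {"age": 58, "country": "DE"}, "visits": 35}
]
"""
records = json.loads(raw)

# sparse=False returns a dense NumPy array, which is fine for small data.
v = DictVectorizer(sparse=False)
X = v.fit_transform(flatten(r) for r in records)

# String values are one-hot encoded, e.g. "user.country=IN".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

The resulting columns are on very different scales (ages, counts, 0/1
indicators), so you will probably want to scale them before clustering.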

