Hi Andrew,

I am invoking the CanopyDriver class to perform the clustering. I can see the results when the output format is TEXT or CSV. However, when I use JSON, I get the exception I mentioned earlier.
On Wed, Jun 18, 2014 at 10:32 PM, Andrew Musselman <[email protected]> wrote:

> Kamesh, can you please describe the schema of your input data, along with
> your command to perform the clustering?
>
> On Mon, Jun 16, 2014 at 12:44 AM, Kamesh <[email protected]> wrote:
>
> > Thanks for the response Andrew. I am using Mahout 0.9 version. However, I
> > tried with trunk version but still I am getting output in the following
> > format
> >
> > C-55{n=1 c=[15993058.000] r=[]}
> > C-56{n=2 c=[15993061.167] r=[]}
> > C-57{n=1 c=[15993062.000] r=[]}
> >
> > C-97{n=1 c=[15993103.000] r=[]}
> > C-98{n=2 c=[15993119.333] r=[0.395]}
> > C-99{n=1 c=[15993105.000] r=[]}
> >
> > and hence, not able to figure out the data points inside each cluster.
> >
> > Also, When I am running with "-of JSON" getting NPE
> >
> > Exception in thread "main" java.lang.NullPointerException
> >     at org.apache.mahout.utils.clustering.JsonClusterWriter.getTopFeaturesList(JsonClusterWriter.java:118)
> >     at org.apache.mahout.utils.clustering.JsonClusterWriter.write(JsonClusterWriter.java:73)
> >     at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:115)
> >     at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:102)
> >
> > I am executing cluster dump using the following way
> >
> > hadoop jar mahout-integration-1.0-SNAPSHOT.jar
> > org.apache.mahout.utils.clustering.ClusterDumper -i
> > /canopy/clusters-0-final -p /canopy/clusteredPoints -of JSON -n 1000
> >
> > Also I have observed that the *part* file created inside *clusteredPoints*
> > is empty.
> >
> > Please help me how to get data points from each cluster.
> >
> > On Fri, Jun 13, 2014 at 9:24 PM, Andrew Musselman <[email protected]> wrote:
> >
> > > That's going to be easier if you can work off of trunk, since the output of
> > > clustering has been cleaned up to write a better format, per
> > > https://issues.apache.org/jira/browse/MAHOUT-1505
> > >
> > > E.g.,
> > >
> > > {
> > >   "top_terms": [
> > >     {"all":3.0149030685424805},
> > >     {"english":3.0149030685424805},
> > >     {"best":3.0149030685424805},
> > >     {"spaniel":3.0149030685424805},
> > >     {"springer":3.0149030685424805},
> > >     {"dogs":1.9162907600402832}
> > >   ],
> > >   "cluster_id": 7,
> > >   "cluster": {
> > >     "r": [],
> > >     "c": [
> > >       {"all":3.015},
> > >       {"best":3.015},
> > >       {"dogs":1.916},
> > >       {"english":3.015},
> > >       {"spaniel":3.015},
> > >       {"springer":3.015}
> > >     ],
> > >     "n": 1,
> > >     "identifier": "C-7"
> > >   },
> > >   "points": [
> > >     {
> > >       "point": [
> > >         {"all":3.015},
> > >         {"best":3.015},
> > >         {"dogs":1.916},
> > >         {"english":3.015},
> > >         {"spaniel":3.015},
> > >         {"springer":3.015}
> > >       ],
> > >       "vector_name": "P(14)",
> > >       "weight": "1.0"
> > >     }
> > >   ]
> > > }
> > >
> > > On Fri, Jun 13, 2014 at 2:42 AM, Kamesh <[email protected]> wrote:
> > >
> > > > Hi All,
> > > > Please help me in getting the data points inside each cluster.
> > > > The output of the clustering algorithm is center of the cluster and radius
> > > > of the cluster. How do we derive actual data points inside each cluster
> > > > from this output.
> > > >
> > > > --
> > > > Kamesh.
> >
> > --
> > Kamesh.
>

--
Kamesh.
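
For reference, the text-format lines quoted in this thread (for example `C-98{n=2 c=[15993119.333] r=[0.395]}`) carry only a cluster identifier, a count `n`, a center `c` and a radius `r`, which is why the member points cannot be read from them. Below is a minimal Java sketch for pulling those fields apart. `ClusterDumpLine` is a hypothetical helper, not a Mahout class, and the pattern assumes exactly the dense, comma-separated layout shown in the quoted output; other vector renderings would need a different pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Mahout): parses one text-format dump line
// of the shape shown in this thread, e.g. C-98{n=2 c=[15993119.333] r=[0.395]}
class ClusterDumpLine {
    // Assumed layout: <id>{n=<int> c=[<doubles>] r=[<doubles>]}
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+?)\\{n=(\\d+) c=\\[([^\\]]*)\\] r=\\[([^\\]]*)\\]\\}$");

    final String id;            // cluster identifier, e.g. "C-98"
    final int n;                // value of the n field
    final List<Double> center;  // values inside c=[...]
    final List<Double> radius;  // values inside r=[...], possibly empty

    ClusterDumpLine(String id, int n, List<Double> center, List<Double> radius) {
        this.id = id;
        this.n = n;
        this.center = center;
        this.radius = radius;
    }

    static ClusterDumpLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized dump line: " + line);
        }
        return new ClusterDumpLine(
            m.group(1),
            Integer.parseInt(m.group(2)),
            parseList(m.group(3)),
            parseList(m.group(4)));
    }

    // Splits a comma-separated list of numbers; an empty body yields an empty list.
    private static List<Double> parseList(String body) {
        List<Double> values = new ArrayList<>();
        for (String part : body.split(",")) {
            String token = part.trim();
            if (!token.isEmpty()) {
                values.add(Double.parseDouble(token));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        ClusterDumpLine c = parse("C-98{n=2 c=[15993119.333] r=[0.395]}");
        System.out.println(c.id + " n=" + c.n + " center=" + c.center + " radius=" + c.radius);
    }
}
```

This only recovers the per-cluster summary. The member points themselves would have to come from the clusteredPoints output, which the thread reports as empty in this run.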
