Interesting; could be a bug, I'll take a look.
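
Once the JSON dump works, something like the following could list the points in each cluster. This is only a rough sketch, not Mahout code. It assumes the dump has been copied to a local file with one JSON object per line, shaped like the example quoted further down: a "cluster" object with an "identifier", and a "points" list whose entries carry a "vector_name". The function name points_by_cluster and the file name clusters.json are made up for illustration.

```python
import json
from collections import defaultdict


def points_by_cluster(lines):
    """Group point names by cluster identifier.

    Assumption: each non-empty line is one JSON object shaped like the
    example in this thread ("cluster" -> "identifier", "points" -> list of
    objects with "vector_name"). This is not a Mahout API.
    """
    members = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        cluster_name = record["cluster"]["identifier"]
        # Touch the key so clusters without any points still show up.
        names = members[cluster_name]
        for point in record.get("points") or []:
            names.append(point["vector_name"])
    return dict(members)


if __name__ == "__main__":
    # "clusters.json" is a hypothetical local copy of the dump output.
    with open("clusters.json", encoding="utf-8") as handle:
        for cluster_name, names in points_by_cluster(handle).items():
            print(cluster_name, len(names), names)
```

If the "points" list comes back empty for every cluster, that would match the empty part file under clusteredPoints reported below, so the clustering step itself is probably the place to look.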

On Tue, Jun 17, 2014 at 10:38 AM, Han Fan <[email protected]> wrote:

> Is this command line what you need? (Replace /user/root/testdataout with
> your output directory)
> $ mahout seqdumper -i /user/root/testdataout/data/part-m-00000
> Key: 9: Value: {0:1.0,2:-0.956,1:-0.213,5:0.091,3:-0.003,7:-0.024,6:0.017,8:1.0,4:0.056}
> Key: 9: Value: {0:1.0,2:2.129,1:3.147,5:-0.063,3:-0.006,7:0.109,6:-0.002,4:-0.056}
> Key: 9: Value: {0:1.0,2:-2.718,1:-2.165,5:-0.103,3:-0.008,7:-0.024,6:-0.156,8:1.0,4:0.043}
> ...
>
> Sorry if I misunderstand.
>
> On 16/6/14 3:44 pm, Kamesh wrote:
>
>> Thanks for the response Andrew. I am using Mahout 0.9 version. However, I
>> tried with trunk version but still I am getting output in the following
>> format
>>
>> C-55{n=1 c=[15993058.000] r=[]}
>> C-56{n=2 c=[15993061.167] r=[]}
>> C-57{n=1 c=[15993062.000] r=[]}
>>
>> C-97{n=1 c=[15993103.000] r=[]}
>> C-98{n=2 c=[15993119.333] r=[0.395]}
>> C-99{n=1 c=[15993105.000] r=[]}
>>
>> and hence, not able to figure out the data points inside each cluster.
>>
>> Also, When I am running with "-of JSON" getting NPE
>>
>> Exception in thread "main" java.lang.NullPointerException
>>     at org.apache.mahout.utils.clustering.JsonClusterWriter.getTopFeaturesList(JsonClusterWriter.java:118)
>>     at org.apache.mahout.utils.clustering.JsonClusterWriter.write(JsonClusterWriter.java:73)
>>     at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:115)
>>     at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:102)
>>
>> I am executing cluster dump using the following way
>>
>> hadoop jar mahout-integration-1.0-SNAPSHOT.jar org.apache.mahout.utils.clustering.ClusterDumper -i /canopy/clusters-0-final -p /canopy/clusteredPoints -of JSON -n 1000
>>
>> Also I have observed that the *part* file created inside *clusteredPoints*
>> is empty.
>>
>> Please help me how to get data points from each cluster.
>>
>> On Fri, Jun 13, 2014 at 9:24 PM, Andrew Musselman <[email protected]> wrote:
>>
>>> That's going to be easier if you can work off of trunk, since the output of
>>> clustering has been cleaned up to write a better format, per
>>> https://issues.apache.org/jira/browse/MAHOUT-1505
>>>
>>> E.g.,
>>>
>>> {
>>>   "top_terms": [
>>>     {"all":3.0149030685424805},
>>>     {"english":3.0149030685424805},
>>>     {"best":3.0149030685424805},
>>>     {"spaniel":3.0149030685424805},
>>>     {"springer":3.0149030685424805},
>>>     {"dogs":1.9162907600402832}
>>>   ],
>>>   "cluster_id": 7,
>>>   "cluster": {
>>>     "r": [],
>>>     "c": [
>>>       {"all":3.015},
>>>       {"best":3.015},
>>>       {"dogs":1.916},
>>>       {"english":3.015},
>>>       {"spaniel":3.015},
>>>       {"springer":3.015}
>>>     ],
>>>     "n": 1,
>>>     "identifier": "C-7"
>>>   },
>>>   "points": [
>>>     {
>>>       "point": [
>>>         {"all":3.015},
>>>         {"best":3.015},
>>>         {"dogs":1.916},
>>>         {"english":3.015},
>>>         {"spaniel":3.015},
>>>         {"springer":3.015}
>>>       ],
>>>       "vector_name": "P(14)",
>>>       "weight": "1.0"
>>>     }
>>>   ]
>>> }
>>>
>>> On Fri, Jun 13, 2014 at 2:42 AM, Kamesh <[email protected]> wrote:
>>>
>>>> Hi All,
>>>> Please help me in getting the data points inside each cluster.
>>>> The output of the clustering algorithm is center of the cluster and radius
>>>> of the cluster. How do we derive actual data points inside each cluster
>>>> from this output.
>>>>
>>>> --
>>>> Kamesh.
