Thanks for the response, Andrew. I am using Mahout 0.9. I also tried the
trunk version, but I am still getting output in the following format:
C-55{n=1 c=[15993058.000] r=[]}
C-56{n=2 c=[15993061.167] r=[]}
C-57{n=1 c=[15993062.000] r=[]}
C-97{n=1 c=[15993103.000] r=[]}
C-98{n=2 c=[15993119.333] r=[0.395]}
C-99{n=1 c=[15993105.000] r=[]}
so I am not able to figure out which data points are inside each cluster.
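Each line of the text dump above appears to follow the pattern `C-<id>{n=<count> c=[<center>] r=[<radius>]}`, where `n` is the number of observations. The dump carries only this summary and not the member points themselves. Below is a minimal Java sketch that pulls those fields out, under some assumptions. It expects dense, comma-separated values as in the lines above. Sparse `index:value` entries are not handled, and the class name `DumpLineParser` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses one line of the text-format cluster dump, e.g. "C-98{n=2 c=[15993119.333] r=[0.395]}". */
public class DumpLineParser {

    /** Parsed fields of one dumped cluster; radius is empty when the dump shows r=[]. */
    public static final class ClusterSummary {
        public final String id;
        public final int n;
        public final double[] center;
        public final double[] radius;

        ClusterSummary(String id, int n, double[] center, double[] radius) {
            this.id = id;
            this.n = n;
            this.center = center;
            this.radius = radius;
        }
    }

    // Assumed layout: <id>{n=<count> c=[<comma-separated values>] r=[<comma-separated values>]}
    private static final Pattern LINE =
        Pattern.compile("^([^{\\s]+)\\{n=(\\d+) c=\\[([^\\]]*)\\] r=\\[([^\\]]*)\\]\\}$");

    /** Turns "1.0, 2.5" into {1.0, 2.5}; an empty string yields an empty array. */
    static double[] parseValues(String csv) {
        if (csv.trim().isEmpty()) {
            return new double[0];
        }
        String[] parts = csv.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    public static ClusterSummary parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized dump line: " + line);
        }
        return new ClusterSummary(
            m.group(1),
            Integer.parseInt(m.group(2)),
            parseValues(m.group(3)),
            parseValues(m.group(4)));
    }

    public static void main(String[] args) {
        ClusterSummary s = parse("C-98{n=2 c=[15993119.333] r=[0.395]}");
        System.out.println(s.id + " n=" + s.n + " center=" + s.center[0] + " radius=" + s.radius[0]);
    }
}
```

Parsing the summary this way only recovers the center, radius and count. Identifying the member points still requires either the classified output or a separate assignment step.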
Also, when I run with "-of JSON", I get an NPE:
Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.utils.clustering.JsonClusterWriter.getTopFeaturesList(JsonClusterWriter.java:118)
    at org.apache.mahout.utils.clustering.JsonClusterWriter.write(JsonClusterWriter.java:73)
    at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:115)
    at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:102)
I am running the cluster dump as follows:
hadoop jar mahout-integration-1.0-SNAPSHOT.jar \
  org.apache.mahout.utils.clustering.ClusterDumper \
  -i /canopy/clusters-0-final -p /canopy/clusteredPoints -of JSON -n 1000
I have also observed that the *part* file created inside *clusteredPoints*
is empty.
Please help me get the data points for each cluster.
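One way to recover cluster members from the centers alone is to assign every input point to its nearest center. This is a different technique from Mahout's own classification step and may not reproduce its assignments. Below is a minimal Java sketch of that nearest-center assignment, with several assumptions. It uses squared Euclidean distance and expects the original input vectors to be available in memory as `double[]`. The class name `NearestCenterAssigner` and the sample points in `main` are hypothetical; only the center values are taken from the dump above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Assigns each point to the cluster whose center is nearest (squared Euclidean distance). */
public class NearestCenterAssigner {

    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns clusterId -> member points, keeping the iteration order of the centers map. */
    public static Map<String, List<double[]>> assign(Map<String, double[]> centers, List<double[]> points) {
        if (centers.isEmpty()) {
            throw new IllegalArgumentException("No centers given");
        }
        Map<String, List<double[]>> members = new LinkedHashMap<>();
        for (String id : centers.keySet()) {
            members.put(id, new ArrayList<>());
        }
        for (double[] p : points) {
            String best = null;
            double bestDist = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, double[]> e : centers.entrySet()) {
                double d = squaredDistance(p, e.getValue());
                if (d < bestDist) {
                    bestDist = d;
                    best = e.getKey();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("Point has no finite distance to any center (NaN?)");
            }
            members.get(best).add(p);
        }
        return members;
    }

    public static void main(String[] args) {
        // Centers copied from the 1-D dump above; the points are hypothetical inputs.
        Map<String, double[]> centers = new LinkedHashMap<>();
        centers.put("C-55", new double[] {15993058.0});
        centers.put("C-56", new double[] {15993061.167});
        centers.put("C-57", new double[] {15993062.0});
        List<double[]> points = List.of(
            new double[] {15993058.0},
            new double[] {15993060.0},
            new double[] {15993062.5});
        assign(centers, points).forEach((id, pts) -> {
            StringBuilder sb = new StringBuilder(id + ":");
            for (double[] p : pts) {
                sb.append(' ').append(p[0]);
            }
            System.out.println(sb);
        });
    }
}
```

A linear scan over all centers keeps the sketch simple. With many clusters or high-dimensional data, an index structure or a distributed job would be more appropriate.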
On Fri, Jun 13, 2014 at 9:24 PM, Andrew Musselman wrote:
> That's going to be easier if you can work off of trunk, since the output of
> clustering has been cleaned up to write a better format, per
> https://issues.apache.org/jira/browse/MAHOUT-1505
>
> E.g.,
>
> {
> "top_terms": [
> {"all":3.0149030685424805},
> {"english":3.0149030685424805},
> {"best":3.0149030685424805},
> {"spaniel":3.0149030685424805},
> {"springer":3.0149030685424805},
> {"dogs":1.9162907600402832}
> ],
> "cluster_id": 7,
> "cluster": {
> "r": [],
> "c": [
> {"all":3.015},
> {"best":3.015},
> {"dogs":1.916},
> {"english":3.015},
> {"spaniel":3.015},
> {"springer":3.015}
> ],
> "n": 1,
> "identifier": "C-7"
> },
> "points": [
> {
> "point": [
> {"all":3.015},
> {"best":3.015},
> {"dogs":1.916},
> {"english":3.015},
> {"spaniel":3.015},
> {"springer":3.015}
> ],
> "vector_name": "P(14)",
> "weight": "1.0"
> }
> ]
> }
>
>
> On Fri, Jun 13, 2014 at 2:42 AM, Kamesh wrote:
>
> > Hi All,
> > Please help me get the data points inside each cluster.
> > The output of the clustering algorithm is the center and radius of each
> > cluster. How do we derive the actual data points inside each cluster
> > from this output?
> >
> > --
> > Kamesh.
> >
>
--
Kamesh.