Kamesh, can you please describe the schema of your input data, along with
your command to perform the clustering?


On Mon, Jun 16, 2014 at 12:44 AM, Kamesh <[email protected]> wrote:

> Thanks for the response, Andrew. I am using Mahout 0.9. However, I also
> tried the trunk version, but I am still getting output in the following
> format:
>
> C-55{n=1 c=[15993058.000] r=[]}
> C-56{n=2 c=[15993061.167] r=[]}
> C-57{n=1 c=[15993062.000] r=[]}
>
> C-97{n=1 c=[15993103.000] r=[]}
> C-98{n=2 c=[15993119.333] r=[0.395]}
> C-99{n=1 c=[15993105.000] r=[]}
>
> so I am not able to figure out which data points are inside each cluster.
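Each summary line above carries only an identifier, a point count n, a center c and a radius r; nothing in it says which input points belong to the cluster. A minimal Python sketch that parses such lines makes this concrete. It assumes dense vectors printed as comma-separated numbers (sparse "index:value" vectors would need a different parser), and `parse_cluster_line` is a hypothetical helper, not part of Mahout:

```python
import re

# Matches one summary line such as "C-98{n=2 c=[15993119.333] r=[0.395]}".
# Assumption: dense vectors, written as comma-separated plain numbers.
LINE_RE = re.compile(
    r"^(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+) c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}$"
)


def parse_numbers(text):
    """Turn '1.0, 2.5' into [1.0, 2.5]; an empty string becomes []."""
    return [float(x) for x in text.split(",") if x.strip()]


def parse_cluster_line(line):
    """Return (identifier, n, center, radius) for one summary line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a cluster summary line: {line!r}")
    return (
        m.group("id"),
        int(m.group("n")),
        parse_numbers(m.group("c")),
        parse_numbers(m.group("r")),
    )


if __name__ == "__main__":
    for line in ["C-56{n=2 c=[15993061.167] r=[]}", "C-98{n=2 c=[15993119.333] r=[0.395]}"]:
        # Only id, count, center and radius come back: no point membership.
        print(parse_cluster_line(line))
```

Membership has to come from somewhere else, such as the clusteredPoints output.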
>
> Also, when I run with "-of JSON", I get an NPE:
>
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.mahout.utils.clustering.JsonClusterWriter.getTopFeaturesList(JsonClusterWriter.java:118)
>     at org.apache.mahout.utils.clustering.JsonClusterWriter.write(JsonClusterWriter.java:73)
>     at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:115)
>     at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:102)
>
> I am running the cluster dump as follows:
>
> hadoop jar mahout-integration-1.0-SNAPSHOT.jar \
>     org.apache.mahout.utils.clustering.ClusterDumper \
>     -i /canopy/clusters-0-final -p /canopy/clusteredPoints -of JSON -n 1000
>
> Also, I have observed that the *part* file created inside *clusteredPoints*
> is empty.
>
> Please help me figure out how to get the data points from each cluster.
>
>
> On Fri, Jun 13, 2014 at 9:24 PM, Andrew Musselman <
> [email protected]> wrote:
>
> > That's going to be easier if you can work off of trunk, since the output of
> > clustering has been cleaned up to write a better format, per
> > https://issues.apache.org/jira/browse/MAHOUT-1505
> >
> > E.g.,
> >
> > {
> >   "top_terms": [
> >     {"all":3.0149030685424805},
> >     {"english":3.0149030685424805},
> >     {"best":3.0149030685424805},
> >     {"spaniel":3.0149030685424805},
> >     {"springer":3.0149030685424805},
> >     {"dogs":1.9162907600402832}
> >   ],
> >   "cluster_id": 7,
> >   "cluster": {
> >     "r": [],
> >     "c": [
> >       {"all":3.015},
> >       {"best":3.015},
> >       {"dogs":1.916},
> >       {"english":3.015},
> >       {"spaniel":3.015},
> >       {"springer":3.015}
> >     ],
> >     "n": 1,
> >     "identifier": "C-7"
> >   },
> >   "points": [
> >     {
> >       "point": [
> >         {"all":3.015},
> >         {"best":3.015},
> >         {"dogs":1.916},
> >         {"english":3.015},
> >         {"spaniel":3.015},
> >         {"springer":3.015}
> >       ],
> >       "vector_name": "P(14)",
> >       "weight": "1.0"
> >     }
> >   ]
> > }
> >
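With output in this shape, cluster membership is simply the "points" array of each record. A minimal Python sketch, assuming you already have a single record like the one above as a string (how several records are laid out in the dump file is not shown in this thread, so splitting them is left out). `points_in_cluster` is a hypothetical helper:

```python
import json

# One cluster record in the format shown above, abridged to a single term.
SAMPLE = """
{
  "top_terms": [{"all": 3.0149030685424805}],
  "cluster_id": 7,
  "cluster": {"r": [], "c": [{"all": 3.015}], "n": 1, "identifier": "C-7"},
  "points": [
    {"point": [{"all": 3.015}], "vector_name": "P(14)", "weight": "1.0"}
  ]
}
"""


def points_in_cluster(record_text):
    """Return (identifier, [(vector_name, weight), ...]) for one cluster record."""
    record = json.loads(record_text)
    # "weight" is written as a string in the sample, so convert it explicitly.
    members = [(p["vector_name"], float(p["weight"])) for p in record.get("points", [])]
    return record["cluster"]["identifier"], members


if __name__ == "__main__":
    print(points_in_cluster(SAMPLE))  # ('C-7', [('P(14)', 1.0)])
```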
> >
> > On Fri, Jun 13, 2014 at 2:42 AM, Kamesh <[email protected]> wrote:
> >
> > > Hi All,
> > > Please help me get the data points inside each cluster.
> > > The output of the clustering algorithm is the center and radius of each
> > > cluster. How do we derive the actual data points inside each cluster
> > > from this output?
> > >
> > > --
> > > Kamesh.
> > >
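If the points cannot be recovered from the job's own output, one rough workaround is to assign each input vector to its nearest center yourself. The following is a minimal Python sketch, assuming the centers and input vectors fit in memory and that Euclidean distance is appropriate. It is not necessarily the rule Mahout's own classification step applies (canopy in particular works with distance thresholds), and all names and point values are hypothetical:

```python
import math


def nearest_center(point, centers):
    """Return the id of the center closest to `point` by Euclidean distance."""
    return min(centers, key=lambda cid: math.dist(point, centers[cid]))


def assign_points(points, centers):
    """Group point names by the id of their nearest center."""
    clusters = {cid: [] for cid in centers}
    for name, vec in points.items():
        clusters[nearest_center(vec, centers)].append(name)
    return clusters


if __name__ == "__main__":
    # Centers taken from the clusterdump excerpt; the points are made up.
    centers = {"C-55": [15993058.0], "C-56": [15993061.167], "C-57": [15993062.0]}
    points = {"p1": [15993058.0], "p2": [15993061.0], "p3": [15993061.5], "p4": [15993062.0]}
    print(assign_points(points, centers))
```

On ties, `min` keeps the first center in dictionary order, so the result is deterministic but arbitrary for equidistant points.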
> >
>
>
>
> --
> Kamesh.
>
