Re: Interpretation of cluster output

Kamesh Fri, 20 Jun 2014 01:54:07 -0700

Hi Andrew,
 I am invoking the CanopyDriver class to perform clustering. I can see the
results when the output format is either TEXT or CSV. However, when I use
JSON, I get the exception I reported earlier (quoted below).



On Wed, Jun 18, 2014 at 10:32 PM, Andrew Musselman <
[email protected]> wrote:

> Kamesh, can you please describe the schema of your input data, along with
> your command to perform the clustering?
>
>
> On Mon, Jun 16, 2014 at 12:44 AM, Kamesh <[email protected]> wrote:
>
> > Thanks for the response, Andrew. I am using Mahout 0.9. I also tried the
> > trunk version, but I still get output in the following format:
> >
> > C-55{n=1 c=[15993058.000] r=[]}
> > C-56{n=2 c=[15993061.167] r=[]}
> > C-57{n=1 c=[15993062.000] r=[]}
> >
> > C-97{n=1 c=[15993103.000] r=[]}
> > C-98{n=2 c=[15993119.333] r=[0.395]}
> > C-99{n=1 c=[15993105.000] r=[]}
> >
> > and so I cannot tell which data points belong to each cluster.
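
For readers who want to post-process the text dump shown above, here is a minimal sketch of a parser for lines of that shape. It assumes that `n` is the number of points in the cluster, `c` the center and `r` the radius, and that the vector components are plain numbers as in the sample; the function names are made up for illustration. Each line only summarises a cluster, so this cannot recover the member points.

```python
import re

# Matches lines such as: C-98{n=2 c=[15993119.333] r=[0.395]}
CLUSTER_LINE = re.compile(
    r"^(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+) c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}$"
)

def parse_floats(text):
    """Turn '1.0, 2.0' into [1.0, 2.0]; an empty string gives []."""
    return [float(tok) for tok in text.replace(",", " ").split()]

def parse_cluster_line(line):
    """Return the fields of one dump line as a dict, or None if it does not match."""
    m = CLUSTER_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "identifier": m.group("id"),
        "n": int(m.group("n")),
        "center": parse_floats(m.group("c")),
        "radius": parse_floats(m.group("r")),
    }

print(parse_cluster_line("C-98{n=2 c=[15993119.333] r=[0.395]}"))
```

An empty `r=[]` comes back as an empty list, which is what the single-point clusters in the sample show.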
> >
> > Also, when I run with "-of JSON", I get an NPE:
> >
> > Exception in thread "main" java.lang.NullPointerException
> >   at org.apache.mahout.utils.clustering.JsonClusterWriter.getTopFeaturesList(JsonClusterWriter.java:118)
> >   at org.apache.mahout.utils.clustering.JsonClusterWriter.write(JsonClusterWriter.java:73)
> >   at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:115)
> >   at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:102)
> >
> > I am running the cluster dump as follows:
> >
> > hadoop jar mahout-integration-1.0-SNAPSHOT.jar
> > org.apache.mahout.utils.clustering.ClusterDumper -i
> > /canopy/clusters-0-final -p /canopy/clusteredPoints -of JSON -n 1000
> >
> > I have also observed that the *part* file created inside *clusteredPoints*
> > is empty.
> >
> > Please help me get the data points from each cluster.
> >
> >
> > On Fri, Jun 13, 2014 at 9:24 PM, Andrew Musselman <
> > [email protected]> wrote:
> >
> > > That's going to be easier if you can work off of trunk, since the output
> > > of clustering has been cleaned up to write a better format, per
> > > https://issues.apache.org/jira/browse/MAHOUT-1505
> > >
> > > E.g.,
> > >
> > > {
> > >   "top_terms": [
> > >     {"all":3.0149030685424805},
> > >     {"english":3.0149030685424805},
> > >     {"best":3.0149030685424805},
> > >     {"spaniel":3.0149030685424805},
> > >     {"springer":3.0149030685424805},
> > >     {"dogs":1.9162907600402832}
> > >   ],
> > >   "cluster_id": 7,
> > >   "cluster": {
> > >     "r": [],
> > >     "c": [
> > >       {"all":3.015},
> > >       {"best":3.015},
> > >       {"dogs":1.916},
> > >       {"english":3.015},
> > >       {"spaniel":3.015},
> > >       {"springer":3.015}
> > >     ],
> > >     "n": 1,
> > >     "identifier": "C-7"
> > >   },
> > >   "points": [
> > >     {
> > >       "point": [
> > >         {"all":3.015},
> > >         {"best":3.015},
> > >         {"dogs":1.916},
> > >         {"english":3.015},
> > >         {"spaniel":3.015},
> > >         {"springer":3.015}
> > >       ],
> > >       "vector_name": "P(14)",
> > >       "weight": "1.0"
> > >     }
> > >   ]
> > > }
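
As an illustration of how the JSON shape above could be consumed, the following sketch pulls the member points out of one cluster object. It assumes each cluster is a single JSON object with the `cluster`, `points`, `vector_name`, `weight` and `point` keys shown in the example; the helper names and the shortened sample are made up for illustration, and how the objects are laid out in a real dump file is not covered here.

```python
import json

def flatten(pairs):
    """Merge a list of single-key objects, e.g. [{"all": 3.015}, ...], into one dict."""
    merged = {}
    for pair in pairs:
        merged.update(pair)
    return merged

def points_in_cluster(cluster_json):
    """Return (cluster identifier, [(vector_name, weight, components), ...])."""
    record = json.loads(cluster_json)
    identifier = record["cluster"]["identifier"]
    members = [
        (p["vector_name"], float(p["weight"]), flatten(p["point"]))
        for p in record.get("points", [])
    ]
    return identifier, members

# Shortened version of the example object above.
sample = """
{"cluster_id": 7,
 "cluster": {"r": [], "c": [{"dogs": 1.916}], "n": 1, "identifier": "C-7"},
 "points": [{"point": [{"dogs": 1.916}, {"spaniel": 3.015}],
             "vector_name": "P(14)", "weight": "1.0"}]}
"""
identifier, members = points_in_cluster(sample)
print(identifier, [name for name, _, _ in members])  # prints: C-7 ['P(14)']
```

The weight is stored as a string in the example, so the sketch converts it to a float before returning it.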
> > >
> > >
> > > On Fri, Jun 13, 2014 at 2:42 AM, Kamesh <[email protected]> wrote:
> > >
> > > > Hi All,
> > > > Please help me get the data points inside each cluster.
> > > > The output of the clustering algorithm is the center and radius of
> > > > each cluster. How do we derive the actual data points inside each
> > > > cluster from this output?
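
One generic way to recover membership when only the centers are available is to assign every input point to its nearest center. The sketch below does that with Euclidean distance. It is not Mahout's own classification step, which may use a different distance measure or thresholds; the function names and the sample points are made up, and only the centers come from the text dump elsewhere in this thread. It needs Python 3.8 or later for `math.dist`.

```python
import math

def nearest_center(point, centers):
    """Return the identifier of the center closest to point (Euclidean distance)."""
    return min(centers, key=lambda cid: math.dist(point, centers[cid]))

def assign_points(points, centers):
    """Group points under the identifier of their nearest center."""
    members = {cid: [] for cid in centers}
    for point in points:
        members[nearest_center(point, centers)].append(point)
    return members

# Centers from the text dump in this thread; the points are hypothetical.
centers = {"C-55": [15993058.0], "C-56": [15993061.167], "C-57": [15993062.0]}
points = [[15993058.0], [15993061.0], [15993061.5], [15993062.0]]
print(assign_points(points, centers))
```

With these inputs the two middle points both land in C-56, because each is closer to 15993061.167 than to either neighbouring center.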
> > > >
> > > > --
> > > > Kamesh.
> > > >
> > >
> >
> >
> >
> > --
> > Kamesh.
> >
>



-- 
Kamesh.
