Re: Interpretation of cluster output

Suneel Marthi Fri, 20 Jun 2014 05:26:20 -0700

There was an issue with an empty cluster file being created for Canopy, which
has since been fixed in the current trunk, so you may want to work off of the
current trunk.
Also, Canopy has been marked for deprecation in a future release, so whatever
you are trying to do, you may want to look at the alternatives.
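
Once the JSON dump works, pulling the points out of each cluster could look
roughly like the Python sketch below. It assumes the record layout Andrew
quoted further down ("cluster" -> "identifier", "points" -> "vector_name")
and, as a further assumption not confirmed in this thread, that the dump
holds one JSON object per line. The function name and the sample line are
made up for illustration.

```python
import json

# Hypothetical sample line, shortened from the JSON record quoted in this thread.
SAMPLE_LINE = (
    '{"cluster_id": 7, '
    '"cluster": {"r": [], "n": 1, "identifier": "C-7"}, '
    '"points": [{"vector_name": "P(14)", "weight": "1.0"}]}'
)


def points_by_cluster(lines):
    """Map each cluster identifier to the vector names of its points.

    Assumes one JSON object per non-empty line (an assumption about the
    dump layout, not something stated in the thread).
    """
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        identifier = record["cluster"]["identifier"]
        # "points" may be missing or null when no clustered points were found.
        points = record.get("points") or []
        result[identifier] = [p.get("vector_name") for p in points]
    return result


if __name__ == "__main__":
    print(points_by_cluster([SAMPLE_LINE]))
```

If the part file under clusteredPoints is empty, the "points" list will be
empty too, so this only helps after the trunk fix mentioned above.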



On Fri, Jun 20, 2014 at 4:53 AM, Kamesh <[email protected]> wrote:

> Hi Andrew,
> I am invoking the Canopy driver class to perform clustering. I can see
> the results when the output format is either TEXT or CSV. However, when I
> use JSON, I get the exception I mentioned earlier.
>
>
> On Wed, Jun 18, 2014 at 10:32 PM, Andrew Musselman <
> [email protected]> wrote:
>
> > Kamesh, can you please describe the schema of your input data, along with
> > your command to perform the clustering?
> >
> >
> > On Mon, Jun 16, 2014 at 12:44 AM, Kamesh <[email protected]> wrote:
> >
> > > Thanks for the response, Andrew. I am using Mahout 0.9. However, I
> > > also tried the trunk version, and I am still getting output in the
> > > following format:
> > >
> > > C-55{n=1 c=[15993058.000] r=[]}
> > > C-56{n=2 c=[15993061.167] r=[]}
> > > C-57{n=1 c=[15993062.000] r=[]}
> > >
> > > C-97{n=1 c=[15993103.000] r=[]}
> > > C-98{n=2 c=[15993119.333] r=[0.395]}
> > > C-99{n=1 c=[15993105.000] r=[]}
> > >
> > > and hence I am not able to figure out the data points inside each cluster.
> > >
> > > Also, when I run with "-of JSON", I get an NPE:
> > >
> > > Exception in thread "main" java.lang.NullPointerException
> > > at org.apache.mahout.utils.clustering.JsonClusterWriter.getTopFeaturesList(JsonClusterWriter.java:118)
> > > at org.apache.mahout.utils.clustering.JsonClusterWriter.write(JsonClusterWriter.java:73)
> > > at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:115)
> > > at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:102)
> > >
> > > I am running the cluster dump as follows:
> > >
> > > hadoop jar mahout-integration-1.0-SNAPSHOT.jar
> > > org.apache.mahout.utils.clustering.ClusterDumper -i
> > > /canopy/clusters-0-final -p /canopy/clusteredPoints -of JSON -n 1000
> > >
> > > Also, I have observed that the *part* file created inside
> > > *clusteredPoints* is empty.
> > >
> > > Please help me get the data points from each cluster.
> > >
> > >
> > > On Fri, Jun 13, 2014 at 9:24 PM, Andrew Musselman <
> > > [email protected]> wrote:
> > >
> > > > That's going to be easier if you can work off of trunk, since the
> > > > output of clustering has been cleaned up to write a better format, per
> > > > https://issues.apache.org/jira/browse/MAHOUT-1505
> > > >
> > > > E.g.,
> > > >
> > > > {
> > > >   "top_terms": [
> > > >     {"all":3.0149030685424805},
> > > >     {"english":3.0149030685424805},
> > > >     {"best":3.0149030685424805},
> > > >     {"spaniel":3.0149030685424805},
> > > >     {"springer":3.0149030685424805},
> > > >     {"dogs":1.9162907600402832}
> > > >   ],
> > > >   "cluster_id": 7,
> > > >   "cluster": {
> > > >     "r": [],
> > > >     "c": [
> > > >       {"all":3.015},
> > > >       {"best":3.015},
> > > >       {"dogs":1.916},
> > > >       {"english":3.015},
> > > >       {"spaniel":3.015},
> > > >       {"springer":3.015}
> > > >     ],
> > > >     "n": 1,
> > > >     "identifier": "C-7"
> > > >   },
> > > >   "points": [
> > > >     {
> > > >       "point": [
> > > >         {"all":3.015},
> > > >         {"best":3.015},
> > > >         {"dogs":1.916},
> > > >         {"english":3.015},
> > > >         {"spaniel":3.015},
> > > >         {"springer":3.015}
> > > >       ],
> > > >       "vector_name": "P(14)",
> > > >       "weight": "1.0"
> > > >     }
> > > >   ]
> > > > }
> > > >
> > > >
> > > > On Fri, Jun 13, 2014 at 2:42 AM, Kamesh <[email protected]> wrote:
> > > >
> > > > > Hi All,
> > > > > Please help me get the data points inside each cluster.
> > > > > The output of the clustering algorithm is the center and the radius
> > > > > of each cluster. How do we derive the actual data points inside each
> > > > > cluster from this output?
> > > > >
> > > > > --
> > > > > Kamesh.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Kamesh.
> > >
> >
>
>
>
> --
> Kamesh.
>
