Re: where are the points in each cluster - kmeans clusterdump

Delroy Cameron Wed, 26 May 2010 11:20:45 -0700

Yeah Jeff,
the implementation for printing the points has changed. Instead of a list of
strings for each point, we now have a list of WeightedVectorWritable
objects. The problem is that in the previous implementation, getting the
point id (i.e. the document id for each document in the cluster) was
straightforward; see the first snippet and its output below.


After looking at the API for the code and testing a few variations on the
points output, I am forced to ask: are the ids for the points in the
WeightedVectorWritable object?

        List<String> points =
            clusterIdToPoints.get(String.valueOf(cluster.getId()));
        if (points != null) {
          writer.write("\tPoints: ");
          for (Iterator<String> iterator = points.iterator();
               iterator.hasNext();) {
            String point = iterator.next();
            writer.append(point);
            if (iterator.hasNext()) {
              writer.append(", ");
            }
          }
          writer.write('\n');
        }

Top Terms: 
                were                                    =>   32.23076923076923
                expression                              =>  27.333333333333332
                gene                                    =>  23.076923076923077
                from                                    =>  19.641025641025642
                cells                                   =>   17.76923076923077
                c                                       =>   16.23076923076923
                1                                       =>   14.76923076923077
                human                                   =>  14.487179487179487
                5                                       =>  13.820512820512821
                we                                      =>  13.179487179487179
        Points: 10075717, 10330009, 10419905, 10811945, 11116137, 11222753,
                11691919

        List<WeightedVectorWritable> points =
            clusterIdToPoints.get(cluster.getId());
        if (points != null) {
          writer.write("\tWeight:  Point:\n\t");
          for (Iterator<WeightedVectorWritable> iterator = points.iterator();
               iterator.hasNext();) {
            WeightedVectorWritable point = iterator.next();
            writer.append(Double.toString(point.getWeight())).append(": ");
            writer.append(ClusterBase.formatVector(point.getVector().get(),
                dictionary));
            if (iterator.hasNext()) {
              writer.append("\n\t");
            }
          }
          writer.write('\n');
        }

Top Terms:
                riele                                   =>   14.00426959991455
                meredith                                =>  12.727957301669651
                lysine-6                                =>  11.388569796526873
                amores                                  =>  10.307115837379738
                mashimo                                 =>   9.840165774027506
                halks                                   =>   9.598452267823395
                maseki                                  =>   8.773765140109592
                lysine-63                               =>   8.496143341064453
                saporita                                =>   8.167389004318803
                a94                                     =>   8.119972387949625
        Weight:  Point:
        1.0: [265:1.016, 1753:3.503, 2087:2.217, 2162:2.396, 2217:1.347,
              2702:1.054, 2886:1.125, 2974:2.472, 3197:1.603, 3472:1.902,
              3714:1.658, 3789:1.735, 4003:1.538, 4168:3.849, 4387:6.602,
              4399:3.800, 4513:1.717, 4640:1.387, ...]
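
To make the output I am after concrete, here is a minimal, self-contained
sketch that prints only the ids, in the same "Points:" format as the earlier
version. It does not use Mahout at all: WeightedPoint and its id field are
hypothetical stand-ins that I made up for illustration, and whether
WeightedVectorWritable actually carries an id is exactly the open question
above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PointIdDump {

  // Hypothetical stand-in for a weighted point that knows its own id.
  // This is NOT Mahout's WeightedVectorWritable; it only illustrates
  // the shape of the data the id-only output would need.
  static final class WeightedPoint {
    final double weight;
    final String id;

    WeightedPoint(double weight, String id) {
      this.weight = weight;
      this.id = id;
    }
  }

  // Formats the points of one cluster as "\tPoints: id1, id2\n".
  // Returns an empty string when the cluster has no point list.
  static String formatPointIds(List<WeightedPoint> points) {
    StringBuilder out = new StringBuilder();
    if (points != null) {
      out.append("\tPoints: ");
      for (Iterator<WeightedPoint> it = points.iterator(); it.hasNext();) {
        out.append(it.next().id);
        if (it.hasNext()) {
          out.append(", ");
        }
      }
      out.append('\n');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    List<WeightedPoint> points = Arrays.asList(
        new WeightedPoint(1.0, "10075717"),
        new WeightedPoint(1.0, "10330009"));
    System.out.print(formatPointIds(points));
  }
}
```

If the real objects do expose an id somewhere, only the line that reads the
id field would need to change.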


-----
--cheers
Delroy
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/where-are-the-points-in-each-cluster-kmeans-clusterdump-tp838683p845600.html
Sent from the Mahout User List mailing list archive at Nabble.com.
