There is code for this; it's in two places (on trunk, at least):

1. ClusterDumper:
  public static Map<Integer, List<WeightedVectorWritable>> readPoints(
      Path pointsPathDir, Configuration conf) {
    Map<Integer, List<WeightedVectorWritable>> result =
        new TreeMap<Integer, List<WeightedVectorWritable>>();
    for (Pair<IntWritable, WeightedVectorWritable> record :
            new SequenceFileDirIterable<IntWritable, WeightedVectorWritable>(
                    pointsPathDir, PathType.LIST, PathFilters.logsCRCFilter(), conf)) {
      // key is the cluster id as an int; value is the weighted vector
      // assigned to that cluster
      int keyValue = record.getFirst().get();
      List<WeightedVectorWritable> pointList = result.get(keyValue);
      if (pointList == null) {
        pointList = Lists.newArrayList();
        result.put(keyValue, pointList);
      }
      pointList.add(record.getSecond());
    }
    return result;
  }

2. ClusterDumperWriter:
    // look up the points by cluster id
    List<WeightedVectorWritable> points = clusterIdToPoints.get(value.getId());
    if (points != null) {
      writer.write("\tWeight : [props - optional]:  Point:\n\t");
      for (Iterator<WeightedVectorWritable> iterator = points.iterator(); iterator.hasNext(); ) {
        WeightedVectorWritable point = iterator.next();
        writer.write(String.valueOf(point.getWeight()));
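
For anyone who wants to see the grouping idea without a Hadoop setup, here is a minimal sketch of what the readPoints loop does: collect records into a sorted map from cluster id to a list of points. It uses only the JDK. ClusterGrouping, Assignment and groupByCluster are hypothetical names made up for illustration, and double[] stands in for WeightedVectorWritable; none of this is Mahout API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-alone sketch: plain Java types stand in for Mahout's
// (IntWritable, WeightedVectorWritable) records. Not Mahout API.
class ClusterGrouping {

  // Hypothetical stand-in for one (cluster id, point) record.
  static final class Assignment {
    final int clusterId;
    final double[] point;

    Assignment(int clusterId, double[] point) {
      this.clusterId = clusterId;
      this.point = point;
    }
  }

  // Groups points by cluster id, mirroring the loop in readPoints.
  // A TreeMap keeps the cluster ids in ascending order.
  static Map<Integer, List<double[]>> groupByCluster(Iterable<Assignment> records) {
    Map<Integer, List<double[]>> result = new TreeMap<>();
    for (Assignment record : records) {
      // Create the list on first sight of a cluster id, then append.
      result.computeIfAbsent(record.clusterId, id -> new ArrayList<>())
            .add(record.point);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Assignment> records = Arrays.asList(
        new Assignment(7, new double[] {1.0, 1.0}),
        new Assignment(3, new double[] {9.0, 8.0}),
        new Assignment(7, new double[] {1.5, 0.5}));

    // Print how many points landed in each cluster.
    for (Map.Entry<Integer, List<double[]>> entry : groupByCluster(records).entrySet()) {
      System.out.println("cluster " + entry.getKey() + ": "
          + entry.getValue().size() + " point(s)");
    }
  }
}
```

With real Mahout output, the records would come from iterating the clusteredPoints sequence files as in the ClusterDumper excerpt; the map-building step stays the same.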

On Nov 3, 2011, at 5:48 AM, WangRamon wrote:

> 
> Yes, Paritosh, it's a bit misleading for new users. I will start by checking 
> KMeansDriver. Thanks for your quick reply.
>> Date: Thu, 3 Nov 2011 15:02:28 +0530
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: How to find which point belongs which cluster after running 
>> KMeansClusterer
>> 
>> I also thought in the beginning that using KMeansClusterer and
>> ClusterDumper will help in getting all vectors belonging to a cluster,
>> but it did not help me a lot.
>> 
>> I used KMeansDriver which I think is easy enough to use.
>> 
>> After execution the records are written in the form
>> <cluster id><vector>
>> 
>> "context.write(new Text(cluster.getIdentifier()), cluster);"
>> 
>> So, what helped me was to process this into a map with the cluster id as
>> the key and a list of vectors as the value. I read the clustered points
>> and put all the data into the map in that form. In the end, the list
>> stored against each cluster id was what I needed.
>> 
>> Hope this helps.
>> 
>> Regards,
>> Paritosh
>> 
>> On 03-11-2011 14:23, WangRamon wrote:
>>> 
>>> 
>>> 
>>> Hi All, I'm using KMeansClusterer. I will use KMeansDriver on a Hadoop 
>>> environment later, but I think it will be easier to understand things by 
>>> using KMeansClusterer first. The question is: I cannot find a way to 
>>> determine which cluster a point belongs to after running KMeansClusterer. 
>>> I expected some API on the Cluster interface to get all points/vectors 
>>> belonging to a cluster, but... did I miss something? Thanks a lot.
>>> Cheers, Ramon
>>> 
>>> 
>> 
>                                         

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com


