OK, me again. I checked the KMeansDriver code that outputs the point
information; here it is:

    Map<Text, Text> props = new HashMap<Text, Text>();
    props.put(new Text("distance"), new Text(String.valueOf(nearestDistance)));
    context.write(new IntWritable(nearestCluster.getId()),
        new WeightedPropertyVectorWritable(1, vector, props));

It's good that the point (the vector) and the distance are written out, but
in real business use we usually need something like a name to identify the
point (name <--> vector/point), and this information is not written out. If
we could add it, that would be much better.

Cheers
Ramon
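To make the name <--> point idea concrete, here is a minimal sketch in plain Java. It does not use any Mahout classes: ClusteredPoint and namesByCluster are hypothetical names invented for this illustration, standing in for an output record that carries a name alongside the cluster id, vector and distance. It only shows the grouping step (cluster id -> names of its points), not how the name would get into the real output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NamedPointGrouping {

    /** Hypothetical record: one clustered point plus the business name that identifies it. */
    public static final class ClusteredPoint {
        public final int clusterId;
        public final String name;
        public final double[] vector;
        public final double distance;

        public ClusteredPoint(int clusterId, String name, double[] vector, double distance) {
            this.clusterId = clusterId;
            this.name = name;
            this.vector = vector;
            this.distance = distance;
        }
    }

    /** Groups point names by cluster id; a TreeMap keeps the cluster ids sorted. */
    public static Map<Integer, List<String>> namesByCluster(List<ClusteredPoint> points) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (ClusteredPoint p : points) {
            // Create the list for this cluster on first sight, then append the name.
            result.computeIfAbsent(p.clusterId, id -> new ArrayList<>()).add(p.name);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<ClusteredPoint> points = Arrays.asList(
            new ClusteredPoint(0, "customer-a", new double[] {1.0, 1.0}, 0.5),
            new ClusteredPoint(1, "customer-b", new double[] {9.0, 8.0}, 0.7),
            new ClusteredPoint(0, "customer-c", new double[] {1.5, 0.5}, 0.4));

        for (Map.Entry<Integer, List<String>> e : namesByCluster(points).entrySet()) {
            System.out.println("cluster " + e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With a name on each record, finding which points belong to which cluster becomes a single pass over the output.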
 > Subject: Re: How to find which point belongs which cluster after running 
 > KMeansClusterer
> From: [email protected]
> Date: Thu, 3 Nov 2011 08:28:19 -0400
> To: [email protected]
> 
> There is code for this, it's in two places (on trunk, at least):
> 
> 1. ClusterDumper:
> public static Map<Integer, List<WeightedVectorWritable>> readPoints(
>     Path pointsPathDir, Configuration conf) {
>     Map<Integer, List<WeightedVectorWritable>> result =
>         new TreeMap<Integer, List<WeightedVectorWritable>>();
>     for (Pair<IntWritable, WeightedVectorWritable> record :
>             new SequenceFileDirIterable<IntWritable, WeightedVectorWritable>(
>                     pointsPathDir, PathType.LIST,
>                     PathFilters.logsCRCFilter(), conf)) {
>       // value is the cluster id as an int, key is the name/id of the
>       // vector, but that doesn't matter because we only care about printing
>       // it
>       //String clusterId = value.toString();
>       int keyValue = record.getFirst().get();
>       List<WeightedVectorWritable> pointList = result.get(keyValue);
>       if (pointList == null) {
>         pointList = Lists.newArrayList();
>         result.put(keyValue, pointList);
>       }
>       pointList.add(record.getSecond());
>     }
>     return result;
>   }
> 
> 2. ClusterDumperWriter:
> // look up the points by cluster id
> List<WeightedVectorWritable> points = clusterIdToPoints.get(value.getId());
>     if (points != null) {
>       writer.write("\tWeight : [props - optional]:  Point:\n\t");
>       for (Iterator<WeightedVectorWritable> iterator = points.iterator();
>            iterator.hasNext(); ) {
>         WeightedVectorWritable point = iterator.next();
>         writer.write(String.valueOf(point.getWeight()));
> 
> On Nov 3, 2011, at 5:48 AM, WangRamon wrote:
> 
> > 
> > Yes, Paritosh, it's a bit misleading for new users. I will start checking 
> > KMeansDriver; thanks for your quick reply.
> >> Date: Thu, 3 Nov 2011 15:02:28 +0530
> >> From: [email protected]
> >> To: [email protected]
> >> Subject: Re: How to find which point belongs which cluster after running 
> >> KMeansClusterer
> >> 
> >> I also thought in the beginning that using KMeansClusterer and
> >> ClusterDumper will help in getting all vectors belonging to a cluster,
> >> but it did not help me a lot.
> >> 
> >> I used KMeansDriver which I think is easy enough to use.
> >> 
> >> After execution the records are written in the form
> >> <cluster id><vector>
> >> 
> >> "context.write(new Text(cluster.getIdentifier()), cluster);"
> >> 
> >> So, what helped me was to process this into a map with cluster Id as the
> >> key and vector list as the value. I read the clustered points and put all
> >> the data into the map in that form. In the end, the list against each
> >> cluster id was what I needed.
> >> 
> >> Hope this helps.
> >> 
> >> Regards,
> >> Paritosh
> >> 
> >> On 03-11-2011 14:23, WangRamon wrote:
> >>> 
> >>> 
> >>> 
> >>> Hi All, I'm using KMeansClusterer. I will use KMeansDriver in a Hadoop 
> >>> environment later, but I think it will be easier to understand by using 
> >>> KMeansClusterer first. So the question is: I cannot find a way to find the 
> >>> cluster a point belongs to after running KMeansClusterer. I expected 
> >>> to find some API on the Cluster interface to get all points/vectors 
> >>> belonging to a cluster, but... did I miss something? Thanks a lot.  
> >>> Cheers Ramon
> >>> 
> >>> 
> >> 
> >                                       
> 
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> 
> 
                                          
