Thanks, that's what I need. I have another question: is there a recommended value 
for the number of iterations and for convergenceDelta in K-Means? Thanks a lot.  Cheers, Ramon
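
For readers unsure what the two parameters control, here is a minimal plain-Java sketch of Lloyd-style k-means. It is not Mahout code: the class name KMeansSketch, the method names, and the sample values (20 iterations, delta 0.001) are hypothetical and chosen only for illustration. It assumes that convergenceDelta is a threshold on how far a centroid moves between iterations and that the iteration count is an upper bound. Under that assumption, sensible values depend on the scale of the data and on the distance measure, so the numbers below are not a recommendation.

```java
import java.util.Arrays;

public class KMeansSketch {

    /**
     * Runs k-means, updating centroids in place.
     * Stops when no centroid moves more than convergenceDelta,
     * or after maxIterations passes. Returns the number of passes used.
     */
    static int run(double[][] points, double[][] centroids,
                   int maxIterations, double convergenceDelta) {
        int dim = points[0].length;
        for (int iter = 1; iter <= maxIterations; iter++) {
            double[][] sums = new double[centroids.length][dim];
            int[] counts = new int[centroids.length];
            // Assignment step: add each point to its nearest centroid's running sum.
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += p[d];
                }
            }
            // Update step: move each centroid to the mean of its points.
            boolean converged = true;
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep the old centroid
                }
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) {
                    updated[d] = sums[c][d] / counts[c];
                }
                if (distance(centroids[c], updated) > convergenceDelta) {
                    converged = false;
                }
                centroids[c] = updated;
            }
            if (converged) {
                return iter;
            }
        }
        return maxIterations;
    }

    /** Index of the centroid closest to p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (distance(p, centroids[c]) < distance(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.2, 0.8}, {0.9, 1.1}, {8, 8}, {8.2, 7.9}, {7.9, 8.1}};
        double[][] centroids = {{0, 0}, {10, 10}};
        int used = run(points, centroids, 20, 0.001);
        System.out.println("iterations used: " + used
                + ", centroids: " + Arrays.deepToString(centroids));
    }
}
```

In this sketch a smaller delta or a larger iteration cap only costs extra passes once the centroids have stopped moving, so one practical approach is to set the cap generously and tune the delta to the scale of the distances in the data.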
 > Date: Fri, 4 Nov 2011 08:07:01 +0530
> From: [email protected]
> To: [email protected]
> Subject: Re: How to find which point belongs which cluster after running 
> KMeansClusterer
> 
> Transform your vector into a NamedVector.
> 
> On 04-11-2011 08:02, WangRamon wrote:
> > OK, me again. I checked the KMeansDriver code that outputs the points 
> > information; the code is as follows:
> >
> >     Map<Text, Text> props = new HashMap<Text, Text>();
> >     props.put(new Text("distance"), new Text(String.valueOf(nearestDistance)));
> >     context.write(new IntWritable(nearestCluster.getId()),
> >         new WeightedPropertyVectorWritable(1, vector, props));
> >
> > It's good that the point (the vector) and the distance are written out, but 
> > in real business use we usually need something like a name to identify the 
> > point (name <--> vector/point), and this information is not written out. It 
> > would be much better if we could add it.
> >
> > Cheers, Ramon
> >  > Subject: Re: How to find which point belongs which cluster after running 
> > KMeansClusterer
> >> From: [email protected]
> >> Date: Thu, 3 Nov 2011 08:28:19 -0400
> >> To: [email protected]
> >>
> >> There is code for this, it's in two places (on trunk, at least):
> >>
> >> 1. ClusterDumper:
> >> public static Map<Integer, List<WeightedVectorWritable>> readPoints(
> >>     Path pointsPathDir, Configuration conf) {
> >>   Map<Integer, List<WeightedVectorWritable>> result =
> >>       new TreeMap<Integer, List<WeightedVectorWritable>>();
> >>   for (Pair<IntWritable, WeightedVectorWritable> record :
> >>       new SequenceFileDirIterable<IntWritable, WeightedVectorWritable>(
> >>           pointsPathDir, PathType.LIST, PathFilters.logsCRCFilter(), conf)) {
> >>     // key is the cluster id as an int, value is the weighted vector
> >>     // assigned to that cluster
> >>     int keyValue = record.getFirst().get();
> >>     List<WeightedVectorWritable> pointList = result.get(keyValue);
> >>     if (pointList == null) {
> >>       pointList = Lists.newArrayList();
> >>       result.put(keyValue, pointList);
> >>     }
> >>     pointList.add(record.getSecond());
> >>   }
> >>   return result;
> >> }
> >>
> >> 2. ClusterDumperWriter:
> >> // look up the points by cluster id
> >> List<WeightedVectorWritable> points = clusterIdToPoints.get(value.getId());
> >> if (points != null) {
> >>   writer.write("\tWeight : [props - optional]:  Point:\n\t");
> >>   for (Iterator<WeightedVectorWritable> iterator = points.iterator();
> >>        iterator.hasNext(); ) {
> >>     WeightedVectorWritable point = iterator.next();
> >>     writer.write(String.valueOf(point.getWeight()));
> >>
> >> On Nov 3, 2011, at 5:48 AM, WangRamon wrote:
> >>
> >>> Yes, Paritosh, it's a bit misleading for new users. I will start to 
> >>> check KMeansDriver. Thanks for your quick reply.
> >>>> Date: Thu, 3 Nov 2011 15:02:28 +0530
> >>>> From: [email protected]
> >>>> To: [email protected]
> >>>> Subject: Re: How to find which point belongs which cluster after running 
> >>>> KMeansClusterer
> >>>>
> >>>> I also thought in the beginning that using KMeansClusterer and
> >>>> ClusterDumper will help in getting all vectors belonging to a cluster,
> >>>> but it did not help me a lot.
> >>>>
> >>>> I used KMeansDriver which I think is easy enough to use.
> >>>>
> >>>> After execution the records are written in the form
> >>>> <cluster id><vector>
> >>>>
> >>>> "context.write(new Text(cluster.getIdentifier()), cluster);"
> >>>>
> >>>> So, what helped me was to process this into a map with the cluster id as
> >>>> the key and the vector list as the value. I read the clustered points and
> >>>> put all the data into the map in that form. In the end, the list stored
> >>>> against each cluster id was what I needed.
> >>>>
> >>>> Hope this helps.
> >>>>
> >>>> Regards,
> >>>> Paritosh
> >>>>
> >>>> On 03-11-2011 14:23, WangRamon wrote:
> >>>>>
> >>>>>
> >>>>> Hi all, I'm using KMeansClusterer. I will use KMeansDriver in a Hadoop 
> >>>>> environment later, but I think it will be easier to understand things 
> >>>>> by starting with KMeansClusterer. My question is: I cannot find a way 
> >>>>> to find out which cluster a point belongs to after running 
> >>>>> KMeansClusterer. I expected some API on the Cluster interface that 
> >>>>> returns all points/vectors belonging to that cluster, but I could not 
> >>>>> find one. Did I miss something? Thanks a lot.  Cheers, Ramon
> >>>>>
> >>>>>
> >>>>>
> >>>>> -----
> >>>>> No virus found in this message.
> >>>>> Checked by AVG - www.avg.com
> >>>>> Version: 10.0.1411 / Virus Database: 2092/3992 - Release Date: 11/02/11
> >>>                                     
> >> --------------------------------------------
> >> Grant Ingersoll
> >> http://www.lucidimagination.com
> >>
> >>
> >>
> >                                       
> >
> >
> 
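
The map-based approach described in the quoted thread (cluster id as key, list of points as value), combined with the suggestion to carry a name alongside each vector, can be sketched in plain Java as follows. This is only an illustration under stated assumptions: ClusterGrouping, Assignment, and groupByCluster are hypothetical names, not Mahout classes, and Assignment merely stands in for one (cluster id, named point) record read back from the clustering output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusterGrouping {

    /** Hypothetical stand-in for one output record: a cluster id plus a named point. */
    static final class Assignment {
        final int clusterId;
        final String name;
        final double[] vector;

        Assignment(int clusterId, String name, double[] vector) {
            this.clusterId = clusterId;
            this.name = name;
            this.vector = vector;
        }
    }

    /** Groups point names by cluster id; a TreeMap keeps the cluster ids sorted. */
    static Map<Integer, List<String>> groupByCluster(List<Assignment> assignments) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Assignment a : assignments) {
            // Create the list the first time a cluster id is seen, then append.
            result.computeIfAbsent(a.clusterId, id -> new ArrayList<>()).add(a.name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Assignment> assignments = new ArrayList<>();
        assignments.add(new Assignment(0, "a", new double[] {1.0, 1.0}));
        assignments.add(new Assignment(1, "b", new double[] {8.0, 8.0}));
        assignments.add(new Assignment(0, "c", new double[] {0.9, 1.1}));
        System.out.println(groupByCluster(assignments)); // prints {0=[a, c], 1=[b]}
    }
}
```

Storing only the names keeps the map small; the same loop could collect the whole Assignment instead if the vectors are needed later.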
                                          
