On Nov 4, 2011, at 3:28 AM, WangRamon wrote:

> 
> Thanks, that's what I need. I have another question: is there a recommended
> value for the iteration count and convergenceDelta in K-Means? Thanks a lot.
> Cheers, Ramon


It's usually determined by testing (what are the minimum values that give you 
good results?), but also by how long the job takes to run on your system and 
what your business requirements are.  Both of those values are really meant to 
be safeguards against a runaway process, since k-means isn't guaranteed to 
converge.
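
To illustrate how the two values act as stopping conditions, here is a minimal 
plain-Java sketch of a k-means loop. It is not Mahout's implementation: the 
class KMeansSketch, the run method and the Result holder are hypothetical 
names, and it assumes Euclidean distance and a convergence test of "no centroid 
moved more than convergenceDelta in the last pass".

```java
// Minimal k-means sketch in plain Java (hypothetical names, not Mahout code).
final class KMeansSketch {

  /** Final centroids, number of passes used, and whether the delta test passed. */
  static final class Result {
    final double[][] centroids;
    final int iterations;
    final boolean converged;

    Result(double[][] centroids, int iterations, boolean converged) {
      this.centroids = centroids;
      this.iterations = iterations;
      this.converged = converged;
    }
  }

  static Result run(double[][] points, double[][] initialCentroids,
                    int maxIterations, double convergenceDelta) {
    int k = initialCentroids.length;
    int dim = initialCentroids[0].length;

    // Work on a copy so the caller's starting centroids are left untouched.
    double[][] centroids = new double[k][];
    for (int c = 0; c < k; c++) {
      centroids[c] = initialCentroids[c].clone();
    }

    boolean converged = false;
    int iterations = 0;
    // Stop at whichever comes first: the iteration cap or the delta test.
    while (iterations < maxIterations && !converged) {
      iterations++;
      double[][] sums = new double[k][dim];
      int[] counts = new int[k];

      // Assignment step: add each point to the running sum of its nearest centroid.
      for (double[] p : points) {
        int nearest = 0;
        double best = Double.MAX_VALUE;
        for (int c = 0; c < k; c++) {
          double d = distance(p, centroids[c]);
          if (d < best) {
            best = d;
            nearest = c;
          }
        }
        counts[nearest]++;
        for (int j = 0; j < dim; j++) {
          sums[nearest][j] += p[j];
        }
      }

      // Update step: move each centroid to the mean of its points and
      // record whether any centroid moved farther than convergenceDelta.
      converged = true;
      for (int c = 0; c < k; c++) {
        if (counts[c] == 0) {
          continue; // an empty cluster keeps its previous centroid
        }
        double[] updated = new double[dim];
        for (int j = 0; j < dim; j++) {
          updated[j] = sums[c][j] / counts[c];
        }
        if (distance(updated, centroids[c]) > convergenceDelta) {
          converged = false;
        }
        centroids[c] = updated;
      }
    }
    return new Result(centroids, iterations, converged);
  }

  /** Euclidean distance between two vectors of equal length. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int j = 0; j < a.length; j++) {
      double diff = a[j] - b[j];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }
}
```

With the four one-dimensional points 0, 1, 10, 11 and starting centroids 0 and 
10, a call such as KMeansSketch.run(points, start, 10, 0.001) stops after two 
passes because the centroids stop moving; with the iteration cap set to 1 it 
stops at the cap without having met the delta test. In practice you would try a 
few caps and deltas on your own data and keep the smallest values that still 
give acceptable clusters.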


>> Date: Fri, 4 Nov 2011 08:07:01 +0530
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: How to find which point belongs which cluster after running 
>> KMeansClusterer
>> 
>> Transform your vector into a NamedVector.
>> 
>> On 04-11-2011 08:02, WangRamon wrote:
>>> OK, me again. I checked the KMeansDriver code that outputs the point
>>> information; this is the code:
>>>
>>>    Map<Text, Text> props = new HashMap<Text, Text>();
>>>    props.put(new Text("distance"), new Text(String.valueOf(nearestDistance)));
>>>    context.write(new IntWritable(nearestCluster.getId()),
>>>        new WeightedPropertyVectorWritable(1, vector, props));
>>>
>>> It's good that it outputs the point (the vector) and the distance, but in
>>> real business use we usually need something like a name to identify the
>>> point (name <--> vector/point), and this information is not written out. It
>>> would be much better if we could add it.
>>>
>>> Cheers, Ramon
>>>> Subject: Re: How to find which point belongs which cluster after running 
>>>> KMeansClusterer
>>>> From: [email protected]
>>>> Date: Thu, 3 Nov 2011 08:28:19 -0400
>>>> To: [email protected]
>>>> 
>>>> There is code for this, it's in two places (on trunk, at least):
>>>> 
>>>> 1. ClusterDumper:
>>>> public static Map<Integer, List<WeightedVectorWritable>> readPoints(
>>>>     Path pointsPathDir, Configuration conf) {
>>>>    Map<Integer, List<WeightedVectorWritable>> result =
>>>>        new TreeMap<Integer, List<WeightedVectorWritable>>();
>>>>    for (Pair<IntWritable, WeightedVectorWritable> record :
>>>>            new SequenceFileDirIterable<IntWritable, WeightedVectorWritable>(
>>>>                    pointsPathDir, PathType.LIST, PathFilters.logsCRCFilter(), conf)) {
>>>>      // the key is the cluster id as an int; the value is the weighted
>>>>      // vector assigned to that cluster
>>>>      int keyValue = record.getFirst().get();
>>>>      List<WeightedVectorWritable> pointList = result.get(keyValue);
>>>>      if (pointList == null) {
>>>>        pointList = Lists.newArrayList();
>>>>        result.put(keyValue, pointList);
>>>>      }
>>>>      pointList.add(record.getSecond());
>>>>    }
>>>>    return result;
>>>>  }
>>>> 
>>>> 2. ClusterDumperWriter:
>>>> List<WeightedVectorWritable> points =
>>>>     clusterIdToPoints.get(value.getId()); // look up the points by cluster id
>>>>    if (points != null) {
>>>>      writer.write("\tWeight : [props - optional]:  Point:\n\t");
>>>>      for (Iterator<WeightedVectorWritable> iterator = points.iterator();
>>>>           iterator.hasNext(); ) {
>>>>        WeightedVectorWritable point = iterator.next();
>>>>        writer.write(String.valueOf(point.getWeight()));
>>>> 
>>>> On Nov 3, 2011, at 5:48 AM, WangRamon wrote:
>>>> 
>>>>> Yes, Paritosh, it's a bit misleading for new users. I will start by
>>>>> checking KMeansDriver; thanks for your quick reply.
>>>>>> Date: Thu, 3 Nov 2011 15:02:28 +0530
>>>>>> From: [email protected]
>>>>>> To: [email protected]
>>>>>> Subject: Re: How to find which point belongs which cluster after running 
>>>>>> KMeansClusterer
>>>>>> 
>>>>>> I also thought in the beginning that using KMeansClusterer and
>>>>>> ClusterDumper will help in getting all vectors belonging to a cluster,
>>>>>> but it did not help me a lot.
>>>>>> 
>>>>>> I used KMeansDriver which I think is easy enough to use.
>>>>>> 
>>>>>> After execution the records are written in the form
>>>>>> <cluster id><vector>
>>>>>> 
>>>>>> "context.write(new Text(cluster.getIdentifier()), cluster);"
>>>>>> 
>>>>>> So, what helped me was to process this into a map with the cluster id
>>>>>> as the key and a list of vectors as the value. I read the clustered
>>>>>> points and put all of the data into the map in that form. In the end,
>>>>>> the list stored against each cluster id was what I needed.
>>>>>> 
>>>>>> Hope this helps.
>>>>>> 
>>>>>> Regards,
>>>>>> Paritosh
>>>>>> 
>>>>>> On 03-11-2011 14:23, WangRamon wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Hi all, I'm using KMeansClusterer. I will use KMeansDriver in a Hadoop
>>>>>>> environment later, but I think KMeansClusterer will be easier to
>>>>>>> understand first. The question is: I cannot find a way to tell which
>>>>>>> cluster a point belongs to after running KMeansClusterer. I expected
>>>>>>> some API on the Cluster interface that returns all points/vectors
>>>>>>> belonging to the cluster, but... did I miss something? Thanks a lot.
>>>>>>> Cheers, Ramon
>>>>>>> 
>>>>>>> 
>>>>>                                     
>>>> --------------------------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com
>>>> 
>>>> 
>>>> 
>>>                                       
>>> 
>>> 
>> 
>                                         

--------------------------
Grant Ingersoll
http://www.lucidimagination.com




