Hi Matt,

Thanks for the reply. I generate my vectors with the Elephant-Bird
package, so I am not sure whether I can make it produce NamedVectors.
I have asked this in a separate thread.

In the absence of NamedVectors, do you know of another way to resolve
the names of the items?

Currently when I print out the contents of the clusteredPoints file I
see the following output:

1.0: [3887:3.000, 9441:1.000] is in 1205002
1.0: [6773:1.000] is in 1205002
1.0: [8987:2.000] is in 1205002
1.0: [2956:1.000] is in 1205002
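
To at least see which vectors ended up together, a minimal sketch like
the one below could group these lines by cluster id. It assumes every
line has exactly the shape shown above (weight, sparse vector, "is in",
cluster id); the function name group_by_cluster is my own and not part
of Mahout. It only recovers the vectors, not the item names.

```python
import re
from collections import defaultdict

# Assumed line shape, copied from the dump above:
#   <weight>: [<index>:<value>, ...] is in <cluster id>
LINE_RE = re.compile(r"^\s*([\d.]+):\s*\[(.*)\]\s+is in\s+(\S+)\s*$")


def group_by_cluster(lines):
    """Map each cluster id to the list of sparse vectors assigned to it."""
    clusters = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank lines or lines in another format
        _weight, body, cluster_id = match.groups()
        vector = {}
        for pair in body.split(","):
            pair = pair.strip()
            if not pair:
                continue
            index, value = pair.split(":")
            vector[int(index)] = float(value)
        clusters[cluster_id].append(vector)
    return dict(clusters)


sample = [
    "1.0: [3887:3.000, 9441:1.000] is in 1205002",
    "1.0: [6773:1.000] is in 1205002",
    "1.0: [8987:2.000] is in 1205002",
    "1.0: [2956:1.000] is in 1205002",
]

for cluster_id, vectors in group_by_cluster(sample).items():
    print(cluster_id, "=", vectors)
```

With only four lines, all in the same cluster, this prints a single
group; on the full dump it would give one entry per cluster id.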


Thanks again,
Colum

On Tue, Mar 5, 2013 at 8:57 PM, Matt Molek <[email protected]> wrote:
> If you run kmeans with the "-cl" option (or set the runClustering option to
> true if you're calling the driver from Java code), you'll get a sequence
> file in the directory $KMEANS_OUT/clusteredPoints with an IntWritable key
> identifying the cluster, and a WeightedVectorWritable with a pdf weight
> (always 1.0 in kmeans) and your original vector. If you want to recover the
> name/id/whatever of the original input, you need to use NamedVector as
> input to kmeans. The name will be preserved in the vector.
>
> Here's one abbreviated line of output. My vector with name "0" was
> classified into cluster 4398:
>     Key: 4398: Value: 1.0: 0 = [0.007, 0.002, -0.016, -0.003,...]
>
> Clusterdump might include this information as well. I can't remember. You'd
> still need to run kmeans with the -cl option.
>
>
> On Tue, Mar 5, 2013 at 1:33 PM, Colum Foley <[email protected]> wrote:
>
>> Hi,
>>
>> I have a simple enough question: having run K-Means clustering
>> (generated the clustered points, and clusters-x, clusters-x-final
>> directories), how do you identify which items were clustered together?
>> Apologies if this is trivial but I could not see an obvious answer in
>> the documentation.
>>
>> Clusterdump seems to be the tool to use, but when I have run it I only
>> see cluster ids, centroid values, radius etc., and it is not obvious to
>> me how to resolve individual item names. I am looking for something of
>> the following form:
>>
>> cluster_id = (keys)*
>>
>> for example:
>>
>> cluster_1 = {"user104x","user89dc","user22da".}
>> cluster_2 = {"user19c","user11c",....}
>>
>>
>> Thanks,
>> Colum
>>
