On 11/16/10 7:45 PM, beneo_7 wrote:
I use lucene vector to create the NamedVectors using the idField.

However, I found that the names disappear after the canopies are created: in the
canopy reduce step, the NamedVector is replaced with a Randomxxxxvector.

So I think that after running kmeans with the canopies, the vectors have no
labels after kmeans clustering.

Where can I find the names after kmeans clustering?
The CanopyReducer outputs a Canopy, not an input vector that might be a NamedVector. If you are feeding this output to k-means as its -c parameter, that is correct as well. The KMeansReducer likewise outputs a Cluster rather than an input vector. The default processing only produces the Clusters; it does not classify the input points.

If you want to use your clustering to actually cluster your NamedVector inputs, you need to add the -cl argument (it is not the default). That runs a classification step using the final Clusters you computed, and each of your input NamedVectors will be wrapped in a WeightedVectorWritable and written to the <output>/clusteredPoints directory.
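
To make the distinction concrete, here is a minimal Python sketch of what a classification step does conceptually: it takes the final cluster centers and assigns each named input point to the nearest one. This is not Mahout code. The function names, the sample data, and the use of Euclidean distance are all assumptions made for illustration, and the real job uses whatever distance measure you configured.

```python
import math

def nearest_cluster(point, centroids):
    """Return the id of the centroid closest to point (Euclidean distance)."""
    return min(centroids, key=lambda cid: math.dist(point, centroids[cid]))

def classify(named_points, centroids):
    """Assign every named input point to its nearest final cluster."""
    assignments = {}
    for name, point in named_points.items():
        cluster_id = nearest_cluster(point, centroids)
        assignments.setdefault(cluster_id, []).append(name)
    return assignments

# Hypothetical final centroids, standing in for the Clusters k-means produces.
centroids = {"cluster-0": (0.0, 0.0), "cluster-1": (10.0, 10.0)}

# Hypothetical named inputs, standing in for NamedVectors keyed by idField.
named_points = {
    "doc-a": (1.0, 0.5),
    "doc-b": (9.0, 11.0),
    "doc-c": (-0.5, 0.2),
}

result = classify(named_points, centroids)
print(result)
```

The names only reappear in this second pass because the clustering iterations themselves work on centers, not on the individual input points.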

The ClusterDumper will print your idField values.
