On 11/16/10 7:45 PM, beneo_7 wrote:
I use lucene vector to create the NamedVectors using the idField.
However, I found that the name disappears after creating canopies: in the canopy
reduce step, the NamedVector is replaced with the Randomxxxxvector.
So I think that after running k-means with the canopies, the clustered vectors
have no labels.
Where can I find the names after k-means clustering?
The CanopyReducer is outputting a Canopy, not an input vector that might
be a NamedVector. If you are feeding this output to k-means as its -c
parameter, then this is correct too. The KMeansReducer will also output a
Cluster and not an input vector. The default processing only produces
the Clusters; it does not classify the input points.
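To see why no name survives into a Canopy or Cluster, consider how a center is formed: it is an average over many input vectors, so it is a new vector that does not belong to any single input. The sketch below is a minimal illustration under that assumption; `NamedPoint` and `centroid` are hypothetical stand-ins, not Mahout's NamedVector or any Mahout API.

```java
import java.util.List;

public class CentroidSketch {
    // Hypothetical stand-in for a named input vector (NOT Mahout's NamedVector).
    static final class NamedPoint {
        final String name;
        final double[] values;

        NamedPoint(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    // Averages the inputs component-wise. The result is a brand-new vector
    // built from many inputs, so no single input name applies to it.
    static double[] centroid(List<NamedPoint> points) {
        int dim = points.get(0).values.length;
        double[] sum = new double[dim];
        for (NamedPoint p : points) {
            for (int i = 0; i < dim; i++) {
                sum[i] += p.values[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            sum[i] /= points.size();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<NamedPoint> docs = List.of(
            new NamedPoint("doc-a", new double[] {1.0, 0.0}),
            new NamedPoint("doc-b", new double[] {3.0, 2.0}));
        double[] c = centroid(docs);
        System.out.println(c[0] + ", " + c[1]); // prints 2.0, 1.0
    }
}
```

The center (2.0, 1.0) carries neither "doc-a" nor "doc-b", which is the same reason the canopy and k-means reducers emit unnamed clusters.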
If you want to use your clustering to actually classify your NamedVector
inputs, then you need to add the -cl argument (it is not the default).
That will run a classification step using the final Clusters you
computed, and your input NamedVectors will be wrapped in a
WeightedVectorWritable and appear in the <output>/clusteredPoints directory.
The ClusterDumper will then print your idField values.
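The classification step differs from clustering because each original input is kept and only paired with a cluster id, so its name is still there afterwards. Here is a minimal sketch of that idea, assuming squared Euclidean distance and nearest-center assignment; `NamedPoint`, `nearest` and `classify` are hypothetical illustrations, not Mahout classes, and the weight that WeightedVectorWritable carries is omitted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClassifySketch {
    // Hypothetical stand-in for a named input vector (NOT Mahout's NamedVector).
    static final class NamedPoint {
        final String name;
        final double[] values;

        NamedPoint(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the index of the center closest to v.
    static int nearest(double[][] centers, double[] v) {
        int best = 0;
        double bestDist = squaredDistance(centers[0], v);
        for (int k = 1; k < centers.length; k++) {
            double d = squaredDistance(centers[k], v);
            if (d < bestDist) {
                best = k;
                bestDist = d;
            }
        }
        return best;
    }

    // Maps each input's name to the id of its nearest center. The input itself
    // is never replaced, so its name is still available after assignment.
    static Map<String, Integer> classify(double[][] centers, List<NamedPoint> points) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (NamedPoint p : points) {
            out.put(p.name, nearest(centers, p.values));
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] centers = { {0.0, 0.0}, {10.0, 10.0} };
        List<NamedPoint> docs = List.of(
            new NamedPoint("doc-a", new double[] {1.0, 1.0}),
            new NamedPoint("doc-b", new double[] {9.0, 8.0}));
        System.out.println(classify(centers, docs)); // prints {doc-a=0, doc-b=1}
    }
}
```

In the real job the pairing is written to <output>/clusteredPoints instead of a map, but the principle is the same: the names come from the wrapped inputs, not from the Clusters.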