Hello everyone,

I am trying to run the k-means clustering algorithm from the Hama
examples, but I am running into some problems. Specifically, I want to
change the number of BSP tasks launched, which does not seem to be
possible with this example
<http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-examples/0.6.2/org/apache/hama/examples/Kmeans.java>
, right? (As far as I can tell, the number of tasks is determined by the
number of blocks of the input file.)

To this end, I tried to use the KmeansBSP
<http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-ml/0.6.4/org/apache/hama/ml/kmeans/KMeansBSP.java#KMeansBSP.main%28java.lang.String[]%29>
job, which exposes the number of launched tasks as a parameter, but I
can't make it work :$. I tried both text and sequence file input
formats, but the job always fails with the message

"Cannot create <name of input>; already exists as a directory"

When I pass a non-existing directory instead, I get the same message.

Can someone please guide me through this? I want to run KMeans and I
want to set the number of BSP tasks to launch (even if this means
partitioning the input file -- I haven't found anything about this
online regarding KMeans).

Thank you in advance,
Giannis
