Hello everyone, I am trying to run the k-means clustering algorithm from the Hama examples, but I am running into some problems. Specifically, I want to change the number of BSP tasks launched, which does not seem to be possible through this example <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-examples/0.6.2/org/apache/hama/examples/Kmeans.java>, right? (That is, the number of tasks is determined by the number of blocks of the input file.)
To this end, I tried to use the KMeansBSP <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-ml/0.6.4/org/apache/hama/ml/kmeans/KMeansBSP.java#KMeansBSP.main%28java.lang.String[]%29> job, which exposes the number of launched tasks as a parameter, but I can't make it work. I tried both text and sequence file input formats, but the job always fails with the message "Cannot create <name of input>; already exists as a directory". When I pass a non-existent directory instead, I get the same message.

Can someone please guide me through this? I want to run KMeans and set the number of BSP tasks to launch (even if this means partitioning the input file; I haven't found anything about this online regarding KMeans).

Thank you in advance,
Giannis
