Hello,

I'm having trouble running a basic K-means clustering job from a Java project when debugging in NetBeans. While hunting down the cause of a mkdirs failure when writing the cluster output, I realized that the configuration I injected when running the job via ToolRunner isn't the configuration used when ClusterClassifier.java writes the output.

Here's how I'm instantiating & running the job from my code:

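     import org.apache.hadoop.util.ToolRunner;
     import org.apache.mahout.clustering.kmeans.KMeansDriver;

     // "configuration" is my own pre-populated Hadoop Configuration object
     public static void main(String[] args) throws Exception {
          KMeansDriver kmeans = new KMeansDriver();
          ToolRunner.run(configuration, kmeans, args);
     }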

At this point, and all the way through to the call within buildClusters() at line 219 of KMeansDriver.java, the configuration properties are as I expect them to be:

     public static Path buildClusters(Configuration conf, Path input, Path clustersIn, Path output,
               int maxIterations, String delta, boolean runSequential)
               throws IOException, InterruptedException, ClassNotFoundException {

          ......
          prior.writeToSeqFiles(priorClustersPath);
          ......
     }

Then in writeToSeqFiles() in ClusterClassifier.java, at line 186, a new Configuration() is instantiated, which yields an object holding only default values. My injected config is lost at that point, so the write operations fail:

     public void writeToSeqFiles(Path path) throws IOException {
          writePolicy(policy, path);
          Configuration config = new Configuration();
          FileSystem fs = FileSystem.get(path.toUri(), config);
          ......
     }
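
To illustrate what I think is happening, here is a minimal, self-contained sketch. It does not use Hadoop at all: MiniConf is a hypothetical stand-in I wrote for org.apache.hadoop.conf.Configuration, and the fs.default.name value and the namenode address are just example settings, not my real ones. The point is only that a freshly constructed config object cannot see values set on a different instance:

```java
import java.util.HashMap;
import java.util.Map;

public class FreshConfigDemo {

    // Hypothetical stand-in for Hadoop's Configuration:
    // every new instance starts with only the built-in defaults.
    public static class MiniConf {
        private final Map<String, String> props = new HashMap<>();

        public MiniConf() {
            props.put("fs.default.name", "file:///"); // built-in default
        }

        public void set(String key, String value) { props.put(key, value); }

        public String get(String key) { return props.get(key); }
    }

    // Same shape as writeToSeqFiles(): builds a fresh config instead of
    // using the caller's, so only the defaults are visible here.
    public static String fsSeenByFreshConfig() {
        MiniConf config = new MiniConf();
        return config.get("fs.default.name");
    }

    // What I expected to happen: the injected config is passed through.
    public static String fsSeenByInjectedConfig(MiniConf injected) {
        return injected.get("fs.default.name");
    }

    public static void main(String[] args) {
        MiniConf injected = new MiniConf();
        injected.set("fs.default.name", "hdfs://namenode:8020"); // example value only

        System.out.println(fsSeenByInjectedConfig(injected)); // hdfs://namenode:8020
        System.out.println(fsSeenByFreshConfig());            // file:///
    }
}
```

If the real Configuration behaves like this stand-in, that would explain why the FileSystem obtained in writeToSeqFiles() resolves against the defaults rather than my settings.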


I've also noticed that KMeansDriver.java makes various calls to getConf() in AbstractJob.java, which in turn calls super.getConf(), and there the values I passed during instantiation are picked up. How is Mahout designed to pass these configuration values into its core Java classes when run from ToolRunner?

Has anyone else encountered this issue? I feel I must be missing something fundamental here, but I can't figure out how to get my config values to stick.


Thanks for any tips...

Terry
