Hello everyone,

I'm running Mahout Streaming K-Means multiple times in a loop, each time on the same input data but with a different output path. Concretely, I increase the number of clusters in each iteration. Currently it runs on a single machine.

Occasionally (maybe 3 of 20 runs) I get this exception:

Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.Merger$MergeQueue merge
INFO: Merging 1 sorted segments
Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.Merger$MergeQueue merge
INFO: Down to the last merge-pass, with 1 segments left of total size: 1623 bytes
Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
INFO:
Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local1196467414_0036
java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
    at org.apache.mahout.math.random.WeightedThing.<init>(WeightedThing.java:31)
    at org.apache.mahout.math.neighborhood.ProjectionSearch.searchFirst(ProjectionSearch.java:191)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.iterativeAssignment(BallKMeans.java:395)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:208)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

I'm running it like this:

String[] args1 = new String[] {
    "-i", dataPath, "-o", plusOneCentroids, "-k", String.valueOf(i + 1),
    "--estimatedNumMapClusters", String.valueOf((i + 1) * 3), "-ow"};
StreamingKMeansDriver.main(args1);
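
For context, here is a minimal, simplified sketch of how the arguments for each iteration might be assembled. The class name, the buildArgs helper, the paths, and the loop bound are placeholders for illustration only, not my real code, and the driver call is commented out so the sketch compiles without Mahout on the classpath:

```java
import java.util.Arrays;

public class StreamingKMeansArgs {

    // Builds the argument array for one iteration; i is the zero-based loop index,
    // so the number of clusters is i + 1 and the map-side estimate is 3 * (i + 1).
    static String[] buildArgs(String dataPath, String outputPath, int i) {
        int k = i + 1;
        return new String[] {
            "-i", dataPath,
            "-o", outputPath,
            "-k", String.valueOf(k),
            "--estimatedNumMapClusters", String.valueOf(k * 3),
            "-ow"
        };
    }

    public static void main(String[] unused) {
        String dataPath = "data/input";                       // placeholder path
        for (int i = 0; i < 3; i++) {                         // placeholder loop bound
            String outputPath = "output/centroids-" + (i + 1); // placeholder, differs per run
            String[] args1 = buildArgs(dataPath, outputPath, i);
            System.out.println(Arrays.toString(args1));
            // StreamingKMeansDriver.main(args1);  // the actual call, needs Mahout
        }
    }
}
```

So the only things that change between iterations are the output path, -k, and --estimatedNumMapClusters.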

I'm using the same configuration and the same dataset every time, so I see no reason for this exception, and it is even stranger that it only occurs on some runs.

Any ideas?

Thanks
