Hello everyone,
I'm running Mahout Streaming K-Means multiple times in a loop, each time
on the same input data but with a different output path. Concretely,
I'm increasing the number of clusters in each iteration. Currently it runs
on a single machine.
Occasionally (in maybe 3 of 20 runs) I get this exception:
Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.Merger$MergeQueue merge
INFO: Merging 1 sorted segments
Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.Merger$MergeQueue merge
INFO: Down to the last merge-pass, with 1 segments left of total size: 1623 bytes
Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
INFO:
Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local1196467414_0036
java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
    at org.apache.mahout.math.random.WeightedThing.<init>(WeightedThing.java:31)
    at org.apache.mahout.math.neighborhood.ProjectionSearch.searchFirst(ProjectionSearch.java:191)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.iterativeAssignment(BallKMeans.java:395)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:208)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
I'm running it like this:
    String[] args1 = new String[] {
        "-i", dataPath,
        "-o", plusOneCentroids,
        "-k", String.valueOf(i + 1),
        "--estimatedNumMapClusters", String.valueOf((i + 1) * 3),
        "-ow"
    };
    StreamingKMeansDriver.main(args1);
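To make the intermittent failure easier to pin down, the loop around this call could record which cluster counts fail. The following is only a minimal sketch under assumptions: `ClusterSweep`, `Driver`, `buildArgs`, `sweep` and the `outputBase + "/k" + k` path scheme are hypothetical names made up for illustration, and `Driver` is a stand-in for `StreamingKMeansDriver.main` so the sketch runs without Mahout. Whether the real driver propagates the reducer's exception to the caller is not something I have confirmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClusterSweep {

    /** Hypothetical stand-in for StreamingKMeansDriver.main(String[]). */
    public interface Driver {
        void run(String[] args) throws Exception;
    }

    /** Builds the same argument list as in the post for a target of k clusters. */
    public static String[] buildArgs(String dataPath, String outputPath, int k) {
        return new String[] {
            "-i", dataPath,
            "-o", outputPath,
            "-k", String.valueOf(k),
            "--estimatedNumMapClusters", String.valueOf(k * 3),
            "-ow"
        };
    }

    /** Runs the driver for k = 1..maxK and returns the k values whose run threw. */
    public static List<Integer> sweep(Driver driver, String dataPath,
                                      String outputBase, int maxK) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < maxK; i++) {
            int k = i + 1;
            // Assumed output layout: one directory per k under outputBase.
            String[] args = buildArgs(dataPath, outputBase + "/k" + k, k);
            try {
                driver.run(args);
            } catch (Exception e) {
                // Log the exact arguments so a failing run can be repeated in isolation.
                System.err.println("k=" + k + " failed with args "
                        + Arrays.toString(args) + ": " + e);
                failed.add(k);
            }
        }
        return failed;
    }

    public static void main(String[] argv) {
        // Fake driver for demonstration only: it fails when the "-k" value is 3.
        Driver fake = args -> {
            if ("3".equals(args[5])) {
                throw new NullPointerException("simulated failure");
            }
        };
        System.out.println("failed k values: " + sweep(fake, "data", "out", 5));
    }
}
```

With the real driver, `fake` would be replaced by `StreamingKMeansDriver::main`; comparing the list of failed `k` values across several sweeps would show whether the failures cluster around particular values or are spread randomly.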
I'm using the same configuration and the same dataset every time, so I see
no reason for this exception, and it's even stranger that it doesn't
always occur.
Any ideas?
Thanks