I looked at the code, and -e shouldn’t need a value. The null pointer comes
from the other params, not from -e itself. Unfortunately the help doesn’t say
which params are required. It looks like evaluation needs the clustered
points, and the distance measure should be the same one you used to cluster:
    if (runEvaluation) {
      HadoopUtil.delete(conf, new Path("tmp/representative"));
      int numIters = 5;
      RepresentativePointsDriver.main(new String[]{
        "--input", seqFileDir.toString(),
        "--output", "tmp/representative",
        "--clusteredPoints", pointsDir.toString(),
        "--distanceMeasure", measure.getClass().getName(),
        "--maxIter", String.valueOf(numIters)
      });
      ...
    }
I don’t have any example clusters right now so can’t run it myself.
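
In the meantime, since the original question was about the MSE of the
clusters, here is a minimal standalone sketch of that quantity, in case it
helps to sanity-check whatever clusterdump reports. It assumes MSE means the
average squared Euclidean distance from each point to its assigned centroid.
The class and method names (ClusterMse, squaredDistance, meanSquaredError) are
made up for illustration and are not part of Mahout; you would have to load
the points, assignments, and centroids yourself.

```java
/**
 * Hypothetical helper, not Mahout API: mean squared error of a hard
 * clustering, i.e. the average squared Euclidean distance from each
 * point to the centroid of the cluster it is assigned to.
 */
final class ClusterMse {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * @param points     one row per input vector
     * @param assignment assignment[i] is the centroid index of points[i]
     * @param centroids  one row per cluster centre
     */
    static double meanSquaredError(double[][] points, int[] assignment,
                                   double[][] centroids) {
        if (points.length == 0 || points.length != assignment.length) {
            throw new IllegalArgumentException(
                "need one assignment per point and at least one point");
        }
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            total += squaredDistance(points[i], centroids[assignment[i]]);
        }
        return total / points.length;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] points = {{1.0, 0.0}, {0.0, 1.0}, {10.0, 12.0}};
        int[] assignment = {0, 0, 1};
        // Squared distances are 1, 1 and 4, so this prints 2.0
        System.out.println(meanSquaredError(points, assignment, centroids));
    }
}
```

If you clustered with a different distance measure, you would swap
squaredDistance for that measure so the number is comparable to what the
clustering optimised.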
On May 20, 2014, at 1:00 AM, hiroshi leon <[email protected]> wrote:
Thanks Pat and David,
I tried what you told me to do, but unfortunately it is not working. I get the
following error when running the command:
./mahout clusterdump -i /user/Data-output/clusters-1-final -o analyze.txt --evaluate true
"ERROR common.AbstractJob: Unexpected true while processing Job-Specific
Options:
Unexpected true while processing Job-Specific Options."
According to the clusterdump help, the --evaluate (-e) parameter is not
supposed to take a value, but if I do not pass one I get the Java
NullPointerException.
These are 2 of the 23 clusters in my analyze.txt file; maybe they can help to
show whether something unexpected is going on:
CL-0{n=113525 c=[10.821, 48.382, 66.019, 0.004, 0.000, 0.001, 0.000, 0.001,
0.001, 0.000, 0.000, 0.000, 0.000, 4.921, 8.565, 0.068, 0.068, 0.207, 0.205,
0.951, 0.052, 0.139, 209.864, 175.184, 0.731, 0.079, 0.119, 0.025, 0.069,
0.067, 0.191, 0.196] r=[91.194, 45.425, 78.914, 0.110, 0.008, 0.035, 0.028,
0.037, 0.038, 0.013, 0.008, 0.016, 0.011, 10.173, 23.152, 0.252, 0.252, 0.405,
0.403, 0.164, 0.195, 0.292, 80.182, 102.034, 0.395, 0.223, 0.290, 0.072, 0.251,
0.250, 0.381, 0.388]}
VL-1{n=17 c=[1.133, 0.669, 1.874, 1.460, 1.688, 1.818, 1.939, 1.255, 1.484,
1.697, 0.554, 1.042, 1.774, 0.818, 1.901, 1.522, 1.518, 1.098, 1.637, 1.611,
1.615, 1.212, 1.088, 1.133, 1.483, 0.761, 0.757, 0.953, 1.559, 1.696, 0.548,
0.975] r=[0.000, 0.000, 0.000, 0.000, NaN, NaN, NaN, NaN, 0.000, 0.000, NaN,
0.000, 0.000, 0.000, NaN, 0.000, NaN, 0.000, NaN, 0.000, 0.000, 0.000, 0.000,
0.000, NaN, 0.000, NaN, 0.000, 0.000]}
Thanks!
> Subject: Re: Mahout K-Means - Quality of the clusters
> From: [email protected]
> Date: Mon, 19 May 2014 14:50:47 -0700
> To: [email protected]; [email protected]
>
> Yep, the clue is "--evaluate=null" in the console. Try "-e true". I think I
> ran into that a long time ago; it should really be fixed.
>
> Try looking here for more explanation of cluster dump:
> https://mahout.apache.org/users/clustering/cluster-dumper.html
>
> The docs are being greatly improved, so there's a chance you’ll find answers
> there.
>
> On May 19, 2014, at 2:34 PM, David Noel <[email protected]> wrote:
>
> It works for me with just -e. Maybe try that or --evaluate true?
>
> On 5/19/14, hiroshi leon <[email protected]> wrote:
>> Thanks Pat,
>>
>> But how exactly can I run clusterdump using the --evaluate (-e) parameter?
>> When I try to run it, for example:
>>
>> ./mahout clusterdump -i /user/Data-output/clusters-1-final -o analyze.txt
>> --evaluate
>>
>> I get a Java NullPointerException:
>>
>> 14/05/19 15:02:03 INFO common.AbstractJob: Command line arguments:
>> {--dictionaryType=[text],
>> --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
>> --endPhase=[2147483647], --evaluate=null,
>> --input=[/user/Data-output/clusters-1-final], --output=[analyze.txt],
>> --outputFormat=[TEXT], --startPhase=[0], --tempDir=[temp]}
>> Exception in thread "main" java.lang.NullPointerException
>>
>> Do I have to pass a value to --evaluate? As input for clusterdump I am
>> using the cluster output from running Mahout K-Means.
>>
>>> Subject: Re: Mahout K-Means - Quality of the clusters
>>> From: [email protected]
>>> Date: Sat, 17 May 2014 09:43:59 -0700
>>> To: [email protected]
>>>
>>> mahout clusterdump --evaluate …
>>>
>>> provides some stats
>>>
>>> On May 15, 2014, at 10:23 PM, hiroshi leon <[email protected]>
>>> wrote:
>>>
>>> Hello everybody,
>>>
>>> Do you know how I can get the MSE of the clusters in Mahout K-Means?
>>> I would like to check the quality of the clusters. Thanks!
>>>