Thanks Pat and David,
I tried what you suggested, but unfortunately it is not working. I get the
following error when running the command:
./mahout clusterdump -i /user/Data-output/clusters-1-final -o analyze.txt --evaluate true
"ERROR common.AbstractJob: Unexpected true while processing Job-Specific Options:
Unexpected true while processing Job-Specific Options."
According to the clusterdump help, the --evaluate (-e) option is not supposed to take
a value, but if I leave the value out I get a Java NullPointerException.
Here are 2 of the 23 clusters from my analyze.txt file; maybe they can help show
whether something unexpected is going on:
CL-0{n=113525 c=[10.821, 48.382, 66.019, 0.004, 0.000, 0.001, 0.000, 0.001,
0.001, 0.000, 0.000, 0.000, 0.000, 4.921, 8.565, 0.068, 0.068, 0.207, 0.205,
0.951, 0.052, 0.139, 209.864, 175.184, 0.731, 0.079, 0.119, 0.025, 0.069,
0.067, 0.191, 0.196] r=[91.194, 45.425, 78.914, 0.110, 0.008, 0.035, 0.028,
0.037, 0.038, 0.013, 0.008, 0.016, 0.011, 10.173, 23.152, 0.252, 0.252, 0.405,
0.403, 0.164, 0.195, 0.292, 80.182, 102.034, 0.395, 0.223, 0.290, 0.072, 0.251,
0.250, 0.381, 0.388]}
VL-1{n=17 c=[1.133, 0.669, 1.874, 1.460, 1.688, 1.818, 1.939, 1.255, 1.484,
1.697, 0.554, 1.042, 1.774, 0.818, 1.901, 1.522, 1.518, 1.098, 1.637, 1.611,
1.615, 1.212, 1.088, 1.133, 1.483, 0.761, 0.757, 0.953, 1.559, 1.696, 0.548,
0.975] r=[0.000, 0.000, 0.000, 0.000, NaN, NaN, NaN, NaN, 0.000, 0.000, NaN,
0.000, 0.000, 0.000, NaN, 0.000, NaN, 0.000, NaN, 0.000, 0.000, 0.000, 0.000,
0.000, NaN, 0.000, NaN, 0.000, 0.000]}
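Since the original goal was an MSE figure, here is a rough sketch (plain Python, not part of Mahout) for pulling one out of records like the two above. It assumes the dense "c=[...] r=[...]" format shown here (sparse vectors printed as index:value pairs would need a different parser) and, more importantly, that r is the per-dimension population standard deviation of the cluster's points. Under that assumption, the sum of r_i^2 is the cluster's mean squared Euclidean distance to its center. NaN entries, like the ones in VL-1, are skipped and counted; a NaN radius typically comes from taking the square root of a variance that rounded to a tiny negative number. The function names are hypothetical.

```python
import math
import re

# One cluster record as printed in analyze.txt, e.g.
#   CL-0{n=113525 c=[10.821, 48.382, ...] r=[91.194, 45.425, ...]}
# The CL-/VL- prefix and the n/c/r fields are taken from the dump in this thread.
RECORD = re.compile(
    r"(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+)\s+"
    r"c=\[(?P<c>[^\]]*)\]\s+r=\[(?P<r>[^\]]*)\]\}"
)


def parse_vector(text):
    """Turn '1.0, NaN, 0.5' into floats; float() accepts 'NaN' and surrounding spaces."""
    return [float(tok) for tok in text.split(",") if tok.strip()]


def parse_clusters(dump_text):
    """Yield (id, n, center, radius) for every record found in the dump text.

    Records may be wrapped across several lines, so all whitespace runs are
    collapsed to single spaces before matching.
    """
    flat = " ".join(dump_text.split())
    for m in RECORD.finditer(flat):
        yield m["id"], int(m["n"]), parse_vector(m["c"]), parse_vector(m["r"])


def cluster_mse(radius):
    """Return (sum of r_i^2 over finite entries, number of NaN entries).

    ASSUMPTION: r is the per-dimension population standard deviation, so the
    sum of squares approximates the mean squared distance to the centroid.
    """
    finite = [x for x in radius if not math.isnan(x)]
    return sum(x * x for x in finite), len(radius) - len(finite)


def overall_mse(clusters):
    """Point-weighted average of per-cluster MSEs (total SSE / total n)."""
    total_sse = 0.0
    total_n = 0
    for _, n, _, radius in clusters:
        mse, _ = cluster_mse(radius)
        total_sse += n * mse
        total_n += n
    return total_sse / total_n
```

Usage would be something like clusters = list(parse_clusters(open("analyze.txt").read())), then cluster_mse(r) for each record and overall_mse(clusters) for a single point-weighted figure. Dimensions reported as NaN are treated as contributing nothing, so the result is a lower bound for clusters that have them.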
Thanks!
> Subject: Re: Mahout K-Means - Quality of the clusters
> From: [email protected]
> Date: Mon, 19 May 2014 14:50:47 -0700
> To: [email protected]; [email protected]
>
> Yep, the clue is "--evaluate=null" in the console. Try "-e true". I think I
> ran into that a long time ago; it should really be fixed.
>
> Try looking here for more explanation of cluster dump:
> https://mahout.apache.org/users/clustering/cluster-dumper.html
>
> The docs are being greatly improved, so there's a chance you’ll find answers
> there.
>
> On May 19, 2014, at 2:34 PM, David Noel <[email protected]> wrote:
>
> It works for me with just -e. Maybe try that or --evaluate true?
>
> On 5/19/14, hiroshi leon <[email protected]> wrote:
> > Thanks Pat,
> >
> > But how exactly can I run clusterdump using the --evaluate (-e) parameter?
> > For example, when I try to run:
> >
> > ./mahout clusterdump -i /user/Data-output/clusters-1-final -o analyze.txt --evaluate
> >
> > I get a Java NullPointerException:
> >
> > 14/05/19 15:02:03 INFO common.AbstractJob: Command line arguments:
> > {--dictionaryType=[text],
> > --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
> > --endPhase=[2147483647], --evaluate=null,
> > --input=[/user/Data-output/clusters-1-final], --output=[analyze.txt],
> > --outputFormat=[TEXT], --startPhase=[0], --tempDir=[temp]}
> > Exception in thread "main" java.lang.NullPointerException
> >
> > Do I have to pass a value to --evaluate? As input for clusterdump I am
> > using the clusters output produced by Mahout K-Means.
> >
> >> Subject: Re: Mahout K-Means - Quality of the clusters
> >> From: [email protected]
> >> Date: Sat, 17 May 2014 09:43:59 -0700
> >> To: [email protected]
> >>
> >> mahout clusterdump --evaluate …
> >>
> >> provides some stats
> >>
> >> On May 15, 2014, at 10:23 PM, hiroshi leon <[email protected]>
> >> wrote:
> >>
> >> Hello everybody,
> >>
> >> Do you know how I can get the MSE of the clusters in Mahout K-Means?
> >> I would like to check the quality of the clusters. Thanks!
> >>
> >>
> >>
> >
>
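For the original question, the MSE can also be computed directly rather than read off the dump. This is a minimal sketch, assuming the input vectors and the final centroids have already been exported to plain Python lists of equal-length numeric sequences (exporting them from the Mahout output is not covered here). Each point is assigned to its nearest centroid by squared Euclidean distance, the measure shown in the clusterdump log above; for a converged run this should normally agree with the K-Means assignment, but that is an assumption. kmeans_mse is a hypothetical helper name.

```python
def kmeans_mse(points, centroids):
    """Mean squared Euclidean distance from each point to its nearest centroid.

    Returns (mse, per_cluster), where per_cluster maps a centroid index to
    (point count, sum of squared distances) for the points assigned to it.
    """
    if not points or not centroids:
        raise ValueError("need at least one point and one centroid")
    per_cluster = {}
    total = 0.0
    for p in points:
        # Squared Euclidean distance to every centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        best = min(range(len(centroids)), key=dists.__getitem__)
        count, sse = per_cluster.get(best, (0, 0.0))
        per_cluster[best] = (count + 1, sse + dists[best])
        total += dists[best]
    return total / len(points), per_cluster
```

For example, kmeans_mse([(0, 0), (2, 0), (10, 10), (12, 10)], [(1, 0), (11, 10)]) gives an MSE of 1.0, with two points assigned to each centroid. Dividing each per-cluster sum by its count gives a per-cluster MSE that can be compared with the radius-based estimate from the dump.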