Dear all

I am using Mahout's canopy and k-means clustering and have run into a
problem with the clusterdump output.
Each vector is named with a simple number, incremented from 1.

When I used 5,000 vectors, I got the correct output. It looks like this:

VL-0{n=64,c=[...], r[...]}
    1.0: 1= [...]
    1.0: 3= [...]
    1.0: 4= [...]
     ...
    1.0: 396= [...]    # The number of vectors is exactly the same as n (64).
VL-1{n=5,c=[...], r[...]}
    1.0: 2= [...]
    1.0: 12= [...]
    ...
    1.0: 4221= [...]
VL-2{n=121,c=[...], r[...]}
...

The number of vectors listed under each VL is exactly the same as its n value.
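
This comparison can be automated for large dumps. Below is a minimal
sketch, assuming the text dump has exactly the layout shown above (a
"VL-<id>{n=<count>,..." header followed by indented "<weight>: <name>= ..."
lines). The function name and the sample lines are made up for
illustration and are not real clusterdump output.

```python
import re

# Header lines are assumed to look like "VL-0{n=64,c=[...], r[...]}".
HEADER = re.compile(r"^(VL-\d+)\{n=(\d+)")
# Point lines are assumed to look like "    1.0: 396= [...]".
POINT = re.compile(r"^\s+[\d.]+:\s*\S+\s*=")


def count_mismatches(lines):
    """Return {cluster: (declared_n, listed_points)} where the two differ."""
    counts = {}
    current = None
    for line in lines:
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            counts[current] = [int(header.group(2)), 0]
        elif current is not None and POINT.match(line):
            counts[current][1] += 1
    return {cid: (n, seen) for cid, (n, seen) in counts.items() if n != seen}


# Hypothetical sample in the same layout (not taken from a real run).
sample = [
    "VL-0{n=2,c=[...], r[...]}",
    "    1.0: 1= [...]",
    "    1.0: 3= [...]",
    "VL-8{n=0,c=[...], r[...]}",
    "    1.0: 7= [...]",
    "    1.0: 9= [...]",
]
mismatches = count_mismatches(sample)
```

To run it on a real dump, pass the open file object instead of the
sample list, for example count_mismatches(open("dump.txt")), where
dump.txt is a placeholder file name.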

When I used 600,000 vectors, the output looks wrong:

VL-0{n=14,c=[...], r[...]}
    1.0: 66636= [...]
    1.0: 122570= [...]
    ...
    1.0: 522794= [...]    # The number of vectors is 31.
VL-8{n=0,c=[...], r[...]}
    1.0: 393539= [...]
    1.0: 398877= [...]
    ...
    1.0: 513448= [...]    # The number of vectors is 5.
VL-16{n=2,c=[...], r[...]}
...

It looks as if VL-1 to VL-7 and VL-9 to VL-15 are not used, but I confirmed
that they do exist further down in the output.
The VLs seem to appear in the order 0,8,16,...,11552, 1,9,17,...,11553,
2,10,18,... and so on.
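
The order described above matches what one gets by sorting the cluster
ids by (id mod 8, id). Below is a small sketch of that ordering. The
stride of 8 is only read off the observed sequence, and the function
name is hypothetical; the sketch does not explain why clusterdump emits
the clusters in this order.

```python
def strided_order(num_clusters, stride=8):
    """List ids 0..num_clusters-1 grouped by remainder modulo stride."""
    # Sort first by remainder (0, 1, 2, ...), then by the id itself,
    # which yields 0, 8, 16, ..., 1, 9, 17, ..., 2, 10, 18, ...
    return sorted(range(num_clusters), key=lambda i: (i % stride, i))


# The cluster count here is arbitrary and chosen only to keep the list short.
order = strided_order(20)
```

Comparing such a list with the VL ids in the order they appear in the
dump would confirm or refute the pattern for the whole file.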

Can I trust this result, or should I suspect that it is caused by a bug?

Hadoop : 0.20.204
Mahout : rev. 1351561, 1366995, 1367871

Best regards.

-- 
nishidy@u-tokyo
