>  Jeff Eastman <jdog <at> windwardsolutions.com> writes:
>   Try naming the input *directory* not the particular input file.

I tried, but the result was the same.
However, I did discover what looks like a bug in Mahout.

When I try to convert a text file into a sequence file with the command line:

bin/mahout seqdirectory --input <PATH> --output <PATH> --charset UTF-8

and then into sparse vectors with:

bin/mahout seq2sparse --input <PATH>/content/reuters/seqfiles/ --norm 2 --weight TF --output <PATH>/content/reuters/seqfiles-TF/ --minDF 5 --maxDFPercent 90

if the original file isn't valid, or the path is incorrect, Mahout still
creates a bogus chunk-0 that is of no use to seq2sparse. The second command
then produces more useless output, because the files are empty. You can see
this because the file part-00000 in the vector folder is only around 90 bytes.
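
A quick way to catch this before running kmeans is to check the size of the
generated files. This is only a rough sketch: it assumes the output is on the
local filesystem (not HDFS), the function name has_real_data is made up for
illustration, and the 200-byte default threshold is a guess based on the
roughly 90-byte empty file I saw.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): succeeds only if the directory
# exists and contains at least one regular file larger than min_bytes.
# The 200-byte default is an assumed threshold, not a Mahout constant.
has_real_data() {
    dir=$1
    min_bytes=${2:-200}
    [ -d "$dir" ] || return 1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Arithmetic expansion strips the padding some wc versions print.
        size=$(( $(wc -c < "$f") ))
        [ "$size" -gt "$min_bytes" ] && return 0
    done
    return 1
}

# Example use, with VECTOR_DIR set to the folder that holds part-00000:
#   has_real_data "$VECTOR_DIR" || echo "vectors look empty, check the input path"
```

The same check can be run on the seqdirectory output folder before calling
seq2sparse, so a wrong input path is noticed one step earlier.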

I think this came up in an old answer of yours to a problem similar to mine ^^

Do you have a link or a site where I can download a valid text file to use
as a dataset? Then I can try converting it to a sequence file and then to
vectors, to see what Mahout kmeans produces.

Thanks in advance!




