It has been reported recently that some of our jobs fail quietly
and/or in unexpected ways when their inputs are not correct. If you can
reproduce this behavior, please file a JIRA issue and we will look into it.
The 0.4 release is coming up, possibly next month, so please help us
improve our user experience. To get a batch of correct files to inspect,
try running examples/bin/build-reuters.sh.
On 8/28/10 9:33 AM, Valerio wrote:
Jeff Eastman <jdog <at> windwardsolutions.com> writes:
Try naming the input *directory* not the particular input file.
I tried, but the result was the same.
However, I think I have discovered a bug in Mahout.
When I try to convert a text file to a sequence file with the command line:
bin/mahout seqdirectory --input <PATH> --output <PATH> --charset UTF-8
and then to sparse vectors with:
bin/mahout seq2sparse --input <PATH>/content/reuters/seqfiles/ \
  --output <PATH>/content/reuters/seqfiles-TF/ \
  --norm 2 --weight TF --minDF 5 --maxDFPercent 90
if the original file isn't correct, or the path is incorrect, Mahout
creates a fake chunk-0 that is of no use to seq2sparse, and the second
command then produces more useless output because its input files are
empty. You can see this because the file part-00000 in the vector folder
is only around 90 bytes.
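
A minimal sketch of how this could be checked from the shell, before and
after running the two commands. The helper names and the 200-byte cutoff
are assumptions made for illustration, not anything Mahout provides, and
the sketch assumes the paths are on the local filesystem (for HDFS the
Hadoop file tools would be needed instead).

```shell
#!/bin/sh
# Hypothetical helpers; neither function is part of Mahout.

# Succeeds only if $1 is a directory containing at least one
# non-empty regular file (searched recursively).
has_nonempty_input() {
    [ -d "$1" ] || return 1
    [ -n "$(find "$1" -type f -size +0c 2>/dev/null | head -n 1)" ]
}

# Succeeds if file $1 is missing or smaller than $2 bytes.
# The default of 200 bytes is an assumed cutoff, chosen only because
# the empty part-00000 described above was around 90 bytes.
looks_empty() {
    file=$1
    limit=${2:-200}
    if [ ! -f "$file" ]; then
        return 0
    fi
    # Arithmetic expansion strips any padding that wc may print.
    size=$(($(wc -c < "$file")))
    [ "$size" -lt "$limit" ]
}

# Example use (INPUT_DIR and VECTOR_DIR are placeholders):
#   has_nonempty_input "$INPUT_DIR" || echo "input missing or empty" >&2
#   looks_empty "$VECTOR_DIR/part-00000" && echo "vectors look empty" >&2
```

A check like this only catches the symptom described above (a tiny output
file); it does not say why the conversion produced nothing.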
I think this was an old answer of yours to a problem similar to mine ^^
Do you have a link to a site where I can download a correct text file to
use as a dataset? Then I can try converting it to a sequence file and then
to vectors, to see what Mahout k-means produces.
Thanks in advance!