I had the exact same issue. I tried using the seqdirectory command with a different filter class, but it did not work; there seems to be a bug in the Mahout 0.6 code.
I ended up writing a custom MapReduce program that does exactly that.

Greetings!
Charly

On Tue, Oct 30, 2012 at 5:00 PM, Nick Woodward <[email protected]> wrote:
>
> I have done a lot of searching on the web for this, but I've found
> nothing, even though I feel like it has to be somewhat common. I have used
> Mahout's 'seqdirectory' command to convert a folder containing text files
> (each file is a separate document) in the past. But in this case there are
> so many documents (in the 100,000s) that I have one very large text file in
> which each line is a document. How can I convert this large file to
> SequenceFile format so that Mahout understands that each line should be
> considered a separate document? Would it be better if the file was
> structured like so....
>
> docId1 {tab} document text
> docId2 {tab} document text
> docId3 {tab} document text
> ...
>
> Thank you very much for any help.
> Nick
>
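For the "docId {tab} document text" layout, the per-line work of such a conversion is small: split each line at the first tab into a (key, value) pair, one pair per document. As far as I understand, seqdirectory writes Text keys (a document name) and Text values (the document content), so each pair below would become one record in that kind of SequenceFile. This is a minimal sketch of that splitting step only, not Charly's actual program; the function name `parse_line_documents` and the generated-id convention for lines without a tab (`doc` + line number) are hypothetical choices for illustration.

```python
from typing import Iterable, Iterator, Tuple


def parse_line_documents(
    lines: Iterable[str], id_prefix: str = "doc"
) -> Iterator[Tuple[str, str]]:
    """Yield (doc_id, text) pairs from one-document-per-line input.

    Lines shaped like "docId<TAB>text" keep their own id. Lines without a tab
    get a generated id: id_prefix + line number (a hypothetical convention,
    not something Mahout defines). Blank lines are skipped.
    """
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines instead of emitting empty documents
        # Split only at the FIRST tab so tabs inside the text are preserved.
        doc_id, sep, text = line.partition("\t")
        if not sep:
            # No tab found: the whole line is the document body.
            doc_id, text = f"{id_prefix}{line_no}", line
        yield doc_id, text


if __name__ == "__main__":
    sample = ["d1\tfirst document\n", "second document without id\n", "\n", "d3\tthird\n"]
    for doc_id, text in parse_line_documents(sample):
        print(doc_id, "->", text)
```

In a MapReduce job this logic would sit in a map-only step (one input line in, one key/value pair out), with the output written as a SequenceFile of Text keys and Text values, for example via Hadoop's `SequenceFile.Writer` on the Java side. That writing step is not shown here. Putting an explicit id in front of each line, as Nick suggests, is what makes the keys stable; the generated-id fallback only depends on line order.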
