Yeah, I've looked at filter classes, but nothing worked. I guess I'll do
something similar: write each line to its own file and then run
seqdirectory. The running time won't look good, but at least it should work.
Thanks for the response.
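The one-file-per-line workaround can be sketched in plain Java. This is a minimal sketch, not code from the thread: the class name `SplitLines` and the `doc-<lineNumber>.txt` naming scheme are hypothetical, blank lines are skipped, and the input is assumed to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: split a one-document-per-line file into one file per document. */
public class SplitLines {

    /**
     * Reads {@code input} line by line and writes each non-blank line to its own
     * file in {@code outDir}. The "doc-<lineNumber>.txt" names are a hypothetical
     * scheme chosen for illustration. Returns the number of files written.
     */
    public static int split(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        int written = 0;
        int lineNo = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines rather than creating empty documents
                }
                // Keeping the original line number in the file name keeps documents traceable.
                Path out = outDir.resolve("doc-" + lineNo + ".txt");
                Files.write(out, line.getBytes(StandardCharsets.UTF_8));
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SplitLines <input-file> <output-dir>");
            System.exit(1);
        }
        int n = split(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("wrote " + n + " files");
    }
}
```

Something like `java SplitLines docs.txt docs-dir` followed by `mahout seqdirectory -i docs-dir -o docs-seq` should then give one SequenceFile record per original line. Writing hundreds of thousands of tiny files is what makes this approach slow, which is the cost a custom MapReduce job avoids.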

Nick

> From: [email protected]
> Date: Tue, 30 Oct 2012 18:07:58 -0300
> Subject: Re: Converting one large text file with multiple documents to 
> SequenceFile format
> To: [email protected]
> 
> I had the exact same issue. I tried using the seqdirectory command with
> a different filter class, but it did not work. It seems there's a bug in the
> mahout-0.6 code.
> 
> I ended up writing a custom MapReduce program that does exactly that.
> 
> Greetings!
> Charly
> 
> On Tue, Oct 30, 2012 at 5:00 PM, Nick Woodward <[email protected]> wrote:
> 
> >
> > I have done a lot of searching on the web for this, but I've found
> > nothing, even though I feel like it has to be somewhat common. I have used
> > Mahout's 'seqdirectory' command to convert a folder containing text files
> > (each file is a separate document) in the past. But in this case there are
> > so many documents (in the 100,000s) that I have one very large text file in
> > which each line is a document. How can I convert this large file to
> > SequenceFile format so that Mahout understands that each line should be
> > considered a separate document? Would it be better if the file was
> > structured like so:
> >
> > docId1 {tab} document text
> > docId2 {tab} document text
> > docId3 {tab} document text
> > ...
> >
> > Thank you very much for any help.
> > Nick
> >
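With the `docId {tab} document text` layout from the original question, each line splits cleanly into a key and a value at the first tab. The sketch below (hypothetical class `TabDocParser`, plain Java, no Hadoop dependency) shows only that parsing step, under the assumption that lines without a tab are malformed and can be skipped; any further tabs are kept as part of the text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.AbstractMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class TabDocParser {

    /** Splits one "docId<TAB>text" line at the first tab; returns null if the line has no tab. */
    public static Map.Entry<String, String> parseLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return null; // malformed line: no id/text separator
        }
        String docId = line.substring(0, tab);
        String text = line.substring(tab + 1); // any later tabs stay in the text
        return new AbstractMap.SimpleImmutableEntry<>(docId, text);
    }

    /**
     * Streams lines from {@code reader}, passing each well-formed (docId, text) pair
     * to {@code sink}. Returns the number of pairs delivered.
     */
    public static int parseAll(BufferedReader reader, BiConsumer<String, String> sink) throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            Map.Entry<String, String> doc = parseLine(line);
            if (doc == null) {
                continue; // skip lines that do not follow the docId<TAB>text layout
            }
            sink.accept(doc.getKey(), doc.getValue());
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String sample = "docId1\tfirst document text\ndocId2\tsecond document text\n";
        int n = parseAll(new BufferedReader(new StringReader(sample)),
                (id, text) -> System.out.println(id + " -> " + text));
        System.out.println(n + " documents");
    }
}
```

In a real converter, the `sink` callback would append each pair to a SequenceFile through Hadoop's `SequenceFile.Writer` (key and value both `Text`, which as far as I know matches what seqdirectory emits), so the large file is streamed once instead of being split into many small files. That writer call is left out here because it needs Hadoop on the classpath.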
                                          
