I have searched the web extensively for this and found nothing, even though
it seems like it should be a fairly common problem. In the past I have used
Mahout's 'seqdirectory' command to convert a folder of text files (each file
being a separate document). In this case, however, there are so many documents
(hundreds of thousands) that I have one very large text file in which each
line is a document. How can I convert this large file to SequenceFile format
so that Mahout treats each line as a separate document?
Would it be better if the file were structured like so?

docId1 {tab} document text
docId2 {tab} document text
docId3 {tab} document text
...
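
To make the question concrete, here is a rough sketch of the per-line parsing I have in mind. It is plain Python, not Mahout or Hadoop code, and the function name lines_to_documents and the "line-N" fallback id are just placeholders I made up. The missing piece is the actual writing: as far as I understand, seqdirectory produces a SequenceFile with a Text key (the document id) and a Text value (the document body), so each pair below would presumably become one such record.

```python
from typing import Iterable, Iterator, Tuple


def lines_to_documents(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield one (doc_id, text) pair per non-empty input line."""
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the text are preserved.
        doc_id, sep, text = line.partition("\t")
        if not sep:
            # No tab on this line: fall back to a made-up id based on the
            # line number (hypothetical scheme, only for illustration).
            doc_id, text = f"line-{line_number}", line
        yield doc_id, text


sample = [
    "docId1\tfirst document text\n",
    "docId2\tsecond document text\n",
    "\n",
    "no id on this line\n",
]
pairs = list(lines_to_documents(sample))
```

For the real file I would pass an open file handle instead of the sample list, so the documents are streamed one line at a time rather than loaded into memory.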

Thank you very much for any help.
Nick
                                          
