I am looking to run clustering algorithms on these documents, which I thought (I could be wrong) requires sequence files. Is this not the case?

Thanks

On 6/6/11 10:11 AM, Daniel McEnnis wrote:
Mark,

Generally speaking, Mahout has pretty good performance over log files
like the ones you're describing, so they typically don't get changed
into sequence files.  You'll need to write a converter yourself if you
really need sequence files (such as for key management).

Daniel.
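
For the one-document-per-line case discussed in this thread, a hand-written converter could start from something like the sketch below. It only shows the splitting step: each non-blank line becomes one (key, value) record. The class name LineDocuments, the method toRecords, and the "/<fileName>/<lineNumber>" key scheme are hypothetical choices for illustration, not part of Mahout. Writing the records out is left as a comment, on the assumption that Hadoop's SequenceFile.Writer with Text keys and Text values is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LineDocuments {

    /**
     * Turns the lines of one input file into (key, value) records,
     * one record per non-blank line. The key scheme
     * "/<fileName>/<lineNumber>" is an assumption made for this sketch.
     */
    static List<Map.Entry<String, String>> toRecords(String fileName, List<String> lines) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;                      // 1-based, counts blank lines too
            if (line.trim().isEmpty()) {
                continue;                      // skip blank lines: no document there
            }
            records.add(Map.entry("/" + fileName + "/" + lineNumber, line));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("first document text", "", "second document text");
        for (Map.Entry<String, String> record : toRecords("part-0.txt", lines)) {
            // With Hadoop on the classpath, each record would instead be handed to
            // SequenceFile.Writer.append(new Text(key), new Text(value)).
            System.out.println(record.getKey() + "\t" + record.getValue());
        }
    }
}
```

Keeping the line number in the key makes every document id unique within a file, so clustering output can be traced back to its source line.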

On Mon, Jun 6, 2011 at 12:04 PM, Mark<[email protected]>  wrote:
I've been running through the examples as described in the Mahout In Action
book and I have some questions regarding the SequenceFilesFromDirectory.java
class.

This class expects a directory of files that contains 1 document per file.
Is there another Mahout class or some options I can supply to
SequenceFilesFromDirectory.java to parse multiple documents per file? For
example, my files contain 1 document per line. I would like to parse each
line of each file and create a sequence file from this. Is this possible
with SequenceFilesFromDirectory or would I have to write this myself?

Thanks
