Rahul,
As of Mahout 0.6 / trunk, the text file to sequence file conversion
lives in:

org.apache.mahout.text.SequenceFilesFromDirectory

and it writes K/V pairs to a standard sequence file in the form of:

{ filepath (Text), contents of file (Text) }

In its current single-process form, the code uses a custom PathFilter
(SequenceFilesFromDirectoryFilter) to walk recursively down a directory
and its child directories, writing the files it finds into a series of
sequence files according to options such as "chunk size".

An example of running this would be:

bin/mahout seqdirectory -c UTF-8 -i reuters/ -o reuters-seqfiles
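
If it helps to picture what that does, here is a rough sketch of the
walk-and-chunk idea in plain Python. This is not the Mahout code and it
does not write real Hadoop sequence files; the names iter_text_files,
chunk_pairs and chunk_size_bytes are made up for illustration, and the
way the size limit is applied is an assumption on my part.

```python
import os
from typing import Iterable, Iterator, List, Tuple

# (file path, file contents), mirroring the { Text, Text } pair above.
Pair = Tuple[str, str]


def iter_text_files(root: str, charset: str = "UTF-8") -> Iterator[Pair]:
    """Recursively walk root and yield one (path, contents) pair per file."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is stable
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding=charset) as handle:
                yield path, handle.read()


def chunk_pairs(pairs: Iterable[Pair], chunk_size_bytes: int) -> Iterator[List[Pair]]:
    """Group pairs into chunks, starting a new chunk when the limit would be exceeded.

    Hypothetical stand-in for the "chunk size" option: each yielded list
    plays the role of one output sequence file.
    """
    chunk: List[Pair] = []
    used = 0
    for key, value in pairs:
        size = len(key.encode("utf-8")) + len(value.encode("utf-8"))
        if chunk and used + size > chunk_size_bytes:
            yield chunk
            chunk, used = [], 0
        chunk.append((key, value))
        used += size
    if chunk:
        yield chunk
```

Something like list(chunk_pairs(iter_text_files("reuters/"), 64 * 1024 * 1024))
would then give you one list of (path, contents) pairs per chunk.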

Josh

On Wed, Dec 28, 2011 at 7:00 AM, rahul raghavendhra
<[email protected]> wrote:
> I am new to Mahout.. i just want to know how text file is converted into
> seqfile and then to sparse vectors..
> any kind of text file can  be converted into seq file using ./mahout
> seqdirectory ?
>
> thanks in advance..
>
> ./rahul



-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
