Rahul,
Currently (as of Mahout 0.6 / trunk) the text-file-to-sequence-file
functionality lives in:
org.apache.mahout.text.SequenceFilesFromDirectory
It writes K/V pairs to a standard sequence file in the form:
{ filepath (Text), contents of file (Text) }
In its current single-process form, the code uses a custom
PathFilter (SequenceFilesFromDirectoryFilter) to recursively walk
a directory and its child directories, writing the files it finds
into a series of sequence files according to options such as
"chunk size".
An example of running this would be:
bin/mahout seqdirectory -c UTF-8 -i reuters/ -o reuters-seqfiles
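To make the { filepath, contents } shape concrete, here is a rough
sketch in plain Java of the recursive walk described above. It is not
the Mahout code: the class and method names (DirectoryToPairs, collect)
are made up for illustration, an in-memory Map stands in for the Hadoop
sequence file, the key is assumed to be the path relative to the input
directory, and chunking is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of Mahout.
public class DirectoryToPairs {

    // Recursively walks 'root' and returns one (relative path -> file contents)
    // entry per regular file, in sorted path order.
    public static Map<String, String> collect(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (Path file : files) {
            // Assumption: key is the path relative to the input directory,
            // normalized to forward slashes.
            String key = root.relativize(file).toString().replace('\\', '/');
            // Assumption: files are decoded as UTF-8 (compare "-c UTF-8" above).
            String value = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            pairs.put(key, value);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, String> e : collect(root).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue().length() + " chars");
        }
    }
}
```

In the real tool each pair would go to a sequence file writer as
(Text, Text) rather than into a Map, and output would be split into
several files based on the chunk size option.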
Josh
On Wed, Dec 28, 2011 at 7:00 AM, rahul raghavendhra
<[email protected]> wrote:
> I am new to Mahout.. i just want to know how text file is converted into
> seqfile and then to sparse vectors..
> any kind of text file can be converted into seq file using ./mahout
> seqdirectory ?
>
> thanks in advance..
>
> ./rahul
--
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com