On Jan 4, 2012, at 3:22 PM, Dmitriy Lyubimov wrote:

> also via command line, the same processing is (I think) achieved by
> seqdirectory command.

./bin/mahout seqdirectory will convert to sequence files
./bin/mahout seq2sparse will do the TF-IDF conversion

See examples/bin/cluster-reuters, amongst others, for examples of these in action.

>
> On Wed, Jan 4, 2012 at 8:31 AM, Grant Ingersoll <[email protected]> wrote:
>> Hi Junaid,
>>
>> Have a look at the SparseVectorsFromSequenceFiles class, as this does this
>> already, in combination with SequenceFilesFromDirectory which can convert
>> text files to SequenceFiles.
>>
>> -Grant
>> On Jan 4, 2012, at 8:30 AM, Junaid Surve wrote:
>>
>>> Hi
>>>
>>> I want to develop a Prototype to calculate the TF IDF from the documents
>>> present in a directory.
>>>
>>> Can you please help me with the Steps to go about it using Apache Mahout?
>>> Thank you.
>>>
>>> --
>>> Regards
>>> Junaid
>>
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
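
For a small prototype, it can help to see what the two steps compute before running them on Hadoop. The following is only a rough sketch in plain Python, not Mahout code: read_directory and tf_idf are hypothetical helper names, the tokenizer is a naive whitespace split, and the weight is the textbook (count / document length) * log(N / df) form. Mahout's own analyzer and weighting may well differ, so treat the numbers as illustrative only.

```python
import math
from collections import Counter
from pathlib import Path


def read_directory(path):
    """Hypothetical helper: return {file name: text} for each regular file in path."""
    return {
        p.name: p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(Path(path).iterdir())
        if p.is_file()
    }


def tf_idf(docs):
    """Compute textbook TF-IDF weights (an assumption, not Mahout's formula).

    docs maps a document name to its text. Returns {name: {term: weight}}
    where weight = (count / document length) * log(N / df).
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokens = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokens)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    weights = {}
    for name, words in tokens.items():
        counts = Counter(words)
        total = len(words)
        weights[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights
```

Calling tf_idf(read_directory("some/dir")) would give one term-to-weight map per file. A term that appears in every document gets weight 0 under this formula, which is the usual reason very common words drop out.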
