Hi Dale,

Have a look at:

integration/src/main/java/org/apache/mahout/text/MailArchivesClusteringAnalyzer.java
integration/src/main/java/org/apache/mahout/text/SequenceFilesFromMailArchives.java

Cheers,
Tim

On Fri, Dec 16, 2011 at 12:04 PM, Dale McDiarmid <[email protected]> wrote:
> Hi,
> I'm new to the community, Mahout and machine learning. Up to now I've been
> experimenting with the Gensim libraries to perform LDA analysis on a large
> document corpus. I do, however, have a Hadoop cluster to hand and would
> like to explore the capabilities of Mahout. I have my files in a
> proprietary format - structured text, effectively, where a text file
> contains multiple document entries. I understand I need to produce sparse
> vectors for the LDA analysis - I'm happy to write a document parser and
> have set up my IntelliJ/Maven env with the necessary dependencies.
> Could someone please share some code from a similar requirement to get me
> started - an example reading a CSV file, for example. All help appreciated.
>
> Thanks
> Dale
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Generating-vectors-from-custom-source-tp3592300p3592300.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
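
For the CSV case mentioned in the question, here is a rough sketch of the parsing half only. It assumes a hypothetical layout of one document per line in the form `id,text`; the class name `CsvDocumentParser` and that layout are illustrative and not taken from Mahout. It uses only the JDK. The usual next step, as far as I know, is to write each (id, text) pair into a Hadoop SequenceFile with Text keys and values and then run seq2sparse over it, but that part is left out here so the example stays dependency-free.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative only: splits lines of the assumed form "id,text" into
 * (document id, document text) pairs. Not part of Mahout.
 */
public class CsvDocumentParser {

    /** Returns documents in input order, keyed by id. */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> docs = new LinkedHashMap<>();
        for (String line : lines) {
            // Split on the first comma only, so the text may contain commas.
            int comma = line.indexOf(',');
            if (comma <= 0) {
                continue; // skip blank lines and lines without an id
            }
            String id = line.substring(0, comma).trim();
            String text = line.substring(comma + 1).trim();
            docs.put(id, text);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "doc1,the quick brown fox",
                "doc2,jumps over the lazy dog");
        // In a real job, each pair would be appended to a SequenceFile here.
        for (Map.Entry<String, String> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For a real file, the lines could come from `java.nio.file.Files.readAllLines`. A proprietary multi-document format would need its own record-splitting logic in place of the first-comma split.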
