Hi Dale,

Have a look at:

integration/src/main/java/org/apache/mahout/text/MailArchivesClusteringAnalyzer.java
integration/src/main/java/org/apache/mahout/text/SequenceFilesFromMailArchives.java
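
Both classes deal with the same shape of problem: one input file holding many documents. My understanding is that the usual route is to turn each document into a (key, text) pair, write the pairs to a SequenceFile, and then run seq2sparse over it to get the sparse vectors. As a rough starting point, here is a minimal sketch of the parsing half only, in plain Java with no Hadoop or Mahout dependencies. The class name CsvDocumentParser and the "docId,text" line layout are assumptions made up for illustration, not anything from Mahout, so adapt them to your format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: splits a multi-document text
// file into (docId, text) pairs.
public class CsvDocumentParser {

    /**
     * Reads lines of the assumed form "docId,free text" and returns a map of
     * docId to text in input order. Only the first comma is treated as the
     * separator, so commas inside the text are preserved. Lines that share a
     * docId are joined with a single space.
     */
    public static Map<String, String> parse(BufferedReader reader) throws IOException {
        Map<String, String> docs = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int comma = line.indexOf(',');
            if (comma <= 0) {
                continue; // blank line, no separator, or empty id
            }
            String id = line.substring(0, comma).trim();
            if (id.isEmpty()) {
                continue; // id was only whitespace
            }
            String text = line.substring(comma + 1).trim();
            docs.merge(id, text, (existing, extra) -> existing + " " + extra);
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        String sample = "doc1,the quick brown fox\n"
                      + "doc2,jumps over the lazy dog\n";
        Map<String, String> docs = parse(new BufferedReader(new StringReader(sample)));
        for (Map.Entry<String, String> entry : docs.entrySet()) {
            // In a real job, this is the point where each pair would be
            // appended to a SequenceFile instead of printed.
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

This does not handle quoted CSV fields or documents whose text itself contains line breaks; a proper CSV library would be the safer choice if your data has either.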

Cheers,
Tim

On Fri, Dec 16, 2011 at 12:04 PM, Dale McDiarmid <[email protected]> wrote:

> Hi,
> I'm new to the community, Mahout and machine learning.  Up to now I've been
> experimenting with the Gensim libraries to perform LDA analysis on a large
> document corpus.  I do, however, have a Hadoop cluster to hand and would
> like to explore the capabilities of Mahout.  I have my files in a
> proprietary format - effectively structured text, where a text file
> contains multiple document entries.  I understand I need to produce sparse
> vectors for the LDA analysis - I'm happy to write a document parser and
> have set up my IntelliJ/Maven environment with the necessary dependencies.
> Could someone please share some code from a similar requirement to get me
> started - an example reading a CSV file, for example?  All help appreciated.
>
> Thanks
> Dale
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Generating-vectors-from-custom-source-tp3592300p3592300.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
