Hi,
I'm new to the community, Mahout, and machine learning.  Up to now I've been
experimenting with the Gensim library to perform LDA analysis on a large
document corpus.  I do, however, have a Hadoop cluster to hand and would
like to explore the capabilities of Mahout.  My files are in a proprietary
format: effectively structured text, where each text file contains multiple
document entries.  I understand I need to produce sparse vectors for the LDA
analysis.  I'm happy to write a document parser and have set up my
IntelliJ/Maven environment with the necessary dependencies.  Could someone
please share some code from a similar requirement to get me started, for
example one that reads a CSV file?  All help appreciated.
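
To make the request concrete, below is a rough sketch of the parsing side
as I picture it, using only the plain JDK.  Everything in it is an
assumption for illustration: the class name CsvToSparseVectors, the
"id,text" line layout, and the naive tokenizer are all hypothetical, and a
sparse vector is modelled as a simple index-to-count map.  I assume the
Mahout side would mean copying these counts into something like a
RandomAccessSparseVector wrapped in a VectorWritable and writing it to a
Hadoop SequenceFile, but that is the part I am unsure about.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: turns "id,text" lines into sparse term-count vectors.
// No Mahout classes are used here; the maps stand in for sparse vectors.
public class CsvToSparseVectors {

    // Term -> column index, grown as new terms are encountered.
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();

    // Builds a sparse vector (column index -> term count) for one document.
    public Map<Integer, Double> vectorize(String text) {
        Map<Integer, Double> vector = new TreeMap<>();
        // Naive tokenizer (assumption): lower-case, split on non-alphanumerics.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            Integer index = dictionary.get(token);
            if (index == null) {
                index = dictionary.size();
                dictionary.put(token, index);
            }
            vector.merge(index, 1.0, Double::sum);
        }
        return vector;
    }

    // Parses lines of the assumed form "id,text"; everything after the
    // first comma is treated as the document body.
    public Map<String, Map<Integer, Double>> parse(List<String> lines) {
        Map<String, Map<Integer, Double>> docs = new LinkedHashMap<>();
        for (String line : lines) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // skip lines without an id column
            }
            String id = line.substring(0, comma).trim();
            docs.put(id, vectorize(line.substring(comma + 1)));
        }
        return docs;
    }

    public int dictionarySize() {
        return dictionary.size();
    }

    public static void main(String[] args) {
        CsvToSparseVectors converter = new CsvToSparseVectors();
        List<String> lines = Arrays.asList(
                "doc1,the cat sat on the mat",
                "doc2,the dog sat");
        for (Map.Entry<String, Map<Integer, Double>> doc
                : converter.parse(lines).entrySet()) {
            System.out.println(doc.getKey() + " -> " + doc.getValue());
        }
        System.out.println("terms: " + converter.dictionarySize());
    }
}
```

What I am missing is the step from these per-document maps to whatever
input the Mahout LDA job actually expects.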

Thanks
Dale

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Generating-vectors-from-custom-source-tp3592300p3592300.html
Sent from the Mahout User List mailing list archive at Nabble.com.
