Hi Caroline, Jake Mannix and I wrote the LDA CVB implementation. Apologies for the light documentation.
When you invoked Mahout, did you supply the "--doc_topic_output <path>" parameter? If it is present, then after training a model the driver app applies the model to the input term-vectors and stores the inference results in the specified path. If the parameter isn't specified, this final inference run is skipped:

https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java#L74
https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java#L331

So, assuming you did generate inference output, I should note that both the model and the inference output have the *same* format: both the topic-term matrix and the doc-topic inference output are stored as SequenceFile<IntWritable, VectorWritable> data. If you point the vectordump util at either data set and supply a dictionary, it will happily map the vector indices to term strings using that dictionary, whether those indices are term ids or topic ids. That is quite confusing.

Just make sure you don't supply the dictionary when you run vectordump against the doc-topic data. That way, you'll see the raw topic ids (zero-based indices) in the output, instead of whatever terms those indices might correspond to in your dictionary.

Best,

Andy
@sagemintblue

On Wed, Jul 4, 2012 at 2:30 AM, Caroline Meyer <[email protected]> wrote:

> Hey Guys,
>
> I have been able to successfully execute the new lda algorithm as well as
> extract the topic/term inference with vectordump. What I was not able to do
> was get the document/topic inference. When I run the same vectordump
> command I get the same kinds of vectors (term:probability) as before.
> Should the vectors not be (topic:probability)?
>
> The command I run is:
>
> vectordump -s temp/lda-cvb-doc/part-m-00000 -d
> temp/vectors/dictionary.file-* -dt sequencefile -o temp/lda-cvb-topics.txt
>
> I have not been able to find any documentation except what's in the code.
> Thanks for the help.
>
> Cheers,
> Caroline
>
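
P.S. In case it helps, here is a toy Python sketch of the dictionary point from my reply. It is not Mahout's code: `render`, `term_dictionary`, and `doc_topics` are made-up names and data, and the output format only loosely imitates a vector dump. It just shows how the same index:weight vector reads very differently with and without a dictionary lookup.

```python
def render(vector, dictionary=None):
    """Format index:weight pairs, mapping indices through a dictionary if one is given."""
    parts = []
    for index, weight in sorted(vector.items()):
        # With a dictionary, every index is treated as a term id, whatever it really is.
        label = dictionary[index] if dictionary and index in dictionary else index
        parts.append(f"{label}:{weight}")
    return "{" + ",".join(parts) + "}"


# Hypothetical term dictionary (term id -> term string).
term_dictionary = {0: "apple", 1: "banana", 2: "cherry"}

# Hypothetical doc-topic vector for one document (topic id -> probability).
doc_topics = {0: 0.7, 2: 0.3}

print(render(doc_topics, term_dictionary))  # topic ids mislabeled as terms
print(render(doc_topics))                   # raw zero-based topic ids
```

The first line printed looks like term:probability output even though the indices are topic ids, which matches what you saw. The second shows the topic:probability view you get once the dictionary is left out.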
