Re: Order of documents in LDA results

Andy Schlaikjer Mon, 02 Jul 2012 08:36:43 -0700

Ivan,

Mahout LDA input:

1) a set of (document id, term vector) pairs in SequenceFile<IntWritable,
VectorWritable> format.
2) optionally, a dictionary of (term index, term) pairs in
SequenceFile<IntWritable, Text> format.
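To make the record layout concrete, here is a minimal in-memory sketch in plain Java. It has no Hadoop dependency, so maps stand in for the SequenceFile records, and the class and method names are hypothetical rather than Mahout API. It assigns zero-based term indices to build the dictionary (term index -> term) and turns each document into a sparse term vector keyed by term index. It uses the document's position in the corpus as its integer id, and it uses raw counts as weights, which is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory stand-in for the two LDA inputs (hypothetical names, not Mahout API):
 *  - dictionary: term index -> term (IntWritable key, Text value)
 *  - documents:  document id -> sparse term vector (term index -> weight)
 */
public class LdaInputSketch {

    /** Assigns zero-based term indices in first-seen order; fills termToIndex as a side effect. */
    static Map<Integer, String> buildDictionary(List<List<String>> docs, Map<String, Integer> termToIndex) {
        Map<Integer, String> dictionary = new TreeMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                if (!termToIndex.containsKey(term)) {
                    int index = termToIndex.size();
                    termToIndex.put(term, index);
                    dictionary.put(index, term);
                }
            }
        }
        return dictionary;
    }

    /** Sparse term-count vector for one document; assumes every term is already in the dictionary. */
    static Map<Integer, Double> termVector(List<String> doc, Map<String, Integer> termToIndex) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (String term : doc) {
            vector.merge(termToIndex.get(term), 1.0, Double::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("topic", "model", "topic"),
            List.of("model", "inference"));
        Map<String, Integer> termToIndex = new HashMap<>();
        Map<Integer, String> dictionary = buildDictionary(docs, termToIndex);
        // Document ids here are simply zero-based positions in the corpus.
        for (int docId = 0; docId < docs.size(); docId++) {
            System.out.println(docId + " -> " + termVector(docs.get(docId), termToIndex));
        }
        System.out.println(dictionary);
    }
}
```

With this toy corpus, document 0 becomes {0=2.0, 1=1.0} and the dictionary is {0=topic, 1=model, 2=inference}. Position k in every term vector therefore refers to the term stored under key k in the dictionary.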

Output:

1) a "model": a set of (topic index, term vector) pairs
in SequenceFile<IntWritable, VectorWritable> format. Topic identifiers are
zero-based indices.
2) optionally, a set of (document id, topic vector) pairs
in SequenceFile<IntWritable, VectorWritable> format. This is the output of
running inference with the trained model on input #1 above. Note that the
topic vectors are dense and have cardinality equal to the number of latent
topics you trained with (e.g. 50, 100). Entry k in document d's topic vector
represents the model's estimate of p(topic = k | doc = d).
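Reading the output back follows the same indexing. The sketch below is again plain Java, with maps standing in for the SequenceFile records and hypothetical names and toy data throughout. It picks each document's most probable topic (the arg max of its topic vector) and lists that topic's highest-weighted terms through the dictionary. It also assumes a separate docIndex table (integer document id -> original file name). The LDA job only sees the integer keys, so that table has to come from whichever step assigned them before training. Mahout's rowid job, for example, writes a docIndex file next to its matrix output; check what your version produces.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Sketch of interpreting LDA output once it is in memory (hypothetical names, not Mahout API):
 *  - docTopics: document id -> dense vector, entry k ~ p(topic = k | doc = d)
 *  - model:     topic index -> weights indexed by term index
 *  - docIndex:  document id -> original document name (assumed to exist separately)
 */
public class LdaOutputSketch {

    /** Index of the largest entry, i.e. the most probable topic for a document. */
    static int argMax(double[] vector) {
        int best = 0;
        for (int k = 1; k < vector.length; k++) {
            if (vector[k] > vector[best]) {
                best = k;
            }
        }
        return best;
    }

    /** The n highest-weighted terms of one topic, resolved through the dictionary. */
    static List<String> topTerms(double[] topicTermWeights, Map<Integer, String> dictionary, int n) {
        List<Integer> termIndices = new ArrayList<>();
        for (int i = 0; i < topicTermWeights.length; i++) {
            termIndices.add(i);
        }
        // Sort term indices by descending weight.
        termIndices.sort(Comparator.comparingDouble((Integer i) -> topicTermWeights[i]).reversed());
        List<String> terms = new ArrayList<>();
        for (int i : termIndices.subList(0, Math.min(n, termIndices.size()))) {
            terms.add(dictionary.get(i));
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<Integer, String> dictionary = Map.of(0, "topic", 1, "model", 2, "inference");
        Map<Integer, double[]> model = Map.of(
            0, new double[] {0.7, 0.2, 0.1},
            1, new double[] {0.1, 0.3, 0.6});
        Map<Integer, double[]> docTopics = Map.of(
            0, new double[] {0.9, 0.1},
            1, new double[] {0.25, 0.75});
        // Hypothetical id -> file name table; the LDA output itself does not contain it.
        Map<Integer, String> docIndex = Map.of(0, "doc-a.txt", 1, "doc-b.txt");

        for (int docId = 0; docId < docTopics.size(); docId++) {
            int topic = argMax(docTopics.get(docId));
            System.out.println(docIndex.get(docId) + " -> topic " + topic
                + " " + topTerms(model.get(topic), dictionary, 2));
        }
    }
}
```

Because both the model and the document-topic output are keyed by plain integers, the join back to words (via the dictionary) and to file names (via the id table) has to be done outside the LDA job.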

Andy
@sagemintblue

On Mon, Jul 2, 2012 at 5:54 AM, ivan obeso <[email protected]> wrote:

> Hi,
>
> I would like to know what order the documents follow in the LDA
> results. For example, I know that the topic/word file is a group of
> IntWritable keys with VectorWritable values, where the key corresponds to
> the topic id and position 0 of the VectorWritable refers to the word at
> position 0 in the dictionary file....
>
> But in the document/topic file I am not sure what order is followed. The
> key is an IntWritable that represents the document ID, but I don't know
> where to find the filename/docID table.
>
> Thanks.
>
