Re: LDA Questions

2012-08-07 Thread Gokhan Capan
Hi Jake, Today I submitted the diff. It is available at https://issues.apache.org/jira/browse/MAHOUT-1051 Thanks for the advice. On Tue, Aug 7, 2012 at 1:06 AM, Jake Mannix wrote: > Sounds great Gokhan! > > On Mon, Aug 6, 2012 at 2:53 PM, Gokhan Capan wrote: > > > Jake, > > > > I converted th

Re: LDA Questions

2012-08-06 Thread Jake Mannix
Sounds great Gokhan! On Mon, Aug 6, 2012 at 2:53 PM, Gokhan Capan wrote: > Jake, > > I converted the ids to integers with rowid, and then > modified InMemoryCollapsedVariationBayes0.loadVectors() such that it > returns a SparseMatrix (instead of SparseRowMatrix) whose row ids are keys > from tf

Re: LDA Questions

2012-08-06 Thread Gokhan Capan
Jake, I converted the ids to integers with rowid, and then modified InMemoryCollapsedVariationBayes0.loadVectors() such that it returns a SparseMatrix (instead of SparseRowMatrix) whose row ids are keys from tf vectors. I am not sure if it works, since the values of mapped integer ids (results of

Re: LDA Questions

2012-08-06 Thread Jake Mannix
Hi Gokhan, This looks like a bug in the InMemoryCollapsedVariationBayes0.loadVectors() method - it takes the SequenceFile and ignores the keys, assigning the rows in order into an in-memory Matrix. If you run "$MAHOUT_HOME/bin/mahout rowid -i -o " this converts Text keys into IntWritable key
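To illustrate the behaviour described in this message, here is a minimal sketch in plain Java. It does not use Mahout or Hadoop classes: a `LinkedHashMap` stands in for the keyed records of a SequenceFile, and the class and method names (`RowKeyDemo`, `loadIgnoringKeys`, `loadByKey`) are hypothetical, chosen only for this example. The document ids 1, 2 and 5 are taken from the original question. It shows how assigning rows in read order loses the document id, while indexing by key keeps it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RowKeyDemo {

    // Mimics the reported problem: keys are ignored and each vector is
    // appended in read order, so the row index is just a running counter.
    static List<double[]> loadIgnoringKeys(Map<Integer, double[]> records) {
        List<double[]> rows = new ArrayList<>();
        for (double[] vector : records.values()) {
            rows.add(vector);
        }
        return rows;
    }

    // Key-preserving alternative: each vector stays addressable by its
    // original integer key, even when the keys are not contiguous.
    static Map<Integer, double[]> loadByKey(Map<Integer, double[]> records) {
        return new TreeMap<>(records);
    }

    public static void main(String[] args) {
        // Hypothetical term-frequency vectors for documents 1, 2 and 5.
        Map<Integer, double[]> records = new LinkedHashMap<>();
        records.put(1, new double[] {1.0, 0.0});
        records.put(2, new double[] {0.0, 2.0});
        records.put(5, new double[] {5.0, 5.0});

        List<double[]> rows = loadIgnoringKeys(records);
        Map<Integer, double[]> byKey = loadByKey(records);

        // Document 5 ends up at row index 2 when keys are ignored.
        System.out.println("row 2, keys ignored: " + rows.get(2)[0]);
        // With keys preserved it is still found under id 5.
        System.out.println("id 5, keys kept: " + byKey.get(5)[0]);
    }
}
```

With keys ignored, any output row has to be mapped back to a document by position, which only works if the ids happen to be 0, 1, 2, ... in read order. Keeping the key as the row id avoids that assumption.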

LDA Questions

2012-08-06 Thread Gokhan Capan
Hi, My question is about interpreting the LDA document-topics output. I am using trunk. I have a directory of documents, each of which is named by an integer, and the data directory has no sub-directories. The directory structure is as follows: $ ls /path/to/data/ 1 2 5 ... From th