Re: lda + vector dump

Suneel Marthi Fri, 23 Aug 2013 08:32:01 -0700


________________________________
 From: Charly Lizarralde <[email protected]>
To: [email protected] 
Sent: Friday, August 23, 2013 11:18 AM
Subject: lda + vector dump
 

Hi everyone, I am experimenting with the CVB algorithm and I have a few
questions...

a) Is there any updated documentation? I have been collecting info from
mailing lists, blogs, etc. I have been writing a small beginner's tutorial;
if you like, I'll send it.

>> Not sure if the documentation has been updated for this. What were you
>> looking at?
>> Please send across what you have.

b) Should I remove "stop-words" before building the feature vectors? I am
having some trouble "reading" the results...

>> If you are running 'seq2sparse' to build the feature vectors and are using
>> the Lucene StandardAnalyzer (which is the default), the English stop words
>> should be removed automatically.
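
To illustrate what stop-word removal does to the terms that end up in the feature vectors, here is a minimal Python sketch. It is not Mahout or Lucene code: the stop list is a short hypothetical one made up for the example (not Lucene's actual set), and the tokenizer is a plain whitespace split rather than a real analyzer.

```python
from collections import Counter

# Hypothetical, abbreviated stop list for illustration only;
# this is NOT the set used by Lucene's StandardAnalyzer.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "to", "in"}


def term_counts(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, drop stop words, count the rest."""
    tokens = text.lower().split()
    kept = [t for t in tokens if t not in stop_words]
    return Counter(kept)


counts = term_counts("The topic of the document is the model")
print(sorted(counts.items()))
```

Without the filter, "the" would be the most frequent term in this sentence, which is the kind of noise that makes topic output hard to read.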

c) Vectordump is not sorting correctly... is this a reported bug? (I am
building Mahout from trunk now.)

>> Please post more details on this.

d) Any considerations on performance? It took 10 hours on a 5-node cluster
and I've set 20 iterations on fewer than 10,000 docs and it took

Thanks!
Charly