Re: lda + vector dump

Suneel Marthi Fri, 23 Aug 2013 10:26:15 -0700

Charly,

If the documentation isn't clear, I would look at the CVB example in 
examples/bin/cluster_reuters.sh for the correct sequence of steps and the 
various parameters.





________________________________
 From: Charly Lizarralde <[email protected]>
To: [email protected] 
Sent: Friday, August 23, 2013 11:45 AM
Subject: Re: lda + vector dump
 

I think I am running it on the cvb output (1 record per topic), so the
dictionary is used to output each topic's most relevant terms... but I'll
check!


On Fri, Aug 23, 2013 at 12:37 PM, Liz Merkhofer <
[email protected]> wrote:

> Hi Charly,
>
> I've been playing around with cvb, too. I have a few thoughts on b,
> vectordump:
>
> What are you doing vectordump on? If you're doing it on your cvb output,
> you're getting something like a dictionary per topic, with
> <input-word-key>:<probability-it's-in-this-cluster>. If you're doing it on
> cvb-topics output, for each document, you're getting the likelihood that it
> belongs to each of your topics.
>
> I wonder if your problem is that you read the same book I did, "Hadoop
> MapReduce Cookbook," that advised to use vectordump with the dictionary
> flag as your dictionary from s2s. Don't do that - that translates your
> document or topic keys as if they were your vocab keys, and it's just
> completely nonsensical.
>
> Best,
> Liz Merkhofer
>
>
>
> On Fri, Aug 23, 2013 at 11:18 AM, Charly Lizarralde <
> [email protected]> wrote:
>
> > Hi everyone, I am experimenting with cvb algorithm and I have a few
> > questions....
> >
> > a) Is there any updated documentation? I have been collecting info from
> > mailing lists, blogs, etc. I have been writing a small beginner's
> > tutorial; if you like, I'll send it.
> >
> > b) Should I remove "stop-words" before building the feature vectors? I am
> > having some trouble "reading" the results....
> >
> > c) Vectordump is not sorting well...is this a reported bug ? ( I am
> > building mahout from trunk now )
> >
> > d) Any considerations on performance? With 20 iterations set on fewer
> > than 10,000 docs, it took 10 hours on a 5-node cluster.
> >
> > Thanks!
> > Charly
> >
>
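
The distinction Liz draws in the thread above, between the per-topic output (indexed by vocabulary key) and the per-document output (indexed by topic), can be illustrated with a minimal sketch. This is plain Python over invented toy data, not Mahout output and not the Mahout API; the names `dictionary`, `topic_term`, `doc_topic`, `top_terms` and `top_topics` are hypothetical and exist only for this illustration.

```python
# Toy data invented for illustration; none of this is real Mahout output.

# term id -> word, the kind of mapping built when the documents are vectorized
dictionary = {0: "oil", 1: "price", 2: "bank", 3: "rate"}

# Per-topic output: one vector per topic, keyed by TERM id
topic_term = {
    0: {0: 0.60, 1: 0.30, 2: 0.05, 3: 0.05},
    1: {0: 0.05, 1: 0.15, 2: 0.50, 3: 0.30},
}

# Per-document output: one vector per document, keyed by TOPIC id
doc_topic = {
    "doc-a": {0: 0.9, 1: 0.1},
    "doc-b": {0: 0.2, 1: 0.8},
}


def top_terms(vector, dictionary, n=2):
    """Rank a topic's entries by weight and translate term ids to words."""
    ranked = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)
    return [(dictionary[term_id], weight) for term_id, weight in ranked[:n]]


def top_topics(vector, n=1):
    """Rank a document's entries by weight; the keys are already topic ids."""
    ranked = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Correct: the dictionary applies to the per-topic vectors.
print(top_terms(topic_term[0], dictionary))

# Correct: per-document vectors need no dictionary.
print(top_topics(doc_topic["doc-b"]))

# Misuse: applying the term dictionary to a per-document vector relabels
# topic ids 0 and 1 as the words for term ids 0 and 1, which is meaningless.
mislabeled = {dictionary[topic_id]: w for topic_id, w in doc_topic["doc-a"].items()}
print(mislabeled)
```

In the last step the dictionary lookup succeeds without any error, which is why this kind of mix-up tends to show up as unreadable results rather than as a failure.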
