Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Marco
Hi, I'm new here, so forgive my limited experience with Mahout. We're trying to use Mahout (on our Hadoop cluster) to compute topics for almost 14,000 documents. I've been following this wiki page (http://goo.gl/DcPVjB) but I'm still getting errors. Here's what I'm doing: 1) creating sequence

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Suneel Marthi
: Latent Dirichlet Allocatio (cvb) Hi, I'm new here, so forgive my limited experience with Mahout. We're trying to use Mahout (on our Hadoop cluster) to compute topics for almost 14,000 documents. I've been following this wiki page (http://goo.gl/DcPVjB) but I'm still getting errors. Here's what

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Marco
...@yahoo.com To: user@mahout.apache.org; Marco zentrop...@yahoo.co.uk Sent: Wednesday, 31 July 2013 11:01 Subject: Re: Latent Dirichlet Allocatio (cvb) RowId job creates a matrix (IntWritable, VectorWritable) and a docIndex (IntWritable, Text). So you should be seeing

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Jake Mannix
a vectordump I always get a Java heap space error From: Suneel Marthi suneel_mar...@yahoo.com To: user@mahout.apache.org; Marco zentrop...@yahoo.co.uk Sent: Wednesday, 31 July 2013 11:01 Subject: Re: Latent Dirichlet Allocatio (cvb

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Marco
From: Jake Mannix jake.man...@gmail.com To: user@mahout.apache.org; Marco zentrop...@yahoo.co.uk Cc: Suneel Marthi suneel_mar...@yahoo.com Sent: Wednesday, 31 July 2013 16:34 Subject: Re: Latent Dirichlet Allocatio (cvb) If you're supplying a dictionary

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Jake Mannix
: Re: Latent Dirichlet Allocatio (cvb) If you're supplying a dictionary file (as you are), I'd suggest not specifying the -nt 9 option - you're apparently specifying a numTerms less than the actual number of terms in some of your vectors. If you supply the -dict option, it'll infer

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Suneel Marthi
@mahout.apache.org; Marco zentrop...@yahoo.co.uk Sent: Wednesday, July 31, 2013 10:51 AM Subject: Re: Latent Dirichlet Allocatio (cvb) On Wed, Jul 31, 2013 at 7:44 AM, Marco zentrop...@yahoo.co.uk wrote: ok. i'll re run it without that nt (which i supposed was NOT optional). Well, it's

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Marco
@mahout.apache.org; Marco zentrop...@yahoo.co.uk Sent: Wednesday, 31 July 2013 16:51 Subject: Re: Latent Dirichlet Allocatio (cvb) On Wed, Jul 31, 2013 at 7:44 AM, Marco zentrop...@yahoo.co.uk wrote: ok. i'll re run it without that nt (which i supposed was NOT optional). Well, it's

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Suneel Marthi
Subject: Re: Latent Dirichlet Allocatio (cvb) running: mahout vectordump -i jojoba/to-output -d jojoba/vectors/dictionary.file-0 -dt sequencefile --vectorSize 10 -sort jojoba/to-output It's Mahout 0.7 (we're using Cloudera CDH4.2) From: Jake Mannix jake.man

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Marco
already looked there. no cvb example or vectordump :( From: Suneel Marthi suneel_mar...@yahoo.com To: user@mahout.apache.org; Marco zentrop...@yahoo.co.uk Sent: Wednesday, 31 July 2013 16:55 Subject: Re: Latent Dirichlet Allocatio (cvb

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Jake Mannix
31 July 2013 16:34 Subject: Re: Latent Dirichlet Allocatio (cvb) If you're supplying a dictionary file (as you are), I'd suggest not specifying the -nt 9 option - you're apparently specifying a numTerms less than the actual number of terms in some of your vectors. If you supply

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Suneel Marthi
, 2013 11:05 AM Subject: Re: Latent Dirichlet Allocatio (cvb) already looked there. no cvb example or vectordump :( From: Suneel Marthi suneel_mar...@yahoo.com To: user@mahout.apache.org; Marco zentrop...@yahoo.co.uk Sent: Wednesday, 31

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Marco
; Marco zentrop...@yahoo.co.uk Sent: Wednesday, 31 July 2013 17:07 Subject: Re: Latent Dirichlet Allocatio (cvb) CVB was added to cluster_reuters.sh in 0.8, you wouldn't see it in 0.7. Suggest that you work off of 0.8. From: Marco zentrop...@yahoo.co.uk

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Ted Dunning
On Wed, Jul 31, 2013 at 8:33 AM, Marco zentrop...@yahoo.co.uk wrote: will check out if cloudera supports mahout 0.8. Don't worry about Cloudera support. Mahout support is better. :-)

Re: Latent Dirichlet Allocatio (cvb)

2013-07-31 Thread Sean Owen
FWIW I know Mahout 0.8 works fine with CDH4 (the mr1 version of course) and is what CDH5 will include. Should be no problems there. On Wed, Jul 31, 2013 at 4:33 PM, Marco zentrop...@yahoo.co.uk wrote: great. at least i know what's wrong :) will check out if cloudera supports mahout 0.8.