Hi, I'm new here, so please forgive my limited experience with Mahout.
We're trying to use Mahout (on our Hadoop cluster) to compute topics for
almost 14,000 documents.
I've been following this wiki page (http://goo.gl/DcPVjB) but I'm still getting
errors.
Here's what I'm doing:
1) creating sequence
Subject: Latent Dirichlet Allocation (cvb)
...@yahoo.com
To: user@mahout.apache.org user@mahout.apache.org; Marco
zentrop...@yahoo.co.uk
Sent: Wednesday, 31 July 2013 11:01
Subject: Re: Latent Dirichlet Allocation (cvb)
RowId job creates a matrix (IntWritable, VectorWritable) and a docIndex
(IntWritable, Text).
So you should be seeing
a vectordump I always get a Java heap space error
From: Suneel Marthi suneel_mar...@yahoo.com
To: user@mahout.apache.org user@mahout.apache.org; Marco
zentrop...@yahoo.co.uk
Sent: Wednesday, 31 July 2013 11:01
Subject: Re: Latent Dirichlet Allocation (cvb)
From: Jake Mannix jake.man...@gmail.com
To: user@mahout.apache.org user@mahout.apache.org; Marco
zentrop...@yahoo.co.uk
Cc: Suneel Marthi suneel_mar...@yahoo.com
Sent: Wednesday, 31 July 2013 16:34
Subject: Re: Latent Dirichlet Allocation (cvb)
If you're supplying a dictionary file (as you are), I'd suggest not
specifying the -nt 9 option - you're apparently specifying a numTerms
less than the actual number of terms in some of your vectors. If you
supply the -dict option, it'll infer
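For reference, here is a minimal sketch of the rowid and cvb steps with Jake's suggestion applied: the dictionary is passed with -dict and -nt is left out. The dictionary path comes from this thread. Everything else is an assumption made for illustration and not Marco's actual setup: the tfidf-vectors input, the jojoba/matrix, jojoba/doc-topics and jojoba/tmp-models directories, the use of jojoba/to-output as the cvb output, and the values given to -k and -x. The script only prints the two commands, so nothing touches the cluster until you run them yourself.

```shell
#!/bin/sh
# Dry run: builds and prints the two commands; nothing is executed on the cluster.

VECTORS=jojoba/vectors   # directory holding dictionary.file-0 (path from the thread)
MATRIX=jojoba/matrix     # hypothetical output directory for rowid
TOPICS=10                # hypothetical number of topics
MAX_ITER=20              # hypothetical iteration limit

# rowid writes $MATRIX/matrix (IntWritable, VectorWritable)
# and $MATRIX/docIndex (IntWritable, Text).
# Assumption: the vectors to convert live in $VECTORS/tfidf-vectors.
ROWID_CMD="mahout rowid -i $VECTORS/tfidf-vectors -o $MATRIX"

# cvb reads the matrix produced by rowid. -dict is supplied and -nt is
# deliberately omitted, as suggested above. Output paths are assumptions.
CVB_CMD="mahout cvb -i $MATRIX/matrix -dict $VECTORS/dictionary.file-0 -k $TOPICS -x $MAX_ITER -o jojoba/to-output -dt jojoba/doc-topics -mt jojoba/tmp-models"

echo "$ROWID_CMD"
echo "$CVB_CMD"
```

Printing first makes it easy to confirm that no -nt value sneaks into the cvb call before a long job is submitted.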
@mahout.apache.org; Marco
zentrop...@yahoo.co.uk
Sent: Wednesday, July 31, 2013 10:51 AM
Subject: Re: Latent Dirichlet Allocation (cvb)
On Wed, Jul 31, 2013 at 7:44 AM, Marco zentrop...@yahoo.co.uk wrote:
OK. I'll rerun it without that -nt (which I assumed was NOT optional).
Well, it's
Subject: Re: Latent Dirichlet Allocation (cvb)
running:
mahout vectordump -i jojoba/to-output -d jojoba/vectors/dictionary.file-0 \
  -dt sequencefile --vectorSize 10 -sort jojoba/to-output
It's Mahout 0.7 (we're using Cloudera CDH4.2)
From: Jake Mannix jake.man
Already looked there. No cvb example or vectordump :(
From: Suneel Marthi suneel_mar...@yahoo.com
To: user@mahout.apache.org user@mahout.apache.org; Marco
zentrop...@yahoo.co.uk
Sent: Wednesday, 31 July 2013 16:55
Subject: Re: Latent Dirichlet Allocation (cvb)
, 2013 11:05 AM
Subject: Re: Latent Dirichlet Allocation (cvb)
; Marco
zentrop...@yahoo.co.uk
Sent: Wednesday, 31 July 2013 17:07
Subject: Re: Latent Dirichlet Allocation (cvb)
CVB was added to cluster-reuters.sh in 0.8; you wouldn't see it in 0.7.
I suggest that you work off of 0.8.
From: Marco zentrop...@yahoo.co.uk
On Wed, Jul 31, 2013 at 8:33 AM, Marco zentrop...@yahoo.co.uk wrote:
Will check whether Cloudera supports Mahout 0.8.
Don't worry about Cloudera support. Mahout support is better. :-)
FWIW I know Mahout 0.8 works fine with CDH4 (the MR1 version, of
course) and is what CDH5 will include. There should be no problems there.
On Wed, Jul 31, 2013 at 4:33 PM, Marco zentrop...@yahoo.co.uk wrote:
Great. At least I know what's wrong :)
Will check whether Cloudera supports Mahout 0.8.