Can you send us your command-line args? Is that for one iteration? That
would be very, very slow.
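
For scale, here is a rough back-of-envelope sketch using only the figures from the message quoted below (25000 documents in 10 minutes, 1M documents in total). It assumes throughput stays flat for the rest of the corpus, which is an assumption and not something measured, and the variable names are just illustrative.

```python
# Back-of-envelope estimate from the figures in the quoted message.
# Assumption: throughput stays constant over the whole corpus.
docs_done = 25_000        # documents processed so far
minutes_elapsed = 10      # wall-clock time for those documents
total_docs = 1_000_000    # "about 1M documents"

docs_per_minute = docs_done / minutes_elapsed
minutes_per_pass = total_docs / docs_per_minute
hours_per_pass = minutes_per_pass / 60

print(f"{docs_per_minute:.0f} docs/min, "
      f"about {hours_per_pass:.1f} hours per pass over the corpus")
```

If that rate really is for a single pass, every further iteration would cost roughly the same again.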

On Monday, March 4, 2013, Benoit Mathieu wrote:

> Hi mahout users,
>
> I'd like to run the mahout Latent Dirichlet Allocation algorithm (mahout
> cvb) on my own data. I have about 1M "documents" and a vocabulary of 30k
> "terms". Documents are very sparse: each of them contains only 100 terms.
> I'd like to extract "topics" from that.
>
> I have generated mahout vectors from my data using a simple Java program
> that uses RandomAccessSparseVector.
>
> I successfully launched the "mahout cvb" job with num_topics=200, but
> the job seems very slow: 70 running map tasks took 10 min to process about
> 25000 documents on my cluster.
>
> So my questions are:
> - Does this job require a specific Vector class for good performance?
> - Is the LDA algorithm suitable for processing 1M docs with a dictionary of
> 30k terms?
>
> Thanks for any insights.
>
> ++
> benoit
>


-- 

  -jake
