Ted: Because the threads were all stuck on one core, there are 8 copies of the table (one per core). We have enough RAM to allocate one table per core, but this limits our ability to scale to larger numbers of topics or terms, because we allocate an 8GB heap per mapper.
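To make that footprint concrete, here is a back-of-envelope sketch of the table's memory cost using the figures quoted later in this thread (1.6M terms, 200 topics, 8-byte doubles, and the factor of 2 from Jake's formula). The class name and the "copies" interpretation are illustrative assumptions, not anything from the Mahout code:

```java
// Back-of-envelope memory math for the term-topic table, using the
// numbers quoted in this thread: 1.6M terms, 200 topics, doubles,
// and a factor of 2 (two term-topic matrices per mapper, per Jake's
// 2 * 200 * 1.6M * 8B formula).
public class TableFootprint {
    public static void main(String[] args) {
        long terms = 1_600_000L;
        long topics = 200L;
        long bytesPerDouble = 8L;
        long matrices = 2L; // the factor of 2 in Jake's formula

        long perMapperBytes = matrices * topics * terms * bytesPerDouble;
        double perMapperGB = perMapperBytes / 1e9; // ~5.1 GB per mapper
        double perNodeGB = 8 * perMapperGB;        // ~41 GB with 8 unshared mappers

        System.out.printf("per mapper: %.1f GB, per node: %.1f GB%n",
                perMapperGB, perNodeGB);
    }
}
```

This is where the 5.1GB and 41GB figures below come from: the table alone nearly fills an 8GB heap, and eight unshared copies per node can easily exceed physical RAM.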
Sebastian: This mapper is supposed to be multi-threaded. In practice, on my cluster each JVM only ever loaded one core, so I turned off the multi-threading and split the work into more map jobs. If multi-threading will alleviate memory pressure, that is perfect for my application. Can you advise whether there's special JVM configuration needed to make this work?

On Thu, Jun 13, 2013 at 4:00 PM, Sebastian Schelter <[email protected]> wrote:
> This table is read-only, right? We could try to apply the trick from our
> ALS code: instead of running one mapper per core (and thus having one
> copy of the table per core), run a multithreaded mapper and share the
> table between its threads. Works very well for ALS. We can also cache
> the table in a static variable and make Hadoop reuse JVMs, which
> increases performance if the number of blocks to process is larger than
> the number of map slots.
>
> -sebastian
>
> On 13.06.2013 21:56, Ted Dunning wrote:
> > On Thu, Jun 13, 2013 at 6:50 PM, Jake Mannix <[email protected]> wrote:
> >
> >> Andy, note that he said he's running with a 1.6M-term dictionary.
> >> That's going to be 2 * 200 * 1.6M * 8B = 5.1GB for just the
> >> term-topic matrices. Still not hitting 8GB, but getting closer.
> >>
> >
> > It will likely be even worse unless this table is shared between
> > mappers. With 8 mappers per node, this goes to 41GB. The OP didn't
> > mention machine configuration, but this could easily cause swapping.
> >

--
Alan Gardner
Solutions Architect - CTO Office
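The static-variable trick Sebastian describes can be sketched roughly as below. This is an illustrative pattern, not the actual Mahout/ALS code: the class and method names are hypothetical, and `loadTable` stands in for reading the model from HDFS or the distributed cache. The relevant Hadoop knobs would be running the task through a multithreaded mapper (e.g. `org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper`) and enabling JVM reuse (in Hadoop 1.x, `mapred.job.reuse.jvm.num.tasks`), so that all threads, and all successive tasks in a reused JVM, see the one cached copy:

```java
// Sketch of the "cache the read-only table in a static variable" trick.
// One copy lives per JVM; mapper threads (and reused tasks in the same
// JVM) all share it instead of each loading their own. Safe to share
// without locking on reads because the table is never mutated after load.
public final class SharedTopicModel {

    // volatile + double-checked locking: the table is loaded exactly once
    // per JVM, even if many mapper threads race on first access.
    private static volatile double[][] termTopicTable = null;

    private SharedTopicModel() {}

    public static double[][] getTable(int numTerms, int numTopics) {
        double[][] table = termTopicTable;
        if (table == null) {
            synchronized (SharedTopicModel.class) {
                table = termTopicTable;
                if (table == null) {
                    table = loadTable(numTerms, numTopics);
                    termTopicTable = table;
                }
            }
        }
        return table;
    }

    // Hypothetical stand-in for reading the model from HDFS or the
    // distributed cache in a real mapper's setup().
    private static double[][] loadTable(int numTerms, int numTopics) {
        return new double[numTerms][numTopics];
    }
}
```

With this shape, every thread calling `getTable(...)` receives the same array reference, so a node pays for one copy of the term-topic table rather than one per mapper slot.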
