Ted: Because the threads were all stuck on one core, there are 8 copies of the table (one per core). We have enough RAM to allocate one table per core, but this limits our ability to scale to larger numbers of topics or terms, because we allocate an 8GB heap per mapper.
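To make that footprint concrete, here is a back-of-envelope sketch of the table's memory cost using the figures quoted later in this thread (1.6M terms, 200 topics, 8-byte doubles, and the factor of 2 from Jake's formula). The class name and the "copies" interpretation are illustrative assumptions, not anything from the Mahout code:

```java
// Back-of-envelope memory math for the term-topic table, using the
// numbers quoted in this thread: 1.6M terms, 200 topics, doubles,
// and a factor of 2 (two term-topic matrices per mapper, per Jake's
// 2 * 200 * 1.6M * 8B formula).
public class TableFootprint {
    public static void main(String[] args) {
        long terms = 1_600_000L;
        long topics = 200L;
        long bytesPerDouble = 8L;
        long matrices = 2L; // the factor of 2 in Jake's formula

        long perMapperBytes = matrices * topics * terms * bytesPerDouble;
        double perMapperGB = perMapperBytes / 1e9; // ~5.1 GB per mapper
        double perNodeGB = 8 * perMapperGB;        // ~41 GB with 8 unshared mappers

        System.out.printf("per mapper: %.1f GB, per node: %.1f GB%n",
                perMapperGB, perNodeGB);
    }
}
```

This is where the 5.1GB and 41GB figures below come from: the table alone nearly fills an 8GB heap, and eight unshared copies per node can easily exceed physical RAM.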
Sebastian: This mapper is supposed to be multi-threaded. In practice, on my cluster each JVM only ever loaded one core, so I turned off the multi-threading and split the work into more map jobs. If multi-threading will alleviate memory pressure, that is perfect for my application. Can you advise whether there's special JVM configuration needed to make this work?

On Thu, Jun 13, 2013 at 4:00 PM, Sebastian Schelter <[email protected]> wrote:
> This table is read-only, right? We could try to apply the trick from our
> ALS code: instead of running one mapper per core (and thus having one
> copy of the table per core), run a multithreaded mapper and share the
> table between its threads. Works very well for ALS. We can also cache
> the table in a static variable and make Hadoop reuse JVMs, which
> increases performance if the number of blocks to process is larger than
> the number of map slots.
>
> -sebastian
>
> On 13.06.2013 21:56, Ted Dunning wrote:
> > On Thu, Jun 13, 2013 at 6:50 PM, Jake Mannix <[email protected]> wrote:
> >
> >> Andy, note that he said he's running with a 1.6M-term dictionary.
> >> That's going to be 2 * 200 * 1.6M * 8B = 5.1GB for just the
> >> term-topic matrices. Still not hitting 8GB, but getting closer.
> >>
> >
> > It will likely be even worse unless this table is shared between
> > mappers. With 8 mappers per node, this goes to 41GB. The OP didn't
> > mention machine configuration, but this could easily cause swapping.
> >

--
Alan Gardner
Solutions Architect - CTO Office
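The static-variable trick Sebastian describes can be sketched roughly as below. This is an illustrative pattern, not the actual Mahout/ALS code: the class and method names are hypothetical, and `loadTable` stands in for reading the model from HDFS or the distributed cache. The relevant Hadoop knobs would be running the task through a multithreaded mapper (e.g. `org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper`) and enabling JVM reuse (in Hadoop 1.x, `mapred.job.reuse.jvm.num.tasks`), so that all threads, and all successive tasks in a reused JVM, see the one cached copy:

```java
// Sketch of the "cache the read-only table in a static variable" trick.
// One copy lives per JVM; mapper threads (and reused tasks in the same
// JVM) all share it instead of each loading their own. Safe to share
// without locking on reads because the table is never mutated after load.
public final class SharedTopicModel {

    // volatile + double-checked locking: the table is loaded exactly once
    // per JVM, even if many mapper threads race on first access.
    private static volatile double[][] termTopicTable = null;

    private SharedTopicModel() {}

    public static double[][] getTable(int numTerms, int numTopics) {
        double[][] table = termTopicTable;
        if (table == null) {
            synchronized (SharedTopicModel.class) {
                table = termTopicTable;
                if (table == null) {
                    table = loadTable(numTerms, numTopics);
                    termTopicTable = table;
                }
            }
        }
        return table;
    }

    // Hypothetical stand-in for reading the model from HDFS or the
    // distributed cache in a real mapper's setup().
    private static double[][] loadTable(int numTerms, int numTopics) {
        return new double[numTerms][numTopics];
    }
}
```

With this shape, every thread calling `getTable(...)` receives the same array reference, so a node pays for one copy of the term-topic table rather than one per mapper slot.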
