How does the memory requirement grow with the number of topics? A little
experimentation shows me that the number of documents doesn't matter as much
as the number of topics. Does the memory requirement grow exponentially
with the number of topics?
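
A back-of-the-envelope sketch of one piece of this, assuming the dominant
per-mapper cost is a dense topics x terms matrix of 8-byte doubles. That
storage layout is an assumption, not something confirmed in this thread, and
the class and method names are made up for illustration. Under that
assumption the matrix grows linearly with the number of topics, not
exponentially:

```java
public class TopicModelMemoryEstimate {

    // Bytes for one dense numTopics x numTerms matrix of doubles.
    // ASSUMPTION: the model is stored as a dense matrix of 8-byte doubles.
    static long denseMatrixBytes(int numTopics, int numTerms) {
        return (long) numTopics * numTerms * Double.BYTES;
    }

    public static void main(String[] args) {
        int numTerms = 60_000; // vocabulary size quoted in the original message
        for (int numTopics : new int[] {20, 100}) {
            long bytes = denseMatrixBytes(numTopics, numTerms);
            System.out.println(numTopics + " topics: " + bytes + " bytes per matrix copy");
        }
    }
}
```

With 60,000 terms this comes to 9.6 MB per copy at 20 topics and 48 MB per
copy at 100 topics. How many such copies a mapper actually holds is not known
from this thread, so the figure is only a lower bound per copy.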


On Thu, Mar 10, 2016 at 11:43 AM, David Starina wrote:

> Hi,
> I realize MapReduce algorithms are not the "hot new stuff" anymore, but I
> am playing around with LDA. I am having some problems with memory; can you
> help by suggesting how to set the parameters to make this work?
> I am running on a virtual cluster on my laptop - two nodes with 3 GB of
> memory each - just to prepare before I try this on a physical cluster with
> a much larger data set. I am using a data set of 500 documents, averaging
> around 120 kB each, with roughly 60,000 terms. Running this with 20 topics
> works fine, but with 100 topics I run out of memory (on the mappers). Can
> you suggest how to set the parameters so that it runs more mappers that
> each consume less memory?
> The error I get: Task Id : attempt_1457214584155_0074_m_000000_1, Status :
> *Container*
> [pid=26283,containerID=container_1457214584155_0074_01_000003] *is
> running beyond physical memory limits. Current usage: 1.0 GB of 1 GB
> physical memory used*; 1.7 GB of 2.1 GB virtual memory used. Killing
> container.
> These are the parameters I set for CVB0Driver:
> static int numTopics = 100;
> static double doc_topic_smoothening = 0.5;
> static double term_topic_smoothening = 0.5;
> static int maxIter = 3;
> static int iteration_block_size = 10;
> static double convergenceDelta = 0;
> static float testFraction = 0.0f;
> static int numTrainThreads = 4;
> static int numUpdateThreads = 1;
> static int maxItersPerDoc = 3;
> static int numReduceTasks = 10;
> static boolean backfillPerplexity = false;
> Any suggestion? Should I enlarge the container size on Hadoop, or can I fix 
> this with LDA parameters?
> Cheers,
> David
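
On the container-size side of the question, a minimal sketch of the two
standard Hadoop 2 properties that control mapper memory,
`mapreduce.map.memory.mb` (the container limit) and `mapreduce.map.java.opts`
(the JVM heap). The helper name, the 2048 MB value and the 0.8 heap fraction
are illustrative assumptions, and whether `-D` options reach the job depends
on the driver being launched through Hadoop's generic option parsing:

```java
import java.util.ArrayList;
import java.util.List;

public class MapperMemoryArgs {

    // Build -D options for a larger mapper container.
    // ASSUMPTION: heap is set to a fraction of the container so that
    // non-heap JVM memory still fits under the physical limit.
    static List<String> mapperMemoryArgs(int containerMb, double heapFraction) {
        int heapMb = (int) (containerMb * heapFraction);
        List<String> args = new ArrayList<>();
        args.add("-Dmapreduce.map.memory.mb=" + containerMb);
        args.add("-Dmapreduce.map.java.opts=-Xmx" + heapMb + "m");
        return args;
    }

    public static void main(String[] args) {
        // 2048 MB is a hypothetical value chosen to fit on a 3 GB node.
        for (String arg : mapperMemoryArgs(2048, 0.8)) {
            System.out.println(arg);
        }
    }
}
```

The "1.0 GB of 1 GB physical memory used" in the error matches a 1024 MB
mapper container, so raising the container limit addresses the limit that
was actually hit. The cluster's YARN allocation limits would also need to
permit the larger size.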
