Hi,

I am running LDA on 18k documents. Each document has 5k terms, and there are 300k terms in total. The number of topics is set to 100.

Running LDA on a single-node Hadoop configuration takes about 5 hours per stage, so 20 stages would take about 100 hours.

However, running on Amazon EMR with 20 machines is actually much slower: it takes 1000 minutes per stage (about 10 minutes for each 1% of map progress). The reduce phase is much faster; it is measured in seconds, which is almost negligible.
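
To put the two setups side by side, here is a rough sketch of the arithmetic in Python. The figures are the ones quoted above; the variable and function names are only for illustration, and the sketch assumes the map phase accounts for essentially all of each stage, since reducing takes only seconds.

```python
# Figures quoted in this post; everything else is plain arithmetic.
STAGES = 20
SINGLE_NODE_HOURS_PER_STAGE = 5   # single-node Hadoop
EMR_MINUTES_PER_PERCENT = 10      # observed map progress rate on EMR (20 machines)


def minutes_per_stage(minutes_per_percent):
    """Extrapolate a full stage from the time taken per 1% of map progress."""
    return minutes_per_percent * 100


single_node_total_hours = STAGES * SINGLE_NODE_HOURS_PER_STAGE
emr_stage_minutes = minutes_per_stage(EMR_MINUTES_PER_PERCENT)
emr_total_hours = STAGES * emr_stage_minutes / 60

# Ratio of EMR time per stage to single-node time per stage.
slowdown = emr_stage_minutes / (SINGLE_NODE_HOURS_PER_STAGE * 60)

print(f"single node: {single_node_total_hours} h for {STAGES} stages")
print(f"EMR:         {emr_total_hours:.0f} h for {STAGES} stages")
print(f"EMR is about {slowdown:.1f}x slower per stage")
```

So at the observed rate, the 20-machine cluster would need roughly 333 hours for 20 stages, against about 100 hours on a single node.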

Has anyone had a similar experience, or is my setup wrong?

Chris
