OK, that's reasonable on 35 machines. (You could probably turn it up to 70 reducers, since most machines can handle 2 reducers at once.) I think the recommendation step loads one whole matrix into memory. You're not running out of memory, but if you're turning up the heap size to accommodate it, you might be hitting swap, yes. I think (?) the conventional wisdom is to turn off swap for Hadoop.
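For concreteness, a minimal driver sketch showing where the reducer count is set with the standard Hadoop Job API (the class and job names here are illustrative, not from this thread). Disabling swap is an OS-level change, usually done with swapoff -a or by lowering vm.swappiness, not something set in the job itself.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class RecommendationDriverSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job name is illustrative.
        Job job = new Job(conf, "als-recommendation-step");
        // Ask for ~2 reducers per node on a 35-node cluster.
        job.setNumReduceTasks(70);
        // ... set mapper/reducer classes and input/output paths,
        // then submit with job.waitForCompletion(true).
      }
    }

Note that setNumReduceTasks is only a hint in the sense that the cluster's configured reduce slots still cap how many run concurrently; 70 tasks on 70 slots keeps every slot busy in one wave.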
Sebastian, yes, that is probably a good optimization; I've had good results reusing a mutable object in this context (a sketch of the idiom follows below, after the quote).

On Wed, Mar 6, 2013 at 10:54 AM, Josh Devins <[email protected]> wrote:

> The factorization at 2 hours is kind of a non-issue (certainly fast
> enough). It was run with (if I recall correctly) 30 reducers across a
> 35-node cluster, with 10 iterations.
>
> I was a bit shocked at how long the recommendation step took and will throw
> some timing debug in to see where the problem lies exactly. There were no
> other jobs running on the cluster during these attempts, but it's certainly
> possible that something is swapping or the like. I'll be looking more
> closely today before I start to consider other options for calculating the
> recommendations.
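For reference, a minimal sketch of the mutable-object reuse idiom mentioned above, shown on a generic summing reducer (class and type names are illustrative; this is not the actual Mahout code):

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer
        extends Reducer<LongWritable, DoubleWritable, LongWritable, DoubleWritable> {

      // Allocated once per reducer instance, mutated for every output record,
      // instead of new'ing a fresh Writable on each call.
      private final DoubleWritable result = new DoubleWritable();

      @Override
      protected void reduce(LongWritable key, Iterable<DoubleWritable> values,
                            Context context) throws IOException, InterruptedException {
        double sum = 0.0;
        for (DoubleWritable v : values) {
          sum += v.get();
        }
        result.set(sum); // mutate the shared instance
        context.write(key, result);
      }
    }

Since Hadoop calls reduce() once per key, reusing one output Writable avoids allocating a short-lived object per record, which cuts GC pressure noticeably on large jobs.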
