How many clusters are you asking for, and what is the dimensionality of your input vectors?
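
The reason for asking: as far as I understand the KMeans job, every map task holds the full set of current cluster centers in memory, whatever the size of its input split. If that is right, the footprint is driven by (number of clusters) x (vector dimension), not by the number of input keys per task, which would explain why adding more map tasks did not help. Below is a rough back-of-the-envelope sketch in Python. It assumes dense centroids stored as 8-byte doubles and ignores JVM and object overhead. The function name and the example numbers are hypothetical, not taken from Mahout or from your job.

```python
def estimate_centroid_memory_bytes(num_clusters, dimension,
                                   bytes_per_value=8, copies_per_cluster=1):
    """Rough lower bound on the memory needed to hold all centroids.

    Assumptions (not taken from Mahout itself):
      - each centroid is a dense vector of `dimension` values
      - each value takes `bytes_per_value` bytes (8 for a double)
      - `copies_per_cluster` covers extra per-cluster vectors, such as
        running sums kept while points are being assigned
    """
    return num_clusters * dimension * bytes_per_value * copies_per_cluster


# Hypothetical example: 1,000 clusters over 1,000,000-dimensional vectors.
estimate = estimate_centroid_memory_bytes(1000, 1_000_000)
print(f"{estimate / 1e9:.1f} GB per map task")
```

With those made-up numbers the estimate comes out at 8 GB per map task, the same order of magnitude as the 9 - 10G you are seeing, and it does not change with the split size. Plugging in your real cluster count and dimension should show whether this is the explanation.
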
On Thu, Feb 3, 2011 at 9:05 PM, james q <[email protected]> wrote:

> Hello,
>
> New user to Mahout and Hadoop here. Isabel Drost suggested to a colleague I
> should post to the Mahout user list, as I am having some general
> difficulties with memory consumption and KMeans clustering.
>
> So a general question first and foremost: what determines how much memory
> a map task consumes during a KMeans clustering job? Increasing the
> number of map tasks by adjusting dfs.block.size and mapred.max.split.size
> doesn't seem to make the map task consume less memory. Or at least not a
> very noticeable amount. I figured if there are more map tasks, each
> individual map task evaluates fewer input keys and hence there would be less
> memory consumption. Is there any way to predict memory usage of map tasks
> in KMeans?
>
> The cluster I am running consists of 10 machines, each with 8 cores and 68G
> of RAM. I've configured the cluster to have each machine, at maximum, run 7
> map or reduce tasks. I set the map and reduce tasks to have virtually no
> limit on memory consumption ... so with 7 processes each, at around 9 - 10G
> per process, the machines will crap out. I can reduce the number of map
> tasks per machine, but something tells me that that level of memory
> consumption is wrong.
>
> If any more information is needed to help debug this, please let me know!
> Thanks!
>
> -- james
