I think the job had 5000 - 6000 clusters. The input (sparse) vectors had a dimension of 6838856.
-- james

On Fri, Feb 4, 2011 at 1:55 AM, Ted Dunning <[email protected]> wrote:
> How many clusters?
>
> How large is the dimension of your input data?
>
> On Thu, Feb 3, 2011 at 9:05 PM, james q <[email protected]> wrote:
> >
> > Hello,
> >
> > New user to mahout and hadoop here. Isabel Drost suggested to a colleague
> > that I should post to the mahout user list, as I am having some general
> > difficulties with memory consumption and KMeans clustering.
> >
> > So a general question first and foremost: what determines how much memory
> > a map task consumes during a KMeans clustering job? Increasing the number
> > of map tasks by adjusting dfs.block.size and mapred.max.split.size
> > doesn't seem to make each map task consume less memory, or at least not
> > noticeably less. I figured that with more map tasks, each individual map
> > task would evaluate fewer input keys and hence consume less memory. Is
> > there any way to predict the memory usage of map tasks in KMeans?
> >
> > The cluster I am running consists of 10 machines, each with 8 cores and
> > 68G of RAM. I've configured the cluster so that each machine runs at
> > most 7 map or reduce tasks. I set the map and reduce tasks to have
> > virtually no limit on memory consumption, so with 7 processes each at
> > around 9-10G per process, the machines will crap out. I can reduce the
> > number of map tasks per machine, but something tells me that that level
> > of memory consumption is wrong.
> >
> > If any more information is needed to help debug this, please let me know!
> > Thanks!
> >
> > -- james
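A back-of-envelope sketch of why these numbers matter, under the assumption that each KMeans map task holds every cluster centroid in memory as a dense double-precision vector (sparse input vectors tend to produce dense centroids once they are averaged). The function name and figures below are illustrative, taken from the cluster counts and dimension quoted in this thread:

```python
# Rough per-mapper memory estimate for KMeans, assuming every map task
# keeps all centroids resident as dense vectors of 8-byte doubles.

def centroid_memory_gib(num_clusters: int, dimension: int,
                        bytes_per_value: int = 8) -> float:
    """Approximate GiB needed to hold all centroids densely in one mapper."""
    return num_clusters * dimension * bytes_per_value / 2**30

# Figures from the thread: 5000-6000 clusters, input dimension 6838856.
low = centroid_memory_gib(5000, 6838856)
high = centroid_memory_gib(6000, 6838856)
print(f"~{low:.0f}-{high:.0f} GiB per mapper just for dense centroids")
```

Under this assumption the centroid set, not the input split, dominates each mapper's footprint, which would explain why shrinking dfs.block.size and mapred.max.split.size has little effect on per-task memory. If the centroids stay sparse in practice, the actual numbers would be lower.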
