I'm re-running it right now on a 4-node cluster of EC2 xlarge instances with 3 reducers per node and 4GB max heap per child ... none are swapping and all have a load average around 3 ... will post results once I have them.
Intuitively, your comment about all points being assigned to one cluster makes sense, because we get through the map tasks and all the reducers except one very quickly ... and then it bogs down. Thanks!

On Thu, Feb 24, 2011 at 4:23 PM, Ted Dunning <[email protected]> wrote:

> We should probably have an option to down-sample large clusters to make
> the PDF computation faster.
>
> On Thu, Feb 24, 2011 at 3:09 PM, Jeff Eastman <[email protected]> wrote:
>
> > Again, if most of your points are being assigned to a single cluster,
> > that reducer will be bogged down observing them all.
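For anyone following along, here's a rough sketch of the kind of down-sampling Ted is suggesting (purely hypothetical code on my part, not anything that exists in Mahout): cap how many points the reducer actually observes per cluster by keeping a uniform reservoir sample, then run the expensive PDF computation over the sample instead of the full cluster.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: bound the number of points a reducer observes for a
// single (possibly huge) cluster via reservoir sampling (Algorithm R), so
// the PDF computation runs over at most maxPoints points.
public final class ClusterDownsampler {

  private ClusterDownsampler() {}

  // Returns a uniform random sample of at most maxPoints from the stream.
  public static <T> List<T> reservoirSample(Iterable<T> points, int maxPoints, Random rng) {
    List<T> reservoir = new ArrayList<>(maxPoints);
    int seen = 0;
    for (T point : points) {
      seen++;
      if (reservoir.size() < maxPoints) {
        reservoir.add(point);
      } else {
        // Keep each new point with probability maxPoints / seen, evicting a
        // random current member; this yields a uniform sample of the stream.
        int j = rng.nextInt(seen);
        if (j < maxPoints) {
          reservoir.set(j, point);
        }
      }
    }
    return reservoir;
  }
}

The nice property is that it's single-pass, so it would fit the reducer's Iterable of points without buffering the whole cluster in the 4GB child heap.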
