To recap, there seem to be three distinct problems:
1) Within a running cluster, adding new nodes causes a lot of object faults.
2) Upon restart of the whole cluster, the active master does a lot of faulting, eventually going OOM.
3) Upon restart, the passive master also goes OOM while trying to sync up.
I tried setting the master eviction percentage to 25, and this seems to help, but there's still a lot of faulting activity ...
Both masters have 1GB of heap, which I think is a reasonable amount of memory to hold just a few GBs of data.
Any help, maybe some tuning tips?

Sergio Bossa
Sent by iPhone

On 17 Nov 2010, at 18:36, Sergio Bossa <sergio.bo...@gmail.com> wrote:

> I tried to wait several minutes for the masters to start up and stop
> logging faults, then started the clients and tried working with them.
> But, after a few minutes of usage, masters crash with OOME ...
>
> On Wed, Nov 17, 2010 at 6:01 PM, Sergio Bossa
> <sergio.bo...@gmail.com> wrote:
>> After restarting all masters and clients, the cluster is completely
>> unusable: masters are very busy and clients cannot connect at all.
>> I'm attaching the active master logs, where I see lots of object
>> faulting and strange messages from the ObjectManager.
>>
>> Here is some info about my setup:
>> - Terracotta 3.4.0.
>> - 1.9GB of objectdb (on disk).
>> - 1 active and 1 passive master, each one with 1GB.
>> - 2 clients, each one with 1.5GB.
>>
>> On Wed, Nov 17, 2010 at 4:36 PM, Sergio Bossa
>> <sergio.bo...@gmail.com> wrote:
>>> Hi guys,
>>>
>>> when I try to connect a new client to a Terracotta master holding a
>>> few million objects (I can give you an approximate size if
>>> needed), the client fails to connect with a timeout and the master
>>> gets heavily busy trying to fault objects to send to the new client.
>>> I'm attaching a thread dump of the master: you'll notice several
>>> threads busy faulting in objects (see
>>> managed_object_fault_stage).
>>> Two questions:
>>> 1) Why is the master trying to fault so many objects at client
>>> connection?
>>> 2) Why is the master still busy in the fault stage even *after* the
>>> client is completely disconnected?
>>>
>>> --
>>> Sergio Bossa
>>> http://www.linkedin.com/in/sergiob
_______________________________________________
tc-dev mailing list
tc-dev@lists.terracotta.org
http://lists.terracotta.org/mailman/listinfo/tc-dev