I've finally managed to resolve the OOM errors and the heavy faulting by:

- Reducing the fault count to 0 (I don't actually need it at all,
given that L1s do their own data locality management).
- Increasing the L2 cache manager eviction percentage
(l2.cachemanager.percentageToEvict) to 25.
- Reducing the passive sync batch size
(l2.objectmanager.passive.sync.batch.size) to 100.
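
For reference, a rough sketch of the two property overrides above, assuming
they are placed in a local tc.properties file (I believe the same names can
also be set in the <tc-properties> section of tc-config.xml). The values are
simply the ones that worked for this setup, not general recommendations, and
the fault count change is not shown here.

```properties
# Evict a larger share of the L2 cache on each eviction run.
l2.cachemanager.percentageToEvict = 25
# Send fewer objects per batch while syncing the passive.
l2.objectmanager.passive.sync.batch.size = 100
```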

On Wed, Nov 17, 2010 at 8:55 PM, Sergio Bossa <sergio.bo...@gmail.com> wrote:
> To recap, there seem to be three distinct problems:
>
> 1) Within a running cluster, adding new nodes causes a lot of object faults.
> 2) Upon restart of a whole cluster, the active master does a lot of
> faulting, eventually going OOM.
> 3) Upon restart, the passive also goes OOM while trying to sync up.
>
> I tried setting the master eviction percentage to 25 and this seems to help,
> but there's still a lot of faulting activity ...
>
> Both masters have 1GB of heap, which I think is a reasonable amount of
> memory to hold just a few GB of data.
>
> Any help, maybe some tuning tips?
>
> Sergio Bossa
>
> On 17 Nov 2010, at 18:36, Sergio Bossa <sergio.bo...@gmail.com>
> wrote:
>
>> I tried waiting several minutes for the masters to start up and stop
>> logging faults, then started the clients and tried working with them.
>> But after a few minutes of usage, the masters crash with an OOME ...
>>
>> On Wed, Nov 17, 2010 at 6:01 PM, Sergio Bossa <sergio.bo...@gmail.com>
>> wrote:
>>>
>>> After restarting all masters and clients, the cluster is completely
>>> unusable: masters are very busy and clients cannot connect at all.
>>> I'm attaching the active master logs, where I see lots of object
>>> faulting and strange messages from the ObjectManager.
>>>
>>> Here is some info about my setup:
>>> - Terracotta 3.4.0.
>>> - 1.9GB of objectdb (on disk).
>>> - 1 active and 1 passive master, each one with 1GB.
>>> - 2 clients, each one with 1.5GB.
>>>
>>> On Wed, Nov 17, 2010 at 4:36 PM, Sergio Bossa <sergio.bo...@gmail.com>
>>> wrote:
>>>>
>>>> Hi guys,
>>>>
>>>> When I try to connect a new client to a Terracotta master holding a
>>>> few million objects (I can give you an approximate size if
>>>> needed), the client fails to connect with a timeout and the master
>>>> gets very busy trying to fault objects to send to the new client.
>>>> I'm attaching a thread dump of the master: you'll notice several
>>>> threads busy faulting in objects (see
>>>> managed_object_fault_stage).
>>>> Two questions:
>>>> 1) Why is the master trying to fault so many objects when a client
>>>> connects?
>>>> 2) Why is the master still busy in the fault stage even *after* the
>>>> client has completely disconnected?
>>>>
>>>> --
>>>> Sergio Bossa
>>>> http://www.linkedin.com/in/sergiob
>>>>
>>
>



-- 
Sergio Bossa
http://www.linkedin.com/in/sergiob
_______________________________________________
tc-dev mailing list
tc-dev@lists.terracotta.org
http://lists.terracotta.org/mailman/listinfo/tc-dev
