Hi,
We are using a simple replicated Ignite cache with a few continuous queries.
Recently we ran into an OOME after several days of running without a restart.
The heap histogram shows that most of the heap is used by buffered/cached
continuous query entries. Code analysis shows that each continuous query
requires a buffer holding a batch with room for 1000 continuous query entries.
That alone would be understandable, but it is multiplied by the number of
partitions, which defaults to 1024 (or 512 for replicated cache mode). A single
continuous query can therefore, over time, cache up to 1000 x 1024 = ~1M
entries, including the entry payload (key, newVal, oldVal). We did not try to
understand all the details of the CacheContinuousQueryEventBuffer class, but we
feel there might be a chance to clear some cached entries earlier. Currently
the buffer is cleared only when the batch is full. This allows continuous
queries that don't produce many events to grow to the full size over a long
time. We feel it would be much better to have a dynamic buffer size that covers
a certain time window only, instead of a fixed length. Additionally, it would
partially solve the problem with per-partition buffering, as each partition
would dynamically buffer only the amount required for its time window
(obviously, with a uniform distribution they would all buffer more or less the
same amount).
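To illustrate what we mean by a time-based buffer, here is a minimal sketch. It
is not Ignite code: the class TimeBoundedBuffer and all of its members are
hypothetical names made up for this mail, it is not thread-safe, and it assumes
timestamps are passed in non-decreasing order. We have not checked how such a
buffer would interact with whatever guarantees the existing fixed-size buffer
provides.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical time-bounded buffer (illustration only, not part of Ignite). */
public class TimeBoundedBuffer<E> {
    /** An entry together with the time it was buffered. */
    private static final class Stamped<E> {
        final long tsMillis;
        final E entry;

        Stamped(long tsMillis, E entry) {
            this.tsMillis = tsMillis;
            this.entry = entry;
        }
    }

    /** How long an entry may stay buffered (assumed to be configurable). */
    private final long retentionMillis;

    /** Oldest entry first; assumes non-decreasing timestamps. */
    private final Deque<Stamped<E>> entries = new ArrayDeque<>();

    public TimeBoundedBuffer(long retentionMillis) {
        if (retentionMillis <= 0)
            throw new IllegalArgumentException("retentionMillis must be positive");

        this.retentionMillis = retentionMillis;
    }

    /** Buffers an entry and drops everything older than the retention window. */
    public void add(E entry, long nowMillis) {
        evictOlderThan(nowMillis - retentionMillis);

        entries.addLast(new Stamped<>(nowMillis, entry));
    }

    /**
     * Drops entries with a timestamp below the cutoff. Could also be called
     * from a periodic task so that rarely updated partitions shrink as well.
     */
    public void evictOlderThan(long cutoffMillis) {
        while (!entries.isEmpty() && entries.peekFirst().tsMillis < cutoffMillis)
            entries.pollFirst();
    }

    /** Number of entries currently buffered. */
    public int size() {
        return entries.size();
    }
}
```

With one such buffer per partition, a partition that sees few events would hold
only the events of the last retention window instead of slowly filling a fixed
1000-entry batch.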
Please note that we might be wrong, but we see no reason to buffer 1M
entries representing many days of events for each and every continuous query.
We tried to mitigate it by using fewer partitions, but we are still
looking for a detailed explanation of the high heap memory requirement coming
from the continuous query entry buffering/caching.
Best regards,
Michal
Hidden configuration of the entries size:
    private static final int BUF_SIZE =
        IgniteSystemProperties.getInteger("IGNITE_CONTINUOUS_QUERY_SERVER_BUFFER_SIZE", 1000);
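For reference, the worst case per continuous query can be estimated from this
value. The following is a rough sketch only: the class and method names are
ours, and Integer.getInteger stands in for IgniteSystemProperties.getInteger so
that the example runs without Ignite on the classpath. Since the value is read
into a static final field, we assume the property has to be set before the
class is loaded, e.g. as a -D JVM option.

```java
/** Rough worst-case estimate of buffered continuous query entries (illustration only). */
public class ContinuousQueryBufferEstimate {
    /** Property name as it appears in the Ignite snippet quoted in this mail. */
    static final String BUF_SIZE_PROP = "IGNITE_CONTINUOUS_QUERY_SERVER_BUFFER_SIZE";

    /** Default as it appears in the Ignite snippet quoted in this mail. */
    static final int DFLT_BUF_SIZE = 1000;

    /** Reads the per-partition buffer size from the system property, falling back to the default. */
    static int bufferSize() {
        return Integer.getInteger(BUF_SIZE_PROP, DFLT_BUF_SIZE);
    }

    /** Upper bound on entries one continuous query can hold: one full batch per partition. */
    static long maxBufferedEntries(int bufSize, int partitions) {
        return (long) bufSize * partitions;
    }

    public static void main(String[] args) {
        int bufSize = bufferSize();

        // 1024 and 512 are the partition defaults mentioned in this mail; 32 is our reduced setting.
        for (int parts : new int[] {1024, 512, 32}) {
            System.out.println(parts + " partitions -> up to "
                + maxBufferedEntries(bufSize, parts) + " entries per continuous query");
        }
    }
}
```

With the default of 1000 this gives 1,024,000, 512,000 and 32,000 entries per
continuous query respectively, each entry holding its key, newVal and oldVal.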
Clear all when full:
    if (pos == entries.length - 1) {
        Arrays.fill(entries, null);
Heap histogram:
 num    #instances        #bytes  class name
   1:       729468    2955637048  [B
   2:     14963642     957673088  org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryEntry
   3:        40700     163451200  [Lorg.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryEntry;
   4:       951580      33617168  [C
   5:       951355      22832520  java.lang.String
   6:       362093      14483720  org.apache.ignite.internal.binary.BinaryObjectImpl
   7:       300481      12019240  com.cbksec.flow.history.shared.document.DataId
   8:       363737       8729688  org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion
   9:       300483       7211592  com.cbksec.flow.history.shared.document.DataVersion
  10:       300481       7211544  com.cbksec.flow.history.shared.document.DocumentId
  11:       362093       5793488  org.apache.ignite.internal.processors.cache.CacheObjectByteArrayImpl
Reduce number of partitions to 32:
    new RendezvousAffinityFunction(false, 32);