Hi,

We are using a simple replicated Ignite cache with a few continuous queries. 
Recently we ran into an OOME after the cluster had been running for several days 
without a restart. The heap histogram shows that most of the heap is used by 
buffered continuous query entries. Looking at the code, each continuous query 
needs a buffer with a batch that caches up to 1000 continuous query entries. 
That alone would be understandable, but it is multiplied by the number of 
partitions, which defaults to 1024 (or 512 in replicated cache mode). Over time 
a single continuous query can therefore cache up to 1000 x 1024 = 1,024,000 
(~1M) entries, including each entry's payload (key, newVal, oldVal). We did not 
try to understand all the details of the CacheContinuousQueryEventBuffer class, 
but we think some cached entries could be cleared earlier. Currently the buffer 
is cleared only when a batch is full. This lets continuous queries that produce 
few events grow to the full size over a long time. We think it would be much 
better to size the buffer dynamically so that it covers a certain time window 
instead of a fixed number of entries. This would also partially solve the 
per-partition buffering problem, because each partition would buffer only the 
amount its time window requires (with a uniform key distribution, all partitions 
would of course buffer roughly the same amount).
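To make the proposal concrete, here is a rough sketch of a per-partition buffer 
that keeps entries for a time window instead of a fixed count. 
TimeBoundedEventBuffer, its methods and the injectable clock are hypothetical 
names, not Ignite API, and the sketch ignores whatever ordering or 
acknowledgement guarantees the real CacheContinuousQueryEventBuffer provides.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/**
 * Hypothetical per-partition buffer bounded by age instead of entry count.
 * Illustration only; this is not Ignite code.
 */
class TimeBoundedEventBuffer<E> {
    private static final class Stamped<T> {
        final long timeMillis;
        final T entry;
        Stamped(long timeMillis, T entry) { this.timeMillis = timeMillis; this.entry = entry; }
    }

    private final Deque<Stamped<E>> entries = new ArrayDeque<>();
    private final long maxAgeMillis;
    private final LongSupplier clock; // injectable so the window can be tested

    TimeBoundedEventBuffer(long maxAgeMillis, LongSupplier clock) {
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /** Adds an entry, then drops every entry older than the time window. */
    void add(E entry) {
        long now = clock.getAsLong();
        entries.addLast(new Stamped<>(now, entry));
        evictOlderThan(now - maxAgeMillis);
    }

    /** Drops expired entries; could also be driven by a periodic timer. */
    void evictExpired() {
        evictOlderThan(clock.getAsLong() - maxAgeMillis);
    }

    private void evictOlderThan(long cutoff) {
        // Entries are appended in time order, so expired ones sit at the head.
        while (!entries.isEmpty() && entries.peekFirst().timeMillis < cutoff) {
            entries.pollFirst();
        }
    }

    int size() { return entries.size(); }

    public static void main(String[] args) {
        long[] now = {0L};
        TimeBoundedEventBuffer<String> buf = new TimeBoundedEventBuffer<>(60_000L, () -> now[0]);
        for (int i = 0; i < 10; i++) { // a slow query: one event every 10 s
            buf.add("event-" + i);
            now[0] += 10_000L;
        }
        System.out.println(buf.size()); // 7
    }
}
```

With a 60 s window and one event every 10 s, the buffer never holds more than 
7 entries, no matter how many days the query runs.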

Please note that we might be wrong, but we see no reason to buffer 1M entries, 
representing many days of events, for each and every continuous query. We tried 
to mitigate this by using fewer partitions, but we are still looking for a 
detailed explanation of the high heap memory requirement coming from the 
buffering of continuous query entries.

Best regards,
Michal

Hidden configuration of the entries size
  private static final int BUF_SIZE =
      IgniteSystemProperties.getInteger("IGNITE_CONTINUOUS_QUERY_SERVER_BUFFER_SIZE", 1000);
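Because BUF_SIZE is a static final field, the property presumably has to be set 
before Ignite loads that class, e.g. with 
-DIGNITE_CONTINUOUS_QUERY_SERVER_BUFFER_SIZE=100 on the JVM command line. The 
sketch below only mimics the lookup: Integer.getInteger stands in for 
IgniteSystemProperties.getInteger so it compiles without Ignite. The value 100 
is an arbitrary example, and we have not verified that a smaller buffer is safe.

```java
/**
 * Mimics the BUF_SIZE lookup: a JVM system property with a default of 1000.
 * Integer.getInteger is a stand-in for IgniteSystemProperties.getInteger.
 */
class BufferSizeProperty {
    static final String PROP = "IGNITE_CONTINUOUS_QUERY_SERVER_BUFFER_SIZE";

    static int bufferSize() {
        return Integer.getInteger(PROP, 1000);
    }

    public static void main(String[] args) {
        System.out.println("default: " + bufferSize());
        // In a real node this must happen before the buffer class is loaded,
        // since a static final field is initialized only once.
        System.setProperty(PROP, "100");
        System.out.println("overridden: " + bufferSize());
    }
}
```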

Clear all when full
 if (pos == entries.length - 1) {
   Arrays.fill(entries, null);
   // ... (rest of the block omitted)
 }
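A simplified model of that behaviour, assuming one batch per partition that is 
cleared only when its last slot is written. FixedBatchModel is a hypothetical 
class, not the real buffer. It shows that a query which has produced one 
million events in total, spread evenly over 1024 partitions, still holds all of 
them, because no partition has filled its 1000-slot batch yet.

```java
import java.util.Arrays;

/**
 * Hypothetical model: one fixed-size batch per partition, cleared only when
 * the last slot is written (the pos == entries.length - 1 condition).
 */
class FixedBatchModel {
    private final Object[][] batches;
    private final int[] pos;

    FixedBatchModel(int partitions, int batchSize) {
        batches = new Object[partitions][batchSize];
        pos = new int[partitions];
    }

    void add(int partition, Object entry) {
        Object[] entries = batches[partition];
        int p = pos[partition];
        entries[p] = entry;
        if (p == entries.length - 1) {
            Arrays.fill(entries, null); // the only point where entries are released
            pos[partition] = 0;
        } else {
            pos[partition] = p + 1;
        }
    }

    /** Number of entries currently referenced by all batches. */
    long retained() {
        long n = 0;
        for (Object[] batch : batches)
            for (Object e : batch)
                if (e != null) n++;
        return n;
    }

    public static void main(String[] args) {
        FixedBatchModel model = new FixedBatchModel(1024, 1000);
        for (int i = 0; i < 1_000_000; i++) {
            model.add(i % 1024, Integer.valueOf(i)); // at most 977 events per partition
        }
        System.out.println(model.retained()); // 1000000
    }
}
```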

Heap
 num     #instances         #bytes  class name
   1:        729468     2955637048  [B
   2:      14963642      957673088  org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryEntry
   3:         40700      163451200  [Lorg.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryEntry;
   4:        951580       33617168  [C
   5:        951355       22832520  java.lang.String
   6:        362093       14483720  org.apache.ignite.internal.binary.BinaryObjectImpl
   7:        300481       12019240  com.cbksec.flow.history.shared.document.DataId
   8:        363737        8729688  org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion
   9:        300483        7211592  com.cbksec.flow.history.shared.document.DataVersion
  10:        300481        7211544  com.cbksec.flow.history.shared.document.DocumentId
  11:        362093        5793488  org.apache.ignite.internal.processors.cache.CacheObjectByteArrayImpl
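Dividing bytes by instances for rows 2 and 3 gives exactly 64 bytes per 
CacheContinuousQueryEntry and 4016 bytes per entry array. Assuming a 64-bit JVM 
with compressed oops (16-byte array header plus 4 bytes per reference), 4016 
bytes is exactly an array with 1000 slots, which is consistent with the 
1000-entry batches described above. These are shallow sizes only; the payloads 
are counted separately, presumably mostly in the [B row.

```java
/** Average shallow size per instance for two rows of the histogram. */
class HistogramMath {
    static long bytesPerInstance(long bytes, long instances) {
        return bytes / instances;
    }

    public static void main(String[] args) {
        // Row 2: CacheContinuousQueryEntry objects
        System.out.println(bytesPerInstance(957_673_088L, 14_963_642L)); // 64
        // Row 3: CacheContinuousQueryEntry[] arrays (16-byte header + 1000 * 4)
        System.out.println(bytesPerInstance(163_451_200L, 40_700L));     // 4016
    }
}
```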

Reduce number of partitions to 32
   new RendezvousAffinityFunction(false, 32);
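For comparison, the per-query slot capacity (partitions x buffer size) for the 
partition counts mentioned above, assuming one 1000-slot batch per partition as 
described earlier. WorstCaseEntries is just a helper name for this note.

```java
/** Slot capacity of one continuous query: one fixed-size batch per partition. */
class WorstCaseEntries {
    static long bound(int partitions, int bufferSize) {
        return (long) partitions * bufferSize;
    }

    public static void main(String[] args) {
        int bufSize = 1000; // default IGNITE_CONTINUOUS_QUERY_SERVER_BUFFER_SIZE
        for (int parts : new int[] {1024, 512, 32}) {
            System.out.printf("%4d partitions -> up to %d buffered entries%n", parts, bound(parts, bufSize));
        }
    }
}
```

Fewer partitions shrink the bound linearly, but a query with few events still 
keeps its entries until a batch fills up.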
