Hello

We are facing an OutOfMemoryError when multiple clients try to read a big 
cached object using the standard value = cache.get(key) call.


The cached object is a big serialized object that can reach hundreds of MB in 
size. The server node has 16GB of heap, which should be more than enough for 
this use case.

The setup to reproduce the issue is simple:

1. Launch one server node with a 16GB heap.
2. Launch one producer client node that populates the cache with the big object.
3. Launch multiple Ignite consumer clients simultaneously, each getting the 
cached value.
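For reference, the consumer side looks roughly like the sketch below. The cache name "bigCache" and key "bigKey" are placeholders I am using for illustration, not the actual names in my setup:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class Consumer {
    public static void main(String[] args) {
        Ignition.setClientMode(true);               // join the cluster as a client node
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<String, byte[]> cache = ignite.cache("bigCache");
            byte[] value = cache.get("bigKey");     // hundreds of MB in one call
            System.out.println("received " + value.length + " bytes");
        }
    }
}
```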

Result: in my case I can launch 2 clients in parallel, but it fails with three.
If the clients are launched in sequence with enough idle time between them, 
there is no problem; the heap limit is never reached, since heap is not being 
consumed by concurrent network transfers.


I correlated the heap growth with the serialization of the cached object onto 
the network. The serialization process seems to consume heap at will, until an 
OOME occurs when too many transfers happen in parallel.

So, simply put, it does not scale at all, because I have the same issue with a 
large number of clients and servers. Even with 100 server nodes, at some point 
2 or 3 clients will request from the same node, which will trigger the OOME.

What can I do to solve this issue in the very short term?

Can I configure the network transfer on the caches to limit the number of 
simultaneous requests, i.e. a kind of per-server-node queuing of cache get 
requests?
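As a stopgap while waiting for advice, one workaround I am considering is throttling on the client side so that each client JVM keeps at most N gets in flight and the rest queue up. This is only a sketch under my own assumptions (it does not coordinate across separate client processes, and the names below are mine, not Ignite API):

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Client-side throttle: at most MAX_IN_FLIGHT gets run concurrently in this
// JVM; extra callers block (queue) instead of piling more transfers onto the
// server. MAX_IN_FLIGHT = 2 matches what the server heap survived in my tests.
public class ThrottledGet {
    static final int MAX_IN_FLIGHT = 2;
    static final Semaphore permits = new Semaphore(MAX_IN_FLIGHT, true); // fair: FIFO queuing

    static <T> T get(Supplier<T> fetch) throws InterruptedException {
        permits.acquire();
        try {
            return fetch.get();   // stands in for cache.get(key)
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(get(() -> "big-object"));
    }
}
```

But a proper server-side limit on concurrent transfers would obviously be preferable, if one exists.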

In the long term we'll change the architecture to avoid spawning hundreds of 
simultaneous clients, but in any case it would be nice to have a solution to 
this issue.


Thanks for your help.
