@DaveHarvey, I'll look at that tomorrow. Seems potentially complicated, but
if that's what has to happen we'll figure it out. 

Interestingly, cutting the cluster to half as many nodes (by reducing the
number of backups) seems to have resolved the issue. Is there a guideline
for how large a cluster should be? 

We were running a single 44-node cluster with 3 data backups (4 total
copies) and hitting the issue consistently. I switched to running two
separate clusters, each with 22 nodes and 1 data backup (2 total copies).
The smaller clusters have worked perfectly every time so far, though I
haven't exercised them as heavily yet.
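For reference, the backup count in Ignite is set per cache via
CacheConfiguration.setBackups(). A minimal sketch of the change described
above (the cache name and key/value types are placeholders, not from our
actual config):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class BackupSizingSketch {
    public static void main(String[] args) {
        // Placeholder cache name; the relevant knob is setBackups().
        CacheConfiguration<Long, byte[]> cacheCfg =
            new CacheConfiguration<>("myCache");

        // 1 backup = 2 total copies of each partition (primary + backup).
        // We previously ran setBackups(3), i.e. 4 total copies.
        cacheCfg.setBackups(1);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(cacheCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // Node joins the cluster with the reduced backup count.
        }
    }
}
```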


@smovva - We're still actively experimenting with instance and cluster
sizing. We were running on c4.4xl instances, but we were barely using the
CPUs while consistently hitting memory issues (a 20GB heap, plus a bit of
off-heap). We just switched to r4.2xl instances, which are working better
for us so far and are a bit cheaper. That said, the optimal size depends on
your use case - it's basically a tradeoff among the memory, CPU, networking,
and operational cost requirements of your workload. 
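For concreteness, a 20GB heap maps to JVM options along these lines when
launching a node. This is a sketch, not our exact startup script; the GC
choice and any off-heap limits depend on your Ignite configuration:

```shell
# Sketch: fixed 20GB heap for an Ignite node (e.g. on an r4.2xl).
# ignite.sh picks up extra JVM options from the JVM_OPTS variable.
export JVM_OPTS="-Xms20g -Xmx20g"
./bin/ignite.sh
```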



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
