Thanks for the reply. We've done query analysis and found that nearly all queries take around 1-5 ms of execution time for our operations, which we consider fine.
But we have one operation that inserts rows into a table. We receive a maximum of 5,000 records at a time and use the IgniteDataStreamer API with the JCache QueryEntity approach to insert the rows; we don't use JDBC. For all querying we use either SqlFieldsQuery or JCache getAll. We are not using compute jobs either.

In real time we get around 1,500-2,000 bulk row insertions from each thread, and all 300-400 threads insert at once without any locking mechanism. This gives an insert time of around 40-50 ms per thread, and we suspect it is causing thread starvation. Oddly, most of the log entries report read-query delays, yet no delays are observed from the client application. I'm attaching the complete logs of the coordinator node covering a run of about two days. Please find them attached: Logs.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t2763/Logs.zip>. Unfortunately I am not the system administrator and couldn't run jstack on that machine, but I'll try to get a dump if possible.

Version -> We are using GridGain's libraries, version 8.7.10. Client heap size is 2 GB and server heap size is 4 GB.

Config -> Persistence is ON. Checkpointing frequency is every 15 minutes. WAL archiving is off. The off-heap data region size is 20 GB, and persistence usage would be around 25 GB in real time for our case. All caches are in REPLICATED mode, with FULL_SYNC between the two server nodes. The WAL segment size is 256 MB, and all caches use ATOMIC atomicity.

Topology -> Two clients connected to two servers, four nodes in total. All thread pool sizes are the defaults, i.e. equal to the number of CPU cores. Client machines have 40-core CPUs; server machines have 64-core CPUs.

The system is actually working, but we want to know whether any of this could cause problems in the future, such as cluster-wide data corruption.
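To make the setup concrete, our configuration and insert path look roughly like the sketch below. The cache name, key/value types, and record payloads are illustrative placeholders, not our exact values, and the code assumes a running GridGain/Ignite runtime on the classpath:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StreamerSketch {
    public static void main(String[] args) {
        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setWalSegmentSize(256 * 1024 * 1024)      // 256 MB WAL segments
            .setCheckpointFrequency(15 * 60 * 1000L);  // checkpoint every 15 minutes
        storage.getDefaultDataRegionConfiguration()
            .setPersistenceEnabled(true)               // native persistence ON
            .setMaxSize(20L * 1024 * 1024 * 1024);     // 20 GB off-heap data region

        CacheConfiguration<Long, Object> cacheCfg =
            new CacheConfiguration<Long, Object>("myCache")  // placeholder cache name
                .setCacheMode(CacheMode.REPLICATED)
                .setAtomicityMode(CacheAtomicityMode.ATOMIC)
                .setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storage)
            .setCacheConfiguration(cacheCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cluster().active(true);  // activation needed when persistence is on
            try (IgniteDataStreamer<Long, Object> streamer = ignite.dataStreamer("myCache")) {
                for (long key = 0; key < 5000; key++)  // up to 5,000 records per batch
                    streamer.addData(key, "row-" + key);
            }  // close() flushes any remaining buffered entries
        }
    }
}
```

Each of the 300-400 client threads runs the streaming loop above concurrently, with no locking between threads.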
We want to know if we are doing something wrong and would like to correct it. We would be grateful if you could go through the entire log. Thank you. -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/