Thanks for the reply.

We've done query analysis and found that pretty much all queries take
around 1-5 ms of execution time, which we consider fine for our
operations.

But we have one operation that inserts rows into a table. We receive at
most 5000 records at a time and use the IgniteDataStreamer API together
with JCache QueryEntity configuration to insert them; we don't use JDBC.
For all reads we use either SqlFieldsQuery or the JCache getAll API. We
are not using compute jobs either.
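For concreteness, the write and read paths look roughly like the sketch
below. The cache name, key type, XML path, and the Record class are
illustrative, not our actual schema:

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.FieldsQueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class InsertAndReadSketch {
    public static void main(String[] args) {
        // Start a thick client node (config path is illustrative).
        Ignite ignite = Ignition.start("client-config.xml");

        // Write path: a batch of up to 5000 records goes through the streamer.
        try (IgniteDataStreamer<Long, Record> streamer = ignite.dataStreamer("RECORDS")) {
            for (long id = 0; id < 5_000; id++)
                streamer.addData(id, new Record(id, "payload-" + id));
        } // close() flushes whatever is still buffered.

        IgniteCache<Long, Record> cache = ignite.cache("RECORDS");

        // Read path 1: SqlFieldsQuery (the Record table comes from the QueryEntity).
        try (FieldsQueryCursor<List<?>> cur = cache.query(
                new SqlFieldsQuery("SELECT id, payload FROM Record WHERE id < ?").setArgs(10L))) {
            cur.forEach(System.out::println);
        }

        // Read path 2: plain JCache getAll.
        Set<Long> keys = LongStream.range(0, 10).boxed().collect(Collectors.toSet());
        Map<Long, Record> rows = cache.getAll(keys);
        rows.forEach((k, v) -> System.out.println(k + " -> " + v.payload));
    }
}

/** Illustrative value class; its fields are exposed to SQL via the QueryEntity shown further down. */
class Record {
    long id;
    String payload;

    Record(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}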

To elaborate on the above: at runtime each thread performs a bulk insert
of around 1500-2000 rows, and all 300-400 threads insert at once without
any locking mechanism. This gives an insert time of around 40-50 ms per
thread.
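As a minimal sketch of that write pattern (thread and row counts are
parameters here; the cache name is again illustrative), each worker
streams its own batch with no coordination between threads:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class ConcurrentInsertSketch {
    // N worker threads, each streaming its own batch of ~1500-2000 rows
    // concurrently, with no locking between them.
    static void runWorkers(Ignite ignite, int threads, int rowsPerBatch) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            final int worker = t;
            pool.submit(() -> {
                // Each thread opens and closes its own streamer instance;
                // IgniteDataStreamer is thread-safe, but this mirrors our usage.
                try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("RECORDS")) {
                    long base = (long) worker * rowsPerBatch;
                    for (long i = 0; i < rowsPerBatch; i++)
                        streamer.addData(base + i, "payload-" + (base + i));
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}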

We suspect this is why so many thread starvations are happening.

Oddly, most of the log messages report read-query delays, yet no
corresponding delays are visible from the client application.

For a run of around two days, I'm sending you the entire logs of the
coordinator node. Please find them attached.

Logs.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2763/Logs.zip>  

Unfortunately I am not the system administrator and couldn't run jstack
on that machine, but I'll try to get a thread dump if possible.

Version -> We are using GridGain libraries, version 8.7.10. Client heap
size is 2 GB and server heap size is 4 GB.
Config -> Persistence is ON. Checkpointing frequency is every 15 minutes.
WAL archiving is off. The off-heap data region size is 20 GB, and
persistence usage would be around 25 GB in real time for our case. All
caches are in REPLICATED mode, with FULL_SYNC between the two server
nodes. The WAL segment size is 256 MB, and all caches use ATOMIC
atomicity.
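Expressed as a Java configuration, those settings would look roughly
like the sketch below; the paths, cache name, and the "Record" value
type name are illustrative:

import java.util.Collections;

import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ServerConfigSketch {
    static IgniteConfiguration serverConfig() {
        DataStorageConfiguration storage = new DataStorageConfiguration();

        // Native persistence ON with a 20 GB off-heap data region.
        storage.getDefaultDataRegionConfiguration()
            .setPersistenceEnabled(true)
            .setMaxSize(20L * 1024 * 1024 * 1024);

        // Checkpoint every 15 minutes; 256 MB WAL segments.
        storage.setCheckpointFrequency(15 * 60 * 1000L);
        storage.setWalSegmentSize(256 * 1024 * 1024);

        // WAL archiving off: point the archive path at the WAL path itself.
        storage.setWalPath("/data/ignite/wal");
        storage.setWalArchivePath("/data/ignite/wal");

        // SQL schema for the cache; the value type name must match the
        // binary type name of whatever is actually stored.
        QueryEntity entity = new QueryEntity(Long.class.getName(), "Record");
        entity.addQueryField("id", Long.class.getName(), null);
        entity.addQueryField("payload", String.class.getName(), null);

        // REPLICATED, FULL_SYNC, ATOMIC cache, as described above.
        CacheConfiguration<Long, Object> cache = new CacheConfiguration<>("RECORDS");
        cache.setCacheMode(CacheMode.REPLICATED);
        cache.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        cache.setAtomicityMode(CacheAtomicityMode.ATOMIC);
        cache.setQueryEntities(Collections.singletonList(entity));

        return new IgniteConfiguration()
            .setDataStorageConfiguration(storage)
            .setCacheConfiguration(cache);
    }
}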

Topology -> Two clients connected to two servers, four nodes in total.
All thread pool sizes are the defaults, which equal the number of CPU
cores. Client machines have 40-core CPUs; server machines have 64-core
CPUs.
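We rely on the defaults, but for reference, pinning these pools
explicitly would look like this (64 matching the server core count;
purely illustrative):

import org.apache.ignite.configuration.IgniteConfiguration;

public class PoolSizeSketch {
    // Shown only to make the pool sizes concrete; we leave them at defaults.
    static IgniteConfiguration withExplicitPools() {
        return new IgniteConfiguration()
            .setPublicThreadPoolSize(64)  // public/compute pool
            .setSystemThreadPoolSize(64)  // cache system operations
            .setStripedPoolSize(64);      // cache message processing
    }
}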

The system is actually working, but we are worried that these symptoms
may cause issues in the future, such as cluster-wide data corruption. We
want to know if we are doing something wrong so we can correct it.

We would be grateful if you could go through the entire log. Thank you.


