Hi,

We have been investigating some issues which appear to be related to
checkpointing. We currently use Apache Ignite 2.8.1 with the C# client.

I have been trying to gain clarity on how certain aspects of the Ignite
configuration relate to the checkpointing process:

1. Number of checkpointing threads. This defaults to 4, but I don't
understand how it applies to the checkpointing process. Are more threads
generally better (e.g. because they spread the disk IO across the
threads), or do they only have a positive effect if you have many data
storage regions? Or something else? If this could be clarified in the
documentation (or if there is a pointer to it which Google has not yet
found), that would be good.

2. Checkpoint frequency. This defaults to 180 seconds. I was thinking
that reducing this time would result in smaller, less disruptive
checkpoints. Setting it to 60 seconds seems pretty safe, but is there a
practical lower limit for use cases where new data is constantly being
added, e.g. 5 seconds or 10 seconds?

3. Write exclusivity constraints during checkpointing. I understand that
while a checkpoint is occurring, ongoing writes into the caches being
checkpointed are still supported, and that writes to existing pages are
duplicated into the checkpoint buffer. If this buffer becomes full or
stressed then Ignite will throttle, and perhaps block, writes until the
checkpoint is complete. When that happens, Ignite will log (at warning or
informational level?) that writes are being throttled.

We have cases where simple puts to caches (a few requests per second) take
up to 90 seconds to execute while a checkpoint triggered by the checkpoint
timer is in progress. When no checkpoint is occurring, the same puts
usually take milliseconds. The checkpoints themselves can take 90 seconds
or longer, and update up to 30,000-40,000 pages across a pair of data
storage regions: one with 4 GB of in-memory space allocated (which should
be about 1,000,000 pages at the standard 4 KB page size), and one small
region with 128 MB. There is no 'throttling' logging being emitted that we
can tell, so the checkpoint buffer (which should be 1 GB for the first data
region and 256 MB for the second, smaller region in this case) does not
look like it can fill up during the checkpoint.
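
For reference, the arithmetic behind those figures can be sketched as
below. This is only a back-of-envelope check, not anything taken from
Ignite itself: the function names are made up for illustration, and it
assumes binary units (1 GB = 1024^3 bytes) and that every updated page is
written exactly once at the 4 KB page size.

```python
# Back-of-envelope check of the checkpoint figures quoted above.
# Assumptions (not from Ignite): binary units, one write per dirty page.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PAGE_SIZE = 4 * KIB  # standard page size mentioned above


def pages_in_region(region_bytes, page_size=PAGE_SIZE):
    """Number of whole pages that fit in a data region."""
    return region_bytes // page_size


def dirty_bytes(pages, page_size=PAGE_SIZE):
    """Bytes a checkpoint has to write for the given number of pages."""
    return pages * page_size


def checkpoint_write_rate(pages, seconds, page_size=PAGE_SIZE):
    """Average write rate in bytes per second over the checkpoint."""
    return dirty_bytes(pages, page_size) / seconds


if __name__ == "__main__":
    region_pages = pages_in_region(4 * GIB)
    worst_case = dirty_bytes(40_000)
    rate = checkpoint_write_rate(40_000, 90)

    print(f"pages in 4 GB region:  {region_pages:,}")
    print(f"data per checkpoint:   {worst_case / MIB:.2f} MiB")
    print(f"share of 1 GB buffer:  {worst_case / GIB:.1%}")
    print(f"average write rate:    {rate / MIB:.2f} MiB/s")
```

Under those assumptions, the 4 GB region holds 1,048,576 pages, and a
40,000 page checkpoint amounts to roughly 156 MiB, well under the 1 GB
checkpoint buffer. Spread over 90 seconds that is only about 1.7 MiB/s of
page data, which is part of why the long put times are puzzling.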

It seems like the checkpoint is affecting the put operations, but I don't
understand why that would be, given the documented checkpointing process,
and the checkpoint itself (at least via informational logging) is not
reporting any restrictions.

Thanks,
Raymond.

-- 
<http://www.trimble.com/>
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
