Hello,

Please consider applying the generic optimization techniques for persistence: https://apacheignite.readme.io/docs/durable-memory-tuning

In the meantime:

- Keep investigating the cause of the GC pauses unless you are 100% sure they are caused by rebalancing.
- Increase IgniteConfiguration.failureDetectionTimeout to 20 seconds to prevent nodes from being shut down on long GC pauses.
- Have you tuned your JVM settings? https://apacheignite.readme.io/docs/jvm-and-system-tuning

As for FSYNC vs LOG_ONLY, the former protects you from global cluster outages, when all the nodes go down at the same time. If that is an unlikely event in your situation, it's OK to relax the mode to LOG_ONLY as long as you have backup copies on other nodes.

--
Denis

On Tue, Jan 1, 2019 at 8:23 PM Ignite Enthusiast <[email protected]> wrote:

> Question on Ignite Persistence:
>
> On a deployed Ignite (3 node) cluster, I see one node being taken out
> of the cluster because it encounters GC Pauses. Worse, when this node
> leaves the cluster, a Rebalance is initiated (and re-initiated when the
> node joins back).
>
> Note: Data that Ignite Cluster holds is fully transactional. We cannot put
> up with Data Loss.
>
> From the logs :
>
> [14:32:01,643][INFO][wal-file-archiver%null-#44][FsyncModeFileWriteAheadLogManager]
> Copied file
> [src=/data2/data/wal/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000006.wal,
> dst=/data2/data/wal/archive/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000306.wal]
>
> [14:32:02,830][INFO][wal-file-archiver%null-#44][FsyncModeFileWriteAheadLogManager]
> Starting to copy WAL segment [absIdx=307, segIdx=7,
> origFile=/data2/data/wal/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000007.wal,
> dstFile=/data2/data/wal/archive/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000307.wal]
>
> [14:32:17,999][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible
> too long JVM pause: 15044 milliseconds.
>
> It is clear that WAL writes (FSYNC in this case) always precede GC Pauses.
>
> Question:
>
> The only advantage of FSYNC Vs LOG_ONLY seems to be surviving OS Level
> Crashes. With a Journaled filesystem like Ext4FS, do I really need FSYNC?
> Can't I get around with LOG_ONLY ?
>
> If not, how do I minimise the perf bottlenecks using FSYNC ?
>
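
For reference, here is a minimal sketch of the two settings mentioned in the reply (a 20-second failure detection timeout and the LOG_ONLY WAL mode), assuming Ignite 2.4 or later configured from Java with ignite-core on the classpath. The class name, the cache name "exampleCache", the key/value types, and the backup count of 1 are placeholders for illustration, not values from this thread.

```java
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.WALMode;

public class PersistenceConfigSketch {
    /** Builds a node configuration with the settings discussed in the thread. */
    public static IgniteConfiguration buildConfig() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Allow up to 20 seconds of unresponsiveness (e.g. a long GC pause)
        // before the node is considered failed. The value is in milliseconds.
        cfg.setFailureDetectionTimeout(20_000L);

        DataStorageConfiguration storage = new DataStorageConfiguration();
        // Enable native persistence for the default data region.
        storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        // LOG_ONLY relaxes the WAL mode compared to FSYNC, as discussed above.
        storage.setWalMode(WALMode.LOG_ONLY);
        cfg.setDataStorageConfiguration(storage);

        // Placeholder cache: backups keep copies of the data on other nodes.
        CacheConfiguration<Long, String> cache = new CacheConfiguration<>("exampleCache");
        cache.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
        cache.setBackups(1);
        cfg.setCacheConfiguration(cache);

        return cfg;
    }
}
```

The resulting configuration would be passed to Ignition.start(cfg) on each node. Whether one backup is enough depends on how many simultaneous node failures the cluster has to tolerate.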
