Hello Naveen,

Apache Ignite 2.13 is more than two years old (25 months, to be exact). Three releases, each with bug fixes, have come out since then, the latest being 2.16.
It seems you are restarting your cluster on a regular basis, so you'd better upgrade to 2.16 as soon as possible. Otherwise it will be very difficult for people on a community mailing list, working on volunteer time, to work out a solution for a version that is two years old.

Besides that, you are not providing much information about your cluster setup: how many nodes, what infrastructure, how many caches, overall data size. One can only guess that you have more than one node running, with at least one cache and a non-empty dataset. :)

This document from GridGain may be helpful. I don't see an equivalent for Ignite, but it may still be worth checking out:
https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/maintenance-mode

On the other hand, you should also check the failing node. If it is always the same node that fails, there is probably a root cause outside Ignite. Indeed, if the configuration is the same across all nodes and only this one fails, you should also consider network issues (check connectivity and network latency between nodes) and hardware issues (faulty disks, faulty memory).

In the end, one option might be to replace the faulty machine with a brand new one. In cloud environments this is actually quite cheap and easy to do.

Cheers
Gianluca

On Wed, 29 May 2024 at 08:43, Naveen Kumar <naveen.band...@gmail.com> wrote:

> Hello All
>
> We are using Ignite 2.13.0
>
> After a cluster restart, one of the node is not coming up and in node logs
> are seeing this error - Node requires maintenance, non-empty set of
> maintainance tasks is found - node is not coming up
>
> we are getting errors like time out is reached before computation is
> completed error in other nodes as well.
>
> I could see that, we have control.sh script to backup and clean up the
> corrupted files, but when I run the command, it fails.
>
> I have removed the node from baseline and tried to run as well, still its
> failing
>
> what could be the solution for this, cluster is functioning, however there
> are requests failing
>
> Is there anyway we can start ignite node in maintenance mode and try
> running clean corrupted commands
>
> Thanks
> Naveen
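
P.S. On the connectivity and latency check mentioned in the reply: something along these lines could be a starting point. It is only a rough sketch, not an Ignite tool. It assumes your nodes use the default Ignite ports (47500 for discovery, 47100 for communication), which your configuration may override. The class name NodeProbe and the fallback address are made up for illustration, so pass your own node addresses as arguments.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class NodeProbe {
    /** Returns the TCP connect time in milliseconds, or -1 if the connection fails. */
    static long connectMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Refused, unreachable, or timed out.
            return -1;
        }
    }

    public static void main(String[] args) {
        // Placeholder address; pass the real node addresses on the command line.
        String[] hosts = args.length > 0 ? args : new String[] {"127.0.0.1"};
        // Assumed Ignite defaults: discovery and communication ports.
        int[] ports = {47500, 47100};

        for (String host : hosts) {
            for (int port : ports) {
                long ms = connectMillis(host, port, 2000);
                String result = ms < 0 ? "FAILED" : ms + " ms";
                System.out.println(host + ":" + port + " -> " + result);
            }
        }
    }
}
```

Run it from each node against all the others. If the node that keeps failing shows refused connections or much higher connect times than the rest, that points at the network rather than at Ignite. It only measures TCP connect time, so it says nothing about disks or memory.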