Hello! Most likely it’s the same connectivity issue that led to the lost partitions.
Do you have the full cmd output and some server logs?

> On 15 Apr 2021, at 21:15, facundo.maldonado <maldonadofacu...@gmail.com> wrote:
>
> I'm testing a 32 nodes cluster with a partitioned cache with one backup.
> If 2 of them crashed (not if, when) I have the lost partitions problem.
>
> Now I ssh to one of the nodes and execute *control.sh --baseline.*
> From every node other than the one marked as "coordinator" (?) I get this
> output:
>
> --------------------------------------------------------------------------------
> Failed to execute baseline command='collect'
> Failed to communicate with grid nodes (maximum count of retries reached).
> Connection to cluster failed. Failed to communicate with grid nodes (maximum
> count of retries reached).
>
> Ok, I went to every node and do the same until I found the 'coordinator'.
> Once I made the failing nodes get online again I execute:
> *control.sh --cache reset_lost_partitions mycache*
>
> To my surprise, I'm getting
> --------------------------------------------------------------------------------
> Connection to cluster failed. Failed to communicate with grid nodes (maximum
> count of retries reached).
>
> So, started again looking for the nodes where that command actually works.
>
> I'm sure I'm doing something wrong. Could someone help me?
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/