Alexey Kukushkin created IGNITE-17657:
-----------------------------------------

             Summary: Partition data loss after rolling restart
                 Key: IGNITE-17657
                 URL: https://issues.apache.org/jira/browse/IGNITE-17657
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.13
            Reporter: Alexey Kukushkin


*Setup*

An active Apache Ignite 2.13 cluster of three or more nodes, with a cache 
configured with 1 backup, persistence enabled, and the default partition loss 
policy.
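
For illustration only, a node configuration matching this setup might look like the sketch below. The cache name {{testCache}}, the key and value types, and the class name are hypothetical and are not taken from the affected deployment. Cluster activation is assumed to happen separately, once all nodes have joined.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical node startup class; names are illustrative assumptions.
public class SetupSketch {
    public static void main(String[] args) {
        // Enable native persistence on the default data region.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // One backup copy per partition; the partition loss policy is
        // deliberately not set, so the default applies.
        CacheConfiguration<Long, String> cacheCfg = new CacheConfiguration<>("testCache");
        cacheCfg.setBackups(1);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg)
            .setCacheConfiguration(cacheCfg);

        // Start the node. Activating the cluster is not shown here.
        Ignite ignite = Ignition.start(cfg);
        System.out.println("Started node " + ignite.cluster().localNode().id());
    }
}
```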

*Actions*
 # An application is continuously writing data to the cache
 # The nodes are restarted sequentially, one after another, while the data is 
being written to the cache. The next node is restarted only after data 
rebalancing is complete. Rebalancing is monitored with the 
{{KeysToRebalanceLeft}} metric (see the 
[documentation|https://ignite.apache.org/docs/latest/monitoring-metrics/metrics#monitoring-rebalancing]
 for more details).
 # The application reads some of the data after restarting all the nodes.
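
The "restart the next node only after rebalancing is complete" gate from the actions above can be sketched as a polling loop. This is a minimal sketch, not the script used in the reported environment: the class {{RebalanceGate}}, the method {{awaitRebalanced}}, and the {{LongSupplier}} standing in for a read of {{KeysToRebalanceLeft}} are all hypothetical. Requiring several consecutive zero readings is an extra safeguard assumed here and is not part of the reported procedure.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

// Hypothetical helper: blocks until a rebalance metric stays at zero.
final class RebalanceGate {
    /**
     * Polls keysLeft (assumed to return the current KeysToRebalanceLeft value)
     * until it reports 0 for stableChecks consecutive polls.
     * Returns false if the timeout elapses first.
     */
    static boolean awaitRebalanced(LongSupplier keysLeft,
                                   int stableChecks,
                                   Duration pollInterval,
                                   Duration timeout) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        int zeroStreak = 0;

        while (true) {
            // A non-zero reading resets the streak of zero readings.
            zeroStreak = keysLeft.getAsLong() == 0 ? zeroStreak + 1 : 0;

            if (zeroStreak >= stableChecks)
                return true;

            if (System.nanoTime() - deadline >= 0)
                return false;

            Thread.sleep(pollInterval.toMillis());
        }
    }
}
```

A restart script would call {{awaitRebalanced}} after each node comes back and proceed to the next node only when it returns {{true}}.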

*Expected*

No data is lost since there is 1 backup and the nodes are restarted 
sequentially after rebalancing is complete.

*Actual*

Intermittently (in more than 50% of our runs), the attempt to read the data 
fails with a "partition data has been lost" exception.

*Notes*

An attempt to create a JUnit reproducer (all nodes within the same JVM) for 
the above scenario has not succeeded so far.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
