Dmitriy, I don’t see why the result of a simple query such as “select count(*) from t;” should differ while rebalancing is in progress or after a cluster restart. Ignite’s SQL engine claims to be fault-tolerant and to return a consistent result set at all times unless a partition loss has happened. There is no partition loss here, so it seems we caught a bug.
Vladimir O., please chime in.

— Denis

> On Oct 26, 2017, at 3:34 PM, Dmitry Pavlov <dpavlov....@gmail.com> wrote:
>
> Hi Denis,
>
> It seems to me that this is not a bug for my scenario, because the data was
> not loaded within the same transaction using a transactional cache. In this
> case it is OK that cache data is rebalanced according to partition update
> counters, isn't it?
>
> I suppose the data was not lost in this case; it was just not completely
> transferred to the second node.
>
> Sincerely,
>
> Thu, Oct 26, 2017, 21:09 Denis Magda <dma...@apache.org>:
> + dev list
>
> This scenario has to be handled automatically by Ignite. Seems like a bug.
> Please refer to the initial description of the issue. Alex G, please have a
> look:
>
> To reproduce:
> 1. Create a replicated cache with multiple indexed types and some indexes.
> 2. Start the first server node.
> 3. Insert data into the cache (1,000,000 entries).
> 4. Start the second server node.
>
> At this point everything seems OK; judging by SQL queries (count(*)), the
> data is apparently rebalanced successfully.
>
> 5. Stop the server nodes.
> 6. Restart the server nodes.
> 7. SQL queries (count(*)) now return fewer rows.
>
> —
> Denis
>
> > On Oct 23, 2017, at 5:11 AM, Dmitry Pavlov <dpavlov....@gmail.com> wrote:
> >
> > Hi,
> >
> > I wrote code that executes the described scenario. The results are as
> > follows:
> > If I do not give the cluster enough time to completely rebalance the
> > partitions, the newly launched node will not have all the data, and
> > count(*) on it returns a smaller number: the number of records that have
> > been transferred to the node so far. I guess GridDhtPartitionDemandMessage
> > entries can be found in the Ignite debug log at this moment.
> >
> > If I wait for a sufficient amount of time, or explicitly wait on the newly
> > joined node via
> > ignite2.cache(CACHE).rebalance().get();
> > then all results are correct.
> >
> > About your question on what happens if one cluster node crashes in the
> > middle of the rebalance process:
> > In this case the normal failover scenario is started, and the data is
> > rebalanced within the cluster. If the nodes have enough WAL records to
> > represent the history from the crash point, only the recent changes (the
> > delta) are sent over the network. If there is not enough history to
> > rebalance using only the most recent changes, the partition is rebalanced
> > from scratch to the new node.
> >
> > Sincerely,
> > Pavlov Dmitry
> >
> >
> > Sat, Oct 21, 2017, 2:07 Manu <maxn...@hotmail.com>:
> > Hi,
> >
> > after the restart the data seems not to be consistent.
> >
> > We had waited until the rebalance was fully completed before restarting the
> > cluster, to check that durable memory data rebalancing works correctly and
> > SQL queries still work.
> > Another question (not this case): what happens if one cluster node crashes
> > in the middle of the rebalance process?
> >
> > Thanks!
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/
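For anyone following the workaround discussed above, here is a minimal sketch of the "wait for rebalance before stopping" step. This is not a fix for the reported bug, just the wait Dmitry describes; the cache name "CACHE" is illustrative, and the snippet assumes a second server node joining a cluster that already holds the data (it needs a running Ignite cluster, so it is not runnable standalone):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class RebalanceWaitSketch {
    public static void main(String[] args) {
        // Start the second server node with default configuration
        // (assumed; in practice pass your IgniteConfiguration).
        try (Ignite ignite2 = Ignition.start()) {
            IgniteCache<Integer, Object> cache = ignite2.cache("CACHE");

            // Block until this node has received its copy of the
            // replicated cache. Stopping the cluster before this
            // future completes is what leaves the node with
            // incomplete data in the scenario above.
            cache.rebalance().get();

            // Only now is it safe to shut the nodes down and expect
            // count(*) to match after restart.
        }
    }
}
```

Note that `rebalance()` returns an `IgniteFuture`, so the `get()` call is what actually blocks; without it the node may be stopped mid-transfer.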