[jira] [Comment Edited] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-07-12 Thread Alexey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541377#comment-16541377
 ] 

Alexey Kuznetsov edited comment on IGNITE-4210 at 7/12/18 9:39 AM:
---

[~agura] [~andrey-kuznetsov] In case of topology changes during store loading, I 
wanted to throw *ClusterTopologyCheckedException* with a retry future, but it 
failed to be serialized.
Imagine a client calls store load and loading fails on a remote node.

In this case *ClusterTopologyCheckedException* would be returned to the 
client, but without the retry future.

Is it OK if no retry future is present inside the exception?


was (Author: alexey kuznetsov):
[~agura] [~andrey-kuznetsov] In case of topology changes during store loading, I 
wanted to throw *ClusterTopologyCheckedException* with a retry future, but it 
failed to be serialized.
Imagine a client calls store load and loading fails on a remote node.

In this case *ClusterTopologyCheckedException* would be returned to the 
client, but without the retry future.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-07-12 Thread Alexey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541377#comment-16541377
 ] 

Alexey Kuznetsov edited comment on IGNITE-4210 at 7/12/18 9:31 AM:
---

[~agura] [~andrey-kuznetsov] In case of topology changes during store loading, I 
wanted to throw *ClusterTopologyCheckedException* with a retry future, but it 
failed to be serialized.
Imagine a client calls store load and loading fails on a remote node.

In this case *ClusterTopologyCheckedException* would be returned to the 
client, but without the retry future.


was (Author: alexey kuznetsov):
[~agura] In case of topology changes during store loading, I wanted to throw 
*ClusterTopologyCheckedException* with a retry future, but it failed to be 
serialized.
Imagine a client calls store load and loading fails on a remote node.

In this case *ClusterTopologyCheckedException* would be returned to the 
client, but without the retry future.






[jira] [Comment Edited] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-06-13 Thread Alexey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511060#comment-16511060
 ] 

Alexey Kuznetsov edited comment on IGNITE-4210 at 6/13/18 1:28 PM:
---

[~agura] 
Do I understand you correctly?
1) User starts 1 grid.
2) User initiates cache store loading.
3) Additional grids connect to the cluster. During PME they observe that cache 
store loading is in progress (a certain future was created on the initiator), 
cancel the cache store loading, and pass an exception to the user.
4) User receives the exception during mass node start. The cache contains some 
values loaded from the store.

I have only one question left:
Should we cancel cache store loading if PME was initiated because a node left, 
a new cache was created, etc.? I think yes. 


was (Author: alexey kuznetsov):
[~agura] 
Do I understand you correctly?
1) User starts 1 grid.
2) User initiates cache store loading.
3) Additional grids connect to the cluster. During PME they observe that cache 
store loading is in progress (a certain future was created on the initiator), 
cancel the cache store loading, and pass an exception to the user.
4) User receives the exception during mass node start. The cache contains some 
values loaded from the store.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.6
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.





[jira] [Comment Edited] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-06-13 Thread Andrey Gura (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510877#comment-16510877
 ] 

Andrey Gura edited comment on IGNITE-4210 at 6/13/18 9:47 AM:
--

[~Alexey Kuznetsov] From my point of view it's not a good solution because it 
will block partition map exchange until loadCache finishes data loading and 
will then lead to massive data rebalancing. A better way, I believe, is to pass 
an exception to user code in case of topology changes; the user will then be 
able to manage initial data loading from the cache store (e.g. the user can 
split the whole data set into blocks and retry loading the block during which 
the topology changed).
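
The block-wise retry suggested above could look roughly like the following. This is only a minimal sketch, not Ignite code: TopologyChangedException, BlockLoader and loadInBlocks are hypothetical names standing in for whatever exception and loading call the user code would actually see, and integer keys are assumed for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockwiseLoadSketch {
    /** Hypothetical stand-in for the exception passed to user code on a topology change. */
    static class TopologyChangedException extends Exception {
        TopologyChangedException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical loader for one block of keys [fromKey, toKey); not an Ignite API. */
    interface BlockLoader {
        void load(int fromKey, int toKey) throws TopologyChangedException;
    }

    /**
     * Loads keys [0, total) block by block. A block that fails with a topology
     * change is retried up to maxRetries times before the exception is rethrown.
     * Returns the total number of retries performed.
     */
    static int loadInBlocks(int total, int blockSize, int maxRetries, BlockLoader loader)
            throws TopologyChangedException {
        int retries = 0;
        for (int from = 0; from < total; from += blockSize) {
            int to = Math.min(from + blockSize, total);
            int attempt = 0;
            while (true) {
                try {
                    loader.load(from, to);
                    break; // Block loaded, move on to the next one.
                } catch (TopologyChangedException e) {
                    if (++attempt > maxRetries)
                        throw e; // Give up on this block.
                    retries++;
                }
            }
        }
        return retries;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> loaded = new ArrayList<>();
        boolean[] failedOnce = {false};

        // Simulated loader: the block starting at key 4 fails once, then succeeds.
        int retries = loadInBlocks(10, 4, 3, (from, to) -> {
            if (from == 4 && !failedOnce[0]) {
                failedOnce[0] = true;
                throw new TopologyChangedException("topology changed");
            }
            for (int k = from; k < to; k++)
                loaded.add(k);
        });

        System.out.println("retries=" + retries + ", loaded=" + loaded.size());
    }
}
```

Only the failed block is reloaded, so a topology change costs one block of repeated work instead of a full reload. The sketch assumes that reloading a block is idempotent.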


was (Author: agura):
[~Alexey Kuznetsov] From my point of view it's not a good solution because it 
will block partition map exchange until loadCache finishes data loading and 
will then lead to massive data rebalancing. A better way, I believe, is to pass 
an exception to user code in case of topology changes; the user will then be 
able to manage initial data loading (e.g. the user can split the whole data set 
into blocks and retry loading the block during which the topology changed).

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.6
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.





[jira] [Comment Edited] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-04-27 Thread Alexey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456279#comment-16456279
 ] 

Alexey Kuznetsov edited comment on IGNITE-4210 at 4/27/18 12:12 PM:


The root cause of the bug has been identified.
grid.cache(DEFAULT_CACHE_NAME).loadCache(null) performs cache loading on only a 
few nodes (usually one) because the other nodes are in the middle of joining 
the cluster.

In an unstable topology (multiple nodes joining the cluster) some entries are 
not loaded into the cache because partitions cannot be reserved. Partitions 
are concurrently evicted, moved to other nodes during PME.

I put a cache lock on the topology before 
grid.cache(DEFAULT_CACHE_NAME).loadCache(null) and unlocked it after loading. 
The test passes.

So we should lock the topology before cache loading, or retry loading after 
the topology has settled down.
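
The second option (retry loading after the topology has settled) could be sketched as below. This is an illustration only, not the fix applied in Ignite: the topology version is modelled by a plain LongSupplier and the load by a Runnable, and loadUntilStable is a hypothetical helper name. The idea is to compare the topology version before and after the load and repeat the load if it changed in between.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class RetryUntilStableSketch {
    /**
     * Runs the load and repeats it while the topology version observed after
     * the load differs from the one observed before it.
     * Returns the number of attempts that were needed.
     */
    static int loadUntilStable(LongSupplier topVer, Runnable load, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long before = topVer.getAsLong();

            load.run();

            // No topology change during the load: the result can be trusted.
            if (topVer.getAsLong() == before)
                return attempt;
        }
        throw new IllegalStateException(
            "Topology did not settle after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        AtomicLong ver = new AtomicLong(1); // Simulated topology version.
        int[] calls = {0};

        int attempts = loadUntilStable(ver::get, () -> {
            // Simulate a node joining during the first load only.
            if (calls[0]++ == 0)
                ver.incrementAndGet();
        }, 5);

        System.out.println("attempts=" + attempts);
    }
}
```

Unlike locking the topology, this approach does not block PME, but each retry repeats the whole load, so it only pays off when topology changes are rare.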


was (Author: alexey kuznetsov):
The root cause of the bug has been identified.
grid.cache(DEFAULT_CACHE_NAME).loadCache(null) performs cache loading on only a 
few nodes (usually one) because the other nodes are in the middle of joining 
the cluster.

In an unstable topology (multiple nodes joining the cluster) some entries are 
not loaded into the cache because partitions cannot be reserved. Partitions 
are concurrently evicted and moved to other nodes during PME.

I put a cache lock on the topology before 
grid.cache(DEFAULT_CACHE_NAME).loadCache(null) and unlocked it after loading. 
The test passes.

So we should lock the topology before cache loading, or retry loading after 
the topology has settled down.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.


