[jira] [Created] (IGNITE-12611) EntryProcessorPermissionCheckTest.test: Test looks flaky

2020-01-30 Thread Denis Garus (Jira)
Denis Garus created IGNITE-12611:


 Summary: EntryProcessorPermissionCheckTest.test: Test looks flaky
 Key: IGNITE-12611
 URL: https://issues.apache.org/jira/browse/IGNITE-12611
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Garus
Assignee: Denis Garus


Test looks flaky. The test status changes between builds without code changes: from failed to
successful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12611) EntryProcessorPermissionCheckTest.test: Test looks flaky

2020-01-30 Thread Denis Garus (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Garus updated IGNITE-12611:
-
Ignite Flags:   (was: Docs Required,Release Notes Required)

> EntryProcessorPermissionCheckTest.test: Test looks flaky
> 
>
> Key: IGNITE-12611
> URL: https://issues.apache.org/jira/browse/IGNITE-12611
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Garus
>Assignee: Denis Garus
>Priority: Major
>
> Test looks flaky. The test status changes between builds without code changes: from failed to
> successful.





[jira] [Commented] (IGNITE-12577) High persistence load can trigger erroneous assert: "Page was pin when we resolve abs pointer, it can not be evicted".

2020-01-30 Thread Stanilovsky Evgeny (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027232#comment-17027232
 ] 

Stanilovsky Evgeny commented on IGNITE-12577:
-

[~mmuzaf] I still believe it can trigger erroneous assertions; our custom
build shows no problems here.

> High persistence load can trigger erroneous assert: "Page was pin when we 
> resolve abs pointer, it can not be evicted".
> --
>
> Key: IGNITE-12577
> URL: https://issues.apache.org/jira/browse/IGNITE-12577
> Project: Ignite
>  Issue Type: Bug
>  Components: data structures
>Affects Versions: 2.7.6
>Reporter: Stanilovsky Evgeny
>Assignee: Stanilovsky Evgeny
>Priority: Critical
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For now, it seems the assertion in the checkpointer code ("Page was pin
> when we resolve abs pointer, it can not be evicted") is erroneous and needs to be removed.





[jira] [Commented] (IGNITE-12601) DistributedMetaStoragePersistentTest.testUnstableTopology is flaky

2020-01-30 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027061#comment-17027061
 ] 

Ivan Rakov commented on IGNITE-12601:
-

Seems like the test didn't fail on the PR branch.
Merged to master and ignite-2.8.

> DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
> --
>
> Key: IGNITE-12601
> URL: https://issues.apache.org/jira/browse/IGNITE-12601
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
> Please take a look at TC:
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=5923369202582779855&tab=testDetails





[jira] [Comment Edited] (IGNITE-11797) Fix consistency issues for atomic and mixed tx-atomic cache groups.

2020-01-30 Thread Pavel Pereslegin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026863#comment-17026863
 ] 

Pavel Pereslegin edited comment on IGNITE-11797 at 1/30/20 5:35 PM:


Hello [~ascherbakov],
you changed the WAL iterator "assertion" to a "warning"; can you elaborate on why you
made [this
change|https://github.com/apache/ignite/pull/7315/files#diff-982671dd518b3d8a905da745d8f0942fL1419]?

If I understand correctly, the demander can OWN a partition with inconsistent data
in this case with only a warning in the log file, or did I miss something?
From my point of view, we should at least restart rebalancing when such
errors occur.


was (Author: xtern):
Hello [~ascherbakov],
you changed the WAL iterator "assertion" to a "warning"; can you elaborate on why you
made [this
change|https://github.com/apache/ignite/pull/7315/files#diff-982671dd518b3d8a905da745d8f0942fL1419]?

If I understand correctly, we can OWN a partition with inconsistent data in
this case with only a warning in the log file, or did I miss something?
From my point of view, we should at least restart rebalancing when such
errors occur.

> Fix consistency issues for atomic and mixed tx-atomic cache groups.
> ---
>
> Key: IGNITE-11797
> URL: https://issues.apache.org/jira/browse/IGNITE-11797
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> IGNITE-10078 only solves consistency problems for tx mode.
> For atomic caches the rebalance consistency issues still remain and should be 
> fixed together with improvement of atomic cache protocol consistency.
> Also, need to disable dynamic start of atomic cache in group having only tx 
> caches because it's not working in current state.
>  





[jira] [Comment Edited] (IGNITE-11797) Fix consistency issues for atomic and mixed tx-atomic cache groups.

2020-01-30 Thread Pavel Pereslegin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026863#comment-17026863
 ] 

Pavel Pereslegin edited comment on IGNITE-11797 at 1/30/20 5:31 PM:


Hello [~ascherbakov],
you changed the WAL iterator "assertion" to a "warning"; can you elaborate on why you
made [this
change|https://github.com/apache/ignite/pull/7315/files#diff-982671dd518b3d8a905da745d8f0942fL1419]?

If I understand correctly, we can OWN a partition with inconsistent data in
this case with only a warning in the log file, or did I miss something?
From my point of view, we should at least restart rebalancing when such
errors occur.


was (Author: xtern):
[~ascherbakov], you changed the WAL iterator "assertion" to a "warning"; can you
elaborate on why you made [this
change|https://github.com/apache/ignite/pull/7315/files#diff-982671dd518b3d8a905da745d8f0942fL1419]?
If I understand correctly, we can OWN a partition with inconsistent data in
this case with only a warning in the log file, or am I missing something?
From my point of view, we should at least restart rebalancing when such
errors occur.

> Fix consistency issues for atomic and mixed tx-atomic cache groups.
> ---
>
> Key: IGNITE-11797
> URL: https://issues.apache.org/jira/browse/IGNITE-11797
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> IGNITE-10078 only solves consistency problems for tx mode.
> For atomic caches the rebalance consistency issues still remain and should be 
> fixed together with improvement of atomic cache protocol consistency.
> Also, need to disable dynamic start of atomic cache in group having only tx 
> caches because it's not working in current state.
>  





[jira] [Commented] (IGNITE-11797) Fix consistency issues for atomic and mixed tx-atomic cache groups.

2020-01-30 Thread Pavel Pereslegin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026863#comment-17026863
 ] 

Pavel Pereslegin commented on IGNITE-11797:
---

[~ascherbakov], you changed the WAL iterator "assertion" to a "warning"; can you
elaborate on why you made [this
change|https://github.com/apache/ignite/pull/7315/files#diff-982671dd518b3d8a905da745d8f0942fL1419]?
If I understand correctly, we can OWN a partition with inconsistent data in
this case with only a warning in the log file, or am I missing something?
From my point of view, we should at least restart rebalancing when such
errors occur.

> Fix consistency issues for atomic and mixed tx-atomic cache groups.
> ---
>
> Key: IGNITE-11797
> URL: https://issues.apache.org/jira/browse/IGNITE-11797
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> IGNITE-10078 only solves consistency problems for tx mode.
> For atomic caches the rebalance consistency issues still remain and should be 
> fixed together with improvement of atomic cache protocol consistency.
> Also, need to disable dynamic start of atomic cache in group having only tx 
> caches because it's not working in current state.
>  





[jira] [Commented] (IGNITE-6804) Print a warning if HashMap is passed into bulk update operations

2020-01-30 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026779#comment-17026779
 ] 

Ignite TC Bot commented on IGNITE-6804:
---

{panel:title=Branch: [pull/6976/head] Base: [master] : Possible Blockers 
(3)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Queries 1{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=4973964]]
* IgniteBinaryCacheQueryTestSuite: 
IgniteSqlSplitterSelfTest.testReplicatedTablesUsingPartitionedCacheClient - 
Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Platform .NET (Core Linux){color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=4973959]]
* dll: CacheTest.TestCacheWithExpiryPolicyOnUpdate - Test has low fail rate in 
base branch 0,0% and is not flaky

{color:#d04437}Platform .NET (Inspections)*{color} [[tests 0 TIMEOUT , 
TC_BUILD_FAILURE |https://ci.ignite.apache.org/viewLog.html?buildId=4973960]]

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4973988&buildTypeId=IgniteTests24Java8_RunAll]

> Print a warning if HashMap is passed into bulk update operations
> 
>
> Key: IGNITE-6804
> URL: https://issues.apache.org/jira/browse/IGNITE-6804
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Reporter: Denis A. Magda
>Assignee: Ilya Kasnacheev
>Priority: Critical
>  Labels: usability
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Ignite newcomers tend to stumble on deadlocks simply because the keys are
> passed in an unordered HashMap. We propose to do the following:
> * update the bulk operations Javadocs.
> * print out a warning if a non-SortedMap (e.g. HashMap,
> Weak/Identity/Concurrent/Linked HashMap, etc.) is passed into
> a bulk method and contains more than one element.
> However, we should make sure that we only print that warning once and not
> every time the API is called.
> * do not produce the warning for explicit optimistic transactions.
> More details are here:
> http://apache-ignite-developers.2346864.n4.nabble.com/Re-Ignite-2-0-0-GridUnsafe-unmonitor-td23706.html
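The warn-once check proposed above can be sketched in plain Java. The class and method names below are hypothetical, not Ignite's actual implementation, and the optimistic-transaction exemption from the last bullet is omitted for brevity:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the proposed check, not Ignite's actual code.
public class BulkMapCheck {
    // Ensures the warning fires at most once per JVM, as the ticket requests.
    private static final AtomicBoolean WARNED = new AtomicBoolean();

    static boolean shouldWarn(Map<?, ?> map) {
        // Only unordered maps with more than one key risk lock-order deadlocks.
        return map.size() > 1
            && !(map instanceof SortedMap)
            && WARNED.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        Map<Integer, String> unordered = new HashMap<>();
        unordered.put(1, "a");
        unordered.put(2, "b");

        System.out.println(shouldWarn(unordered)); // warns on the first unordered bulk map
        System.out.println(shouldWarn(unordered)); // false: already warned once

        SortedMap<Integer, String> ordered = new TreeMap<>(unordered);
        System.out.println(shouldWarn(ordered));   // false: sorted keys lock in a fixed order
    }
}
```

A real implementation would also need to skip the warning inside explicit optimistic transactions, where the lock-ordering concern does not apply.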





[jira] [Created] (IGNITE-12610) Disable H2 object cache reliably

2020-01-30 Thread Ivan Pavlukhin (Jira)
Ivan Pavlukhin created IGNITE-12610:
---

 Summary: Disable H2 object cache reliably
 Key: IGNITE-12610
 URL: https://issues.apache.org/jira/browse/IGNITE-12610
 Project: Ignite
  Issue Type: Bug
  Components: sql
Affects Versions: 2.8
Reporter: Ivan Pavlukhin
 Fix For: 2.9


Internally, H2 maintains a cache of {{org.h2.value.Value}} objects. It can be
disabled by using the "h2.objectCache" system property. There is a clear intent to
disable this cache, because the system property is set to "false" in
{{org.apache.ignite.internal.processors.query.h2.ConnectionManager}}. But
apparently that is too late, because H2 internals read the property before that
point. Consequently, the object cache is enabled by default.

We need to set this property earlier.
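The underlying pitfall can be reproduced with a self-contained sketch (the class names here are illustrative, not H2's or Ignite's): a system property read in a static initializer is fixed at class-load time, so setting it after the class has loaded has no effect.

```java
// Illustrative sketch, not H2/Ignite code: a flag read in a static
// initializer is frozen when the class is first loaded.
class CacheFlags {
    // Mimics H2 reading "h2.objectCache" once, at class-load time.
    static final boolean OBJECT_CACHE =
        Boolean.parseBoolean(System.getProperty("h2.objectCache", "true"));
}

public class EarlyPropertyDemo {
    public static void main(String[] args) {
        // Effective only because it runs BEFORE CacheFlags is first touched:
        System.setProperty("h2.objectCache", "false");
        System.out.println(CacheFlags.OBJECT_CACHE); // prints "false"

        // Too late now: the static field was already initialized.
        System.setProperty("h2.objectCache", "true");
        System.out.println(CacheFlags.OBJECT_CACHE); // still prints "false"
    }
}
```

This is why setting the property inside ConnectionManager does not help: by then H2 has already read it, and the fix is to set it before any H2 class is loaded.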





[jira] [Commented] (IGNITE-12504) Auto-adjust breaks existing code, should be disabled by default

2020-01-30 Thread Ilya Kasnacheev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026711#comment-17026711
 ] 

Ilya Kasnacheev commented on IGNITE-12504:
--

[~akalashnikov] Sounds reasonable.

> Auto-adjust breaks existing code, should be disabled by default
> ---
>
> Key: IGNITE-12504
> URL: https://issues.apache.org/jira/browse/IGNITE-12504
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, persistence
>Affects Versions: 2.8
>Reporter: Ilya Kasnacheev
>Assignee: Anton Kalashnikov
>Priority: Blocker
> Fix For: 2.8
>
>
> We have automatic baseline adjustment now. However, it is 'on' by default, 
> which means it breaks existing code. I see new exceptions when starting an 
> existing project after bumping Ignite dependency version:
> {code}
> Caused by: 
> org.apache.ignite.internal.processors.cluster.BaselineAdjustForbiddenException:
>  Baseline auto-adjust is enabled, please turn-off it before try to adjust 
> baseline manually
> {code}
> (Please see reproducer from attached UL discussion)
> I think we should disable auto-adjust by default and let people enable it when
> they see fit.





[jira] [Commented] (IGNITE-12504) Auto-adjust breaks existing code, should be disabled by default

2020-01-30 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026701#comment-17026701
 ] 

Anton Kalashnikov commented on IGNITE-12504:


[~ilyak] non-persistent clusters didn't support baseline before these changes; there
was just a zero-delay rebalance on each join/fail event. After the
implementation of the auto-adjust feature, non-persistent clusters started to
support baseline the same way persistent clusters do. So, to keep the old behavior
of zero-delay rebalance, auto-adjust should be enabled by default with a timeout
equal to 0.

In my opinion, the claim in this ticket's description that 'it breaks existing code'
should apply only to persistent clusters, because changing the baseline of a
non-persistent cluster before the auto-adjust feature didn't make any sense (so code
that changes the baseline of a non-persistent cluster should not exist). If that is
so, I'll close this ticket as a duplicate. If you still think that auto-adjust
enabled for non-persistent clusters can lead to problems, we'll try to figure it out.

> Auto-adjust breaks existing code, should be disabled by default
> ---
>
> Key: IGNITE-12504
> URL: https://issues.apache.org/jira/browse/IGNITE-12504
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, persistence
>Affects Versions: 2.8
>Reporter: Ilya Kasnacheev
>Assignee: Anton Kalashnikov
>Priority: Blocker
> Fix For: 2.8
>
>
> We have automatic baseline adjustment now. However, it is 'on' by default, 
> which means it breaks existing code. I see new exceptions when starting an 
> existing project after bumping Ignite dependency version:
> {code}
> Caused by: 
> org.apache.ignite.internal.processors.cluster.BaselineAdjustForbiddenException:
>  Baseline auto-adjust is enabled, please turn-off it before try to adjust 
> baseline manually
> {code}
> (Please see reproducer from attached UL discussion)
> I think we should disable auto-adjust by default and let people enable it when
> they see fit.





[jira] [Updated] (IGNITE-12609) SQL: GridReduceQueryExecutor refactoring.

2020-01-30 Thread Andrey Mashenkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov updated IGNITE-12609:
--
Ignite Flags:   (was: Docs Required,Release Notes Required)

> SQL: GridReduceQueryExecutor refactoring.
> -
>
> Key: IGNITE-12609
> URL: https://issues.apache.org/jira/browse/IGNITE-12609
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Andrey Mashenkov
>Assignee: Andrey Mashenkov
>Priority: Major
>  Labels: refactoring
>
> For now we have a few issues that can be resolved.
> 1. We create fake H2 tables/indices for the reduce stage even if there is no need
> to do so (skipMergeTable=true).
> Let's decouple the reduce logic from the H2Index adapter code.
> 2. The partition mapping code looks too complicated and non-optimal.
> Let's use the cached affinity mapping and avoid copying collections when possible.
> 3. Also, there is no sense in passing the RequestID to the mapping code just for logging.
> We'll never be able to match any request, as no query really exists at the time
> when the error with the RequestID is logged.
> 4. The semantics of the replicated-only flag value (calculation and usage) are not clear.
> 5. The GridReduceQueryExecutor.reduce() method is too long (over 400 lines).





[jira] [Updated] (IGNITE-12609) SQL: GridReduceQueryExecutor refactoring.

2020-01-30 Thread Andrey Mashenkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov updated IGNITE-12609:
--
Labels: refactoring  (was: )

> SQL: GridReduceQueryExecutor refactoring.
> -
>
> Key: IGNITE-12609
> URL: https://issues.apache.org/jira/browse/IGNITE-12609
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Andrey Mashenkov
>Assignee: Andrey Mashenkov
>Priority: Major
>  Labels: refactoring
>
> For now we have a few issues that can be resolved.
> 1. We create fake H2 tables/indices for the reduce stage even if there is no need
> to do so (skipMergeTable=true).
> Let's decouple the reduce logic from the H2Index adapter code.
> 2. The partition mapping code looks too complicated and non-optimal.
> Let's use the cached affinity mapping and avoid copying collections when possible.
> 3. Also, there is no sense in passing the RequestID to the mapping code just for logging.
> We'll never be able to match any request, as no query really exists at the time
> when the error with the RequestID is logged.
> 4. The semantics of the replicated-only flag value (calculation and usage) are not clear.
> 5. The GridReduceQueryExecutor.reduce() method is too long (over 400 lines).





[jira] [Created] (IGNITE-12609) SQL: GridReduceQueryExecutor refactoring.

2020-01-30 Thread Andrey Mashenkov (Jira)
Andrey Mashenkov created IGNITE-12609:
-

 Summary: SQL: GridReduceQueryExecutor refactoring.
 Key: IGNITE-12609
 URL: https://issues.apache.org/jira/browse/IGNITE-12609
 Project: Ignite
  Issue Type: Task
Reporter: Andrey Mashenkov


For now we have a few issues that can be resolved.

1. We create fake H2 tables/indices for the reduce stage even if there is no need
to do so (skipMergeTable=true).
Let's decouple the reduce logic from the H2Index adapter code.

2. The partition mapping code looks too complicated and non-optimal.
Let's use the cached affinity mapping and avoid copying collections when possible.

3. Also, there is no sense in passing the RequestID to the mapping code just for logging.
We'll never be able to match any request, as no query really exists at the time
when the error with the RequestID is logged.

4. The semantics of the replicated-only flag value (calculation and usage) are not clear.

5. The GridReduceQueryExecutor.reduce() method is too long (over 400 lines).





[jira] [Assigned] (IGNITE-12609) SQL: GridReduceQueryExecutor refactoring.

2020-01-30 Thread Andrey Mashenkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov reassigned IGNITE-12609:
-

Assignee: Andrey Mashenkov

> SQL: GridReduceQueryExecutor refactoring.
> -
>
> Key: IGNITE-12609
> URL: https://issues.apache.org/jira/browse/IGNITE-12609
> Project: Ignite
>  Issue Type: Task
>Reporter: Andrey Mashenkov
>Assignee: Andrey Mashenkov
>Priority: Major
>
> For now we have a few issues that can be resolved.
> 1. We create fake H2 tables/indices for the reduce stage even if there is no need
> to do so (skipMergeTable=true).
> Let's decouple the reduce logic from the H2Index adapter code.
> 2. The partition mapping code looks too complicated and non-optimal.
> Let's use the cached affinity mapping and avoid copying collections when possible.
> 3. Also, there is no sense in passing the RequestID to the mapping code just for logging.
> We'll never be able to match any request, as no query really exists at the time
> when the error with the RequestID is logged.
> 4. The semantics of the replicated-only flag value (calculation and usage) are not clear.
> 5. The GridReduceQueryExecutor.reduce() method is too long (over 400 lines).





[jira] [Updated] (IGNITE-12609) SQL: GridReduceQueryExecutor refactoring.

2020-01-30 Thread Andrey Mashenkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov updated IGNITE-12609:
--
Component/s: sql

> SQL: GridReduceQueryExecutor refactoring.
> -
>
> Key: IGNITE-12609
> URL: https://issues.apache.org/jira/browse/IGNITE-12609
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Andrey Mashenkov
>Assignee: Andrey Mashenkov
>Priority: Major
>
> For now we have a few issues that can be resolved.
> 1. We create fake H2 tables/indices for the reduce stage even if there is no need
> to do so (skipMergeTable=true).
> Let's decouple the reduce logic from the H2Index adapter code.
> 2. The partition mapping code looks too complicated and non-optimal.
> Let's use the cached affinity mapping and avoid copying collections when possible.
> 3. Also, there is no sense in passing the RequestID to the mapping code just for logging.
> We'll never be able to match any request, as no query really exists at the time
> when the error with the RequestID is logged.
> 4. The semantics of the replicated-only flag value (calculation and usage) are not clear.
> 5. The GridReduceQueryExecutor.reduce() method is too long (over 400 lines).





[jira] [Commented] (IGNITE-12493) Test refactoring. Explicit method for starting client nodes

2020-01-30 Thread Ilya Shishkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026686#comment-17026686
 ] 

Ilya Shishkov commented on IGNITE-12493:


LGTM

> Test refactoring. Explicit method for starting client nodes
> ---
>
> Key: IGNITE-12493
> URL: https://issues.apache.org/jira/browse/IGNITE-12493
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7.6
>Reporter: Nikolay Izhikov
>Assignee: Nikolay Izhikov
>Priority: Major
>  Labels: newbie
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Right now there are almost 500 explicit usages of {{setClientMode}} in tests.
> It seems we should support starting client nodes in the test framework.
> We should refactor tests to use {{startClientNode(String name)}}.
> This will simplify tests.





[jira] [Commented] (IGNITE-12504) Auto-adjust breaks existing code, should be disabled by default

2020-01-30 Thread Ilya Kasnacheev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026676#comment-17026676
 ] 

Ilya Kasnacheev commented on IGNITE-12504:
--

[~akalashnikov] I'm not sure. What about non-persistent clusters?

If you are confident that IGNITE-12227 disables auto-adjust by default, you can
mark this ticket as a duplicate.

> Auto-adjust breaks existing code, should be disabled by default
> ---
>
> Key: IGNITE-12504
> URL: https://issues.apache.org/jira/browse/IGNITE-12504
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, persistence
>Affects Versions: 2.8
>Reporter: Ilya Kasnacheev
>Assignee: Anton Kalashnikov
>Priority: Blocker
> Fix For: 2.8
>
>
> We have automatic baseline adjustment now. However, it is 'on' by default, 
> which means it breaks existing code. I see new exceptions when starting an 
> existing project after bumping Ignite dependency version:
> {code}
> Caused by: 
> org.apache.ignite.internal.processors.cluster.BaselineAdjustForbiddenException:
>  Baseline auto-adjust is enabled, please turn-off it before try to adjust 
> baseline manually
> {code}
> (Please see reproducer from attached UL discussion)
> I think we should disable auto-adjust by default and let people enable it when
> they see fit.





[jira] [Commented] (IGNITE-12608) [ignite-extensions] Setup tests for ignite-client-spring-boot-autoconfigure, ignite-spring-boot-autoconfigure on TC

2020-01-30 Thread Nikolay Izhikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026671#comment-17026671
 ] 

Nikolay Izhikov commented on IGNITE-12608:
--

https://ci.ignite.apache.org/viewLog.html?buildId=4973991&tab=queuedBuildOverviewTab
 - the tests from the new modules now run on TC.

> [ignite-extensions] Setup tests for ignite-client-spring-boot-autoconfigure, 
> ignite-spring-boot-autoconfigure on TC
> ---
>
> Key: IGNITE-12608
> URL: https://issues.apache.org/jira/browse/IGNITE-12608
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Nikolay Izhikov
>Assignee: Nikolay Izhikov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Changes relating to setting up new modules tests on TC.





[jira] [Updated] (IGNITE-12608) [ignite-extensions] Setup tests for ignite-client-spring-boot-autoconfigure, ignite-spring-boot-autoconfigure on TC

2020-01-30 Thread Nikolay Izhikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Izhikov updated IGNITE-12608:
-
Description: Changes relating to setting up new modules tests on TC.  (was: 
It seems we should update JUnit version to run spring tests on TC)

> [ignite-extensions] Setup tests for ignite-client-spring-boot-autoconfigure, 
> ignite-spring-boot-autoconfigure on TC
> ---
>
> Key: IGNITE-12608
> URL: https://issues.apache.org/jira/browse/IGNITE-12608
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Nikolay Izhikov
>Assignee: Nikolay Izhikov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Changes relating to setting up new modules tests on TC.





[jira] [Commented] (IGNITE-12601) DistributedMetaStoragePersistentTest.testUnstableTopology is flaky

2020-01-30 Thread Vyacheslav Koptilin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026665#comment-17026665
 ] 

Vyacheslav Koptilin commented on IGNITE-12601:
--

Hello [~mmuzaf],

> It seems `PDS (Direct IO) 1` suite is not part of Run::All.
I missed this fact. Thank you for pointing this out.

> DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
> --
>
> Key: IGNITE-12601
> URL: https://issues.apache.org/jira/browse/IGNITE-12601
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
> Please take a look at TC:
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=5923369202582779855&tab=testDetails





[jira] [Commented] (IGNITE-6804) Print a warning if HashMap is passed into bulk update operations

2020-01-30 Thread Ilya Kasnacheev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026662#comment-17026662
 ] 

Ilya Kasnacheev commented on IGNITE-6804:
-

[~agura] I have also added detection of transactional getAll().

Please review the amended fix. Tests are running and will be checked before merge.

> Print a warning if HashMap is passed into bulk update operations
> 
>
> Key: IGNITE-6804
> URL: https://issues.apache.org/jira/browse/IGNITE-6804
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Reporter: Denis A. Magda
>Assignee: Ilya Kasnacheev
>Priority: Critical
>  Labels: usability
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Ignite newcomers tend to stumble on deadlocks simply because the keys are 
> passed in an unordered HashMap. Propose to do the following:
> * update bulk operations Java docs.
> * print out a warning if not SortedMap (e.g. HashMap, 
> Weak/Identity/Concurrent/Linked HashMap etc) is passed into
> a bulk method (instead of SortedMap) and contains more than 1 element. 
> However, we should make sure that we only print that warning once and not 
> every time the API is called.
> * do not produce warning for explicit optimistic transactions
> More details are here:
> http://apache-ignite-developers.2346864.n4.nabble.com/Re-Ignite-2-0-0-GridUnsafe-unmonitor-td23706.html





[jira] [Commented] (IGNITE-12584) Query execution is too long issue!

2020-01-30 Thread Ivan Pavlukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026648#comment-17026648
 ] 

Ivan Pavlukhin commented on IGNITE-12584:
-

[~77aditya77], as you can see in the logs, the "Query execution is too long" message is
a warning. Do you have any other problems besides the messages in the logs? By default, a
query is treated as long if it executes for longer than 3 seconds. This timeout can be
redefined with
{{org.apache.ignite.configuration.IgniteConfiguration#setLongQueryWarningTimeout}}
if needed.
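The threshold rule described above can be illustrated with a minimal, hypothetical sketch; the class and constant below are illustrative, not Ignite's internals:

```java
// Hypothetical sketch of the "long query" warning rule, not Ignite's code.
public class LongQueryCheck {
    // Matches the 3-second default described in the comment above.
    static final long DEFAULT_WARN_TIMEOUT_MS = 3_000;

    // A query is treated as long only if it runs strictly longer than the threshold.
    static boolean isLongQuery(long executionMs, long warnTimeoutMs) {
        return executionMs > warnTimeoutMs;
    }

    public static void main(String[] args) {
        System.out.println(isLongQuery(4_500, DEFAULT_WARN_TIMEOUT_MS)); // true
        // Raising the threshold (cf. setLongQueryWarningTimeout) silences the warning:
        System.out.println(isLongQuery(4_500, 10_000)); // false
    }
}
```

So a 4.5-second query warns under the default threshold but not under a raised 10-second one.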

> Query execution is too long issue!
> --
>
> Key: IGNITE-12584
> URL: https://issues.apache.org/jira/browse/IGNITE-12584
> Project: Ignite
>  Issue Type: Bug
>Reporter: Aditya
>Priority: Major
> Attachments: uploadthis.txt
>
>
> When querying via some Java application, if the topology is such that two clients
> connect to one server node, then sometimes we get an exception saying query
> execution is too long.
>  
> This is the SQL schema for the table:
>  
> stmt.executeUpdate("CREATE TABLE DOCIDS (" +
>  " id LONG PRIMARY KEY, url VARCHAR, score LONG, appname VARCHAR) " +
>  " WITH \"template=replicated\"");
>  
> stmt.executeUpdate("CREATE INDEX idx_doc_name_url ON DOCIDS (appname, url)");
>  
> Query ->
> SqlFieldsQuery query = new SqlFieldsQuery("SELECT count(id) FROM DOCIDS");
> FieldsQueryCursor<List<?>> cursor = cache.query(query);
> For warning prints, please check the attachment.





[jira] [Commented] (IGNITE-12504) Auto-adjust breaks existing code, should be disabled by default

2020-01-30 Thread Anton Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026647#comment-17026647
 ] 

Anton Kalashnikov commented on IGNITE-12504:


[~ilyak], baseline auto-adjust for persistent clusters has been disabled by 
default since https://issues.apache.org/jira/browse/IGNITE-12227. Am I right to 
understand that we can close this ticket? Or are there other problem cases?

> Auto-adjust breaks existing code, should be disabled by default
> ---
>
> Key: IGNITE-12504
> URL: https://issues.apache.org/jira/browse/IGNITE-12504
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, persistence
>Affects Versions: 2.8
>Reporter: Ilya Kasnacheev
>Assignee: Anton Kalashnikov
>Priority: Blocker
> Fix For: 2.8
>
>
> We have automatic baseline adjustment now. However, it is 'on' by default, 
> which means it breaks existing code. I see new exceptions when starting an 
> existing project after bumping Ignite dependency version:
> {code}
> Caused by: 
> org.apache.ignite.internal.processors.cluster.BaselineAdjustForbiddenException:
>  Baseline auto-adjust is enabled, please turn-off it before try to adjust 
> baseline manually
> {code}
> (Please see reproducer from attached UL discussion)
> I think we should disable auto-adjust by default, let people enable it when 
> they see it fit.





[jira] [Commented] (IGNITE-12593) Corruption of B+Tree caused by byte array values and TTL

2020-01-30 Thread Alexey Goncharuk (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026639#comment-17026639
 ] 

Alexey Goncharuk commented on IGNITE-12593:
---

Merged to master. Targeting this to 2.8 as discussed on the dev-list.

> Corruption of B+Tree caused by byte array values and TTL
> 
>
> Key: IGNITE-12593
> URL: https://issues.apache.org/jira/browse/IGNITE-12593
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It seems that the following set of parameters may lead to a corruption of 
> B+Tree:
>  - persistence is enabled
>  - TTL is enabled 
>  - Expiry policy - AccessedExpiryPolicy 1 sec.
>  - cache value type is byte[]
>  - all caches belong to the same cache group
> Example of the stack trace:
> {code:java}
> [2019-07-16 
> 21:13:19,288][ERROR][sys-stripe-2-#46%db.IgnitePdsWithTtlDeactivateOnHighloadTest1%][IgniteTestResources]
>  Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=CRITICAL_ERROR, err=class 
> o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is 
> corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-1237460590, 
> val2=281586645860358]], msg=Runtime failure on search row: SearchRow 
> [key=KeyCacheObjectImpl [part=26, val=378, hasValBytes=true], hash=378, 
> cacheId=-1806498247
> class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple 
> [val1=-1237460590, val2=281586645860358]], msg=Runtime failure on search row: 
> SearchRow [key=KeyCacheObjectImpl [part=26, val=378, hasValBytes=true], 
> hash=378, cacheId=-1806498247]]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5910)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1859)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2410)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:445)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2309)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2570)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2030)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1848)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3235)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1141)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>   at 
> org.apache.ignite.internal.processo

[jira] [Commented] (IGNITE-12594) Deadlock between GridCacheDataStore#purgeExpiredInternal and GridNearTxLocal#enlistWriteEntry

2020-01-30 Thread Alexey Goncharuk (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026638#comment-17026638
 ] 

Alexey Goncharuk commented on IGNITE-12594:
---

Merged to master. Targeting this to 2.8 as discussed on the dev-list.

> Deadlock between GridCacheDataStore#purgeExpiredInternal and 
> GridNearTxLocal#enlistWriteEntry
> -
>
> Key: IGNITE-12594
> URL: https://issues.apache.org/jira/browse/IGNITE-12594
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Sergey Chugunov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The deadlock is reproduced occasionally in PDS3 suite and can be seen in the 
> thread dump below.
> One thread attempts to unwind evicts, acquires checkpoint read lock and then 
> locks {{GridCacheMapEntry}}. Another thread does 
> {{GridCacheMapEntry#unswap}}, determines that the entry is expired and 
> acquires checkpoint read lock to remove the entry from the store. 
> We should not acquire checkpoint read lock inside of a locked 
> {{GridCacheMapEntry}}.
> {code:java}Thread [name="updater-1", id=29900, state=WAITING, blockCnt=2, 
> waitCnt=4450]
> Lock 
> [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2fc51685,
>  ownerName=null, ownerId=-1]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at 
> o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1632)
><- CP read lock
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.onExpired(GridCacheMapEntry.java:4081)
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:559)
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:519) 
>  <- locked entry
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWriteEntry(GridNearTxLocal.java:1437)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWrite(GridNearTxLocal.java:1303)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync0(GridNearTxLocal.java:957)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync(GridNearTxLocal.java:491)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter$29.inOp(GridCacheAdapter.java:2526)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:4727)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:3740)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.putAll0(GridCacheAdapter.java:2524)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.putAll(GridCacheAdapter.java:2513)
> at 
> o.a.i.i.processors.cache.IgniteCacheProxyImpl.putAll(IgniteCacheProxyImpl.java:1264)
> at 
> o.a.i.i.processors.cache.GatewayProtectedCacheProxy.putAll(GatewayProtectedCacheProxy.java:863)
> at 
> o.a.i.i.processors.cache.persistence.IgnitePdsContinuousRestartTest$1.call(IgnitePdsContinuousRestartTest.java:291)
> at o.a.i.testframework.GridTestThread.run(GridTestThread.java:83)
> Locked synchronizers:
> java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7
> Thread 
> [name="sys-stripe-0-#24086%persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy0%",
>  id=29617, state=WAITING, blockCnt=2, waitCnt=65381]
> Lock 
> [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7, 
> ownerName=updater-1, ownerId=29900]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock$
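The lock-order inversion in this ticket (one thread takes the checkpoint read lock and then the entry lock, the other takes them in the opposite order) can be sketched with plain `java.util.concurrent` locks. The class and method names below are illustrative stand-ins for Ignite's checkpoint lock and `GridCacheMapEntry` lock, not the actual code; the sketch shows the fix the ticket suggests, namely enforcing a single global acquisition order.

```java
// Illustrative sketch of the fix for the deadlock described above: every
// thread acquires the "checkpoint" lock before the "entry" lock, never the
// reverse. Names are hypothetical stand-ins for Ignite's internal locks.
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {
    static final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();
    static final ReentrantLock entryLock = new ReentrantLock();

    /** Safe ordering: checkpoint read lock first, entry lock second, in every thread. */
    static int safeUpdate(int value) {
        checkpointLock.readLock().lock();
        try {
            entryLock.lock();
            try {
                return value + 1; // stand-in for the actual entry update
            }
            finally {
                entryLock.unlock();
            }
        }
        finally {
            checkpointLock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // With a consistent order, concurrent updaters cannot deadlock.
        Thread t1 = new Thread(() -> safeUpdate(1));
        Thread t2 = new Thread(() -> safeUpdate(2));
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("both threads finished -- no deadlock");
    }
}
```

In the thread dump above the two stacks acquire these locks in opposite orders, which is exactly the cycle the consistent ordering removes.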

[jira] [Updated] (IGNITE-12593) Corruption of B+Tree caused by byte array values and TTL

2020-01-30 Thread Alexey Goncharuk (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-12593:
--
Fix Version/s: 2.8

> Corruption of B+Tree caused by byte array values and TTL
> 
>
> Key: IGNITE-12593
> URL: https://issues.apache.org/jira/browse/IGNITE-12593
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It seems that the following set of parameters may lead to a corruption of 
> B+Tree:
>  - persistence is enabled
>  - TTL is enabled 
>  - Expiry policy - AccessedExpiryPolicy 1 sec.
>  - cache value type is byte[]
>  - all caches belong to the same cache group
> Example of the stack trace:
> {code:java}
> [2019-07-16 
> 21:13:19,288][ERROR][sys-stripe-2-#46%db.IgnitePdsWithTtlDeactivateOnHighloadTest1%][IgniteTestResources]
>  Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=CRITICAL_ERROR, err=class 
> o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is 
> corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-1237460590, 
> val2=281586645860358]], msg=Runtime failure on search row: SearchRow 
> [key=KeyCacheObjectImpl [part=26, val=378, hasValBytes=true], hash=378, 
> cacheId=-1806498247
> class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple 
> [val1=-1237460590, val2=281586645860358]], msg=Runtime failure on search row: 
> SearchRow [key=KeyCacheObjectImpl [part=26, val=378, hasValBytes=true], 
> hash=378, cacheId=-1806498247]]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5910)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1859)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2410)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:445)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2309)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2570)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2030)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1848)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3235)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1141)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>   at 
> org.apache.ignite.inte

[jira] [Updated] (IGNITE-12594) Deadlock between GridCacheDataStore#purgeExpiredInternal and GridNearTxLocal#enlistWriteEntry

2020-01-30 Thread Alexey Goncharuk (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-12594:
--
Fix Version/s: 2.8

> Deadlock between GridCacheDataStore#purgeExpiredInternal and 
> GridNearTxLocal#enlistWriteEntry
> -
>
> Key: IGNITE-12594
> URL: https://issues.apache.org/jira/browse/IGNITE-12594
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Sergey Chugunov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The deadlock is reproduced occasionally in PDS3 suite and can be seen in the 
> thread dump below.
> One thread attempts to unwind evicts, acquires checkpoint read lock and then 
> locks {{GridCacheMapEntry}}. Another thread does 
> {{GridCacheMapEntry#unswap}}, determines that the entry is expired and 
> acquires checkpoint read lock to remove the entry from the store. 
> We should not acquire checkpoint read lock inside of a locked 
> {{GridCacheMapEntry}}.
> {code:java}Thread [name="updater-1", id=29900, state=WAITING, blockCnt=2, 
> waitCnt=4450]
> Lock 
> [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2fc51685,
>  ownerName=null, ownerId=-1]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at 
> o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1632)
><- CP read lock
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.onExpired(GridCacheMapEntry.java:4081)
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:559)
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:519) 
>  <- locked entry
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWriteEntry(GridNearTxLocal.java:1437)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWrite(GridNearTxLocal.java:1303)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync0(GridNearTxLocal.java:957)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync(GridNearTxLocal.java:491)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter$29.inOp(GridCacheAdapter.java:2526)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:4727)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:3740)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.putAll0(GridCacheAdapter.java:2524)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.putAll(GridCacheAdapter.java:2513)
> at 
> o.a.i.i.processors.cache.IgniteCacheProxyImpl.putAll(IgniteCacheProxyImpl.java:1264)
> at 
> o.a.i.i.processors.cache.GatewayProtectedCacheProxy.putAll(GatewayProtectedCacheProxy.java:863)
> at 
> o.a.i.i.processors.cache.persistence.IgnitePdsContinuousRestartTest$1.call(IgnitePdsContinuousRestartTest.java:291)
> at o.a.i.testframework.GridTestThread.run(GridTestThread.java:83)
> Locked synchronizers:
> java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7
> Thread 
> [name="sys-stripe-0-#24086%persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy0%",
>  id=29617, state=WAITING, blockCnt=2, waitCnt=65381]
> Lock 
> [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7, 
> ownerName=updater-1, ownerId=29900]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> at 
> java.util.concurrent.locks.ReentrantLock.lock(Re

[jira] [Commented] (IGNITE-12601) DistributedMetaStoragePersistentTest.testUnstableTopology is flaky

2020-01-30 Thread Maxim Muzafarov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026637#comment-17026637
 ] 

Maxim Muzafarov commented on IGNITE-12601:
--

It seems the `PDS (Direct IO) 1` suite is not part of Run::All.
Should we consider running it a few times to be sure that the flaky tests have 
been fixed?

https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsDirectIo1&tab=buildTypeStatusDiv&branch_IgniteTests24Java8=pull%2F7334%2Fhead

> DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
> --
>
> Key: IGNITE-12601
> URL: https://issues.apache.org/jira/browse/IGNITE-12601
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
> Please take a look at TC:
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=5923369202582779855&tab=testDetails





[jira] [Created] (IGNITE-12608) [ignite-extensions] Setup tests for ignite-client-spring-boot-autoconfigure, ignite-spring-boot-autoconfigure on TC

2020-01-30 Thread Nikolay Izhikov (Jira)
Nikolay Izhikov created IGNITE-12608:


 Summary: [ignite-extensions] Setup tests for 
ignite-client-spring-boot-autoconfigure, ignite-spring-boot-autoconfigure on TC
 Key: IGNITE-12608
 URL: https://issues.apache.org/jira/browse/IGNITE-12608
 Project: Ignite
  Issue Type: Improvement
Reporter: Nikolay Izhikov


It seems we should update the JUnit version to run the Spring tests on TC





[jira] [Assigned] (IGNITE-12608) [ignite-extensions] Setup tests for ignite-client-spring-boot-autoconfigure, ignite-spring-boot-autoconfigure on TC

2020-01-30 Thread Nikolay Izhikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Izhikov reassigned IGNITE-12608:


Assignee: Nikolay Izhikov

> [ignite-extensions] Setup tests for ignite-client-spring-boot-autoconfigure, 
> ignite-spring-boot-autoconfigure on TC
> ---
>
> Key: IGNITE-12608
> URL: https://issues.apache.org/jira/browse/IGNITE-12608
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Nikolay Izhikov
>Assignee: Nikolay Izhikov
>Priority: Minor
>
> It seems we should update the JUnit version to run the Spring tests on TC





[jira] [Created] (IGNITE-12607) PartitionsExchangeAwareTest is flaky

2020-01-30 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12607:
---

 Summary: PartitionsExchangeAwareTest is flaky
 Key: IGNITE-12607
 URL: https://issues.apache.org/jira/browse/IGNITE-12607
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.9


Proof: 
https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Cache6/4972239
It seems that a cache update is sometimes not possible even before the 
topologies are locked.





[jira] [Commented] (IGNITE-12601) DistributedMetaStoragePersistentTest.testUnstableTopology is flaky

2020-01-30 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026607#comment-17026607
 ] 

Ignite TC Bot commented on IGNITE-12601:


{panel:title=Branch: [pull/7334/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4972169&buildTypeId=IgniteTests24Java8_RunAll]

> DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
> --
>
> Key: IGNITE-12601
> URL: https://issues.apache.org/jira/browse/IGNITE-12601
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
> Please take a look at TC:
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=5923369202582779855&tab=testDetails





[jira] [Commented] (IGNITE-12594) Deadlock between GridCacheDataStore#purgeExpiredInternal and GridNearTxLocal#enlistWriteEntry

2020-01-30 Thread Alexey Goncharuk (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026602#comment-17026602
 ] 

Alexey Goncharuk commented on IGNITE-12594:
---

The deadlock is reproduced in the PDS3 suite, and a good reproducer was also 
added in the IGNITE-12593 ticket.

> Deadlock between GridCacheDataStore#purgeExpiredInternal and 
> GridNearTxLocal#enlistWriteEntry
> -
>
> Key: IGNITE-12594
> URL: https://issues.apache.org/jira/browse/IGNITE-12594
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Sergey Chugunov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The deadlock is reproduced occasionally in PDS3 suite and can be seen in the 
> thread dump below.
> One thread attempts to unwind evicts, acquires checkpoint read lock and then 
> locks {{GridCacheMapEntry}}. Another thread does 
> {{GridCacheMapEntry#unswap}}, determines that the entry is expired and 
> acquires checkpoint read lock to remove the entry from the store. 
> We should not acquire checkpoint read lock inside of a locked 
> {{GridCacheMapEntry}}.
> {code:java}Thread [name="updater-1", id=29900, state=WAITING, blockCnt=2, 
> waitCnt=4450]
> Lock 
> [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2fc51685,
>  ownerName=null, ownerId=-1]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at 
> o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1632)
><- CP read lock
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.onExpired(GridCacheMapEntry.java:4081)
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:559)
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:519) 
>  <- locked entry
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWriteEntry(GridNearTxLocal.java:1437)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWrite(GridNearTxLocal.java:1303)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync0(GridNearTxLocal.java:957)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync(GridNearTxLocal.java:491)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter$29.inOp(GridCacheAdapter.java:2526)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:4727)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:3740)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.putAll0(GridCacheAdapter.java:2524)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.putAll(GridCacheAdapter.java:2513)
> at 
> o.a.i.i.processors.cache.IgniteCacheProxyImpl.putAll(IgniteCacheProxyImpl.java:1264)
> at 
> o.a.i.i.processors.cache.GatewayProtectedCacheProxy.putAll(GatewayProtectedCacheProxy.java:863)
> at 
> o.a.i.i.processors.cache.persistence.IgnitePdsContinuousRestartTest$1.call(IgnitePdsContinuousRestartTest.java:291)
> at o.a.i.testframework.GridTestThread.run(GridTestThread.java:83)
> Locked synchronizers:
> java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7
> Thread 
> [name="sys-stripe-0-#24086%persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy0%",
>  id=29617, state=WAITING, blockCnt=2, waitCnt=65381]
> Lock 
> [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7, 
> ownerName=updater-1, ownerId=29900]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantL

[jira] [Commented] (IGNITE-12533) Remote security context tests refactoring.

2020-01-30 Thread Anton Vinogradov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026600#comment-17026600
 ] 

Anton Vinogradov commented on IGNITE-12533:
---

Merged to master branch.
Thanks for your contribution.

> Remote security context tests refactoring.
> --
>
> Key: IGNITE-12533
> URL: https://issues.apache.org/jira/browse/IGNITE-12533
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Denis Garus
>Assignee: Denis Garus
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To make tests more readable and robust, we should use the 
> _AbstractRemoteSecurityContextCheckTest.Verifier#register(String)_ method in 
> all related tests.





[jira] [Updated] (IGNITE-12013) NullPointerException is thrown by ExchangeLatchManager during cache creation

2020-01-30 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-12013:
-
Fix Version/s: 2.9

> NullPointerException is thrown by ExchangeLatchManager during cache creation
> 
>
> Key: IGNITE-12013
> URL: https://issues.apache.org/jira/browse/IGNITE-12013
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.9
>
> Attachments: ignitenullpointer.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{NullPointerException}} may be thrown during cluster topology change:
> {code:java}
> [14:15:49,820][SEVERE][exchange-worker-#63][GridDhtPartitionsExchangeFuture] 
> Failed to reinitialize local partitions (rebalancing will be stopped): 
> GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=468, 
> minorTopVer=1], discoEvt=DiscoveryCustomEvent 
> [customMsg=DynamicCacheChangeBatch 
> [id=728f11e1c61-11d31f36-508d-47e0-9a9c-d4f5a270948d, 
> reqs=[DynamicCacheChangeRequest [cacheName=SQL_PUBLIC_UPRIYA_112093_TB, 
> hasCfg=true, nodeId=10a0b1a4-09bb-4aa6-81e0-537a6431283b, 
> clientStartOnly=false, stop=false, destroy=false, disabledAfterStartfalse]], 
> exchangeActions=ExchangeActions [startCaches=[SQL_PUBLIC_UPRIYA_112093_TB], 
> stopCaches=null, startGrps=[SQL_PUBLIC_UPRIYA_112093_TB], stopGrps=[], 
> resetParts=null, stateChangeRequest=null], startCaches=false], 
> affTopVer=AffinityTopologyVersion [topVer=468, minorTopVer=1], 
> super=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=10a0b1a4-09bb-4aa6-81e0-537a6431283b, addrs=[0:0:0:0:0:0:0:1%lo, 
> 10.244.1.100, 127.0.0.1], sockAddrs=[/10.244.1.100:0, /0:0:0:0:0:0:0:1%lo:0, 
> /127.0.0.1:0], discPort=0, order=39, intOrder=27, 
> lastExchangeTime=1563872413854, loc=false, ver=2.7.0#20181130-sha1:256ae401, 
> isClient=true], topVer=468, nodeId8=6a076901, msg=null, 
> type=DISCOVERY_CUSTOM_EVT, tstamp=1563891349722]], nodeId=10a0b1a4, 
> evt=DISCOVERY_CUSTOM_EVT]
> java.lang.NullPointerException
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.canSkipJoiningNodes(ExchangeLatchManager.java:327)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1401)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:806)
> at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2667)
> at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The original topic on the user-list: 
> [http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-server-node-null-pointer-exception-td28899.html]
> *RESOLUTION*
> It seems that the reason for the issue is a small value of 
> IGNITE_DISCOVERY_HISTORY_SIZE (smaller than the number of nodes joining/leaving 
> the cluster simultaneously). I could not reproduce the issue with the default 
> values of TcpDiscoverySpi#topHistSize and IGNITE_DISCOVERY_HISTORY_SIZE, so I 
> assume that this property was changed by the user.
> So, the NullPointerException was replaced with an IgniteException whose message 
> provides a hint for resolving the issue. Perhaps it would be a good idea to 
> change the implementation of ExchangeLatchManager to use a DiscoCache instance 
> instead of AffinityTopologyVersion. This approach has pros and cons, so it 
> requires additional investigation.





[jira] [Updated] (IGNITE-12013) NullPointerException is thrown by ExchangeLatchManager during cache creation

2020-01-30 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-12013:
-
Description: 
{{NullPointerException}} may be thrown during cluster topology change:
{code:java}
[14:15:49,820][SEVERE][exchange-worker-#63][GridDhtPartitionsExchangeFuture] 
Failed to reinitialize local partitions (rebalancing will be stopped): 
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=468, 
minorTopVer=1], discoEvt=DiscoveryCustomEvent 
[customMsg=DynamicCacheChangeBatch 
[id=728f11e1c61-11d31f36-508d-47e0-9a9c-d4f5a270948d, 
reqs=[DynamicCacheChangeRequest [cacheName=SQL_PUBLIC_UPRIYA_112093_TB, 
hasCfg=true, nodeId=10a0b1a4-09bb-4aa6-81e0-537a6431283b, 
clientStartOnly=false, stop=false, destroy=false, disabledAfterStartfalse]], 
exchangeActions=ExchangeActions [startCaches=[SQL_PUBLIC_UPRIYA_112093_TB], 
stopCaches=null, startGrps=[SQL_PUBLIC_UPRIYA_112093_TB], stopGrps=[], 
resetParts=null, stateChangeRequest=null], startCaches=false], 
affTopVer=AffinityTopologyVersion [topVer=468, minorTopVer=1], 
super=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=10a0b1a4-09bb-4aa6-81e0-537a6431283b, addrs=[0:0:0:0:0:0:0:1%lo, 
10.244.1.100, 127.0.0.1], sockAddrs=[/10.244.1.100:0, /0:0:0:0:0:0:0:1%lo:0, 
/127.0.0.1:0], discPort=0, order=39, intOrder=27, 
lastExchangeTime=1563872413854, loc=false, ver=2.7.0#20181130-sha1:256ae401, 
isClient=true], topVer=468, nodeId8=6a076901, msg=null, 
type=DISCOVERY_CUSTOM_EVT, tstamp=1563891349722]], nodeId=10a0b1a4, 
evt=DISCOVERY_CUSTOM_EVT]
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.canSkipJoiningNodes(ExchangeLatchManager.java:327)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1401)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:806)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2667)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:745)
{code}
The original topic on the user-list: 
[http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-server-node-null-pointer-exception-td28899.html]

*RESOLUTION*
It seems that the reason for the issue is a small value of 
IGNITE_DISCOVERY_HISTORY_SIZE (smaller than the number of nodes joining/leaving 
the cluster simultaneously). I could not reproduce the issue with the default 
values of TcpDiscoverySpi#topHistSize and IGNITE_DISCOVERY_HISTORY_SIZE, so I 
assume that this property was changed by the user.

So, the NullPointerException was replaced with an IgniteException whose message 
provides a hint for resolving the issue. Perhaps it would be a good idea to 
change the implementation of ExchangeLatchManager to use a DiscoCache instance 
instead of AffinityTopologyVersion. This approach has pros and cons, so it 
requires additional investigation.

  was:
{{NullPointerException}} may be thrown during cluster topology change:
{code:java}
[14:15:49,820][SEVERE][exchange-worker-#63][GridDhtPartitionsExchangeFuture] 
Failed to reinitialize local partitions (rebalancing will be stopped): 
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=468, 
minorTopVer=1], discoEvt=DiscoveryCustomEvent 
[customMsg=DynamicCacheChangeBatch 
[id=728f11e1c61-11d31f36-508d-47e0-9a9c-d4f5a270948d, 
reqs=[DynamicCacheChangeRequest [cacheName=SQL_PUBLIC_UPRIYA_112093_TB, 
hasCfg=true, nodeId=10a0b1a4-09bb-4aa6-81e0-537a6431283b, 
clientStartOnly=false, stop=false, destroy=false, disabledAfterStartfalse]], 
exchangeActions=ExchangeActions [startCaches=[SQL_PUBLIC_UPRIYA_112093_TB], 
stopCaches=null, startGrps=[SQL_PUBLIC_UPRIYA_112093_TB], stopGrps=[], 
resetParts=null, stateChangeRequest=null], startCaches=false], 
affTopVer=AffinityTopologyVersion [topVer=468, minorTopVer=1], 
super=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=10a0b1a4-09bb-4aa6-81e0-537a6431283b, addrs=[0:0:0:0:0:0:0:1%lo, 
10.244.1.100, 127.0.0.1], sockAddrs=[/10.244.1.100:0, /0:0:0:0:0:0:0:1%lo:0, 
/127.0.0.1:0], discPort=0, order=39, intOrder=27, 
lastExchangeTime=1563872413854, loc=false, ver=2.7.0#20181130-sha1:256ae401, 
isClient=true], topVer=468, nodeId8=6a076901, msg=null, 
type=DISCOVERY_CUSTOM_EVT, tstamp=1563891349722]], nodeId=10a0b1a4, 
evt=DISCOVERY_CUSTOM_EVT]
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.canSkip

[jira] [Updated] (IGNITE-12013) NullPointerException is thrown by ExchangeLatchManager during cache creation

2020-01-30 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-12013:
-
Description: 
{{NullPointerException}} may be thrown during cluster topology change:
{code:java}
[14:15:49,820][SEVERE][exchange-worker-#63][GridDhtPartitionsExchangeFuture] 
Failed to reinitialize local partitions (rebalancing will be stopped): 
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=468, 
minorTopVer=1], discoEvt=DiscoveryCustomEvent 
[customMsg=DynamicCacheChangeBatch 
[id=728f11e1c61-11d31f36-508d-47e0-9a9c-d4f5a270948d, 
reqs=[DynamicCacheChangeRequest [cacheName=SQL_PUBLIC_UPRIYA_112093_TB, 
hasCfg=true, nodeId=10a0b1a4-09bb-4aa6-81e0-537a6431283b, 
clientStartOnly=false, stop=false, destroy=false, disabledAfterStartfalse]], 
exchangeActions=ExchangeActions [startCaches=[SQL_PUBLIC_UPRIYA_112093_TB], 
stopCaches=null, startGrps=[SQL_PUBLIC_UPRIYA_112093_TB], stopGrps=[], 
resetParts=null, stateChangeRequest=null], startCaches=false], 
affTopVer=AffinityTopologyVersion [topVer=468, minorTopVer=1], 
super=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=10a0b1a4-09bb-4aa6-81e0-537a6431283b, addrs=[0:0:0:0:0:0:0:1%lo, 
10.244.1.100, 127.0.0.1], sockAddrs=[/10.244.1.100:0, /0:0:0:0:0:0:0:1%lo:0, 
/127.0.0.1:0], discPort=0, order=39, intOrder=27, 
lastExchangeTime=1563872413854, loc=false, ver=2.7.0#20181130-sha1:256ae401, 
isClient=true], topVer=468, nodeId8=6a076901, msg=null, 
type=DISCOVERY_CUSTOM_EVT, tstamp=1563891349722]], nodeId=10a0b1a4, 
evt=DISCOVERY_CUSTOM_EVT]
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.canSkipJoiningNodes(ExchangeLatchManager.java:327)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1401)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:806)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2667)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:745)
{code}
The original topic on the user-list: 
[http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-server-node-null-pointer-exception-td28899.html]

RESOLUTION
It seems that the reason for the issue is a small value of 
IGNITE_DISCOVERY_HISTORY_SIZE (smaller than the number of nodes joining/leaving 
the cluster simultaneously). I could not reproduce the issue with the default 
values of TcpDiscoverySpi#topHistSize and IGNITE_DISCOVERY_HISTORY_SIZE, so I 
assume that this property was changed by the user.

So, the NullPointerException was replaced with an IgniteException whose message 
provides a hint for resolving the issue. Perhaps it would be a good idea to 
change the implementation of ExchangeLatchManager to use a DiscoCache instance 
instead of AffinityTopologyVersion. This approach has pros and cons, so it 
requires additional investigation.

  was:
{{NullPointerException}} may be thrown during cluster topology change:
{code:java}
[14:15:49,820][SEVERE][exchange-worker-#63][GridDhtPartitionsExchangeFuture] 
Failed to reinitialize local partitions (rebalancing will be stopped): 
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=468, 
minorTopVer=1], discoEvt=DiscoveryCustomEvent 
[customMsg=DynamicCacheChangeBatch 
[id=728f11e1c61-11d31f36-508d-47e0-9a9c-d4f5a270948d, 
reqs=[DynamicCacheChangeRequest [cacheName=SQL_PUBLIC_UPRIYA_112093_TB, 
hasCfg=true, nodeId=10a0b1a4-09bb-4aa6-81e0-537a6431283b, 
clientStartOnly=false, stop=false, destroy=false, disabledAfterStartfalse]], 
exchangeActions=ExchangeActions [startCaches=[SQL_PUBLIC_UPRIYA_112093_TB], 
stopCaches=null, startGrps=[SQL_PUBLIC_UPRIYA_112093_TB], stopGrps=[], 
resetParts=null, stateChangeRequest=null], startCaches=false], 
affTopVer=AffinityTopologyVersion [topVer=468, minorTopVer=1], 
super=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=10a0b1a4-09bb-4aa6-81e0-537a6431283b, addrs=[0:0:0:0:0:0:0:1%lo, 
10.244.1.100, 127.0.0.1], sockAddrs=[/10.244.1.100:0, /0:0:0:0:0:0:0:1%lo:0, 
/127.0.0.1:0], discPort=0, order=39, intOrder=27, 
lastExchangeTime=1563872413854, loc=false, ver=2.7.0#20181130-sha1:256ae401, 
isClient=true], topVer=468, nodeId8=6a076901, msg=null, 
type=DISCOVERY_CUSTOM_EVT, tstamp=1563891349722]], nodeId=10a0b1a4, 
evt=DISCOVERY_CUSTOM_EVT]
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.canSkipJo

[jira] [Commented] (IGNITE-12013) NullPointerException is thrown by ExchangeLatchManager during cache creation

2020-01-30 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026587#comment-17026587
 ] 

Ignite TC Bot commented on IGNITE-12013:


{panel:title=Branch: [pull/7335/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4972285&buildTypeId=IgniteTests24Java8_RunAll]

> NullPointerException is thrown by ExchangeLatchManager during cache creation
> 
>
> Key: IGNITE-12013
> URL: https://issues.apache.org/jira/browse/IGNITE-12013
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Attachments: ignitenullpointer.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{NullPointerException}} may be thrown during cluster topology change:
> {code:java}
> [14:15:49,820][SEVERE][exchange-worker-#63][GridDhtPartitionsExchangeFuture] 
> Failed to reinitialize local partitions (rebalancing will be stopped): 
> GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=468, 
> minorTopVer=1], discoEvt=DiscoveryCustomEvent 
> [customMsg=DynamicCacheChangeBatch 
> [id=728f11e1c61-11d31f36-508d-47e0-9a9c-d4f5a270948d, 
> reqs=[DynamicCacheChangeRequest [cacheName=SQL_PUBLIC_UPRIYA_112093_TB, 
> hasCfg=true, nodeId=10a0b1a4-09bb-4aa6-81e0-537a6431283b, 
> clientStartOnly=false, stop=false, destroy=false, disabledAfterStartfalse]], 
> exchangeActions=ExchangeActions [startCaches=[SQL_PUBLIC_UPRIYA_112093_TB], 
> stopCaches=null, startGrps=[SQL_PUBLIC_UPRIYA_112093_TB], stopGrps=[], 
> resetParts=null, stateChangeRequest=null], startCaches=false], 
> affTopVer=AffinityTopologyVersion [topVer=468, minorTopVer=1], 
> super=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=10a0b1a4-09bb-4aa6-81e0-537a6431283b, addrs=[0:0:0:0:0:0:0:1%lo, 
> 10.244.1.100, 127.0.0.1], sockAddrs=[/10.244.1.100:0, /0:0:0:0:0:0:0:1%lo:0, 
> /127.0.0.1:0], discPort=0, order=39, intOrder=27, 
> lastExchangeTime=1563872413854, loc=false, ver=2.7.0#20181130-sha1:256ae401, 
> isClient=true], topVer=468, nodeId8=6a076901, msg=null, 
> type=DISCOVERY_CUSTOM_EVT, tstamp=1563891349722]], nodeId=10a0b1a4, 
> evt=DISCOVERY_CUSTOM_EVT]
> java.lang.NullPointerException
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.canSkipJoiningNodes(ExchangeLatchManager.java:327)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1401)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:806)
> at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2667)
> at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The original topic on the user-list: 
> [http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-server-node-null-pointer-exception-td28899.html]





[jira] [Commented] (IGNITE-12282) Access restriction to the internal package of Ignite

2020-01-30 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026550#comment-17026550
 ] 

Ignite TC Bot commented on IGNITE-12282:


{panel:title=Branch: [pull/7137/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4971193&buildTypeId=IgniteTests24Java8_RunAll]

> Access restriction to the internal package of Ignite
> 
>
> Key: IGNITE-12282
> URL: https://issues.apache.org/jira/browse/IGNITE-12282
> Project: Ignite
>  Issue Type: Task
>Reporter: Denis Garus
>Assignee: Denis Garus
>Priority: Major
>  Labels: iep-38
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> User-defined code shouldn't have access to _org.apache.ignite.internal_.*, 
> which is the internal package of Ignite.
> To restrict user-defined code, we need to add the package name to the 
> _package.access_ security property.
> To grant access to a package, we have to use the 
> _accessClassInPackage.\{package name}_ runtime permission.
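> 
> The restriction step above can be sketched in plain Java. The
> java.security.Security getProperty/setProperty calls are real; the class and
> method names here are hypothetical, and the wiring into Ignite's sandbox (an
> installed SecurityManager that enforces the restricted list) is simplified away.

```java
import java.security.Security;

// Hypothetical sketch: append a package to the JVM-wide "package.access"
// restricted list. With a SecurityManager installed, user code that touches
// classes in that package then needs the corresponding
// "accessClassInPackage.<package>" runtime permission.
public class InternalPackageRestriction {
    /** Adds {@code pkg} to the comma-separated "package.access" security property. */
    public static String restrict(String pkg) {
        String cur = Security.getProperty("package.access");

        if (cur == null || cur.isEmpty())
            Security.setProperty("package.access", pkg);
        else if (!cur.contains(pkg))
            Security.setProperty("package.access", cur + "," + pkg);

        return Security.getProperty("package.access");
    }
}
```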





[jira] [Commented] (IGNITE-12456) Cluster Data Store grid gets Corrupted for Load test

2020-01-30 Thread Alexey Goncharuk (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026548#comment-17026548
 ] 

Alexey Goncharuk commented on IGNITE-12456:
---

[~ravimsc] given that there are a lot of blocked threads in your case, there is 
a chance that IGNITE-12523 is the root cause. It does not fit into 2.8, so I'll 
try to schedule it for a maintenance release 2.8.1, and we can check whether 
your issue gets resolved.

> Cluster Data Store grid gets Corrupted for Load test
> 
>
> Key: IGNITE-12456
> URL: https://issues.apache.org/jira/browse/IGNITE-12456
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.7
>Reporter: Ravi Kumar Powli
>Assignee: Alexey Goncharuk
>Priority: Critical
> Fix For: 2.9
>
> Attachments: default-config.xml
>
>
> We have an Apache Ignite 3-node cluster set up with Amazon S3-based 
> discovery, running in an AWS cloud environment with a microservice model of 8 
> microservices. We are using Ignite as the session data store. While performing 
> load tests, we face data grid issues once the number of clients reaches 40. 
> Once the data grid gets corrupted, we lose the session store data, the 
> application stops responding because the session data is also corrupted, and 
> all the instances auto-scale down to the initial size of 8. We need to restart 
> Apache Ignite to bring the application back up. Please find the attached 
> Apache Ignite configuration for your reference.
> This is impacting the scalability of the microservices. It is very evident 
> that the current state-based architecture will not scale beyond a certain TPS 
> and that the state store, especially Ignite, becomes a single point of 
> failure, stalling the full microservice cluster.
>  Apache Ignite version: 2.7.0
> {code}
> 07:24:46,678][SEVERE][tcp-disco-msg-worker-#2%DataStoreIgniteCache%|#2%DataStoreIgniteCache%][G]
>  Blocked system-critical thread has been detected. This can lead to 
> cluster-wide undefined behaviour [threadName=sys-stripe-5, 
> blockedFor=21s]07:24:46,678][SEVERE][tcp-disco-msg-worker-#2%DataStoreIgniteCache%|#2%DataStoreIgniteCache%][G]
>  Blocked system-critical thread has been detected. This can lead to 
> cluster-wide undefined behaviour [threadName=sys-stripe-5, 
> blockedFor=21s][07:24:46,680][SEVERE][tcp-disco-msg-worker-#2%DataStoreIgniteCache%|#2%DataStoreIgniteCache%][]
>  Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
> [name=sys-stripe-5, igniteInstanceName=DataStoreIgniteCache, finished=false, 
> heartbeatTs=1575271465499]]]class org.apache.ignite.IgniteException: 
> GridWorker [name=sys-stripe-5, igniteInstanceName=DataStoreIgniteCache, 
> finished=false, heartbeatTs=1575271465499] at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
>  at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
>  at 
> org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
>  at 
> org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) 
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
>  at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
>  at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
>  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
>  at 
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)[07:24:52,692][SEVERE][tcp-disco-msg-worker-#2%DataStoreIgniteCache%|#2%DataStoreIgniteCache%][G]
>  Blocked system-critical thread has been detected. This can lead to 
> cluster-wide undefined behaviour [threadName=ttl-cleanup-worker, 
> blockedFor=27s][07:24:52,692][SEVERE][tcp-disco-msg-worker-#2%DataStoreIgniteCache%|#2%DataStoreIgniteCache%][]
>  Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
> [name=ttl-cleanup-worker, igniteInstanceName=DataStoreIgniteCache, 
> finished=false, heartbeatTs=1575271465044]]]class 
> org.apache.ignite.IgniteException: GridWorker [name=ttl-cleanup-worker, 
> igniteInstanceName=Da

[jira] [Updated] (IGNITE-12523) Continuously generated thread dumps in failure processor slow down the whole system

2020-01-30 Thread Alexey Goncharuk (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-12523:
--
Ignite Flags: Release Notes Required  (was: Docs Required,Release Notes 
Required)

> Continuously generated thread dumps in failure processor slow down the whole 
> system
> ---
>
> Key: IGNITE-12523
> URL: https://issues.apache.org/jira/browse/IGNITE-12523
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Andrey N. Gura
>Assignee: Andrey N. Gura
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A lot of threads (hundreds) build indexes. The checkpoint thread tries to 
> acquire the write lock but can't because some threads hold the read lock. 
> Moreover, some threads try to acquire the read lock too. Failure types 
> SYSTEM_WORKER_BLOCKED and SYSTEM_CRITICAL_OPERATION_TIMEOUT are ignored.
> The checkpoint thread is treated as a blocked critical system worker, so the 
> failure processor gets a thread dump.
> Threads waiting on the read lock report SYSTEM_CRITICAL_OPERATION_TIMEOUT and 
> also get a thread dump.
> Thread dump generation takes from 500 to 1000 ms.
> All this activity leads to a stop-the-world pause and triggers other timeouts. 
> It can take a long time because many threads are active and half the time is 
> spent on thread dump generation.
> The root cause here is the checkpoint read-write lock. This was discussed with 
> [~agoncharuk], and it seems only an implementation of fuzzy checkpointing 
> could solve the problem, but that requires a big effort.
> *Solution*
> - A new system property IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT was 
> added. The default value is the failure detection timeout.
> - Each call of the FailureProcessor#process(FailureContext, FailureHandler) 
> method checks the throttling timeout before thread dump generation.
> - There is no need to check whether the failure type is ignored. Throttling is 
> useful for all cases when the context is not invalidated 
> (FailureProcessor.failureCtx != null).
> - For a throttled thread dump, we log the info message "Thread dump is hidden 
> due to throttling settings. Set the 
> IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT property to 0 to see all 
> thread dumps".
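> 
> The throttling check described in the solution can be sketched as follows. The
> class and method names are hypothetical (the real logic lives in
> FailureProcessor); the sketch only shows the core idea: at most one thread dump
> per throttling window, and a value of 0 disables throttling.

```java
// Hypothetical sketch: a thread dump is produced at most once per throttling
// window; calls falling inside the window are suppressed.
public class ThreadDumpThrottler {
    private final long throttlingTimeoutMs; // e.g. the failure detection timeout

    /** Timestamp of the last generated dump; initialized so the first call always dumps. */
    private long lastDumpTs = Long.MIN_VALUE / 2;

    public ThreadDumpThrottler(long throttlingTimeoutMs) {
        this.throttlingTimeoutMs = throttlingTimeoutMs;
    }

    /** Returns true if a thread dump should be generated for a failure observed at {@code nowMs}. */
    public synchronized boolean shouldDump(long nowMs) {
        if (throttlingTimeoutMs == 0 || nowMs - lastDumpTs >= throttlingTimeoutMs) {
            lastDumpTs = nowMs;

            return true;
        }

        // Suppressed: "Thread dump is hidden due to throttling settings."
        return false;
    }
}
```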





[jira] [Commented] (IGNITE-12594) Deadlock between GridCacheDataStore#purgeExpiredInternal and GridNearTxLocal#enlistWriteEntry

2020-01-30 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026546#comment-17026546
 ] 

Ignite TC Bot commented on IGNITE-12594:


{panel:title=Branch: [pull/7325/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4971317&buildTypeId=IgniteTests24Java8_RunAll]

> Deadlock between GridCacheDataStore#purgeExpiredInternal and 
> GridNearTxLocal#enlistWriteEntry
> -
>
> Key: IGNITE-12594
> URL: https://issues.apache.org/jira/browse/IGNITE-12594
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Sergey Chugunov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The deadlock is reproduced occasionally in the PDS3 suite and can be seen in 
> the thread dump below.
> One thread attempts to unwind evicts, acquires the checkpoint read lock and 
> then locks {{GridCacheMapEntry}}. Another thread does 
> {{GridCacheMapEntry#unswap}}, determines that the entry is expired and 
> acquires the checkpoint read lock to remove the entry from the store. 
> We should not acquire the checkpoint read lock inside of a locked 
> {{GridCacheMapEntry}}.
> {code:java}Thread [name="updater-1", id=29900, state=WAITING, blockCnt=2, 
> waitCnt=4450]
> Lock 
> [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2fc51685,
>  ownerName=null, ownerId=-1]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at 
> o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1632)
><- CP read lock
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.onExpired(GridCacheMapEntry.java:4081)
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:559)
> at 
> o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:519) 
>  <- locked entry
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWriteEntry(GridNearTxLocal.java:1437)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWrite(GridNearTxLocal.java:1303)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync0(GridNearTxLocal.java:957)
> at 
> o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync(GridNearTxLocal.java:491)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter$29.inOp(GridCacheAdapter.java:2526)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:4727)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:3740)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.putAll0(GridCacheAdapter.java:2524)
> at 
> o.a.i.i.processors.cache.GridCacheAdapter.putAll(GridCacheAdapter.java:2513)
> at 
> o.a.i.i.processors.cache.IgniteCacheProxyImpl.putAll(IgniteCacheProxyImpl.java:1264)
> at 
> o.a.i.i.processors.cache.GatewayProtectedCacheProxy.putAll(GatewayProtectedCacheProxy.java:863)
> at 
> o.a.i.i.processors.cache.persistence.IgnitePdsContinuousRestartTest$1.call(IgnitePdsContinuousRestartTest.java:291)
> at o.a.i.testframework.GridTestThread.run(GridTestThread.java:83)
> Locked synchronizers:
> java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7
> Thread 
> [name="sys-stripe-0-#24086%persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy0%",
>  id=29617, state=WAITING, blockCnt=2, waitCnt=65381]
> Lock 
> [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7, 
> ownerName=updater-1, ownerId=29900]
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>
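The lock-ordering rule stated in the description (do not acquire the checkpoint
read lock inside a locked GridCacheMapEntry) can be illustrated with a minimal
sketch. The names are hypothetical and the locks are plain JDK locks, not the
actual Ignite ones; the point is that both code paths take the checkpoint read
lock before the per-entry lock, so the circular wait cannot form.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a consistent acquisition order (checkpoint read lock
// first, entry lock second) on every path prevents the deadlock described above.
public class LockOrdering {
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();
    private final ReentrantLock entryLock = new ReentrantLock();

    /** Runs {@code update} under both locks, always in the same order. */
    public void updateEntry(Runnable update) {
        checkpointLock.readLock().lock(); // 1st: checkpoint read lock
        try {
            entryLock.lock();             // 2nd: per-entry lock
            try {
                update.run();
            }
            finally {
                entryLock.unlock();
            }
        }
        finally {
            checkpointLock.readLock().unlock();
        }
    }
}
```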

[jira] [Updated] (IGNITE-12456) Cluster Data Store grid gets Corrupted for Load test

2020-01-30 Thread Alexey Goncharuk (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-12456:
--
Fix Version/s: (was: 2.8)
   2.9

> Cluster Data Store grid gets Corrupted for Load test
> 
>
> Key: IGNITE-12456
> URL: https://issues.apache.org/jira/browse/IGNITE-12456
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.7
>Reporter: Ravi Kumar Powli
>Assignee: Alexey Goncharuk
>Priority: Critical
> Fix For: 2.9
>
> Attachments: default-config.xml
>
>
> We have an Apache Ignite 3-node cluster set up with Amazon S3-based 
> discovery, running in an AWS cloud environment with a microservice model of 8 
> microservices. We are using Ignite as the session data store. While performing 
> load tests, we face data grid issues once the number of clients reaches 40. 
> Once the data grid gets corrupted, we lose the session store data, the 
> application stops responding because the session data is also corrupted, and 
> all the instances auto-scale down to the initial size of 8. We need to restart 
> Apache Ignite to bring the application back up. Please find the attached 
> Apache Ignite configuration for your reference.
> This is impacting the scalability of the microservices. It is very evident 
> that the current state-based architecture will not scale beyond a certain TPS 
> and that the state store, especially Ignite, becomes a single point of 
> failure, stalling the full microservice cluster.
>  Apache Ignite version: 2.7.0
> {code}
> 07:24:46,678][SEVERE][tcp-disco-msg-worker-#2%DataStoreIgniteCache%|#2%DataStoreIgniteCache%][G]
>  Blocked system-critical thread has been detected. This can lead to 
> cluster-wide undefined behaviour [threadName=sys-stripe-5, 
> blockedFor=21s]07:24:46,678][SEVERE][tcp-disco-msg-worker-#2%DataStoreIgniteCache%|#2%DataStoreIgniteCache%][G]
>  Blocked system-critical thread has been detected. This can lead to 
> cluster-wide undefined behaviour [threadName=sys-stripe-5, 
> blockedFor=21s][07:24:46,680][SEVERE][tcp-disco-msg-worker-#2%DataStoreIgniteCache%|#2%DataStoreIgniteCache%][]
>  Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
> [name=sys-stripe-5, igniteInstanceName=DataStoreIgniteCache, finished=false, 
> heartbeatTs=1575271465499]]]class org.apache.ignite.IgniteException: 
> GridWorker [name=sys-stripe-5, igniteInstanceName=DataStoreIgniteCache, 
> finished=false, heartbeatTs=1575271465499] at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
>  at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
>  at 
> org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
>  at 
> org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) 
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
>  at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
>  at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
>  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
>  at 
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)[07:24:52,692][SEVERE][tcp-disco-msg-worker-#2%DataStoreIgniteCache%|#2%DataStoreIgniteCache%][G]
>  Blocked system-critical thread has been detected. This can lead to 
> cluster-wide undefined behaviour [threadName=ttl-cleanup-worker, 
> blockedFor=27s][07:24:52,692][SEVERE][tcp-disco-msg-worker-#2%DataStoreIgniteCache%|#2%DataStoreIgniteCache%][]
>  Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
> [name=ttl-cleanup-worker, igniteInstanceName=DataStoreIgniteCache, 
> finished=false, heartbeatTs=1575271465044]]]class 
> org.apache.ignite.IgniteException: GridWorker [name=ttl-cleanup-worker, 
> igniteInstanceName=DataStoreIgniteCache, finished=false, 
> heartbeatTs=1575271465044] at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
>  at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
>  at 
> org.ap

[jira] [Assigned] (IGNITE-12456) Cluster Data Store grid gets Corrupted for Load test

2020-01-30 Thread Alexey Goncharuk (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk reassigned IGNITE-12456:
-

Assignee: Alexey Goncharuk

> Cluster Data Store grid gets Corrupted for Load test
> 
>
> Key: IGNITE-12456
> URL: https://issues.apache.org/jira/browse/IGNITE-12456
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.7
>Reporter: Ravi Kumar Powli
>Assignee: Alexey Goncharuk
>Priority: Blocker
> Fix For: 2.8
>
> Attachments: default-config.xml
>
>

[jira] [Updated] (IGNITE-12456) Cluster Data Store grid gets Corrupted for Load test

2020-01-30 Thread Alexey Goncharuk (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-12456:
--
Priority: Critical  (was: Blocker)

> Cluster Data Store grid gets Corrupted for Load test
> 
>
> Key: IGNITE-12456
> URL: https://issues.apache.org/jira/browse/IGNITE-12456
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.7
>Reporter: Ravi Kumar Powli
>Assignee: Alexey Goncharuk
>Priority: Critical
> Fix For: 2.8
>
> Attachments: default-config.xml
>
>

[jira] [Commented] (IGNITE-12447) Modification of S#compact method

2020-01-30 Thread Kirill Tkalenko (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026536#comment-17026536
 ] 

Kirill Tkalenko commented on IGNITE-12447:
--

[~ivan.glukos] I added several tests; here is the TeamCity link:
[https://ci.ignite.apache.org/viewLog.html?buildId=4973405&buildTypeId=IgniteTests24Java8_Basic1&tab=buildResultsDiv&branch_IgniteTests24Java8=pull%2F7139%2Fhead]

> Modification of S#compact method
> 
>
> Key: IGNITE-12447
> URL: https://issues.apache.org/jira/browse/IGNITE-12447
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Modification of S#compact method so that it is possible to pass collection of 
> Numbers.
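The idea behind S#compact is folding consecutive numbers into ranges (e.g. [1, 2, 3, 5] becomes "[1-3, 5]"). A minimal, self-contained sketch of that behaviour for a collection of Numbers, purely illustrative and not the actual IgniteUtils/S implementation or its signatures, could look like:

```java
import java.util.*;

/** Illustrative sketch of compacting a collection of numbers into
 *  consecutive ranges; not the real S#compact code. */
public class CompactSketch {
    public static String compact(Collection<? extends Number> nums) {
        List<Long> sorted = new ArrayList<>();
        for (Number n : nums)
            sorted.add(n.longValue());
        Collections.sort(sorted);

        StringBuilder sb = new StringBuilder("[");
        int i = 0;
        while (i < sorted.size()) {
            long start = sorted.get(i);
            long end = start;
            // Extend the current range while values stay consecutive.
            while (i + 1 < sorted.size() && sorted.get(i + 1) == end + 1)
                end = sorted.get(++i);
            if (sb.length() > 1)
                sb.append(", ");
            sb.append(start == end ? String.valueOf(start) : start + "-" + end);
            i++;
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(compact(Arrays.asList(1, 2, 3, 5, 7, 8))); // [1-3, 5, 7-8]
    }
}
```

Accepting `Collection<? extends Number>` rather than a primitive array is what lets callers pass any numeric collection, which is the point of the ticket.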



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12434) Dump checkpoint readLock holder threads if writeLock can`t take lock more than threshold timeout.

2020-01-30 Thread Alexey Goncharuk (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-12434:
--
Fix Version/s: (was: 2.9)

> Dump checkpoint readLock holder threads if writeLock can`t take lock more 
> than threshold timeout.
> -
>
> Key: IGNITE-12434
> URL: https://issues.apache.org/jira/browse/IGNITE-12434
> Project: Ignite
>  Issue Type: Improvement
>  Components: persistence
>Affects Versions: 2.7.6
>Reporter: Stanilovsky Evgeny
>Assignee: Stanilovsky Evgeny
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Huge cache operations like removeAll, or hardware problems followed by a GC
> pause, can hold the checkpoint readLock for a long time. This leads to a long
> writeLock wait and, as a result, a long checkpoint delay, so it would be very
> informative to log such situations.
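One way to implement this (a rough sketch under assumed names, not the actual Ignite checkpointer code) is to retry the write lock with a threshold timeout and dump all thread stacks on each failed attempt, so the read-lock holders show up in the log:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative sketch: acquire the checkpoint write lock with a timeout
 *  and dump all live threads whenever the wait exceeds the threshold. */
public class CheckpointLockDumpSketch {
    final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    /** Blocks until the write lock is taken, dumping threads on each timeout. */
    public void writeLockWithThreadDump(long thresholdMs) {
        try {
            while (!checkpointLock.writeLock().tryLock(thresholdMs, TimeUnit.MILLISECONDS)) {
                System.err.println("Write lock is not acquired for " + thresholdMs
                    + " ms, dumping threads (read-lock holders are among them):");
                // Stack traces of every live thread, without lock monitors.
                for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false))
                    System.err.print(info);
            }
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Give up and restore the interrupt flag.
        }
    }
}
```

Note that `ReentrantReadWriteLock` cannot report which threads hold the read lock, which is why the sketch falls back to dumping all threads and letting the reader find the holders in the stacks.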





[jira] [Updated] (IGNITE-12434) Dump checkpoint readLock holder threads if writeLock can`t take lock more than threshold timeout.

2020-01-30 Thread Alexey Goncharuk (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-12434:
--
Fix Version/s: 2.9

> Dump checkpoint readLock holder threads if writeLock can`t take lock more 
> than threshold timeout.
> -
>
> Key: IGNITE-12434
> URL: https://issues.apache.org/jira/browse/IGNITE-12434
> Project: Ignite
>  Issue Type: Improvement
>  Components: persistence
>Affects Versions: 2.7.6
>Reporter: Stanilovsky Evgeny
>Assignee: Stanilovsky Evgeny
>Priority: Minor
> Fix For: 2.9
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Huge cache operations like removeAll, or hardware problems followed by a GC
> pause, can hold the checkpoint readLock for a long time. This leads to a long
> writeLock wait and, as a result, a long checkpoint delay, so it would be very
> informative to log such situations.





[jira] [Commented] (IGNITE-12533) Remote security context tests refactoring.

2020-01-30 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026524#comment-17026524
 ] 

Ignite TC Bot commented on IGNITE-12533:


{panel:title=Branch: [pull/7276/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4971077&buildTypeId=IgniteTests24Java8_RunAll]

> Remote security context tests refactoring.
> --
>
> Key: IGNITE-12533
> URL: https://issues.apache.org/jira/browse/IGNITE-12533
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Denis Garus
>Assignee: Denis Garus
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To make tests more readable and robust, we should use the 
> _AbstractRemoteSecurityContextCheckTest.Verifier#register(String)_ method in 
> all related tests.





[jira] [Comment Edited] (IGNITE-12049) Add user attributes to thin clients

2020-01-30 Thread Ryabov Dmitrii (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024407#comment-17024407
 ] 

Ryabov Dmitrii edited comment on IGNITE-12049 at 1/30/20 8:34 AM:
--

[~ascherbakov],
 I restricted attributes by `String` class as suggested on discussion.

Can we continue to review?


was (Author: somefire):
[~ascherbakov],
 I restricted attributes by `String` class as suggested on discussion.

 

Can we continue to review?

> Add user attributes to thin clients
> ---
>
> Key: IGNITE-12049
> URL: https://issues.apache.org/jira/browse/IGNITE-12049
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ryabov Dmitrii
>Assignee: Ryabov Dmitrii
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add user attributes to thin clients (like node attributes for server nodes). 
> Make sure that custom authenticators can use these attributes.
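As a sketch of how a custom authenticator might consume such per-connection attributes (all names here are hypothetical; the real thin-client API is defined by the ticket's patch, not by this snippet):

```java
import java.util.Map;

/** Hypothetical sketch of an authenticator that inspects user attributes
 *  sent by a thin client on connect; not the actual Ignite security SPI. */
public class AttributeAuthenticatorSketch {
    /** Accepts the connection only when a non-empty token attribute is present. */
    public static boolean authenticate(Map<String, String> userAttrs) {
        String token = userAttrs.get("client.token"); // Assumed attribute name.
        return token != null && !token.isEmpty();
    }
}
```

Restricting attribute values to `String` (as discussed in the comments above) keeps the handshake payload simple and avoids marshalling arbitrary objects before authentication.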





[jira] [Comment Edited] (IGNITE-12049) Add user attributes to thin clients

2020-01-30 Thread Ryabov Dmitrii (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024407#comment-17024407
 ] 

Ryabov Dmitrii edited comment on IGNITE-12049 at 1/30/20 8:33 AM:
--

[~ascherbakov],
 I restricted attributes by `String` class as suggested on discussion.

 

Can we continue to review?


was (Author: somefire):
[~ascherbakov],
I restricted attributes by `String` class as suggested on discussion.

> Add user attributes to thin clients
> ---
>
> Key: IGNITE-12049
> URL: https://issues.apache.org/jira/browse/IGNITE-12049
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ryabov Dmitrii
>Assignee: Ryabov Dmitrii
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add user attributes to thin clients (like node attributes for server nodes). 
> Make sure that custom authenticators can use these attributes.





[jira] [Commented] (IGNITE-11797) Fix consistency issues for atomic and mixed tx-atomic cache groups.

2020-01-30 Thread Alexey Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026497#comment-17026497
 ] 

Alexey Scherbakov commented on IGNITE-11797:


Merged to master.
a9278eedf75d4cefc9642826bfe8dadbc59e04d0

> Fix consistency issues for atomic and mixed tx-atomic cache groups.
> ---
>
> Key: IGNITE-11797
> URL: https://issues.apache.org/jira/browse/IGNITE-11797
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> IGNITE-10078 only solves consistency problems for tx mode.
> For atomic caches the rebalance consistency issues still remain and should be
> fixed together with improving the consistency of the atomic cache protocol.
> Also, we need to disable dynamic start of an atomic cache in a group that has
> only tx caches, because it does not work in its current state.


