[jira] [Commented] (IGNITE-13373) WAL segments are not released on releaseHistoryForPreloading()

2020-08-20 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181159#comment-17181159
 ] 

Ivan Rakov commented on IGNITE-13373:
-

[~alapin] Merged to master.

> WAL segments are not released on releaseHistoryForPreloading()
> --
>
> Key: IGNITE-13373
> URL: https://issues.apache.org/jira/browse/IGNITE-13373
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Assignee: Alexander Lapin
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Reserve/releaseHistoryForPreloading() was reworked: we now store the oldest 
> WALPointer that was reserved during reserveHistoryForPreloading() in the 
> reservedForPreloading field. As a result, it is possible to release the WAL 
> reservation on releaseHistoryForPreloading().
>  * searchAndReserveCheckpoints() was slightly refactored: it now returns not 
> only the earliestValidCheckpoints but also the oldest reservedCheckpoint, so 
> there is no need to recalculate it within reserveHistoryForExchange().
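A minimal sketch of the approach described above, assuming a field and WAL-manager calls with these names (illustrative, not the actual Ignite source):

{code:java}
// Hypothetical sketch: remember the oldest reserved pointer so that the
// preloading reservation can be released later.
private volatile WALPointer reservedForPreloading;

public boolean reserveHistoryForPreloading(WALPointer ptr) {
    if (walMgr.reserve(ptr)) {          // Reserve WAL history starting from the given pointer.
        reservedForPreloading = ptr;    // Store what exactly was reserved.

        return true;
    }

    return false;
}

public void releaseHistoryForPreloading() {
    WALPointer ptr = reservedForPreloading;

    if (ptr != null) {
        walMgr.release(ptr);            // The reservation can now actually be released.

        reservedForPreloading = null;
    }
}
{code}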



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13327) Add a metric for processed keys when rebuilding indexes.

2020-08-18 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179737#comment-17179737
 ] 

Ivan Rakov commented on IGNITE-13327:
-

[~ktkale...@gridgain.com] Thanks! Merged to master.

> Add a metric for processed keys when rebuilding indexes.
> 
>
> Key: IGNITE-13327
> URL: https://issues.apache.org/jira/browse/IGNITE-13327
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It would be useful to understand how long it will take to rebuild indexes, 
> since there can be a lot of data and indexes. Currently the following metrics 
> allow only a rough estimate of how much index rebuilding remains:
> # IsIndexRebuildInProgress - whether cache index rebuild is in progress;
> # IndexBuildCountPartitionsLeft - remaining number of partitions (per cache 
> group) to rebuild indexes for.
> For a more accurate estimate, I suggest adding a per-cache metric "Number of 
> keys processed when rebuilding indexes" named "IndexRebuildKeyProcessed". This 
> way we can estimate, per cache, how long index rebuilding will take: take 
> "CacheSize" and use the new metric to find out how many keys are left to 
> process.
> I also suggest adding methods:
> # org.apache.ignite.cache.CacheMetrics#getIndexRebuildKeyProcessed
> # org.apache.ignite.cache.CacheMetrics#IsIndexRebuildInProgress
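For illustration, the proposed accessors might look roughly like this on the metrics interface (signatures follow the ticket text; the final API may differ):

{code:java}
public interface CacheMetrics {
    // Only the proposed additions are shown.

    /** @return Number of keys processed during index rebuild for this cache. */
    long getIndexRebuildKeyProcessed();

    /** @return {@code true} if index rebuild is in progress for this cache. */
    boolean isIndexRebuildInProgress();
}
{code}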



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13371) Sporadic partition inconsistency after historical rebalancing of updates with same key put-remove pattern

2020-08-18 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-13371:
---

 Summary: Sporadic partition inconsistency after historical 
rebalancing of updates with same key put-remove pattern
 Key: IGNITE-13371
 URL: https://issues.apache.org/jira/browse/IGNITE-13371
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.10


h4. Scenario
# start 3 servers and 3 clients, create caches
# clients start combined puts + 1% removes of data in PESSIMISTIC/REPEATABLE_READ 
transactions
## kill one node
## restart one node
# ensure all transactions have completed
# run idle_verify

Expected: no conflicts found
Actual:
{noformat}
[12:03:18][:55 :230] Control utility --cache idle_verify --skip-zeros 
--cache-filter PERSISTENT
[12:03:20][:55 :230] Control utility [ver. 8.7.13#20200228-sha1:7b016d63]
[12:03:20][:55 :230] 2020 Copyright(C) GridGain Systems, Inc. and Contributors
[12:03:20][:55 :230] User: prtagent
[12:03:20][:55 :230] Time: 2020-03-03T12:03:19.836
[12:03:20][:55 :230] Command [CACHE] started
[12:03:20][:55 :230] Arguments: --host 172.25.1.11 --port 11211 --cache 
idle_verify --skip-zeros --cache-filter PERSISTENT 
[12:03:20][:55 :230] 

[12:03:20][:55 :230] idle_verify task was executed with the following args: 
caches=[], excluded=[], cacheFilter=[PERSISTENT]
[12:03:20][:55 :230] idle_verify check has finished, found 1 conflict 
partitions: [counterConflicts=0, hashConflicts=1]
[12:03:20][:55 :230] Hash conflicts:
[12:03:20][:55 :230] Conflict partition: PartitionKeyV2 [grpId=1338167321, 
grpName=cache_group_3_088_1, partId=24]
[12:03:20][:55 :230] Partition instances: [PartitionHashRecordV2 
[isPrimary=false, consistentId=node_1_2, updateCntr=172349, 
partitionState=OWNING, size=6299, partHash=157875238], PartitionHashRecordV2 
[isPrimary=true, consistentId=node_1_1, updateCntr=172349, 
partitionState=OWNING, size=6299, partHash=157875238], PartitionHashRecordV2 
[isPrimary=false, consistentId=node_1_4, updateCntr=172349, 
partitionState=OWNING, size=6300, partHash=-944532882]]
[12:03:20][:55 :230] Command [CACHE] finished with code: 0
[12:03:20][:55 :230] Control utility has completed execution at: 
2020-03-03T12:03:20.593
[12:03:20][:55 :230] Execution time: 757 ms
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13253) Advanced heuristics for historical rebalance

2020-08-18 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13253:

Release Note: Introduced a heuristic that applies historical rebalancing 
automatically when it is more efficient. The -DIGNITE_PDS_WAL_REBALANCE_THRESHOLD 
property is still present, but its value has changed to 500. The heuristic is 
applied only to partitions larger than the threshold.

> Advanced heuristics for historical rebalance
> 
>
> Key: IGNITE-13253
> URL: https://issues.apache.org/jira/browse/IGNITE-13253
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Ivan Rakov
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Previously, the cluster decided which partitions should not be rebalanced from 
> history based only on their size. This threshold can be set through the 
> IGNITE_PDS_WAL_REBALANCE_THRESHOLD system property. However, it is not accurate 
> to decide which partitions will be rebalanced from the WAL only by their size: 
> the WAL can contain many more records than the partition has entries (many 
> updates of the same key), so such a rebalance may require transferring more data 
> over the network than a full rebalance.
> We need to implement a heuristic that can estimate the amount of data to 
> transfer.
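A rough illustration of the kind of check such a heuristic could perform; the method, its arguments, and the exact condition are assumptions, only the property name comes from the ticket:

{code:java}
/**
 * Hypothetical heuristic: rebalance a partition from WAL history only if it is
 * large enough and the WAL holds no more updates than the partition has entries.
 */
static boolean useHistoricalRebalance(long partSize, long estimatedWalUpdates) {
    long threshold = Long.getLong("IGNITE_PDS_WAL_REBALANCE_THRESHOLD", 500L);

    return partSize >= threshold && estimatedWalUpdates <= partSize;
}
{code}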



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13253) Advanced heuristics for historical rebalance

2020-08-18 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13253:

Reviewer: Vladislav Pyatkov

> Advanced heuristics for historical rebalance
> 
>
> Key: IGNITE-13253
> URL: https://issues.apache.org/jira/browse/IGNITE-13253
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Ivan Rakov
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Previously, the cluster decided which partitions should not be rebalanced from 
> history based only on their size. This threshold can be set through the 
> IGNITE_PDS_WAL_REBALANCE_THRESHOLD system property. However, it is not accurate 
> to decide which partitions will be rebalanced from the WAL only by their size: 
> the WAL can contain many more records than the partition has entries (many 
> updates of the same key), so such a rebalance may require transferring more data 
> over the network than a full rebalance.
> We need to implement a heuristic that can estimate the amount of data to 
> transfer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13253) Advanced heuristics for historical rebalance

2020-08-18 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179550#comment-17179550
 ] 

Ivan Rakov commented on IGNITE-13253:
-

[~v.pyatkov] Please take a look.

> Advanced heuristics for historical rebalance
> 
>
> Key: IGNITE-13253
> URL: https://issues.apache.org/jira/browse/IGNITE-13253
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Ivan Rakov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Previously, the cluster decided which partitions should not be rebalanced from 
> history based only on their size. This threshold can be set through the 
> IGNITE_PDS_WAL_REBALANCE_THRESHOLD system property. However, it is not accurate 
> to decide which partitions will be rebalanced from the WAL only by their size: 
> the WAL can contain many more records than the partition has entries (many 
> updates of the same key), so such a rebalance may require transferring more data 
> over the network than a full rebalance.
> We need to implement a heuristic that can estimate the amount of data to 
> transfer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13253) Advanced heuristics for historical rebalance

2020-08-17 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179132#comment-17179132
 ] 

Ivan Rakov commented on IGNITE-13253:
-

Dev list discussion: 
http://apache-ignite-developers.2346864.n4.nabble.com/Choosing-historical-rebalance-heuristics-td48389.html#a48409

> Advanced heuristics for historical rebalance
> 
>
> Key: IGNITE-13253
> URL: https://issues.apache.org/jira/browse/IGNITE-13253
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Ivan Rakov
>Priority: Major
>
> Previously, the cluster decided which partitions should not be rebalanced from 
> history based only on their size. This threshold can be set through the 
> IGNITE_PDS_WAL_REBALANCE_THRESHOLD system property. However, it is not accurate 
> to decide which partitions will be rebalanced from the WAL only by their size: 
> the WAL can contain many more records than the partition has entries (many 
> updates of the same key), so such a rebalance may require transferring more data 
> over the network than a full rebalance.
> We need to implement a heuristic that can estimate the amount of data to 
> transfer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-13253) Advanced heuristics for historical rebalance

2020-08-17 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov reassigned IGNITE-13253:
---

Assignee: Ivan Rakov

> Advanced heuristics for historical rebalance
> 
>
> Key: IGNITE-13253
> URL: https://issues.apache.org/jira/browse/IGNITE-13253
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Ivan Rakov
>Priority: Major
>
> Previously, the cluster decided which partitions should not be rebalanced from 
> history based only on their size. This threshold can be set through the 
> IGNITE_PDS_WAL_REBALANCE_THRESHOLD system property. However, it is not accurate 
> to decide which partitions will be rebalanced from the WAL only by their size: 
> the WAL can contain many more records than the partition has entries (many 
> updates of the same key), so such a rebalance may require transferring more data 
> over the network than a full rebalance.
> We need to implement a heuristic that can estimate the amount of data to 
> transfer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-5038) BinaryMarshaller might need to use context class loader for deserialization

2020-08-13 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov reassigned IGNITE-5038:
--

Assignee: Ivan Rakov  (was: Mirza Aliev)

> BinaryMarshaller might need to use context class loader for deserialization
> ---
>
> Key: IGNITE-5038
> URL: https://issues.apache.org/jira/browse/IGNITE-5038
> Project: Ignite
>  Issue Type: Improvement
>  Components: binary
>Affects Versions: 2.0
>Reporter: Dmitry Karachentsev
>Assignee: Ivan Rakov
>Priority: Major
>  Labels: features
> Attachments: results-compound-20170802.zip, 
> results-compound-20170808.zip
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There is a special use case discussed on the dev list:
> http://apache-ignite-developers.2346864.n4.nabble.com/Re-BinaryObjectImpl-deserializeValue-with-specific-ClassLoader-td17126.html#a17224
> According to the use case, BinaryMarshaller might need to try to deserialize 
> an object using a context class loader if it failed to do so with a custom 
> classloader (`IgniteConfiguration.getClassLoader()`) or the system one.
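A minimal sketch of the fallback described above (the helper name and surrounding structure are assumptions, not the actual BinaryMarshaller code):

{code:java}
// Hypothetical class resolution: try the configured class loader first, then
// fall back to the thread context class loader before giving up.
private Class<?> resolveClass(String clsName, ClassLoader cfgLdr) throws ClassNotFoundException {
    try {
        ClassLoader ldr = cfgLdr != null ? cfgLdr : getClass().getClassLoader();

        return Class.forName(clsName, true, ldr);
    }
    catch (ClassNotFoundException e) {
        ClassLoader ctxLdr = Thread.currentThread().getContextClassLoader();

        if (ctxLdr != null && ctxLdr != cfgLdr)
            return Class.forName(clsName, true, ctxLdr); // Second attempt with the context class loader.

        throw e;
    }
}
{code}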



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-5038) BinaryMarshaller might need to use context class loader for deserialization

2020-08-12 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176394#comment-17176394
 ] 

Ivan Rakov commented on IGNITE-5038:


[~agoncharuk] [~gvvinblade] [~v.pyatkov] Guys, please take a look at new PR: 
https://github.com/apache/ignite/pull/8146
I've addressed Vlad's and Alex's comments in it.

> Also, what happens if I call deserialize() and pass system class loader?
The object instance will be unmarshalled with the system classloader, without 
using the cache (with an extra call to Class.forName).

> We need to add tests to verify that user class loaders do not leak to the 
> static cache
It's covered by BinaryClassLoaderMultiJvmTest, see 
BinaryClassLoaderMultiJvmTest#checkClassCacheEmpty. A bit tricky, but should 
work.

I'll groom the test code (add test scenario descriptions and so on) once TC 
shows that the current patch is viable.

> BinaryMarshaller might need to use context class loader for deserialization
> ---
>
> Key: IGNITE-5038
> URL: https://issues.apache.org/jira/browse/IGNITE-5038
> Project: Ignite
>  Issue Type: Improvement
>  Components: binary
>Affects Versions: 2.0
>Reporter: Dmitry Karachentsev
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: features
> Attachments: results-compound-20170802.zip, 
> results-compound-20170808.zip
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There is a special use case discussed on the dev list:
> http://apache-ignite-developers.2346864.n4.nabble.com/Re-BinaryObjectImpl-deserializeValue-with-specific-ClassLoader-td17126.html#a17224
> According to the use case, BinaryMarshaller might need to try to deserialize 
> an object using a context class loader if it failed to do so with a custom 
> classloader (`IgniteConfiguration.getClassLoader()`) or the system one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13327) Add a metric for processed keys when rebuilding indexes.

2020-08-10 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174421#comment-17174421
 ] 

Ivan Rakov commented on IGNITE-13327:
-

[~ktkale...@gridgain.com] I've shared my thoughts regarding the suggested 
metric on the dev list.

> Add a metric for processed keys when rebuilding indexes.
> 
>
> Key: IGNITE-13327
> URL: https://issues.apache.org/jira/browse/IGNITE-13327
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would be useful to understand how long it will take to rebuild indexes, 
> since there can be a lot of data and indexes. Currently the following metrics 
> allow only a rough estimate of how much index rebuilding remains:
> # IsIndexRebuildInProgress - whether cache index rebuild is in progress;
> # IndexBuildCountPartitionsLeft - remaining number of partitions (per cache 
> group) to rebuild indexes for.
> For a more accurate estimate, I suggest adding a per-cache metric "Number of 
> keys processed when rebuilding indexes" named "IndexRebuildKeyProcessed". This 
> way we can estimate, per cache, how long index rebuilding will take: take 
> "CacheSize" and use the new metric to find out how many keys are left to 
> process.
> I also suggest adding methods:
> # org.apache.ignite.cache.CacheMetrics#getIndexRebuildKeyProcessed
> # org.apache.ignite.cache.CacheMetrics#IsIndexRebuildInProgress



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13158) Get rid of Externalizable interface at IgniteTxAdapter

2020-08-03 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170019#comment-17170019
 ] 

Ivan Rakov commented on IGNITE-13158:
-

[~slava.koptilin] Looks good, please proceed to merge.

> Get rid of Externalizable interface at IgniteTxAdapter
> --
>
> Key: IGNITE-13158
> URL: https://issues.apache.org/jira/browse/IGNITE-13158
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.8.1
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It seems that {{IgniteTxAdapter}} implements {{Externalizable}} without any 
> reason. Transaction instances are not transferred via network, and so I see 
> no reason for implementing {{Externalizable}} interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13098) TcpCommunicationSpi split to independent classes

2020-07-31 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169078#comment-17169078
 ] 

Ivan Rakov commented on IGNITE-13098:
-

There are still suspicious possible blockers. I've scheduled a re-run.

> TcpCommunicationSpi split to independent classes
> 
>
> Key: IGNITE-13098
> URL: https://issues.apache.org/jira/browse/IGNITE-13098
> Project: Ignite
>  Issue Type: Bug
> Environment: TcpCommunicationSpi split to independent classes
>Reporter: Stepachev Maksim
>Assignee: Stepachev Maksim
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Description
> This ticket describes requirements for TcpCommunicationSpi refactoring. The 
> main goal is to split the class without changing its behavior or public API.
> *Actual problem:*
> Currently TcpCommunicationSpi has over 5K lines and includes about 15+ inner 
> classes like:
>  # ShmemAcceptWorker
>  # SHMemHandshakeClosure
>  # ShmemWorker
>  # CommunicationDiscoveryEventListener
>  # CommunicationWorker
>  # ConnectFuture
>  # ConnectGateway
>  # ConnectionKey
>  # ConnectionPolicy
>  # DisconnectedSessionInfo
>  # FirstConnectionPolicy
>  # HandshakeTimeoutObject
>  # RoundRobinConnectionPolicy
>  # TcpCommunicationConnectionCheckFuture
>  # TcpCommunicationSpiMBeanImpl
> In addition, it contains the logic of the client connection life cycle, the NIO 
> server handler, and the handshake handler.
> The classes above have cyclic dependencies and high coupling. The whole 
> mechanism works because the classes have access to each other via parent-class 
> references. As a result, initialization of the classes isn't consistent; by 
> consistent I mean that a class created via its constructor is ready to be used. 
> All of the classes work with the context and share properties everywhere.
> Many methods of TcpCommunicationSpi don’t have a single responsibility. An 
> example is getNodeAttribute(): it makes a client reservation, takes the IP 
> address of the node, and provides attributes.
> It works fine and we usually don’t have reasons to change anything. But if you 
> want to create a test that needs slightly different behavior, such as blocking 
> a message, you can't mock or change the behavior of the inner classes (for 
> example, a test covering a change in the handshake process). Some people add 
> test methods to the public API, like "closeConnections" or "openSocketChannel", 
> because the current design doesn't allow it otherwise. Minor changes also take 
> a lot of test-development time.
> *Solution:*
> The scope of work is big, and the communication SPI is a place that should be 
> changed carefully. I recommend making this refactoring step by step.
>  * The first idea is to split the parent class into independent classes and 
> move them to the internal package. We should achieve SOLID when it’s done.
>  * Extract spread logic to appropriate classes like ClientPool, 
> HandshakeHandler, etc.
>  * Make a common transfer object for TCSpi configuration.
>  * Make dependencies direct if it is possible.
>  * Initialize all dependencies in one place.
>  * Make child classes context-free.
>  * Try to make classes more testable.
>  * Use the idea of dependency injection without a framework for it.
> *Benefits:*
> The ability to write true jUnit-style tests and cover functionality with 
> better testing makes it easier to develop new features and optimizations 
> needed in such low-level components as TcpCommunicationSpi.
> Examples of features that improve usability of Apache Ignite a lot are: 
> inverse communication connection with optimizations and connection 
> multiplexing. Both of the features could be used in environments with 
> restricted network connectivity (e.g. when connections between nodes could be 
> established only in one direction).
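An illustration of the constructor-injection style the solution section aims for; the class and dependency names are taken from the lists above, but the shape is only a sketch:

{code:java}
// Hypothetical target shape: every dependency is passed in explicitly, so an
// instance is fully usable right after construction and easy to replace in tests.
class ClientPool {
    private final ConnectionPolicy connPlc;
    private final HandshakeHandler handshakeHnd;

    ClientPool(ConnectionPolicy connPlc, HandshakeHandler handshakeHnd) {
        this.connPlc = connPlc;
        this.handshakeHnd = handshakeHnd;
    }
}
{code}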



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13313) IndexOutOfBoundsException from GridDhtAtomicUpdateRequest on server node startup

2020-07-31 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169067#comment-17169067
 ] 

Ivan Rakov commented on IGNITE-13313:
-

We need to reproduce the issue to be sure.
[~gdomo], can you write a reproducer or describe a configuration and scenario / 
load pattern?

> IndexOutOfBoundsException from GridDhtAtomicUpdateRequest on server node 
> startup
> 
>
> Key: IGNITE-13313
> URL: https://issues.apache.org/jira/browse/IGNITE-13313
> Project: Ignite
>  Issue Type: Bug
>Reporter: Grigory Domozhirov
>Priority: Major
>
> Sometimes a server node fails on startup with the following exception.
> GGCE 8.7.21, no persistence, atomic caches
>  
>  2020-07-30 13:39:02,962 [sys-stripe-0-#1|#1] ERROR 
> o.a.i.i.p.c.GridCacheIoManager - Failed processing message 
> [senderId=98dc3c18-ea57-4805-9603-b92eb7e62be2, msg=GridD
>  htAtomicUpdateRequest [keys=ArrayList 
> [com.moex.esb.blackhole.model.fx.trade.Securities$Key [idHash=4701836, 
> hash=890931906, secboard=AETS, seccode=GLDRUB_TOD],
>  *__*
>  ... and 111883 skipped ...=0,_**_
>   prevVals=ArrayList [null, null, null, null, null, null, null, null, null, 
> null, null, null, null, null, null, null, null, null, null, null, null, null, 
> null, null,
>   null, null, null, null, null, null, null, null, null, null, null, null, 
> null, null, null, null, null, null, null, null, null, null, null, null, null, 
> null, null, n
>  ull, null, null, null, null, null, null, null, null, null, null, null, null, 
> null, null, null, null, null, null, null, null, null, null, null, null, null, 
> null, nul
>  l, null, null, null, null, null, null, null, null, null, null, null, null, 
> null, null, null, null, null, null, null, null, null... and 19 more], 
> ttls=null, conflict
>  ExpireTimes=null, nearTtls=null, nearExpireTimes=null, nearKeys=null, 
> nearVals=null, obsoleteIndexes=null, forceTransformBackups=false, 
> updateCntrs=GridLongList [id
>  x=174, 
> arr=[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,
>  
> 1,2,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,2,2,1,1,1,1,2,1,1,3,1,1,1,1,1,2,1,2,1,1,1,1,2,1,1,1,2,1,2,1,2,1,1,1,2,1,2,2,1,4,1,3,3,1,2,2,2,2,1,3,1,1,1,1,1,2,3,4,1,1,2,1,1,1,
>  1,4,1,1,1,2,1,2,1,1,1,1,3,1,1,1]], super=GridDhtAtomicAbstractUpdateRequest 
> [onRes=false, nearNodeId=ae4abad0-d501-4703-98bf-b5eabd10f159, 
> nearFutId=147459, flags=k
>  eepBinary|hasRes]]]
>  java.lang.IndexOutOfBoundsException: Index 119 out of bounds for length 119
>          at 
> java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) 
> ~[na:na]
>          at 
> java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
>  ~[na:na]
>          at 
> java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248) 
> ~[na:na]
>          at java.base/java.util.Objects.checkIndex(Objects.java:373) ~[na:na]
>          at java.base/java.util.ArrayList.get(ArrayList.java:425) ~[na:na]
>          at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicUpdateRequest.previousValue(GridDhtAtomicUpdateRequest.java:391)
>  ~[ignite
>  -core-8.7.21.jar:8.7.21]
>          at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3363)
>  ~[ignit
>  e-core-8.7.21.jar:8.7.21]
>          at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:141)
>  ~[ignite-core-8.7.21.jar:8.
>  7.21]
>          at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:311)
>  ~[ignite-core-8.7.21.jar:8.7.2
>  1]
>          at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:306)
>  ~[ignite-core-8.7.21.jar:8.7.2
>  1]
>          at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
>  ~[ignite-core-8.7.21.jar:8.7.21]
>          at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>  ~[ignite-core-8.7.21.jar:8.7.21]
>          at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>  ~[ignite-core-8.7.21.jar:8.7.21]
>          at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>  ~[ignite-core-8.7.21.jar:8.7.21]
>          at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>  

[jira] [Commented] (IGNITE-13313) IndexOutOfBoundsException from GridDhtAtomicUpdateRequest on server node startup

2020-07-31 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169065#comment-17169065
 ] 

Ivan Rakov commented on IGNITE-13313:
-

Possible cause: during the loop in GridDhtAtomicCache#updateAllAsyncInternal0, 
GridDhtAtomicCache#update is called at least twice.
If the result of
{code:java}
boolean sndPrevVal = !top.rebalanceFinished(req.topologyVersion());
{code}
differs between these calls (rebalance finishes between them), previous values 
are added for only part of the entries, which can cause the described error.
> IndexOutOfBoundsException from GridDhtAtomicUpdateRequest on server node 
> startup
> 
>
> Key: IGNITE-13313
> URL: https://issues.apache.org/jira/browse/IGNITE-13313
> Project: Ignite
>  Issue Type: Bug
>Reporter: Grigory Domozhirov
>Priority: Major
>
> Sometimes a server node fails on startup with the following exception.
> GGCE 8.7.21, no persistence, atomic caches
>  
>  2020-07-30 13:39:02,962 [sys-stripe-0-#1|#1] ERROR 
> o.a.i.i.p.c.GridCacheIoManager - Failed processing message 
> [senderId=98dc3c18-ea57-4805-9603-b92eb7e62be2, msg=GridD
>  htAtomicUpdateRequest [keys=ArrayList 
> [com.moex.esb.blackhole.model.fx.trade.Securities$Key [idHash=4701836, 
> hash=890931906, secboard=AETS, seccode=GLDRUB_TOD],
>  *__*
>  ... and 111883 skipped ...=0,_**_
>   prevVals=ArrayList [null, null, null, null, null, null, null, null, null, 
> null, null, null, null, null, null, null, null, null, null, null, null, null, 
> null, null,
>   null, null, null, null, null, null, null, null, null, null, null, null, 
> null, null, null, null, null, null, null, null, null, null, null, null, null, 
> null, null, n
>  ull, null, null, null, null, null, null, null, null, null, null, null, null, 
> null, null, null, null, null, null, null, null, null, null, null, null, null, 
> null, nul
>  l, null, null, null, null, null, null, null, null, null, null, null, null, 
> null, null, null, null, null, null, null, null, null... and 19 more], 
> ttls=null, conflict
>  ExpireTimes=null, nearTtls=null, nearExpireTimes=null, nearKeys=null, 
> nearVals=null, obsoleteIndexes=null, forceTransformBackups=false, 
> updateCntrs=GridLongList [id
>  x=174, 
> arr=[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,
>  
> 1,2,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,2,2,1,1,1,1,2,1,1,3,1,1,1,1,1,2,1,2,1,1,1,1,2,1,1,1,2,1,2,1,2,1,1,1,2,1,2,2,1,4,1,3,3,1,2,2,2,2,1,3,1,1,1,1,1,2,3,4,1,1,2,1,1,1,
>  1,4,1,1,1,2,1,2,1,1,1,1,3,1,1,1]], super=GridDhtAtomicAbstractUpdateRequest 
> [onRes=false, nearNodeId=ae4abad0-d501-4703-98bf-b5eabd10f159, 
> nearFutId=147459, flags=k
>  eepBinary|hasRes]]]
>  java.lang.IndexOutOfBoundsException: Index 119 out of bounds for length 119
>          at 
> java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) 
> ~[na:na]
>          at 
> java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
>  ~[na:na]
>          at 
> java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248) 
> ~[na:na]
>          at java.base/java.util.Objects.checkIndex(Objects.java:373) ~[na:na]
>          at java.base/java.util.ArrayList.get(ArrayList.java:425) ~[na:na]
>          at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicUpdateRequest.previousValue(GridDhtAtomicUpdateRequest.java:391)
>  ~[ignite
>  -core-8.7.21.jar:8.7.21]
>          at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3363)
>  ~[ignit
>  e-core-8.7.21.jar:8.7.21]
>          at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:141)
>  ~[ignite-core-8.7.21.jar:8.
>  7.21]
>          at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:311)
>  ~[ignite-core-8.7.21.jar:8.7.2
>  1]
>          at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:306)
>  ~[ignite-core-8.7.21.jar:8.7.2
>  1]
>          at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
>  ~[ignite-core-8.7.21.jar:8.7.21]
>          at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>  ~[ignite-core-8.7.21.jar:8.7.21]
>          at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>  ~[ignite-core-8.7.21.jar:8.7.21]
>          at 
> 

[jira] [Commented] (IGNITE-13098) TcpCommunicationSpi split to independent classes

2020-07-31 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168938#comment-17168938
 ] 

Ivan Rakov commented on IGNITE-13098:
-

[~akalashnikov], can you please take a look?

> TcpCommunicationSpi split to independent classes
> 
>
> Key: IGNITE-13098
> URL: https://issues.apache.org/jira/browse/IGNITE-13098
> Project: Ignite
>  Issue Type: Bug
> Environment: TcpCommunicationSpi split to independent classes
>Reporter: Stepachev Maksim
>Assignee: Stepachev Maksim
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Description
> This ticket describes requirements for TcpCommunicationSpi refactoring. The 
> main goal is to split the class without changing its behavior or public API.
> *Actual problem:*
> Currently TcpCommunicationSpi has over 5K lines and includes about 15+ inner 
> classes like:
>  # ShmemAcceptWorker
>  # SHMemHandshakeClosure
>  # ShmemWorker
>  # CommunicationDiscoveryEventListener
>  # CommunicationWorker
>  # ConnectFuture
>  # ConnectGateway
>  # ConnectionKey
>  # ConnectionPolicy
>  # DisconnectedSessionInfo
>  # FirstConnectionPolicy
>  # HandshakeTimeoutObject
>  # RoundRobinConnectionPolicy
>  # TcpCommunicationConnectionCheckFuture
>  # TcpCommunicationSpiMBeanImpl
> In addition, it contains the logic of the client connection life cycle, the NIO 
> server handler, and the handshake handler.
> The classes above have cyclic dependencies and high coupling. The whole 
> mechanism works because the classes have access to each other via parent-class 
> references. As a result, initialization of the classes isn't consistent; by 
> consistent I mean that a class created via its constructor is ready to be used. 
> All of the classes work with the context and share properties everywhere.
> Many methods of TcpCommunicationSpi don’t have a single responsibility. An 
> example is getNodeAttribute(): it makes a client reservation, takes the IP 
> address of the node, and provides attributes.
> It works fine and we usually don’t have reasons to change anything. But if you 
> want to create a test that needs slightly different behavior, such as blocking 
> a message, you can't mock or change the behavior of the inner classes (for 
> example, a test covering a change in the handshake process). Some people add 
> test methods to the public API, like "closeConnections" or "openSocketChannel", 
> because the current design doesn't allow it otherwise. Minor changes also take 
> a lot of test-development time.
> *Solution:*
> The scope of work is big, and the communication SPI is a place that should be 
> changed carefully. I recommend making this refactoring step by step.
>  * The first idea is to split the parent class into independent classes and 
> move them to the internal package. We should achieve SOLID when it’s done.
>  * Extract spread logic to appropriate classes like ClientPool, 
> HandshakeHandler, etc.
>  * Make a common transfer object for TCSpi configuration.
>  * Make dependencies direct if it is possible.
>  * Initialize all dependencies in one place.
>  * Make child classes context-free.
>  * Try to make classes more testable.
>  * Use the idea of dependency injection without a framework for it.
> *Benefits:*
> The ability to write true jUnit-style tests and cover functionality with 
> better testing makes it easier to develop new features and optimizations 
> needed in such low-level components as TcpCommunicationSpi.
> Examples of features that improve usability of Apache Ignite a lot are: 
> inverse communication connection with optimizations and connection 
> multiplexing. Both of the features could be used in environments with 
> restricted network connectivity (e.g. when connections between nodes could be 
> established only in one direction).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13306) CpuLoad metric return -1 under Java 11

2020-07-31 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168623#comment-17168623
 ] 

Ivan Rakov commented on IGNITE-13306:
-

[~maliev] Merged to master.
IGNITE-13306 Add Java 11 specific JVM flag for fixing CPU load metric - Fixes 
#8088 Mirza Aliev* 7/30/2020 18:13 65e0ca879975d7aa88d9d94dbb19db1d8ac9a39f
[~alex_pl], can you please cherry-pick it to 2.9?

> CpuLoad metric return -1 under Java 11
> --
>
> Key: IGNITE-13306
> URL: https://issues.apache.org/jira/browse/IGNITE-13306
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.8.1
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Start cluster under Java 11.
> Observed: 
>  CpuLoad metric will return -1
> Expected:
>  Real CpuLoad.
> We investigated this issue and found that under Java 11 the code fails with 
> the following trace:
> {code:java}
> class org.apache.ignite.IgniteException: Failed to get property value 
> [property=processCpuTime, 
> obj=com.sun.management.internal.OperatingSystemImpl@1dd92fe2] at 
> org.apache.ignite.internal.util.IgniteUtils.property(IgniteUtils.java:8306) 
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$MetricsUpdater.getCpuLoad(GridDiscoveryManager.java:3131)
>  at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$MetricsUpdater.run(GridDiscoveryManager.java:3093)
>  at 
> org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$CancelableTask.onTimeout(GridTimeoutProcessor.java:364)
>  at 
> org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:233)
>  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at 
> java.base/java.lang.Thread.run(Thread.java:834) Caused by: 
> java.lang.reflect.InaccessibleObjectException: Unable to make public long 
> com.sun.management.internal.OperatingSystemImpl.getProcessCpuTime() 
> accessible: module jdk.management does not "opens 
> com.sun.management.internal" to unnamed module @35fb3008 at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
>  at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
>  at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:198) 
> at java.base/java.lang.reflect.Method.setAccessible(Method.java:192) at 
> org.apache.ignite.internal.util.IgniteUtils.property(IgniteUtils.java:8297) 
> ... 6 more
> {code}
> Under Java 8 metric has expected value.
>  
> Solution:
> The behaviour is expected because in Java 11 the CPU load metric is provided by 
> a JDK-internal module which is not accessible by default. Adding the following 
> option to the JVM in which the Ignite node is started should solve the issue:
> {noformat}
> --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED{noformat}
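For reference, a small standalone snippet that hits the same access restriction as the trace above; run it with and without the --add-opens option to see the difference:

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

public class CpuTimeAccessCheck {
    public static void main(String[] args) throws Exception {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        // On Java 11 the implementation class lives in com.sun.management.internal,
        // which is not opened to unnamed modules by default.
        Method m = os.getClass().getMethod("getProcessCpuTime");

        m.setAccessible(true); // Throws InaccessibleObjectException without --add-opens.

        System.out.println("processCpuTime = " + m.invoke(os));
    }
}
{code}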



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13060) Tracing: initial implementation

2020-07-16 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159114#comment-17159114
 ] 

Ivan Rakov commented on IGNITE-13060:
-

[~alapin] Merged to master.
IGNITE-13060 Tracing: initial implementation - Fixes #7976. alapin* 7/16/2020 
13:44 0ef1debd2fc9452376a9e1ce36f0a9a945469783

> Tracing: initial implementation
> ---
>
> Key: IGNITE-13060
> URL: https://issues.apache.org/jira/browse/IGNITE-13060
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Assignee: Alexander Lapin
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Initial tracing implementation. See 
> [IEP-48|https://cwiki.apache.org/confluence/display/IGNITE/IEP-48%3A+Tracing] 
> for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12307) Data types coverage for basic cache operations.

2020-07-14 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12307:

Fix Version/s: (was: 2.9)
   2.10

> Data types coverage for basic cache operations.
> ---
>
> Key: IGNITE-12307
> URL: https://issues.apache.org/jira/browse/IGNITE-12307
> Project: Ignite
>  Issue Type: Task
>  Components: cache
>Reporter: Alexander Lapin
>Assignee: Alexander Lapin
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The data types used for testing are not collected in a single test/suite, and 
> it's not clear which types are covered and which are not. We should redesign 
> the coverage and cover the following.
> Operations:
>  * put
>  * putAll
>  * remove
>  * removeAll
>  * get
>  * getAll
> Data Types both for value and key (if applicable):
>  * byte/Byte
>  * short/Short
>  * int/Integer
>  * long/Long
>  * float/Float
>  * double/Double
>  * boolean/Boolean
>  * char/String
>  * Arrays of primitives (single type)
>  * Arrays of Objects (different types)
>  * Collections
>  ** List
>  ** Queue
>  ** Set
>  * Objects based on:
>  ** primitives only
>  ** primitives + collections
>  ** primitives + collections + nested objects
> Persistence mode:
>  * in-memory
>  * PDS
> Cache configurations:
>  * atomic/tx/mvcc
>  * replication/partitioned
>  * TTL/no TTL
>  * QueryEntity
>  * Backups=1,2
>  * EvictionPolicy
>  * writeSynchronizationMode(FULL_SYNC, PRIMARY_SYNC, FULL_ASYNC)
>  * onheapCacheEnabled
>  
> We should check basic cache operations and basic SQL operations, as well as 
> cache-to-JDBC and JDBC-to-cache operations.
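A hedged sketch of the per-type round-trip check such a suite might run; the cache name and the value set are illustrative:

{code:java}
import java.util.Arrays;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class DataTypesRoundTrip {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Object, Object> cache = ignite.getOrCreateCache("dataTypesCoverage");

            // One representative value per primitive wrapper plus String.
            for (Object val : Arrays.asList((byte) 1, (short) 2, 3, 4L, 5.0f, 6.0d, true, 'c', "str")) {
                cache.put(val, val);                 // put
                assert val.equals(cache.get(val));   // get

                cache.remove(val);                   // remove
            }
        }
    }
}
{code}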



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13245) Rebalance future might hang in a non-final state even though all partitions are owned

2020-07-14 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13245:

Fix Version/s: (was: 2.9)
   2.10

> Rebalance future might hang in a non-final state even though all partitions are owned
> --
>
> Key: IGNITE-13245
> URL: https://issues.apache.org/jira/browse/IGNITE-13245
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It is a very specific case: a supplier leaves the cluster and, at the same 
> time, its partitions do not need rebalancing in the new topology.
> Look at my PR to understand it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12320) Partial index rebuild fails in case indexed cache contains different datatypes

2020-07-14 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12320:

Fix Version/s: (was: 2.9)
   2.10

> Partial index rebuild fails in case indexed cache contains different datatypes
> --
>
> Key: IGNITE-12320
> URL: https://issues.apache.org/jira/browse/IGNITE-12320
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Assignee: Alexander Lapin
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The problem is that in case the cache contains different datatypes, all of them 
> will be passed to IndexRebuildPartialClosure during iteration over a partition. 
> Perhaps, TableCacheFilter is supposed to filter out entries of unwanted 
> types, but it doesn't work properly.
> Steps to reproduce:
> 1. Add entries of different types (both indexed and not) to cache
> 2. Trigger partial index rebuild
> Index rebuild will fail with the following error:
> {code:java}
> [2019-08-20 
> 00:33:55,640][ERROR][pub-#302%h2.GridIndexFullRebuildTest3%][IgniteTestResources]
>  Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=CRITICAL_ERROR, err=class 
> o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is 
> corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=98629247, 
> val2=844420635165670]], msg=Runtime failure on row: %s  string representation>]]]
> class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=98629247, 
> val2=844420635165670]], msg=Runtime failure on row: %s  string representation>]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5126)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2236)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putx(BPlusTree.java:2183)
>   at 
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:285)
>   at 
> org.apache.ignite.internal.processors.query.h2.IndexRebuildPartialClosure.apply(IndexRebuildPartialClosure.java:49)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateIndex(GridCacheMapEntry.java:3867)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processKey(SchemaIndexCacheVisitorImpl.java:254)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartition(SchemaIndexCacheVisitorImpl.java:217)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartitions(SchemaIndexCacheVisitorImpl.java:176)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.visit(SchemaIndexCacheVisitorImpl.java:135)
>   at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.rebuildIndexesFromHash0(IgniteH2Indexing.java:2191)
>   at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$7.body(IgniteH2Indexing.java:2154)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.binary.BinaryObjectException: Failed to 
> get field because type ID of passed object differs from type ID this 
> BinaryField belongs to [expected=-635374417, actual=1778229603]
>   at 
> org.apache.ignite.internal.binary.BinaryFieldImpl.fieldOrder(BinaryFieldImpl.java:287)
>   at 
> org.apache.ignite.internal.binary.BinaryFieldImpl.value(BinaryFieldImpl.java:109)
>   at 
> org.apache.ignite.internal.processors.query.property.QueryBinaryProperty.fieldValue(QueryBinaryProperty.java:220)
>   at 
> org.apache.ignite.internal.processors.query.property.QueryBinaryProperty.value(QueryBinaryProperty.java:116)
>   at 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2RowDescriptor.columnValue(GridH2RowDescriptor.java:331)
>   at 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2KeyValueRowOnheap.getValue0(GridH2KeyValueRowOnheap.java:122)
>   at 
> 

[jira] [Updated] (IGNITE-13191) Public-facing API for "waiting for backups on shutdown"

2020-07-14 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13191:

Fix Version/s: (was: 2.9)
   2.10

> Public-facing API for "waiting for backups on shutdown"
> ---
>
> Key: IGNITE-13191
> URL: https://issues.apache.org/jira/browse/IGNITE-13191
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
> Fix For: 2.10
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We should introduce a "should wait for backups on shutdown" flag in Ignition 
> and/or IgniteConfiguration.
> Maybe we should do the same for the "cancel compute tasks" flag.
> Also make sure that we can shut down a node explicitly, overriding this flag 
> but without JVM termination.
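A sketch of what such a flag could look like as an addition to IgniteConfiguration; the property name and placement are illustrative, the ticket does not fix them:

{code:java}
// Hypothetical addition to IgniteConfiguration (only the new property is shown).
private boolean waitForBackupsOnShutdown;

public boolean isWaitForBackupsOnShutdown() {
    return waitForBackupsOnShutdown;
}

public IgniteConfiguration setWaitForBackupsOnShutdown(boolean waitForBackupsOnShutdown) {
    this.waitForBackupsOnShutdown = waitForBackupsOnShutdown;

    return this;
}
{code}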



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IGNITE-12320) Partial index rebuild fails in case indexed cache contains different datatypes

2020-07-13 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov resolved IGNITE-12320.
-
Fix Version/s: 2.9
   Resolution: Fixed

[~alapin] Thanks, merged to master.

> Partial index rebuild fails in case indexed cache contains different datatypes
> --
>
> Key: IGNITE-12320
> URL: https://issues.apache.org/jira/browse/IGNITE-12320
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Assignee: Alexander Lapin
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The problem is that in case the cache contains different datatypes, all of them 
> will be passed to IndexRebuildPartialClosure during iteration over a partition. 
> Perhaps, TableCacheFilter is supposed to filter out entries of unwanted 
> types, but it doesn't work properly.
> Steps to reproduce:
> 1. Add entries of different types (both indexed and not) to cache
> 2. Trigger partial index rebuild
> Index rebuild will fail with the following error:
> {code:java}
> [2019-08-20 
> 00:33:55,640][ERROR][pub-#302%h2.GridIndexFullRebuildTest3%][IgniteTestResources]
>  Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=CRITICAL_ERROR, err=class 
> o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is 
> corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=98629247, 
> val2=844420635165670]], msg=Runtime failure on row: %s  string representation>]]]
> class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=98629247, 
> val2=844420635165670]], msg=Runtime failure on row: %s  string representation>]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5126)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2236)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putx(BPlusTree.java:2183)
>   at 
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:285)
>   at 
> org.apache.ignite.internal.processors.query.h2.IndexRebuildPartialClosure.apply(IndexRebuildPartialClosure.java:49)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateIndex(GridCacheMapEntry.java:3867)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processKey(SchemaIndexCacheVisitorImpl.java:254)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartition(SchemaIndexCacheVisitorImpl.java:217)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartitions(SchemaIndexCacheVisitorImpl.java:176)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.visit(SchemaIndexCacheVisitorImpl.java:135)
>   at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.rebuildIndexesFromHash0(IgniteH2Indexing.java:2191)
>   at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$7.body(IgniteH2Indexing.java:2154)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.binary.BinaryObjectException: Failed to 
> get field because type ID of passed object differs from type ID this 
> BinaryField belongs to [expected=-635374417, actual=1778229603]
>   at 
> org.apache.ignite.internal.binary.BinaryFieldImpl.fieldOrder(BinaryFieldImpl.java:287)
>   at 
> org.apache.ignite.internal.binary.BinaryFieldImpl.value(BinaryFieldImpl.java:109)
>   at 
> org.apache.ignite.internal.processors.query.property.QueryBinaryProperty.fieldValue(QueryBinaryProperty.java:220)
>   at 
> org.apache.ignite.internal.processors.query.property.QueryBinaryProperty.value(QueryBinaryProperty.java:116)
>   at 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2RowDescriptor.columnValue(GridH2RowDescriptor.java:331)
>   at 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2KeyValueRowOnheap.getValue0(GridH2KeyValueRowOnheap.java:122)
>   at 
> 

[jira] [Commented] (IGNITE-13191) Public-facing API for "waiting for backups on shutdown"

2020-07-09 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154885#comment-17154885
 ] 

Ivan Rakov commented on IGNITE-13191:
-

[~v.pyatkov] I've left a bit more comments in the PR.
Also, compilation is broken after an attempt to merge your patch to master 
because of the conflicting commit "IGNITE-13123 Move control utility to a separate module - Fixes 
#7910. Kirill Tkalenko* 7/8/2020 15:19 
071bb4e40d00f9bdaa5833bf41a8b6ed3a32da7f".
Please merge fresh master to your branch and resolve conflicts.

> Public-facing API for "waiting for backups on shutdown"
> ---
>
> Key: IGNITE-13191
> URL: https://issues.apache.org/jira/browse/IGNITE-13191
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We should introduce a "should wait for backups on shutdown" flag in Ignition 
> and/or IgniteConfiguration.
> Maybe we should do the same for the "cancel compute tasks" flag.
> Also make sure that we can shut down a node explicitly, overriding this flag 
> but without JVM termination.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13211) Improve public exceptions for case when user attempts to access data from a lost partition

2020-07-03 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13211:

Description: 
After IGNITE-13003, attempt to access lost partition from public API throws 
CacheException with CacheInvalidStateException inside as a root cause. We can 
improve user experience a bit:
1. Create new type of public exception (subclass of CacheException), which will 
be thrown in scenarios when we access lost data
2. In case partition is lost in persistent cache, error message should be 
changed from "partition data has been lost" to "partition data temporary 
unavailable".

  was:
After IGNITE-13003, attempt to access lost partition from public API throws 
CacheException with CacheInvalidStateException inside as a root cause. We can 
improve user experience a bit:
1. Create new type of public exception (subclass of CacheException), which will 
be thrown in accessing lost data scenarios
2. In case partition is lost in persistent cache, error message should be 
changed from "partition data has been lost" to "partition data temporary 
unavailable".


> Improve public exceptions for case when user attempts to access data from a 
> lost partition
> --
>
> Key: IGNITE-13211
> URL: https://issues.apache.org/jira/browse/IGNITE-13211
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>
> After IGNITE-13003, attempt to access lost partition from public API throws 
> CacheException with CacheInvalidStateException inside as a root cause. We can 
> improve user experience a bit:
> 1. Create new type of public exception (subclass of CacheException), which 
> will be thrown in scenarios when we access lost data
> 2. In case partition is lost in persistent cache, error message should be 
> changed from "partition data has been lost" to "partition data temporary 
> unavailable".
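A sketch of what item 1 could look like; the class name is purely illustrative, the ticket does not choose one:

{code:java}
import javax.cache.CacheException;

/** Hypothetical public exception thrown when an operation touches a lost partition. */
public class CacheDataLostException extends CacheException {
    public CacheDataLostException(String msg, Throwable cause) {
        super(msg, cause);
    }
}
{code}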



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13211) Improve public exceptions for case when user attempts to access data from a lost partition

2020-07-03 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-13211:
---

 Summary: Improve public exceptions for case when user attempts to 
access data from a lost partition
 Key: IGNITE-13211
 URL: https://issues.apache.org/jira/browse/IGNITE-13211
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov


After IGNITE-13003, an attempt to access a lost partition from the public API throws 
a CacheException with a CacheInvalidStateException inside as the root cause. We can 
improve the user experience a bit:
1. Create a new type of public exception (a subclass of CacheException), which will 
be thrown in lost-data access scenarios.
2. In case a partition is lost in a persistent cache, the error message should be 
changed from "partition data has been lost" to "partition data temporarily 
unavailable".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
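
A hedged sketch of what the proposed public exception type could look like. The class name PartitionLossException, its package and its fields are illustrative assumptions, not decisions recorded in the ticket:

{code:java}
package org.apache.ignite.cache;

import javax.cache.CacheException;

/** Thrown when a cache operation touches a partition whose data has been lost. */
public class PartitionLossException extends CacheException {
    /** Name of the cache the lost partition belongs to. */
    private final String cacheName;

    /** Id of the lost partition. */
    private final int partId;

    public PartitionLossException(String cacheName, int partId, boolean persistent) {
        // For persistent caches the data may come back after the owners rejoin,
        // so the message says "temporarily unavailable" instead of "lost".
        super("Failed to execute the cache operation: partition " + partId
            + " of cache '" + cacheName + "' "
            + (persistent ? "is temporarily unavailable" : "has been lost"));

        this.cacheName = cacheName;
        this.partId = partId;
    }

    /** @return Cache name. */
    public String cacheName() {
        return cacheName;
    }

    /** @return Lost partition id. */
    public int partition() {
        return partId;
    }
}
{code}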


[jira] [Commented] (IGNITE-13147) Avoid DHT topology map updates before it's initialization

2020-06-16 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136543#comment-17136543
 ] 

Ivan Rakov commented on IGNITE-13147:
-

[~ascherbakov] LGTM, please proceed to merge.

> Avoid DHT topology map updates before it's initialization
> -
>
> Key: IGNITE-13147
> URL: https://issues.apache.org/jira/browse/IGNITE-13147
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.8
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It can happen if a partition state is restored from the persistent store during 
> logical recovery and can cause an NPE on older versions due to illegal access to an 
> uninitialized topology:
> {noformat}
> [ERROR][exchange-worker-#41][GridDhtPartitionsExchangeFuture] Failed to 
> reinitialize local partitions (rebalancing will be stopped): 
> GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=102, 
> minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=d159da1c-6f70-4ef2-bfa4-4feb64b829de, consistentId=T1SivatheObserver, 
> addrs=ArrayList [10.44.166.91, 127.0.0.1], sockAddrs=HashSet 
> [/127.0.0.1:56500, clrv052580.ic.ing.net/10.44.166.91:56500], 
> discPort=56500, order=102, intOrder=60, lastExchangeTime=1586354937705, 
> loc=true,, isClient=false], topVer=102, msgTemplate=null, nodeId8=d159da1c, 
> msg=null, type=NODE_JOINED, tstamp=1586354901638], nodeId=d159da1c, 
> evt=NODE_JOINED]
> java.lang.NullPointerException: null
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentV2.getIds(GridAffinityAssignmentV2.java:211)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2554)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:714)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$beforeExchange$38edadb$1(GridCacheDatabaseSharedManager.java:1514)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.lambda$null$1(IgniteUtils.java:10790)
>   at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.8.0_241]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> ~[?:1.8.0_241]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> ~[?:1.8.0_241]
>   at java.lang.Thread.run(Unknown Source) [?:1.8.0_241]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-06-10 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Fix Version/s: 2.9

> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Assignee: Vladislav Pyatkov
>Priority: Major
> Fix For: 2.9
>
>   Original Estimate: 80h
>  Time Spent: 20m
>  Remaining Estimate: 79h 40m
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changes its state to non-OWNING, it should be removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> We should also extract WAL pointer reservation and filtering small partitions 
> from reserveHistoryForExchange(), but this shouldn't be a problem.
> Another point for optimization: searchPartitionCounter() and 
> searchCheckpointEntry() are executed for each (grpId, partId). That means 
> we'll perform O(number of partitions) linear lookups in history. This should 
> be optimized as well: we can perform one lookup for all (grpId, partId) 
> pairs. This is especially critical for reserveHistoryForPreloading() method 
> complexity: it's executed from exchange thread.
> Memory overhead of storing described map on heap is insignificant. Its size 
> isn't greater than size of map returned from reserveHistoryForExchange().
> Described fix should be much simpler than IGNITE-12429.
> P.S. Possibly, instead of storing map, we can keep earliestCheckpoint right 
> in GridDhtLocalPartition. It may simplify implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
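
To make the proposed bookkeeping concrete, here is a minimal sketch of the precomputed map and its two update paths. The class and method names (EarliestCheckpointMap, onPartitionOwned, onHistoryCleanup) are illustrative assumptions, not the actual Ignite internals:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: (grpId, partId) -> earliest checkpoint that still covers the partition. */
class EarliestCheckpointMap {
    /** Keys are (grpId, partId) packed into a single long, values are checkpoint timestamps. */
    private final Map<Long, Long> earliest = new ConcurrentHashMap<>();

    private static long key(int grpId, int partId) {
        return ((long)grpId << 32) | (partId & 0xFFFFFFFFL);
    }

    /** On checkpoint: a partition that became OWNING on the local node is registered
     *  with the current checkpoint. */
    void onPartitionOwned(int grpId, int partId, long cpTs) {
        earliest.putIfAbsent(key(grpId, partId), cpTs);
    }

    /** On checkpoint: an evicted or non-OWNING partition no longer reserves history. */
    void onPartitionGone(int grpId, int partId) {
        earliest.remove(key(grpId, partId));
    }

    /** On checkpoint history cleanup: "setIfGreater" semantics, i.e. the earliest
     *  checkpoint only moves forward, never backward. */
    void onHistoryCleanup(int grpId, int partId, long newEarliestCpTs) {
        earliest.merge(key(grpId, partId), newEarliestCpTs, Math::max);
    }

    /** What reserveHistoryForExchange() could return without scanning the whole history. */
    Long earliestCheckpoint(int grpId, int partId) {
        return earliest.get(key(grpId, partId));
    }
}
{code}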


[jira] [Created] (IGNITE-13064) Set default transaction timeout to 5 minutes

2020-05-22 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-13064:
---

 Summary: Set default transaction timeout to 5 minutes
 Key: IGNITE-13064
 URL: https://issues.apache.org/jira/browse/IGNITE-13064
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov


Let's set the default TX timeout to 5 minutes (right now it's 0 = no timeout).
Pros:
1. The deadlock detection procedure is triggered on timeout. If a user runs into a 
key-level deadlock, they'll be able to discover the root cause from the logs 
(even though the load will hang for a while) and skip the googling and debugging 
step.
2. Almost every system with transactions has a timeout enabled by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
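
For reference, the proposed 5-minute default expressed through the existing public configuration API (the snippet only illustrates the value being discussed; today it still has to be set explicitly):

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;

public class TxTimeoutDefaultExample {
    public static void main(String[] args) {
        // 5 minutes in milliseconds; currently the default is 0, i.e. no timeout.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setTransactionConfiguration(
                new TransactionConfiguration()
                    .setDefaultTxTimeout(5 * 60 * 1000L));

        Ignition.start(cfg);
    }
}
{code}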


[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earliestCheckpoint) on heap and update it if necessary. At first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changes its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point for optimization: searchPartitionCounter() and 
searchCheckpointEntry() are executed for each (grpId, partId). That means we'll 
perform O(number of partitions) linear lookups in history. This should be 
optimized as well: we can perform one lookup for all (grpId, partId) pairs. 
This is especially critical for reserveHistoryForPreloading() method 
complexity: it's executed from exchange thread.

Memory overhead of storing described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

P.S. Possibly, instead of storing the map, we can keep earliestCheckpoint right in 
GridDhtLocalPartition. It may simplify implementation.


  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earlisetCheckpoint) on heap and update it if necessary. From the first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changes its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point: possibly, instead of storing map, we can keep earlistCheckpoint 
right in GridDhtLocalPartition. It may simplify implementation.

Memory overhead of storing described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.


> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changes its state to 

[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earliestCheckpoint) on heap and update it if necessary. At first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changes its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point for optimization: searchPartitionCounter() and 
searchCheckpointEntry() are executed for each (grpId, partId). That means we'll 
perform O(number of partitions) linear lookups in history. This should be 
optimized as well: we can perform one lookup for all (grpId, partId) pairs. 
This is especially critical for reserveHistoryForPreloading() method 
complexity: it's executed from exchange thread.

Memory overhead of storing described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

P.S. Possibly, instead of storing map, we can keep earliestCheckpoint right in 
GridDhtLocalPartition. It may simplify implementation.


  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earlisetCheckpoint) on heap and update it if necessary. From the first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changes its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point for optimization: searchPartitionCounter() and 
searchCheckpointEntry() are executed for each (grpId, partId). That means we'll 
perform O(number of partitions) linear lookups in history. This should be 
optimized as well: we can perform one lookup for all (grpId, partId) pairs. 
This is especially critical for reserveHistoryForPreloading() method 
complexity: it's executed from exchange thread.

Memory overhead of storing described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

P.S. Possibly, instead of storing map, we can keep earlistCheckpoint right in 
GridDhtLocalPartition. It may simplify implementation.



> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on 

[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earliestCheckpoint) on heap and update it if necessary. At first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changes its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point: possibly, instead of storing the map, we can keep earliestCheckpoint 
right in GridDhtLocalPartition. It may simplify implementation.

Memory overhead of storing the described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earlisetCheckpoint) on heap and update it if necessary. From the first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point: possibly, instead of storing map, we can keep earlistCheckpoint 
right in GridDhtLocalPartition. It may simplify implementation.

Memory overhead of storing described map on heap in significant. It's size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.


> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changes its state to non-OWNING, it should be removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> We should also extract WAL pointer reservation and filtering small partitions 
> 

[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earliestCheckpoint) on heap and update it if necessary. At first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point: possibly, instead of storing the map, we can keep earliestCheckpoint 
right in GridDhtLocalPartition. It may simplify implementation.

Memory overhead of storing the described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earlisetCheckpoint) on heap and update it if necessary. From the first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should extract WAL pointer reservation

Memory overhead of storing described map on heap in significant. It's size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.


> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changed its state to non-OWNING, it should be removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> We should also extract WAL pointer reservation and filtering small partitions 
> from reserveHistoryForExchange(), but this shouldn't be a problem.
> Another point: possibly, instead of storing map, we can keep 
> earliestCheckpoint right in GridDhtLocalPartition. It may simplify 
> implementation.
> Memory overhead of storing 

[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earliestCheckpoint) on heap and update it if necessary. At first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changes its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point: possibly, instead of storing the map, we can keep earliestCheckpoint 
right in GridDhtLocalPartition. It may simplify implementation.

Memory overhead of storing described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earlisetCheckpoint) on heap and update it if necessary. From the first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changes its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point: possibly, instead of storing map, we can keep earlistCheckpoint 
right in GridDhtLocalPartition. It may simplify implementation.

Memory overhead of storing described map on heap in significant. Its size isn't 
greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.


> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changes its state to non-OWNING, it should be removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> We should also extract WAL pointer reservation and filtering small partitions 
> 

[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earliestCheckpoint) on heap and update it if necessary. At first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changes its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point: possibly, instead of storing the map, we can keep earliestCheckpoint 
right in GridDhtLocalPartition. It may simplify implementation.

Memory overhead of storing the described map on heap is insignificant. Its size isn't 
greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earlisetCheckpoint) on heap and update it if necessary. From the first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changes its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should also extract WAL pointer reservation and filtering small partitions 
from reserveHistoryForExchange(), but this shouldn't be a problem.
Another point: possibly, instead of storing map, we can keep earlistCheckpoint 
right in GridDhtLocalPartition. It may simplify implementation.

Memory overhead of storing described map on heap in significant. It's size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.


> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changes its state to non-OWNING, it should be removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> We should also extract WAL pointer reservation and filtering small partitions 
> 

[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earliestCheckpoint) on heap and update it if necessary. At first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.
We should extract WAL pointer reservation

Memory overhead of storing the described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earlisetCheckpoint) on heap and update it if necessary. From the first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.

Memory overhead of storing described map on heap in significant. It's size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.


> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changed its state to non-OWNING, it should be removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> We should extract WAL pointer reservation
> Memory overhead of storing the described map on heap is insignificant. Its size 
> isn't greater than size of map returned from reserveHistoryForExchange().
> Described fix should be much simpler than IGNITE-12429.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earliestCheckpoint) on heap and update it if necessary. At first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the 
map. If checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.

Memory overhead of storing the described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earlisetCheckpoint) on heap and update it if necessary. From the first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should removed from the map. 
If checkpoint is marked as inapplicable for a certain group, the whole group 
should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.

Memory overhead of storing described map on heap in significant. It's size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.


> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changed its state to non-OWNING, it should be removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> Memory overhead of storing the described map on heap is insignificant. Its size 
> isn't greater than size of map returned from reserveHistoryForExchange().
> Described fix should be much simpler than IGNITE-12429.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME 
(ideally, even before first PME, on recovery stage), keep resulting map (grpId, 
partId -> earliestCheckpoint) on heap and update it if necessary. At first 
glance, the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the map. 
If checkpoint is marked as inapplicable for a certain group, the whole group 
should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.

Memory overhead of storing the described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate it's result only on first PME 
(ideally, even before first PME, on recovery stage), keep resulting map {grpId, 
partId -> earlisetCheckpoint} on heap and update it if necessary. From the 
first glance, map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should removed from the map. 
If checkpoint is marked as inapplicable for a certain group, the whole group 
should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.

Memory overhead of storing described map on heap in significant. It's size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.


> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changed its state to non-OWNING, it should be removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> Memory overhead of storing the described map on heap is insignificant. Its size 
> isn't greater than size of map returned from reserveHistoryForExchange().
> Described fix should be much simpler than IGNITE-12429.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Remaining Estimate: 80h  (was: 240h)
 Original Estimate: 80h  (was: 240h)

> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> {grpId, partId -> earliestCheckpoint} on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changed its state to non-OWNING, it should be removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> Memory overhead of storing the described map on heap is insignificant. Its size 
> isn't greater than size of map returned from reserveHistoryForExchange().
> Described fix should be much simpler than IGNITE-12429.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earliestCheckpoint) on heap and update it if necessary. At first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the map. 
If checkpoint is marked as inapplicable for a certain group, the whole group 
should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.

Memory overhead of storing the described map on heap is insignificant. Its size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earlisetCheckpoint) on heap and update it if necessary. From the first glance, 
map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should removed from the map. 
If checkpoint is marked as inapplicable for a certain group, the whole group 
should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.

Memory overhead of storing described map on heap in significant. It's size 
isn't greater than size of map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.


> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, the map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changed its state to non-OWNING, it should be removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> Memory overhead of storing the described map on heap is insignificant. Its size 
> isn't greater than size of map returned from reserveHistoryForExchange().
> Described fix should be much simpler than IGNITE-12429.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-13052:

Description: 
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME (ideally, 
even before first PME, on recovery stage), keep resulting map (grpId, partId -> 
earliestCheckpoint) on heap and update it if necessary. At first glance, 
the map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the map. 
If checkpoint is marked as inapplicable for a certain group, the whole group 
should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.

Memory overhead of storing the described map on heap is insignificant: its size 
isn't greater than the size of the map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.

  was:
Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME 
(ideally, even before first PME, on recovery stage), keep resulting map (grpId, 
partId -> earliestCheckpoint) on heap and update it if necessary. At first 
glance, map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the map. 
If checkpoint is marked as inapplicable for a certain group, the whole group 
should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.

Memory overhead of storing the described map on heap is insignificant: its size 
isn't greater than the size of the map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.


> Calculate result of reserveHistoryForExchange in advance
> 
>
> Key: IGNITE-13052
> URL: https://issues.apache.org/jira/browse/IGNITE-13052
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Priority: Major
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Method reserveHistoryForExchange() is called on every partition map exchange. 
> It's an expensive call: it requires iteration over the whole checkpoint 
> history with possible retrieve of GroupState from WAL (it's stored on heap 
> with SoftReference). On some deployments this operation can take several 
> minutes.
> The idea of optimization is to calculate its result only on first PME 
> (ideally, even before first PME, on recovery stage), keep resulting map 
> (grpId, partId -> earliestCheckpoint) on heap and update it if necessary. 
> At first glance, map should be updated:
> 1) On checkpoint. If a new partition appears on local node, it should be 
> registered in the map with current checkpoint. If a partition is evicted from 
> local node, or changed its state to non-OWNING, it should removed from the 
> map. If checkpoint is marked as inapplicable for a certain group, the whole 
> group should be removed from the map.
> 2) On checkpoint history cleanup. For every (grpId, partId), previous 
> earliest checkpoint should be changed with setIfGreater to new earliest 
> checkpoint.
> Memory overhead of storing the described map on heap is insignificant: its size 
> isn't greater than the size of the map returned from reserveHistoryForExchange().
> Described fix should be much simpler than IGNITE-12429.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-13052:
---

 Summary: Calculate result of reserveHistoryForExchange in advance
 Key: IGNITE-13052
 URL: https://issues.apache.org/jira/browse/IGNITE-13052
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov


Method reserveHistoryForExchange() is called on every partition map exchange. 
It's an expensive call: it requires iteration over the whole checkpoint history 
with possible retrieve of GroupState from WAL (it's stored on heap with 
SoftReference). On some deployments this operation can take several minutes.

The idea of optimization is to calculate its result only on first PME 
(ideally, even before first PME, on recovery stage), keep resulting map {grpId, 
partId -> earliestCheckpoint} on heap and update it if necessary. At first 
glance, map should be updated:
1) On checkpoint. If a new partition appears on local node, it should be 
registered in the map with current checkpoint. If a partition is evicted from 
local node, or changed its state to non-OWNING, it should be removed from the map. 
If checkpoint is marked as inapplicable for a certain group, the whole group 
should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), previous earliest 
checkpoint should be changed with setIfGreater to new earliest checkpoint.

Memory overhead of storing the described map on heap is insignificant: its size 
isn't greater than the size of the map returned from reserveHistoryForExchange().

Described fix should be much simpler than IGNITE-12429.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12942) control.sh add command for checking that inline size is same on all cluster nodes

2020-05-07 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101948#comment-17101948
 ] 

Ivan Rakov commented on IGNITE-12942:
-

[~antonovsergey93] Thanks for your contribution. Merged to master.

> control.sh add command for checking that inline size is same on all cluster 
> nodes
> -
>
> Key: IGNITE-12942
> URL: https://issues.apache.org/jira/browse/IGNITE-12942
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Sergey Antonov
>Assignee: Sergey Antonov
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should check that the inline size is the same for the index on all nodes 
> in the cluster.
> 1. Check inline_size of secondary indexes on node join. Warn to the log if they 
> differ and propose to recreate the problem index.
> 2. Introduce a new command to control.sh (control.sh --cache 
> check_index_inline_sizes). The command will check inline sizes of secondary 
> indexes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-04-29 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095690#comment-17095690
 ] 

Ivan Rakov commented on IGNITE-12935:
-

My comments:
1, 2 - agree
regarding NOT_RESERVED_WAL_REASON - I'd keep it to track cases when 
CheckpointHistory provide entries that aren't present in WAL anymore (shouldn't 
happen, but just in case)
3 - we log the best partition only in case we didn't succeed in finding any 
partition suitable for WAL rebalance. Logging other partitions is redundant: 
their history is even more shallow.
4 - agree
5 - totally agree, this is crucial. Forgot about it during my review
6 - agree
7 - arguable. The main purpose of these logs is investigation of full rebalance 
reasons post factum. Anyway, lots of messages about lots of partitions will 
occur only on PME when some partitions need to be rebalanced, which shouldn't 
happen very often.

> Disadvantages in log of historical rebalance
> 
>
> Key: IGNITE-12935
> URL: https://issues.apache.org/jira/browse/IGNITE-12935
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> # Mention in the log only partitions for which there are no nodes that suit 
> as historical supplier
>  For these partitions, print minimal counter (since which we should perform 
> historical rebalancing) with corresponding node and maximum reserved counter 
> (since which cluster can perform historical rebalancing) with corresponding 
> node.
>  This will let us know:
>  ## Whether history was reserved at all
>  ## How much reserved history we lack to perform a historical rebalancing
>  ## I see resulting output like this:
> {noformat}
>  Historical rebalancing wasn't scheduled for some partitions:
>  History wasn't reserved for: [list of partitions and groups]
>  History was reserved, but minimum present counter is less than maximum 
> reserved: [[grp=GRP, part=ID, minCntr=cntr, minNodeId=ID, maxReserved=cntr, 
> maxReservedNodeId=ID], ...]{noformat}
>  ## We can also aggregate previous message by (minNodeId) to easily find the 
> exact node (or nodes) which were the reason of full rebalance.
>  # Log results of {{reserveHistoryForExchange()}}. They can be compactly 
> represented as mappings: {{(grpId -> checkpoint (id, timestamp))}}. For every 
> group, also log message about why the previous checkpoint wasn't successfully 
> reserved.
>  There can be three reasons:
>  ## Previous checkpoint simply isn't present in the history (the oldest is 
> reserved)
>  ## WAL reservation failure (call below returned false)
> {code:java}
> chpEntry = entry(cpTs);
> // If checkpoint WAL history can't be reserved, stop searching.
> boolean reserved = cctx.wal().reserve(chpEntry.checkpointMark());
> if (!reserved)
>   break;{code}
> ## Checkpoint was marked as inapplicable for historical rebalancing
> {code:java}
> for (Integer grpId : new HashSet<>(groupsAndPartitions.keySet()))
>    if (!isCheckpointApplicableForGroup(grpId, chpEntry))
>      groupsAndPartitions.remove(grpId);{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12795) Partially revert changes to atomic caches introduced in IGNITE-11797

2020-04-24 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091370#comment-17091370
 ] 

Ivan Rakov commented on IGNITE-12795:
-

[~ascherbakov] I'll take a look at what's wrong with the TC bot.
LGTM, please merge.

> Partially revert changes to atomic caches introduced in IGNITE-11797
> 
>
> Key: IGNITE-12795
> URL: https://issues.apache.org/jira/browse/IGNITE-12795
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.8
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Blocker
> Fix For: 2.8.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> These changes can trigger node failure during update on backup:
> Better fix for atomics is needed.
> {noformat}
> class org.apache.ignite.IgniteCheckedException: Failed to update the counter 
> [newVal=173, curState=Counter [lwm=172, holes={173=Item [start=173, 
> delta=1]}, maxApplied=174, hwm=173]]
> at 
> org.apache.ignite.internal.processors.cache.PartitionUpdateCounterTrackingImpl.update(PartitionUpdateCounterTrackingImpl.java:152)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.updateCounter(IgniteCacheOffheapManagerImpl.java:1578)
> at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.updateCounter(GridCacheOffheapManager.java:2198)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.nextUpdateCounter(GridDhtLocalPartition.java:995)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.nextPartitionCounter(GridDhtCacheEntry.java:104)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.update(GridCacheMapEntry.java:6434)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6190)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5881)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3995)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3889)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2020)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1656)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1639)
> at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2450)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2311)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3362)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:139)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:311)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:306)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1637)
> at 
> 

[jira] [Commented] (IGNITE-12759) Getting a SecurityContext from GridSecurityProcessor

2020-04-17 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085731#comment-17085731
 ] 

Ivan Rakov commented on IGNITE-12759:
-

[~avinogradov] Looks good.

> Getting a SecurityContext from GridSecurityProcessor
> 
>
> Key: IGNITE-12759
> URL: https://issues.apache.org/jira/browse/IGNITE-12759
> Project: Ignite
>  Issue Type: Improvement
>  Components: security
>Reporter: Denis Garus
>Assignee: Denis Garus
>Priority: Major
>  Labels: iep-41
> Fix For: 2.8.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Extend the _GridSecurityProcessor_ interface by adding _securityContext(UUID 
> subjId)_ method and use this method to get the actual security context.
> h4. Backward compatibility
> The logic of getting security context for Ignite:
>  # Try to get a security context using _ClusterNode_ attributes (as it works 
> now);
>  # Get a security context through _GridSecurityProcessor_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12805) Node fails to restart

2020-04-08 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078461#comment-17078461
 ] 

Ivan Rakov commented on IGNITE-12805:
-

[~slava.koptilin] The fix looks good to me! Please proceed to merge.

> Node fails to restart
> -
>
> Key: IGNITE-12805
> URL: https://issues.apache.org/jira/browse/IGNITE-12805
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.8
>Reporter: Sarunas Valaskevicius
>Assignee: Vyacheslav Koptilin
>Priority: Blocker
> Fix For: 2.8.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 1. Nodes have persistence disabled by default, but there is a data region with 
> persistence enabled.
> 2. A cluster starts fine with the Ignite data directory clean.
> 3. But when the nodes are restarted, they fail and can never join the cluster 
> again:
>  
> {code:java}
> 12:352020-03-19_13:34:30.273 [main-0] ERROR 
> o.a.ignite.internal.IgniteKernal:137 <> - Exception during start processors, 
> node will be stopped and close connections
> java.lang.NullPointerException: null
> at 
> org.apache.ignite.internal.processors.cache.GridCacheUtils.affinityNode(GridCacheUtils.java:1374)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$CachePredicate.dataNode(GridDiscoveryManager.java:3205)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.cacheAffinityNode(GridDiscoveryManager.java:1894)
> at 
> org.apache.ignite.internal.processors.cache.ValidationOnNodeJoinUtils.validate(ValidationOnNodeJoinUtils.java:330)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.createCacheContext(GridCacheProcessor.java:1201)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCacheInRecoveryMode(GridCacheProcessor.java:2291)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.access$1700(GridCacheProcessor.java:202)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor$CacheRecoveryLifecycle.afterBinaryMemoryRestore(GridCacheProcessor.java:5387)
> at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreBinaryMemory(GridCacheDatabaseSharedManager.java:1075)
> at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.startMemoryRestore(GridCacheDatabaseSharedManager.java:2068)
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1254)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2038)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1703)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1117)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:637) 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12797) Add command line option to CommandHandler to be able to see full stack trace and cause exception in log in case of error.

2020-04-02 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073988#comment-17073988
 ] 

Ivan Rakov commented on IGNITE-12797:
-

[~ktkale...@gridgain.com] Please get green TC visa and resolve merge conflicts.

> Add command line option to CommandHandler to be able to see full stack trace 
> and cause exception in log in case of error.
> -
>
> Key: IGNITE-12797
> URL: https://issues.apache.org/jira/browse/IGNITE-12797
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In case of an error, control.sh can print a common error message without any 
> information about the root cause. Printing the full stack trace and cause 
> can ease the analysis. The user should be able to turn this on by launching 
> control.sh with a specific option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12759) Getting a SecurityContext from GridSecurityProcessor

2020-04-01 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073002#comment-17073002
 ] 

Ivan Rakov commented on IGNITE-12759:
-

[~garus.d.g] Looks good to me.

> Getting a SecurityContext from GridSecurityProcessor
> 
>
> Key: IGNITE-12759
> URL: https://issues.apache.org/jira/browse/IGNITE-12759
> Project: Ignite
>  Issue Type: Improvement
>  Components: security
>Reporter: Denis Garus
>Assignee: Denis Garus
>Priority: Major
>  Labels: iep-41
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Extend the _GridSecurityProcessor_ interface by adding _securityContext(UUID 
> subjId)_ method and use this method to get the actual security context.
> h4. Backward compatibility
> The logic of getting security context for Ignite:
>  # Try to get a security context using _ClusterNode_ attributes (as it works 
> now);
>  # Get a security context through _GridSecurityProcessor_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12759) Getting a SecurityContext from GridSecurityProcessor

2020-03-30 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071160#comment-17071160
 ] 

Ivan Rakov commented on IGNITE-12759:
-

[~garus.d.g] Please check comments in the PR.

> Getting a SecurityContext from GridSecurityProcessor
> 
>
> Key: IGNITE-12759
> URL: https://issues.apache.org/jira/browse/IGNITE-12759
> Project: Ignite
>  Issue Type: Improvement
>  Components: security
>Reporter: Denis Garus
>Assignee: Denis Garus
>Priority: Major
>  Labels: iep-41
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Extend the _GridSecurityProcessor_ interface by adding _securityContext(UUID 
> subjId)_ method and use this method to get the actual security context.
> h4. Backward compatibility
> The logic of getting security context for Ignite:
>  # Try to get a security context using _ClusterNode_ attributes (as it works 
> now);
>  # Get a security context through _GridSecurityProcessor_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12759) Getting a SecurityContext from GridSecurityProcessor

2020-03-27 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068821#comment-17068821
 ] 

Ivan Rakov commented on IGNITE-12759:
-

[~garus.d.g] Thanks! 
Last commit was 9 days ago, perhaps you forgot to push your changes.

> Getting a SecurityContext from GridSecurityProcessor
> 
>
> Key: IGNITE-12759
> URL: https://issues.apache.org/jira/browse/IGNITE-12759
> Project: Ignite
>  Issue Type: Improvement
>  Components: security
>Reporter: Denis Garus
>Assignee: Denis Garus
>Priority: Major
>  Labels: iep-41
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Extend the _GridSecurityProcessor_ interface by adding _securityContext(UUID 
> subjId)_ method and use this method to get the actual security context.
> h4. Backward compatibility
> The logic of getting security context for Ignite:
>  # Try to get a security context using _ClusterNode_ attributes (as it works 
> now);
>  # Get a security context through _GridSecurityProcessor_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (IGNITE-12759) Getting a SecurityContext from GridSecurityProcessor

2020-03-27 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068420#comment-17068420
 ] 

Ivan Rakov edited comment on IGNITE-12759 at 3/27/20, 8:42 AM:
---

[~garus.d.g] In general, changes look good to me.
 I'd propose to describe in the javadocs of IgniteSecurityProcessor and 
GridSecurityProcessor the difference between their areas of responsibility.
 The picture that I have right now in my head:

*GridSecurityProcessor is responsible for:*
 - Node authentication
 - Thin client authentication
 - Providing configuration info whether global node authentication is enabled
 - Keeping and propagating all authenticated security subjects
 - Providing configuration info whether security mode is enabled at all
 - Handling expired sessions
 - Providing configuration info whether sandbox is enabled
 - Keeping and propagating authenticated security contexts for thin clients
 - Authorizing specific operations (cache put, task execute, so on) when 
session security context is set

*IgniteSecurityProcessor is responsible for:*
 - Keeping and propagating authenticated security contexts for cluster nodes
 - Delegating calls for all aforementioned actions to GridSecurityProcessor 
(IgniteSecurityProcessor serves here as a facade which is exposed to Ignite 
internal code, while GridSecurityProcessor is hidden and managed by 
IgniteSecurityProcessor)
 - Managing sandbox and providing a point of entry to the internal sandbox API

Also, javadocs should answer the following questions:
 - Difference between security subject and security context
 - Authentication and authorization flow (authenticate call creates and ensures 
further propagation of the security subject / context or subject ID; 
#withContext(ctx / subjId) forces the current thread to perform operations in 
secure mode; #authorize called in secure mode performs an actual permission 
check)
 - Whether GridSecurityProcessor is responsible for keeping authenticated 
subjects of cluster nodes (or authenticatedSubjects should return only 
instances for thin clients)
- Advice on how to implement user-specific security (the common pattern here is 
to embed custom GridSecurityProcessor via Ignite plugin and keep default 
IgniteSecurityProcessor)

I believe that while our security API is messy, presence of this information in 
the javadocs will help other Ignite contributors and users a lot in untangling 
the security logic.


was (Author: ivan.glukos):
[~garus.d.g] In general, changes look good to me.
 I'd propose to describe in the javadocs of IgniteSecurityProcessor and 
GridSecurityProcessor the difference between their areas of responsibilities.
 The picture that I have right now in my head:

*GridSecurityProcessor is responsible for:*
 - Node authentication
 - Thin client authentication
 - Providing configuration info whether global node authentication is enabled
 - Keeping and propagating all authenticated security subjects
 - Providing configuration info whether security mode is enabled at all
 - Handling expired sessions
 - Providing configuration info whether sandbox is enabled
 - Keeping and propagating authenticated security contexts for thin clients
 - Authorizing specific operations (cache put, task execute, so on) when 
session security context is set

*IgniteSecurityProcessor is responsible for:*
 - Keeping and propagating authenticated security contexts for cluster nodes
 - Delegating calls for all aforementioned actions to GridSecurityProcessor 
(IgniteSecurityProcessor serves here as a facade which is exposed to Ignite 
internal code, while GridSecurityProcessor is hidden and managed by 
IgniteSecurityProcessor)
 - Managing sandbox and providing a point of entry to the internal sandbox API

Also, javadocs should answer the following questions:
 - Difference between security subject and security context
 - Authentication and authorization flow (authenticate call creates and ensures 
further propagation of the security subject / context or subject ID; 
#withContext(ctx / subjId) forces the current thread to perform operations in 
secure mode; #authorize called in secure mode performs an actual permission 
check)
 - Whether GridSecurityProcessor is responsible for keeping authenticated 
subjects of cluster nodes (or authenticatedSubjects should return only 
instances for thin clients)
- Advice on how to implement user-specific security (the common pattern here is 
to embed custom GridSecurityProcessor via Ignite plugin and keep default 
IgniteSecurityProcessor)

I believe that while our security API is messy, presence of this information in 
the javadocs will help other Ignite contributors and users a lot in untangling 
the security logic.

> Getting a SecurityContext from GridSecurityProcessor
> 
>
> Key: IGNITE-12759
> URL: 

[jira] [Commented] (IGNITE-12759) Getting a SecurityContext from GridSecurityProcessor

2020-03-27 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068420#comment-17068420
 ] 

Ivan Rakov commented on IGNITE-12759:
-

[~garus.d.g] In general, changes look good to me.
 I'd propose to describe in the javadocs of IgniteSecurityProcessor and 
GridSecurityProcessor the difference between their areas of responsibilities.
 The picture that I have right now in my head:

*GridSecurityProcessor is responsible for:*
 - Node authentication
 - Thin client authentication
 - Providing configuration info whether global node authentication is enabled
 - Keeping and propagating all authenticated security subjects
 - Providing configuration info whether security mode is enabled at all
 - Handling expired sessions
 - Providing configuration info whether sandbox is enabled
 - Keeping and propagating authenticated security contexts for thin clients
 - Authorizing specific operations (cache put, task execute, so on) when 
session security context is set

*IgniteSecurityProcessor is responsible for:*
 - Keeping and propagating authenticated security contexts for cluster nodes
 - Delegating calls for all aforementioned actions to GridSecurityProcessor 
(IgniteSecurityProcessor serves here as a facade which is exposed to Ignite 
internal code, while GridSecurityProcessor is hidden and managed by 
IgniteSecurityProcessor)
 - Managing sandbox and providing a point of entry to the internal sandbox API

Also, javadocs should answer the following questions:
 - Difference between security subject and security context
 - Authentication and authorization flow (authenticate call creates and ensures 
further propagation of the security subject / context or subject ID; 
#withContext(ctx / subjId) forces the current thread to perform operations in 
secure mode; #authorize called in secure mode performs an actual permission 
check)
 - Whether GridSecurityProcessor is responsible for keeping authenticated 
subjects of cluster nodes (or authenticatedSubjects should return only 
instances for thin clients)
- Advice on how to implement user-specific security (the common pattern here is 
to embed custom GridSecurityProcessor via Ignite plugin and keep default 
IgniteSecurityProcessor)

I believe that while our security API is messy, presence of this information in 
the javadocs will help other Ignite contributors and users a lot in untangling 
the security logic.

> Getting a SecurityContext from GridSecurityProcessor
> 
>
> Key: IGNITE-12759
> URL: https://issues.apache.org/jira/browse/IGNITE-12759
> Project: Ignite
>  Issue Type: Improvement
>  Components: security
>Reporter: Denis Garus
>Assignee: Denis Garus
>Priority: Major
>  Labels: iep-41
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Extend the _GridSecurityProcessor_ interface by adding _securityContext(UUID 
> subjId)_ method and use this method to get the actual security context.
> h4. Backward compatibility
> The logic of getting security context for Ignite:
>  # Try to get a security context using _ClusterNode_ attributes (as it works 
> now);
>  # Get a security context through _GridSecurityProcessor_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12709) Server latch initialized after client latch in Zookeeper discovery

2020-03-23 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12709:

Release Note: Fixed potential partition map exchange hanging on Zookeeper 
discovery clusters

> Server latch initialized after client latch in Zookeeper discovery
> --
>
> Key: IGNITE-12709
> URL: https://issues.apache.org/jira/browse/IGNITE-12709
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The coordinator node misses the latch message from the client because it doesn't 
> receive the exchange trigger message. This leads to an infinite wait for an 
> answer from the coordinator.
> {noformat}
> [2019-10-23 
> 12:49:42,110]\[ERROR]\[sys-#39470%continuous.GridEventConsumeSelfTest0%]\[GridIoManager]
>  An error occurred processing the message \[msg=GridIoMessage \[plc=2, 
> topic=TOPIC_EXCHANGE, topicOrd=31, ordered=fa
> lse, timeout=0, skipOnTimeout=false, 
> msg=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.LatchAckMessage@7699f4f2],
>  nodeId=857a40a8-f384-4740-816c-dd54d3a1].
> class org.apache.ignite.IgniteException: Topology AffinityTopologyVersion 
> \[topVer=54, minorTopVer=0] not found in discovery history ; consider 
> increasing IGNITE_DISCOVERY_HISTORY_SIZE property. Current value is
> -1
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.aliveNodesForTopologyVer(ExchangeLatchManager.java:292)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.getLatchCoordinator(ExchangeLatchManager.java:334)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.processAck(ExchangeLatchManager.java:379)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.lambda$new$0(ExchangeLatchManager.java:119)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1632)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1252)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:143)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1143)
> at 
> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> [2019-10-23 12:50:02,106]\[WARN 
> ]\[exchange-worker-#39517%continuous.GridEventConsumeSelfTest1%]\[GridDhtPartitionsExchangeFuture]
>  Unable to await partitions release latch within timeout: ClientLatch 
> \[coordinator=ZookeeperClusterNode \[id=760ca6b5-f30b-4c40-81b1-5b602c20, 
> addrs=\[127.0.0.1], order=1, loc=false, client=false], ackSent=true, 
> super=CompletableLatch \[id=CompletableLatchUid \[id=exchange, 
> topVer=AffinityTopologyVersion \[topVer=54, minorTopVer=0
> [2019-10-23 12:50:02,192]\[WARN 
> ]\[exchange-worker-#39469%continuous.GridEventConsumeSelfTest0%]\[GridDhtPartitionsExchangeFuture]
>  Unable to await partitions release latch within timeout: ServerLatch 
> \[permits=1, pendingAcks=HashSet \[06c3094b-c1f3-4fe8-81e8-22cb6602], 
> super=CompletableLatch \[id=CompletableLatchUid \[id=exchange, 
> topVer=AffinityTopologyVersion \[topVer=54, minorTopVer=0
> {noformat}
> Reproduced by 
> org.apache.ignite.internal.processors.continuous.GridEventConsumeSelfTest#testMultithreadedWithNodeRestart



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-12622) Forbid mixed cache groups with both atomic and transactional caches

2020-03-18 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov reassigned IGNITE-12622:
---

Assignee: Ivan Rakov

> Forbid mixed cache groups with both atomic and transactional caches
> ---
>
> Key: IGNITE-12622
> URL: https://issues.apache.org/jira/browse/IGNITE-12622
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Reporter: Ivan Rakov
>Assignee: Ivan Rakov
>Priority: Major
> Fix For: 2.9
>
>
> Apparently it's possible in Ignite to configure a cache group with both 
> ATOMIC and TRANSACTIONAL caches.
> See IgniteCacheGroupsTest#testContinuousQueriesMultipleGroups* tests.
> As per discussed on dev list 
> (http://apache-ignite-developers.2346864.n4.nabble.com/Forbid-mixed-cache-groups-with-both-atomic-and-transactional-caches-td45586.html),
>  the community has concluded that such configurations should be prohibited.
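> For illustration, a configuration of the kind that is to be prohibited (cache and 
> group names are arbitrary); a minimal sketch, not taken from the actual tests:
> {code:java}
> import org.apache.ignite.cache.CacheAtomicityMode;
> import org.apache.ignite.configuration.CacheConfiguration;
> 
> // Two caches sharing one cache group but using different atomicity modes.
> // After this change such a configuration should be rejected on cache start.
> CacheConfiguration<Integer, String> atomicCfg = new CacheConfiguration<>("atomic-cache");
> atomicCfg.setGroupName("mixed-group");
> atomicCfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
> 
> CacheConfiguration<Integer, String> txCfg = new CacheConfiguration<>("tx-cache");
> txCfg.setGroupName("mixed-group");
> txCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
> {code}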



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12746) Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes deadlock

2020-03-18 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12746:

Release Note: Fixed bug in transaction thread chaining mechanism that 
possibly could cause deadlock on concurrent putAll scenarios

> Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes 
> deadlock
> 
>
> Key: IGNITE-12746
> URL: https://issues.apache.org/jira/browse/IGNITE-12746
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Reporter: Ilya Kasnacheev
>Assignee: Ivan Rakov
>Priority: Blocker
> Fix For: 2.8.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After this commit:
> 7d4bb49264b IGNITE-12329 Invalid handling of remote entries causes partition 
> desync and transaction hanging in COMMITTING state.
> the following tests:
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedColocated
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedMixed
> started to be flaky because their ordered putAll operations started 
> deadlocking.
> This is a regression compared to 2.7 and should be fixed, since it may affect 
> production clusters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12746) Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes deadlock

2020-03-13 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058762#comment-17058762
 ] 

Ivan Rakov commented on IGNITE-12746:
-

Cherry-picked to 2.8.1: 8246bd8427a93c3a3706c4e24b46d1c8758579b1

> Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes 
> deadlock
> 
>
> Key: IGNITE-12746
> URL: https://issues.apache.org/jira/browse/IGNITE-12746
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Reporter: Ilya Kasnacheev
>Assignee: Ivan Rakov
>Priority: Blocker
> Fix For: 2.8.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After this commit:
> 7d4bb49264b IGNITE-12329 Invalid handling of remote entries causes partition 
> desync and transaction hanging in COMMITTING state.
> the following tests:
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedColocated
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedMixed
> started to be flaky because their ordered putAll operations started 
> deadlocking.
> This is a regression compared to 2.7 and should be fixed, since it may affect 
> production clusters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12746) Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes deadlock

2020-03-13 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058761#comment-17058761
 ] 

Ivan Rakov commented on IGNITE-12746:
-

[~ascherbakov], agreed. Test added.
Merged to master.

> Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes 
> deadlock
> 
>
> Key: IGNITE-12746
> URL: https://issues.apache.org/jira/browse/IGNITE-12746
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Reporter: Ilya Kasnacheev
>Assignee: Ivan Rakov
>Priority: Blocker
> Fix For: 2.8.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After this commit:
> 7d4bb49264b IGNITE-12329 Invalid handling of remote entries causes partition 
> desync and transaction hanging in COMMITTING state.
> the following tests:
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedColocated
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedMixed
> started to be flaky because their ordered putAll operations started 
> deadlocking.
> This is a regression compared to 2.7 and should be fixed, since it may affect 
> production clusters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12746) Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes deadlock

2020-03-12 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057915#comment-17057915
 ] 

Ivan Rakov commented on IGNITE-12746:
-

[~ilyak]
> What is the scope of this problem?
It's a rare concurrent scenario which is possible with optimistic transactions 
(putAll to a transactional cache works as an optimistic + read_committed transaction).
Here's the flow that leads to the deadlock:
1) TX 1 adds MVCC candidates on primary node for keys 1, 3 and 5 (addLocal call 
on prepare phase)
2) TX 2 adds MVCC candidates on primary node for keys 2, 3 and 6 (addLocal call 
on prepare phase, XID 1 is first in candidates queue for key 3)
3) TX 2 acquires lock for key 2 (readyLocks call on prepare phase)
4) TX 1 acquires lock for key 1 (readyLocks call on prepare phase)
5) TX 2 tries to acquire lock for 3 (unsuccessfully: TX 1 becomes an owner 
instead: it's first in the queue and its previous chain item [key 1] in the 
thread chain was concurrently owned by TX 1) 
6) Neither TX 1 nor TX 2 continues processing of its thread chain
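
For illustration, a minimal sketch of this interleaving (the key sets match steps 1-2 
above); an actual reproduction needs many iterations and specific timing, as in the 
tests referenced in the issue:
{code:java}
import java.util.TreeMap;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class SortedPutAllSketch {
    public static void main(String[] args) throws Exception {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<>("tx-cache");
            ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

            IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(ccfg);

            // TX 1: sorted putAll of keys 1, 3, 5.
            Thread t1 = new Thread(() -> {
                TreeMap<Integer, Integer> batch = new TreeMap<>();
                batch.put(1, 1); batch.put(3, 3); batch.put(5, 5);
                cache.putAll(batch);
            });

            // TX 2: sorted putAll of keys 2, 3, 6 (shares key 3 with TX 1).
            Thread t2 = new Thread(() -> {
                TreeMap<Integer, Integer> batch = new TreeMap<>();
                batch.put(2, 2); batch.put(3, 3); batch.put(6, 6);
                cache.putAll(batch);
            });

            t1.start(); t2.start();
            t1.join(); t2.join();
        }
    }
}
{code}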

[~ascherbakov] 
I agree with all of your propositions. I guess I need a fresh TC visa then.

> Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes 
> deadlock
> 
>
> Key: IGNITE-12746
> URL: https://issues.apache.org/jira/browse/IGNITE-12746
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Reporter: Ilya Kasnacheev
>Assignee: Ivan Rakov
>Priority: Blocker
> Fix For: 2.8.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After this commit:
> 7d4bb49264b IGNITE-12329 Invalid handling of remote entries causes partition 
> desync and transaction hanging in COMMITTING state.
> the following tests:
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedColocated
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedMixed
> started to be flaky because their ordered putAll operations started 
> deadlocking.
> This is a regression compared to 2.7 and should be fixed, since it may affect 
> production clusters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12746) Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes deadlock

2020-03-11 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056745#comment-17056745
 ] 

Ivan Rakov commented on IGNITE-12746:
-

[~ilyak] [~ascherbakov] Folks, can you please take a look?

> Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes 
> deadlock
> 
>
> Key: IGNITE-12746
> URL: https://issues.apache.org/jira/browse/IGNITE-12746
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Reporter: Ilya Kasnacheev
>Assignee: Ivan Rakov
>Priority: Blocker
> Fix For: 2.8.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After this commit:
> 7d4bb49264b IGNITE-12329 Invalid handling of remote entries causes partition 
> desync and transaction hanging in COMMITTING state.
> the following tests:
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedColocated
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedMixed
> started to be flaky because their ordered putAll operations started 
> deadlocking.
> This is a regression compared to 2.7 and should be fixed, since it may affect 
> production clusters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-12746) Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes deadlock

2020-03-05 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov reassigned IGNITE-12746:
---

Assignee: Ivan Rakov  (was: Alexey Scherbakov)

> Regression in GridCacheColocatedDebugTest: putAll of sorted keys causes 
> deadlock
> 
>
> Key: IGNITE-12746
> URL: https://issues.apache.org/jira/browse/IGNITE-12746
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Reporter: Ilya Kasnacheev
>Assignee: Ivan Rakov
>Priority: Blocker
> Fix For: 2.8
>
>
> After this commit:
> 7d4bb49264b IGNITE-12329 Invalid handling of remote entries causes partition 
> desync and transaction hanging in COMMITTING state.
> the following tests:
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedColocated
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCacheColocatedDebugTest#testPutsMultithreadedMixed
> started to be flaky because their ordered putAll operations started 
> deadlocking.
> This is a regression compared to 2.7 and should be fixed, since it may affect 
> production clusters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12656) Cleanup GridCacheProcessor from functionality not related to its responsibility

2020-02-28 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047745#comment-17047745
 ] 

Ivan Rakov commented on IGNITE-12656:
-

[~sk0x50] Looks good to me.

> Cleanup GridCacheProcessor from functionality not related to its 
> responsibility
> ---
>
> Key: IGNITE-12656
> URL: https://issues.apache.org/jira/browse/IGNITE-12656
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.8
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we have some functionality in GridCacheProcessor that is not 
> directly related to its responsibility, like:
> * initQueryStructuresForNotStartedCache
> * addRemovedItemsCleanupTask
> * setTxOwnerDumpRequestsAllowed
> * longTransactionTimeDumpThreshold
> * transactionTimeDumpSamplesCoefficient
> * longTransactionTimeDumpSamplesPerSecondLimit
> * broadcastToNodesSupportingFeature
> * LocalAffinityFunction
> * RemovedItemsCleanupTask
> * TxTimeoutOnPartitionMapExchangeChangeFuture
> * enableRebalance
> We need to move them to the right places and make GridCacheProcessor code 
> cleaner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12509) CACHE_REBALANCE_STOPPED event raises for wrong caches in case of specified RebalanceDelay

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12509:

Labels: newbie  (was: )

> CACHE_REBALANCE_STOPPED event raises for wrong caches in case of specified 
> RebalanceDelay
> -
>
> Key: IGNITE-12509
> URL: https://issues.apache.org/jira/browse/IGNITE-12509
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Rakov
>Priority: Major
>  Labels: newbie
> Fix For: 2.9
>
> Attachments: RebalanceDelayTest.java
>
>
> Steps to reproduce:
> 1. Start in-memory cluster with 2 server nodes
> 2. Start 3 caches with different rebalance delays (e.g. 5, 10 and 15 seconds) 
> and upload some data
> 3. Start localListener for EVT_CACHE_REBALANCE_STOPPED event on one of the 
> nodes.
> 4. Start one more server node.
> 5. Wait for 5 seconds, till rebalance delay is reached.
> 6. EVT_CACHE_REBALANCE_STOPPED event received 3 times (1 for each cache), but 
> in fact only 1 cache was rebalanced. The same happens for the rest of the 
> caches.
> As a result, on rebalance finish we're getting the event for each cache 
> [CACHE_COUNT] times, instead of 1.
> Reproducer attached.
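> For reference, a minimal local listener of the kind used in step 3 (the event type 
> must be enabled via IgniteConfiguration#setIncludeEventTypes); a sketch, not the 
> attached reproducer:
> {code:java}
> import org.apache.ignite.Ignite;
> import org.apache.ignite.events.CacheRebalancingEvent;
> import org.apache.ignite.events.Event;
> import org.apache.ignite.events.EventType;
> import org.apache.ignite.lang.IgnitePredicate;
> 
> public class RebalanceStoppedListener {
>     public static void listen(Ignite ignite) {
>         ignite.events().localListen((IgnitePredicate<Event>)evt -> {
>             // With the bug, this fires once per configured cache on every rebalance
>             // completion instead of only for the cache that was actually rebalanced.
>             System.out.println("Rebalance stopped: " + ((CacheRebalancingEvent)evt).cacheName());
> 
>             return true; // Keep listening.
>         }, EventType.EVT_CACHE_REBALANCE_STOPPED);
>     }
> }
> {code}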



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12508) GridCacheProcessor#cacheDescriptor(int) has O(N) complexity

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12508:

Labels: newbie  (was: )

> GridCacheProcessor#cacheDescriptor(int) has O(N) complexity
> ---
>
> Key: IGNITE-12508
> URL: https://issues.apache.org/jira/browse/IGNITE-12508
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Rakov
>Priority: Major
>  Labels: newbie
> Fix For: 2.9
>
>
> See the method code:
> {code}
> @Nullable public DynamicCacheDescriptor cacheDescriptor(int cacheId) {
> for (DynamicCacheDescriptor cacheDesc : cacheDescriptors().values()) {
> CacheConfiguration ccfg = cacheDesc.cacheConfiguration();
> assert ccfg != null : cacheDesc;
> if (CU.cacheId(ccfg.getName()) == cacheId)
> return cacheDesc;
> }
> return null;
> }
> {code}
> This method is invoked in several hot paths (for example, logical recovery and 
> the security check for indexing), which causes a significant performance 
> regression when the number of caches is large.
> The method should be improved to use a hash map or a similar data structure to 
> get better complexity.
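> A minimal sketch of the suggested improvement: keep an id -> descriptor index next 
> to the name-keyed registry and update it on cache start/stop (illustrative helper 
> names, not the actual patch):
> {code:java}
> private final ConcurrentMap<Integer, DynamicCacheDescriptor> cacheDescById = new ConcurrentHashMap<>();
> 
> void onDescriptorRegistered(DynamicCacheDescriptor desc) {
>     cacheDescById.put(CU.cacheId(desc.cacheConfiguration().getName()), desc);
> }
> 
> void onDescriptorRemoved(DynamicCacheDescriptor desc) {
>     cacheDescById.remove(CU.cacheId(desc.cacheConfiguration().getName()));
> }
> 
> @Nullable public DynamicCacheDescriptor cacheDescriptor(int cacheId) {
>     return cacheDescById.get(cacheId); // O(1) instead of a scan over all descriptors.
> }
> {code}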



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12510) In-memory page eviction may fail in case very large entries are stored in the cache

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12510:

Labels: newbie  (was: )

> In-memory page eviction may fail in case very large entries are stored in the 
> cache
> ---
>
> Key: IGNITE-12510
> URL: https://issues.apache.org/jira/browse/IGNITE-12510
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7.6
>Reporter: Ivan Rakov
>Priority: Major
>  Labels: newbie
>
> In-memory page eviction (both DataPageEvictionMode#RANDOM_LRU and 
> DataPageEvictionMode#RANDOM_2_LRU) has limited number of attempts to choose 
> candidate page for data removal:
> {code:java}
> if (sampleSpinCnt > SAMPLE_SPIN_LIMIT) { // 5000
> LT.warn(log, "Too many attempts to choose data page: " + 
> SAMPLE_SPIN_LIMIT);
> return;
> }
> {code}
> Large data entries are stored in several data pages which are sequentially 
> linked to each other. Only "head" pages are suitable as candidates for 
> eviction, because the whole entry is available only from "head" page (list of 
> pages is singly linked; there are no reverse links from tail to head).
> The problem is that if we put very large entries to evictable cache (e.g. 
> each entry needs more than 5000 pages to be stored), there are too few head 
> pages and "Too many attempts to choose data page" error is likely to show up.
> We need to perform something like a full scan if we fail to find a head page 
> in SAMPLE_SPIN_LIMIT attempts, instead of just failing the node with an error.
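> A sketch of the suggested fallback, assuming hypothetical page-access helpers 
> (randomPageIndex, isHeadDataPage, evictDataPage, pageCount); not the actual 
> eviction tracker code:
> {code:java}
> abstract class EvictionWithFallback {
>     abstract long pageCount();
>     abstract long randomPageIndex();
>     abstract boolean isHeadDataPage(long pageIdx);
>     abstract boolean evictDataPage(long pageIdx);
> 
>     boolean evict(int sampleSpinLimit) {
>         // Current behaviour: random sampling bounded by SAMPLE_SPIN_LIMIT.
>         for (int attempt = 0; attempt < sampleSpinLimit; attempt++) {
>             long pageIdx = randomPageIndex();
> 
>             if (isHeadDataPage(pageIdx) && evictDataPage(pageIdx))
>                 return true;
>         }
> 
>         // Proposed fallback: full scan instead of failing with
>         // "Too many attempts to choose data page".
>         for (long pageIdx = 0; pageIdx < pageCount(); pageIdx++) {
>             if (isHeadDataPage(pageIdx) && evictDataPage(pageIdx))
>                 return true;
>         }
> 
>         return false; // Nothing evictable found.
>     }
> }
> {code}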



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12507) Implement cache size metric in bytes

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12507:

Labels: newbie  (was: )

> Implement cache size metric in bytes
> 
>
> Key: IGNITE-12507
> URL: https://issues.apache.org/jira/browse/IGNITE-12507
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Reporter: Ivan Rakov
>Priority: Major
>  Labels: newbie
> Fix For: 2.9
>
>
> There is a need to have cache size in bytes metric for pure in-memory case.
> When all data is in RAM, it is not obvious how much space is consumed by cache 
> data on a running node, as the only things that can be monitored are the number 
> of keys per partition on a specific node and the memory usage metrics on the 
> machine.
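> For comparison, the closest estimate available today is region-level, not per-cache 
> (a sketch; data region metrics must be enabled via 
> DataRegionConfiguration#setMetricsEnabled(true)):
> {code:java}
> import org.apache.ignite.DataRegionMetrics;
> import org.apache.ignite.Ignite;
> 
> public class AllocatedBytesEstimate {
>     /** Rough per-node estimate: allocated pages * page size, per data region. */
>     public static void print(Ignite ignite, int pageSize) {
>         for (DataRegionMetrics m : ignite.dataRegionMetrics())
>             System.out.println(m.getName() + ": ~" + m.getTotalAllocatedPages() * pageSize + " bytes");
>     }
> }
> {code}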



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12297) Detect lost partitions is not happened during cluster activation

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12297:

Labels: newbie  (was: )

> Detect lost partitions is not happened during cluster activation
> 
>
> Key: IGNITE-12297
> URL: https://issues.apache.org/jira/browse/IGNITE-12297
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.4
>Reporter: Pavel Kovalenko
>Priority: Major
>  Labels: newbie
>
> We invoke `detectLostPartitions` during PME only if there is a server join or 
> a server leave.
> However,  we can activate a persistent cluster where a partition may have 
> MOVING status on all nodes. In this case, a partition may stay in MOVING 
> state forever before any other topology event. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-6820) Add data regions to data structures configuration

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-6820:
---
Labels: newbie  (was: )

> Add data regions to data structures configuration
> -
>
> Key: IGNITE-6820
> URL: https://issues.apache.org/jira/browse/IGNITE-6820
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.3
>Reporter: Alexey Goncharuk
>Priority: Major
>  Labels: newbie
> Fix For: 2.9
>
>
> Data structures configuration has cache group name but misses data region 
> name. This makes it tricky to move data structures to a different data 
> region. More specifically, it's hard to have default data region with 
> persistence disabled but configure data structures to be persistent.
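> For illustration, what the configuration could look like (the data region setter is 
> the proposed, not yet existing API; names are arbitrary):
> {code:java}
> import org.apache.ignite.configuration.AtomicConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> 
> AtomicConfiguration atomicCfg = new AtomicConfiguration();
> atomicCfg.setGroupName("ds-group");              // Already possible today.
> // atomicCfg.setDataRegionName("persistence");   // Proposed addition, hypothetical API.
> 
> IgniteConfiguration cfg = new IgniteConfiguration();
> cfg.setAtomicConfiguration(atomicCfg);
> {code}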



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-7414) Cluster restart may lead to cluster activation error

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-7414:
---
Labels: newbie  (was: )

> Cluster restart may lead to cluster activation error
> 
>
> Key: IGNITE-7414
> URL: https://issues.apache.org/jira/browse/IGNITE-7414
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.3
>Reporter: Vyacheslav Koptilin
>Assignee: Alexey Goncharuk
>Priority: Critical
>  Labels: newbie
> Fix For: 2.8
>
> Attachments: Reproducer.java
>
>
> An attempt to execute the following reproducer twice results in an error (please 
> see attached)
> {code}
> public static void main(String[] args) throws IgniteException {
> Ignite ignite = Ignition.start(createIgniteConfiguration());
> ignite.active(true);
> CacheConfiguration cacheCfg = new CacheConfiguration("test-cache");
> cacheCfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
> cacheCfg.setCacheMode(CacheMode.PARTITIONED);
> cacheCfg.setDataRegionName("inmemory");
> cacheCfg.setIndexedTypes(Integer.class, String.class);
> IgniteCache cache = ignite.getOrCreateCache(cacheCfg);
> cache.put(42, "value-42");
> ignite.close();
> }
> private static IgniteConfiguration createIgniteConfiguration() {
> IgniteConfiguration ignCfg = new IgniteConfiguration();
> ...
> DataStorageConfiguration dc = new DataStorageConfiguration();
> // persistence enabled region
> DataRegionConfiguration persistenceRegion = new 
> DataRegionConfiguration();
> persistenceRegion.setName("persistence");
> persistenceRegion.setPersistenceEnabled(true);
> // persistence disabled region
> DataRegionConfiguration inmemoryRegion = new 
> DataRegionConfiguration();
> inmemoryRegion.setName("inmemory");
> inmemoryRegion.setInitialSize(100L * 1024 * 1024);
> inmemoryRegion.setMaxSize(500L * 1024 * 1024);
> inmemoryRegion.setPersistenceEnabled(false);
> dc.setDataRegionConfigurations(persistenceRegion, inmemoryRegion);
> ignCfg.setDataStorageConfiguration(dc);
> return ignCfg;
> }
> {code}
> The second execution failed with the exception as follows:
> {code}
> [2018-01-15 
> 17:12:52,431][ERROR][exchange-worker-#42%test-grid%][GridDhtPartitionsExchangeFuture]
>  Failed to activate node components 
> [nodeId=4ba59aa1-cbc5-4d67-8ca5-b6dd1628c5dc, client=false, 
> topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1]]
> class org.apache.ignite.IgniteCheckedException: Failed to find cache group 
> descriptor [grpId=623628935]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1602)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1544)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:570)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:820)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:583)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>   at java.lang.Thread.run(Thread.java:748)
> Exception in thread "main" class org.apache.ignite.IgniteException: Failed to 
> activate cluster
>   at 
> org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:980)
>   at 
> org.apache.ignite.internal.IgniteKernal.active(IgniteKernal.java:3318)
>   at org.apache.ignite.examples.Reproducer.main(Reproducer.java:33)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to activate 
> cluster
>   Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to 
> find cache group descriptor [grpId=623628935]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1602)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1544)
>   at 
> 

[jira] [Updated] (IGNITE-8063) Transaction rollback is unmanaged in case when commit produced Runtime exception

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-8063:
---
Labels: newbie  (was: )

> Transaction rollback is unmanaged in case when commit produced Runtime 
> exception
> 
>
> Key: IGNITE-8063
> URL: https://issues.apache.org/jira/browse/IGNITE-8063
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.4
>Reporter: Pavel Kovalenko
>Assignee: Pavel Kovalenko
>Priority: Minor
>  Labels: newbie
> Fix For: 2.9
>
>
> When 'userCommit' produces a runtime exception, the transaction state is moved to 
> UNKNOWN and tx.finishFuture() completes; after that the rollback process runs 
> asynchronously and there is no simple way to await rollback completion for 
> such transactions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-9153) Accessing cache from transaction on client node, where it was not accessed yet throws an exception

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-9153:
---
Labels: newbie  (was: )

> Accessing cache from transaction on client node, where it was not accessed 
> yet throws an exception
> --
>
> Key: IGNITE-9153
> URL: https://issues.apache.org/jira/browse/IGNITE-9153
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Evgenii Zhuravlev
>Priority: Major
>  Labels: newbie
> Attachments: ClientCacheTransactionsTest.java
>
>
> Exception message: Cannot start/stop cache within lock or transaction. 
> Reproducer is attached: ClientCacheTransactionsTest.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12315) In case of using ArrayBlockingQueue as key, cache.get() returns null.

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12315:

Labels: newbie  (was: )

> In case of using ArrayBlockingQueue as key, cache.get() returns null.
> -
>
> Key: IGNITE-12315
> URL: https://issues.apache.org/jira/browse/IGNITE-12315
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: newbie
>
> When an ArrayBlockingQueue is used as a key, cache.get() returns null. When an 
> ArrayList or a LinkedList is used, everything works as expected.
> {code:java}
> ArrayBlockingQueue queueToCheck = new ArrayBlockingQueue<>(5);
> queueToCheck.addAll(Arrays.asList(1, 2, 3));
> cache.put(queueToCheck, "aaa");
> cache.get(queueToCheck); // returns null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12314) Unexpected return type in case of retrieving Byte[]{1,2,3} from cache value.

2020-02-12 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12314:

Labels: newbie  (was: )

> Unexpected return type in case of retrieving Byte[]{1,2,3} from cache value.
> 
>
> Key: IGNITE-12314
> URL: https://issues.apache.org/jira/browse/IGNITE-12314
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: newbie
>
> Unexpected return type in case of retrieving Byte[]\{1,2,3} from cache value:
> {code:java}
> cache.put("aaa", new Byte[] {1, 2, 3});
> cache.get("aaa");{code}
> Byte[3]@... with the corresponding content is expected, however Object[3]@... is returned.
>  
> It seems to be related to primitive wrappers, because String[] as a value works 
> as expected: 
> {code:java}
> cache.put("aaa", new String[] {"1", "2", "3"}); 
> cache.get("aaa");{code}
>  
> Arrays of primitives also work as expected:
> {code:java}
> cache.put("aaa", new byte[] {1, 2, 3});
> cache.get("aaa");{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12621) Node leave may cause NullPointerException during IO message processing if security is enabled

2020-02-07 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12621:

Release Note: Fixed possible NPE in security processor that can be 
reproduced on node leave under load

> Node leave may cause NullPointerException during IO message processing if 
> security is enabled
> -
>
> Key: IGNITE-12621
> URL: https://issues.apache.org/jira/browse/IGNITE-12621
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.8
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If the node receives an IO message from a dead node *after* receiving the 
> discovery message about the node failure, {{ctx.discovery().node(uuid)}} will return 
> {{null}}, which in turn will cause a {{NullPointerException}}.
> We can fix it by peeking into the discovery cache history to retrieve the attributes 
> of the dead node.
> See:
> {code}
> /** {@inheritDoc} */
> @Override public OperationSecurityContext withContext(UUID nodeId) {
> return withContext(
> secCtxs.computeIfAbsent(nodeId,
> uuid -> nodeSecurityContext(
> marsh, U.resolveClassLoader(ctx.config()), 
> ctx.discovery().node(uuid)
> )
> )
> );
> }
> {code}
> {noformat}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.ignite.internal.processors.security.SecurityUtils.nodeSecurityContext(SecurityUtils.java:135)
>   at 
> org.apache.ignite.internal.processors.security.IgniteSecurityProcessor.lambda$withContext$0(IgniteSecurityProcessor.java:112)
>   at 
> java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
>   at 
> org.apache.ignite.internal.processors.security.IgniteSecurityProcessor.withContext(IgniteSecurityProcessor.java:111)
> {noformat}
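
A hedged sketch of the proposed direction (not the actual patch): fall back to a history
lookup when the node has already left the topology. The historicalNode(uuid) helper below
is hypothetical and only illustrates the idea of peeking into the discovery cache history.

{code:java}
/** Sketch only: 'historicalNode' is a hypothetical helper, not an existing API. */
@Override public OperationSecurityContext withContext(UUID nodeId) {
    return withContext(
        secCtxs.computeIfAbsent(nodeId,
            uuid -> {
                ClusterNode node = ctx.discovery().node(uuid);

                // The node has already failed: take its attributes from discovery history.
                if (node == null)
                    node = historicalNode(uuid);

                return nodeSecurityContext(marsh, U.resolveClassLoader(ctx.config()), node);
            }
        )
    );
}
{code}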



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12636) Full rebalance instead of a historical one

2020-02-07 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12636:

Summary: Full rebalance instead of a historical one  (was: Full rebalance 
instead of historical one)

> Full rebalance instead of a historical one
> --
>
> Key: IGNITE-12636
> URL: https://issues.apache.org/jira/browse/IGNITE-12636
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Common configuration:
>  1)4 nodes
>  2)3-4 caches
>  3)IGNITE_PDS_WAL_REBALANCE_THRESHOLD=500
>  4)walHistorySize=500
> 5)IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE=500
> Scenario:
>  1)Load a lot of data.
>  2)Start the transaction on some client but DON'T close it.
>  3)Stop the server node.
>  4)Start the server.
>  5)PME should be started
>  6)Kill the client that started the transaction
> Expected result:
>  Rebalance is HISTORICAL.
> Actual result:
>  Rebalance is full:
>  
> {noformat}
> [12:01:58,613][INFO]sys-#95[GridDhtPartitionDemander] Started rebalance 
> routine [cache_group_6, supplier=5462dc46-f71f-49d8-8a1d-d9d69c3e372a, 
> topic=0, fullPartitions=[23], histPartitions=[]]
>  [12:01:58,614][INFO]sys-#109[GridDhtPartitionDemander] Started rebalance 
> routine [cache_group_6, supplier=8ab78982-0bcf-494f-a634-f3fb2d78328f, 
> topic=0, fullPartitions=[1], histPartitions=[]]
>  [12:01:58,614][INFO]sys-#101[GridDhtPartitionDemander] Started rebalance 
> routine [cache_group_6, supplier=8ab78982-0bcf-494f-a634-f3fb2d78328f, 
> topic=1, fullPartitions=[55], histPartitions=[]]
>  [12:01:59,004][INFO]sys-#99[GridDhtPartitionDemander] Started rebalance 
> routine [cache_group_4_118, supplier=5462dc46-f71f-49d8-8a1d-d9d69c3e372a, 
> topic=0, fullPartitions=[5], histPartitions=[]]
>  [12:01:59,004][INFO]sys-#96[GridDhtPartitionDemander] Started rebalance 
> routine [cache_group_4_118, supplier=48e2a2b5-2119-4b5c-873c-eb8d0c436b6a, 
> topic=0, fullPartitions=[15], histPartitions=[]]
>  [12:01:59,196][INFO]sys-#104[GridDhtPartitionDemander] Started rebalance 
> routine [cache_group_2_058, supplier=48e2a2b5-2119-4b5c-873c-eb8d0c436b6a, 
> topic=0, fullPartitions=[21], histPartitions=[]]
>  [12:01:59,196][INFO]sys-#95[GridDhtPartitionDemander] Started rebalance 
> routine [cache_group_2_058, supplier=8ab78982-0bcf-494f-a634-f3fb2d78328f, 
> topic=0, fullPartitions=[19], histPartitions=[]]{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12628) Add tests for jmx metrics return types

2020-02-07 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032460#comment-17032460
 ] 

Ivan Rakov commented on IGNITE-12628:
-

[~vmalinovskiy] Vladimir, hello!
Please pay attention to the Ignite code style.
{code:java}
Set beancClsNames = srv.queryMBeans(null, null).stream()
.map(ObjectInstance::getClassName)
.collect(toSet());
{code}
Indent for continuation lines in a long expression is 4 spaces.
{code:java}
try {
Ignite ignite = startGrid();

validateMbeans(ignite, 
"org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi$ZookeeperDiscoverySpiMBeanImpl");
}
finally {
stopAllGrids();
}
{code}
finally and catch clauses should be on a separate line.

> Add tests for jmx metrics return types
> --
>
> Key: IGNITE-12628
> URL: https://issues.apache.org/jira/browse/IGNITE-12628
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladimir Malinovskiy
>Assignee: Vladimir Malinovskiy
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add tests that check if JMX metrics comply with Oracle best 
> practices([https://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html])
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (IGNITE-12628) Add tests for jmx metrics return types

2020-02-07 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032460#comment-17032460
 ] 

Ivan Rakov edited comment on IGNITE-12628 at 2/7/20 3:35 PM:
-

[~vmalinovskiy] Hello!
Please pay attention to the Ignite code style.
{code:java}
Set beancClsNames = srv.queryMBeans(null, null).stream()
.map(ObjectInstance::getClassName)
.collect(toSet());
{code}
Indent for continuation lines in a long expression is 4 spaces.
{code:java}
try {
Ignite ignite = startGrid();

validateMbeans(ignite, 
"org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi$ZookeeperDiscoverySpiMBeanImpl");
}
finally {
stopAllGrids();
}
{code}
finally and catch clauses should be on a separate line.


was (Author: ivan.glukos):
[~vmalinovskiy] Vladimir, hello!
Please pay attention to the Ignite code style.
{code:java}
Set beancClsNames = srv.queryMBeans(null, null).stream()
.map(ObjectInstance::getClassName)
.collect(toSet());
{code}
Indent for continuation lines in a long expression is 4 spaces.
{code:java}
try {
Ignite ignite = startGrid();

validateMbeans(ignite, 
"org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi$ZookeeperDiscoverySpiMBeanImpl");
}
finally {
stopAllGrids();
}
{code}
finally and catch clauses should be on a separate line.

> Add tests for jmx metrics return types
> --
>
> Key: IGNITE-12628
> URL: https://issues.apache.org/jira/browse/IGNITE-12628
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladimir Malinovskiy
>Assignee: Vladimir Malinovskiy
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add tests that check if JMX metrics comply with Oracle best 
> practices([https://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html])
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12014) Getting affinity for topology version earlier than affinity is calculated for system cache

2020-02-05 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030564#comment-17030564
 ] 

Ivan Rakov commented on IGNITE-12014:
-

Please note that there's another corner case that wasn't fixed in IGNITE-11465.
Most likely it will be resolved soon under IGNITE-12618 (its scenario is very 
similar to the scenario of this issue).

>  Getting affinity for topology version earlier than affinity is calculated 
> for system cache
> ---
>
> Key: IGNITE-12014
> URL: https://issues.apache.org/jira/browse/IGNITE-12014
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7.5
>Reporter: PetrovMikhail
>Priority: Major
> Attachments: JcacheExchangeAwaitTest.java
>
>
> The following exception was occasionally caught on a big cluster (128 
> nodes) after its activation and a concurrent Ignite#reentrantLock() method 
> call from different nodes. (On a 16-node cluster this exception was never 
> detected without code changes.)
> A presumptive reproducer of the problem is attached; it fails reliably 
> with the specified exception.
> {code:java}
> java.lang.IllegalStateException: Getting affinity for topology version 
> earlier than affinity is calculated [locNode=TcpDiscoveryNode 
> [id=cf397493-7528-46dc-bc5a-444f9d51, consistentId=127.0.0.1:47501, 
> addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47501], 
> discPort=47501, order=2, intOrder=2, lastExchangeTime=1564050248387, 
> loc=true, ver=2.8.0#20190725-sha1:, isClient=false], 
> grp=default-volatile-ds-group, topVer=AffinityTopologyVersion [topVer=2, 
> minorTopVer=1], lastAffChangeTopVer=AffinityTopologyVersion [topVer=2, 
> minorTopVer=1], head=AffinityTopologyVersion [topVer=2, minorTopVer=2], 
> history=[AffinityTopologyVersion [topVer=2, minorTopVer=2]]]
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:802)
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:749)
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.nodes(GridAffinityAssignmentCache.java:657)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.nodesByPartition(GridCacheAffinityManager.java:227)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.primaryByPartition(GridCacheAffinityManager.java:273)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.primaryByKey(GridCacheAffinityManager.java:264)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.primaryByKey(GridCacheAffinityManager.java:288)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.entryExx(GridDhtColocatedCache.java:161)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.entryEx(GridNearTxLocal.java:4470)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.enlistRead(GridNearTxLocal.java:2709)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.getAllAsync(GridNearTxLocal.java:2188)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache$4.op(GridDhtColocatedCache.java:204)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter$AsyncOp.op(GridCacheAdapter.java:5644)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.asyncOp(GridCacheAdapter.java:4561)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.getAsync(GridDhtColocatedCache.java:202)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4842)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGet(GridCacheAdapter.java:4808)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1480)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheProxyImpl.get(GridCacheProxyImpl.java:396)
>   at 
> org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$4.applyx(DataStructuresProcessor.java:561)
>   at 
> org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$4.applyx(DataStructuresProcessor.java:556)
>   at 
> org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.retryTopologySafe(DataStructuresProcessor.java:1664)
>   at 
> 

[jira] [Created] (IGNITE-12622) Forbid mixed cache groups with both atomic and transactional caches

2020-02-04 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12622:
---

 Summary: Forbid mixed cache groups with both atomic and 
transactional caches
 Key: IGNITE-12622
 URL: https://issues.apache.org/jira/browse/IGNITE-12622
 Project: Ignite
  Issue Type: Improvement
  Components: cache
Reporter: Ivan Rakov
 Fix For: 2.9


Apparently, it's possible in Ignite to configure a cache group with both ATOMIC 
and TRANSACTIONAL caches; see the 
IgniteCacheGroupsTest#testContinuousQueriesMultipleGroups* tests.
As discussed on the dev list 
(http://apache-ignite-developers.2346864.n4.nabble.com/Forbid-mixed-cache-groups-with-both-atomic-and-transactional-caches-td45586.html),
the community has concluded that such configurations should be prohibited.
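
For illustration, a minimal sketch of a configuration that is currently accepted but should
be rejected once mixed groups are forbidden (cache and group names are made up, and an
already started Ignite instance named ignite is assumed):

{code:java}
// Two caches in the same cache group with different atomicity modes.
CacheConfiguration<Integer, Integer> atomicCfg = new CacheConfiguration<Integer, Integer>("atomic-cache")
    .setGroupName("mixed-group")
    .setAtomicityMode(CacheAtomicityMode.ATOMIC);

CacheConfiguration<Integer, Integer> txCfg = new CacheConfiguration<Integer, Integer>("tx-cache")
    .setGroupName("mixed-group")
    .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

// Today both caches start successfully; after the change this call should fail with a validation error.
ignite.getOrCreateCaches(Arrays.asList(atomicCfg, txCfg));
{code}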



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12607) PartitionsExchangeAwareTest is flaky

2020-01-31 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027682#comment-17027682
 ] 

Ivan Rakov commented on IGNITE-12607:
-

The test doesn't seem to be flaky anymore: 
https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Cache6?branch=pull%2F7339%2Fhead=builds
[~slava.koptilin] Can you please take a look?

> PartitionsExchangeAwareTest is flaky
> 
>
> Key: IGNITE-12607
> URL: https://issues.apache.org/jira/browse/IGNITE-12607
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Rakov
>Assignee: Ivan Rakov
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Proof: 
> https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Cache6/4972239
> Seems like cache update sometimes is not possible even before topologies are 
> locked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12601) DistributedMetaStoragePersistentTest.testUnstableTopology is flaky

2020-01-30 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027061#comment-17027061
 ] 

Ivan Rakov commented on IGNITE-12601:
-

Seems like the test didn't fail on PR branch.
Merged to master and ignite-2.8.

> DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
> --
>
> Key: IGNITE-12601
> URL: https://issues.apache.org/jira/browse/IGNITE-12601
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
> Please take a look at TC:
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=5923369202582779855=testDetails



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12607) PartitionsExchangeAwareTest is flaky

2020-01-30 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12607:
---

 Summary: PartitionsExchangeAwareTest is flaky
 Key: IGNITE-12607
 URL: https://issues.apache.org/jira/browse/IGNITE-12607
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.9


Proof: 
https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Cache6/4972239
Seems like cache update sometimes is not possible even before topologies are 
locked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12447) Modification of S#compact method

2020-01-29 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025874#comment-17025874
 ] 

Ivan Rakov commented on IGNITE-12447:
-

[~ktkale...@gridgain.com] I'm ok with the solution, but can you add a few unit 
tests for the new method?
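
A minimal sketch of what such a unit test could look like; the exact overload signature
(a collection of Numbers) and the compacted output format are assumptions, not the final API:

{code:java}
/** Sketch only: the expected output format is an assumption. */
@Test
public void testCompactNumberCollection() {
    Collection<Long> nums = Arrays.asList(1L, 2L, 3L, 7L, 8L, 10L);

    // Consecutive numbers are expected to be collapsed into ranges.
    assertEquals("[1-3, 7-8, 10]", S.compact(nums));
}
{code}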

> Modification of S#compact method
> 
>
> Key: IGNITE-12447
> URL: https://issues.apache.org/jira/browse/IGNITE-12447
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Modification of S#compact method so that it is possible to pass collection of 
> Numbers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-11797) Fix consistency issues for atomic and mixed tx-atomic cache groups.

2020-01-29 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025866#comment-17025866
 ] 

Ivan Rakov commented on IGNITE-11797:
-

[~ascherbakov] Looks good, please merge.

> Fix consistency issues for atomic and mixed tx-atomic cache groups.
> ---
>
> Key: IGNITE-11797
> URL: https://issues.apache.org/jira/browse/IGNITE-11797
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> IGNITE-10078 only solves consistency problems for tx mode.
> For atomic caches the rebalance consistency issues still remain and should be 
> fixed together with improvement of atomic cache protocol consistency.
> Also, need to disable dynamic start of atomic cache in group having only tx 
> caches because it's not working in current state.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-11797) Fix consistency issues for atomic and mixed tx-atomic cache groups.

2020-01-29 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-11797:

Reviewer: Ivan Rakov

> Fix consistency issues for atomic and mixed tx-atomic cache groups.
> ---
>
> Key: IGNITE-11797
> URL: https://issues.apache.org/jira/browse/IGNITE-11797
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> IGNITE-10078 only solves consistency problems for tx mode.
> For atomic caches the rebalance consistency issues still remain and should be 
> fixed together with improvement of atomic cache protocol consistency.
> Also, need to disable dynamic start of atomic cache in group having only tx 
> caches because it's not working in current state.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12557) Destroy of big cache which is not only cache in cache group causes IgniteOOME

2020-01-27 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024224#comment-17024224
 ] 

Ivan Rakov commented on IGNITE-12557:
-

The fix looks good to me as a workaround. 
I'm ok with merging this patch (it isn't complete, but doesn't make the 
situation worse) and continuing the work on async destroy and crash recovery 
guarantees under the aforementioned separate tickets.
[~alex_pl] Do you agree?

> Destroy of big cache which is not only cache in cache group causes IgniteOOME
> -
>
> Key: IGNITE-12557
> URL: https://issues.apache.org/jira/browse/IGNITE-12557
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Aleksey Plekhanov
>Assignee: Alexey Scherbakov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When {{destroyCache()}} is invoked, the {{checkpointReadLock}} is held by the exchange 
> thread for the whole time cache entries are being cleaned. Meanwhile, the 
> {{db-checkpoint-thread}} can't acquire the checkpoint write lock and can't start a 
> checkpoint. After some time all page memory is filled with dirty pages and an 
> attempt to acquire a new page causes an IgniteOOM exception:
> {noformat}
> class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to 
> find a page for eviction [segmentCapacity=40485, loaded=15881, 
> maxDirtyPages=11910, dirtyPages=15881, cpPages=0, pinnedInSegment=0, 
> failedToPrepare=15881]
> at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.tryToFindSequentially(PageMemoryImpl.java:2420)
> at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.removePageForReplacement(PageMemoryImpl.java:2314)
> at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:743)
> at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:679)
> at 
> org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:158)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.acquirePage(BPlusTree.java:5872)
> at 
> org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compareKeys(CacheDataTree.java:435)
> at 
> org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:384)
> at 
> org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5214)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:5134)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:298)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:5723)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run(BPlusTree.java:278)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:5709)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:169)
> at 
> org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:364)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.read(BPlusTree.java:5910)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2077)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2007)
> at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removex(BPlusTree.java:1838)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.clear(IgniteCacheOffheapManagerImpl.java:2963)
> at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.clear(GridCacheOffheapManager.java:2611)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.removeCacheData(IgniteCacheOffheapManagerImpl.java:296)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.stopCache(IgniteCacheOffheapManagerImpl.java:258)
> at 
> org.apache.ignite.internal.processors.cache.CacheGroupContext.stopCache(CacheGroupContext.java:825)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopCache(GridCacheProcessor.java:1070)
> at 
> 

[jira] [Updated] (IGNITE-12447) Modification of S#compact method

2020-01-23 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12447:

Fix Version/s: (was: 2.8)
   2.9

> Modification of S#compact method
> 
>
> Key: IGNITE-12447
> URL: https://issues.apache.org/jira/browse/IGNITE-12447
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Modification of S#compact method so that it is possible to pass collection of 
> Numbers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12497) PartitionsEvictManager should log all partitions which will be evicted

2020-01-22 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020916#comment-17020916
 ] 

Ivan Rakov commented on IGNITE-12497:
-

[~ktkale...@gridgain.com] Looks good, thanks, merged.

> PartitionsEvictManager should log all partitions which will be evicted
> --
>
> Key: IGNITE-12497
> URL: https://issues.apache.org/jira/browse/IGNITE-12497
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.9
>
>
> Now we print information about eviction only if it takes longer than a threshold 
> (i.e. progress). As a result we can't detect in the logs that a partition was evicted 
> for different reasons (rebalance, wrong configuration, custom affinity). 
> I think we could print information at info level about each evicted partition 
> before the eviction starts. Information about partitions could be aggregated, 
> compacted and printed by timer, but *all evicted partitions must be printed 
> to the log anyway.*
> I would have the following information about each partition:
> * partitionId
> * groupId
> * groupName
> * reason (eviction, clearing)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12530) Pages list caching can cause IgniteOOME when checkpoint is triggered by "too many dirty pages" reason

2020-01-20 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019491#comment-17019491
 ] 

Ivan Rakov commented on IGNITE-12530:
-

[~alex_pl] Indeed, I didn't catch it. Thanks.
Looks good to me.

> Pages list caching can cause IgniteOOME when checkpoint is triggered by "too 
> many dirty pages" reason
> -
>
> Key: IGNITE-12530
> URL: https://issues.apache.org/jira/browse/IGNITE-12530
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.8
>Reporter: Aleksey Plekhanov
>Assignee: Aleksey Plekhanov
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When a checkpoint is triggered, we need some amount of page memory to store 
> the pages list on-heap cache.
> If the data region is too small, a checkpoint is triggered by the "too many dirty 
> pages" reason and the pages list cache is rather big, we can get an 
> IgniteOutOfMemoryException.
> Reproducer:
> {code:java}
> @Override protected IgniteConfiguration getConfiguration(String name) throws 
> Exception {
> IgniteConfiguration cfg = super.getConfiguration(name);
> cfg.setDataStorageConfiguration(new DataStorageConfiguration()
> .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
> .setPersistenceEnabled(true)
> .setMaxSize(50 * 1024 * 1024)
> ));
> return cfg;
> }
> @Test
> public void testUpdatesNotFittingIntoMemoryRegion() throws Exception {
> IgniteEx ignite = startGrid(0);
> ignite.cluster().active(true);
> ignite.getOrCreateCache(DEFAULT_CACHE_NAME);
> try (IgniteDataStreamer streamer = 
> ignite.dataStreamer(DEFAULT_CACHE_NAME)) {
> for (int i = 0; i < 100_000; i++)
> streamer.addData(i, new byte[i % 2048]);
> }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12546) Prevent partitions owned by other nodes switch their state to MOVING due to counter difference on node join.

2020-01-20 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019407#comment-17019407
 ] 

Ivan Rakov commented on IGNITE-12546:
-

[~ascherbakov] Looks good.

> Prevent partitions owned by other nodes switch their state to MOVING due to 
> counter difference on node join.
> 
>
> Key: IGNITE-12546
> URL: https://issues.apache.org/jira/browse/IGNITE-12546
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.7.6
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a node joins, it's expected that MOVING partitions can only belong to the 
> joining node.
> But if the counters are somehow desynced, other nodes can switch the state of owning 
> partitions to MOVING too, causing spurious rebalancing and assertions on 
> mapping to a moving primary.
> Possible solution: exclude other nodes' partitions from switching their state to 
> MOVING on node join.
> This only affects persistent groups.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12548) Possible tx desync during recovery on near node left.

2020-01-20 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019408#comment-17019408
 ] 

Ivan Rakov commented on IGNITE-12548:
-

[~ascherbakov] Looks good.

> Possible tx desync during recovery on near node left.
> -
>
> Key: IGNITE-12548
> URL: https://issues.apache.org/jira/browse/IGNITE-12548
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7.6
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Blocker
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The problem appears if a transaction starts to roll back in the PREPARED 
> state for some reason and the near node concurrently leaves, triggering the tx 
> recovery protocol.
> Consider having two enlisted keys from different partitions mapped to 
> different nodes N1 and N2.
> Due to the race, the N1 local tx can be rolled back while the N2 local tx is committed, 
> breaking the tx atomicity guarantee.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12101) IgniteQueue.removeAll throws NPE

2020-01-17 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12101:

Release Note: Fixed NullPointerException when IgniteQueue.removeAll is 
called

> IgniteQueue.removeAll throws NPE
> 
>
> Key: IGNITE-12101
> URL: https://issues.apache.org/jira/browse/IGNITE-12101
> Project: Ignite
>  Issue Type: Bug
>  Components: data structures
>Affects Versions: 2.5
>Reporter: Denis A. Magda
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See more details here:
> https://stackoverflow.com/questions/57473783/ignite-2-5-ignitequeue-removeall-throwing-npe
> {noformat}
> 2019-08-09 18:18:39,241 ERROR [Inbound-Main-Pool-13] [TransactionId: 
> e5b5bfe3-5246-4d54-a4d6-acd550240e13 Request ID - 27845] [ APP=Server, 
> ACTION=APP_PROCESS, USER=tsgops ] ProcessWorkflowProcessor - Error while 
> processing CLIENT process 
> class org.apache.ignite.IgniteException: Failed to serialize object 
> [typeName=LinkedList] 
>at 
> org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:990)
>  
>at 
> org.apache.ignite.internal.processors.datastructures.GridCacheQueueAdapter$QueueIterator.remove(GridCacheQueueAdapter.java:687)
>  
>at 
> java.util.AbstractCollection.removeAll(AbstractCollection.java:376) 
>at 
> org.apache.ignite.internal.processors.datastructures.GridCacheQueueProxy.removeAll(GridCacheQueueProxy.java:180)
>  
>at 
> com.me.app.service.support.APPOrderProcessIgniteQueueService.removeAll(APPOrderProcessIgniteQueueService.java:63)
>  
>at 
> com.me.app.service.support.APPOrderContextProcessInputManager.removeAllFromCurrentProcessing(APPOrderContextProcessInputManager.java:201)
>  
>at 
> com.me.app.service.support.APPOrderContextProcessInputManager.lambda$removeAll$3(APPOrderContextProcessInputManager.java:100)
>  
>at java.lang.Iterable.forEach(Iterable.java:75) 
>at 
> com.me.app.service.support.APPOrderContextProcessInputManager.removeAll(APPOrderContextProcessInputManager.java:100)
>  
>at 
> com.me.app.service.support.APPOrderContextProcessInputManager.removeAll(APPOrderContextProcessInputManager.java:90)
>  
>at 
> com.me.app.processor.support.ProcessWorkflowProcessor.processOrders(ProcessWorkflowProcessor.java:602)
>  
>at 
> com.me.app.processor.support.ProcessWorkflowProcessor.lambda$null$13(ProcessWorkflowProcessor.java:405)
>  
>at java.util.HashMap.forEach(HashMap.java:1289) 
>at 
> com.me.app.processor.support.ProcessWorkflowProcessor.lambda$null$14(ProcessWorkflowProcessor.java:368)
>  
>at java.util.HashMap.forEach(HashMap.java:1289) 
>at 
> com.me.app.processor.support.ProcessWorkflowProcessor.lambda$null$15(ProcessWorkflowProcessor.java:354)
>  
>at java.util.HashMap.forEach(HashMap.java:1289) 
>at 
> com.me.app.processor.support.ProcessWorkflowProcessor.lambda$null$16(ProcessWorkflowProcessor.java:345)
>  
>at java.util.HashMap.forEach(HashMap.java:1289) 
>at 
> com.me.app.processor.support.ProcessWorkflowProcessor.lambda$executeProcess$17(ProcessWorkflowProcessor.java:337)
>  
>at java.util.HashMap.forEach(HashMap.java:1289) 
>at 
> com.me.app.processor.support.ProcessWorkflowProcessor.executeProcess(ProcessWorkflowProcessor.java:330)
>  
>at 
> com.me.app.processor.support.ProcessWorkflowProcessor.executeProcess(ProcessWorkflowProcessor.java:302)
>  
>at 
> com.me.app.processor.support.ProcessWorkflowProcessor.lambda$processProcessFromQueue$6(ProcessWorkflowProcessor.java:282)
>  
>at 
> com.me.app.locking.support.IgniteLockingService.execute(IgniteLockingService.java:39)
>  
>at 
> com.me.app.locking.support.IgniteLockingService.execute(IgniteLockingService.java:68)
>  
>at 
> com.me.app.processor.support.ProcessWorkflowProcessor.processProcessFromQueue(ProcessWorkflowProcessor.java:281)
>  
>at 
> com.me.app.facade.listener.support.APPProcessEventListener.listen(APPProcessEventListener.java:49)
>  
>at 
> com.me.app.facade.listener.support.APPProcessEventListener.listen(APPProcessEventListener.java:19)
>  
>at 
> com.me.app.common.listener.support.AbstractEventListener.onMessage(AbstractEventListener.java:44)
>  
>at 
> com.me.app.common.listener.support.AbstractEventListener$$FastClassBySpringCGLIB$$f1379f74.invoke()
>  
>at 
> 

[jira] [Commented] (IGNITE-12551) Partition desync if a partition is evicted then owned again and historically rebalanced

2020-01-17 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017952#comment-17017952
 ] 

Ivan Rakov commented on IGNITE-12551:
-

[~ascherbakov] Please merge.

> Partition desync if a partition is evicted then owned again and historically 
> rebalanced
> ---
>
> Key: IGNITE-12551
> URL: https://issues.apache.org/jira/browse/IGNITE-12551
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7.6
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Major
> Fix For: 2.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is a possibility of partition desync in the following scenario:
> 1. Some partition is evicted with non-zero counters.
> 2. It is owned again and is going to be rebalanced.
> 3. Some node in the grid has history for the partition defined by its 
> (initial, current) counters pair.
> In this scenario the partition will be historically rebalanced and will contain only 
> partial data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-12531) Cluster is unable to change BLT on 2.8 if storage was initially created on 2.7 or less

2020-01-17 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12531:

Reviewer: Ivan Rakov

> Cluster is unable to change BLT on 2.8 if storage was initially created on 
> 2.7 or less
> --
>
> Key: IGNITE-12531
> URL: https://issues.apache.org/jira/browse/IGNITE-12531
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.8
>Reporter: Ivan Rakov
>Assignee: Vyacheslav Koptilin
>Priority: Blocker
> Fix For: 2.8
>
> Attachments: TestBltChangeFail.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Due to a bug in https://issues.apache.org/jira/browse/IGNITE-10348, after 
> storage migration from 2.7 or earlier to 2.8, any metastorage updates are not 
> persisted.
> S2R:
> (on 2.7)
> - Activate persistent cluster with 2 nodes
> - Shutdown the cluster
> (on 2.8)
> - Start cluster with 2 nodes based on persistent storage from 2.7
> - Start 3rd node
> - Change baseline
> - Shutdown the cluster
> - Start initial two nodes
> - Start 3rd node (join is rejected: the first two nodes have the old BLT of two nodes, 
> the 3rd node has the new BLT of three nodes)
> Reproducer is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12531) Cluster is unable to change BLT on 2.8 if storage was initially created on 2.7 or less

2020-01-17 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017822#comment-17017822
 ] 

Ivan Rakov commented on IGNITE-12531:
-

[~slava.koptilin] looks good.

> Cluster is unable to change BLT on 2.8 if storage was initially created on 
> 2.7 or less
> --
>
> Key: IGNITE-12531
> URL: https://issues.apache.org/jira/browse/IGNITE-12531
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.8
>Reporter: Ivan Rakov
>Assignee: Vyacheslav Koptilin
>Priority: Blocker
> Fix For: 2.8
>
> Attachments: TestBltChangeFail.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Due to a bug in https://issues.apache.org/jira/browse/IGNITE-10348, after 
> storage migration from 2.7 or earlier to 2.8, any metastorage updates are not 
> persisted.
> S2R:
> (on 2.7)
> - Activate persistent cluster with 2 nodes
> - Shutdown the cluster
> (on 2.8)
> - Start cluster with 2 nodes based on persistent storage from 2.7
> - Start 3rd node
> - Change baseline
> - Shutdown the cluster
> - Start initial two nodes
> - Start 3rd node (join is rejected: the first two nodes have the old BLT of two nodes, 
> the 3rd node has the new BLT of three nodes)
> Reproducer is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12530) Pages list caching can cause IgniteOOME when checkpoint is triggered by "too many dirty pages" reason

2020-01-16 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017412#comment-17017412
 ] 

Ivan Rakov commented on IGNITE-12530:
-

[~alex_pl] Looks good, please merge.

> Pages list caching can cause IgniteOOME when checkpoint is triggered by "too 
> many dirty pages" reason
> -
>
> Key: IGNITE-12530
> URL: https://issues.apache.org/jira/browse/IGNITE-12530
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.8
>Reporter: Aleksey Plekhanov
>Assignee: Aleksey Plekhanov
>Priority: Major
> Attachments: screenshot-1.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When a checkpoint is triggered, we need some amount of page memory to store 
> the pages list on-heap cache.
> If the data region is too small, a checkpoint is triggered by the "too many dirty 
> pages" reason and the pages list cache is rather big, we can get an 
> IgniteOutOfMemoryException.
> Reproducer:
> {code:java}
> @Override protected IgniteConfiguration getConfiguration(String name) throws 
> Exception {
> IgniteConfiguration cfg = super.getConfiguration(name);
> cfg.setDataStorageConfiguration(new DataStorageConfiguration()
> .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
> .setPersistenceEnabled(true)
> .setMaxSize(50 * 1024 * 1024)
> ));
> return cfg;
> }
> @Test
> public void testUpdatesNotFittingIntoMemoryRegion() throws Exception {
> IgniteEx ignite = startGrid(0);
> ignite.cluster().active(true);
> ignite.getOrCreateCache(DEFAULT_CACHE_NAME);
> try (IgniteDataStreamer streamer = 
> ignite.dataStreamer(DEFAULT_CACHE_NAME)) {
> for (int i = 0; i < 100_000; i++)
> streamer.addData(i, new byte[i % 2048]);
> }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12530) Pages list caching can cause IgniteOOME when checkpoint is triggered by "too many dirty pages" reason

2020-01-15 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016158#comment-17016158
 ] 

Ivan Rakov commented on IGNITE-12530:
-

[~alex_pl] Thanks for your patch, I totally agree with the solution.
I'd propose to add a test where we check that, after an intensive multithreaded 
load and after a checkpoint, the cache limit counter's value is equal to its maximum 
level (totalPages * threshold). That way we'll check that for every increment 
there's a decrement.
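
A minimal sketch of such a check; the helpers runMultithreadedLoad, pagesListCacheLimit
and maxAllowedPages are hypothetical placeholders for however the load is generated and
the limit counter and its maximum are exposed:

{code:java}
/** Sketch only: helper and accessor names are hypothetical. */
@Test
public void testPagesListCacheLimitRestoredAfterLoad() throws Exception {
    IgniteEx ignite = startGrid(0);

    ignite.cluster().active(true);

    runMultithreadedLoad(ignite); // Intensive concurrent updates (hypothetical helper).

    forceCheckpoint();

    long limit = pagesListCacheLimit(ignite).get(); // Current value of the limit counter.
    long max = maxAllowedPages(ignite);             // totalPages * threshold.

    // If every increment had a matching decrement, the counter is back at its maximum.
    assertEquals(max, limit);
}
{code}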

> Pages list caching can cause IgniteOOME when checkpoint is triggered by "too 
> many dirty pages" reason
> -
>
> Key: IGNITE-12530
> URL: https://issues.apache.org/jira/browse/IGNITE-12530
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.8
>Reporter: Aleksey Plekhanov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When a checkpoint is triggered, we need some amount of page memory to store 
> the pages list on-heap cache.
> If the data region is too small, a checkpoint is triggered by the "too many dirty 
> pages" reason and the pages list cache is rather big, we can get an 
> IgniteOutOfMemoryException.
> Reproducer:
> {code:java}
> @Override protected IgniteConfiguration getConfiguration(String name) throws 
> Exception {
> IgniteConfiguration cfg = super.getConfiguration(name);
> cfg.setDataStorageConfiguration(new DataStorageConfiguration()
> .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
> .setPersistenceEnabled(true)
> .setMaxSize(50 * 1024 * 1024)
> ));
> return cfg;
> }
> @Test
> public void testUpdatesNotFittingIntoMemoryRegion() throws Exception {
> IgniteEx ignite = startGrid(0);
> ignite.cluster().active(true);
> ignite.getOrCreateCache(DEFAULT_CACHE_NAME);
> try (IgniteDataStreamer streamer = 
> ignite.dataStreamer(DEFAULT_CACHE_NAME)) {
> for (int i = 0; i < 100_000; i++)
> streamer.addData(i, new byte[i % 2048]);
> }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12545) Introduce listener interface for components to react to partition map exchange events

2020-01-15 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016120#comment-17016120
 ] 

Ivan Rakov commented on IGNITE-12545:
-

The changes shouldn't affect the code semantically, but TC has been started anyway.
[~mmuzaf], would you please take a look?

> Introduce listener interface for components to react to partition map 
> exchange events
> -
>
> Key: IGNITE-12545
> URL: https://issues.apache.org/jira/browse/IGNITE-12545
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Rakov
>Assignee: Ivan Rakov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would be handy to have a listener interface for components that should 
> react to PME instead of just adding more and more calls to 
> GridDhtPartitionsExchangeFuture.
> In general, there are four possible moments when a component can be notified: 
> on exchange init (before and after topologies are updated and the exchange latch 
> is acquired) and on exchange done (before and after readyTopVer is 
> incremented and user operations are unlocked).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-12545) Introduce listener interface for components to react to partition map exchange events

2020-01-15 Thread Ivan Rakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov reassigned IGNITE-12545:
---

Assignee: Ivan Rakov

> Introduce listener interface for components to react to partition map 
> exchange events
> -
>
> Key: IGNITE-12545
> URL: https://issues.apache.org/jira/browse/IGNITE-12545
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Rakov
>Assignee: Ivan Rakov
>Priority: Major
>
> It would be handy to have a listener interface for components that should 
> react to PME instead of just adding more and more calls to 
> GridDhtPartitionsExchangeFuture.
> In general, there are four possible moments when a component can be notified: 
> on exchange init (before and after topologies are updated and the exchange latch 
> is acquired) and on exchange done (before and after readyTopVer is 
> incremented and user operations are unlocked).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12545) Introduce listener interface for components to react to partition map exchange events

2020-01-15 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12545:
---

 Summary: Introduce listener interface for components to react to 
partition map exchange events
 Key: IGNITE-12545
 URL: https://issues.apache.org/jira/browse/IGNITE-12545
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov


It would be handy to have a listener interface for components that should react 
to PME instead of just adding more and more calls to 
GridDhtPartitionsExchangeFuture.
In general, there are four possible moments when a component can be notified: on 
exchange init (before and after topologies are updated and the exchange latch is 
acquired) and on exchange done (before and after readyTopVer is incremented and 
user operations are unlocked).
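
A hedged sketch of what such a listener interface could look like, with one callback for
each of the four moments described above (interface and method names are assumptions,
not the final API):

{code:java}
/** Sketch only: names are illustrative. */
public interface PartitionsExchangeAware {
    /** Called on exchange init, before topologies are updated and the exchange latch is acquired. */
    default void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {}

    /** Called on exchange init, after topologies are updated and the exchange latch is acquired. */
    default void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {}

    /** Called on exchange done, before readyTopVer is incremented and user operations are unlocked. */
    default void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {}

    /** Called on exchange done, after readyTopVer is incremented and user operations are unlocked. */
    default void onDoneAfterTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {}
}
{code}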



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   4   5   6   7   8   9   >