[jira] [Created] (IGNITE-12422) Clean up GG-XXX internal ticket references from code base.

2019-12-05 Thread Alexei Scherbakov (Jira)
Alexei Scherbakov created IGNITE-12422:
--

 Summary: Clean up GG-XXX internal ticket references from code base.
 Key: IGNITE-12422
 URL: https://issues.apache.org/jira/browse/IGNITE-12422
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.9


Replace them with Apache Ignite equivalents where possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12332) Fix flaky test GridCacheAtomicClientInvalidPartitionHandlingSelfTest#testPrimaryFullAsync

2019-10-28 Thread Alexei Scherbakov (Jira)
Alexei Scherbakov created IGNITE-12332:
--

 Summary: Fix flaky test 
GridCacheAtomicClientInvalidPartitionHandlingSelfTest#testPrimaryFullAsync
 Key: IGNITE-12332
 URL: https://issues.apache.org/jira/browse/IGNITE-12332
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7.6
Reporter: Alexei Scherbakov
 Fix For: 2.8


Can be reproduced locally with range = 10_000





[jira] [Created] (IGNITE-12329) Invalid handling of remote entries causes partition desync and transaction hanging in COMMITTING state.

2019-10-28 Thread Alexei Scherbakov (Jira)
Alexei Scherbakov created IGNITE-12329:
--

 Summary: Invalid handling of remote entries causes partition 
desync and transaction hanging in COMMITTING state.
 Key: IGNITE-12329
 URL: https://issues.apache.org/jira/browse/IGNITE-12329
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7.6
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.8


This can happen if a transaction is mapped on a partition which is about to be 
evicted on a backup.

Due to bugs, an entry belonging to another cache may be excluded from the 
commit, or an entry holding a lock can be removed without releasing the lock, 
causing dependent transactions to hang.





[jira] [Created] (IGNITE-12328) IgniteException "Failed to resolve nodes topology" during cache.removeAll() and constantly changing topology

2019-10-28 Thread Alexei Scherbakov (Jira)
Alexei Scherbakov created IGNITE-12328:
--

 Summary: IgniteException "Failed to resolve nodes topology" during 
cache.removeAll() and constantly changing topology
 Key: IGNITE-12328
 URL: https://issues.apache.org/jira/browse/IGNITE-12328
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7.6
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.8


{noformat}
[2019-09-25 13:13:58,339][ERROR][TxThread-threadNum-3] Failed to complete 
transaction.
org.apache.ignite.IgniteException: Failed to resolve nodes topology 
[cacheGrp=cache_group_36, topVer=AffinityTopologyVersion [topVer=16, 
minorTopVer=0], history=[AffinityTopologyVersion [topVer=13, minorTopVer=0], 
AffinityTopologyVersion [topVer=14, minorTopVer=0], AffinityTopologyVersion 
[topVer=15, minorTopVer=0]], snap=Snapshot [topVer=AffinityTopologyVersion 
[topVer=15, minorTopVer=0]], locNode=TcpDiscoveryNode 
[id=6cbf7666-9a8c-4b61-8b3f-6351ef44bd4a, 
consistentId=poc-tester-client-172.25.1.21-id-0, addrs=ArrayList [172.25.1.21], 
sockAddrs=HashSet [lab21.gridgain.local/172.25.1.21:0], discPort=0, order=13, 
intOrder=0, lastExchangeTime=1569406379934, loc=true, 
ver=2.5.10#20190922-sha1:02133315, isClient=true]]
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.resolveDiscoCache(GridDiscoveryManager.java:2125)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.cacheGroupAffinityNodes(GridDiscoveryManager.java:2007)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.GridCacheUtils.affinityNodes(GridCacheUtils.java:465)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map0(GridDhtColocatedLockFuture.java:939)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:911)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:811)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.lockAllAsync(GridDhtColocatedCache.java:656)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.distributed.GridDistributedCacheAdapter.txLockAsync(GridDistributedCacheAdapter.java:109)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.removeAllAsync0(GridNearTxLocal.java:1648)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.removeAllAsync(GridNearTxLocal.java:521)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter$33.inOp(GridCacheAdapter.java:2619)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:4701)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:3780)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.removeAll0(GridCacheAdapter.java:2617)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.removeAll(GridCacheAdapter.java:2606)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.removeAll(IgniteCacheProxyImpl.java:1553)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.removeAll(GatewayProtectedCacheProxy.java:1026)
 ~[ignite-core-2.5.10.jar:2.5.10]
at 
org.apache.ignite.scenario.TxBalanceTask$TxBody.doTxRemoveAll(TxBalanceTask.java:291)
 ~[poc-tester-0.1.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.scenario.TxBalanceTask$TxBody.call(TxBalanceTask.java:93) 
~[poc-tester-0.1.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.scenario.TxBalanceTask$TxBody.call(TxBalanceTask.java:70) 
~[poc-tester-0.1.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.scenario.internal.AbstractTxTask.doInTransaction(AbstractTxTask.java:290)
 ~[poc-tester-0.1.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.scenario.internal.AbstractTxTask.access$400(AbstractTxTask.java:56)
 ~[poc-tester-0.1.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.scenario.internal.AbstractTxTask$TxRunner.call(AbstractTxTask.java:470)
 [poc-tester-0.1.0-SNAPSHOT.jar:?]
at 

[jira] [Created] (IGNITE-12327) Cross-cache tx is mapped on wrong primary when enlisted caches have incompatible assignments.

2019-10-25 Thread Alexei Scherbakov (Jira)
Alexei Scherbakov created IGNITE-12327:
--

 Summary: Cross-cache tx is mapped on wrong primary when enlisted 
caches have incompatible assignments.
 Key: IGNITE-12327
 URL: https://issues.apache.org/jira/browse/IGNITE-12327
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7.6
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.8


This happens when the supplier node leaves while rebalancing is only partially 
completed on the demander.

Suppose we have 2 cache groups and rebalance is in progress: for the first 
group rebalance is done, and for the second group it is partially done (some 
partitions are still MOVING).
At this moment the supplier node dies and the corresponding topology version is 
(N,0).
The new assignment is computed using the current state of partitions. For the 
first group it will be ideal and the same as for the next topology (N,1), which 
is triggered by CacheAffinityChangeMessage after all rebalancing is completed.
For the second group the affinity will not be ideal.

If a transaction is started while PME (N,0)->(N,1) is in progress, the first 
lock request will pass the remap check if it enlists the rebalanced group. All 
subsequent lock requests will use the invalid topology from the previous 
assignment.

Possible fix: return the actual locked topology version from the first lock 
request and use it for all subsequent requests.
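
The proposed fix can be illustrated with a minimal, self-contained sketch. It 
is not Ignite code: the TopVer and Tx classes and the onFirstLockResponse 
method are hypothetical names that only model the idea of pinning a 
transaction to the topology version reported by the first lock response.

```java
/** Hypothetical model of pinning a transaction to the locked topology version; not Ignite code. */
public class TopologyPinningSketch {
    /** Simplified (major, minor) topology version. */
    static final class TopVer {
        final int major;
        final int minor;

        TopVer(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        @Override public String toString() {
            return "(" + major + "," + minor + ")";
        }
    }

    /** Stand-in for a transaction that maps its lock requests by topology version. */
    static final class Tx {
        /** Version used to map lock requests; starts as the version seen at tx start. */
        private TopVer mapVer;

        Tx(TopVer startVer) {
            mapVer = startVer;
        }

        /** Version the next lock request will be mapped on. */
        TopVer versionForNextLock() {
            return mapVer;
        }

        /** Assumption: the first lock response reports the version that was actually locked. */
        void onFirstLockResponse(TopVer lockedVer) {
            mapVer = lockedVer;
        }
    }

    public static void main(String[] args) {
        // The transaction starts while PME (5,0)->(5,1) is in progress.
        Tx tx = new Tx(new TopVer(5, 0));

        // The first lock request was actually locked on (5,1).
        tx.onFirstLockResponse(new TopVer(5, 1));

        System.out.println("Next lock request is mapped on " + tx.versionForNextLock());
    }
}
```

In this model every lock request after the first one is mapped on the version 
returned by the first response rather than on the version cached at 
transaction start.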





[jira] [Created] (IGNITE-12038) Fix several failing tests after IGNITE-10078

2019-08-02 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-12038:
--

 Summary: Fix several failing tests after IGNITE-10078
 Key: IGNITE-12038
 URL: https://issues.apache.org/jira/browse/IGNITE-12038
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.8








[jira] [Created] (IGNITE-11939) IgnitePdsTxHistoricalRebalancingTest.testTopologyChangesWithConstantLoad test failure

2019-06-21 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11939:
--

 Summary:  
IgnitePdsTxHistoricalRebalancingTest.testTopologyChangesWithConstantLoad test 
failure
 Key: IGNITE-11939
 URL: https://issues.apache.org/jira/browse/IGNITE-11939
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


Caused by exception on releasing reserved segments:
{noformat}
[12:51:23]W: [org.apache.ignite:ignite-indexing] [2019-06-21 
12:51:23,967][ERROR][exchange-worker-#33825%persistence.IgnitePdsTxHistoricalRebalancingTest1%][GridDhtPartitionsExchangeFuture]
 Failed to reinitialize local partitions (rebalancing will be stopped)
: GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=7, 
minorTopVer=1], discoEvt=DiscoveryCustomEvent 
[customMsg=CacheAffinityChangeMessage 
[id=08de0ff7b61-276ac575-e4dc-4525-b24b-d0a5d1d7633d, 
topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], exc
hId=null, partsMsg=null, exchangeNeeded=true], 
affTopVer=AffinityTopologyVersion [topVer=7, minorTopVer=1], 
super=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=97e46568-6aa0-4a4b-864c-f05415c0, 
consistentId=persistence.IgnitePdsTxHistoricalRebalancingTest0, addrs=Arra
yList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, 
order=1, intOrder=1, lastExchangeTime=1561110643882, loc=false, 
ver=2.8.0#20190621-sha1:, isClient=false], topVer=7, nodeId8=0ff3354e, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=15611106839
58]], nodeId=97e46568, evt=DISCOVERY_CUSTOM_EVT]
[12:51:23]W: [org.apache.ignite:ignite-indexing] 
java.lang.AssertionError: cur=null, absIdx=0
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentReservationStorage.release(SegmentReservationStorage.java:55)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentAware.release(SegmentAware.java:207)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.release(FileWriteAheadLogManager.java:983)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.releaseHistoryForPreloading(GridCacheDatabaseSharedManager.java:1844)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1431)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:862)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3079)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2928)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
java.lang.Thread.run(Thread.java:748)
[12:51:23]W: [org.apache.ignite:ignite-indexing] [12:51:23] (err) 
Failed to notify listener: 
o.a.i.i.processors.timeout.GridTimeoutProcessor$2...@79ba1907java.lang.AssertionError:
 cur=null, absIdx=0
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentReservationStorage.release(SegmentReservationStorage.java:55)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentAware.release(SegmentAware.java:207)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.release(FileWriteAheadLogManager.java:983)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.releaseHistoryForPreloading(GridCacheDatabaseSharedManager.java:1844)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1431)
[12:51:23]W: [org.apache.ignite:ignite-indexing]at 

[jira] [Created] (IGNITE-11937) Fix MVCC PDS flaky suites timeout

2019-06-20 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11937:
--

 Summary: Fix MVCC PDS flaky suites timeout
 Key: IGNITE-11937
 URL: https://issues.apache.org/jira/browse/IGNITE-11937
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


Currently we have a non-zero failure rate for some MVCC PDS suites in master.

This seems to be due to failure [1] in the testRebalancingDuringLoad* test 
group, which leads to dumping WAL and lock states. The dump takes time 
proportional to the current WAL length, so the test duration increases by a 
random amount depending on the WAL length.

Worse, the test remains green despite throwing a critical exception.

[1]  Stacktrace
{noformat}
[2019-06-19 
15:56:53,386][ERROR][sys-stripe-6-#134%persistence.IgnitePdsContinuousRestartTestWithSharedGroupAndIndexes3%][IgniteTestResources]
 Critical system error detected. Will be handled accordingly to configured 
handler [hnd=NoOpFailureHandler [super=AbstractFailure
Handler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=CRITICAL_ERROR, err=class 
o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is 
corrupted [pages(groupId, page
Id)=[IgniteBiTuple [val1=81227264, val2=844420635164676]], msg=Runtime failure 
on search row: TxKey [major=1560948946388, minor=17286
class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=81227264, 
val2=844420635164676]], msg=Runtime failure on search row: TxKey 
[major=1560948946388, minor=17286]]
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5909)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1859)
at 
org.apache.ignite.internal.processors.cache.mvcc.txlog.TxLog.put(TxLog.java:293)
at 
org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.updateState(MvccProcessorImpl.java:699)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.setMvccState(IgniteTxManager.java:2570)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1228)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1070)
at 
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.prepareRemoteTx(GridDistributedTxRemoteAdapter.java:421)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.startRemoteTx(IgniteTxHandler.java:1837)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxPrepareRequest(IgniteTxHandler.java:1198)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$400(IgniteTxHandler.java:118)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$5.apply(IgniteTxHandler.java:224)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$5.apply(IgniteTxHandler.java:222)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1141)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1558)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1186)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1083)
at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Unexpected new transaction state. 
[currState=2, newState=1, cntr=17286]
at 
org.apache.ignite.internal.processors.cache.mvcc.txlog.TxLog$TxLogUpdateClosure.invalid(TxLog.java:629)
at 

[jira] [Created] (IGNITE-11887) Add more test scenarios for OWNING -> RENTING -> MOVING scenario

2019-05-31 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11887:
--

 Summary: Add more test scenarios for OWNING -> RENTING -> MOVING 
scenario
 Key: IGNITE-11887
 URL: https://issues.apache.org/jira/browse/IGNITE-11887
 Project: Ignite
  Issue Type: Test
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov


Relevant test 
GridCacheRebalancingWithAsyncClearingTest#testCorrectRebalancingCurrentlyRentingPartitions





[jira] [Created] (IGNITE-11867) Fix flaky test GridCacheRebalancingWithAsyncClearingTest#testCorrectRebalancingCurrentlyRentingPartitions

2019-05-23 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11867:
--

 Summary: Fix flaky test 
GridCacheRebalancingWithAsyncClearingTest#testCorrectRebalancingCurrentlyRentingPartitions
 Key: IGNITE-11867
 URL: https://issues.apache.org/jira/browse/IGNITE-11867
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
 Fix For: 2.8








[jira] [Created] (IGNITE-11862) Cache stopping on supplier during rebalance causes NPE and supplying failure.

2019-05-22 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11862:
--

 Summary: Cache stopping on supplier during rebalance causes NPE 
and supplying failure.
 Key: IGNITE-11862
 URL: https://issues.apache.org/jira/browse/IGNITE-11862
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


{noformat}
[21:12:14]W: [org.apache.ignite:ignite-core] [2019-05-20 
21:12:14,376][ERROR][sys-#60310%distributed.CacheParallelStartTest0%][GridDhtPartitionSupplier]
 Failed to continue supplying [grp=static-cache-group45, 
demander=ed1c0109-8721-4cd8-80d9-d36e8251, top
Ver=AffinityTopologyVersion [topVer=2, minorTopVer=0], topic=0]
[21:12:14]W: [org.apache.ignite:ignite-core] java.lang.NullPointerException
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.processors.cache.CacheGroupContext.addRebalanceSupplyEvent(CacheGroupContext.java:525)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:422)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:397)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:455)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:440)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1141)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$800(GridCacheIoManager.java:109)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1706)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1566)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:129)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2795)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1523)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4500(GridIoManager.java:129)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1492)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[21:12:14]W: [org.apache.ignite:ignite-core] at 
java.lang.Thread.run(Thread.java:748)
{noformat}





[jira] [Created] (IGNITE-11857) Investigate performance drop after IGNITE-10078

2019-05-21 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11857:
--

 Summary: Investigate performance drop after IGNITE-10078
 Key: IGNITE-11857
 URL: https://issues.apache.org/jira/browse/IGNITE-11857
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


After IGNITE-10078 yardstick tests show a performance drop of up to 8% in some 
scenarios:

* tx-optim-repRead-put-get

* tx-optimistic-put

* tx-putAll

This is partially due to the new update counter implementation, but not only 
that. Investigation is required.





[jira] [Created] (IGNITE-11820) Add partition consistency tests for multiple caches in group.

2019-04-28 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11820:
--

 Summary: Add partition consistency tests for multiple caches in 
group.
 Key: IGNITE-11820
 URL: https://issues.apache.org/jira/browse/IGNITE-11820
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov








[jira] [Created] (IGNITE-11804) Assertion error

2019-04-24 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11804:
--

 Summary: Assertion error
 Key: IGNITE-11804
 URL: https://issues.apache.org/jira/browse/IGNITE-11804
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


Reproducer (needs some cleanup)
{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.transactions;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.StopNodeFailureHandler;
import org.apache.ignite.internal.IgniteEx;
import org.apache.ignite.internal.processors.cache.GridCacheContext;
import org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl;
import org.apache.ignite.internal.processors.cache.persistence.db.wal.IgniteWalRebalanceTest;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.transactions.Transaction;
import org.junit.Test;

import static java.util.concurrent.TimeUnit.DAYS;
import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC;
import static org.apache.ignite.configuration.WALMode.LOG_ONLY;
import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

/**
 * Test framework for ordering transaction's prepares and commits by intercepting messages and releasing them
 * in user defined order.
 */
public class TxPartitionCounterStateAbstractTest extends GridCommonAbstractTest {
    /** IP finder. */
    private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true);

    /** */
    private static final int MB = 1024 * 1024;

    /** */
    protected int backups;

    /** */
    public static final int TEST_TIMEOUT = 30_000;

    public static final String DEFAULT_CACHE_NAME_2 = DEFAULT_CACHE_NAME + "2";

    /** */
    private AtomicReference testFailed = new AtomicReference<>();

    /** Number of keys to preload before txs to enable historical rebalance. */
    protected static final int PRELOAD_KEYS_CNT = 1;

    /** */
    protected static final String CLIENT_GRID_NAME = "client";

    /** */
    protected static final int PARTS_CNT = 32;

    /** {@inheritDoc} */
    @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

        cfg.setConsistentId("node" + igniteInstanceName);
        cfg.setFailureHandler(new StopNodeFailureHandler());
        cfg.setRebalanceThreadPoolSize(4); // Necessary to reproduce some issues.

        ((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(IP_FINDER);

        // TODO set this only for historical rebalance tests.
        cfg.setCommunicationSpi(new IgniteWalRebalanceTest.WalRebalanceCheckingCommunicationSpi());

        boolean client = igniteInstanceName.startsWith(CLIENT_GRID_NAME);

        cfg.setClientMode(client);

        cfg.setDataStorageConfiguration(new DataStorageConfiguration().
            setWalHistorySize(1000).
            setWalSegmentSize(8 * MB).setWalMode(LOG_ONLY).setPageSize(1024).
            setCheckpointFrequency(MILLISECONDS.convert(365, DAYS)).
            setDefaultDataRegionConfiguration(new DataRegionConfiguration().setPersistenceEnabled(true).
                setInitialSize(100 * MB).setMaxSize(100 * MB)));

if 

[jira] [Created] (IGNITE-11801) Clearing of moving partition may lead to partition desync.

2019-04-24 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11801:
--

 Summary: Clearing of moving partition may lead to partition desync.
 Key: IGNITE-11801
 URL: https://issues.apache.org/jira/browse/IGNITE-11801
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


{{o.a.i.i.processors.cache.distributed.dht.topology.GridDhtLocalPartition#tryClear}}
calls 
{{org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition#clearAll}}

Inside clearAll, {{clearVer = ctx.versions().next();}} is assigned at call 
time, but this may happen after the exchange future is finished and some 
updates have already been applied to the MOVING partition, resulting in removal 
of actual data from the partition.

Fix: assign the clear version before the exchange future is finished.
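
The effect of the ordering can be shown with a small stand-alone model. It is 
not the GridDhtLocalPartition code: the Partition class and its methods are 
hypothetical, and the sketch assumes that clearing removes only entries whose 
version is lower than the clear version.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of clearing a MOVING partition; not Ignite code. */
public class ClearVersionSketch {
    /** Stand-in for a partition: key -> version of the last write. */
    static final class Partition {
        private long verGen;

        final Map<String, Long> entries = new HashMap<>();

        /** Returns the next monotonically increasing version. */
        long nextVersion() {
            return ++verGen;
        }

        void put(String key) {
            entries.put(key, nextVersion());
        }

        /** Assumption: clearing removes only entries older than the clear version. */
        void clearAll(long clearVer) {
            entries.values().removeIf(ver -> ver < clearVer);
        }
    }

    /**
     * @param assignEarly If true, the clear version is fixed before the fresh update arrives.
     * @return Whether the fresh update survives clearing.
     */
    static boolean updateSurvives(boolean assignEarly) {
        Partition part = new Partition();

        part.put("stale"); // Data that must be cleared.

        // Fixed variant: take the clear version before the exchange completes.
        long earlyVer = assignEarly ? part.nextVersion() : 0;

        part.put("fresh"); // Update applied to the MOVING partition after the exchange.

        // Buggy variant: take the clear version at call time, after the update.
        long clearVer = assignEarly ? earlyVer : part.nextVersion();

        part.clearAll(clearVer);

        return part.entries.containsKey("fresh");
    }

    public static void main(String[] args) {
        System.out.println("clear version at call time, update survives: " + updateSurvives(false));
        System.out.println("clear version assigned early, update survives: " + updateSurvives(true));
    }
}
```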

 





[jira] [Created] (IGNITE-11800) Update counters in o.a.i.i.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl#update could be applied from stale messages

2019-04-24 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11800:
--

 Summary: Update counters in 
o.a.i.i.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl#update
 could be applied from stale messages
 Key: IGNITE-11800
 URL: https://issues.apache.org/jira/browse/IGNITE-11800
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


The stale check goes after applying incoming counters, which seems wrong.
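
A sketch of the intended ordering, under loud assumptions: PartitionState, 
its sequence number and its update method are hypothetical and do not mirror 
the real GridDhtPartitionTopologyImpl#update signature. The only point is 
that a stale message is rejected before any of its counters are applied.

```java
/** Hypothetical model of ordering the stale check before counter application; not Ignite code. */
public class StaleCounterSketch {
    /** Stand-in for per-partition state kept by the topology. */
    static final class PartitionState {
        /** Sequence number of the last applied message (assumed staleness criterion). */
        private long lastSeq;

        /** Current update counter. */
        private long updCntr;

        /**
         * Checks staleness first and applies the incoming counter only for fresh messages.
         *
         * @return True if the message was applied.
         */
        boolean update(long msgSeq, long incomingCntr) {
            if (msgSeq <= lastSeq)
                return false; // Stale message: ignore it entirely, including its counter.

            lastSeq = msgSeq;
            updCntr = incomingCntr;

            return true;
        }

        long counter() {
            return updCntr;
        }
    }

    public static void main(String[] args) {
        PartitionState state = new PartitionState();

        state.update(2, 100); // Fresh message.
        state.update(1, 50);  // Stale message arriving late: must not touch the counter.

        System.out.println("counter=" + state.counter());
    }
}
```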





[jira] [Created] (IGNITE-11799) Do not always clear partition in MOVING state before exchange

2019-04-24 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11799:
--

 Summary: Do not always clear partition in MOVING state before 
exchange
 Key: IGNITE-11799
 URL: https://issues.apache.org/jira/browse/IGNITE-11799
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


After IGNITE-10078, if a partition was in MOVING state before exchange and is 
chosen for full rebalance (for example, this happens if any minor PME cancels 
the previous rebalance), we always clear it to avoid desync issues in case some 
removals were not delivered to the demander.

This is not always necessary.

 

 

 





[jira] [Created] (IGNITE-11797) Repair historical rebalancing for atomic and mixed tx-atomic cache groups.

2019-04-23 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11797:
--

 Summary: Repair historical rebalancing for atomic and mixed 
tx-atomic cache groups.
 Key: IGNITE-11797
 URL: https://issues.apache.org/jira/browse/IGNITE-11797
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


IGNITE-10078 only solves consistency problems for tx mode.

For atomic caches the rebalance consistency issues still remain and should be 
fixed together with improvement of atomic cache protocol consistency.

Mixed tx-atomic mode for a cache group should not be allowed at all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11794) Remove initial counter from update counter contract.

2019-04-23 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11794:
--

 Summary: Remove initial counter from update counter contract.
 Key: IGNITE-11794
 URL: https://issues.apache.org/jira/browse/IGNITE-11794
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


We have the 
org.apache.ignite.internal.processors.cache.PartitionUpdateCounter#initial and 
org.apache.ignite.internal.processors.cache.PartitionUpdateCounter#updateInitial
 methods in the partition update counter contract, but they are not needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11793) Failover for isolated updater mode.

2019-04-23 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11793:
--

 Summary: Failover for isolated updater mode.
 Key: IGNITE-11793
 URL: https://issues.apache.org/jira/browse/IGNITE-11793
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


Currently, with the isolated updater (data streamer + allowOverwrite=false), 
counters are generated independently on all owners, even in transactional mode.

If some nodes fail, there is a high risk of partition desync.

Also, this mode cannot be used together with concurrent transactions after 
IGNITE-10078.

I suggest introducing a special loading mode for a cache, in which concurrent 
updates are prohibited until the initial data loading (using the isolated 
updater) is completed.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11790) Optimize rebalance history calculation.

2019-04-22 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11790:
--

 Summary: Optimize rebalance history calculation.
 Key: IGNITE-11790
 URL: https://issues.apache.org/jira/browse/IGNITE-11790
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


Currently we pass initial update counters to the coordinator during PME.

But they are not needed to calculate the rebalance history.

The history size can be calculated as: maxCntr - updateCounter (the last 
counter of the sequential history).
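
A small worked instance of the formula above, as a sketch only. The class and 
method names are hypothetical, and the check that updateCounter does not exceed 
maxCntr is an assumption of this illustration.

```java
/** Hypothetical helper illustrating the formula from the ticket. */
final class RebalanceHistorySketch {
    /**
     * @param maxCntr Max update counter for the partition among owners.
     * @param updateCntr Last counter of the sequential history on the demander.
     * @return Number of updates which should be supplied from history.
     */
    static long historySize(long maxCntr, long updateCntr) {
        // Assumption of this sketch: a demander never runs ahead of the max counter.
        if (updateCntr > maxCntr)
            throw new IllegalArgumentException("updateCntr > maxCntr");

        return maxCntr - updateCntr;
    }
}
```

For example, with maxCntr = 150 and updateCounter = 100 the demander needs 50 
updates from history.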

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11663) Dispose of copypaste code in org.apache.ignite.internal.processors.cache.persistence.wal.record.RecordTypes

2019-04-01 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11663:
--

 Summary: Dispose of copypaste code in 
org.apache.ignite.internal.processors.cache.persistence.wal.record.RecordTypes
 Key: IGNITE-11663
 URL: https://issues.apache.org/jira/browse/IGNITE-11663
 Project: Ignite
  Issue Type: Improvement
 Environment: 
org.apache.ignite.internal.pagemem.wal.record.WALRecord.RecordPurpose
Reporter: Alexei Scherbakov
 Fix For: 2.8


We already have 
org.apache.ignite.internal.pagemem.wal.record.WALRecord.RecordPurpose for 
defining record relation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11611) If partition cannot be recovered during rebalance it should be moved to LOST state.

2019-03-22 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11611:
--

 Summary: If partition cannot be recovered during rebalance it 
should be moved to LOST state.
 Key: IGNITE-11611
 URL: https://issues.apache.org/jira/browse/IGNITE-11611
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11607) Historical rebalance is not possible from partition which was recently rebalanced itself

2019-03-22 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11607:
--

 Summary: Historical rebalance is not possible from partition which 
was recently rebalanced itself
 Key: IGNITE-11607
 URL: https://issues.apache.org/jira/browse/IGNITE-11607
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11594) IgnitePdsContinuousRestartTestWithExpiryPolicy test reports partition sizes validation error.

2019-03-21 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11594:
--

 Summary: IgnitePdsContinuousRestartTestWithExpiryPolicy test 
reports partition sizes validation error.
 Key: IGNITE-11594
 URL: https://issues.apache.org/jira/browse/IGNITE-11594
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


Most probably this is due to concurrent expiration during PME.

It looks like validating sizes for cache partitions with expiring entries is 
meaningless.

Also, the base test IgnitePdsContinuousRestartTest doesn't check any invariant.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11256) Implement read-only mode for grid

2019-02-08 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11256:
--

 Summary: Implement read-only mode for grid
 Key: IGNITE-11256
 URL: https://issues.apache.org/jira/browse/IGNITE-11256
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


Should be triggered from control.sh utility.

Useful for maintenance work, for example checking partition consistency 
(idle_verify)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11171) Assertion on tx preparing

2019-02-01 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11171:
--

 Summary: Assertion on tx preparing
 Key: IGNITE-11171
 URL: https://issues.apache.org/jira/browse/IGNITE-11171
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
 Fix For: 2.8


{noformat}
2019-01-22 
14:00:01.203[ERROR][sys-stripe-15-#16%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite]
 Critical system error detected. Will be handled accordingly to configured 
handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandle
r, failureCtx=FailureContext [type=CRITICAL_ERROR, 
err=java.lang.AssertionError: Got entry removed exception while holding 
transactional lock on entry 
[e=o.a.i.i.processors.cache.GridCacheEntryRemovedException, 
cached=GridDhtCacheEntry
[rdrs=ReaderId[] [], part=7042, super=GridDistributedCacheEntry 
[super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=7042, 
val=SCHEDULED_CHECK_STOP_PAYMENTS_TASK_DPL_defaultSection, hasValBytes=true], 
val=null, startVer=1548154332959,
ver=GridCacheVersion [topVer=159054171, order=1548061479047, nodeOrder=20], 
hash=1755381247, extras=GridCacheObsoleteEntryExtras 
[obsoleteVer=GridCacheVersion [topVer=2147483647, order=0, nodeOrder=0]], 
flags=2]]
java.lang.AssertionError: Got entry removed exception while holding 
transactional lock on entry 
[e=org.apache.ignite.internal.processors.cache.GridCacheEntryRemovedException, 
cached=GridDhtCacheEntry [rdrs=ReaderId[] [], part=7042, supe
r=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl 
[part=7042, val=SCHEDULED_CHECK_STOP_PAYMENTS_TASK_DPL_defaultSection, 
hasValBytes=true], val=null, startVer=1548154332959, ver=GridCacheVersion 
[topVer=159054
171, order=1548061479047, nodeOrder=20], hash=1755381247, 
extras=GridCacheObsoleteEntryExtras [obsoleteVer=GridCacheVersion 
[topVer=2147483647, order=0, nodeOrder=0]], flags=2
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onEntriesLocked(GridDhtTxPrepareFuture.java:512)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1231)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397)
        at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:520)
        at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:161)
        at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:139)
        at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:101)
        at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:181)
        at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:179)
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1058)
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:583)
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:382)
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:297)
        at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
        at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
        at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
        at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
        at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:496)
        at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:745){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11131) Invalid use of static system properties in AffinityAssignment

2019-01-29 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11131:
--

 Summary: Invalid use of static system properties in 
AffinityAssignment
 Key: IGNITE-11131
 URL: https://issues.apache.org/jira/browse/IGNITE-11131
 Project: Ignite
  Issue Type: Task
Reporter: Alexei Scherbakov
 Fix For: 2.8


The recently added properties 
{{org.apache.ignite.internal.processors.affinity.AffinityAssignment#IGNITE_AFFINITY_BACKUPS_THRESHOLD}}
 and 
{{org.apache.ignite.internal.processors.affinity.AffinityAssignment#IGNITE_DISABLE_AFFINITY_MEMORY_OPTIMIZATION}}
 have a flaw: they are defined as static, making it impossible to change them 
between node restarts.
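
The effect can be shown with a minimal sketch. The class and the property name 
below are hypothetical and only illustrate the difference between a value 
captured once at class initialization and a value read each time an object 
(standing in for a node start within the same JVM) is created.

```java
/** Hypothetical illustration; not the AffinityAssignment code. */
final class StaticPropertySketch {
    /** Hypothetical property name used only in this sketch. */
    static final String PROP = "SKETCH_BACKUPS_THRESHOLD";

    /** Read once at class initialization; later changes of the property are not seen. */
    static final int STATIC_THRESHOLD = Integer.getInteger(PROP, 5);

    /** Read on every instance creation, so a new instance sees the current value. */
    final int threshold = Integer.getInteger(PROP, 5);
}
```

After STATIC_THRESHOLD has been read once, calling System.setProperty for the 
same name changes only what newly created instances see in their threshold 
field.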



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11099) Implement test framework for measuring heap utilization

2019-01-28 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11099:
--

 Summary: Implement test framework for measuring heap utilization
 Key: IGNITE-11099
 URL: https://issues.apache.org/jira/browse/IGNITE-11099
 Project: Ignite
  Issue Type: Task
Reporter: Alexei Scherbakov
 Fix For: 2.8


It's necessary to create a special test framework capable of comparing heap 
usage in "before optimization" vs "after optimization" modes.

Most probably it should be implemented as a special test suite running with 
instrumentation support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11059) Print information about pending locks queue in case of dht local tx timeout.

2019-01-24 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11059:
--

 Summary: Print information about pending locks queue in case of 
dht local tx timeout.
 Key: IGNITE-11059
 URL: https://issues.apache.org/jira/browse/IGNITE-11059
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


Currently, in case of a dht local tx timeout, it's hard to understand which 
keys were not locked.

Additional information about pending keys should be printed to the log on 
timeout:

key, info about the tx holding the lock (xid, label if present)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11058) Possible OOM due to large discard queue in TcpDiscoverySpi

2019-01-24 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-11058:
--

 Summary: Possible OOM due to large discard queue in TcpDiscoverySpi
 Key: IGNITE-11058
 URL: https://issues.apache.org/jira/browse/IGNITE-11058
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


Currently, to implement guaranteed delivery, every ensured message (marked with 
the TcpDiscoveryEnsureDelivery annotation) must be stored in the pending 
message queue until it is discarded by the coordinator; otherwise, if 
subsequent nodes fail while forwarding the message, the guarantee cannot be 
fulfilled.

On large topologies with active changes the queue may contain many very large 
messages, causing heap usage bursts and a possible OOM.

Possible solutions:
 # Off-load the payload of pending messages off-heap or even to disk.
 # Store messages in serialized form to avoid JVM object overhead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10957) Reduce EnsuredMessageHistory heap occupation

2019-01-16 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10957:
--

 Summary: Reduce EnsuredMessageHistory heap occupation
 Key: IGNITE-10957
 URL: https://issues.apache.org/jira/browse/IGNITE-10957
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


EnsuredMessageHistory can hold up to 512 discovery messages to ensure message 
delivery on client reconnect, and it is cleared lazily. With a large topology 
and a large number of caches/partitions this can take up several GB of heap.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10920) Optimize HistoryAffinityAssignment heap usage.

2019-01-14 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10920:
--

 Summary: Optimize HistoryAffinityAssignment heap usage.
 Key: IGNITE-10920
 URL: https://issues.apache.org/jira/browse/IGNITE-10920
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


With a large topology and a large number of caches/partitions, many server 
discovery events may quickly produce a large affinity history, eating gigabytes 
of heap.

Solution: implement some kind of compression for the affinity cache map.

For example, affinity history could be stored as a delta to some previous 
version.
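
The delta idea could look roughly like the sketch below. It is an illustration 
under assumptions: node ids are plain strings, an assignment is a list of owner 
lists indexed by partition, the partition count does not change between 
versions, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical delta encoding of an affinity assignment; not the HistoryAffinityAssignment code. */
final class AffinityDeltaSketch {
    /** @return Only those partitions whose owner list differs from the previous assignment. */
    static Map<Integer, List<String>> diff(List<List<String>> prev, List<List<String>> cur) {
        if (prev.size() != cur.size())
            throw new IllegalArgumentException("Partition count must not change");

        Map<Integer, List<String>> delta = new HashMap<>();

        for (int p = 0; p < cur.size(); p++) {
            if (!prev.get(p).equals(cur.get(p)))
                delta.put(p, cur.get(p));
        }

        return delta;
    }

    /** @return Full assignment rebuilt from the previous version and the delta. */
    static List<List<String>> restore(List<List<String>> prev, Map<Integer, List<String>> delta) {
        List<List<String>> res = new ArrayList<>(prev.size());

        for (int p = 0; p < prev.size(); p++) {
            List<String> changed = delta.get(p);

            // Unchanged partitions share the owner list with the previous version.
            res.add(changed != null ? changed : prev.get(p));
        }

        return res;
    }
}
```

When a topology event moves only a few partitions, the stored delta holds just 
those partitions instead of a full copy of the assignment.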

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10913) Reduce heap occupation by o.a.i.i.processors.cache.persistence.file.FilePageStore instances.

2019-01-13 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10913:
--

 Summary: Reduce heap occupation by 
o.a.i.i.processors.cache.persistence.file.FilePageStore instances.
 Key: IGNITE-10913
 URL: https://issues.apache.org/jira/browse/IGNITE-10913
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


With a large topology, a large number of caches/partitions, and persistence 
enabled, there could be millions of FilePageStore objects in heap (one for each 
partition).

Each instance has a reference to a File (field cfgFile), which stores the 
absolute path to a partition as a String.

The internal File implementation (for example, UnixFile) also allocates space 
for the file path.

I observed about 2 GB of heap space occupied by these objects in one 
environment.

Solution: dereference (set to null) cfgFile after object creation and create 
the File object lazily, on demand.
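
One possible shape of the lazy approach is sketched below. It is a variation 
under assumptions, not the FilePageStore code: instead of nulling a field it 
keeps only a directory string (which can be shared by all partitions of a 
cache) and a partition id, and builds the File when asked. The class name and 
the file name pattern are hypothetical.

```java
import java.io.File;

/** Hypothetical page store holder which does not keep a File per partition on heap. */
final class LazyFileStoreSketch {
    /** Cache directory; the same String instance can be shared by all stores of a cache. */
    private final String dir;

    /** Partition id. */
    private final int partId;

    LazyFileStoreSketch(String dir, int partId) {
        this.dir = dir;
        this.partId = partId;
    }

    /** Creates the File on demand, so no per-partition path is retained between calls. */
    File file() {
        // File name pattern is an assumption of this sketch.
        return new File(dir, "part-" + partId + ".bin");
    }
}
```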



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10912) Huge node join request discovery message slows down node joining and corresponding PME

2019-01-13 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10912:
--

 Summary: Huge node join request discovery message slows down node 
joining and corresponding PME
 Key: IGNITE-10912
 URL: https://issues.apache.org/jira/browse/IGNITE-10912
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


With a large topology and a large number of caches/groups, the node join 
message can reach a size > 30M due to the large amount of transferred discovery 
data.

It adds overhead on ring traversal and slows down the "node join" PME.

Possible solutions:
 # Introduce a pre-join message with discovery data which doesn't increment the 
topology version. After all nodes have the corresponding discovery data, start 
the actual join. Discovery data probably should be stored off-heap (or even on 
disk) to avoid heap usage bursts when multiple nodes join.
 # Add compression to discovery data.
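
For the compression option, a minimal sketch using the JDK's java.util.zip 
classes is given below. It assumes the discovery data is already available as a 
byte array; the class name is hypothetical, and the choice of 
Deflater.BEST_SPEED is only an assumption that join latency matters more than 
compression ratio.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper for compressing serialized discovery data. */
final class DiscoveryDataCompressionSketch {
    /** @return Deflate-compressed copy of the given bytes. */
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);

        try {
            deflater.setInput(data);
            deflater.finish();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];

            while (!deflater.finished())
                out.write(buf, 0, deflater.deflate(buf));

            return out.toByteArray();
        }
        finally {
            deflater.end();
        }
    }

    /** @return Original bytes restored from the output of {@link #compress(byte[])}. */
    static byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();

        try {
            inflater.setInput(data);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];

            while (!inflater.finished()) {
                int n = inflater.inflate(buf);

                // No progress and nothing left to read: the input is truncated or invalid.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalStateException("Truncated or unsupported input");

                out.write(buf, 0, n);
            }

            return out.toByteArray();
        }
        catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted compressed data", e);
        }
        finally {
            inflater.end();
        }
    }
}
```

How much this helps depends on how repetitive the real discovery data is, which 
would have to be measured.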



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10894) Reduce heap utilization for grids with big topologies and caches numbers.

2019-01-11 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10894:
--

 Summary: Reduce heap utilization for grids with big topologies and 
caches numbers.
 Key: IGNITE-10894
 URL: https://issues.apache.org/jira/browse/IGNITE-10894
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


This is an umbrella ticket for all optimizations related to reducing heap 
utilization in large Ignite deployments.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10443) Fix flaky GridCommandHandlerTest.testKillHangingLocalTransactions

2018-11-28 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10443:
--

 Summary: Fix flaky 
GridCommandHandlerTest.testKillHangingLocalTransactions
 Key: IGNITE-10443
 URL: https://issues.apache.org/jira/browse/IGNITE-10443
 Project: Ignite
  Issue Type: Test
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.8






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10418) Implement lightweight profiling for message processing

2018-11-27 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10418:
--

 Summary: Implement lightweight profiling for message processing
 Key: IGNITE-10418
 URL: https://issues.apache.org/jira/browse/IGNITE-10418
 Project: Ignite
  Issue Type: New Feature
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.8






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10255) Avoid history reservation on affinity change.

2018-11-14 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10255:
--

 Summary: Avoid history reservation on affinity change.
 Key: IGNITE-10255
 URL: https://issues.apache.org/jira/browse/IGNITE-10255
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


Currently WAL history is reserved even if exchange is triggered by affinity 
change message, which means rebalance completed and assignment is ideal.

Reservation is not needed in such case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10251) Get rid of the code left from times when lateAffinity=false was supported

2018-11-14 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10251:
--

 Summary: Get rid of the code left from times when 
lateAffinity=false was supported
 Key: IGNITE-10251
 URL: https://issues.apache.org/jira/browse/IGNITE-10251
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


This code can hide errors and lead to inefficient processing in some scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10117) Node is mistakenly excluded from history suppliers preventing historical rebalance.

2018-11-01 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10117:
--

 Summary: Node is mistakenly excluded from history suppliers 
preventing historical rebalance.
 Key: IGNITE-10117
 URL: https://issues.apache.org/jira/browse/IGNITE-10117
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
 Fix For: 2.8


This is because 
org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager#reserveHistoryForExchange
 is called before 
org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager#beforeExchange,
 which restores correct partition state.

{noformat}
public void testHistory() throws Exception {
    IgniteEx crd = startGrid(0);
    startGrid(1);

    crd.cluster().active(true);

    awaitPartitionMapExchange();

    int part = 0;

    List keys = loadDataToPartition(part, DEFAULT_CACHE_NAME, 100, 0, 1);

    forceCheckpoint(); // Prevent IGNITE-10088

    stopAllGrids();

    awaitPartitionMapExchange();

    List keys1 = loadDataToPartition(part, DEFAULT_CACHE_NAME, 100, 100, 1);

    startGrid(0);
    startGrid(1);

    awaitPartitionMapExchange(); // grid0 will not provide history.
}
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10112) Prioritize processing of tx finish=false message due to timeout

2018-11-01 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10112:
--

 Summary: Prioritize processing of tx finish=false message due to 
timeout
 Key: IGNITE-10112
 URL: https://issues.apache.org/jira/browse/IGNITE-10112
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


Currently tx rollback messages are processed in the same way as others.

For a forced rollback, for example one triggered by tx timeouts on PME (see 
org.apache.ignite.configuration.TransactionConfiguration#getTxTimeoutOnPartitionMapExchange),
 they should be prioritized to avoid timeout violation under load.
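
The prioritization could be modeled as in the sketch below. This is a 
hypothetical, simplified model built on a JDK PriorityBlockingQueue, not the 
way Ignite dispatches messages; the class and field names are invented for the 
illustration. Rollback messages are taken first, and messages of the same kind 
keep their arrival order.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical message queue which serves rollback messages before others. */
final class TxMessageQueueSketch {
    /** Queued message descriptor. */
    private static final class Msg {
        final String name;
        final boolean rollback;
        final long seq;

        Msg(String name, boolean rollback, long seq) {
            this.name = name;
            this.rollback = rollback;
            this.seq = seq;
        }
    }

    /** Arrival order generator. */
    private final AtomicLong seqGen = new AtomicLong();

    /** Rollbacks first (false sorts before true), then by arrival order. */
    private final PriorityBlockingQueue<Msg> queue = new PriorityBlockingQueue<>(16,
        Comparator.comparing((Msg m) -> !m.rollback).thenComparingLong(m -> m.seq));

    void offer(String name, boolean rollback) {
        queue.offer(new Msg(name, rollback, seqGen.incrementAndGet()));
    }

    /** @return Name of the next message to process, or {@code null} if the queue is empty. */
    String poll() {
        Msg m = queue.poll();

        return m == null ? null : m.name;
    }
}
```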



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10088) Partition can be restored in moving state instead of owning if node crashed before first checkpoint.

2018-10-31 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10088:
--

 Summary: Partition can be restored in moving state instead of 
owning if node crashed before first checkpoint.
 Key: IGNITE-10088
 URL: https://issues.apache.org/jira/browse/IGNITE-10088
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


Scenario:

1. Start grid with a large enough checkpoint frequency, wait for rebalance, put 
some data.
2. Observe all partitions in OWNING state.
3. Kill the node or trigger the FH for it before a checkpoint is started.
4. Return the node to the grid, observe all partitions created in MOVING state 
and unnecessarily rebalanced.

The problem is in 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager#restorePartitionStates,
 which doesn't apply the OWNING partition state.

{noformat}
public void testMoving() throws Exception {
    IgniteEx crd = startGrid(0);

    startGrid(1);

    crd.cluster().active(true);

    awaitPartitionMapExchange();

    stopGrid(1);

    awaitPartitionMapExchange();

    startGrid(1);

    awaitPartitionMapExchange();
}
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10078) Node failure during concurrent partition updates may cause partition desync between primary and backup.

2018-10-31 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10078:
--

 Summary: Node failure during concurrent partition updates may 
cause partition desync between primary and backup.
 Key: IGNITE-10078
 URL: https://issues.apache.org/jira/browse/IGNITE-10078
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.8


This is possible if some updates with lower partition counter are not written 
to WAL before node failure.

Scenario:

1. Start grid with 3 nodes, 2 backups.
2. Preload some data to partition P.
3. Start two concurrent transactions, each writing a single key to the same 
partition; the keys are different.
{noformat}
try (Transaction tx = client.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 0, 1)) {
    client.cache(DEFAULT_CACHE_NAME).put(k, v);

    tx.commit();
}
{noformat}
4. Order updates on the backup in such a way that the update with the max 
partition counter is written to WAL and the update with the lesser partition 
counter fails because the FH is triggered before it's added to WAL.

5. Return the failed node to the grid, observe no rebalancing due to equal 
partition counters.

Possible solution: detect gaps in update counters on recovery and, if any are 
detected, force rebalance from a node without gaps.
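
Gap detection could be sketched as follows. The inputs are assumptions of this 
illustration: a base counter known to be fully applied (for example, from the 
last checkpoint) and the set of counters of updates found in WAL after it. The 
class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.TreeSet;

/** Hypothetical check for missed updates in a partition's counter sequence. */
final class CounterGapSketch {
    /**
     * @param base Counter up to which all updates are known to be applied.
     * @param applied Counters of updates recovered after {@code base}.
     * @return {@code true} if some counter between base and the max recovered counter is missing.
     */
    static boolean hasGaps(long base, Collection<Long> applied) {
        long expected = base + 1;

        // Walk the counters in ascending order; any jump means a missed update.
        for (long cntr : new TreeSet<>(applied)) {
            if (cntr != expected)
                return true;

            expected++;
        }

        return false;
    }
}
```

In the scenario above the backup recovers counter base + 2 without base + 1, so 
hasGaps returns true even though its max counter equals the one on the primary.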




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10040) Auto rebalance throttling

2018-10-29 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10040:
--

 Summary: Auto rebalance throttling
 Key: IGNITE-10040
 URL: https://issues.apache.org/jira/browse/IGNITE-10040
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


Currently we provide a few options to control rebalance overhead, the most 
important being:

org.apache.ignite.configuration.CacheConfiguration#setRebalanceThrottle

org.apache.ignite.configuration.IgniteConfiguration#setRebalanceThreadPoolSize

In general, proper option values can only be derived from load testing, which 
is very inconvenient. Moreover, changing the settings requires a grid restart.

It's desirable to implement automatic rebalance throttling controlled by a user 
configuration option, expressed as the ratio between dirty pages produced by 
rebalance and dirty pages produced by user activity.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10029) Node attributes are not restored from metastore after node restart.

2018-10-26 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10029:
--

 Summary: Node attributes are not restored from metastore after 
node restart.
 Key: IGNITE-10029
 URL: https://issues.apache.org/jira/browse/IGNITE-10029
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
 Fix For: 2.8


Scenario:

1. Start node with enabled persistence, configure some user attributes.

2. Restart node without directly setting node attributes again.

3. Read any user attribute and observe NULL value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10027) Optimistic transaction doesn't throw TransactionTimeoutException on lock acquisition timeout.

2018-10-26 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10027:
--

 Summary: Optimistic transaction doesn't throw 
TransactionTimeoutException on lock acquisition timeout.
 Key: IGNITE-10027
 URL: https://issues.apache.org/jira/browse/IGNITE-10027
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


Reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.transactions;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.internal.IgniteEx;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.IgniteInterruptedCheckedException;
import org.apache.ignite.internal.TestRecordingCommunicationSpi;
import org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareResponse;
import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
import org.apache.ignite.internal.util.typedef.X;
import org.apache.ignite.internal.util.typedef.internal.U;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionTimeoutException;

import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC;
import static org.apache.ignite.testframework.GridTestUtils.runAsync;
import static org.apache.ignite.transactions.TransactionConcurrency.OPTIMISTIC;
import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

/**
 * Tests rollback on timeout scenarios for one-phase commit protocol.
 */
public class TxRollbackOnTimeoutOnePhaseCommitTest extends GridCommonAbstractTest {
    /** IP finder. */
    private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true);

    /** */
    private static final int GRID_CNT = 2;

    /** {@inheritDoc} */
    @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

        ((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(IP_FINDER);

        cfg.setCommunicationSpi(new TestRecordingCommunicationSpi());

        boolean client = igniteInstanceName.startsWith("client");

        cfg.setClientMode(client);

        if (!client) {
            CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME);

            ccfg.setAtomicityMode(TRANSACTIONAL);
            ccfg.setBackups(1);
            ccfg.setWriteSynchronizationMode(FULL_SYNC);
            ccfg.setOnheapCacheEnabled(false);

            cfg.setCacheConfiguration(ccfg);
        }

        return cfg;
    }

    /** {@inheritDoc} */
    @Override protected void beforeTest() throws Exception {
        super.beforeTest();

        startGridsMultiThreaded(GRID_CNT);

        startGrid("client");
    }

    /** */
    public void testUnlockOptimistic() throws IgniteCheckedException {
        IgniteEx client = grid("client");

        assertNotNull(client.cache(DEFAULT_CACHE_NAME));

        int key = 0;

        CountDownLatch lock = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);

        IgniteInternalFuture fut = runAsync(() -> {
            try (Transaction tx = client.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 0, 1)) {
                client.cache(DEFAULT_CACHE_NAME).put(key, key + 1);

                lock.countDown();

                try {
                    assertTrue(U.await(finish, 30, TimeUnit.SECONDS));
 

[jira] [Created] (IGNITE-10019) Documentation: partition preloading

2018-10-26 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-10019:
--

 Summary: Documentation: partition preloading
 Key: IGNITE-10019
 URL: https://issues.apache.org/jira/browse/IGNITE-10019
 Project: Ignite
  Issue Type: Task
Reporter: Alexei Scherbakov
Assignee: Artem Budnikov
 Fix For: 2.8


We have to add documentation for partition preloading feature:

IgniteCache.preloadPartition



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9998) .NET: Implement partition preload API

2018-10-25 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9998:
-

 Summary: .NET: Implement partition preload API
 Key: IGNITE-9998
 URL: https://issues.apache.org/jira/browse/IGNITE-9998
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8








[jira] [Created] (IGNITE-9926) Improve metadata distribution speed in a scenario with concurrent updates for the same schema

2018-10-18 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9926:
-

 Summary: Improve metadata distribution speed in a scenario with 
concurrent updates for the same schema
 Key: IGNITE-9926
 URL: https://issues.apache.org/jira/browse/IGNITE-9926
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


If multiple threads simultaneously start putting the same object with a 
non-existent schema into the cache, every update triggers a full propose-accept 
round trip in the current implementation.

The propose message should be sent only for the first update; the others should 
wait for its completion instead of sending messages for the same schema.
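
The idea above can be sketched in plain Java. This is only an illustration under stated assumptions: the class SchemaProposeDedup, its methods and the integer schema id are hypothetical and are not Ignite APIs; the counter stands in for actually sending a propose message.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: at most one in-flight propose per schema id. */
final class SchemaProposeDedup {
    /** In-flight or completed proposals, keyed by schema id. */
    private final ConcurrentMap<Integer, CompletableFuture<Void>> proposals = new ConcurrentHashMap<>();

    /** Counts how many propose messages were actually "sent". */
    private final AtomicInteger sent = new AtomicInteger();

    /** Returns the future to wait on; only the first caller per schema sends a message. */
    CompletableFuture<Void> propose(int schemaId) {
        CompletableFuture<Void> fresh = new CompletableFuture<>();
        CompletableFuture<Void> prev = proposals.putIfAbsent(schemaId, fresh);

        if (prev != null)
            return prev; // Another thread already proposed this schema: just wait for it.

        sent.incrementAndGet(); // Stand-in for sending the propose message.

        return fresh;
    }

    /** Stand-in for receiving the accept message for a schema. */
    void onAccepted(int schemaId) {
        CompletableFuture<Void> fut = proposals.get(schemaId);

        if (fut != null)
            fut.complete(null);
    }

    /** Number of propose messages sent so far. */
    int sentCount() {
        return sent.get();
    }
}
```

With this shape, concurrent updaters of the same schema share one future, so the number of round trips depends on the number of distinct schemas rather than on the number of threads.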





[jira] [Created] (IGNITE-9896) TxRollbackOnTimeoutNoDeadlockDetectionTest fails in master for many tests

2018-10-16 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9896:
-

 Summary: TxRollbackOnTimeoutNoDeadlockDetectionTest fails in 
master for many tests
 Key: IGNITE-9896
 URL: https://issues.apache.org/jira/browse/IGNITE-9896
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
 Fix For: 2.8


Example of 100% failing test:

org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutNoDeadlockDetectionTest#testRollbackOnTimeoutTxServerRemapPessimisticReadCommitted





[jira] [Created] (IGNITE-9830) o.a.i.i.b.BinaryReaderExImpl#getOrCreateSchema sometimes misses latest metadata version resulting in failed tx commit because of missed schema.

2018-10-09 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9830:
-

 Summary: o.a.i.i.b.BinaryReaderExImpl#getOrCreateSchema sometimes 
misses latest metadata version resulting in failed tx commit because of missed 
schema.
 Key: IGNITE-9830
 URL: https://issues.apache.org/jira/browse/IGNITE-9830
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.8








[jira] [Created] (IGNITE-9806) Legacy tx invalidation code breaks data consistency between owners.

2018-10-07 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9806:
-

 Summary: Legacy tx invalidation code breaks data consistency 
between owners.
 Key: IGNITE-9806
 URL: https://issues.apache.org/jira/browse/IGNITE-9806
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


Reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.transactions;

import java.util.UUID;
import java.util.function.Supplier;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.IgniteTransactions;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.internal.IgniteEx;
import org.apache.ignite.internal.managers.communication.GridIoPolicy;
import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
import org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal;
import org.apache.ignite.internal.util.typedef.G;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.testsuites.IgniteIgnore;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;
import org.jetbrains.annotations.Nullable;
import org.mockito.Mockito;
import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;

/**
 * Tests data consistency if transaction is failed due to heuristic exception on originating node.
 */
public class TxDataConsistencyOnCommitFailureTest extends GridCommonAbstractTest {
    /** */
    public static final int KEY = 0;

    /** */
    public static final String CLIENT = "client";

    /** */
    private int nodesCnt;

    /** */
    private int backups;

    /** {@inheritDoc} */
    @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

        cfg.setClientMode(igniteInstanceName.startsWith(CLIENT));

        cfg.setCacheConfiguration(new CacheConfiguration(DEFAULT_CACHE_NAME).
            setCacheMode(CacheMode.PARTITIONED).
            setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL).
            setBackups(backups).
            setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC));

        return cfg;
    }

    /** {@inheritDoc} */
    @Override protected void afterTest() throws Exception {
        super.afterTest();

        stopAllGrids();
    }

    /** */
    @IgniteIgnore(value = "https://issues.apache.org/jira/browse/IGNITE-590", forceFailure = false)
    public void testCommitErrorOnColocatedNode2PC() throws Exception {
        nodesCnt = 3;

        backups = 2;

        doTestCommitError(() -> primaryNode(KEY, DEFAULT_CACHE_NAME));
    }

    /**
     * @param factory Factory.
     */
    private void doTestCommitError(Supplier<Ignite> factory) throws Exception {
        Ignite crd = startGridsMultiThreaded(nodesCnt);

        crd.cache(DEFAULT_CACHE_NAME).put(KEY, KEY);

        Ignite ignite = factory.get();

        if (ignite == null)
            ignite = startGrid("client");

        assertNotNull(ignite.cache(DEFAULT_CACHE_NAME));

        injectMockedTxManager(ignite);

        checkKey();

        IgniteTransactions transactions = ignite.transactions();

        try (Transaction tx = transactions.txStart(TransactionConcurrency.PESSIMISTIC,
            TransactionIsolation.REPEATABLE_READ, 0, 1)) {
            assertNotNull(transactions.tx());

            ignite.cache(DEFAULT_CACHE_NAME).put(KEY, KEY + 1);

            tx.commit();

            fail();
        }
        catch (Exception t) {
            // No-op.
        }

        checkKey();

        checkFutures();
    }

    /**
     * @param ignite Ignite.
     */
    private void 

[jira] [Created] (IGNITE-9672) Move o.a.i.i.processors.cache.persistence.tree.io.PageMetaIO to metastore.

2018-09-24 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9672:
-

 Summary: Move 
o.a.i.i.processors.cache.persistence.tree.io.PageMetaIO to metastore.
 Key: IGNITE-9672
 URL: https://issues.apache.org/jira/browse/IGNITE-9672
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.8


The current implementation has a special meta page related to snapshot 
functionality.

The meta page is stored in the index partition.

If index.bin is removed (to trigger an index rebuild), all of this information 
is lost and the incremental snapshot logic is broken.

Solution: move snapshot metadata into the metastore.





[jira] [Created] (IGNITE-9612) Improve checkpoint mark phase speed.

2018-09-17 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9612:
-

 Summary: Improve checkpoint mark phase speed.
 Key: IGNITE-9612
 URL: https://issues.apache.org/jira/browse/IGNITE-9612
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.7


I'm observing regularly slow checkpoints caused by a long mark duration that is 
not related to the number of dirty pages:

{noformat}
2018-09-01 14:55:20.408 [INFO 
][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Checkpoint started [checkpointId=01e0c7bf-842f-4ed6-8589-b4904063434f, 
startPtr=FileWALPointer [idx=19814, fileOff=948996096, len=5233457],
checkpointLockWait=0ms, checkpointLockHoldTime=951ms, 
walCpRecordFsyncDuration=39ms, pages=78477, reason='timeout']
2018-09-01 14:55:21.307 [INFO 
][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Checkpoint finished [cpId=01e0c7bf-842f-4ed6-8589-b4904063434f, pages=78477, 
markPos=FileWALPointer [idx=19814, fileOff=948996096, len=5233457], 
walSegmentsCleared=0, walSegmentsCovered=[], *markDuration=1002ms*, 
pagesWrite=478ms, fsync=421ms, total=1901ms] 
{noformat}

{noformat}
2018-09-01 14:58:20.355 [INFO 
][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Checkpoint started [checkpointId=09d1f4bc-d3f3-4a16-b291-89d7fa745ea5, 
startPtr=FileWALPointer [idx=19814, fileOff=124208, len=5233457], 
checkpointLockWait=0ms, checkpointLockHoldTime=926ms, 
walCpRecordFsyncDuration=14ms, pages=10837, reason='timeout']
2018-09-01 14:58:20.480 [INFO 
][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Checkpoint finished [cpId=09d1f4bc-d3f3-4a16-b291-89d7fa745ea5, pages=10837, 
markPos=FileWALPointer [idx=19814, fileOff=124208, len=5233457], 
walSegmentsCleared=0, walSegmentsCovered=[], *markDuration=943ms*, 
pagesWrite=64ms, fsync=61ms, total=1068ms]
{noformat}

Debugging has revealed that this is due to the large amount of work required to 
save metadata for metapages and free/reuse lists. Because this is done under the 
checkpoint write lock, all other activities are blocked, resulting in increased 
tx and atomic ops latency.

Simple solution: parallelize metadata processing during the mark phase.

The best way to solve the problem is described in IGNITE-9520.
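
A minimal sketch of the "simple solution", assuming the metadata work can be split into independent tasks. The class ParallelMarkSketch, its lock and its method are hypothetical stand-ins, not the actual checkpointer code; the point is only that the write lock is held for the duration of the slowest task batch rather than the sum of all tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: run independent metadata-saving tasks in parallel under a write lock. */
final class ParallelMarkSketch {
    /** Stand-in for the checkpoint read-write lock. */
    private final ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock();

    /** Runs all tasks on a fixed pool while holding the write lock; returns the number completed. */
    int markPhase(List<Runnable> metaTasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        cpLock.writeLock().lock();

        try {
            List<Callable<Void>> calls = new ArrayList<>();

            for (Runnable task : metaTasks)
                calls.add(() -> { task.run(); return null; });

            int done = 0;

            // invokeAll blocks until every task has finished.
            for (Future<Void> f : pool.invokeAll(calls)) {
                f.get(); // Surfaces a failure of an individual task.

                done++;
            }

            return done;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException("Interrupted during mark phase", e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException("Metadata task failed", e.getCause());
        }
        finally {
            cpLock.writeLock().unlock();

            pool.shutdown();
        }
    }
}
```

The tasks in this sketch must not try to take the same lock themselves, otherwise they would block behind the thread that holds it.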








[jira] [Created] (IGNITE-9548) Transaction with short timeout is not rolled back on primary node resulting in blocked PME

2018-09-11 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9548:
-

 Summary: Transaction with short timeout is not rolled back on 
primary node resulting in blocked PME
 Key: IGNITE-9548
 URL: https://issues.apache.org/jira/browse/IGNITE-9548
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov


{noformat}
2018-09-10 12:38:24.237 [WARN 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.apache.ignite.internal.diagnostic]
 Pending transactions:
2018-09-10 12:38:24.242 [WARN 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.apache.ignite.internal.diagnostic]
 >>> [txVer=AffinityTopologyVersion [topVer=343, minorTopVer=0], exchWait=true, 
tx=GridDhtTxLocal [nearNodeId=eb94406c-a132-4998-bf22-b7d74960b866, nearFut
Id=b7cff46b561-0b500010-3ed6-4b79-8cc8-65b3b3b16738, nearMiniId=1, 
nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion 
[topVer=147809766, order=1536687716227, nodeOrder=182], 
super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[], 
dhtNodes=[],
 explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, 
sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl 
[activeCacheIds=[-1934881220], recovery=false, txMap=[IgniteTxEntry 
[key=KeyCacheObjectImpl [part=12715, val=ucp_ids_counter_name_DP
L_ucp_ids_section_name, hasValBytes=true], cacheId=-1934881220, 
txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=12715, 
val=ucp_ids_counter_name_DPL_ucp_ids_section_name, hasValBytes=true], 
cacheId=-1934881220], val=[op=NOOP, val=null], prevVal=[op=NOOP, val=null], 
oldVal
=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, 
conflictVer=null, explicitVer=null, dhtVer=null, filters=[], 
filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry [rdrs=[], 
part=12715, super=GridDistributedCacheEntry [super=GridCach
eMapEntry [key=KeyCacheObjectImpl [part=12715, 
val=ucp_ids_counter_name_DPL_ucp_ids_section_name, hasValBytes=true], 
val=CacheObjectImpl [val=null, hasValBytes=true], startVer=1536665387604, 
ver=GridCacheVersion [topVer=147809766, order=1536737564543, nodeOrder=33], hash
=-864500235, extras=GridCacheMvccEntryExtras [mvcc=GridCacheMvcc 
[locs=[GridCacheMvccCandidate [nodeId=a4823893-be8f-4b24-abca-0a28efde604a, 
ver=GridCacheVersion [topVer=147809766, order=1536737580715, nodeOrder=33], 
threadId=887, id=43118359, topVer=AffinityTopologyVers
ion [topVer=343, minorTopVer=0], reentry=null, 
otherNodeId=eb94406c-a132-4998-bf22-b7d74960b866, otherVer=GridCacheVersion 
[topVer=147809766, order=1536687716227, nodeOrder=182], mappedDhtNodes=null, 
mappedNearNodes=null, ownerVer=GridCacheVersion [topVer=147809766, orde
r=1536737580560, nodeOrder=33], serOrder=null, key=KeyCacheObjectImpl 
[part=12715, val=ucp_ids_counter_name_DPL_ucp_ids_section_name, 
hasValBytes=true], 
masks=local=1|owner=1|ready=1|reentry=0|used=0|tx=1|single_implicit=0|dht_local=1|near_local=0|removed=0|read=0,
 prevV
er=null, nextVer=null]], rmts=null]], flags=2]]], prepared=0, locked=false, 
nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=2, 
partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion [topVer=147809766, 
order=1536737580715, nodeOrder=33
, super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=147809766, 
order=1536737580715, nodeOrder=33], writeVer=null, implicit=false, loc=true, 
threadId=887, startTime=1536416065902, 
nodeId=a4823893-be8f-4b24-abca-0a28efde604a, startVer=GridCacheVersion 
[topVer=14780976
6, order=1536737580715, nodeOrder=33], endVer=null, isolation=REPEATABLE_READ, 
concurrency=PESSIMISTIC, timeout=200, sysInvalidate=false, sys=false, plc=2, 
commitVer=null, finalizing=NONE, invalidParts=null, state=MARKED_ROLLBACK, 
timedOut=false, topVer=AffinityTopologyV
ersion [topVer=343, minorTopVer=0], duration=156238330ms, 
onePhaseCommit=false], size=1
{noformat}






[jira] [Created] (IGNITE-9512) testRollbackOnTopologyLockPessimistic still fails on master.

2018-09-10 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9512:
-

 Summary: testRollbackOnTopologyLockPessimistic still fails on 
master.
 Key: IGNITE-9512
 URL: https://issues.apache.org/jira/browse/IGNITE-9512
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov


Looks like the fix in [1] was incomplete.

[1] https://issues.apache.org/jira/browse/IGNITE-9401





[jira] [Created] (IGNITE-9445) Use valid tag for page write unlock while reading cold page from disk.

2018-08-31 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9445:
-

 Summary: Use valid tag for page write unlock while reading cold 
page from disk.
 Key: IGNITE-9445
 URL: https://issues.apache.org/jira/browse/IGNITE-9445
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov


The problem arises when a pageId with an outdated page rotation tag is passed to 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl#acquirePage(int, long, boolean).

It's not possible to know the actual value in advance without reading the stored 
page.

Such a scenario may lead to a page that stays locked forever if the passed and 
persisted tags are different.

Solution: unlock the page using the actual (persisted) tag value.
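
The failure mode and the fix can be illustrated with a toy model. Everything here is an assumption made for illustration: TaggedPageLockSketch is not an Ignite class, and real page locks are more involved; the sketch only shows that an unlock keyed by the caller's tag can fail, while an unlock keyed by the tag read from the stored page succeeds.

```java
/** Hypothetical toy model of a page whose write unlock must be given the persisted tag. */
final class TaggedPageLockSketch {
    /** Tag stored together with the page. */
    private final int persistedTag;

    /** Whether the page is currently write-locked. */
    private boolean writeLocked;

    TaggedPageLockSketch(int persistedTag) {
        this.persistedTag = persistedTag;
    }

    /** Takes the write lock to load the page and returns the tag found in the stored page. */
    int lockAndRead() {
        writeLocked = true;

        return persistedTag;
    }

    /** Releases the lock only if the supplied tag matches the persisted one. */
    boolean writeUnlock(int tag) {
        if (tag != persistedTag)
            return false; // Mismatch: the page stays locked.

        writeLocked = false;

        return true;
    }

    boolean isWriteLocked() {
        return writeLocked;
    }
}
```

A caller that holds a stale tag (say 1 while the page stores 2) would leave the page locked if it used its own tag, so it should unlock with the value returned by lockAndRead().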





[jira] [Created] (IGNITE-9401) Newly added testRollbackOnTopologyLockPessimistic has a race which leads to suite hang.

2018-08-28 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9401:
-

 Summary: Newly added testRollbackOnTopologyLockPessimistic has a 
race which leads to suite hang.
 Key: IGNITE-9401
 URL: https://issues.apache.org/jira/browse/IGNITE-9401
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.7








[jira] [Created] (IGNITE-9386) control.sh --tx can produce confusing results when limit is set to small value

2018-08-27 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9386:
-

 Summary: control.sh --tx can produce confusing results when limit 
is set to small value
 Key: IGNITE-9386
 URL: https://issues.apache.org/jira/browse/IGNITE-9386
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


This happens because the limit is currently applied to primary and backup 
transactions, which breaks output post-filtering (removal of primary and backup 
transactions from the output if the near part is present).

Possible solution: apply the limit only to valid near transactions. If some txs 
have no near part (broken tx topology), they should always be visible in the 
output, probably with a special "broken" marking.

The best way to achieve this is to implement tx paging on the client side (using 
continuous mapping).
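
A rough sketch of the proposed filtering rule, assuming each transaction is already known to have or lack a near part. TxLimitSketch and TxInfo are hypothetical names used only for illustration and do not reflect the control.sh implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the limit counts only near transactions; broken ones are always shown. */
final class TxLimitSketch {
    /** Minimal transaction description: an id plus whether a near part exists. */
    static final class TxInfo {
        final String id;
        final boolean hasNear;

        TxInfo(String id, boolean hasNear) {
            this.id = id;
            this.hasNear = hasNear;
        }
    }

    /** Renders output lines: at most {@code limit} near txs, plus every tx without a near part. */
    static List<String> render(List<TxInfo> txs, int limit) {
        List<String> out = new ArrayList<>();

        int shownNear = 0;

        for (TxInfo tx : txs) {
            if (!tx.hasNear)
                out.add(tx.id + " [broken]"); // Never hidden by the limit.
            else if (shownNear < limit) {
                out.add(tx.id);

                shownNear++;
            }
        }

        return out;
    }
}
```

For example, with near transactions n1, n2, n3, one broken transaction b1 and a limit of 2, the output keeps n1, n2 and b1, so a small limit cannot hide a broken transaction.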






[jira] [Created] (IGNITE-9380) Assertion in TxRollbackOnTimeoutTest

2018-08-24 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9380:
-

 Summary: Assertion in TxRollbackOnTimeoutTest
 Key: IGNITE-9380
 URL: https://issues.apache.org/jira/browse/IGNITE-9380
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov


{noformat}
java.lang.AssertionError
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTransactionsImpl.txStart0(IgniteTransactionsImpl.java:182)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTransactionsImpl.txStart(IgniteTransactionsImpl.java:94)
    at org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutTest$4.run(TxRollbackOnTimeoutTest.java:454)
    at org.apache.ignite.testframework.GridTestUtils$6.call(GridTestUtils.java:1254)
    at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
{noformat}

This looks possible because a tx can be rolled back by a very short timeout 
before onCreated is called.





[jira] [Created] (IGNITE-9364) SetTxTimeoutOnPartitionMapExchangeTest.java hangs on TC

2018-08-23 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9364:
-

 Summary: SetTxTimeoutOnPartitionMapExchangeTest.java hangs on TC
 Key: IGNITE-9364
 URL: https://issues.apache.org/jira/browse/IGNITE-9364
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Ivan Daschinskiy
 Fix For: 2.7
 Attachments: Ignite_Tests_2.4_Java_8_Basic_1_3255.log.zip

Failed run:

https://ci.ignite.apache.org/viewLog.html?buildId=1707476=IgniteTests24Java8_Basic1=buildLog





[jira] [Created] (IGNITE-9319) CacheAsyncOperationsFailoverTxTest.testPutAllAsyncFailover is flaky in master.

2018-08-20 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9319:
-

 Summary: 
CacheAsyncOperationsFailoverTxTest.testPutAllAsyncFailover is flaky in master.
 Key: IGNITE-9319
 URL: https://issues.apache.org/jira/browse/IGNITE-9319
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
 Fix For: 2.7


https://ci.ignite.apache.org/viewLog.html?buildId=1688647=queuedBuildOverviewTab

https://ci.ignite.apache.org/viewLog.html?buildId=1688542=queuedBuildOverviewTab





[jira] [Created] (IGNITE-9246) Optimistic transactions can wait for topology future on remap for a long time even if timeout is set.

2018-08-10 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9246:
-

 Summary: Optimistic transactions can wait for topology future on 
remap for a long time even if timeout is set.
 Key: IGNITE-9246
 URL: https://issues.apache.org/jira/browse/IGNITE-9246
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov


This is possible if a long PME occurs during the tx remap phase.

Fix: wait for the new topology on remap with a timeout, if one is set.
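
The proposed fix amounts to a bounded wait. A minimal sketch follows, assuming the topology future can be modeled as a CompletableFuture and that a timeout of 0 means "no timeout", as in txStart(..., 0, 1) elsewhere in these reports. TopologyWaitSketch is a hypothetical helper, not Ignite code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: bound the wait for a new topology by the transaction timeout. */
final class TopologyWaitSketch {
    /**
     * Waits for the topology future.
     *
     * @param topFut Stand-in for the topology future.
     * @param timeoutMs Remaining tx timeout in milliseconds; 0 or less means wait indefinitely.
     * @return {@code true} if the topology is ready, {@code false} if the timeout elapsed
     *      or the thread was interrupted (the caller would then roll the tx back).
     */
    static boolean awaitTopology(CompletableFuture<?> topFut, long timeoutMs) {
        try {
            if (timeoutMs > 0)
                topFut.get(timeoutMs, TimeUnit.MILLISECONDS);
            else
                topFut.get();

            return true;
        }
        catch (TimeoutException e) {
            return false;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            return false;
        }
        catch (ExecutionException e) {
            throw new IllegalStateException("Topology future failed", e.getCause());
        }
    }
}
```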





[jira] [Created] (IGNITE-9208) Allow proper handling of transactions if node is stopped using stop(false)

2018-08-07 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9208:
-

 Summary: Allow proper handling of transactions if node is stopped 
using stop(false)
 Key: IGNITE-9208
 URL: https://issues.apache.org/jira/browse/IGNITE-9208
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.7
Reporter: Alexei Scherbakov


Currently, if a node is stopped (for example, for maintenance) using the standard 
Ignition.stop(false), all active transactions will most likely be rolled back or 
even stopped during commit, leading to partition desync, which is not 
desirable.

If cancel=false, the node must wait for graceful termination of all active 
transactions while blocking new requests.
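
The requested behavior can be sketched as a simple gate. This is an illustration under assumptions: GracefulStopSketch and its methods are hypothetical, and a real node would also need to handle remote transactions and timeouts.

```java
/** Hypothetical sketch: reject new transactions on stop and wait for active ones to finish. */
final class GracefulStopSketch {
    /** Number of transactions currently in progress. */
    private int active;

    /** Set once a graceful stop has been requested. */
    private boolean stopping;

    /** Registers a new transaction; returns {@code false} if the node is stopping. */
    synchronized boolean tryBegin() {
        if (stopping)
            return false;

        active++;

        return true;
    }

    /** Marks one transaction as finished (committed or rolled back). */
    synchronized void end() {
        active--;

        notifyAll();
    }

    /** Blocks new transactions, then waits for active ones; returns {@code false} if interrupted. */
    synchronized boolean stopGracefully() {
        stopping = true;

        while (active > 0) {
            try {
                wait();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();

                return false;
            }
        }

        return true;
    }
}
```

In this shape a transaction that is already committing is allowed to finish, while any tryBegin() call made after the stop request is refused.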





[jira] [Created] (IGNITE-9188) Unexpected eviction leading to data loss in a scenario with stopping/restarting nodes during rebalancing

2018-08-05 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9188:
-

 Summary: Unexpected eviction leading to data loss in a scenario 
with stopping/restarting nodes during rebalancing
 Key: IGNITE-9188
 URL: https://issues.apache.org/jira/browse/IGNITE-9188
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.7


Scenario:

1. Split grid nodes into two groups with distinct partition mapping. One group 
holds even partitions, the other holds odd ones. Rebalancing of even partitions 
is only triggered when the number of nodes in the grid exceeds the n/2 threshold.

2. Start n/2 nodes, activate, put data into even partitions.

3. Start the other n/2 nodes, change BLT, delay rebalancing of even partitions.

4. Stop the newly started nodes before rebalancing is finished.

Expected behavior: partitions in the "even" group keep the OWNING state.

Actual behavior: even partitions are evicted, leading to data loss.

Unit test reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.distributed;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.affinity.AffinityFunctionContext;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.WALMode;
import org.apache.ignite.internal.TestRecordingCommunicationSpi;
import org.apache.ignite.internal.processors.cache.GridCacheUtils;
import org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition;
import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage;
import org.apache.ignite.internal.util.typedef.G;
import org.apache.ignite.internal.util.typedef.internal.CU;
import org.apache.ignite.internal.util.typedef.internal.U;
import org.apache.ignite.lang.IgniteBiPredicate;
import org.apache.ignite.plugin.extensions.communication.Message;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.jetbrains.annotations.Nullable;

import static org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionState.OWNING;

/**
 *
 */
public class CacheLostPartitionsRestoreStateTest extends GridCommonAbstractTest {
    /** */
    public static final long MB = 1024 * 1024L;

    /** */
    public static final String GRP_ATTR = "grp";

    /** */
    public static final int GRIDS_CNT = 6;

    /** */
    public static final String CACHE_1 = "filled";

    /** */
    public static final String CACHE_2 = "empty";

    /** */
    public static final String EVEN_GRP = "event";

    /** */
    public static final String ODD_GRP = "odd";

    /** {@inheritDoc} */
    @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

        cfg.setCommunicationSpi(new TestRecordingCommunicationSpi());

        CacheConfiguration ccfg = new CacheConfiguration("default");

        ccfg.setAffinity(new RendezvousAffinityFunction(false, CacheConfiguration.MAX_PARTITIONS_COUNT));

        cfg.setCacheConfiguration(ccfg);

        cfg.setPeerClassLoadingEnabled(true);

        Map attrs = new HashMap<>();

        attrs.put(GRP_ATTR, grp(getTestIgniteInstanceIndex(igniteInstanceName)));

        cfg.setUserAttributes(attrs);

        DataStorageConfiguration memCfg = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(
                new DataRegionConfiguration().setPersistenceEnabled(true).setInitialSize(50 

[jira] [Created] (IGNITE-9094) Request for commit check is sent to backup nodes twice on primary node left.

2018-07-26 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-9094:
-

 Summary: Request for commit check is sent to backup nodes twice on 
primary node left.
 Key: IGNITE-9094
 URL: https://issues.apache.org/jira/browse/IGNITE-9094
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
 Fix For: 2.7


This causes twice as many messages as needed during recovery.

First place:
{noformat}
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxFinishRequest.<init>(GridDhtTxFinishRequest.java:161)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.checkCommittedRequest(GridNearTxFinishFuture.java:911)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.access$400(GridNearTxFinishFuture.java:71)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture$FinishMiniFuture.onNodeLeft(GridNearTxFinishFuture.java:1005)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:820)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:741)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.doFinish(GridNearTxFinishFuture.java:479)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:417)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3354)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3335)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
at org.apache.ignite.internal.processors.cache.GridCacheCompoundFuture.onDone(GridCacheCompoundFuture.java:56)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearPessimisticTxPrepareFuture.onDone(GridNearPessimisticTxPrepareFuture.java:409)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearPessimisticTxPrepareFuture.onDone(GridNearPessimisticTxPrepareFuture.java:58)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451)
at org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:285)
at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:144)
at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearPessimisticTxPrepareFuture$MiniFuture.onError(GridNearPessimisticTxPrepareFuture.java:515)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearPessimisticTxPrepareFuture$MiniFuture.onNodeLeft(GridNearPessimisticTxPrepareFuture.java:496)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearPessimisticTxPrepareFuture.onNodeLeft(GridNearPessimisticTxPrepareFuture.java:87)
at org.apache.ignite.internal.processors.cache.GridCacheMvccManager$4.onEvent(GridCacheMvccManager.java:266)
at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$LocalListenerWrapper.onEvent(GridEventStorageManager.java:1384)
at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:873)
at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:858)
at 

[jira] [Created] (IGNITE-8966) IgnitePdsContinuousRestartTest is often timed out in master

2018-07-09 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8966:
-

 Summary: IgnitePdsContinuousRestartTest is often timed out in 
master
 Key: IGNITE-8966
 URL: https://issues.apache.org/jira/browse/IGNITE-8966
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


Easily reproducible even locally.

For example, with testRebalancingDuringLoad_1000_2_1_1.





[jira] [Created] (IGNITE-8949) Unexpected exception after node restart during rebalance.

2018-07-06 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8949:
-

 Summary: Unexpected exception after node restart during rebalance.
 Key: IGNITE-8949
 URL: https://issues.apache.org/jira/browse/IGNITE-8949
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov


I've got:

{noformat}
Caused by: org.apache.ignite.IgniteCheckedException: Failed to process invalid 
partitions response (remote node reported invalid partitions but remote 
topology version does not differ from local) 
{noformat}

during implicit get tx.





[jira] [Created] (IGNITE-8942) In some cases grid cannot be deactivated because of hanging CQ internal cleanup.

2018-07-05 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8942:
-

 Summary: In some cases grid cannot be deactivated because of 
hanging CQ internal cleanup.
 Key: IGNITE-8942
 URL: https://issues.apache.org/jira/browse/IGNITE-8942
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Attachments: thread_dump_eip-server_2018-07-05-18-02.log

See the attachment for thread dump.





[jira] [Created] (IGNITE-8921) Add control.sh --cache affinity command to output current and ideal assignment and optionally show diff between them

2018-07-03 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8921:
-

 Summary: Add control.sh --cache affinity command to output current 
and ideal assignment and optionally show diff between them
 Key: IGNITE-8921
 URL: https://issues.apache.org/jira/browse/IGNITE-8921
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


Will help debugging.

Ex:

control.sh --cache affinity current
control.sh --cache affinity ideal
control.sh --cache affinity diff
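
A rough sketch of what the diff mode could compute, assuming an assignment is a list indexed by partition whose elements are ordered node id lists (primary first). The class and method names are hypothetical and are not part of control.sh or the Ignite API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: compares current and ideal affinity assignments. */
final class AffinityDiff {
    /**
     * Returns partition -> [current owners, ideal owners] for every partition
     * whose ordered owner list differs between the two assignments.
     */
    static Map<Integer, List<List<String>>> diff(List<List<String>> cur, List<List<String>> ideal) {
        if (cur.size() != ideal.size())
            throw new IllegalArgumentException("Assignments have different partition counts");

        Map<Integer, List<List<String>>> res = new LinkedHashMap<>();

        for (int p = 0; p < cur.size(); p++) {
            // Order matters here: the first node is assumed to be the primary.
            if (!cur.get(p).equals(ideal.get(p)))
                res.put(p, Arrays.asList(cur.get(p), ideal.get(p)));
        }

        return res;
    }
}
```

An empty result would mean the current assignment already matches the ideal one.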





[jira] [Created] (IGNITE-8902) GridDhtTxRemote sometimes not rolled back in one phase commit scenario.

2018-06-30 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8902:
-

 Summary: GridDhtTxRemote sometimes not rolled back in one phase 
commit scenario.
 Key: IGNITE-8902
 URL: https://issues.apache.org/jira/browse/IGNITE-8902
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.6


Near node log:

{noformat}
[2018-06-28 18:37:14,541][WARN ][sys-#77] The transaction was forcibly rolled 
back because a timeout is reached: 
GridNearTxLocal[xid=c8c6b184461--0871-da69--0010, 
xidVersion=GridCacheVersion [topVer=141679209, order=1530218114188, 
nodeOrder=16], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, 
state=MARKED_ROLLBACK, invalidate=false, rollbackOnly=true, 
nodeId=36f1c741-dc02-417a-a27d-fcbc90dd8cf1, timeout=100, duration=101, 
label=null]
{noformat}

{noformat}
[2018-06-28 18:37:14,560][ERROR][pool-356018-thread-1] Timeout (0 sec) is 
exceeded.
org.apache.ignite.transactions.TransactionTimeoutException: Failed to acquire 
lock within provided timeout for transaction [timeout=100, tx=GridDhtTxLocal 
[nearNodeId=36f1c741-dc02-417a-a27d-fcbc90dd8cf1, 
nearFutId=a8563574461-ec96bd57-6a94-4303-8ff5-56eaac137f30, nearMiniId=1, 
nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion 
[topVer=141679209, order=1530218114188, nodeOrder=16], 
super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[], 
dhtNodes=[06630e42-1c4d-4011-a388-4ec1dd1824fd], explicitLock=false, 
super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, 
depEnabled=false, txState=IgniteTxStateImpl 
[activeCacheIds=[117538306,117541069], recovery=false, txMap=[IgniteTxEntry 
[key=KeyCacheObjectImpl [part=779, val=5899, hasValBytes=true], 
cacheId=117541069, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=779, 
val=5899, hasValBytes=true], cacheId=117541069], val=[op=UPDATE, 
val=org.apache.ignite.scenario.internal.model.SampleObject [idHash=1226505441, 
hash=-1035741988, balance=100051, salary=1, fields=HashMap 
{field19=iiwxvrhxlpwqyixvpiregkuqpxuhtuir, 
field17=dyyxoefmichqvstteqjkbdpgmevifvmt, 
field18=iakcqzxcswsxncvztsotrjrlreuvpnsv, 
field22=wvewstllgkwvcxxujbkqkoihudgkkyve, 
field23=blgtxqcnwmexardyujbibiconowvyxvh, 
field20=mhvicfpnmptjreacgatiyobrmvvloxic, 
field21=bxajcavvwuhjvpugfoqohgulihzdbymr, 
field26=xceztfgnlpfoyciwnvhkorrgfllveocl, 
field27=sxzqvvckcgxgjctmygsibtouuzkfievo, 
field24=lsidfhurdjgjlmkrxyqbrdjzmbcicxie, 
field25=vfnmohbvezajifkqiwqbdqpulnynumfz, 
field28=zcewigkcryznakzsyzqzfdrbhklycjer, 
field29=vkctdybyrmtbitxuuqdlsrilxayorjjd, 
field11=lbwqnwwpwgewyjvlobyqwnvifuiggzio, 
field12=rmxclhojshtijttdjirppbkyudpvunht, 
field1=gvfrrpwkhmiziaortptiytwhviwjcpcr, 
field31=yktxbcjiyqfpaytacoajsiybtqocmezz, 
field0=vcorrbnevfunwssjzckdjlbvkynbogce, 
field10=sawaysrchykcvutlwfvglbvrlxvwlghh, 
field15=udrsigcjfetptnmlcnwjgccdqfmhdabv, 
field16=xjyjehlldwwnpbgjjtzwozqthwoefrin, 
field13=hwooamfugkijverkyqyzfccxvqrqjexx, 
field14=doxxkivwxqdhoozzsvwkkimgswrwoegj, 
field7=sxomkgtpjqyqpkrbxqnuknkmpzzpxuou, 
field6=urnknauwekxtgfbaqmesjwllzokdyktt, 
field9=yqhnowhjfrfueoryqlcvdnaddueliwyr, 
field8=nolotdhjdfyotpcvxnrxshaheofsisnd, 
field3=wijyypzycilbqvjirjkorjfrazfmptrj, 
field2=nvznimfolbszmwiosdpyimlvnbrbmxqx, 
field30=xnvglxqnyseduswirxbmxnwhyxlvptch, 
field5=vxzgcyngwzjpopxascdyltgvxcnckzvv, 
field4=gnweoorjfqsbtbsbeiwronzucyzpjwje}, key=5899]], prevVal=[op=NOOP, 
val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, 
conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, 
filters=[], filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry 
[rdrs=[], part=779, super=GridDistributedCacheEntry [super=GridCacheMapEntry 
[key=KeyCacheObjectImpl [part=779, val=5899, hasValBytes=true], 
val=org.apache.ignite.scenario.internal.model.SampleObject [idHash=1532725782, 
hash=-640361617, balance=10, salary=1, fields=HashMap 
{field19=iiwxvrhxlpwqyixvpiregkuqpxuhtuir, 
field17=dyyxoefmichqvstteqjkbdpgmevifvmt, 
field18=iakcqzxcswsxncvztsotrjrlreuvpnsv, 
field22=wvewstllgkwvcxxujbkqkoihudgkkyve, 
field23=blgtxqcnwmexardyujbibiconowvyxvh, 
field20=mhvicfpnmptjreacgatiyobrmvvloxic, 
field21=bxajcavvwuhjvpugfoqohgulihzdbymr, 
field26=xceztfgnlpfoyciwnvhkorrgfllveocl, 
field27=sxzqvvckcgxgjctmygsibtouuzkfievo, 
field24=lsidfhurdjgjlmkrxyqbrdjzmbcicxie, 
field25=vfnmohbvezajifkqiwqbdqpulnynumfz, 
field28=zcewigkcryznakzsyzqzfdrbhklycjer, 
field29=vkctdybyrmtbitxuuqdlsrilxayorjjd, 
field11=lbwqnwwpwgewyjvlobyqwnvifuiggzio, 
field12=rmxclhojshtijttdjirppbkyudpvunht, 
field1=gvfrrpwkhmiziaortptiytwhviwjcpcr, 
field31=yktxbcjiyqfpaytacoajsiybtqocmezz, 
field0=vcorrbnevfunwssjzckdjlbvkynbogce, 
field10=sawaysrchykcvutlwfvglbvrlxvwlghh, 
field15=udrsigcjfetptnmlcnwjgccdqfmhdabv, 
field16=xjyjehlldwwnpbgjjtzwozqthwoefrin, 
field13=hwooamfugkijverkyqyzfccxvqrqjexx, 

[jira] [Created] (IGNITE-8876) Deactivate before checkpoint may lead to assertion and node failure.

2018-06-26 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8876:
-

 Summary: Deactivate before checkpoint may lead to assertion and 
node failure.
 Key: IGNITE-8876
 URL: https://issues.apache.org/jira/browse/IGNITE-8876
 Project: Ignite
  Issue Type: Bug
 Environment: {noformat}
2018-06-10 17:42:34.453 [INFO ][db-checkpoint-thread-#164%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=04d24209-ceaf-4c05-bcaa-bfebc8c83148, startPtr=FileWALPointer [idx=690, fileOff=656836779, len=41], checkpointLockWait=0ms, checkpointLockHoldTime=0ms, walCpRecordFsyncDuration=0ms, pages=80236, reason='partition destroy']
2018-06-10 17:42:34.470 [ERROR][db-checkpoint-thread-#164%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: Cache group is not initialized [grpId=-1903385190]]]
java.lang.AssertionError: Cache group is not initialized [grpId=-1903385190]
  at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.destroyEvictedPartitions(GridCacheDatabaseSharedManager.java:3350)
  at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3262)
  at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3053)
  at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
  at java.lang.Thread.run(Thread.java:745)
2018-06-10 17:42:34.470 [ERROR][db-checkpoint-thread-#164%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: Cache group is not initialized [grpId=-1903385190]]]
{noformat}
Reporter: Alexei Scherbakov
 Fix For: 2.6








[jira] [Created] (IGNITE-8873) Optimize cache scans with enabled persistence.

2018-06-25 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8873:
-

 Summary: Optimize cache scans with enabled persistence.
 Key: IGNITE-8873
 URL: https://issues.apache.org/jira/browse/IGNITE-8873
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
 Fix For: 2.6


Currently cache scans with persistence enabled involve link resolution, which 
can lead to random disk access, resulting in bad performance on SAS disks.

One possibility is to preload cache data pages to avoid slow random disk 
access.





[jira] [Created] (IGNITE-8863) Race on rollback and prepare on near tx can cause remote tx hang

2018-06-22 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8863:
-

 Summary: Race on rollback and prepare on near tx can cause remote 
tx hang
 Key: IGNITE-8863
 URL: https://issues.apache.org/jira/browse/IGNITE-8863
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov


{noformat}
[16:33:56]W: [org.apache.ignite:ignite-core] [2018-06-08 
13:33:56,931][WARN ][sys-#66696%client%][GridNearTxLocal] The transaction was 
forcibly rolled back because a timeout is reached: 
GridNearTxLocal[xid=e198a9fd361--0857-6387--0004, 
xidVersion=GridCacheVersion [topVer=139944839, order=1528464836894, 
nodeOrder=4], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, 
state=MARKED_ROLLBACK, invalidate=false, rollbackOnly=true, 
nodeId=3c8d85b2-4eb9-46b2-8bd1-6f18f542fc7a, timeout=1, duration=11]

[16:35:55]W: [org.apache.ignite:ignite-core] [2018-06-08 
13:35:55,056][WARN 
][grid-timeout-worker-#66394%transactions.TxRollbackOnTimeoutTest0%][diagnostic]
 Found long running transaction [startTime=13:33:56.931, curTime=13:35:55.054, 
tx=GridDhtTxRemote [nearNodeId=3c8d85b2-4eb9-46b2-8bd1-6f18f542fc7a, 
rmtFutId=af940d0e361-79c59341-3292-46e4-92ce-5c4ef4eddef8, 
nearXidVer=GridCacheVersion [topVer=139944839, order=1528464836894, 
nodeOrder=4], storeWriteThrough=false, super=GridDistributedTxRemoteAdapter 
[explicitVers=null, started=true, commitAllowed=0, 
txState=IgniteTxRemoteSingleStateImpl [entry=IgniteTxEntry 
[key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], cacheId=3556498, 
txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], 
cacheId=3556498], val=[op=CREATE, val=CacheObjectImpl [val=null, 
hasValBytes=true]], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], 
entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, 
explicitVer=null, dhtVer=null, filters=[], filtersPassed=false, 
filtersSet=false, entry=GridDhtCacheEntry [rdrs=[], part=1, 
super=GridDistributedCacheEntry [super=GridCacheMapEntry 
[key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], val=CacheObjectImpl 
[val=null, hasValBytes=true], startVer=1528464836879, ver=GridCacheVersion 
[topVer=139944839, order=1528464836863, nodeOrder=2], hash=1, 
extras=GridCacheMvccEntryExtras [mvcc=GridCacheMvcc [locs=null, 
rmts=[GridCacheMvccCandidate [nodeId=97ee44cd-73c9-4e79-95df-e1a03481, 
ver=GridCacheVersion [topVer=139944839, order=1528464836897, nodeOrder=2], 
threadId=75880, id=2310313, topVer=AffinityTopologyVersion [topVer=-1, 
minorTopVer=0], reentry=null, otherNodeId=3c8d85b2-4eb9-46b2-8bd1-6f18f542fc7a, 
otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, 
serOrder=null, key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], 
masks=local=0|owner=0|ready=0|reentry=0|used=0|tx=1|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0,
 prevVer=null, nextVer=null], GridCacheMvccCandidate 
[nodeId=97ee44cd-73c9-4e79-95df-e1a03481, ver=GridCacheVersion 
[topVer=139944839, order=1528464836900, nodeOrder=2], threadId=75875, 
id=2310317, topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], 
reentry=null, otherNodeId=3c8d85b2-4eb9-46b2-8bd1-6f18f542fc7a, otherVer=null, 
mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, 
key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], 
masks=local=0|owner=1|ready=0|reentry=0|used=1|tx=1|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0,
 prevVer=null, nextVer=null, flags=2]]], prepared=1, locked=false, 
nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, 
partUpdateCntr=0, serReadVer=null, xidVer=null]], skipCompletedVers=false, 
super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=139944839, 
order=1528464836897, nodeOrder=2], writeVer=GridCacheVersion [topVer=139944839, 
order=1528464836898, nodeOrder=2], implicit=false, loc=false, threadId=75880, 
startTime=1528464836931, nodeId=97ee44cd-73c9-4e79-95df-e1a03481, 
startVer=GridCacheVersion [topVer=139944839, order=1528464836864, nodeOrder=1], 
endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=1, 
sysInvalidate=false, sys=false, plc=2, commitVer=null, finalizing=NONE, 
invalidParts=null, state=PREPARED, timedOut=false, 
topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], duration=118123ms, 
onePhaseCommit=false
{noformat}





[jira] [Created] (IGNITE-8846) Optimize entry transform operations.

2018-06-21 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8846:
-

 Summary: Optimize entry transform operations.
 Key: IGNITE-8846
 URL: https://issues.apache.org/jira/browse/IGNITE-8846
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov


1. For pessimistic transactions the entryProcessor is invoked twice if the tx entry 
already exists: in [1] and after lock acquisition in [2].

Actually it is enough to do it only once, in postLockWrite.

2. The cache entry value is not needed on the near node if the EntryProcessor declares 
a Void return type.

We should try to detect this at runtime or provide some kind of annotation to 
mark an EntryProcessor as not caring about the return value. This will bring a huge 
performance benefit for transactions updating large values using 
transformations.

[1] 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal#enlistWriteEntry

[2] 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter#postLockWrite
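
A minimal sketch of the runtime detection idea from item 2, using reflection on the declared generic interface. The Processor interface below is a hypothetical stand-in for javax.cache.processor.EntryProcessor, and the detector class is invented for illustration; it only looks at interfaces implemented directly by the class.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

/** Hypothetical stand-in for javax.cache.processor.EntryProcessor (not the real interface). */
interface Processor<K, V, T> {
    T process(K key, V val);
}

/** Hypothetical detector: checks whether a processor class declares Void as its result type. */
final class VoidResultDetector {
    static boolean declaresVoidResult(Class<?> cls) {
        for (Type t : cls.getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                ParameterizedType pt = (ParameterizedType) t;

                // The third type argument is the result type T.
                if (pt.getRawType() == Processor.class)
                    return pt.getActualTypeArguments()[2] == Void.class;
            }
        }

        // Raw or indirect implementations are treated as "result may be needed".
        return false;
    }
}

/** Example processor that does not care about its return value. */
final class Increment implements Processor<Integer, Long, Void> {
    @Override public Void process(Integer key, Long val) {
        return null;
    }
}

/** Example processor that returns the value, so the near node needs it. */
final class ReadBack implements Processor<Integer, Long, Long> {
    @Override public Long process(Integer key, Long val) {
        return val;
    }
}
```

A lambda typically carries no such generic signature, which is one reason the annotation option may still be needed.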





[jira] [Created] (IGNITE-8809) Add ability to control.sh to force rebalance for specific partitions on given nodes.

2018-06-15 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8809:
-

 Summary: Add ability to control.sh to force rebalance for specific 
partitions on given nodes.
 Key: IGNITE-8809
 URL: https://issues.apache.org/jira/browse/IGNITE-8809
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


Sometimes it's desirable to force rebalance for specific partitions on given 
nodes, for example, for testing or for fixing synchronization issues without 
node downtime.

control.sh should get a new command, rebalance, which executes an 
exchange request carried by a new message type containing the partitions to 
rebalance and the mode: full (evict + move) or delta (historical, using counters).

Example:

control.sh --rebalance [full|delta] nodeId:p1,p2,p3 node2:p4,p5 ...
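
A possible way to parse the arguments that follow --rebalance, as a sketch only. It assumes partition ids are integers and that a node id never ends with a colon; RebalanceArgs is a hypothetical class, not existing control.sh code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical parser for: [full|delta] nodeId:p1,p2,p3 node2:p4,p5 ... */
final class RebalanceArgs {
    enum Mode { FULL, DELTA }

    final Mode mode;

    /** Node id -> partitions to rebalance on that node, in command-line order. */
    final Map<String, List<Integer>> partsByNode;

    private RebalanceArgs(Mode mode, Map<String, List<Integer>> partsByNode) {
        this.mode = mode;
        this.partsByNode = partsByNode;
    }

    static RebalanceArgs parse(String... args) {
        if (args.length < 2)
            throw new IllegalArgumentException("Expected: [full|delta] nodeId:p1,p2 ...");

        // Mode.valueOf throws IllegalArgumentException for anything but full/delta.
        Mode mode = Mode.valueOf(args[0].toUpperCase(Locale.ROOT));

        Map<String, List<Integer>> res = new LinkedHashMap<>();

        for (int i = 1; i < args.length; i++) {
            int sep = args[i].lastIndexOf(':');

            if (sep <= 0 || sep == args[i].length() - 1)
                throw new IllegalArgumentException("Bad node mapping: " + args[i]);

            List<Integer> parts = res.computeIfAbsent(args[i].substring(0, sep), k -> new ArrayList<>());

            // Integer.parseInt rejects non-numeric partition ids.
            for (String p : args[i].substring(sep + 1).split(","))
                parts.add(Integer.parseInt(p.trim()));
        }

        return new RebalanceArgs(mode, res);
    }
}
```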






[jira] [Created] (IGNITE-8808) Improve control.sh --tx command to show local and remote transactions.

2018-06-15 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8808:
-

 Summary: Improve control.sh --tx command to show local and remote 
transactions.
 Key: IGNITE-8808
 URL: https://issues.apache.org/jira/browse/IGNITE-8808
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.5
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.6


Currently the --tx option for control.sh shows only transactions found on 
near (initiating) nodes.

Due to various issues it's possible to have corresponding dht local and remote 
transactions without a near part.

Such transactions must be visible to the utility.





[jira] [Created] (IGNITE-8743) TcpCommunicationSpi hangs in rare circumstances on outgoing descriptor reservation.

2018-06-07 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8743:
-

 Summary: TcpCommunicationSpi hangs in rare circumstances on 
outgoing descriptor reservation.
 Key: IGNITE-8743
 URL: https://issues.apache.org/jira/browse/IGNITE-8743
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov


Relevant stack trace:

{noformat}
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.ignite.internal.util.nio.GridNioRecoveryDescriptor.reserve(GridNioRecoveryDescriptor.java:275)
- locked <0x7fca4b14f560> (a org.apache.ignite.internal.util.nio.GridNioRecoveryDescriptor)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2863)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2750)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2611)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2575)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1714)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1166)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.map(GridPartitionedSingleGetFuture.java:311)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.loadAsync(GridDhtColocatedCache.java:389)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.loadMissing(GridNearTxLocal.java:2506)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.checkMissed(GridNearTxLocal.java:3888)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.getAllAsync(GridNearTxLocal.java:1927)
at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache$4.op(GridDhtColocatedCache.java:197)
{noformat}





[jira] [Created] (IGNITE-8684) Partition state exchange during rebalance continues to keep sending state messages (single,full) in loop even if no changes in partitions states

2018-06-01 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8684:
-

 Summary: Partition state exchange during rebalance continues to 
keep sending state messages (single,full) in loop even if no changes in 
partitions states
 Key: IGNITE-8684
 URL: https://issues.apache.org/jira/browse/IGNITE-8684
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov








[jira] [Created] (IGNITE-8651) VisorTxTask fails when printing transactions having implicit single type.

2018-05-30 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8651:
-

 Summary: VisorTxTask fails when printing transactions having 
implicit single type.
 Key: IGNITE-8651
 URL: https://issues.apache.org/jira/browse/IGNITE-8651
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.6


org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal#mappings
 returns null for IgniteTxMappingsSingleImpl





[jira] [Created] (IGNITE-8481) VisorValidateIndexesJob works very slowly in case of many partitions/keys for each partition.

2018-05-14 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8481:
-

 Summary: VisorValidateIndexesJob works very slowly in case of many 
partitions/keys for each partition.
 Key: IGNITE-8481
 URL: https://issues.apache.org/jira/browse/IGNITE-8481
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.5
Reporter: Alexei Scherbakov
 Fix For: 2.6
 Attachments: ignite.zip, thrdump-server.log

I tried to validate indexes using the newly introduced VisorValidateIndexesTask 
from control.sh and found that on a large data set it works very slowly. The process 
had not finished 12 hours after start.

Looking through a thread dump I've noticed the following problems:

1. ValidateIndexesClosure works suboptimally by doing a btree lookup for 
each index for each entry of each partition. It should be faster to validate by 
scanning the index tree.

2. The thread dump shows contention on acquiring the segment read lock by worker 
pool-XXX threads, but no obvious reason for holding the write lock (no load on grid).

3. 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.Segment#partGeneration
 generates garbage on each page access.

Check the attachment for the log and thread dump.






[jira] [Created] (IGNITE-8412) Bug with cache name in org.apache.ignite.util.GridCommandHandlerTest#testCacheContention breaks tests in security module.

2018-04-27 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8412:
-

 Summary: Bug with cache name in 
org.apache.ignite.util.GridCommandHandlerTest#testCacheContention breaks tests 
in security module.
 Key: IGNITE-8412
 URL: https://issues.apache.org/jira/browse/IGNITE-8412
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.5








[jira] [Created] (IGNITE-8375) NPE due to race on cache stop and timeout handler execution.

2018-04-24 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8375:
-

 Summary: NPE due to race on cache stop and timeout handler 
execution.
 Key: IGNITE-8375
 URL: https://issues.apache.org/jira/browse/IGNITE-8375
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Alexei Scherbakov
 Fix For: 2.6


NPE caused by execution of method [1] during timeout handler execution [2]:

cacheCfg.isLoadPreviousValue() throws NPE because cacheCfg can be nulled by [3] 
on stop.

[1] 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture#loadMissingFromStore
[2] 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.LockTimeoutObject#onTimeout
[3] org.apache.ignite.internal.processors.cache.GridCacheContext#cleanup
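
The race can be illustrated with a small stand-alone sketch. CacheCtx and Cfg below are hypothetical stand-ins for the cache context and its configuration; the "safe" variant shows one common way to guard such a read and is not necessarily the fix that should be applied here.

```java
/** Hypothetical stand-in for a cache context whose configuration is nulled on stop. */
final class CacheCtx {
    /** Hypothetical stand-in for the cache configuration. */
    static final class Cfg {
        final boolean loadPrevVal;

        Cfg(boolean loadPrevVal) {
            this.loadPrevVal = loadPrevVal;
        }
    }

    private volatile Cfg cfg;

    CacheCtx(Cfg cfg) {
        this.cfg = cfg;
    }

    /** Mirrors cleanup on stop: the configuration reference is dropped. */
    void cleanup() {
        cfg = null;
    }

    /** Unsafe read: throws NullPointerException if cleanup() has already run. */
    boolean loadPreviousValueUnsafe() {
        return cfg.loadPrevVal;
    }

    /** Guarded read: copies the field once and treats null as "cache is stopping". */
    boolean loadPreviousValueSafe() {
        Cfg cfg0 = cfg;

        return cfg0 != null && cfg0.loadPrevVal;
    }
}
```

A timeout handler that fires after cleanup() would hit the NPE with the unsafe read and would simply see false with the guarded one.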





[jira] [Created] (IGNITE-8360) Page recovery from WAL can be very slow.

2018-04-23 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8360:
-

 Summary: Page recovery from WAL can be very slow.
 Key: IGNITE-8360
 URL: https://issues.apache.org/jira/browse/IGNITE-8360
 Project: Ignite
  Issue Type: Improvement
  Components: persistence
Affects Versions: 2.4
Reporter: Alexei Scherbakov
 Fix For: 2.6


The current implementation tries to recover a corrupted page from the WAL, potentially 
scanning all archived segments [1].

If the archive is very large, for example due to a large history or enabled 
point-in-time recovery, this might take significant time, preventing cache start 
with consequences like a hanging PME.

[1] 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl#tryToRestorePage





[jira] [Created] (IGNITE-8358) Deadlock in IgnitePdsAtomicCacheRebalancingTest

2018-04-23 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8358:
-

 Summary: Deadlock in IgnitePdsAtomicCacheRebalancingTest
 Key: IGNITE-8358
 URL: https://issues.apache.org/jira/browse/IGNITE-8358
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Alexei Scherbakov
 Fix For: 2.6


Deadlocked threads are:

{noformat}
[14:21:46] : [Step 3/4] # DEADLOCKED Thread 
[name="sys-#22788%persistence.IgnitePdsAtomicCacheRebalancingTest2%", id=25953, 
state=WAITING, blockCnt=0, waitCnt=2]
[14:21:46] : [Step 3/4] Lock 
[object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@adcfad9, 
ownerName=exchange-worker-#22778%persistence.IgnitePdsAtomicCacheRebalancingTest2%,
 ownerId=25941]
[14:21:46] : [Step 3/4] at sun.misc.Unsafe.park(Native Method)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
[14:21:46] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.localPartitionMap(GridDhtPartitionTopologyImpl.java:1000)
[14:21:46] : [Step 3/4] at 
o.a.i.i.processors.cache.GridCachePartitionExchangeManager.createPartitionsSingleMessage(GridCachePartitionExchangeManager.java:1250)
[14:21:46] : [Step 3/4] at 
o.a.i.i.processors.cache.GridCachePartitionExchangeManager.sendLocalPartitions(GridCachePartitionExchangeManager.java:1205)
[14:21:46] : [Step 3/4] at 
o.a.i.i.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:1036)
[14:21:46] : [Step 3/4] at 
o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ResendTimeoutObject$1.run(GridCachePartitionExchangeManager.java:2663)
[14:21:46] : [Step 3/4] at 
o.a.i.i.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6751)
[14:21:46] : [Step 3/4] at 
o.a.i.i.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
[14:21:46] : [Step 3/4] at 
o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[14:21:46] : [Step 3/4] at java.lang.Thread.run(Thread.java:745)
[14:21:46] : [Step 3/4]
[14:21:46] : [Step 3/4] Locked synchronizers:
[14:21:46] : [Step 3/4] 
java.util.concurrent.ThreadPoolExecutor$Worker@469d36ed

[14:21:46] : [Step 3/4] # DEADLOCKED Thread 
[name="sys-#22787%persistence.IgnitePdsAtomicCacheRebalancingTest2%", id=25952, 
state=WAITING, blockCnt=0, waitCnt=3]
[14:21:46] : [Step 3/4] Lock 
[object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@3a2e9f5b, 
ownerName=exchange-worker-#22778%persistence.IgnitePdsAtomicCacheRebalancingTest2%,
 ownerId=25941]
[14:21:46] : [Step 3/4] at sun.misc.Unsafe.park(Native Method)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
[14:21:46] : [Step 3/4] at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
[14:21:46] : [Step 3/4] at 
o.a.i.i.util.StripedCompositeReadWriteLock$WriteLock.lock0(StripedCompositeReadWriteLock.java:154)
[14:21:46] : [Step 3/4] at 
o.a.i.i.util.StripedCompositeReadWriteLock$WriteLock.lock(StripedCompositeReadWriteLock.java:123)
[14:21:46] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.onEvicted(GridDhtPartitionTopologyImpl.java:2444)
[14:21:46] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPreloader.onPartitionEvicted(GridDhtPreloader.java:433)

[jira] [Created] (IGNITE-8075) Add support for two new public methods in .NET API

2018-03-29 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8075:
-

 Summary: Add support for two new public methods in .NET API
 Key: IGNITE-8075
 URL: https://issues.apache.org/jira/browse/IGNITE-8075
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.4
Reporter: Alexei Scherbakov
Assignee: Pavel Tupitsyn
 Fix For: 2.5


Need to add the two methods listed below to the .NET API.

withLabel
localActiveTransactions

Java implementation is currently available in branch [1]

[1] https://github.com/gridgain/apache-ignite/tree/ignite-6827-2





[jira] [Created] (IGNITE-8074) Allow changing of tx rollback timeout on exchange in runtime.

2018-03-29 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8074:
-

 Summary: Allow changing of tx rollback timeout on exchange in 
runtime.
 Key: IGNITE-8074
 URL: https://issues.apache.org/jira/browse/IGNITE-8074
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.4
Reporter: Alexei Scherbakov
 Fix For: 2.5


It's desirable to be able to change the tx rollback timeout, introduced in 
IGNITE-6827, at runtime.

Simplest implementation: use JMX method call.
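
A sketch of how a runtime-changeable timeout could be exposed through a standard MBean, using only the JDK javax.management API. The bean, attribute and object names below are hypothetical and do not correspond to existing Ignite MBeans.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch: tx rollback timeout exposed as a writable JMX attribute. */
final class TxTimeoutJmxSketch {
    /** Standard MBean interface: must be public and named after the implementation plus "MBean". */
    public interface TxTimeoutMBean {
        long getRollbackTimeout();

        void setRollbackTimeout(long timeoutMs);
    }

    /** Holds the timeout in a volatile field so a JMX update is visible to other threads. */
    public static final class TxTimeout implements TxTimeoutMBean {
        private volatile long timeoutMs;

        public TxTimeout(long timeoutMs) {
            this.timeoutMs = timeoutMs;
        }

        @Override public long getRollbackTimeout() {
            return timeoutMs;
        }

        @Override public void setRollbackTimeout(long timeoutMs) {
            if (timeoutMs < 0)
                throw new IllegalArgumentException("Timeout must not be negative: " + timeoutMs);

            this.timeoutMs = timeoutMs;
        }
    }

    /** Registers the bean, changes the attribute through the MBean server and returns the new value. */
    static long changeViaJmx(long initial, long updated) {
        try {
            MBeanServer srv = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("sketch:type=TxTimeout");
            TxTimeout bean = new TxTimeout(initial);

            srv.registerMBean(bean, name);

            try {
                // Attribute name is derived from the getter/setter pair.
                srv.setAttribute(name, new Attribute("RollbackTimeout", updated));

                return bean.getRollbackTimeout();
            }
            finally {
                srv.unregisterMBean(name);
            }
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same attribute could then be changed from any JMX client, for example jconsole, without restarting the node.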





[jira] [Created] (IGNITE-8000) Implicit transactions may not finish properly on unstable topology.

2018-03-20 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-8000:
-

 Summary: Implicit transactions may not finish properly on unstable 
topology.
 Key: IGNITE-8000
 URL: https://issues.apache.org/jira/browse/IGNITE-8000
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Alexei Scherbakov
 Fix For: 2.5


To reproduce, add a default tx timeout [1] to the IgniteCacheMultiTxLockSelfTest test configuration.

[1] c.getTransactionConfiguration().setDefaultTxTimeout(10);

Looks like in some cases the remote tx is added to the rolled back versions (because 
the partition is gone) and a subsequent near request for the same tx to this node 
fails.

This does not happen if timeouts are disabled because the corresponding check is 
skipped.





[jira] [Created] (IGNITE-7915) Add transaction debugging support in JMX

2018-03-12 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-7915:
-

 Summary: Add transaction debugging support in JMX
 Key: IGNITE-7915
 URL: https://issues.apache.org/jira/browse/IGNITE-7915
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexei Scherbakov


Detailed description in IGNITE-7910, paragraph 4.





[jira] [Created] (IGNITE-7914) Add transaction debugging support in control.sh

2018-03-12 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-7914:
-

 Summary: Add transaction debugging support in control.sh
 Key: IGNITE-7914
 URL: https://issues.apache.org/jira/browse/IGNITE-7914
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.5


Detailed description in IGNITE-7910, paragraph 3.





[jira] [Created] (IGNITE-7913) Current implementation of Internal Diagnostics may cause OOM on server nodes.

2018-03-12 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-7913:
-

 Summary: Current implementation of Internal Diagnostics may cause 
OOM on server nodes.
 Key: IGNITE-7913
 URL: https://issues.apache.org/jira/browse/IGNITE-7913
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.5


If many transactions are active in the grid, Internal Diagnostics can cause OOM 
on server nodes serving IgniteDiagnosticMessage because of heap buffering.
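
One possible mitigation, sketched here with a hypothetical helper (BoundedToString is not an Ignite class), is to cap how much of a collection is rendered into a diagnostic string. Note the limitation: the cap bounds the accumulated output, but each element's own toString() is still materialised in full before it is appended.

```java
import java.util.Collection;

/** Hypothetical helper: renders a collection without accumulating an unbounded string. */
final class BoundedToString {
    private BoundedToString() {
    }

    /**
     * Renders at most maxElems elements and stops adding elements once maxChars is reached.
     * The result is finally truncated to maxChars characters plus a "..." marker.
     */
    static String render(Collection<?> col, int maxElems, int maxChars) {
        StringBuilder sb = new StringBuilder();

        sb.append('[');

        int printed = 0;

        for (Object o : col) {
            if (printed == maxElems || sb.length() >= maxChars)
                break;

            if (printed > 0)
                sb.append(", ");

            sb.append(o);

            printed++;
        }

        int skipped = col.size() - printed;

        if (skipped > 0)
            sb.append(printed > 0 ? ", " : "").append("... and ").append(skipped).append(" more");

        sb.append(']');

        // Hard cap in case a single element was itself very large.
        if (sb.length() > maxChars) {
            sb.setLength(maxChars);
            sb.append("...");
        }

        return sb.toString();
    }
}
```

For example, rendering a five-element list with maxElems = 2 prints the first two elements followed by a count of the remaining three.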

See the stack trace demonstrating the issue:

{noformat}
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1012)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:762)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:710)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.toString(GridDhtCacheEntry.java:818)
at java.lang.String.valueOf(String.java:2994)
at 
org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101)
at 
org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:826)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:783)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxEntry.toString(IgniteTxEntry.java:1267)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at java.util.AbstractMap.toString(AbstractMap.java:559)
at java.lang.String.valueOf(String.java:2994)
at 
org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101)
at 
org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:864)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxRemoteStateImpl.toString(IgniteTxRemoteStateImpl.java:180)
at java.lang.String.valueOf(String.java:2994)
at 
org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101)
at 
org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:826)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:783)
at 
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.toString(GridDistributedTxRemoteAdapter.java:926)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxRemote.toString(GridDhtTxRemote.java:373)
at java.lang.String.valueOf(String.java:2994)
at 
org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101)
at 
org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:826)
at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:783)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter$TxFinishFuture.toString(IgniteTxAdapter.java:2405)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at java.util.AbstractCollection.toString(AbstractCollection.java:462)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at 

[jira] [Created] (IGNITE-7910) Improve transaction debugging support

2018-03-11 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-7910:
-

 Summary: Improve transaction debugging support
 Key: IGNITE-7910
 URL: https://issues.apache.org/jira/browse/IGNITE-7910
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.5


Currently there is no good way to debug problematic transactions without 
parsing cryptic logs across the whole grid.

I suggest adding several improvements to mitigate the issue:

1. Add a chaining method Transaction.withMeta(String) to attach a transaction 
description.

2. Add a method localActiveTransaction to the IgniteTransactions interface, 
which will return all active near transactions for the local node.

3. Extend control.sh to support retrieving active transaction information from 
grid nodes.
By default it shows N (specified by the user) transactions ordered by longest 
duration.
For each transaction the following is shown:

Near node id (IP, hostname) / xid / state / duration / dht topology / meta from 
1, if present

It should support filtering by near node / state / duration, and printing info 
for a single tx if a single xid is specified as an argument.

In addition, each transaction from the list may be forcibly rolled back 
by xid.

4. Add an MBean with the same functionality as in 3.
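
The listing described in item 3 can be sketched as a filter, sort and limit over a snapshot of active transactions. TxInfo and TxReport below are hypothetical types made up for illustration; they are not existing Ignite classes, and the dht topology column is left out for brevity.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical view of one active transaction; field names are illustrative only. */
final class TxInfo {
    final String xid;
    final String nearNodeId;
    final String state;
    final long durationMs;
    final String meta; // Description attached by the user, may be null.

    TxInfo(String xid, String nearNodeId, String state, long durationMs, String meta) {
        this.xid = xid;
        this.nearNodeId = nearNodeId;
        this.state = state;
        this.durationMs = durationMs;
        this.meta = meta;
    }

    @Override public String toString() {
        return nearNodeId + " / " + xid + " / " + state + " / " + durationMs + " ms" +
            (meta == null ? "" : " / " + meta);
    }
}

/** Hypothetical report builder. */
final class TxReport {
    private TxReport() {
    }

    /** Returns up to {@code limit} transactions matching the filter, longest duration first. */
    static List<TxInfo> longest(List<TxInfo> active, Predicate<TxInfo> filter, int limit) {
        return active.stream()
            .filter(filter)
            .sorted(Comparator.comparingLong((TxInfo t) -> t.durationMs).reversed())
            .limit(limit)
            .collect(Collectors.toList());
    }
}
```

Filtering by near node, state or minimum duration then becomes a matter of passing a different predicate, for example t -> t.durationMs >= 1000.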






[jira] [Created] (IGNITE-7787) Better error reporting when issuing PDS corruptions.

2018-02-22 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-7787:
-

 Summary: Better error reporting when issuing PDS corruptions.
 Key: IGNITE-7787
 URL: https://issues.apache.org/jira/browse/IGNITE-7787
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.5


If PDS is corrupted in any way and an update hits a bad page, the error message 
shown is not very helpful, usually something like "Failed to get page IO 
instance (page content is corrupted)".

For corruptions related to CacheDataRowStore, the error should contain 
information about how to fix the issue: clear the data for the cache/group and 
restart the node so the partitions are reloaded.

For corruptions related to H2Tree (SQL indexes), the error should suggest 
removing index.bin for the broken partition and restarting the node, allowing 
the index to be rebuilt.
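
A rough sketch of how such hints could be attached to the message. CorruptionKind and CorruptionMessages are hypothetical names invented for this example; how Ignite would actually detect which structure is corrupted is not covered here.

```java
/** Hypothetical classification of where a corrupted page was detected. */
enum CorruptionKind {
    /** Corruption in the cache data row store. */
    CACHE_DATA("clear the data for the affected cache/group and restart the node " +
        "so the partitions are reloaded"),

    /** Corruption in an SQL index tree. */
    SQL_INDEX("remove index.bin for the broken partition and restart the node " +
        "so the index is rebuilt");

    /** Remediation hint shown to the user. */
    final String hint;

    CorruptionKind(String hint) {
        this.hint = hint;
    }
}

/** Hypothetical message builder. */
final class CorruptionMessages {
    private CorruptionMessages() {
    }

    /** Appends the remediation hint for the given kind to the low-level cause. */
    static String describe(CorruptionKind kind, String cause) {
        return cause + ". Suggested fix: " + kind.hint + ".";
    }
}
```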







[jira] [Created] (IGNITE-7648) Revert IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.

2018-02-08 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-7648:
-

 Summary: Revert IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
 Key: IGNITE-7648
 URL: https://issues.apache.org/jira/browse/IGNITE-7648
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.3
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.5


The IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in 
IGNITE-5718 as a way to prevent unnecessary node drops in case of short network 
problems.

I think fixing it this way was the wrong decision.

We faced some issues in our production due to the lack of automatic kicking of 
ill-behaving nodes (for example, nodes hanging due to long GC pauses) until we 
realised the necessity of changing the default behavior via the property.

The right solution is to kick nodes only if a failure threshold is reached. 
Such behavior should always be enabled.
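
A minimal sketch of the threshold idea. FailureThresholdTracker is a hypothetical class, and counting consecutive failed checks is an assumption made for the example; the ticket does not say how the threshold should be measured.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker: a node is kicked only after N consecutive failed checks. */
final class FailureThresholdTracker {
    private final int threshold;

    /** Current failure streak per node id. */
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();

    FailureThresholdTracker(int threshold) {
        if (threshold < 1)
            throw new IllegalArgumentException("threshold must be >= 1");

        this.threshold = threshold;
    }

    /** Records a successful check, which clears the node's failure streak. */
    synchronized void onSuccess(String nodeId) {
        consecutiveFailures.remove(nodeId);
    }

    /** Records a failed check; returns true when the node should be kicked. */
    synchronized boolean onFailure(String nodeId) {
        int streak = consecutiveFailures.merge(nodeId, 1, Integer::sum);

        if (streak >= threshold) {
            consecutiveFailures.remove(nodeId);

            return true;
        }

        return false;
    }
}
```

With a threshold of 3, a short network glitch that causes one or two failed checks is tolerated, while a node stuck in a long GC pause keeps failing and is eventually kicked.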

 





[jira] [Created] (IGNITE-7585) GridDhtLockFuture related memory leak

2018-01-31 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-7585:
-

 Summary: GridDhtLockFuture related memory leak
 Key: IGNITE-7585
 URL: https://issues.apache.org/jira/browse/IGNITE-7585
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.5
 Attachments: memleak.jpg

The LockTimeoutObject related to GridDhtLockFuture is not removed on commit, so 
the tx stays referenced until the timeout handler is triggered.
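
The general pattern behind this kind of leak can be illustrated with plain JDK timers. LockWithTimeout below is a hypothetical stand-in, not the Ignite class; it only shows that a pending timeout task keeps its owner reachable until the task fires, unless it is cancelled and removed on completion.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a lock future that registers a timeout handler. */
final class LockWithTimeout {
    private final ScheduledFuture<?> timeoutTask;

    private volatile boolean timedOut;

    LockWithTimeout(ScheduledThreadPoolExecutor timer, long timeoutMs) {
        // The queued task captures 'this', so the timer keeps this object reachable.
        timeoutTask = timer.schedule(() -> { timedOut = true; }, timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Called on commit: cancels the handler so the timer stops referencing this object. */
    void onCommit() {
        timeoutTask.cancel(false);
    }

    boolean timedOut() {
        return timedOut;
    }

    /** Creates a timer that drops cancelled tasks from its queue immediately. */
    static ScheduledThreadPoolExecutor newTimer() {
        ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);

        timer.setRemoveOnCancelPolicy(true);

        return timer;
    }
}
```

Without setRemoveOnCancelPolicy(true), a cancelled task stays in the executor's queue until its delay elapses, which reproduces the same "referenced until timeout" effect.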





[jira] [Created] (IGNITE-7204) Unexpected behavior if passing null to binaryObject.field method

2017-12-14 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-7204:
-

 Summary: Unexpected behavior if passing null to binaryObject.field 
method
 Key: IGNITE-7204
 URL: https://issues.apache.org/jira/browse/IGNITE-7204
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.4


If assertions are disabled, the first field will be returned.

If not, an AssertionError will be thrown.

Reproducer:

{noformat}
public void testNullField() throws Exception {
    try {
        final IgniteEx ex = startGrid(0);

        final IgniteCache<Integer, BinaryObject> test = ex.cache("test").withKeepBinary();

        final BinaryObjectBuilder bldr = ex.binary().builder("bldr");

        bldr.setField("x", 1);

        test.put(0, bldr.build());

        test.query(new ScanQuery<>(new IgniteBiPredicate<Integer, BinaryObject>() {
            @Override public boolean apply(Integer o, BinaryObject o2) {
                // Passing null as the field name triggers the behavior described above.
                final Object q = o2.field(null);

                return false;
            }
        })).getAll();
    }
    finally {
        stopAllGrids();
    }
}
{noformat}
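
One way to make the outcome independent of the -ea flag is to validate the argument explicitly instead of relying on an assert. The sketch below uses a hypothetical FieldBag class as a stand-in; it is not the Ignite BinaryObject API, and whether a null name should throw or return null is a design decision the ticket leaves open.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for an object with named fields. */
final class FieldBag {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    /** Stores a field value; returns this bag for chaining. */
    FieldBag set(String name, Object val) {
        fields.put(Objects.requireNonNull(name, "name"), val);

        return this;
    }

    /** Rejects a null name explicitly, so the result does not depend on the -ea flag. */
    Object field(String name) {
        Objects.requireNonNull(name, "name");

        return fields.get(name);
    }
}
```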



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-7166) SQL join with partition and replicated caches fails if number of partitions is too low.

2017-12-11 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-7166:
-

 Summary: SQL join with partition and replicated caches fails if 
number of partitions is too low.
 Key: IGNITE-7166
 URL: https://issues.apache.org/jira/browse/IGNITE-7166
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.4


Reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.distributed.replicated;

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.cache.query.FieldsQueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;

import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.cache.CacheMode.PARTITIONED;
import static org.apache.ignite.cache.CacheMode.REPLICATED;
import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC;

/**
 * Tests non collocated join with replicated cache.
 */
public class IgniteCacheReplicatedJoinSelfTest extends GridCommonAbstractTest {
/** */
public static final String REP_CACHE_NAME = "repCache";

/** */
public static final String PART_CACHE_NAME = "partCache";

/** */
public static final int REP_CNT = 3;

/** */
public static final int PART_CNT = 10_000;

/** {@inheritDoc} */
@Override protected IgniteConfiguration getConfiguration(String 
igniteInstanceName) throws Exception {
final IgniteConfiguration cfg = 
super.getConfiguration(igniteInstanceName);

cfg.setClientMode("client".equals(igniteInstanceName));

final CacheConfiguration ccfg1 = new 
CacheConfiguration(PART_CACHE_NAME);
ccfg1.setCacheMode(PARTITIONED);
ccfg1.setAtomicityMode(TRANSACTIONAL);
ccfg1.setWriteSynchronizationMode(FULL_SYNC);
ccfg1.setIndexedTypes(Integer.class, PartValue.class);

final CacheConfiguration ccfg2 = new CacheConfiguration(REP_CACHE_NAME);
ccfg2.setAffinity(new RendezvousAffinityFunction(false, REP_CNT));
ccfg2.setCacheMode(REPLICATED);
ccfg2.setAtomicityMode(TRANSACTIONAL);
ccfg2.setWriteSynchronizationMode(FULL_SYNC);
ccfg2.setIndexedTypes(Integer.class, RepValue.class);

cfg.setCacheConfiguration(ccfg1, ccfg2);

return cfg;
}

/**
 *
 * @throws Exception
 */
public void testJoinNonCollocated() throws Exception {
startGridsMultiThreaded(3);

final Ignite client = startGrid("client");

for (int i = 0; i < REP_CNT; i++)
client.cache(REP_CACHE_NAME).put(i, new RepValue(i, "rep" + i));

for (int i = 0; i < PART_CNT; i++)
client.cache(PART_CACHE_NAME).put(i, new PartValue(i, "part" + i, 
((i + 1) % REP_CNT)));

final FieldsQueryCursor<List<?>> qry = client.cache(PART_CACHE_NAME).
query(new SqlFieldsQuery("select PartValue._VAL, r._VAL from " +
"PartValue, \"repCache\".RepValue as r where PartValue.repId=r.id"));

final List<List<?>> all = qry.getAll();

assertEquals(10_000, all.size());

for (List<?> objects : all) {
final PartValue pv = (PartValue)objects.get(0);
final RepValue rv = (RepValue)objects.get(1);

assertNotNull(rv);

assertEquals(rv.getId(), pv.getRepId());
}
}

/** */
public static class PartValue {
/** Id. */
@QuerySqlField
private int id;

/** Name. */
@QuerySqlField
private String name;

/** Rep id. */
@QuerySqlField
private int repId;

/**
 * @param id Id.
 * @param name Name.
 * @param repId Rep id.
 */
public PartValue(int id, String name, int repId) {
   

[jira] [Created] (IGNITE-7049) Optimistic transaction is not properly rolled back if timed out before sending prepare response.

2017-11-28 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-7049:
-

 Summary: Optimistic transaction is not properly rolled back if 
timed out before sending prepare response.
 Key: IGNITE-7049
 URL: https://issues.apache.org/jira/browse/IGNITE-7049
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.4


Reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.transactions;

import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.internal.TestRecordingCommunicationSpi;
import 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareResponse;
import org.apache.ignite.internal.util.typedef.G;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.transactions.Transaction;

import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC;
import static org.apache.ignite.transactions.TransactionConcurrency.OPTIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.SERIALIZABLE;

/**
 * Tests an ability to eagerly rollback timed out optimistic transactions.
 */
public class TxRollbackOnTimeoutOptimisticTest extends GridCommonAbstractTest {
/** */
private static final String CACHE_NAME = "test";

/** IP finder. */
private static final TcpDiscoveryVmIpFinder IP_FINDER = new 
TcpDiscoveryVmIpFinder(true);

/** */
private static final int GRID_CNT = 3;

/** {@inheritDoc} */
@Override protected IgniteConfiguration getConfiguration(String 
igniteInstanceName) throws Exception {
IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(IP_FINDER);

TestRecordingCommunicationSpi commSpi = new 
TestRecordingCommunicationSpi();

cfg.setCommunicationSpi(commSpi);

boolean client = "client".equals(igniteInstanceName);

cfg.setClientMode(client);

if (!client) {
CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);

ccfg.setAtomicityMode(TRANSACTIONAL);
ccfg.setBackups(2);
ccfg.setWriteSynchronizationMode(FULL_SYNC);

cfg.setCacheConfiguration(ccfg);
}

return cfg;
}

/**
 * @return Near cache flag.
 */
protected boolean nearCacheEnabled() {
return false;
}

/** {@inheritDoc} */
@Override protected void beforeTest() throws Exception {
super.beforeTest();

startGridsMultiThreaded(GRID_CNT);
}

/** {@inheritDoc} */
@Override protected void afterTest() throws Exception {
super.afterTest();

stopAllGrids();
}

/** */
public void testOptimisticTimeout() throws Exception {
final Ignite client = startGrid("client");

assertNotNull(client.cache(CACHE_NAME));

final ClusterNode n0 = client.affinity(CACHE_NAME).mapKeyToNode(0);

final Ignite prim = G.ignite(n0.id());

for (Ignite ignite : G.allGrids()) {
if (ignite == prim)
continue;

final TestRecordingCommunicationSpi spi =

(TestRecordingCommunicationSpi)ignite.configuration().getCommunicationSpi();

spi.blockMessages(GridDhtTxPrepareResponse.class, prim.name());
}

final int val = 0;

try {
multithreaded(new Runnable() {
@Override public void run() {
try (Transaction txOpt = 
client.transactions().txStart(OPTIMISTIC, SERIALIZABLE, 300, 1)) {

client.cache(CACHE_NAME).put(val, val);

txOpt.commit();
   

[jira] [Created] (IGNITE-6998) Activation on bigger topology with enabled persistence doesn't work as expected.

2017-11-23 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6998:
-

 Summary: Activation on bigger topology with enabled persistence 
doesn't work as expected.
 Key: IGNITE-6998
 URL: https://issues.apache.org/jira/browse/IGNITE-6998
 Project: Ignite
  Issue Type: Bug
  Components: cache, persistence
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.4



Reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.persistence;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.cache.query.ScanQuery;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.MemoryConfiguration;
import org.apache.ignite.configuration.MemoryPolicyConfiguration;
import org.apache.ignite.configuration.PersistentStoreConfiguration;
import org.apache.ignite.configuration.WALMode;
import org.apache.ignite.internal.IgniteEx;
import org.apache.ignite.internal.IgniteKernal;
import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
import org.apache.ignite.internal.processors.cache.IgniteInternalCache;
import org.apache.ignite.internal.util.typedef.G;
import org.apache.ignite.internal.util.typedef.internal.U;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.TcpDiscoveryIpFinder;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;

/**
 * Check correctness of activation on bigger topology.
 */
public class IgnitePdsActivationOnBiggerTopologyTest extends 
GridCommonAbstractTest {
/** */
private static TcpDiscoveryIpFinder ipFinder = new 
TcpDiscoveryVmIpFinder(true);

/** {@inheritDoc} */
@Override protected IgniteConfiguration getConfiguration(String 
igniteInstanceName) throws Exception {
IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

cfg.setMemoryConfiguration(new 
MemoryConfiguration().setDefaultMemoryPolicyName("d").
setPageSize(1024).setMemoryPolicies(new 
MemoryPolicyConfiguration().setName("d").
setInitialSize(50 * 1024 * 1024L).setMaxSize(50 * 1024 * 1024)));

cfg.setPersistentStoreConfiguration(new 
PersistentStoreConfiguration().setWalMode(WALMode.LOG_ONLY));

((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(ipFinder);

CacheConfiguration ccfg = new 
CacheConfiguration<>(DEFAULT_CACHE_NAME);


ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
ccfg.setAffinity(new RendezvousAffinityFunction(false, 64));

cfg.setCacheConfiguration(ccfg);

return cfg;
}

/** {@inheritDoc} */
@Override protected void beforeTest() throws Exception {
super.beforeTest();

deleteRecursively(U.resolveWorkDirectory(U.defaultWorkDirectory(), 
"db", false));
}

/** {@inheritDoc} */
@Override protected void afterTest() throws Exception {
stopAllGrids();

deleteRecursively(U.resolveWorkDirectory(U.defaultWorkDirectory(), 
"db", false));

super.afterTest();
}

/** */
public void testActivationOnBiggerTopology() throws Exception {
IgniteEx ignite = (IgniteEx)startGridsMultiThreaded(2);

final int keysCnt = 1_000;

for (int i = 0; i < keysCnt; i++)
ignite.cache(DEFAULT_CACHE_NAME).put(i, i);

forceCheckpoint();

assertEquals("Wrong size (before restart)", keysCnt, 
ignite.cache(DEFAULT_CACHE_NAME).size());
assertEquals("Wrong size for scan (before restart)", keysCnt, 
ignite.cache(DEFAULT_CACHE_NAME).query(new 
