[jira] [Created] (IGNITE-13371) Sporadic partition inconsistency after historical rebalancing of updates with same key put-remove pattern

2020-08-18 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-13371:
---

 Summary: Sporadic partition inconsistency after historical 
rebalancing of updates with same key put-remove pattern
 Key: IGNITE-13371
 URL: https://issues.apache.org/jira/browse/IGNITE-13371
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.10


h4. Scenario
# Start 3 servers and 3 clients, create caches
# Clients start combined put + 1% remove operations on data in 
PESSIMISTIC/REPEATABLE_READ transactions (see the sketch below)
## Kill one node
## Restart one node
# Ensure all transactions have completed
# Run idle_verify
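
A minimal sketch of the client load from step 2; the cache name, key range and the 'client'/'done' variables are assumptions, not the actual reproducer:
{code:java}
IgniteCache<Integer, Integer> cache = client.cache("cache_group_3_088_1");
ThreadLocalRandom rnd = ThreadLocalRandom.current();

while (!done.get()) { // 'done' is an AtomicBoolean flipped by the test harness.
    int key = rnd.nextInt(100_000);

    try (Transaction tx = client.transactions().txStart(
        TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
        if (rnd.nextInt(100) == 0)
            cache.remove(key); // ~1% of operations remove the key.
        else
            cache.put(key, key);

        tx.commit();
    }
}
{code}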

Expected: no conflicts found
Actual:
{noformat}
[12:03:18][:55 :230] Control utility --cache idle_verify --skip-zeros 
--cache-filter PERSISTENT
[12:03:20][:55 :230] Control utility [ver. 8.7.13#20200228-sha1:7b016d63]
[12:03:20][:55 :230] 2020 Copyright(C) GridGain Systems, Inc. and Contributors
[12:03:20][:55 :230] User: prtagent
[12:03:20][:55 :230] Time: 2020-03-03T12:03:19.836
[12:03:20][:55 :230] Command [CACHE] started
[12:03:20][:55 :230] Arguments: --host 172.25.1.11 --port 11211 --cache 
idle_verify --skip-zeros --cache-filter PERSISTENT 
[12:03:20][:55 :230] 

[12:03:20][:55 :230] idle_verify task was executed with the following args: 
caches=[], excluded=[], cacheFilter=[PERSISTENT]
[12:03:20][:55 :230] idle_verify check has finished, found 1 conflict 
partitions: [counterConflicts=0, hashConflicts=1]
[12:03:20][:55 :230] Hash conflicts:
[12:03:20][:55 :230] Conflict partition: PartitionKeyV2 [grpId=1338167321, 
grpName=cache_group_3_088_1, partId=24]
[12:03:20][:55 :230] Partition instances: [PartitionHashRecordV2 
[isPrimary=false, consistentId=node_1_2, updateCntr=172349, 
partitionState=OWNING, size=6299, partHash=157875238], PartitionHashRecordV2 
[isPrimary=true, consistentId=node_1_1, updateCntr=172349, 
partitionState=OWNING, size=6299, partHash=157875238], PartitionHashRecordV2 
[isPrimary=false, consistentId=node_1_4, updateCntr=172349, 
partitionState=OWNING, size=6300, partHash=-944532882]]
[12:03:20][:55 :230] Command [CACHE] finished with code: 0
[12:03:20][:55 :230] Control utility has completed execution at: 
2020-03-03T12:03:20.593
[12:03:20][:55 :230] Execution time: 757 ms
{noformat}





[jira] [Created] (IGNITE-13211) Improve public exceptions for case when user attempts to access data from a lost partition

2020-07-03 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-13211:
---

 Summary: Improve public exceptions for case when user attempts to 
access data from a lost partition
 Key: IGNITE-13211
 URL: https://issues.apache.org/jira/browse/IGNITE-13211
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov


After IGNITE-13003, an attempt to access a lost partition via the public API throws a 
CacheException with a CacheInvalidStateException inside as the root cause. We can 
improve the user experience a bit:
1. Create a new type of public exception (a subclass of CacheException) to be 
thrown in lost-data access scenarios; a sketch follows.
2. In case a partition is lost in a persistent cache, the error message should be 
changed from "partition data has been lost" to "partition data temporarily 
unavailable".





[jira] [Created] (IGNITE-13064) Set default transaction timeout to 5 minutes

2020-05-22 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-13064:
---

 Summary: Set default transaction timeout to 5 minutes
 Key: IGNITE-13064
 URL: https://issues.apache.org/jira/browse/IGNITE-13064
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov


Let's set the default TX timeout to 5 minutes (right now it's 0 = no timeout).
Pros:
1. The deadlock detection procedure is triggered on timeout. If a user gets 
into a key-level deadlock, they'll be able to discover the root cause from the 
logs (even though the load will hang for a while) and skip the googling and 
debugging step.
2. Almost every system with transactions has a timeout enabled by default.
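
For reference, a sketch of setting the proposed default explicitly through the existing public API:
{code:java}
IgniteConfiguration cfg = new IgniteConfiguration();

// 5 minutes instead of the current default of 0 (no timeout).
cfg.setTransactionConfiguration(
    new TransactionConfiguration().setDefaultTxTimeout(5 * 60 * 1000));
{code}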





[jira] [Created] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance

2020-05-21 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-13052:
---

 Summary: Calculate result of reserveHistoryForExchange in advance
 Key: IGNITE-13052
 URL: https://issues.apache.org/jira/browse/IGNITE-13052
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov


The reserveHistoryForExchange() method is called on every partition map exchange. 
It's an expensive call: it requires iterating over the whole checkpoint history 
with possible retrieval of GroupState from WAL (it's stored on heap with a 
SoftReference). On some deployments this operation can take several minutes.

The idea of the optimization is to calculate its result only on the first PME 
(ideally even before the first PME, on the recovery stage), keep the resulting map 
{grpId, partId -> earliestCheckpoint} on heap, and update it when necessary. At 
first glance, the map should be updated:
1) On checkpoint. If a new partition appears on the local node, it should be 
registered in the map with the current checkpoint. If a partition is evicted from 
the local node, or changed its state to non-OWNING, it should be removed from the 
map. If a checkpoint is marked as inapplicable for a certain group, the whole 
group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), the previous earliest 
checkpoint should be replaced, with setIfGreater semantics, by the new earliest 
checkpoint.

The memory overhead of storing the described map on heap is insignificant: its 
size isn't greater than the size of the map returned from reserveHistoryForExchange().

The described fix should be much simpler than IGNITE-12429. A sketch of the 
update rules follows.
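
A minimal sketch of the on-heap map and its update rules, using checkpoint timestamps as a stand-in for checkpoint references (the helper names are illustrative, not the actual Ignite code):
{code:java}
private final ConcurrentMap<GroupPartitionId, Long> earliestCp = new ConcurrentHashMap<>();

/** On checkpoint: a new local OWNING partition registers the current checkpoint. */
void onPartitionOwned(GroupPartitionId key, long curCpTs) {
    earliestCp.putIfAbsent(key, curCpTs);
}

/** On checkpoint: eviction or transition to a non-OWNING state drops the entry. */
void onPartitionGone(GroupPartitionId key) {
    earliestCp.remove(key);
}

/** On history cleanup: move the earliest checkpoint forward, never backward
 * ("setIfGreater" semantics). */
void onHistoryCleanup(GroupPartitionId key, long newEarliestCpTs) {
    earliestCp.merge(key, newEarliestCpTs, Math::max);
}
{code}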





[jira] [Created] (IGNITE-12622) Forbid mixed cache groups with both atomic and transactional caches

2020-02-04 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12622:
---

 Summary: Forbid mixed cache groups with both atomic and 
transactional caches
 Key: IGNITE-12622
 URL: https://issues.apache.org/jira/browse/IGNITE-12622
 Project: Ignite
  Issue Type: Improvement
  Components: cache
Reporter: Ivan Rakov
 Fix For: 2.9


Apparently it's possible in Ignite to configure a cache group with both ATOMIC 
and TRANSACTIONAL caches; see the 
IgniteCacheGroupsTest#testContinuousQueriesMultipleGroups* tests.
As discussed on the dev list 
(http://apache-ignite-developers.2346864.n4.nabble.com/Forbid-mixed-cache-groups-with-both-atomic-and-transactional-caches-td45586.html),
the community has concluded that such configurations should be prohibited.





[jira] [Created] (IGNITE-12607) PartitionsExchangeAwareTest is flaky

2020-01-30 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12607:
---

 Summary: PartitionsExchangeAwareTest is flaky
 Key: IGNITE-12607
 URL: https://issues.apache.org/jira/browse/IGNITE-12607
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.9


Proof: 
https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Cache6/4972239
It seems the cache update is sometimes not possible even before the topologies are 
locked.





[jira] [Created] (IGNITE-12545) Introduce listener interface for components to react to partition map exchange events

2020-01-15 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12545:
---

 Summary: Introduce listener interface for components to react to 
partition map exchange events
 Key: IGNITE-12545
 URL: https://issues.apache.org/jira/browse/IGNITE-12545
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov


It would be handy to have a listener interface for components that should react 
to PME, instead of adding more and more calls to 
GridDhtPartitionsExchangeFuture.
In general, there are four possible moments when a component can be notified: on 
exchange init (before and after topologies are updated and the exchange latch is 
acquired) and on exchange done (before and after readyTopVer is incremented and 
user operations are unlocked). A sketch of such an interface follows.
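
A sketch of the suggested listener, one default method per notification point named above (method names are illustrative, not a committed API):
{code:java}
public interface PartitionsExchangeAware {
    /** Exchange init, before topologies are updated and the latch is acquired. */
    default void onInitBeforeTopologyLock(GridDhtPartitionsExchangeFuture fut) {}

    /** Exchange init, after topologies are updated and the latch is acquired. */
    default void onInitAfterTopologyLock(GridDhtPartitionsExchangeFuture fut) {}

    /** Exchange done, before readyTopVer is incremented and operations are unlocked. */
    default void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {}

    /** Exchange done, after readyTopVer is incremented and operations are unlocked. */
    default void onDoneAfterTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {}
}
{code}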





[jira] [Created] (IGNITE-12531) Cluster is unable to change BLT on 2.8 if storage was initially created on 2.7 or less

2020-01-13 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12531:
---

 Summary: Cluster is unable to change BLT on 2.8 if storage was 
initially created on 2.7 or less
 Key: IGNITE-12531
 URL: https://issues.apache.org/jira/browse/IGNITE-12531
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.8
Reporter: Ivan Rakov
 Fix For: 2.8


Due to a bug described in https://issues.apache.org/jira/browse/IGNITE-10348, after storage 
migration from 2.7 (or earlier) to 2.8, any updates of the metastorage are not persisted.

S2R:
(on 2.7)
- Activate a persistent cluster with 2 nodes
- Shut down the cluster

(on 2.8)
- Start the cluster with 2 nodes based on the persistent storage from 2.7
- Start a 3rd node
- Change the baseline
- Shut down the cluster
- Start the initial two nodes
- Start the 3rd node (the join is rejected: the first two nodes have the old BLT 
of two nodes, the 3rd node has the new BLT of three nodes)





[jira] [Created] (IGNITE-12510) In-memory page eviction may fail in case very large entries are stored in the cache

2019-12-27 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12510:
---

 Summary: In-memory page eviction may fail in case very large 
entries are stored in the cache
 Key: IGNITE-12510
 URL: https://issues.apache.org/jira/browse/IGNITE-12510
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7.6
Reporter: Ivan Rakov


In-memory page eviction (both DataPageEvictionMode#RANDOM_LRU and 
DataPageEvictionMode#RANDOM_2_LRU) has a limited number of attempts to choose a 
candidate page for data removal:

{code:java}
if (sampleSpinCnt > SAMPLE_SPIN_LIMIT) { // 5000
    LT.warn(log, "Too many attempts to choose data page: " + SAMPLE_SPIN_LIMIT);

    return;
}
{code}
Large data entries are stored in several data pages which are sequentially 
linked to each other. Only "head" pages are suitable for eviction, because the 
whole entry is reachable only from the "head" page (the list of pages is singly 
linked; there are no reverse links from tail to head).
The problem is that if we put large enough entries into an evictable cache (e.g. 
each entry needs more than 5000 pages to be stored), there are too few head 
pages, and the "Too many attempts to choose data page" error is likely to show up.
Instead of just failing the node with an error, we need to perform something like 
a full scan if we fail to find a head page within SAMPLE_SPIN_LIMIT attempts.





[jira] [Created] (IGNITE-12509) CACHE_REBALANCE_STOPPED event raises for wrong caches in case of specified RebalanceDelay

2019-12-27 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12509:
---

 Summary: CACHE_REBALANCE_STOPPED event raises for wrong caches in 
case of specified RebalanceDelay
 Key: IGNITE-12509
 URL: https://issues.apache.org/jira/browse/IGNITE-12509
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
 Fix For: 2.9


Steps to reproduce:
1. Start an in-memory cluster with 2 server nodes.
2. Start 3 caches with different rebalance delays (e.g. 5, 10 and 15 seconds) 
and upload some data.
3. Start a localListener for the EVT_CACHE_REBALANCE_STOPPED event on one of the 
nodes.
4. Start one more server node.
5. Wait 5 seconds, until the first rebalance delay is reached.
6. The EVT_CACHE_REBALANCE_STOPPED event is received 3 times (once for each 
cache), but in fact only 1 cache was rebalanced. The same happens for the rest of 
the caches.
As a result, on rebalance finish we get the event for each cache [CACHE_COUNT] 
times instead of once.
A reproducer is attached.





[jira] [Created] (IGNITE-12508) GridCacheProcessor#cacheDescriptor(int) has O(N) complexity

2019-12-27 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12508:
---

 Summary: GridCacheProcessor#cacheDescriptor(int) has O(N) 
complexity
 Key: IGNITE-12508
 URL: https://issues.apache.org/jira/browse/IGNITE-12508
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
 Fix For: 2.9


See the method code:
{code}
@Nullable public DynamicCacheDescriptor cacheDescriptor(int cacheId) {
    for (DynamicCacheDescriptor cacheDesc : cacheDescriptors().values()) {
        CacheConfiguration ccfg = cacheDesc.cacheConfiguration();

        assert ccfg != null : cacheDesc;

        if (CU.cacheId(ccfg.getName()) == cacheId)
            return cacheDesc;
    }

    return null;
}
{code}

This method is invoked on several hot paths, which causes a significant 
performance regression when the number of caches is large, for example, during 
logical recovery and security checks for indexing.

The method should be improved to use a hash map or a similar data structure to 
get better complexity. A sketch of the O(1) variant follows.
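
A minimal sketch, assuming an id-keyed index is maintained next to the existing registry whenever descriptors are registered or destroyed (the helper names are illustrative):
{code:java}
private final ConcurrentMap<Integer, DynamicCacheDescriptor> cacheDescById =
    new ConcurrentHashMap<>();

@Nullable public DynamicCacheDescriptor cacheDescriptor(int cacheId) {
    return cacheDescById.get(cacheId);
}

/** Registration path keeps the index in sync. */
private void onDescriptorRegistered(DynamicCacheDescriptor desc) {
    cacheDescById.put(CU.cacheId(desc.cacheConfiguration().getName()), desc);
}

/** Removal path drops the index entry. */
private void onDescriptorDestroyed(DynamicCacheDescriptor desc) {
    cacheDescById.remove(CU.cacheId(desc.cacheConfiguration().getName()));
}
{code}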





[jira] [Created] (IGNITE-12507) Implement cache size metric in bytes

2019-12-27 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12507:
---

 Summary: Implement cache size metric in bytes
 Key: IGNITE-12507
 URL: https://issues.apache.org/jira/browse/IGNITE-12507
 Project: Ignite
  Issue Type: Improvement
  Components: cache
Reporter: Ivan Rakov
 Fix For: 2.9


There is a need for a cache-size-in-bytes metric for the pure in-memory case.

When all data is in RAM, it is not obvious how to find out exactly how much space 
is consumed by cache data on a running node, as the only things that can be 
observed are the number of keys in a partition on a specific node and the memory 
usage metrics on the machine.





[jira] [Created] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks

2019-12-13 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12451:
---

 Summary: Introduce deadlock detection for cache entry reentrant 
locks
 Key: IGNITE-12451
 URL: https://issues.apache.org/jira/browse/IGNITE-12451
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.7.6
Reporter: Ivan Rakov
 Fix For: 2.9


Aside from IGNITE-12365, we still have a possible threat of cache-entry-level 
deadlock in case of careless usage of JCache mass operations (putAll, 
removeAll):
1. If two different user threads perform putAll on the same two keys in 
reverse order (with the same primary node for both), there's a chance that 
sys-stripe threads will be deadlocked.
2. Even without a direct contract violation on the user side, a HashMap can be 
passed as the argument for putAll. Even if user threads have called mass 
operations with two keys in the same order, HashMap iteration order is not 
strictly defined, which may cause the same deadlock.

Local deadlock detection should mitigate this issue. We can create a wrapper for 
ReentrantLock with logic that performs cycle detection in the wait-for graph when 
we have been waiting for lock acquisition for too long. In that case an exception 
will be thrown from one of the threads, failing the user operation but letting 
the system make progress. A sketch of such a wrapper follows.
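
A minimal sketch of such a wrapper, assuming one global wait-for map and a 500 ms detection threshold (illustrative, not the actual Ignite implementation; owner/edge updates are deliberately simplified):
{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockDetectingLock {
    /** Global wait-for edges: which lock each thread is currently blocked on. */
    private static final ConcurrentMap<Thread, DeadlockDetectingLock> WAITING =
        new ConcurrentHashMap<>();

    private final ReentrantLock delegate = new ReentrantLock();

    /** Tracked explicitly, since ReentrantLock doesn't expose its owner. */
    private volatile Thread owner;

    public void lock() {
        Thread self = Thread.currentThread();

        try {
            if (delegate.tryLock()) {
                owner = self;

                return;
            }

            WAITING.put(self, this);

            try {
                // Waiting "too long": re-check the wait-for graph on every timeout.
                while (!delegate.tryLock(500, TimeUnit.MILLISECONDS)) {
                    if (hasCycle(self))
                        throw new IllegalStateException("Deadlock detected for " + self);
                }

                owner = self;
            }
            finally {
                WAITING.remove(self);
            }
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException("Interrupted while acquiring lock", e);
        }
    }

    public void unlock() {
        owner = null;

        delegate.unlock();
    }

    /** Follows "awaited lock -> its owner -> lock the owner awaits" edges back to {@code start}. */
    private boolean hasCycle(Thread start) {
        Set<Thread> visited = new HashSet<>();

        DeadlockDetectingLock awaited = this;

        while (awaited != null) {
            Thread lockOwner = awaited.owner;

            if (lockOwner == null)
                return false; // Lock was released meanwhile: no deadlock.

            if (lockOwner == start)
                return true; // Cycle found: the chain leads back to the waiter.

            if (!visited.add(lockOwner))
                return false; // Some other cycle, not involving 'start'.

            awaited = WAITING.get(lockOwner);
        }

        return false;
    }
}
{code}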





[jira] [Created] (IGNITE-12429) Rework bytes-based WAL archive size management logic to make historical rebalance more predictable

2019-12-09 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12429:
---

 Summary: Rework bytes-based WAL archive size management logic to 
make historical rebalance more predictable
 Key: IGNITE-12429
 URL: https://issues.apache.org/jira/browse/IGNITE-12429
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov


Since 2.7, DataStorageConfiguration allows specifying the size of the WAL archive 
in bytes (see DataStorageConfiguration#maxWalArchiveSize), which is much more 
transparent to the user.
Unfortunately, the new logic may be unpredictable when it comes to historical 
rebalance. The WAL archive is truncated when one of the following conditions occurs:
1. The total number of checkpoints in the WAL archive is bigger than 
DataStorageConfiguration#walHistSize
2. The total size of the WAL archive is bigger than 
DataStorageConfiguration#maxWalArchiveSize
Independently, the in-memory checkpoint history contains only a fixed number of 
the last checkpoints (can be changed with 
IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE, 100 by default).
All these particular qualities make it hard for the user to control the usage of 
historical rebalance. Imagine the case when the user has a light load (WAL gets 
rotated very slowly) and the default checkpoint frequency of 3 minutes. After 
100 * 3 = 300 minutes, all updates in the WAL will be impossible to receive via 
historical rebalance, even if:
1. The user has configured a large DataStorageConfiguration#maxWalArchiveSize
2. The user has configured a large DataStorageConfiguration#walHistSize
At the same time, setting a large IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE 
will help (only combined with the previous two points), but Ignite node heap 
usage may increase dramatically.
I propose changing the WAL history management logic in the following way:
1. *Don't* cut the WAL archive when the number of checkpoints exceeds 
DataStorageConfiguration#walHistSize. WAL history should be managed only based 
on DataStorageConfiguration#maxWalArchiveSize.
2. Checkpoint history should contain a fixed number of entries, but should cover 
the whole stored WAL archive (not only its most recent part with the 
IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE last checkpoints). This can be 
achieved by making the checkpoint history sparse: some intermediate checkpoints 
*may be absent from the history*, while the fixed number of kept checkpoints can 
be positioned either uniformly (trying to keep a fixed number of bytes between 
two neighbour checkpoints) or exponentially (trying to keep a fixed ratio between 
(size of WAL from checkpoint(N-1) to the current write pointer) and (size of WAL 
from checkpoint(N) to the current write pointer)). A sketch of the uniform 
variant follows.
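
A minimal sketch of the uniform variant, thinning by checkpoint index for simplicity (the ticket proposes spacing by bytes or by ratio; the names are illustrative):
{code:java}
/** Keeps {@code keep} (>= 2) checkpoints out of a sorted, oldest-first list, roughly evenly spaced. */
static List<Long> thinUniformly(List<Long> cpOffsets, int keep) {
    if (cpOffsets.size() <= keep)
        return new ArrayList<>(cpOffsets);

    List<Long> res = new ArrayList<>(keep);

    for (int k = 0; k < keep; k++) {
        // Index of the checkpoint closest to the k-th uniform position.
        int idx = (int)Math.round((double)k * (cpOffsets.size() - 1) / (keep - 1));

        if (res.isEmpty() || !res.get(res.size() - 1).equals(cpOffsets.get(idx)))
            res.add(cpOffsets.get(idx));
    }

    return res;
}
{code}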





[jira] [Created] (IGNITE-12278) Add metric showing how many nodes may safely leave the cluster without partition loss

2019-10-10 Thread Ivan Rakov (Jira)
Ivan Rakov created IGNITE-12278:
---

 Summary: Add metric showing how many nodes may safely leave the 
cluster without partition loss
 Key: IGNITE-12278
 URL: https://issues.apache.org/jira/browse/IGNITE-12278
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
 Fix For: 2.8


We already have the getMinimumNumberOfPartitionCopies metric that shows the 
partition redundancy number for a specific cache group.
It would be handy for the user to have a single aggregated metric over all cache 
groups showing how many nodes may leave the cluster without partition loss in 
any cache; a sketch of the aggregation follows.
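
A minimal sketch of the aggregation, reading the per-group value from the existing CacheGroupMetricsMXBean#getMinimumNumberOfPartitionCopies (the method name below is illustrative):
{code:java}
/** With K copies of the least redundant partition, K - 1 nodes may leave safely. */
static int nodesSafeToLeave(Collection<CacheGroupMetricsMXBean> groups) {
    int minCopies = Integer.MAX_VALUE;

    for (CacheGroupMetricsMXBean grp : groups)
        minCopies = Math.min(minCopies, grp.getMinimumNumberOfPartitionCopies());

    return minCopies == Integer.MAX_VALUE ? 0 : minCopies - 1;
}
{code}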





[jira] [Created] (IGNITE-11807) Index validation control.sh command may provide false-positive error results

2019-04-25 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11807:
---

 Summary: Index validation control.sh command may provide 
false-positive error results
 Key: IGNITE-11807
 URL: https://issues.apache.org/jira/browse/IGNITE-11807
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
 Fix For: 2.8


There are two possible issues in the validate_indexes command:
1. If index validation is performed under load, there's a chance that we'll fetch 
a link from the B+ tree and won't find this key in the partition cache data store 
because it was concurrently removed.
We can work around this by double-checking partition update counters (before and 
after the index validation procedure), as sketched below.
2. Since index validation is subscribed to checkpoint start (reason: we perform 
CRC validation of file page store pages, which is sensitive to concurrent disk 
page writes), we may bump into the following situation:
- The user fairly stops all load
- A few moments later the user triggers validate_indexes
- A checkpoint starts due to timeout; pages that were modified before the 
validate_indexes start are being written to disk
- validate_indexes fails
We can work around this by forcibly triggering a checkpoint before the start of 
index validation activities.
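
A minimal sketch of workaround 1; updateCounter() exists on local partitions, while the validation helper and result type are illustrative:
{code:java}
long cntrBefore = part.updateCounter();

ValidationResult res = validatePartitionIndexes(part); // Hypothetical helper.

long cntrAfter = part.updateCounter();

// Concurrent updates happened: any mismatches may be false positives, so re-run
// the check or report the result as inconclusive instead of failing validation.
if (cntrBefore != cntrAfter)
    res = ValidationResult.inconclusive();
{code}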





[jira] [Created] (IGNITE-11769) Investigate JVM crash in PDS Direct IO TeamCity suites

2019-04-17 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11769:
---

 Summary: Investigate JVM crash in PDS Direct IO TeamCity suites
 Key: IGNITE-11769
 URL: https://issues.apache.org/jira/browse/IGNITE-11769
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
 Fix For: 2.8


Both PDS Direct IO suites periodically fail with a JVM crash.
The issue can be reproduced on a Linux machine by running 
IgnitePdsWithTtlTest#testTtlIsAppliedAfterRestart using the ignite-direct-io 
classpath.
The investigation is complicated because the JVM crash report *is not generated* 
during this crash. After some point, the JVM stays dormant for 2 minutes and then 
the process gets killed by an OS signal
{code:java}
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
{code}
and the following error messages can be dumped to stderr before the process dies
{code:java}
corrupted double-linked list
free(): corrupted unsorted chunks
{code}
which appear to be libc error messages. It seems Ignite corrupts virtual memory 
in a sophisticated way which prevents the normal JVM crash flow.






[jira] [Created] (IGNITE-11762) Test testClientStartCloseServersRestart causes hang of the whole Cache 2 suite in master

2019-04-16 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11762:
---

 Summary: Test testClientStartCloseServersRestart causes hang of 
the whole Cache 2 suite in master
 Key: IGNITE-11762
 URL: https://issues.apache.org/jira/browse/IGNITE-11762
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Pavel Kovalenko
 Fix For: 2.8


An attempt to restart a server node in the test hangs:

{code:java}
[2019-04-16 19:56:45,049][WARN ][restart-1][GridCachePartitionExchangeManager] 
Failed to wait for initial partition map exchange. Possible reasons are:
^-- Transactions in deadlock.
^-- Long running transactions (ignore if this is the case).
^-- Unreleased explicit locks.
{code}
The reason is that the previous PME (late affinity assignment) still hangs due to 
a pending transaction:

{code:java}
[2019-04-16 19:56:23,717][WARN 
][exchange-worker-#1039%cache.IgniteClientCacheStartFailoverTest3%][diagnostic] 
Pending transactions:
[2019-04-16 19:56:23,718][WARN 
][exchange-worker-#1039%cache.IgniteClientCacheStartFailoverTest3%][diagnostic] 
>>> [txVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], exchWait=true, 
tx=GridDhtTxLocal [nearNodeId=8559bfe0-3d4a-4090-a457-6df0eba5, 
nearFutId=1edc7172a61-941f9dde-2b60-4a1f-8213-7d23d738bf33, nearMiniId=1, 
nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion 
[topVer=166913752, order=1555433759036, nodeOrder=6], lb=null, 
super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=KeySetView 
[], dhtNodes=KeySetView [9ef33532-0e4a-4561-b57e-042afe10], 
explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, 
sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl 
[activeCacheIds=[-1062368467], recovery=false, mvccEnabled=true, 
mvccCachingCacheIds=[], txMap=HashSet []], super=IgniteTxAdapter 
[xidVer=GridCacheVersion [topVer=166913752, order=1555433759045, nodeOrder=10], 
writeVer=null, implicit=false, loc=true, threadId=1210, 
startTime=1555433762847, nodeId=0088e9b8-f859-4d14-8071-6388e473, 
startVer=GridCacheVersion [topVer=166913752, order=1555433759045, 
nodeOrder=10], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, 
timeout=0, sysInvalidate=false, sys=false, plc=2, commitVer=GridCacheVersion 
[topVer=166913752, order=1555433759045, nodeOrder=10], finalizing=NONE, 
invalidParts=null, state=MARKED_ROLLBACK, timedOut=false, 
topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], 
mvccSnapshot=MvccSnapshotResponse [futId=292, crdVer=1555433741506, cntr=395, 
opCntr=1, txs=[394], cleanupVer=390, tracking=0], skipCompletedVers=false, 
parentTx=null, duration=20866ms, onePhaseCommit=false], size=0

{code}
However, the load threads don't start any explicit transactions: they either hang 
on put()/get() or on clientCache.close().

Rolling back IGNITE-10799 resolves the issue (however, the test remains flaky 
with a ~10% fail rate due to an unhandled TransactionSerializationException).





[jira] [Created] (IGNITE-11747) Document --tx control script commands

2019-04-15 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11747:
---

 Summary: Document --tx control script commands
 Key: IGNITE-11747
 URL: https://issues.apache.org/jira/browse/IGNITE-11747
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Ivan Rakov


Along with the consistency check utilities, the ./control.sh script has a --tx 
command which allows displaying info about active transactions and even killing 
hanging transactions directly.

./control.sh provides just a brief description of the options:
{code:java}
List or kill transactions:
control.sh --tx [--xid XID] [--min-duration SECONDS] [--min-size SIZE] [--label 
PATTERN_REGEX] [--servers|--clients] [--nodes 
consistentId1[,consistentId2,,consistentIdN]] [--limit NUMBER] [--order 
DURATION|SIZE|START_TIME] [--kill] [--info] [--yes]
{code}
We should document the possible use cases and options of the command, possibly 
somewhere close to [https://apacheignite-tools.readme.io/docs/control-script]





[jira] [Created] (IGNITE-11735) Safely handle new closures of IGNITE-11392 in mixed cluster environment

2019-04-12 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11735:
---

 Summary: Safely handle new closures of IGNITE-11392 in mixed 
cluster environment
 Key: IGNITE-11735
 URL: https://issues.apache.org/jira/browse/IGNITE-11735
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
Assignee: Denis Chudov
 Fix For: 2.8


Under IGNITE-11392 we have added two new closures 
(FetchActiveTxOwnerTraceClosure and TxOwnerDumpRequestAllowedSettingClosure).
If a mixed cluster is assembled (some nodes contain the patch, some don't), we 
may bump into a situation where the closures are sent to a node that doesn't have 
the corresponding classes on its classpath. Normally, the closure would be 
deployed to the "old" node via peer-to-peer class deployment. However, p2p may be 
disabled in the configuration, which will cause a ClassNotFoundException on the 
"old" node.
We should register IGNITE-11392 in IgniteFeatures (recent example: IGNITE-11598) 
and filter out nodes that don't support the new feature before sending the 
compute task, as sketched below.
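
A sketch of the node filtering; IgniteFeatures.nodeSupports exists in the internal API, while the TX_OWNER_DUMP constant and the 'closure' variable (an IgniteRunnable) are illustrative:
{code:java}
Collection<ClusterNode> supported = new ArrayList<>();

for (ClusterNode node : ignite.cluster().forServers().nodes()) {
    if (IgniteFeatures.nodeSupports(node, IgniteFeatures.TX_OWNER_DUMP))
        supported.add(node);
}

// Broadcast only to nodes that can deserialize the new closure classes.
if (!supported.isEmpty())
    ignite.compute(ignite.cluster().forNodes(supported)).broadcast(closure);
{code}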





[jira] [Created] (IGNITE-11591) Add info about lock candidates that are ahead in queue to transaction timeout error message

2019-03-21 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11591:
---

 Summary: Add info about lock candidates that are ahead in queue to 
transaction timeout error message
 Key: IGNITE-11591
 URL: https://issues.apache.org/jira/browse/IGNITE-11591
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
 Fix For: 2.8


If a transaction is timed out due to a lock acquisition failure, a corresponding 
error will show up in the server log on the DHT node:
{code:java}
[2019-03-20 
21:13:10,831][ERROR][grid-timeout-worker-#23%transactions.TxRollbackOnTimeoutTest0%][GridDhtColocatedCache]
  Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], subjId=651a30e1-45ac-4b35-86d2-028d1f81d8dc, 
taskNameHash=0, createTtl=-1, accessTtl=-1, flags=6, txLbl=null, filter=null, 
super=GridDistributedLockRequest [nodeId=651a30e1-45ac-4b35-86d2-028d1f81d8dc, 
nearXidVer=GridCacheVersion [topVer=164585585, order=1553105588524, 
nodeOrder=4], threadId=262, 
futId=5967e4c9961-d32ea2a6-1789-47d7-bdbf-aa66e6d8c35b, timeout=890, 
isInTx=true, isInvalidate=false, isRead=false, isolation=REPEATABLE_READ, 
retVals=[false], txSize=2, flags=0, keysCnt=1, super=GridDistributedBaseMessage 
[ver=GridCacheVersion [topVer=164585585, order=1553105588524, nodeOrder=4], 
committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage 
[cacheId=3556498
class org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: 
Failed to acquire lock within provided timeout for transaction [timeout=890, 
tx=GridDhtTxLocal[xid=f219e4c9961--09cf-6071--0001, 
xidVersion=GridCacheVersion [topVer=164585585, order=1553105588527, 
nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, 
state=MARKED_ROLLBACK, invalidate=false, rollbackOnly=true, 
nodeId=c7dccddb-dee1-4499-94b1-03896350, timeout=890, duration=891]]
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:1766)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:1714)
at 
org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:86)
at 
org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:292)
at 
org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:285)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:399)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:511)
at 
org.apache.ignite.internal.processors.cache.GridCacheCompoundIdentityFuture.onDone(GridCacheCompoundIdentityFuture.java:56)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:490)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:793)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$900(GridDhtLockFuture.java:89)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1189)
at 
org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:234)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.j
{code}
It would be much more useful if this message also contained information about the 
transaction that actually owns the corresponding lock (or about all transactions 
that are ahead in the queue, if there are several).





[jira] [Created] (IGNITE-11484) Get rid of ForkJoinPool#commonPool usage for system critical tasks

2019-03-05 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11484:
---

 Summary: Get rid of ForkJoinPool#commonPool usage for system 
critical tasks
 Key: IGNITE-11484
 URL: https://issues.apache.org/jira/browse/IGNITE-11484
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.8


We use ForkJoinPool#commonPool for sorting checkpoint pages.
This may backfire if the common pool is already utilized in the current JVM: a 
checkpoint may wait for sorting for a long time, which in turn will cause a drop 
in user load. A sketch of a fix follows.
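
A minimal sketch of the fix direction, assuming a dedicated pool for checkpoint page sorting; it relies on fork/join tasks submitted into a pool running inside that pool rather than in commonPool():
{code:java}
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;

public class CheckpointPageSorter {
    /** Dedicated pool: checkpoint sorting never competes with commonPool() users. */
    private static final ForkJoinPool CP_SORT_POOL =
        new ForkJoinPool(Runtime.getRuntime().availableProcessors());

    /** Sorts page ids inside the dedicated pool instead of the common one. */
    public static void sortPages(long[] pageIds) throws Exception {
        CP_SORT_POOL.submit(() -> Arrays.parallelSort(pageIds)).get();
    }
}
{code}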





[jira] [Created] (IGNITE-11465) Multiple client leave/join events may wipe affinity assignment history and cause transactions fail

2019-03-02 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11465:
---

 Summary: Multiple client leave/join events may wipe affinity 
assignment history and cause transactions fail
 Key: IGNITE-11465
 URL: https://issues.apache.org/jira/browse/IGNITE-11465
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.8


We keep a history of GridAffinityAssignmentCache#MAX_HIST_SIZE affinity 
assignments; however, a flood of client joins/leaves may wipe it out entirely and 
cause a fail/hang of a transaction that was started before the flood:
{code:java}
if (cache == null || cache.topologyVersion().compareTo(topVer) > 0) {
    throw new IllegalStateException("Getting affinity for topology version earlier than affinity is " +
        "calculated [locNode=" + ctx.discovery().localNode() +
        ", grp=" + cacheOrGrpName +
        ", topVer=" + topVer +
        ", head=" + head.get().topologyVersion() +
        ", history=" + affCache.keySet() +
        ']');
}
{code}
The history is limited in order to prevent JVM heap overflow. At the same time, 
only "server event" affinity assignments are heavy: "client event" assignments 
are just shallow copies of "server event" assignments.
I suggest limiting the history by the number of "server event" assignments.
Also, considering the provided fix, I don't see any need to keep 500 items in 
the history; I changed the history size to 40.





[jira] [Created] (IGNITE-11415) Initiator server node writes all transaction entries to WAL instead of local partition ones

2019-02-25 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11415:
---

 Summary: Initiator server node writes all transaction entries to 
WAL instead of local partition ones
 Key: IGNITE-11415
 URL: https://issues.apache.org/jira/browse/IGNITE-11415
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
 Fix For: 2.8


IgniteTxLocalAdapter#userCommit assembles allEntries()/writeEntries() and writes 
them to WAL:
{code:java}
Collection commitEntries = (near() || 
cctx.snapshot().needTxReadLogging()) ? allEntries() : writeEntries();
{code}
If a transaction is initiated by a server node, all transaction entries will be 
contained in allEntries()/writeEntries(). Thus, we may write entries to WAL even 
if the local node doesn't own the corresponding partition.





[jira] [Created] (IGNITE-11367) Fix several issues in PageMemoryTracker

2019-02-20 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11367:
---

 Summary: Fix several issues in PageMemoryTracker
 Key: IGNITE-11367
 URL: https://issues.apache.org/jira/browse/IGNITE-11367
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
 Fix For: 2.8


I've discovered some issues in PageMemoryTracker while debugging IGNITE-10873:
1) Mock page memory doesn't implement PageMemoryImpl#pageBuffer. As a result, 
some delta records (which apply changes via the buffer) can't be applied. Example:
{code:java}
Caused by: java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.persistence.tree.io.TrackingPageIO.getLastSnapshotTag0(TrackingPageIO.java:235)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.io.TrackingPageIO.getLastSnapshotTag(TrackingPageIO.java:227)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.io.TrackingPageIO.validateSnapshotTag(TrackingPageIO.java:135)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.io.TrackingPageIO.markChanged(TrackingPageIO.java:93)
at 
org.apache.ignite.internal.pagemem.wal.record.delta.TrackingPageDeltaRecord.applyDelta(TrackingPageDeltaRecord.java:75)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.memtracker.PageMemoryTracker.applyWalRecord(PageMemoryTracker.java:447)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.memtracker.PageMemoryTracker.access$000(PageMemoryTracker.java:81)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.memtracker.PageMemoryTracker$1.log(PageMemoryTracker.java:159)
at 
org.gridgain.grid.internal.processors.cache.database.snapshot.GridCacheSnapshotManager.onChangeTrackerPage(GridCacheSnapshotManager.java:2801)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$5.applyx(GridCacheDatabaseSharedManager.java:1084)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$5.applyx(GridCacheDatabaseSharedManager.java:1077)
at 
org.apache.ignite.internal.util.lang.GridInClosure3X.apply(GridInClosure3X.java:34)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1572)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:495)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:487)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369)
at 
org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:285)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11500(BPlusTree.java:92)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryReplace(BPlusTree.java:3638)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2565)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2293)
... 33 more
{code}
2) During the binary recovery phase, page memory is changed by applying delta 
records and page snapshots (GridCacheDatabaseSharedManager#applyPageDelta, 
#applyPageSnapshot). Such changes are not replicated by logging delta records to 
WAL (we don't log physical records on binary recovery - we just apply already 
logged ones). This leads to false positive broken consistency reports. To prevent 
this, we should apply changes to both the regular page memory and the mock page 
memory in PageMemoryTracker.
3) PagesList.java:918:
{code:java}
// Here we should never write full page, because it is known to be new.
if (needWalDeltaRecord(nextId, nextPage, FALSE))
    wal.log(new PagesListInitNewPageRecord(
        grpId,
        nextId,
        io.getType(),
        io.getVersion(),
        nextId,
        prevId,
        0L
    ));
{code}

[jira] [Created] (IGNITE-11199) Extend logging for client-server connections in TCP discovery

2019-02-04 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11199:
---

 Summary: Extend logging for client-server connections in TCP 
discovery
 Key: IGNITE-11199
 URL: https://issues.apache.org/jira/browse/IGNITE-11199
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov


When a client node connects to a server node, it should print detailed 
information about the server (at least, the server node ID and consistent ID).
When a server node starts serving a client node connection, it should also print 
detailed information about the client.
Currently, all we have are abstract logs about connections.
On the client side:
{code:java}
[2019-02-02 17:50:43,270][INFO 
][grid-nio-worker-tcp-comm-0-#24][TcpCommunicationSpi] Established outgoing 
communication connection [locAddr=/127.0.0.1:53183, rmtAddr=/127.0.0.1:47100]
[2019-02-02 17:50:43,446][INFO 
][grid-nio-worker-tcp-comm-1-#25][TcpCommunicationSpi] Established outgoing 
communication connection [locAddr=/127.0.0.1:53184, rmtAddr=/127.0.0.1:47103]
{code}
On the server side:
{code:java}
./mahina98-2019-02-01.log:<190>Feb  1 18:24:19 mahina98.ca.sbrf.ru 2019-02-01 
18:24:19.236[INFO 
][tcp-disco-sock-reader-#5%DPL_GRID%DplGridNodeName%][o.a.i.s.d.tcp.TcpDiscoverySpi][tcp-disco-sock-reader-#5%DPL_GRID%DplGridNodeName%]
 Started serving remote node connection [rmtAddr=/10.124.133.5:56297, 
rmtPort=56297]
{code}
This is definitely not enough to find out which clients connected to the local 
server node and to which server the local client node has been connected.







[jira] [Created] (IGNITE-11156) FreeLists are overflowed with pages with almost no free space left

2019-01-30 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11156:
---

 Summary: FreeLists are overflowed with pages with almost no free 
space left
 Key: IGNITE-11156
 URL: https://issues.apache.org/jira/browse/IGNITE-11156
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.8


{code:java}
/** */
private static final int MIN_PAGE_FREE_SPACE = 8;
{code}
If a data page has 8 or more free bytes, it will be stored in a free list. As a 
result, free lists mostly contain "free pages" that are actually useless: even a 
pair of (boolean, boolean) takes approximately 50 bytes. I think we should 
increase this constant to something like 40; as a result, memory will be used 
more efficiently and WAL usage will decrease.





[jira] [Created] (IGNITE-11093) Introduce transaction state change tracking framework

2019-01-25 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11093:
---

 Summary: Introduce transaction state change tracking framework
 Key: IGNITE-11093
 URL: https://issues.apache.org/jira/browse/IGNITE-11093
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov


This ticket covers the creation of a framework that would allow tracking the 
current state of active transactions. When it's enabled, the transactions code 
will notify the framework on every crucial state change of an active transaction 
(prepare phase done, commit phase done, rollback). In turn, the framework should:
1) Provide a list of currently prepared transactions
2) Upon request, start tracking all prepared transactions and provide a list of 
all transactions that have been prepared since then
3) Upon request, start tracking all committed transactions and provide a list of 
all transactions that have been committed since then
4) Provide a future that will be completed when all prepared transactions have 
been committed
As a possible use case, such a framework would allow performing WAL shipping and 
establishing transactional consistency externally. A sketch of the API follows.
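
A sketch of the framework's API, mirroring the four points above (TrackingSession and the method names are illustrative, not a committed contract):
{code:java}
public interface TxStateTracker {
    /** 1) Transactions that are currently in the PREPARED state. */
    Collection<GridCacheVersion> currentlyPrepared();

    /** 2) Start recording; the session accumulates transactions prepared after this call. */
    TrackingSession trackPrepared();

    /** 3) Same, for transactions committed after this call. */
    TrackingSession trackCommitted();

    /** 4) Completed when every currently prepared transaction has been committed. */
    IgniteInternalFuture<Void> finishAllPrepared();
}
{code}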





[jira] [Created] (IGNITE-11089) Get rid of partition ID in intra-partition page links stored in delta records

2019-01-25 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11089:
---

 Summary: Get rid of partition ID in intra-partition page links 
stored in delta records
 Key: IGNITE-11089
 URL: https://issues.apache.org/jira/browse/IGNITE-11089
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
 Fix For: 2.8


We have faced numerous bugs when pages that were initially allocated in 
partition X migrated to partition Y (example: IGNITE-8659). Such migration may 
cause storage corruption: if partition Y gets evicted, an attempt to dereference 
a link to the migrated page will cause an error.
We may prevent such situations in general and gain a few percent of performance 
boost at the same time:
1) Locate all links to pages in delta records, including self-links (examples: 
InitNewPageRecord#newPageId, PagesListSetNextRecord#nextPageId, 
MergeRecord#rightId).
2) Change the storage format for such links: save only 6 bytes instead of 8 
(without the partition ID).
3) In every delta record constructor, assert that the link's partition ID is 
equal to that of PageDeltaRecord#pageId. The exception is pages from the index 
partition: they may refer to pages from other partitions by design.





[jira] [Created] (IGNITE-10413) Perform cache validation logic on primary node instead of near node

2018-11-26 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-10413:
---

 Summary: Perform cache validation logic on primary node instead of 
near node
 Key: IGNITE-10413
 URL: https://issues.apache.org/jira/browse/IGNITE-10413
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.8


Exchange is completed on clients asynchronously; that's why we can perform 
outdated validation when the near node is a client.
We have to execute validation on the DHT node instead.





[jira] [Created] (IGNITE-10387) Enable IgniteNodeAttributes#ATTR_MACS_OVERRIDE attribute in basic TC suites

2018-11-22 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-10387:
---

 Summary: Enable IgniteNodeAttributes#ATTR_MACS_OVERRIDE attribute 
in basic TC suites
 Key: IGNITE-10387
 URL: https://issues.apache.org/jira/browse/IGNITE-10387
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
 Fix For: 2.8


On TeamCity, all node instances start on the same physical machine, which 
implies that the ATTR_MACS attribute will be the same across the whole cluster.
Some logic in Ignite depends on ATTR_MACS. An example is load balancing: 
GridCacheContext#selectAffinityNodeBalanced tries to pick a node that has the 
same MACs. This makes features like node balancing not fully covered by our tests.
In order to improve test coverage, we should use the ATTR_MACS_OVERRIDE attribute 
at least in the basic test suites.





[jira] [Created] (IGNITE-10386) Add mode when WAL won't be disabled during rebalancing caused by BLT change

2018-11-22 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-10386:
---

 Summary: Add mode when WAL won't be disabled during rebalancing 
caused by BLT change
 Key: IGNITE-10386
 URL: https://issues.apache.org/jira/browse/IGNITE-10386
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
 Fix For: 2.8


Enabling IgniteSystemProperties#IGNITE_DISABLE_WAL_DURING_REBALANCING disables 
WAL for a cache group during rebalancing in case the local node has no OWNING 
partitions for this group.
We should add a mode in which, in a specific case (after a BaselineTopology 
change), WAL won't be disabled even if this property is switched on.





[jira] [Created] (IGNITE-10277) Prepare-commit ordering is violated for one-phase commit transactions

2018-11-15 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-10277:
---

 Summary: Prepare-commit ordering is violated for one-phase commit 
transactions
 Key: IGNITE-10277
 URL: https://issues.apache.org/jira/browse/IGNITE-10277
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
 Fix For: 2.8


The basic transactions invariant (all prepares should happen-before all commits) 
is violated in the one-phase commit scenario.
A reproducer is attached.





[jira] [Created] (IGNITE-10048) Bounded iteration in standalone WAL iterator with compaction enabled may skip records

2018-10-29 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-10048:
---

 Summary: Bounded iteration in standalone WAL iterator with 
compaction enabled may skip records
 Key: IGNITE-10048
 URL: https://issues.apache.org/jira/browse/IGNITE-10048
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
 Fix For: 2.8


Bounded iteration with non-zero start/end offsets may skip some records in 
intermediate segments. Reproducer (WAL compaction should be enabled):
{noformat}
/**
 *
 */
public void testBoundedIterationOverSeveralSegments() throws Exception {
    walCompactionEnabled = true;

    IgniteEx ig = (IgniteEx)startGrid();

    String archiveWalDir = getArchiveWalDirPath(ig);

    ig.cluster().active(true);

    IgniteCache cache = ig.getOrCreateCache(
        new CacheConfiguration<>().setName("c-n").setAffinity(new RendezvousAffinityFunction(false, 32)));

    IgniteCacheDatabaseSharedManager sharedMgr = ig.context().cache().context().database();

    IgniteWriteAheadLogManager walMgr = ig.context().cache().context().wal();

    WALPointer fromPtr = null;

    int recordsCnt = WAL_SEGMENT_SIZE / 8 /* record size */ * 5;

    for (int i = 0; i < recordsCnt; i++) {
        WALPointer ptr = walMgr.log(new PartitionDestroyRecord(i, i));

        if (i == 100)
            fromPtr = ptr;
    }

    assertNotNull(fromPtr);

    cache.put(1, 1);

    forceCheckpoint();

    // Generate WAL segments for filling WAL archive folder.
    for (int i = 0; i < 2 * ig.configuration().getDataStorageConfiguration().getWalSegments(); i++) {
        sharedMgr.checkpointReadLock();

        try {
            walMgr.log(new SnapshotRecord(i, false), RolloverType.NEXT_SEGMENT);
        }
        finally {
            sharedMgr.checkpointReadUnlock();
        }
    }

    cache.put(2, 2);

    forceCheckpoint();

    U.sleep(5000);

    stopGrid();

    WALIterator it = new IgniteWalIteratorFactory(log)
        .iterator(new IteratorParametersBuilder().from((FileWALPointer)fromPtr).filesOrDirs(archiveWalDir));

    TreeSet foundCounters = new TreeSet<>();

    it.forEach(x -> {
        WALRecord rec = x.get2();

        if (rec instanceof PartitionDestroyRecord)
            foundCounters.add(((WalRecordCacheGroupAware)rec).groupId());
    });

    assertEquals(new Integer(100), foundCounters.first());
    assertEquals(new Integer(recordsCnt - 1), foundCounters.last());
    assertEquals(recordsCnt - 100, foundCounters.size());
}
{noformat}





[jira] [Created] (IGNITE-10045) Add fail-fast mode to bounded iteration of StandaloneWalRecordsIterator

2018-10-29 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-10045:
---

 Summary: Add fail-fast mode to bounded iteration of 
StandaloneWalRecordsIterator
 Key: IGNITE-10045
 URL: https://issues.apache.org/jira/browse/IGNITE-10045
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
 Fix For: 2.8


Since IGNITE-9294, StandaloneWalRecordsIterator supports bounded iteration. That 
means we can specify "from" and "to" WAL pointers, and the iterator will return 
records only between the given bounds.
The problem is that in the current implementation StandaloneWalRecordsIterator 
just skips segments if they are missing. For example, if we specify fromIdx=0, 
toIdx=10 and the segments with indexes [9, 10] are missing, we'll just silently 
finish iteration on idx=8.
To prevent that, we should be able to switch on a fail-fast mode, in which 
StandaloneWalRecordsIterator will throw an error unless iteration really starts 
from the left bound and ends on the right bound, as sketched below.
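
A sketch of the fail-fast check, assuming the iterator remembers the first and last pointers it actually returned (firstRead, lastRead and the failFast flag are illustrative names):
{code:java}
if (failFast) {
    if (firstRead == null || firstRead.index() != from.index())
        throw new IgniteCheckedException("Iteration didn't start from the left bound: " + from);

    if (lastRead == null || lastRead.index() < to.index())
        throw new IgniteCheckedException("Iteration ended before the right bound: reached " +
            lastRead + ", expected " + to);
}
{code}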





[jira] [Created] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2018-10-17 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-9913:
--

 Summary: Prevent data updates blocking in case of backup BLT 
server node leave
 Key: IGNITE-9913
 URL: https://issues.apache.org/jira/browse/IGNITE-9913
 Project: Ignite
  Issue Type: Improvement
  Components: general
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.8


An Ignite cluster performs a distributed partition map exchange when any server 
node leaves or joins the topology.
Distributed PME blocks all updates and may take a long time. If all partitions 
are assigned according to the baseline topology and a server node leaves, there's 
no actual need to perform a distributed PME: every cluster node is able to 
recalculate the new affinity assignments and partition states locally. If we 
implement such a lightweight PME and handle mapping and lock requests on the new 
topology version correctly, updates won't be stopped on server node leave 
(except updates of partitions that lost their primary copy).





[jira] [Created] (IGNITE-9785) Introduce read-only state in local node context

2018-10-03 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-9785:
--

 Summary: Introduce read-only state in local node context
 Key: IGNITE-9785
 URL: https://issues.apache.org/jira/browse/IGNITE-9785
 Project: Ignite
  Issue Type: New Feature
Reporter: Ivan Rakov
Assignee: Aleksey Plekhanov
 Fix For: 2.8


It would be useful to have an option to switch an Ignite node into a "read-only" 
state. In the read-only state:
1) Any attempt to update data via the Cache API should throw an exception
2) Any attempt to update data via the DataStreamer should throw an exception
A local read-only state may be helpful for further implementing a global 
read-only cluster state, which can be switched via a user API. A sketch of the 
guard follows.
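
A minimal sketch of the guard on the update path, assuming a volatile flag on the node's cache processor (the names are illustrative):
{code:java}
private volatile boolean readOnly;

/** Called from both the Cache API update path and the DataStreamer receiver path. */
private void checkUpdatesAllowed() {
    if (readOnly)
        throw new IgniteException("Failed to perform the operation: node is in read-only state.");
}
{code}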





[jira] [Created] (IGNITE-9341) Notify metastorage listeners right before start of discovery processor

2018-08-21 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-9341:
--

 Summary: Notify metastorage listeners right before start of 
discovery processor
 Key: IGNITE-9341
 URL: https://issues.apache.org/jira/browse/IGNITE-9341
 Project: Ignite
  Issue Type: Improvement
  Components: general
Reporter: Ivan Rakov
 Fix For: 2.7


onReadyForRead() is called only for inheritors of the MetastorageLifecycleListener 
interface which are started prior to GridCacheProcessor. Listeners are notified 
at the moment of ReadOnlyMetastorage initialization, which in turn occurs during 
the GridCacheDatabaseSharedManager start.
We can split ReadOnlyMetastorage initialization and the notification of listeners; 
this will allow all components to subscribe to the read-only metastorage ready 
event.





[jira] [Created] (IGNITE-9294) StandaloneWalRecordsIterator: support iteration from custom pointer

2018-08-16 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-9294:
--

 Summary: StandaloneWalRecordsIterator: support iteration from 
custom pointer
 Key: IGNITE-9294
 URL: https://issues.apache.org/jira/browse/IGNITE-9294
 Project: Ignite
  Issue Type: Improvement
  Components: persistence
Reporter: Ivan Rakov


StandaloneWalRecordsIterator can be constructed from a set of files and dirs, but 
there's no option to pass a WAL pointer to the iterator factory class to start 
iteration from. This can be worked around (by filtering out all records prior to 
the needed pointer), but it would also be handy to add such an option to the 
IgniteWalIteratorFactory API. A sketch of the workaround follows.
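
A sketch of the current filtering workaround, reusing the factory API; process() stands for user logic, and fromPtr is the desired starting pointer:
{code:java}
WALIterator it = new IgniteWalIteratorFactory(log)
    .iterator(new IteratorParametersBuilder().filesOrDirs(archiveWalDir));

while (it.hasNext()) {
    IgniteBiTuple<WALPointer, WALRecord> tup = it.next();

    // Skip everything before the desired starting pointer.
    if (((FileWALPointer)tup.get1()).compareTo((FileWALPointer)fromPtr) < 0)
        continue;

    process(tup.get2());
}
{code}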





[jira] [Created] (IGNITE-8946) AssertionError can occur during reservation of WAL history for historical rebalance

2018-07-05 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8946:
--

 Summary: AssertionError can occur during reservation of WAL 
history for historical rebalance
 Key: IGNITE-8946
 URL: https://issues.apache.org/jira/browse/IGNITE-8946
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov


An attempt to release the WAL history after exchange may fail with an 
AssertionError. It seems we have a bug and may try to release more WAL segments 
than we have reserved:
{noformat}
java.lang.AssertionError: null
at 
org.apache.ignite.internal.processors.cache.persistence.wal.SegmentReservationStorage.release(SegmentReservationStorage.java:54)
  - locked <0x1c12> (a 
org.apache.ignite.internal.processors.cache.persistence.wal.SegmentReservationStorage)
  at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.release(FileWriteAheadLogManager.java:862)
  at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.releaseHistoryForExchange(GridCacheDatabaseSharedManager.java:1691)
  - locked <0x1c17> (a 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1751)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:2858)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:2591)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2283)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:129)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2140)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2128)
  at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
  at 
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:2128)
  at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1580)
  at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:138)
  at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:345)
  at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:325)
  at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2848)
  at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2827)
  at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
  at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
  at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
  at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
  at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
  at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
  at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
  at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
  at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
  at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
...
{noformat}

[jira] [Created] (IGNITE-8910) PagesList.takeEmptyPage may fail with AssertionError: type = 1

2018-07-02 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8910:
--

 Summary: PagesList.takeEmptyPage may fail with AssertionError: 
type = 1
 Key: IGNITE-8910
 URL: https://issues.apache.org/jira/browse/IGNITE-8910
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Dmitriy Sorokin


Even after IGNITE-8769 fix, we sometimes get an AssertionError from free list 
during update operation. Page with type PageIO#T_DATA appears in free list for 
some reason.
Example hang on TC: 
https://ci.ignite.apache.org/viewLog.html?buildId=1442664=IgniteTests24Java8_PdsIndexingWalRecovery
Example stacktrace:
{noformat}
[15:59:26]W: [org.apache.ignite:ignite-indexing] 
java.lang.AssertionError: Assertion error on search row: 
org.apache.ignite.internal.processors.cache.tree.SearchRow@1e76dfc5
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1643)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1272)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1603)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:370)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:1755)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2436)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:1898)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1740)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:299)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:483)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:443)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1119)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:609)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2428)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2405)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1084)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:812)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.db.wal.IgniteWalRecoveryTest$2.call(IgniteWalRecoveryTest.java:551)
[15:59:26]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.db.wal.IgniteWalRecoveryTest$2.call(IgniteWalRecoveryTest.java:546)
[15:59:26]W:

[jira] [Created] (IGNITE-8811) Create copies of basic TC suites with PDS delta checking framework enabled

2018-06-15 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8811:
--

 Summary: Create copies of basic TC suites with PDS delta checking 
framework enabled
 Key: IGNITE-8811
 URL: https://issues.apache.org/jira/browse/IGNITE-8811
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov


With the PDS delta checking framework implemented (IGNITE-8529), we should add 
copies of existing TC suites with the system property that enables self-check on 
checkpoint.
We can start with the Full API and SQL suites. Note that some tests may fail 
with OOM, as an Ignite node with the framework enabled consumes 2x RAM.






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8783) Failover tests periodically cause hanging of the whole Data Structures suite on TC

2018-06-13 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8783:
--

 Summary: Failover tests periodically cause hanging of the whole 
Data Structures suite on TC
 Key: IGNITE-8783
 URL: https://issues.apache.org/jira/browse/IGNITE-8783
 Project: Ignite
  Issue Type: Bug
  Components: data structures
Reporter: Ivan Rakov


History of suite runs: 
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_DataStructures=buildTypeHistoryList_IgniteTests24Java8=%3Cdefault%3E
The chance of a suite hang is 18% (based on the previous 50 runs).
One of the following failover tests is always the cause of the hang:
{noformat}
GridCacheReplicatedDataStructuresFailoverSelfTest#testAtomicSequenceConstantTopologyChange
GridCachePartitionedDataStructuresFailoverSelfTest#testFairReentrantLockConstantTopologyChangeNonFailoverSafe
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8761) WAL fsync at rollover should be asynchronous in LOG_ONLY and BACKGROUND modes

2018-06-09 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8761:
--

 Summary: WAL fsync at rollover should be asynchronous in LOG_ONLY 
and BACKGROUND modes
 Key: IGNITE-8761
 URL: https://issues.apache.org/jira/browse/IGNITE-8761
 Project: Ignite
  Issue Type: Improvement
  Components: persistence
Reporter: Ivan Rakov
 Fix For: 2.6


Transactions may periodically hang for a few seconds in LOG_ONLY or BACKGROUND 
persistence modes. Thread dumps show that threads are hanging on syncing the 
previous WAL segment during rollover:
{noformat}
  java.lang.Thread.State: RUNNABLE
   at java.nio.MappedByteBuffer.force0(MappedByteBuffer.java:-1)
   at java.nio.MappedByteBuffer.force(MappedByteBuffer.java:203)
   at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.close(FileWriteAheadLogManager.java:2843)
   at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$600(FileWriteAheadLogManager.java:2483)
   at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.rollOver(FileWriteAheadLogManager.java:1094)
{noformat}
Waiting for this fsync is not a necessary action to ensure crash recovery 
guarantees. Instead, we should perform these fsyncs asynchronously and 
ensure that they are completed prior to the next checkpoint start.
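A minimal sketch of the proposed behavior (all names here are illustrative, not 
the actual FileWriteAheadLogManager code):
{noformat}
private final ExecutorService walFsyncExec = Executors.newSingleThreadExecutor();
private final Queue<Future<?>> pendingFsyncs = new ConcurrentLinkedQueue<>();

/** On rollover: close the segment without blocking the writer on fsync. */
void onSegmentRollOver(FileWriteHandle closedHnd) {
    pendingFsyncs.add(walFsyncExec.submit(() -> closedHnd.fsync()));
}

/** Before the next checkpoint starts: wait for all scheduled fsyncs,
 *  which keeps crash recovery guarantees intact. */
void beforeCheckpointBegin() throws Exception {
    for (Future<?> f; (f = pendingFsyncs.poll()) != null; )
        f.get();
}
{noformat}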



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8757) idle_verify utility doesn't show both update counter and hash conflicts

2018-06-08 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8757:
--

 Summary: idle_verify utility doesn't show both update counter and 
hash conflicts
 Key: IGNITE-8757
 URL: https://issues.apache.org/jira/browse/IGNITE-8757
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov


If there are two conflicting partitions in the cluster, one with diverged update 
counters and one with diverged data, idle_verify will show only the partition with 
broken counters.
We should show both for better visibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8735) Metastorage creates its own index partition

2018-06-07 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8735:
--

 Summary: Metastorage creates its own index partition
 Key: IGNITE-8735
 URL: https://issues.apache.org/jira/browse/IGNITE-8735
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Reporter: Ivan Rakov
 Fix For: 2.6


By design, all metastorage data should be stored in a single partition with index 
= 0. However, allocatePageNoReuse is not overridden in MetastorageTree, which 
causes allocation of extra pages for the tree in the index partition.
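The likely fix is to keep all metastorage tree allocations in partition 0; a 
hypothetical sketch (the exact allocatePageNoReuse signature and the 
METASTORAGE_PARTITION constant are assumptions):
{noformat}
/** Keep all metastorage pages in partition 0 instead of the index partition. */
@Override protected long allocatePageNoReuse() throws IgniteCheckedException {
    return pageMem.allocatePage(grpId, METASTORAGE_PARTITION, PageIdAllocator.FLAG_DATA);
}
{noformat}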



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8694) SQL JOIN between PARTITIONED and REPLICATED cache fails

2018-06-04 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8694:
--

 Summary: SQL JOIN between PARTITIONED and REPLICATED cache fails
 Key: IGNITE-8694
 URL: https://issues.apache.org/jira/browse/IGNITE-8694
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.6


We already have IGNITE-7766, where a test fails due to the same problem.
The particular case with PARTITIONED and REPLICATED caches will be fixed under this 
ticket, while the rest of the work will be completed under IGNITE-7766.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8682) Attempt to configure IGFS in persistent mode without specific data region ends with AssertionError

2018-06-01 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8682:
--

 Summary: Attempt to configure IGFS in persistent mode without 
specific data region ends with AssertionError
 Key: IGNITE-8682
 URL: https://issues.apache.org/jira/browse/IGNITE-8682
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov


If persistence is enabled and the data region name is not specified in the IGFS 
configuration, an attempt to access the IGFS internal cache results in the following 
error:
{noformat}
[00:40:03]W:  java.lang.AssertionError
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.allocatePage(PageMemoryImpl.java:463)
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.allocateForTree(IgniteCacheOffheapManagerImpl.java:818)
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.initPendingTree(IgniteCacheOffheapManagerImpl.java:164)
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.onCacheStarted(IgniteCacheOffheapManagerImpl.java:151)
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.CacheGroupContext.onCacheStarted(CacheGroupContext.java:283)
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1965)
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:791)
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:946)
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:651)
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2458)
[00:40:03]W:at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2338)
[00:40:03]W:at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
[00:40:03]W:at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8652) Cache dynamically started from client while there are no affinity server nodes will be considered in-memory

2018-05-30 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8652:
--

 Summary: Cache dynamically started from client while there are no 
affinity server nodes will be considered in-memory
 Key: IGNITE-8652
 URL: https://issues.apache.org/jira/browse/IGNITE-8652
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
 Fix For: 2.6


We implemented stealing the data storage configuration from an affinity server node 
during initialization of a dynamic cache on a client (IGNITE-8476). Though, if 
there are no affinity nodes at the moment of cache start, the client will consider 
the cache as in-memory, even when an affinity node with proper data storage 
configuration (telling that it's actually a persistent cache) appears later.
That means cache operations on the client may fail with the same error:
{noformat}
java.lang.AssertionError: Wrong ready topology version for invalid partitions 
response
{noformat}
Reproducer: 
ClientAffinityAssignmentWithBaselineTest#testDynamicCacheStartNoAffinityNodes; it 
should pass after the fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8646) Setting different MAC addresses to nodes in test environment causes mass test fail

2018-05-30 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8646:
--

 Summary: Setting different MAC addresses to nodes in test 
environment causes mass test fail
 Key: IGNITE-8646
 URL: https://issues.apache.org/jira/browse/IGNITE-8646
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
 Fix For: 2.6


There are some parts of logic in Ignite that check whether two nodes are 
actually hosted on the same physical machine (e.g. the excludeNeighbors flag in 
the affinity function, load balancing for replicated caches, etc.) and choose the 
appropriate behavior. These parts can be tracked by usages of the 
IgniteNodeAttributes#ATTR_MACS attribute.
I've tried to emulate a distributed environment in tests by overriding ATTR_MACS 
with a random UUID. This caused mass consistency failures in basic and Full API 
tests. We should investigate this: probably, many bugs are hidden by the fact 
that nodes are always started on the same physical machine in our TeamCity 
tests.

PR with macs override: https://github.com/apache/ignite/pull/4084
TC run: 
https://ci.ignite.apache.org/viewLog.html?buildId=1342076=buildResultsDiv=IgniteTests24Java8_RunAll



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8625) Dynamic SQL index recreate after cache clear may result in AssertionError or JVM crash

2018-05-28 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8625:
--

 Summary: Dynamic SQL index recreate after cache clear may result 
in AssertionError or JVM crash
 Key: IGNITE-8625
 URL: https://issues.apache.org/jira/browse/IGNITE-8625
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
 Fix For: 2.6


After recreation of a previously dropped SQL index, the root page of the new index 
B+ tree may contain links to data entries from the previous index tree. If they 
were removed or relocated to another data page, an attempt to dereference these 
links may throw an AssertionError or even cause a JVM crash.
A patch with a reproducer is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8594) Make error messages in validate_indexes command report more informative

2018-05-24 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8594:
--

 Summary: Make error messages in validate_indexes command report 
more informative
 Key: IGNITE-8594
 URL: https://issues.apache.org/jira/browse/IGNITE-8594
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.6


In case an index is broken and contains links to missing items in data pages, the 
validate_indexes command will show "Item not found" messages in the report:
{noformat}
IndexValidationIssue [key=null, cacheName=cache_group_1_028, idxName=_key_PK], 
class java.lang.IllegalStateException: Item not found: 65
IndexValidationIssue [key=null, cacheName=cache_group_1_028, idxName=_key_PK], 
class java.lang.IllegalStateException: Item not found: 15
SQL Index [cache=cache_group_1_028, idx=LONG__VAL_IDX] 
ValidateIndexesPartitionResult [consistentId=node2, sqlIdxName=LONG__VAL_IDX]
IndexValidationIssue [key=null, cacheName=cache_group_1_028, 
idxName=LONG__VAL_IDX], class java.lang.IllegalStateException: Item not found: 
60
IndexValidationIssue [key=null, cacheName=cache_group_1_028, 
idxName=LONG__VAL_IDX], class java.lang.IllegalStateException: Item not found: 
65
IndexValidationIssue [key=null, cacheName=cache_group_1_028, 
idxName=LONG__VAL_IDX], class java.lang.IllegalStateException: Item not found: 
65
IndexValidationIssue [key=null, cacheName=cache_group_1_028, 
idxName=LONG__VAL_IDX], class java.lang.IllegalStateException: Item not found: 
15
{noformat}
It would be better to explain what is actually happening: the key is present in 
the SQL index, but missing in the corresponding data page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8572) StandaloneWalRecordsIterator may throw NPE if compressed WAL segment is empty

2018-05-23 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8572:
--

 Summary: StandaloneWalRecordsIterator may throw NPE if compressed 
WAL segment is empty
 Key: IGNITE-8572
 URL: https://issues.apache.org/jira/browse/IGNITE-8572
 Project: Ignite
  Issue Type: New Feature
  Components: persistence
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.6


In case a ZIP archive with a WAL segment doesn't contain any ZIP entries, an 
attempt to iterate through it with the standalone WAL iterator will throw an NPE:
{noformat}
Caused by: java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.persistence.file.UnzipFileIO.(UnzipFileIO.java:53)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.initReadHandle(AbstractWalRecordsIterator.java:265)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.reader.StandaloneWalRecordsIterator.advanceSegment(StandaloneWalRecordsIterator.java:262)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advance(AbstractWalRecordsIterator.java:155)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.reader.StandaloneWalRecordsIterator.(StandaloneWalRecordsIterator.java:111)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.reader.IgniteWalIteratorFactory.iteratorArchiveDirectory(IgniteWalIteratorFactory.java:156)
... 6 more
{noformat}
We should throw an exception with a descriptive error message instead.
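A sketch of the intended behavior (assuming UnzipFileIO reads the archive through 
a ZipInputStream; the message text is illustrative):
{noformat}
ZipInputStream zis = new ZipInputStream(new FileInputStream(zip));

ZipEntry entry = zis.getNextEntry();

if (entry == null)
    throw new IgniteCheckedException("Failed to read WAL segment: " +
        "ZIP archive doesn't contain any entries [file=" + zip.getAbsolutePath() + ']');
{noformat}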



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8529) Implement testing framework for checking delta records consistency

2018-05-18 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8529:
--

 Summary: Implement testing framework for checking delta records 
consistency
 Key: IGNITE-8529
 URL: https://issues.apache.org/jira/browse/IGNITE-8529
 Project: Ignite
  Issue Type: New Feature
  Components: persistence
Reporter: Ivan Rakov


We use sharp checkpointing of page memory in persistent mode. That implies that 
we write two types of records to the write-ahead log: logical (e.g. data records) 
and physical (page snapshots + binary delta records). Physical records are 
applied only when a node crashes/stops during an ongoing checkpoint. We have the 
following invariant: checkpoint #(n-1) + all physical records = checkpoint #n.
If correctness of physical records is broken, an Ignite node may recover with an 
incorrect page memory state, which in turn can bring unexpected delayed errors. 
However, consistency of physical records is poorly tested: only a small part of 
our autotests performs node restarts, and an even smaller part performs a node 
stop while an ongoing checkpoint is running.
We should implement an abstract test that:
1. Enforces a checkpoint and freezes the memory state at the moment of checkpoint.
2. Performs the necessary test load.
3. Enforces a checkpoint again, replays the WAL and checks that the page store at 
the moment of the previous checkpoint, with all physical records applied, exactly 
equals the current checkpoint state.
Besides checking correctness, the test framework should do the following:
1. Gather statistics (like a histogram) for the types of written physical records. 
That will help us know which types of physical records are covered by a test.
2. Visualize the expected and actual page state (with all physical records 
applied) if an incorrect page state is detected.
Regarding implementation, I suppose we can use the checkpoint listener mechanism 
to freeze the page memory state at the moment of checkpoint.
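Putting the steps together, the abstract test could look roughly like this (all 
helper names are illustrative):
{noformat}
forceCheckpoint();
// Freeze the page store state via a checkpoint listener.
Map<FullPageId, byte[]> prevCpState = snapshotPageStore();

runTestLoad();

forceCheckpoint();
Map<FullPageId, byte[]> curCpState = snapshotPageStore();

// Invariant: checkpoint #(n-1) + all physical records = checkpoint #n.
Map<FullPageId, byte[]> replayed =
    applyPhysicalRecords(prevCpState, walPhysicalRecordsSinceLastCheckpoint());

// On mismatch: dump a record-type histogram and visualize expected vs. actual pages.
assertPagesEqual(curCpState, replayed);
{noformat}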



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8499) validate_indexes command doesn't detect absent rows in cache data tree

2018-05-15 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8499:
--

 Summary: validate_indexes command doesn't detect absent rows in 
cache data tree
 Key: IGNITE-8499
 URL: https://issues.apache.org/jira/browse/IGNITE-8499
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
Assignee: Ivan Rakov


The validate_indexes command performs a lookup only in one direction: 
*if* something is present in the cache data tree, *then* it should be present in 
SQL indexes.
We should perform the lookup in the reverse direction as well to ensure that the 
indexes are correct.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8473) Add option to enable/disable WAL for several caches with single command

2018-05-11 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8473:
--

 Summary: Add option to enable/disable WAL for several caches with 
single command
 Key: IGNITE-8473
 URL: https://issues.apache.org/jira/browse/IGNITE-8473
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
 Fix For: 2.6


The API method for disabling WAL in IgniteCluster accepts only one cache name. 
Every call triggers an exchange and a cluster-wide checkpoint, so it takes plenty 
of time to disable/enable WAL for multiple caches.
We should add an option to disable/enable WAL for several caches with a single 
command.

New proposed API methods:
IgniteCluster.disableWal(Collection<String> cacheNames)
IgniteCluster.enableWal(Collection<String> cacheNames)
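Intended usage of the proposed methods (a sketch: one exchange and one checkpoint 
for the whole batch instead of one per cache):
{noformat}
Collection<String> caches = Arrays.asList("cache1", "cache2", "cache3");

ignite.cluster().disableWal(caches);

preloadData();

ignite.cluster().enableWal(caches);
{noformat}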



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8453) FileDecompressor may access HashMap without proper synchronization

2018-05-08 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8453:
--

 Summary: FileDecompressor may access HashMap without proper 
synchronization
 Key: IGNITE-8453
 URL: https://issues.apache.org/jira/browse/IGNITE-8453
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.5


Caused by IGNITE-8429.
FileDecompressor performs a remove from a regular HashMap (which is shared with 
other threads) without synchronization.
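A possible fix sketch (map and key names are hypothetical): either guard every 
access to the shared map, or replace it with a concurrent one:
{noformat}
// Option 1: synchronize all accesses on the shared map.
synchronized (segmentsMap) {
    segmentsMap.remove(segmentIdx);
}

// Option 2: make the field a ConcurrentHashMap, so no external locking is needed.
private final Map<Long, SegmentState> segmentsMap = new ConcurrentHashMap<>();
{noformat}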



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8416) CommandHandlerParsingTest stably fails with parsing error

2018-04-27 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8416:
--

 Summary: CommandHandlerParsingTest stably fails with parsing error
 Key: IGNITE-8416
 URL: https://issues.apache.org/jira/browse/IGNITE-8416
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.5


Example stacktrace:
{noformat}
java.lang.IllegalArgumentException: Arguments are expected for --cache 
subcommand, run --cache help for more info.
at 
org.apache.ignite.internal.commandline.CommandHandlerParsingTest.testConnectionSettings(CommandHandlerParsingTest.java:94)
--- Stderr: ---
java.lang.IllegalArgumentException: Invalid value for port: wrong-port
at 
org.apache.ignite.internal.commandline.CommandHandler.parseAndValidate(CommandHandler.java:1112)
at 
org.apache.ignite.internal.commandline.CommandHandlerParsingTest.testConnectionSettings(CommandHandlerParsingTest.java:110)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8299) Optimize allocations and CPU consumption in active page replacement scenario

2018-04-17 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8299:
--

 Summary: Optimize allocations and CPU consumption in active page 
replacement scenario
 Key: IGNITE-8299
 URL: https://issues.apache.org/jira/browse/IGNITE-8299
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
Assignee: Ivan Rakov


Ignite performance significantly decreases when the total size of local data is 
much greater than the size of RAM. It can be explained by the change of disk 
access pattern (random reads + random writes are hard even for SSDs), but analysis 
of the persistence code and JFRs makes it clear that there's still room for 
optimization.
The following possible optimizations should be investigated:
1) PageMemoryImpl.Segment#partGeneration performs allocation of a 
GroupPartitionId during HashMap.get - we can get rid of it
2) LoadedPagesMap#getNearestAt is invoked at least 5 times in 
PageMemoryImpl.Segment#removePageForReplacement. It performs two allocations - 
we can get rid of them
3) If one of the 5 eviction candidates was erroneous, we'll find 5 new ones - we 
can reuse the remaining 4 instead



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8162) Handle ClassNotFoundException during deserialization of persisted cache configuration

2018-04-06 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8162:
--

 Summary: Handle ClassNotFoundException during deserialization of 
persisted cache configuration 
 Key: IGNITE-8162
 URL: https://issues.apache.org/jira/browse/IGNITE-8162
 Project: Ignite
  Issue Type: Improvement
  Components: general
Affects Versions: 2.4
Reporter: Ivan Rakov
 Fix For: 2.6


The ticket is created according to the dev list discussion: 
http://apache-ignite-developers.2346864.n4.nabble.com/Fwd-Data-Loss-while-upgrading-custom-jar-from-old-jar-in-server-and-client-nodes-td28808.html
Cache configuration is serialized by the JDK marshaller and persisted in the 
cache_data.dat file. It may contain instances of classes that have disappeared 
from the runtime classpath (e.g. an implementation of CacheStore has been 
renamed). In such a case, the node will fail on start.
We should handle this and show a meaningful message with instructions on how to 
overcome the issue: delete cache_data.dat and restart the cache.
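A handling sketch (variable names and the exact wrapping point are assumptions; 
X.hasCause is the internal helper for cause lookup):
{noformat}
try {
    ccfg = marsh.unmarshal(in, U.resolveClassLoader(igniteCfg));
}
catch (IgniteCheckedException e) {
    if (X.hasCause(e, ClassNotFoundException.class))
        throw new IgniteCheckedException("Failed to deserialize cache configuration " +
            "(a class from the configuration is missing on the classpath). " +
            "To recover, delete the cache_data.dat file and restart the cache: " + file, e);

    throw e;
}
{noformat}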



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8111) Add extra validation for WAL segment size

2018-04-02 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8111:
--

 Summary: Add extra validation for WAL segment size
 Key: IGNITE-8111
 URL: https://issues.apache.org/jira/browse/IGNITE-8111
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.4
Reporter: Ivan Rakov


Currently we can set an extra-small DataStorageConfiguration#walSegmentSize (10 
pages or even less than one page), which will trigger multiple assertion errors 
in the code.
We have to implement validation on node start that the WAL segment size has a 
reasonable value (e.g. more than 512KB).
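A validation sketch on node start (the 512KB threshold is the example value from 
above):
{noformat}
DataStorageConfiguration dsCfg = igniteCfg.getDataStorageConfiguration();

if (dsCfg != null && dsCfg.getWalSegmentSize() < 512 * 1024)
    throw new IgniteCheckedException("Invalid DataStorageConfiguration.walSegmentSize: " +
        dsCfg.getWalSegmentSize() + " bytes (must be at least 512KB)");
{noformat}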



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8057) Execute WAL fsync preventively before checkpoint begin

2018-03-27 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-8057:
--

 Summary: Execute WAL fsync preventively before checkpoint begin
 Key: IGNITE-8057
 URL: https://issues.apache.org/jira/browse/IGNITE-8057
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.5
Reporter: Ivan Rakov


After fix of IGNITE-7754, we execute explicit WAL fsync on checkpoint begin:
{noformat}
if (hasPages) {
assert cpPtr != null;

tracker.onWalCpRecordFsyncStart();

// Sync log outside the checkpoint write lock.
cctx.wal().flush(cpPtr, true);

tracker.onWalCpRecordFsyncEnd();
{noformat}
It's executed outside of the checkpoint write lock. However, it can still decrease 
overall throughput by suspending the writing of dirty pages by checkpoint threads.
We can decrease the duration of this fsync by executing it preemptively, before 
acquiring the checkpoint write lock.
We should prioritize this ticket if the value of the walCpRecordFsyncDuration 
metric in the "Checkpoint started" message turns out to be too big.
Note: it's possible to give an fsync hint to the WAL manager in single-writer mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7901) Refactor Pages Write Throttling: introduce exponential throttle as separate class

2018-03-07 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-7901:
--

 Summary: Refactor Pages Write Throttling: introduce exponential 
throttle as separate class
 Key: IGNITE-7901
 URL: https://issues.apache.org/jira/browse/IGNITE-7901
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.5
Reporter: Ivan Rakov
Assignee: Dmitriy Pavlov


After IGNITE-7751 fix, we have three incarnations of Pages Write Throttle:

1) Only checkpoint buffer throttling - always on

2) Ratio based throttling - legacy

3) Speed based throttling - default when throttling is enabled

However, all three options use exponential throttling to prevent checkpoint 
buffer overflow (see PagesWriteSpeedBasedThrottle.ThrottleMode#EXPONENTIAL 
usages and the isPageInCheckpoint branch of PagesWriteThrottle). 
To get rid of the copy-paste, it would be better to refactor this and extract the 
exponential throttling into a separate class. Two callbacks will then be called 
instead of one, but the code will become nicer.
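A sketch of the extracted class (names and the exact backoff policy are 
illustrative):
{noformat}
/** Parks the caller exponentially longer on each subsequent call until reset. */
class ExponentialBackoffThrottle {
    private final long startParkNs;
    private final double backoffRatio;
    private final AtomicInteger cntr = new AtomicInteger();

    ExponentialBackoffThrottle(long startParkNs, double backoffRatio) {
        this.startParkNs = startParkNs;
        this.backoffRatio = backoffRatio;
    }

    /** Called when a page that is part of the checkpoint is about to be dirtied. */
    void throttle() {
        LockSupport.parkNanos((long)(startParkNs * Math.pow(backoffRatio, cntr.getAndIncrement())));
    }

    /** Called when checkpoint buffer pressure is gone. */
    void reset() {
        cntr.set(0);
    }
}
{noformat}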



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7751) Pages Write Throttle mode doesn't protect from checkpoint buffer overflow

2018-02-19 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-7751:
--

 Summary: Pages Write Throttle mode doesn't protect from checkpoint 
buffer overflow
 Key: IGNITE-7751
 URL: https://issues.apache.org/jira/browse/IGNITE-7751
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.3
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.5


Even with write throttling enabled, the checkpoint buffer can still overflow. 
Example stacktrace:
{noformat}
2018-02-17 21:00:14.777 
[ERROR][sys-stripe-12-#13%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.dht.GridDhtTxRemote]
 Commit failed.
org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: 
Commit produced a runtime exception (all transaction entries will be 
invalidated): 
GridDhtTxRemote[id=06db48da161--07c5-23f5--0005, 
concurrency=OPTIMISTIC, isolation=SERIALIZABLE, state=COMMITTING, 
invalidate=false, rollbackOnly=false, 
nodeId=da415868-d9b3-48a5-9b56-0706ae60dd3b, duration=60]
at 
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:739)
at 
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitRemoteTx(GridDistributedTxRemoteAdapter.java:813)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:1319)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxFinishRequest(IgniteTxHandler.java:1231)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$600(IgniteTxHandler.java:97)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:213)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:211)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:499)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.ignite.IgniteException: Runtime failure on row: 
Row@9f0a081[ key: 4694439661580364888, val: 
com.sbt.bm.ucp.common.dpl.model.party.DUserInfo_DPL_PROXY [idHash=1290746929, 
hash=400782371, colocationKey=16678, lastChangeDate=1518890414661, 
userFullName=null, partition_DPL_id=6, bankInfo_DPL_id=4694439661580364888, 
bankInfo_DPL_colocationKey=16678, ownerId=null, 
infoFlowChannel_DPL_colocationKey=0, userLogin=reloading, 
uid=1102030258731339432, isDeleted=false, infoFlowChannel_DPL_id=0, 
sourceSystem_DPL_id=65, id=4694439661580364888, 
colocationId=1102030258828706483], ver: GridCacheVersion [topVer=130360309, 
order=1519034613156, nodeOrder=5] ][ 1102030258731339432, reloading, 
4694439661580364888, 0, null, 65, 4694439661580364888, FALSE, 6 ]
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2102)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putx(BPlusTree.java:2049)
at 
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:247)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:536)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:468)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:595)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1865)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:407)
...
{noformat}

[jira] [Created] (IGNITE-7475) Improve VerifyBackupPartitionsTask to calculate partition hashes in multiple threads

2018-01-19 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-7475:
--

 Summary: Improve VerifyBackupPartitionsTask to calculate partition 
hashes in multiple threads
 Key: IGNITE-7475
 URL: https://issues.apache.org/jira/browse/IGNITE-7475
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Rakov
Assignee: Ivan Rakov


Currently, the VerifyBackupPartitionsTask compute task calculates all hashes in 
the ComputeJob#execute caller thread. Using multiple threads can bring a 
significant speedup.
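A parallelization sketch (helper names are illustrative):
{noformat}
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

List<Future<PartitionHashRecord>> futs = new ArrayList<>();

// Hash every local partition in its own task instead of the caller thread.
for (GridDhtLocalPartition part : parts)
    futs.add(pool.submit(() -> calculatePartitionHash(part)));

for (Future<PartitionHashRecord> fut : futs)
    hashes.add(fut.get());

pool.shutdown();
{noformat}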



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7182) Long sorting of pages collection on checkpoint begin can cause zero dropdown even with throttling enabled

2017-12-13 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-7182:
--

 Summary: Long sorting of pages collection on checkpoint begin can 
cause zero dropdown even with throttling enabled
 Key: IGNITE-7182
 URL: https://issues.apache.org/jira/browse/IGNITE-7182
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.3
Reporter: Ivan Rakov
Assignee: Dmitriy Pavlov
 Fix For: 2.4


Tests show that the GridCacheDatabaseSharedManager#splitAndSortCpPagesIfNeeded 
call can last several seconds on nodes with a big amount of memory (>10GB). We 
should optimize the sorting algorithm, possibly making it multithreaded.
Another option to make pages write throttling smoother is to get rid of this 
heuristic:
{noformat}
// Starting with 0.05 to avoid throttle right after checkpoint 
start
// 7/12 is maximum ratio of dirty pages
dirtyRatioThreshold = (dirtyRatioThreshold * 0.95 + 0.05) * 7 / 
12;
{noformat}
We should replace the "magic" lower bound 0.05 * 7 / 12 with the real percentage 
of dirty pages at the moment when the 
GridCacheDatabaseSharedManager.Checkpointer#markCheckpointBegin call returns.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6629) Make service persistence and automatic redeployment configurable in ServiceConfiguration

2017-10-13 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6629:
--

 Summary: Make service persistence and automatic redeployment 
configurable in ServiceConfiguration
 Key: IGNITE-6629
 URL: https://issues.apache.org/jira/browse/IGNITE-6629
 Project: Ignite
  Issue Type: Improvement
  Components: managed services
Affects Versions: 2.3
Reporter: Ivan Rakov
Assignee: Alexey Goncharuk
 Fix For: 2.4


Before 2.3, if persistence was enabled globally, services were recovered along 
with the system cache. But in 2.3, persistence can be enabled per data region 
(IGNITE-6030), and the system data region is not persistent.
We should add a feature to configure service redeployment after restart. 
Service-related information should be stored in the metastore instead of the 
system cache.

IgniteChangeGlobalStateServiceTest#testDeployService should be fixed under this 
ticket.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6532) Introduce preallocation in LFS files to avoid high fragmentation on filesystem level

2017-09-29 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6532:
--

 Summary: Introduce preallocation in LFS files to avoid high 
fragmentation on filesystem level
 Key: IGNITE-6532
 URL: https://issues.apache.org/jira/browse/IGNITE-6532
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.2
Reporter: Ivan Rakov
 Fix For: 2.4


Modern databases (Oracle, MySQL) work with the storage drive on the physical 
level, creating their own partition table and filesystem.
Ignite Persistent Store works with regular files. It appends new pages to the 
partition file once new pages are allocated and written on checkpoint. These 
new pages can form one or several fragments on the filesystem level.
As a result, after weeks of uptime, partition files can contain a huge number of 
fragments. There were reports of about 120 fragments in the index.bin file on the 
XFS filesystem. 
We can work this around by preallocating files in bigger chunks, e.g. 1000 
pages at a time. On the other hand, early allocation will increase LFS size 
overhead, so we should consider a reasonable heuristic for allocation.
Allocation should be performed on the native level. Just writing a byte at 
position (file_size + page_size * 1000) won't do it, because XFS (and other 
filesystems as well) has an optimization for that case: the missing range will 
just be skipped, leaving a sparse file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6504) Very quick checkpoint can cause AssertionError on next start from LFS

2017-09-26 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6504:
--

 Summary: Very quick checkpoint can cause AssertionError on next 
start from LFS
 Key: IGNITE-6504
 URL: https://issues.apache.org/jira/browse/IGNITE-6504
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.1
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.4


Checkpoint markers are compared using their timestamps. If a checkpoint took less 
than 1 millisecond, two subsequent markers will have the same timestamp, which 
will lead to an error:
{noformat}
java.lang.AssertionError: 
o1=/data/teamcity/tmpfs/work/db/127_0_0_1_47503/cp/1506338145591-c4f23411-e1b1-4468-856a-4419003bba93-END.bin,
 
o2=/data/teamcity/tmpfs/work/db/127_0_0_1_47503/cp/1506338145591-f76c023b-9982-40d7-a1eb-855a33b710f2-END.bin
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$4.compare(GridCacheDatabaseSharedManager.java:216)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$4.compare(GridCacheDatabaseSharedManager.java:195)
at java.util.TimSort.binarySort(TimSort.java:265)
at java.util.TimSort.sort(TimSort.java:208)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$CheckpointHistory.loadHistory(GridCacheDatabaseSharedManager.java:2704)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$CheckpointHistory.access$2600(GridCacheDatabaseSharedManager.java:2685)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1468)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:562)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:722)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:613)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2289)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
{noformat}
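A possible fix sketch: make the comparator total by breaking timestamp ties with 
the checkpoint ID (the type and accessor names are assumptions):
{noformat}
Comparator<CheckpointEntry> cpCmp = Comparator
    .comparingLong(CheckpointEntry::timestamp)
    .thenComparing(CheckpointEntry::checkpointId);
{noformat}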



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6452) Invocation of getAll() through cache proxy during cache restart can throw unexpected CacheException

2017-09-20 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6452:
--

 Summary: Invocation of getAll() through cache proxy during cache 
restart can throw unexpected CacheException
 Key: IGNITE-6452
 URL: https://issues.apache.org/jira/browse/IGNITE-6452
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.3


Instead of the expected IgniteCacheRestartingException, a load test sometimes 
shows the following exception:
{noformat}
javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: 
Failed to find message handler for message: GridNearGetRequest 
[futId=6fc73459e51-84b93e3c-47e1-433c-8a91-0700f131c617, 
miniId=27d73459e51-84b93e3c-47e1-433c-8a91-0700f131c617, ver=null, keyMap=null, 
flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=32], 
subjId=080177d4-b78e-4f6f-a386-77be8830, taskNameHash=0, createTtl=-1, 
accessTtl=-1]

at 
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1285)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1648)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAll(IgniteCacheProxyImpl.java:873)
at 
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAll(GatewayProtectedCacheProxy.java:718)
at 
org.gridgain.grid.internal.processors.cache.database.IgniteDbSnapshotSelfTest$15.apply(IgniteDbSnapshotSelfTest.java:1911)
at 
org.gridgain.grid.internal.processors.cache.database.IgniteDbSnapshotSelfTest$15.apply(IgniteDbSnapshotSelfTest.java:1904)
at 
org.gridgain.grid.internal.processors.cache.database.IgniteDbSnapshotSelfTest.testReuseCacheProxyAfterRestore(IgniteDbSnapshotSelfTest.java:1796)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at junit.framework.TestCase.runTest(TestCase.java:176)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915)
at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6385) Offheap page eviction is broken in case loading data without data streamer

2017-09-14 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6385:
--

 Summary: Offheap page eviction is broken in case loading data 
without data streamer
 Key: IGNITE-6385
 URL: https://issues.apache.org/jira/browse/IGNITE-6385
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.3


Page eviction was broken by recent optimizations from IGNITE-5658. We can 
allocate data pages if there are empty pages in a separate stripe of the free 
list bucket.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6355) Calculating cache size during cache stop sporadically fails with ClusterGroupEmptyCheckedException

2017-09-12 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6355:
--

 Summary: Calculating cache size during cache stop sporadically 
fails with ClusterGroupEmptyCheckedException
 Key: IGNITE-6355
 URL: https://issues.apache.org/jira/browse/IGNITE-6355
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.3


After 10-20 runs of IgniteDbSnapshotSelfTest#testReuseCacheProxyAfterRestore, 
cache.size() fails:
{noformat}
[16:21:06,343][ERROR][main][root] Test failed.
javax.cache.CacheException: class 
org.apache.ignite.cluster.ClusterGroupEmptyException: Topology projection is 
empty.
at 
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1327)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1672)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.size(IgniteCacheProxyImpl.java:762)
at 
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.size(GatewayProtectedCacheProxy.java:508)
at 
org.gridgain.grid.internal.processors.cache.database.IgniteDbSnapshotSelfTest.testReuseCacheProxyAfterRestore(IgniteDbSnapshotSelfTest.java:1793)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915)
at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.cluster.ClusterGroupEmptyException: Topology 
projection is empty.
at 
org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:823)
at 
org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:821)
... 14 more
Caused by: class 
org.apache.ignite.internal.cluster.ClusterGroupEmptyCheckedException: Topology 
projection is empty.
at 
org.apache.ignite.internal.processors.task.GridTaskWorker.getTaskTopology(GridTaskWorker.java:665)
at 
org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:500)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at 
org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:758)
at 
org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:454)
at 
org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:410)
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.sizeAsync(GridCacheAdapter.java:3747)
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.size(GridCacheAdapter.java:3704)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.size(IgniteCacheProxyImpl.java:759)
... 11 more
{noformat}
Data race stems from here:
{noformat}
ClusterGroup grp = modes.near ? cluster.forCacheNodes(name(), true, true, 
false) : cluster.forDataNodes(name());

Collection<ClusterNode> nodes = grp.nodes();

if (nodes.isEmpty())
return new GridFinishedFuture<>(0);

ctx.kernalContext().task().setThreadContext(TC_SUBGRID, nodes);

return ctx.kernalContext().task().execute(
new SizeTask(ctx.name(), ctx.affinity().affinityTopologyVersion(), 
peekModes), null);
{noformat}
grp.nodes() returns a PredicateCollectionView, whose predicate depends on the 
grid state. It can pass the nodes.isEmpty() check and become empty later.
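A possible fix sketch: snapshot the view once, so the emptiness check and the 
task execution observe the same node set:
{noformat}
// Materialize the predicate-backed view into a stable collection.
Collection<ClusterNode> nodes = new ArrayList<>(grp.nodes());

if (nodes.isEmpty())
    return new GridFinishedFuture<>(0);
{noformat}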



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6228) Avoid closing page store file with ClosedByInterruptException when user thread is interrupted

2017-08-30 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6228:
--

 Summary: Avoid closing page store file with 
ClosedByInterruptException when user thread is interrupted
 Key: IGNITE-6228
 URL: https://issues.apache.org/jira/browse/IGNITE-6228
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.1
Reporter: Ivan Rakov
 Fix For: 2.3


If a cache proxy is in synchronous mode, a user thread may be interrupted during a 
read from the file page store. This will cause the partition file to be closed 
with a ClosedByInterruptException.
Example stacktrace:
{noformat}
class org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row: 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$SearchRow@717729d
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1070)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.find(IgniteCacheOffheapManagerImpl.java:1476)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.find(GridCacheOffheapManager.java:1276)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.read(IgniteCacheOffheapManagerImpl.java:394)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:371)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.onTtlExpired(GridCacheMapEntry.java:2952)
at 
org.apache.ignite.internal.processors.cache.GridCacheTtlManager$1.applyx(GridCacheTtlManager.java:61)
at 
org.apache.ignite.internal.processors.cache.GridCacheTtlManager$1.applyx(GridCacheTtlManager.java:52)
at 
org.apache.ignite.internal.util.lang.IgniteInClosure2X.apply(IgniteInClosure2X.java:38)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expire(IgniteCacheOffheapManagerImpl.java:1012)
at 
org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:198)
at 
org.apache.ignite.internal.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:868)
at 
org.apache.ignite.internal.processors.cache.GridCacheGateway.leaveNoLock(GridCacheGateway.java:240)
at 
org.apache.ignite.internal.processors.cache.GridCacheGateway.leave(GridCacheGateway.java:225)
at 
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onLeave(GatewayProtectedCacheProxy.java:1680)
at 
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:875)
at 
org.apache.ignite.internal.processors.cache.persistence.db.RestartGridTest$TestService.execute(RestartGridTest.java:160)
at 
org.apache.ignite.internal.processors.service.GridServiceProcessor$2.run(GridServiceProcessor.java:1160)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteCheckedException: Read error
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:356)
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:287)
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:272)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:570)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:488)
at 
org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:129)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.treeMeta(BPlusTree.java:822)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$7700(BPlusTree.java:81)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Get.init(BPlusTree.java:2392)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doFind(BPlusTree.java:1099)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1065)
... 20 more
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:746)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:724)
...
{noformat}

[jira] [Created] (IGNITE-6216) Add CheckpointWriteOrder enum in .NET persistent store configuration

2017-08-29 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6216:
--

 Summary: Add CheckpointWriteOrder enum in .NET persistent store 
configuration
 Key: IGNITE-6216
 URL: https://issues.apache.org/jira/browse/IGNITE-6216
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ivan Rakov
Assignee: Pavel Tupitsyn
 Fix For: 2.3


Since 2.2 we have the CheckpointWriteOrder property in 
PersistentStoreConfiguration. It should be possible to set it through the .NET 
configuration classes as well.
The default value should be CheckpointWriteOrder#SEQUENTIAL.
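For reference, a minimal Java-side sketch of the property that the .NET 
configuration classes should mirror (public PersistentStoreConfiguration API; 
the exact shape of the .NET class is what this ticket should decide):
{noformat}
import org.apache.ignite.configuration.CheckpointWriteOrder;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.PersistentStoreConfiguration;

public class CheckpointOrderConfigExample {
    public static IgniteConfiguration configure() {
        // Java-side property the .NET PersistentStoreConfiguration should expose.
        PersistentStoreConfiguration psCfg = new PersistentStoreConfiguration()
            .setCheckpointWriteOrder(CheckpointWriteOrder.SEQUENTIAL); // proposed default

        return new IgniteConfiguration().setPersistentStoreConfiguration(psCfg);
    }
}
{noformat}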



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6178) Make CheckpointWriteOrder.SEQUENTIAL and checkpointingThreads=4 default in persistent store configuration

2017-08-24 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6178:
--

 Summary: Make CheckpointWriteOrder.SEQUENTIAL and 
checkpointingThreads=4 default in persistent store configuration
 Key: IGNITE-6178
 URL: https://issues.apache.org/jira/browse/IGNITE-6178
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.1
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.2


Multithreaded and ordered checkpoints show better performance on most 
environments.
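A minimal sketch of what the proposed defaults look like when set explicitly 
today, using the public PersistentStoreConfiguration API:
{noformat}
import org.apache.ignite.configuration.CheckpointWriteOrder;
import org.apache.ignite.configuration.PersistentStoreConfiguration;

public class CheckpointDefaultsExample {
    public static PersistentStoreConfiguration configure() {
        // After this ticket, these two values should be applied out of the box.
        return new PersistentStoreConfiguration()
            .setCheckpointWriteOrder(CheckpointWriteOrder.SEQUENTIAL)
            .setCheckpointingThreads(4);
    }
}
{noformat}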



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6176) Request to BinaryMetadataTransport may cause deadlock on grid stop

2017-08-24 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6176:
--

 Summary: Request to BinaryMetadataTransport may cause deadlock on 
grid stop
 Key: IGNITE-6176
 URL: https://issues.apache.org/jira/browse/IGNITE-6176
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ivan Rakov
 Fix For: 2.2


We may access BinaryMetadataTransport while holding the striped GridCacheIOManager 
lock in some cases (for example, during partition exchange). 
When the grid is stopping, BinaryMetadataTransport#requestMetadataUpdate may hang 
on a future. This results in a deadlock: the future won't be cancelled, because we 
stop GridCacheIOManager first (which, in turn, requires releasing the striped 
lock).

Steps to reproduce:
1) Remove partition exchange messages from classnames.properties
2) Run IgniteClusterActivateDeactivateTestWithPersistence multiple times

Stacktrace of deadlocked threads:
{noformat}
"sys-#9065%cache.IgniteClusterActivateDeactivateTestWithPersistence4%@18095" 
prio=5 tid=0x2b55 nid=NA waiting
  java.lang.Thread.State: WAITING
  at sun.misc.Unsafe.park(Unsafe.java:-1)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
  at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:176)
  at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
  at 
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.addMeta(CacheObjectBinaryProcessorImpl.java:432)
  at 
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$2.addMeta(CacheObjectBinaryProcessorImpl.java:173)
  at 
org.apache.ignite.internal.binary.BinaryContext.updateMetadata(BinaryContext.java:1276)
  at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:783)
  at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
  at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
  at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
  at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
  at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteMap(BinaryWriterExImpl.java:786)
  at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:699)
  at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
  at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
  at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
  at 
org.apache.ignite.internal.binary.GridBinaryMarshaller.marshal(GridBinaryMarshaller.java:251)
  at 
org.apache.ignite.internal.binary.BinaryMarshaller.marshal0(BinaryMarshaller.java:82)
  at 
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:58)
  at 
org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:9849)
  at 
org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:9913)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage.prepareMarshal(GridDhtPartitionsSingleMessage.java:289)
  at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onSend(GridCacheIoManager.java:1120)
  at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1154)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSinglePartitionRequest(GridDhtPartitionsExchangeFuture.java:2588)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$1500(GridDhtPartitionsExchangeFuture.java:114)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$6.apply(GridDhtPartitionsExchangeFuture.java:2492)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$6.apply(GridDhtPartitionsExchangeFuture.java:2490)
  at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:382)
  at 
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:352)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceivePartitionRequest(GridDhtPartitionsExchangeFuture.java:2490)
{noformat}

[jira] [Created] (IGNITE-6102) Implement checks that node that joins topology has consistent database configuration

2017-08-17 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6102:
--

 Summary: Implement checks that node that joins topology has 
consistent database configuration
 Key: IGNITE-6102
 URL: https://issues.apache.org/jira/browse/IGNITE-6102
 Project: Ignite
  Issue Type: Task
Affects Versions: 2.1
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.2


We need to check for node misconfiguration: all nodes in the cluster should have 
the same (or consistent) database settings.
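A hypothetical sketch of the join-time check (the attribute set and names here 
are illustrative placeholders, not the actual Ignite node attribute keys):
{noformat}
import java.util.Map;
import java.util.Objects;

class DbConfigJoinValidator {
    /**
     * Compares database-related attributes of the joining node against the local ones.
     * @return null if consistent, otherwise a human-readable rejection reason.
     */
    static String validate(Map<String, Object> locAttrs, Map<String, Object> rmtAttrs) {
        for (Map.Entry<String, Object> e : locAttrs.entrySet()) {
            Object rmtVal = rmtAttrs.get(e.getKey());

            if (!Objects.equals(e.getValue(), rmtVal)) {
                return "Joining node has inconsistent database setting [name=" + e.getKey()
                    + ", loc=" + e.getValue() + ", rmt=" + rmtVal + ']';
            }
        }

        return null; // All checked settings match: allow the node to join.
    }
}
{noformat}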



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6033) Add sorted and multithreaded modes in checkpoint algorithm

2017-08-10 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6033:
--

 Summary: Add sorted and multithreaded modes in checkpoint algorithm
 Key: IGNITE-6033
 URL: https://issues.apache.org/jira/browse/IGNITE-6033
 Project: Ignite
  Issue Type: Improvement
  Components: persistence
Affects Versions: 2.1
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.2


Sequential writes to SSD are faster than random ones. When we write a checkpoint, 
we iterate through a hash table, which effectively produces random order. We 
should add an option to write pages sorted by page index (see the sketch below). 
It should be configured in PersistentStoreConfiguration.
Also, we already have the PersistentStoreConfiguration#checkpointingThreads 
option, but we don't use it - we create a thread pool, but submit only one task 
to it. This should be fixed as well.
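A simplified sketch of the proposed mode, under assumed types (the page store 
write is a placeholder): sort the checkpoint's dirty page IDs so writes hit 
ascending file offsets, then split the sorted array between several writers.
{noformat}
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SortedCheckpointSketch {
    static void writePages(long[] dirtyPageIds, int threads) throws InterruptedException {
        // Sort so that writes hit ascending file offsets (page ID encodes page index).
        Arrays.sort(dirtyPageIds);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (dirtyPageIds.length + threads - 1) / threads;

        for (int t = 0; t < threads; t++) {
            int from = t * chunk;
            int to = Math.min(dirtyPageIds.length, from + chunk);

            pool.submit(() -> {
                for (int i = from; i < to; i++)
                    writePageToStore(dirtyPageIds[i]); // hypothetical page store write
            });
        }

        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
    }

    private static void writePageToStore(long pageId) { /* placeholder */ }
}
{noformat}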



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6016) Get rid of checking topology hash in ackTopology

2017-08-09 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-6016:
--

 Summary: Get rid of checking topology hash in ackTopology
 Key: IGNITE-6016
 URL: https://issues.apache.org/jira/browse/IGNITE-6016
 Project: Ignite
  Issue Type: Improvement
  Components: general
Affects Versions: 2.1
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.2


We check topologyHash in ackTopology in order to avoid printing the "Topology 
snapshot" message twice. It's redundant - we can just atomically increase the 
topology version with GridAtomicLong#setIfGreater, as sketched below.
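A sketch of the approach with a plain AtomicLong; Ignite's internal 
GridAtomicLong#setIfGreater has equivalent semantics:
{noformat}
import java.util.concurrent.atomic.AtomicLong;

class TopologyVersionTracker {
    private final AtomicLong lastAckedTopVer = new AtomicLong();

    /** @return true if this call advanced the version and should print the snapshot. */
    boolean setIfGreater(long topVer) {
        for (;;) {
            long cur = lastAckedTopVer.get();

            if (topVer <= cur)
                return false; // Already acked this or a newer topology.

            if (lastAckedTopVer.compareAndSet(cur, topVer))
                return true;
        }
    }
}
{noformat}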



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5971) Ignite Continuous Query 2: Flaky failure of #testMultiThreadedFailover

2017-08-07 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5971:
--

 Summary: Ignite Continuous Query 2: Flaky failure of 
#testMultiThreadedFailover
 Key: IGNITE-5971
 URL: https://issues.apache.org/jira/browse/IGNITE-5971
 Project: Ignite
  Issue Type: Test
Affects Versions: 2.1
Reporter: Ivan Rakov
 Fix For: 2.2


A bunch of tests inherited from CacheContinuousQueryFailoverAbstractSelfTest have 
a flaky #testMultiThreadedFailover test. It fails from time to time in all 
inherited test classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5970) Ignite Binary Objects Simple Mapper Basic: Flaky failure of org.apache.ignite.p2p.GridP2PLocalDeploymentSelfTest#testConcurrentDeploymentWithDelegatingClassloader

2017-08-07 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5970:
--

 Summary: Ignite Binary Objects Simple Mapper Basic: Flaky failure 
of 
org.apache.ignite.p2p.GridP2PLocalDeploymentSelfTest#testConcurrentDeploymentWithDelegatingClassloader
 Key: IGNITE-5970
 URL: https://issues.apache.org/jira/browse/IGNITE-5970
 Project: Ignite
  Issue Type: Test
Affects Versions: 2.1
Reporter: Ivan Rakov
 Fix For: 2.2


Can't reproduce locally on Win 10.
On TC the test has a 50% success rate.
{noformat}
org.apache.ignite.internal.IgniteDeploymentCheckedException: Task not deployed: 
org.apache.ignite.p2p.GridP2PLocalDeploymentSelfTest$TestClosure
at 
org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:712)
at 
org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:448)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor.callAsync(GridClosureProcessor.java:673)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor.callAsync(GridClosureProcessor.java:478)
at 
org.apache.ignite.internal.IgniteComputeImpl.callAsync0(IgniteComputeImpl.java:809)
at 
org.apache.ignite.internal.IgniteComputeImpl.call(IgniteComputeImpl.java:785)
at 
org.apache.ignite.p2p.GridP2PLocalDeploymentSelfTest$1.call(GridP2PLocalDeploymentSelfTest.java:240)
at 
org.apache.ignite.p2p.GridP2PLocalDeploymentSelfTest$1.call(GridP2PLocalDeploymentSelfTest.java:235)
at 
org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5965) Ignite Basic: Flaky failure of GridServiceProcessorMultiNodeConfigSelfTest.testDeployOnEachNodeUpdateTopology

2017-08-07 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5965:
--

 Summary: Ignite Basic: Flaky failure of  
GridServiceProcessorMultiNodeConfigSelfTest.testDeployOnEachNodeUpdateTopology
 Key: IGNITE-5965
 URL: https://issues.apache.org/jira/browse/IGNITE-5965
 Project: Ignite
  Issue Type: Test
Affects Versions: 2.1
Reporter: Ivan Rakov
 Fix For: 2.2


The test has an 85% success rate in master:
http://ci.ignite.apache.org/project.html?projectId=Ignite20Tests=-2642357454043293898=testDetails_Ignite20Tests=%3Cdefault%3E
The flaky failure is reproduced locally with a similar success rate (24/30, Win 10).
{noformat}
junit.framework.AssertionFailedError: 
Expected :4
Actual   :5
 


at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.failNotEquals(Assert.java:329)
at junit.framework.Assert.assertEquals(Assert.java:78)
at junit.framework.Assert.assertEquals(Assert.java:234)
at junit.framework.Assert.assertEquals(Assert.java:241)
at junit.framework.TestCase.assertEquals(TestCase.java:409)
at 
org.apache.ignite.internal.processors.service.GridServiceProcessorAbstractSelfTest.checkCount(GridServiceProcessorAbstractSelfTest.java:765)
at 
org.apache.ignite.internal.processors.service.GridServiceProcessorMultiNodeConfigSelfTest.checkDeployOnEachNodeUpdateTopology(GridServiceProcessorMultiNodeConfigSelfTest.java:287)
at 
org.apache.ignite.internal.processors.service.GridServiceProcessorMultiNodeConfigSelfTest.testDeployOnEachNodeUpdateTopology(GridServiceProcessorMultiNodeConfigSelfTest.java:144)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915)
at java.lang.Thread.run(Thread.java:745)


{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5961) Align pages in LFS partition files to pageSize

2017-08-07 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5961:
--

 Summary: Align pages in LFS partition files to pageSize
 Key: IGNITE-5961
 URL: https://issues.apache.org/jira/browse/IGNITE-5961
 Project: Ignite
  Issue Type: Improvement
  Components: persistence
Affects Versions: 2.1
Reporter: Ivan Rakov
Assignee: Ivan Rakov
Priority: Critical
 Fix For: 2.2


We store 17 bytes of header at the start of every partition file:
{noformat}
/** Allocated field offset. */
static final int HEADER_SIZE = 8/*SIGNATURE*/ + 4/*VERSION*/ + 1/*type*/ + 
4/*page size*/;
{noformat}
Even if pageSize is equal to the OS page cache page size and to the SSD disk page 
size (which is the best scenario), the 17-byte header shifts every page to a 
non-aligned offset, so writing a single page touches two OS pages and generates 
two dirty pages instead of one. This is suboptimal and can be a bottleneck of 
checkpoint write speed. One possible fix is sketched below.
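A sketch of one possible fix (an assumption, not a committed patch): pad the 
partition file header up to a full page so every page start stays 
pageSize-aligned.
{noformat}
class PartitionFileLayout {
    /** Raw header payload as it is today. */
    static final int RAW_HEADER_SIZE = 8 /*SIGNATURE*/ + 4 /*VERSION*/ + 1 /*type*/ + 4 /*page size*/;

    /** Pad the header up to a whole page so every page start is pageSize-aligned. */
    static int headerSize(int pageSize) {
        return pageSize;
    }

    /** Page i then starts at offset (i + 1) * pageSize: the header occupies slot 0. */
    static long pageOffset(long pageIdx, int pageSize) {
        return (pageIdx + 1) * (long)pageSize;
    }
}
{noformat}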



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5884) Change default pageSize of page memory to 4KB

2017-07-31 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5884:
--

 Summary: Change default pageSize of page memory to 4KB
 Key: IGNITE-5884
 URL: https://issues.apache.org/jira/browse/IGNITE-5884
 Project: Ignite
  Issue Type: Improvement
  Components: persistence
Reporter: Ivan Rakov
 Fix For: 2.2


Checkpoint write speed is suboptimal with the default 2K page size on most 
UNIX-driven environments with SSD disks. There are several reasons for this:
1) The page size of the Linux page cache is 4k by default on most kernels (you 
can check yours with the "getconf PAGE_SIZE" command). With 2k random writes the 
vm.dirty_ratio threshold is reached two times faster than with 4k random writes.
2) Most SSD manufacturers don't reveal the actual disk page size, but they 
recommend writing at least 4k at once. Also, 4k blocks are used when 
benchmarking SSD random writes. Related question: 
https://superuser.com/questions/1168014/nvme-ssd-why-is-4k-writing-faster-than-reading
I've prepared a checkpoint emulation benchmark (code and results attached). A run 
on production-level hardware (CentOS, 100 GB RAM, total LFS size is 100 GB, 
vm.dirty_ratio=10) showed that checkpointing with 4k pages is much more 
efficient than with 2k.
*Important: backwards compatibility must be ensured with LFS files created with 
the old 2k default page size.*
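A minimal sketch of setting the proposed default explicitly with the public 2.x 
MemoryConfiguration API:
{noformat}
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.MemoryConfiguration;

public class PageSizeConfigExample {
    public static IgniteConfiguration configure() {
        MemoryConfiguration memCfg = new MemoryConfiguration();

        memCfg.setPageSize(4 * 1024); // 4 KB pages instead of the old 2 KB default.

        return new IgniteConfiguration().setMemoryConfiguration(memCfg);
    }
}
{noformat}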



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5883) Ignite Start Nodes test suite is flaky

2017-07-31 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5883:
--

 Summary: Ignite Start Nodes test suite is flaky
 Key: IGNITE-5883
 URL: https://issues.apache.org/jira/browse/IGNITE-5883
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov


Ignite Start Nodes suite contains several flaky tests: 
http://ci.ignite.apache.org/viewType.html?buildTypeId=Ignite20Tests_IgniteStartNodes
Example runs with flaky failures: 
1) 
http://ci.ignite.apache.org/viewLog.html?buildId=746188=buildResultsDiv=Ignite20Tests_IgniteStartNodes
2) 
http://ci.ignite.apache.org/viewLog.html?buildId=747364=buildResultsDiv=Ignite20Tests_IgniteStartNodes
This ticket is for investigating the failures and making the tests stable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5875) Implement memory metric that will signal about uneven distribution of pages between segments of durable memory

2017-07-28 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5875:
--

 Summary: Implement memory metric that will signal about uneven 
distribution of pages between segments of durable memory
 Key: IGNITE-5875
 URL: https://issues.apache.org/jira/browse/IGNITE-5875
 Project: Ignite
  Issue Type: Improvement
  Components: persistence
Affects Versions: 2.1
Reporter: Ivan Rakov
 Fix For: 2.2


When persistence is enabled, we split a memory policy into K equal segments (K = 
concurrency level). We calculate a hash of the pageId to determine the segment 
where the page will be stored.
Eviction of pages to disk starts when a segment is full. If the hash function is 
bad enough, one segment can overflow while there is plenty of free space 
elsewhere, and evictions can start too early. We want to be able to distinguish 
such situations. 
Proposed metric name: pageHashFunctionScore
Proposed formula: difference between the maximum and minimum percentage of 
segment fill
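A minimal sketch of the proposed formula (parameter names are illustrative):
{noformat}
class PageHashFunctionScore {
    /**
     * @param usedPages Used page count per segment.
     * @param pagesPerSegment Capacity of a single segment, in pages.
     * @return Difference between the most and least filled segments, in percent.
     */
    static double score(long[] usedPages, long pagesPerSegment) {
        double min = 100, max = 0;

        for (long used : usedPages) {
            double fillPct = 100.0 * used / pagesPerSegment;

            min = Math.min(min, fillPct);
            max = Math.max(max, fillPct);
        }

        return max - min; // 0 means a perfectly even distribution.
    }
}
{noformat}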



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5874) Store TTL expire times in B+ tree on per-partition basis

2017-07-28 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5874:
--

 Summary: Store TTL expire times in B+ tree on per-partition basis
 Key: IGNITE-5874
 URL: https://issues.apache.org/jira/browse/IGNITE-5874
 Project: Ignite
  Issue Type: Improvement
  Components: cache
Affects Versions: 2.1
Reporter: Ivan Rakov
 Fix For: 2.2


TTL expire times for entries are stored in PendingEntriesTree, which is a 
per-cache singleton. When expiration occurs, all system threads iterate through 
this one tree in order to remove expired entries. Iterating through a single 
tree causes contention and performance loss.
We should keep an instance of PendingEntriesTree for each partition, like we do 
for CacheDataTree.
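A structural sketch of the proposed change (types are placeholders; entries are 
keyed by expire time only for brevity, while the real tree also stores the 
cache key/link):
{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

class PerPartitionPendingEntries {
    /** partId -> (expireTime -> cache key): one pending structure per partition. */
    private final Map<Integer, ConcurrentSkipListMap<Long, Object>> trees =
        new ConcurrentHashMap<>();

    void schedule(int partId, long expireTime, Object key) {
        trees.computeIfAbsent(partId, p -> new ConcurrentSkipListMap<>())
            .put(expireTime, key);
    }

    /** Each partition expires independently, so threads don't contend on one tree. */
    void expire(int partId, long now) {
        ConcurrentSkipListMap<Long, Object> tree = trees.get(partId);

        if (tree != null)
            tree.headMap(now).clear(); // Drop everything that already expired.
    }
}
{noformat}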



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5793) Cache with constant time TTL for entries and enabled persistence hangs for a long time when TTL expirations start

2017-07-20 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5793:
--

 Summary: Cache with constant time TTL for entries and enabled 
persistence hangs for a long time when TTL expirations start
 Key: IGNITE-5793
 URL: https://issues.apache.org/jira/browse/IGNITE-5793
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.0
Reporter: Ivan Rakov
 Fix For: 2.2


Right after the expiration time, all threads from the sys-stripe pool are busy 
removing expired entries:
{noformat}
Thread 
[name="sys-stripe-3-#35%database.IgniteDbSnapshotWithEvictionsSelfTest1%", 
id=60, state=RUNNABLE, blockCnt=0, waitCnt=101794]
at o.a.i.i.binary.BinaryObjectImpl.typeId(BinaryObjectImpl.java:278)
at 
o.a.i.i.processors.cache.binary.CacheObjectBinaryProcessorImpl.typeId(CacheObjectBinaryProcessorImpl.java:672)
at 
o.a.i.i.processors.query.GridQueryProcessor.typeByValue(GridQueryProcessor.java:1688)
at 
o.a.i.i.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2177)
at 
o.a.i.i.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:451)
at 
o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1456)
at 
o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1419)
at 
o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1241)
at 
o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:383)
at 
o.a.i.i.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3221)
at 
o.a.i.i.processors.cache.GridCacheMapEntry.onExpired(GridCacheMapEntry.java:3028)
at 
o.a.i.i.processors.cache.GridCacheMapEntry.onTtlExpired(GridCacheMapEntry.java:2961)
at 
o.a.i.i.processors.cache.GridCacheTtlManager$1.applyx(GridCacheTtlManager.java:61)
at 
o.a.i.i.processors.cache.GridCacheTtlManager$1.applyx(GridCacheTtlManager.java:52)
at o.a.i.i.util.lang.IgniteInClosure2X.apply(IgniteInClosure2X.java:38)
at 
o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl.expire(IgniteCacheOffheapManagerImpl.java:1007)
at 
o.a.i.i.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:198)
at 
o.a.i.i.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:160)
at 
o.a.i.i.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:854)
at 
o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1073)
at 
o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:561)
at 
o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
at 
o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
at 
o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
at 
o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
at 
o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at 
o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at 
o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
at 
o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1097)
at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
at java.lang.Thread.run(Thread.java:745)
{noformat}
The system totally stops responding to user get/put/etc operations. The freeze 
can last for several checkpoints.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5526) Dynamic cache stop/start from non-affinity coordinator node hangs with AssertionError

2017-06-16 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5526:
--

 Summary: Dynamic cache stop/start from non-affinity coordinator 
node hangs with AssertionError
 Key: IGNITE-5526
 URL: https://issues.apache.org/jira/browse/IGNITE-5526
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.1
Reporter: Ivan Rakov
Assignee: Ivan Rakov
 Fix For: 2.1


Reproducer is in attachment.
{noformat}
[18:43:17,372][ERROR][exchange-worker-#29%dummy%][GridCachePartitionExchangeManager]
 Runtime error caught during grid runnable execution: GridWorker 
[name=partition-exchanger, igniteInstanceName=dummy, finished=false, 
hashCode=1561691363, interrupted=false, runner=exchange-worker-#29%dummy%]
java.lang.AssertionError
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.update(GridDhtPartitionTopologyImpl.java:1065)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updateTopologies(GridDhtPartitionsExchangeFuture.java:688)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:592)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1858)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5490) Implement replacement for obsolete CacheMetrics#getOffHeapAllocatedSize

2017-06-14 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5490:
--

 Summary: Implement replacement for obsolete 
CacheMetrics#getOffHeapAllocatedSize
 Key: IGNITE-5490
 URL: https://issues.apache.org/jira/browse/IGNITE-5490
 Project: Ignite
  Issue Type: Improvement
  Components: cache
Affects Versions: 2.0
Reporter: Ivan Rakov


With the new 2.0 architecture, many caches can share one memory policy. Memory 
metrics allow measuring memory usage (loaded pages) for the whole policy. 
However, there's also a need to measure how much memory (or how many pages) is 
used by each cache.
Before 2.0 such information was accessible via 
CacheMetrics#getOffHeapAllocatedSize, but the current implementation returns 0.
We should either implement it or provide an alternative metric (e.g. the 
approximate number of loaded pages per cache). Please note that the precise 
number of loaded pages per cache is not defined - one page can contain entries 
of different caches.
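A sketch of one possible replacement metric (an assumption, not a committed 
design): an approximate per-cache page counter bumped on allocation and 
decremented on free.
{noformat}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class CachePagesMetric {
    private final ConcurrentHashMap<Integer, LongAdder> pagesPerCache =
        new ConcurrentHashMap<>();

    void onPageAllocated(int cacheId) {
        pagesPerCache.computeIfAbsent(cacheId, id -> new LongAdder()).increment();
    }

    void onPageFreed(int cacheId) {
        LongAdder cnt = pagesPerCache.get(cacheId);

        if (cnt != null)
            cnt.decrement();
    }

    /** Approximate by design: a shared page may hold entries of several caches. */
    long approxPages(int cacheId) {
        LongAdder cnt = pagesPerCache.get(cacheId);

        return cnt == null ? 0 : cnt.sum();
    }
}
{noformat}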



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5215) Allow user to configure memory policy with maxSize lesser than default initialSize

2017-05-15 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-5215:
--

 Summary: Allow user to configure memory policy with maxSize lesser 
than default initialSize
 Key: IGNITE-5215
 URL: https://issues.apache.org/jira/browse/IGNITE-5215
 Project: Ignite
  Issue Type: Improvement
  Components: cache
Affects Versions: 2.0
Reporter: Ivan Rakov
Assignee: Ivan Rakov


An attempt to create a memory policy with a small maxSize ends with an error:
{noformat}
Caused by: class org.apache.ignite.IgniteCheckedException: MemoryPolicy maxSize 
must not be smaller than initialSize [name=dfltMemPlc, initSize=268,4 MB, 
maxSize=209,7 MB]
at 
org.apache.ignite.internal.processors.cache.database.IgniteCacheDatabaseSharedManager.checkPolicySize(IgniteCacheDatabaseSharedManager.java:419)
at 
org.apache.ignite.internal.processors.cache.database.IgniteCacheDatabaseSharedManager.validateConfiguration(IgniteCacheDatabaseSharedManager.java:337)
at 
org.apache.ignite.internal.processors.cache.database.IgniteCacheDatabaseSharedManager.init(IgniteCacheDatabaseSharedManager.java:112)
at 
org.apache.ignite.internal.processors.cache.database.IgniteCacheDatabaseSharedManager.start0(IgniteCacheDatabaseSharedManager.java:99)
at 
org.gridgain.grid.internal.processors.cache.database.GridCacheDatabaseSharedManager.initDataBase(GridCacheDatabaseSharedManager.java:493)
at 
org.gridgain.grid.internal.processors.cache.database.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:436)
at 
org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:53)
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:644)
at 
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1763)
... 29 more
{noformat}
This can be easily fixed by setting initialSize explicitly (see the sketch 
below). Still, it would be better not to oblige the user to spend time on 
fixing it.
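The manual workaround, as a sketch against the 2.0 MemoryPolicyConfiguration 
API (the policy name and size are illustrative):
{noformat}
import org.apache.ignite.configuration.MemoryPolicyConfiguration;

public class SmallPolicyExample {
    public static MemoryPolicyConfiguration smallPolicy() {
        long maxSize = 200L * 1024 * 1024; // 200 MB, below the default initialSize.

        return new MemoryPolicyConfiguration()
            .setName("smallPlc")
            .setInitialSize(maxSize) // Workaround: pin initialSize to maxSize.
            .setMaxSize(maxSize);
    }
}
{noformat}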



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (IGNITE-4959) Possible slight memory leak in free list

2017-04-13 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-4959:
--

 Summary: Possible slight memory leak in free list
 Key: IGNITE-4959
 URL: https://issues.apache.org/jira/browse/IGNITE-4959
 Project: Ignite
  Issue Type: Improvement
  Components: cache
Affects Versions: 2.0
Reporter: Ivan Rakov
Assignee: Alexey Goncharuk


To reproduce, run PageEvictionMultinodeTest (any eviction mode) and set ENTRIES 
to Integer.MAX_VALUE.
Observations:
1) After a few minutes of test running, the number of allocated pages looks like 
a constant (a bit more than the eviction threshold, 90% by default). This is 
expected behaviour with page eviction enabled.
2) More precise measurement shows that there's a slow linear growth of the 
allocated pages number, literally 10-20 pages per minute.
3) The number of pages of type T_PAGE_LIST_NODE grows; the number of all other 
pages remains constant.
4) Though, the total number of pages in the free list remains constant (with 
minor fluctuations).
We have to find out whether this process has a saturation point, after which the 
page count stops growing. Otherwise, it's a memory leak and should be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (IGNITE-4958) Make data pages recyclable into index/meta/etc pages and vice versa

2017-04-13 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-4958:
--

 Summary: Make data pages recyclable into index/meta/etc pages and 
vice versa
 Key: IGNITE-4958
 URL: https://issues.apache.org/jira/browse/IGNITE-4958
 Project: Ignite
  Issue Type: Improvement
  Components: cache
Affects Versions: 2.0
Reporter: Ivan Rakov
Assignee: Alexey Goncharuk


Recycling for data pages is disabled for now. Empty data pages are accumulated 
in FreeListImpl#emptyDataPagesBucket, and can be reused only as data pages 
again. What has to be done:
* Empty data pages should be recycled into reuse bucket
* We should check reuse bucket first before allocating a new data page
* MemoryPolicyConfiguration#emptyPagesPoolSize should be removed



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (IGNITE-4921) Refactor segments array in PageMemoryNoStoreImpl

2017-04-05 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-4921:
--

 Summary: Refactor segments array in PageMemoryNoStoreImpl
 Key: IGNITE-4921
 URL: https://issues.apache.org/jira/browse/IGNITE-4921
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.0
Reporter: Ivan Rakov
Assignee: Alexey Goncharuk


In the current version of PageMemoryNoStoreImpl, offheap memory is split into 
equal segments. The number of segments is based on 
MemoryConfiguration#getConcurrencyLevel (or Runtime#availableProcessors, if the 
former is not set).
This approach is obsolete. In order to reduce code complexity, the segments 
should be refactored into one memory region.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)