[jira] [Created] (IGNITE-7788) Data loss after cold restart with PDS and cache group change

2018-02-22 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7788:
--

 Summary: Data loss after cold restart with PDS and cache group 
change
 Key: IGNITE-7788
 URL: https://issues.apache.org/jira/browse/IGNITE-7788
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Reproduced by the improved test 
{{IgnitePdsCacheRestoreTest.testRestoreAndNewCache6}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7723) Data loss after node restart with PDS

2018-02-15 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7723:
--

 Summary: Data loss after node restart with PDS
 Key: IGNITE-7723
 URL: https://issues.apache.org/jira/browse/IGNITE-7723
 Project: Ignite
  Issue Type: Bug
  Components: general, persistence
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
 Attachments: IgnitePdsDataLossTest.java

A split-brain scenario with a topology validator is used to demonstrate possible 
data loss. The same result may occur with accidental network problems combined 
with a node restart.

See the reproducer {{IgnitePdsDataLossTest}} for details.





[jira] [Created] (IGNITE-7634) Wrong NodeStoppingException on destroying cache

2018-02-06 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7634:
--

 Summary: Wrong NodeStoppingException on destroying cache
 Key: IGNITE-7634
 URL: https://issues.apache.org/jira/browse/IGNITE-7634
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Concurrent cache operations throw multiple {{NodeStoppingException}}s, although 
the node is not stopping: the cache is actually being destroyed.

{noformat}
Error during parallel index create/rebuild.
org.apache.ignite.internal.NodeStoppingException: Operation has been cancelled 
(node is stopping).
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:393)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$RebuldIndexFromHashClosure.apply(IgniteH2Indexing.java:2635)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateIndex(GridCacheMapEntry.java:3305)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processKey(SchemaIndexCacheVisitorImpl.java:243)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartition(SchemaIndexCacheVisitorImpl.java:206)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartitions(SchemaIndexCacheVisitorImpl.java:165)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.access$100(SchemaIndexCacheVisitorImpl.java:50)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl$AsyncWorker.body(SchemaIndexCacheVisitorImpl.java:316)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
{noformat}





[jira] [Created] (IGNITE-7633) Multiple errors on accessing page store while destroying cache

2018-02-06 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7633:
--

 Summary: Multiple errors on accessing page store while destroying 
cache
 Key: IGNITE-7633
 URL: https://issues.apache.org/jira/browse/IGNITE-7633
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


A single common exception:
{noformat}
Partition eviction failed, this can cause grid hang.
org.apache.ignite.IgniteException: Failed to get page store for the given cache 
ID (cache has not been started): -1903385190
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.destroyCacheDataStore(IgniteCacheOffheapManagerImpl.java:931)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.destroyCacheDataStore(GridDhtLocalPartition.java:772)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.finishDestroy(GridDhtLocalPartition.java:730)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearEvicting(GridDhtLocalPartition.java:702)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:762)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.ignite.IgniteCheckedException: Failed to get page store 
for the given cache ID (cache has not been started): -1903385190
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getStore(FilePageStoreManager.java:670)
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.onPartitionDestroyed(FilePageStoreManager.java:268)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.destroyCacheDataStore0(GridCacheOffheapManager.java:494)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.destroyCacheDataStore(IgniteCacheOffheapManagerImpl.java:928)
... 12 common frames omitted
{noformat}
And multiple other exceptions for many pages:
{noformat}
There was an exception while updating tracking page: 000119a20001
org.apache.ignite.IgniteCheckedException: Failed to get page store for the 
given cache ID (cache has not been started): -1903385190
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getStore(FilePageStoreManager.java:670)
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:290)
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:277)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:608)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:528)
at 
org.gridgain.grid.internal.processors.cache.database.GridCacheSnapshotManager.onChangeTrackerPage(GridCacheSnapshotManager.java:1921)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$9.applyx(GridCacheDatabaseSharedManager.java:966)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$9.applyx(GridCacheDatabaseSharedManager.java:959)
at 
org.apache.ignite.internal.util.lang.GridInClosure3X.apply(GridInClosure3X.java:34)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1274)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:419)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:413)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:304)
at 

[jira] [Created] (IGNITE-7632) NPE in IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.updateIgfsMetrics()

2018-02-05 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7632:
--

 Summary: NPE in 
IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.updateIgfsMetrics()
 Key: IGNITE-7632
 URL: https://issues.apache.org/jira/browse/IGNITE-7632
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Occurs when a cache is destroyed while an index rebuild is in progress.

{noformat}
Partition eviction failed, this can cause grid hang.
java.lang.NullPointerException: null
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.updateIgfsMetrics(IgniteCacheOffheapManagerImpl.java:1576)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1403)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1368)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1312)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:368)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3224)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:895)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:753)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
...
{noformat}





[jira] [Created] (IGNITE-7631) Failed to clear page memory with AssertionError: Release pinned page

2018-02-05 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7631:
--

 Summary: Failed to clear page memory with AssertionError: Release 
pinned page
 Key: IGNITE-7631
 URL: https://issues.apache.org/jira/browse/IGNITE-7631
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


The following scenario reproduces the problem:

# The cluster is started and activated.
# A snapshot is restored.
# An index rebuild is in progress.
# The caches are destroyed.
# Multiple NPEs occur.
# The following exception occurs:

{noformat}
Failed to clear page memory
org.apache.ignite.IgniteCheckedException: Compound exception for 
CountDownFuture.
at 
org.apache.ignite.internal.util.future.CountDownFuture.addError(CountDownFuture.java:72)
at 
org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:46)
at 
org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:28)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2449)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=000100f40007, effectivePageId=00f40007, grpId=321390040]
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
... 3 common frames omitted
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=000200019986, effectivePageId=00019986, grpId=-1903385190]
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
... 3 common frames omitted
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=0002c85c, effectivePageId=c85c, grpId=-1903385190]
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
... 3 common frames omitted
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=000232da, effectivePageId=32da, grpId=321390040]
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
... 3 common frames omitted
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=000200011d30, effectivePageId=00011d30, grpId=-1903385190]
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
... 3 common frames omitted
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=0002d346, effectivePageId=d346, grpId=-1903385190]
at 

[jira] [Created] (IGNITE-7630) NPE in SchemaIndexCacheVisitorImpl.processKey()

2018-02-05 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7630:
--

 Summary: NPE in SchemaIndexCacheVisitorImpl.processKey()
 Key: IGNITE-7630
 URL: https://issues.apache.org/jira/browse/IGNITE-7630
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Occurs after a cache is destroyed while an index rebuild is in progress.

{noformat}
[Thread] parallel-idx-worker-GridDhtColocatedCache [...]
[Emitter] o.a.i.i.p.q.s.SchemaIndexCacheVisitorImpl$AsyncWorker
[Message]  Error during parallel index create/rebuild.
java.lang.NullPointerException: null
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processKey(SchemaIndexCacheVisitorImpl.java:246)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartition(SchemaIndexCacheVisitorImpl.java:206)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartitions(SchemaIndexCacheVisitorImpl.java:165)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.access$100(SchemaIndexCacheVisitorImpl.java:50)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl$AsyncWorker.body(SchemaIndexCacheVisitorImpl.java:316)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
{noformat}





[jira] [Created] (IGNITE-7629) NPE when Finished indexes rebuilding for cache

2018-02-05 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7629:
--

 Summary: NPE when Finished indexes rebuilding for cache
 Key: IGNITE-7629
 URL: https://issues.apache.org/jira/browse/IGNITE-7629
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Occurs after a cache is destroyed while an index rebuild is in progress.

{noformat}
Runtime error caught during grid runnable execution: GridWorker 
[name=index-rebuild-worker, igniteInstanceName=DPL_GRID%DplGridNodeName, 
finished=false, hashCode=1940633631, interrupted=false, 
runner=pub-#2054%DPL_GRID%DplGridNodeName%]
java.lang.NullPointerException: null
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1163)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1159)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
at 
org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:125)
at 
org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor$3.body(GridQueryProcessor.java:1678)
...
{noformat}





[jira] [Created] (IGNITE-7579) NPE in GridDhtLocalPartition.cacheMapHolder()

2018-01-30 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7579:
--

 Summary: NPE in GridDhtLocalPartition.cacheMapHolder()
 Key: IGNITE-7579
 URL: https://issues.apache.org/jira/browse/IGNITE-7579
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


The following scenario may occur:
 # Multiple nodes form an inactive cluster.
 # Cluster activation is performed.
 # Some nodes fail activation.
 # On the other nodes, caches are stopped.
 # An NPE occurs as a consequence of {{GridDhtPreloader.evictPartitionAsync()}}:
{noformat}
Partition eviction failed, this can cause grid hang.
java.lang.NullPointerException: null
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.cacheMapHolder(GridDhtLocalPartition.java:253)
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:880)
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:753)
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
    at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
    at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
{noformat}
# The failed nodes are dropped from the cluster.
# The subsequent activation succeeds.
# PDS seems to be corrupted as a result of the NPE.





[jira] [Created] (IGNITE-7383) Failed to restore memory after cluster restart and activating from outdated node

2018-01-10 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7383:
--

 Summary: Failed to restore memory after cluster restart and 
activating from outdated node
 Key: IGNITE-7383
 URL: https://issues.apache.org/jira/browse/IGNITE-7383
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Steps to reproduce the problem:

1) start nodes 0-1-2

2) stop node 2

3) create a new cache and put some data into it

4) stop remaining nodes 0-1

5) start nodes 0-1-2

6) activate the cluster from the node 2

Then two different results are possible, depending on which node is the coordinator:

a) node 2 is a coordinator:

{noformat}
Failed to activate node components 
[nodeId=42d762c7-b1e0-4283-939b-aeeb3c70, client=false, 
topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]
class org.apache.ignite.IgniteCheckedException: Failed to find cache group 
descriptor [grpId=3119]
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1602)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1544)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:570)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:820)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:583)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
{noformat}

and activation fails.

b) node 2 is NOT a coordinator:

We get the same error as in the previous case, but the activation process does 
not fail, and then we get "Failed to wait PME" after a number of assertion 
errors:

{noformat}
Failed to process message [senderId=a940742f-bf17-41b4-bfc2-728bee72, 
messageType=class 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
java.lang.AssertionError: -2100569601
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:733)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:2877)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:1935)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:116)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1810)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1798)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1798)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1484)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:131)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:327)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:307)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2627)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2606)
at 

[jira] [Created] (IGNITE-7163) Validate connection from a pre-previous node

2017-12-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7163:
--

 Summary: Validate connection from a pre-previous node
 Key: IGNITE-7163
 URL: https://issues.apache.org/jira/browse/IGNITE-7163
 Project: Ignite
  Issue Type: Sub-task
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


If a pre-previous node connects to the local node with the previous node in 
the message's failed nodes collection, additional steps should be taken:

# The connection with the previous node should be validated.
# If no message has been received from the previous node for a long time, the 
previous node should be considered failed and the pre-previous node 
connection accepted.
# If the previous node connection is alive, different scenarios are possible:
## Answer with a new result code, causing the pre-previous node to try to 
reconnect to the previous node.
## Break the connection with the pre-previous node, letting the possible 
cluster split continue.
## Check connections with the nodes after the pre-previous node and delay the 
decision by answering RES_WAIT, to get a more predictable split and a stable 
topology.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-7162) Control discovery messages processing time

2017-12-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7162:
--

 Summary: Control discovery messages processing time
 Key: IGNITE-7162
 URL: https://issues.apache.org/jira/browse/IGNITE-7162
 Project: Ignite
  Issue Type: Sub-task
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


The majority of discovery message processing occurs in a single thread.

If processing of some message takes significant time, it delays the processing 
of other messages and has further undesirable effects on other protocols.

It is proposed to monitor the processing time on every node and the total 
processing time of any given message. If processing takes significant time, 
log a warning.
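As a rough illustration of the per-node part of this proposal, the sketch below wraps a message handler, measures how long each call takes, and prints a warning when a threshold is exceeded. The class {{TimedMessageProcessor}}, its fields and the injected clock are hypothetical names introduced only for this example; they are not part of Ignite, and tracking the total processing time across nodes is not covered.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical wrapper (not an Ignite class) that times a single-threaded message handler. */
final class TimedMessageProcessor<M> {
    private final long warnThresholdNanos;
    private final LongSupplier nanoClock;   // injected so the sketch can be tested deterministically
    private final Consumer<M> delegate;     // the real message handler
    private int slowMsgCnt;

    TimedMessageProcessor(long warnThresholdNanos, LongSupplier nanoClock, Consumer<M> delegate) {
        this.warnThresholdNanos = warnThresholdNanos;
        this.nanoClock = nanoClock;
        this.delegate = delegate;
    }

    /** Processes the message and returns the elapsed time in nanoseconds. */
    long process(M msg) {
        long start = nanoClock.getAsLong();
        long elapsed;

        try {
            delegate.accept(msg);
        }
        finally {
            // Measure even if the handler throws.
            elapsed = nanoClock.getAsLong() - start;

            if (elapsed > warnThresholdNanos) {
                slowMsgCnt++;
                System.err.println("Slow discovery message processing [msg=" + msg +
                    ", elapsedNanos=" + elapsed + ']');
            }
        }

        return elapsed;
    }

    /** Number of messages whose processing exceeded the threshold. */
    int slowMessageCount() {
        return slowMsgCnt;
    }
}
```

In production code the clock would simply be {{System::nanoTime}} and the warning would go to the node's logger rather than to standard error.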





[jira] [Created] (IGNITE-7161) Detect self-freeze on remote node related operations with timeout

2017-12-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7161:
--

 Summary: Detect self-freeze on remote node related operations with 
timeout
 Key: IGNITE-7161
 URL: https://issues.apache.org/jira/browse/IGNITE-7161
 Project: Ignite
  Issue Type: Sub-task
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


After getting the next timeout from 
{{IgniteSpiOperationTimeoutHelper.nextTimeoutChunk()}} we start a network 
operation and expect it to end at a specific timestamp (or close to it).

We should take into account that a local thread freeze may have occurred. In 
that situation the remote node should not be considered failed, and the local 
network operation has to be retried.
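One possible way to detect such a freeze, shown here only as a sketch, is to compare the moment the operation actually finished with the moment it was expected to finish. If the operation ended far later than its deadline, a local pause (GC, swapping, a suspended VM) is a more likely cause than a remote failure. The class name, the method name and the tolerance parameter are assumptions made for this example and do not come from the Ignite code base.

```java
/** Hypothetical helper (not an Ignite class) for telling a local freeze from a remote timeout. */
final class FreezeDetector {
    private FreezeDetector() {
        // Static helper only.
    }

    /**
     * @param startTs     timestamp at which the network operation was started, in ms
     * @param timeoutMs   timeout that was given to the operation, in ms
     * @param endTs       timestamp at which the operation actually returned, in ms
     * @param toleranceMs how far past the expected end is still treated as normal, in ms
     * @return {@code true} if the operation ended so long after its expected end
     *         that a local thread freeze should be suspected and the operation retried
     */
    static boolean localFreezeSuspected(long startTs, long timeoutMs, long endTs, long toleranceMs) {
        long expectedEndTs = startTs + timeoutMs;

        return endTs - expectedEndTs > toleranceMs;
    }
}
```

A caller would evaluate this check after a timed-out operation and retry instead of marking the remote node as failed when it returns true.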





[jira] [Created] (IGNITE-7160) Ignore messages from not alive and failed nodes

2017-12-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7160:
--

 Summary: Ignore messages from not alive and failed nodes
 Key: IGNITE-7160
 URL: https://issues.apache.org/jira/browse/IGNITE-7160
 Project: Ignite
  Issue Type: Sub-task
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


The current implementation of {{ServerImpl}} accepts and processes messages from 
any remote node, even if that node has failed or been removed from the ring.

It is proposed to process only specific messages (those which have to be 
processed in the current node state). Some messages could be silently ignored; 
receiving other undesirable messages should cause the remote socket to be 
disconnected.
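The three outcomes described above (process, silently ignore, disconnect) could be modelled roughly as follows. This is only a sketch: the class {{DiscoveryMessageFilter}}, the message types and, in particular, the assignment of each type to an outcome are invented for illustration and do not reflect the actual {{ServerImpl}} message classes or the final design.

```java
import java.util.Set;
import java.util.UUID;

/** Hypothetical filter (not an Ignite class) deciding what to do with an incoming message. */
final class DiscoveryMessageFilter {
    /** Possible outcomes named in the proposal. */
    enum Action { PROCESS, IGNORE, DISCONNECT }

    /** Illustrative message kinds; the real discovery protocol has its own message classes. */
    enum MsgType { JOIN_REQUEST, HEARTBEAT, CUSTOM_EVENT }

    private final Set<UUID> aliveNodes;

    DiscoveryMessageFilter(Set<UUID> aliveNodes) {
        this.aliveNodes = aliveNodes;
    }

    Action decide(UUID senderId, MsgType type) {
        // Messages from nodes that are alive in the ring are processed as usual.
        if (aliveNodes.contains(senderId))
            return Action.PROCESS;

        // Assumed classification for senders that are failed or not in the ring.
        switch (type) {
            case JOIN_REQUEST:
                return Action.PROCESS;    // a node outside the ring may legitimately ask to join
            case HEARTBEAT:
                return Action.IGNORE;     // harmless, so dropped silently
            default:
                return Action.DISCONNECT; // anything else closes the remote socket
        }
    }
}
```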





[jira] [Created] (IGNITE-7158) TCP discovery improvement

2017-12-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7158:
--

 Summary: TCP discovery improvement
 Key: IGNITE-7158
 URL: https://issues.apache.org/jira/browse/IGNITE-7158
 Project: Ignite
  Issue Type: Improvement
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


The current TCP discovery implementation has various drawbacks which should be 
fixed.

See sub-tasks for details.





[jira] [Created] (IGNITE-7152) Failure detection timeout don't work on permanent send message errors causing infinite loop

2017-12-08 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7152:
--

 Summary: Failure detection timeout don't work on permanent send 
message errors causing infinite loop
 Key: IGNITE-7152
 URL: https://issues.apache.org/jira/browse/IGNITE-7152
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Priority: Critical
 Fix For: 2.4


This relates to the {{RingMessageWorker.sendMessageAcrossRing}} implementation.

{{IgniteSpiOperationTimeoutHelper}} is reinitialized every time the socket is 
successfully connected.

If an {{IOException}} or {{IgniteCheckedException}} occurs while sending a 
message, the socket is closed and the old {{IgniteSpiOperationTimeoutHelper}} 
is used to reconnect.

But after a successful reconnect a new helper is created and the cycle 
repeats. With a permanent send error this causes an infinite loop.

The only send errors that can break out of the loop and cause the next node to 
be considered failed are {{IgniteSpiOperationTimeoutException}}, 
{{SocketTimeoutException}} and {{SocketException}}.
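To make the failure mode concrete, the sketch below shows a retry loop in which the deadline is computed once and is deliberately not reset after a reconnect, so a permanently failing send terminates once the timeout elapses. It does not reproduce {{sendMessageAcrossRing}}; {{SingleDeadlineSender}}, {{SendAttempt}} and the injected clock are hypothetical names used only here.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

/** Hypothetical sketch (not Ignite code): retries share one deadline across reconnects. */
final class SingleDeadlineSender {
    /** One send attempt; stands in for writing a message to the next node's socket. */
    interface SendAttempt {
        void send() throws IOException;
    }

    private SingleDeadlineSender() {
        // Static helper only.
    }

    /**
     * @return {@code true} if the message was sent, {@code false} if the deadline passed first
     */
    static boolean sendWithSingleDeadline(SendAttempt attempt, long timeoutMs, LongSupplier clock) {
        // Computed once: a successful reconnect must not extend it.
        long deadline = clock.getAsLong() + timeoutMs;

        while (clock.getAsLong() < deadline) {
            try {
                attempt.send();

                return true;
            }
            catch (IOException e) {
                // The socket would be closed and reopened here; the deadline stays the same.
            }
        }

        return false; // the caller can now treat the next node as failed
    }
}
```

With a clock that advances 10 ms per attempt and a 50 ms timeout, a send that always throws is attempted five times and then the method gives up.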





[jira] [Created] (IGNITE-7135) IgniteCluster.startNodes() returns successful ClusterStartNodeResult even though the remote process fails

2017-12-07 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7135:
--

 Summary: IgniteCluster.startNodes() returns successful 
ClusterStartNodeResult even though the remote process fails
 Key: IGNITE-7135
 URL: https://issues.apache.org/jira/browse/IGNITE-7135
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
 Fix For: 2.4


After an unsuccessful start of three remote nodes with 
{{IgniteCluster#startNodes(Collection<Map<String, Object>>, Map<String, Object>, 
boolean, int, int)}} we get a {{Collection<ClusterStartNodeResult>}} with three 
elements, each of which reports {{isSuccess()}} as true.

But the remote node startup log was:
{noformat}
nohup: ignoring input
/data/teamcity/work/820be461cd64b574/bin/ignite.sh, ERROR:
The version of JAVA installed in JAVA_HOME=/usr/lib/jvm/java-9-oracle is 
incorrect.
Please point JAVA_HOME variable to installation of JDK 1.7 or JDK 1.8.
You can also download latest JDK at http://java.com/download
{noformat}





[jira] [Created] (IGNITE-7134) Never-ending timeout in IgniteSpiOperationTimeoutHelper.nextTimeoutChunk()

2017-12-07 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7134:
--

 Summary: Never-ending timeout in 
IgniteSpiOperationTimeoutHelper.nextTimeoutChunk()
 Key: IGNITE-7134
 URL: https://issues.apache.org/jira/browse/IGNITE-7134
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Priority: Critical
 Fix For: 2.4


{noformat}
org.apache.ignite.spi.IgniteSpiOperationTimeoutHelper#nextTimeoutChunk

long curTs = U.currentTimeMillis();

timeout = timeout - (curTs - lastOperStartTs);
{noformat}

The timeout is not decreased at all if the delay between successive calls to 
nextTimeoutChunk() is smaller than the U.currentTimeMillis() resolution. This 
easily happens when an error occurs right after the nextTimeoutChunk() 
invocation and the operation is retried.

Only rare calls (the first right before a U.currentTimeMillis() update and the 
second right after it) decrease the timeout, so the actual 
IgniteSpiOperationTimeoutHelper timeout could be much bigger than the 
failureDetectionTimeout.

My suggestion is not to split failureDetectionTimeout between network 
operations, but to record the first operation timestamp at the first call to 
nextTimeoutChunk(), and then calculate the remaining timeout from the difference 
between the current timestamp and the first operation timestamp.
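
The proposal can be sketched roughly as follows. This is an illustration under assumptions, not the actual helper: the class name {{DeadlineTimeoutSketch}}, the injected clock and the use of {{IllegalStateException}} are all hypothetical stand-ins.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of the proposal; not the actual Ignite helper. */
final class DeadlineTimeoutSketch {
    private final long totalTimeoutMs;
    private final LongSupplier clock;
    private boolean started;
    private long firstOperTs;

    DeadlineTimeoutSketch(long totalTimeoutMs, LongSupplier clock) {
        this.totalTimeoutMs = totalTimeoutMs;
        this.clock = clock;
    }

    /** Returns the time left for the next operation, or throws once the whole budget is spent. */
    long nextTimeoutChunk() {
        long curTs = clock.getAsLong();

        if (!started) {
            started = true;
            firstOperTs = curTs; // Remember when the first operation began.
        }

        // The remaining time depends only on the first timestamp and the current one.
        long left = totalTimeoutMs - (curTs - firstOperTs);

        if (left <= 0)
            throw new IllegalStateException("Operation timed out");

        return left;
    }
}
```

Because the remaining time is always derived from the first timestamp, the number of calls made in between does not affect the result.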





[jira] [Created] (IGNITE-6967) PME deadlock on reassigning service deployment

2017-11-20 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6967:
--

 Summary: PME deadlock on reassigning service deployment
 Key: IGNITE-6967
 URL: https://issues.apache.org/jira/browse/IGNITE-6967
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


When a topology change occurs while a service is deployed, the discovery event 
listener calls {{GridServiceProcessor.reassign()}}, which acquires a lock on the 
utility cache (where the GridServiceAssignments are stored) and prevents PME 
from completing.

Stack traces:

{noformat}
Thread [name="test-runner-#186%service.IgniteServiceDynamicCachesSelfTest%", id=232, state=WAITING, blockCnt=0, waitCnt=8]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at o.a.i.i.IgniteKernal.createCache(IgniteKernal.java:2841)
at o.a.i.i.processors.service.IgniteServiceDynamicCachesSelfTest.testDeployCalledBeforeCacheStart(IgniteServiceDynamicCachesSelfTest.java:140)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176)
at o.a.i.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000)
at o.a.i.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
at o.a.i.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915)
at java.lang.Thread.run(Thread.java:748)

Thread [name="srvc-deploy-#38%service.IgniteServiceDynamicCachesSelfTest0%", id=56, state=WAITING, blockCnt=5, waitCnt=9]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at o.a.i.i.processors.cache.GridCacheContext.awaitStarted(GridCacheContext.java:443)
at o.a.i.i.processors.affinity.GridAffinityProcessor.affinityCache(GridAffinityProcessor.java:373)
at o.a.i.i.processors.affinity.GridAffinityProcessor.keysToNodes(GridAffinityProcessor.java:347)
at o.a.i.i.processors.affinity.GridAffinityProcessor.mapKeyToNode(GridAffinityProcessor.java:259)
at o.a.i.i.processors.service.GridServiceProcessor.reassign(GridServiceProcessor.java:1163)
at o.a.i.i.processors.service.GridServiceProcessor.access$2400(GridServiceProcessor.java:123)
at o.a.i.i.processors.service.GridServiceProcessor$TopologyListener$1.run0(GridServiceProcessor.java:1763)
at o.a.i.i.processors.service.GridServiceProcessor$DepRunnable.run(GridServiceProcessor.java:1976)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Locked synchronizers:
java.util.concurrent.ThreadPoolExecutor$Worker@27f723
{noformat}

Problematic code:
{noformat}
org.apache.ignite.internal.processors.service.GridServiceProcessor#reassign

try (GridNearTxLocal tx = cache.txStartEx(PESSIMISTIC, REPEATABLE_READ)) {
    GridServiceAssignmentsKey key = new GridServiceAssignmentsKey(cfg.getName());

    GridServiceAssignments oldAssigns = (GridServiceAssignments)cache.get(key);

    Map cnts = new HashMap<>();

    if (affKey != null) {
        ClusterNode n = ctx.affinity().mapKeyToNode(cacheName, affKey, topVer);

        // WAIT HERE UNTIL PME FINISHED (INFINITELY)
{noformat}





[jira] [Created] (IGNITE-6965) affinityCall() with key mapping may not be successful with AlwaysFailoverSpi when node left

2017-11-20 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6965:
--

 Summary: affinityCall() with key mapping may not be successful 
with AlwaysFailoverSpi when node left
 Key: IGNITE-6965
 URL: https://issues.apache.org/jira/browse/IGNITE-6965
 Project: Ignite
  Issue Type: Bug
  Components: cache, compute
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


When calling {{affinityCall(cacheName, key, callable)}} there is a race between 
the affinity node leaving and then stopping, and {{AlwaysFailoverSpi}} reaching 
its maximum number of attempts.

Suppose the following sequence (more probable when {{grid2.order}} >> 
{{grid1.order}}):

1. {{grid1.affinityCall(cacheName, key, callable)}}
2. {{grid1}}: {{key}} is mapped to the primary partition on {{grid2}}
3. {{grid2.stop()}}
4. {{grid1}} receives {{NODE_LEFT}} and updates {{discoCache}}
5. {{grid1}}: execution of {{callable}} fails with 'Failed to send job request 
because remote node left grid (if fail-over is enabled, will attempt fail-over 
to another node'
6. {{grid1}}: {{AlwaysFailoverSpi}} reaches its maximum number of attempts.
7. {{grid1.affinityCall}} fails with 'Job failover failed because number of 
maximum failover attempts for affinity call is exceeded'
8. {{grid2}} receives the verified node left message and then stops.

The patched {{CacheAffinityCallSelfTest}} reproduces the problem.





[jira] [Created] (IGNITE-6860) Lack of context information upon serializing and marshalling (writeObject and writeFields)

2017-11-10 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6860:
--

 Summary: Lack of context information upon serializing and 
marshalling (writeObject and writeFields)
 Key: IGNITE-6860
 URL: https://issues.apache.org/jira/browse/IGNITE-6860
 Project: Ignite
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
 Fix For: 2.4


Having the stack trace

{noformat}
Caused by: org.apache.ignite.binary.BinaryObjectException: Failed to marshal 
object with optimized marshaller: 
[org.apache.logging.log4j.core.config.AppenderControl@302e61a8]
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:186)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 

[jira] [Created] (IGNITE-6858) Wait for exchange inside GridReduceQueryExecutor.query which never finishes due to opened transaction

2017-11-09 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6858:
--

 Summary: Wait for exchange inside GridReduceQueryExecutor.query 
which never finishes due to opened transaction
 Key: IGNITE-6858
 URL: https://issues.apache.org/jira/browse/IGNITE-6858
 Project: Ignite
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
  Components: sql
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Vladimir Ozerov
 Fix For: 2.4


Infinite wait in the loop

{noformat}
for (int attempt = 0;; attempt++) {
    if (attempt != 0) {
        try {
            Thread.sleep(attempt * 10); // Wait for exchange.
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new CacheException("Query was interrupted.", e);
        }
    }
{noformat}

because the exchange waits for partition eviction while a transaction is open in 
a related thread

{noformat}
at java.lang.Thread.sleep(Native Method)
at 
o.a.i.i.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:546)
at 
o.a.i.i.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1236)
at 
o.a.i.i.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
{noformat}






[jira] [Created] (IGNITE-6636) BinaryStream position integer overflow

2017-10-16 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6636:
--

 Summary: BinaryStream position integer overflow
 Key: IGNITE-6636
 URL: https://issues.apache.org/jira/browse/IGNITE-6636
 Project: Ignite
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
  Components: general
Affects Versions: 2.2
Reporter: Alexandr Kuramshin


There have been some issues with a negative {{BinaryAbstractStream#pos}} value.

We may get a stack trace like this:
{noformat}
java.lang.ArrayIndexOutOfBoundsException: -2142240123
at 
org.apache.ignite.internal.binary.streams.BinaryHeapOutputStream.writeByteAndShift(BinaryHeapOutputStream.java)
at 
org.apache.ignite.internal.binary.streams.BinaryAbstractOutputStream.writeByte(BinaryAbstractOutputStream.java)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java)
{noformat}

The worst part is that the {{ArrayIndexOutOfBoundsException}} is thrown on the 
next write to the stream, so when the stack unwinds we cannot tell which object 
actually caused the overflow.

I suggest checking every update to {{BinaryAbstractStream#pos}} and throwing an 
exception right after the change.
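
A minimal sketch of such a check, assuming the position is a plain {{int}} advanced by a count. {{CheckedPositionSketch}} and {{shift}} are hypothetical names, not the real stream API. {{Math.addExact}} reports the overflow at the write that causes it.

```java
/** Hypothetical sketch; the real stream classes are not modelled here. */
final class CheckedPositionSketch {
    private int pos;

    /** Advances the position, failing at the write that overflows rather than at the next one. */
    void shift(int cnt) {
        try {
            pos = Math.addExact(pos, cnt);
        }
        catch (ArithmeticException e) {
            // pos is unchanged here because addExact threw before the assignment.
            throw new IllegalStateException("Stream position overflow [pos=" + pos + ", cnt=" + cnt + ']', e);
        }
    }

    int position() {
        return pos;
    }
}
```

The exception then carries the position and the size of the offending write, which points to the object being written at that moment.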





[jira] [Created] (IGNITE-6536) NPE on registerClassName() with MappedName

2017-10-02 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6536:
--

 Summary: NPE on registerClassName() with MappedName
 Key: IGNITE-6536
 URL: https://issues.apache.org/jira/browse/IGNITE-6536
 Project: Ignite
  Issue Type: Bug
  Components: binary
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
 Fix For: None


A {{NullPointerException}} occurs in 
{{org.apache.ignite.internal.MarshallerContextImpl#registerClassName}} when 
comparing {{mappedName.className()}} of an already existing {{typeId}} 
mapping with the new {{clsName}} passed as a parameter.

{{org.apache.ignite.internal.processors.marshaller.MappedName#className}} must 
never be null, but it was. So we should check the {{clsName}} passed to the 
{{MappedName}} constructor to prevent similar NPEs in the future.
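
A minimal sketch of a constructor-side check. {{MappedNameSketch}} is a hypothetical stand-in, and its fields are assumptions rather than the real class layout.

```java
import java.util.Objects;

/** Hypothetical stand-in for a mapped name; the field set is an assumption. */
final class MappedNameSketch {
    private final String clsName;
    private final boolean accepted;

    MappedNameSketch(String clsName, boolean accepted) {
        // Fail at construction time, where the caller that passed null is still on the stack.
        this.clsName = Objects.requireNonNull(clsName, "clsName must not be null");
        this.accepted = accepted;
    }

    String className() {
        return clsName;
    }

    boolean accepted() {
        return accepted;
    }
}
```

Failing in the constructor moves the error to the code that produced the null value instead of a later comparison.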





[jira] [Created] (IGNITE-6521) Review default JVM options for better performance

2017-09-28 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6521:
--

 Summary: Review default JVM options for better performance
 Key: IGNITE-6521
 URL: https://issues.apache.org/jira/browse/IGNITE-6521
 Project: Ignite
  Issue Type: Improvement
  Components: general, visor
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


Non-optimal recommendations are present in the Ignite startup scripts

{noformat}
::
:: Uncomment the following GC settings if you see spikes in your throughput due to Garbage Collection.
::
:: set JVM_OPTS=%JVM_OPTS% -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:NewSize=128m -XX:MaxNewSize=128m
:: set JVM_OPTS=%JVM_OPTS% -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=1024 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60
{noformat}

Some utilities (like Visor) hang in continuous GCs when connected to large 
clusters (more than one hundred nodes), even with a large heap (about 32 GB).

I'd like to propose removing these lines and modifying the default JVM_OPTS as 
follows

{noformat}
set JVM_OPTS=-Xms1g -Xmx8g -XX:+UseG1GC -server -XX:+AggressiveOpts -XX:MaxPermSize=256m
{noformat}






[jira] [Created] (IGNITE-6519) Race in SplitAwareTopologyValidator on activator and server node join

2017-09-28 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6519:
--

 Summary: Race in SplitAwareTopologyValidator on activator and 
server node join
 Key: IGNITE-6519
 URL: https://issues.apache.org/jira/browse/IGNITE-6519
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


The following sequence may occur:

1. {{SplitAwareTopologyValidator}} detects a split, enters the {{NOTVALID}} 
state and returns false from {{validate()}}

2. The activator node joins and {{SplitAwareTopologyValidator}} becomes 
{{REPAIRED}}

3. A server node joins from the other DC, which makes 
{{SplitAwareTopologyValidator}} become {{VALID}}

4. Then the server node leaves the cluster and {{SplitAwareTopologyValidator}} 
should return false from {{validate()}} because of the next split

But the current implementation makes {{SplitAwareTopologyValidator}} 
auto-{{REPAIRED}}. If the activator node is not removed from the cluster, it may 
automatically repair a split many times, although the repair is supposed to be 
a manual operation.





[jira] [Created] (IGNITE-6499) Compact NULL fields binary representation

2017-09-26 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6499:
--

 Summary: Compact NULL fields binary representation
 Key: IGNITE-6499
 URL: https://issues.apache.org/jira/browse/IGNITE-6499
 Project: Ignite
  Issue Type: Improvement
  Components: binary
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
Assignee: Vladimir Ozerov


The current compact footer implementation writes an offset for every field in 
the schema. Depending on the serialized size of the object, an offset takes 1, 2 
or 4 bytes.

Imagine an object in which some 100 fields are null. They add from 100 to 400 
bytes of overhead. For medium-sized objects (about 260 bytes) this doubles the 
memory usage. For small objects (about 40 bytes) the memory usage increases by 
a factor of 3 or 4.

Two optimizations are proposed. Both should be implemented, and the better one 
should be selected dynamically when an object is marshalled.

1. Write the field ID and offset only for non-null fields in the footer.

2. Write a footer header followed by field offsets only for non-null fields, as 
follows

[0] bit mask for first 8 fields, 0 - field is null, 1 - field is non-null
[1] cumulative sum of "1" bits
[2] bit mask for the next 8 fields
[3] cumulative sum of "1" bits
... and so on
[N1...N2] offset of first non-null field
[N3...N4] offset of next non-null field
... and so on

To read a field with index from 0 to 7, we read the first footer byte, step 
through its bits and either find the offset index of the non-null field or find 
that the field is null.

To read a field with index 8 or higher, we read two footer bytes, take the 
starting offset index from the first byte (the cumulative sum), and then step 
through the bits of the second byte (the mask) and either find the offset index 
of the non-null field or find that the field is null.

This supports up to 255 non-null fields per binary object.

The overhead would be only 24 bytes per 100 null fields instead of 200 bytes for 
a medium-sized object.
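
The lookup in the second optimization can be sketched as below. This is one reading of the layout, not a binary format specification, and it rests on two assumptions: the least significant bit of each mask corresponds to the first field of its group, and the byte after each mask holds the count of non-null fields up to and including that group. The names {{NullMaskFooterSketch}}, {{buildHeader}} and {{offsetIndex}} are hypothetical.

```java
/** Hypothetical sketch of the bit mask footer header; layout details are assumptions. */
final class NullMaskFooterSketch {
    /** Builds the header: for each group of 8 fields, a mask byte and a running count byte. */
    static byte[] buildHeader(boolean[] nonNull) {
        int groups = (nonNull.length + 7) / 8;
        byte[] hdr = new byte[groups * 2];
        int cnt = 0;

        for (int g = 0; g < groups; g++) {
            int mask = 0;

            for (int bit = 0; bit < 8 && g * 8 + bit < nonNull.length; bit++) {
                if (nonNull[g * 8 + bit]) {
                    mask |= 1 << bit;
                    cnt++;
                }
            }

            hdr[2 * g] = (byte)mask;
            hdr[2 * g + 1] = (byte)cnt; // One byte, hence at most 255 non-null fields.
        }

        return hdr;
    }

    /** Returns the index into the offsets table, or -1 if the field is null. */
    static int offsetIndex(byte[] hdr, int fieldIdx) {
        int g = fieldIdx / 8;
        int bit = fieldIdx % 8;
        int mask = hdr[2 * g] & 0xFF;

        if ((mask & (1 << bit)) == 0)
            return -1;

        // Non-null fields in all previous groups, taken from the preceding count byte.
        int before = g == 0 ? 0 : hdr[2 * g - 1] & 0xFF;

        // Plus non-null fields in this group that come before the requested one.
        return before + Integer.bitCount(mask & ((1 << bit) - 1));
    }
}
```

A lookup touches at most two header bytes, the count byte of the previous group and the mask of the current one, regardless of how many fields the schema has.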





[jira] [Created] (IGNITE-6491) Race in TopologyValidator.validate() and EVT_NODE_LEFT listener calls (split-brain activator)

2017-09-25 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6491:
--

 Summary: Race in TopologyValidator.validate() and EVT_NODE_LEFT 
listener calls (split-brain activator)
 Key: IGNITE-6491
 URL: https://issues.apache.org/jira/browse/IGNITE-6491
 Project: Ignite
  Issue Type: Bug
  Components: cache, general
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin
 Fix For: 2.2


The following incorrect cache {{validate}}/{{put}} sequence may occur.

When a node leaves, a {{GridDhtPartitionsExchangeFuture}} is created by the 
{{disco-event-worker}} thread.

Then the {{exchange-worker}} thread does

{noformat}
Split-brain detected [cacheName=test40, activatorTopVer=0, cacheTopVer=14]
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141)
at 
org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator.validate(IgniteTopologyValidatorGridSplitCacheTest.java:307)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCacheGroup(GridDhtTopologyFutureAdapter.java:64)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1456)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:115)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:450)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:668)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2278)
{noformat}

The result of the validation is stored in {{grpValidRes}} with a value of 
{{false}}.

After some delay the {{disco-event-worker}} thread does

{noformat}
java.lang.Exception: Node is segment activator [cacheName=test40, 
activatorTopVer=14]
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141)
at 
org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:360)
at 
org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:349)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$UserListenerWrapper.onEvent(GridEventStorageManager.java:1463)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:859)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:844)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:341)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:307)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2478)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2684)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2507)
{noformat}

After this invocation the result of {{SplitAwareTopologyValidator.validate}} 
should change to {{true}}, but the method has already been invoked and its 
result has been cached in {{grpValidRes}} with the value of {{false}}.

So any subsequent call to {{cache.put}} fails

{noformat}
Test failed.
java.lang.RuntimeException: tryPut() failed 
[gridName=cache.IgniteTopologyValidatorGridSplitCacheTest0]
at 
org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.tryPut(IgniteTopologyValidatorGridSplitCacheTest.java:262)
at 
org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.testTopologyValidator(IgniteTopologyValidatorGridSplitCacheTest.java:182)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176)
at 

[jira] [Created] (IGNITE-6347) Exception in GridDhtPartitionMap.readExternal

2017-09-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6347:
--

 Summary: Exception in GridDhtPartitionMap.readExternal
 Key: IGNITE-6347
 URL: https://issues.apache.org/jira/browse/IGNITE-6347
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
 Fix For: 2.1


Reading a partition state with {{id > Short.MAX_VALUE}} yields a negative 
value in {{int part = in.readShort()}}

{{in.readUnsignedShort()}} should be used instead.
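
A small self-contained illustration of the difference using the standard {{java.io}} data streams. It assumes the value was written with {{writeShort}}; the class name {{UnsignedShortSketch}} is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of signed versus unsigned 16-bit reads. */
final class UnsignedShortSketch {
    /** Writes the value as 16 bits and returns {signed read, unsigned read}. */
    static int[] roundTrip(int part) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();

            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeShort(part); // Only the low 16 bits are written.
            }

            byte[] data = bytes.toByteArray();
            int signed;
            int unsigned;

            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
                signed = in.readShort();
            }

            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
                unsigned = in.readUnsignedShort();
            }

            return new int[] {signed, unsigned};
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a value such as 40000 the signed read comes back negative, while the unsigned read returns the original value.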





[jira] [Created] (IGNITE-5798) Logging Ignite configuration at startup

2017-07-21 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-5798:
--

 Summary: Logging Ignite configuration at startup
 Key: IGNITE-5798
 URL: https://issues.apache.org/jira/browse/IGNITE-5798
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexandr Kuramshin
 Fix For: 2.1


I've found that IgniteConfiguration is not logged even when 
-DIGNITE_QUIET=false is set.

When we start Ignite with a path to the XML or an InputStream, we have to 
ensure that all configuration options were read properly. We would also like to 
know the actual values of configuration properties that were not set explicitly 
(default values), which are assigned only after Ignite has started.

Monitoring tools, like Visor or WebConsole, do not show all configuration 
options. Even if they are updated to show all properties, they will need another 
update whenever new configuration options appear.

Logging IgniteConfiguration at startup makes it possible to ensure that the 
right grid configuration has been applied and leads to better user support 
based on log analysis.





[jira] [Created] (IGNITE-5750) Format of uptime for metrics

2017-07-13 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-5750:
--

 Summary: Format of uptime for metrics
 Key: IGNITE-5750
 URL: https://issues.apache.org/jira/browse/IGNITE-5750
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.0
Reporter: Alexandr Kuramshin
Priority: Trivial
 Fix For: 2.1


Metrics for the local node show uptime formatted as 00:00:00:000

But the last colon should be a dot.

The correct format is 00:00:00.000
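
A minimal sketch of producing that format from a millisecond uptime. {{UptimeFormatSketch}} is a hypothetical name and this is not the formatting code the metrics use.

```java
import java.util.Locale;

/** Hypothetical sketch of the expected uptime format. */
final class UptimeFormatSketch {
    /** Formats a duration in milliseconds as hh:mm:ss.SSS. */
    static String format(long ms) {
        long hours = ms / 3_600_000;
        long minutes = ms / 60_000 % 60;
        long seconds = ms / 1_000 % 60;
        long millis = ms % 1_000;

        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d", hours, minutes, seconds, millis);
    }
}
```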





[jira] [Created] (IGNITE-5251) Some JVM implementations may return null from getClassLoader()

2017-05-18 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-5251:
--

 Summary: Some JVM implementations may return null from 
getClassLoader()
 Key: IGNITE-5251
 URL: https://issues.apache.org/jira/browse/IGNITE-5251
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.0
 Environment: OpenJDK Runtime Environment (build 1.8.0_131-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
Reporter: Alexandr Kuramshin
 Fix For: 2.1


Starting an Ignite instance causes an NPE

{noformat}
java.lang.NullPointerException
at 
org.apache.ignite.internal.util.IgniteUtils.appendClassLoaderHash(IgniteUtils.java:4438)
at 
org.apache.ignite.internal.util.IgniteUtils.makeMBeanName(IgniteUtils.java:4418)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.registerFactoryMbean(IgnitionEx.java:2499)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1801)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1604)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1041)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:568)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:515)
at org.apache.ignite.Ignition.start(Ignition.java:322)
{noformat}

{{IgniteUtils.getClassLoader(Class cls)}} should be implemented; it checks 
{{cls.getClassLoader()}} and, if that is null, returns 
{{ClassLoader.getSystemClassLoader()}}.

All usages of {{Class.getClassLoader()}} should be replaced with 
{{IgniteUtils.getClassLoader()}}.
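
The suggested helper could look roughly like this. It is a sketch in a hypothetical {{ClassLoaderSketch}} class, not the actual utility method.

```java
/** Hypothetical sketch of the suggested helper. */
final class ClassLoaderSketch {
    /** Returns the class loader of the class, falling back to the system class loader. */
    static ClassLoader getClassLoader(Class<?> cls) {
        ClassLoader ldr = cls.getClassLoader();

        // Class.getClassLoader() may return null, for example for bootstrap classes.
        return ldr != null ? ldr : ClassLoader.getSystemClassLoader();
    }
}
```

For a bootstrap class such as {{String.class}} the helper returns the system class loader instead of null.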



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (IGNITE-5084) PagesList.put() assertion: pageId != tailId

2017-04-26 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-5084:
--

 Summary: PagesList.put() assertion: pageId != tailId
 Key: IGNITE-5084
 URL: https://issues.apache.org/jira/browse/IGNITE-5084
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.0
Reporter: Alexandr Kuramshin


An error occurs during rebalancing on a topology update

{noformat}
Failed processing message [senderId=78a8f841-5d40-4ac7-b26b-f1b5e7f3faa0, 
msg=GridDhtPartitionSupplyMessageV2 [updateSeq=142, 
topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0], missed=null, 
clean=[0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 19, 21, 20, 23, 22, 24, 
26, 29, 28, 31, 35, 32, 33, 39, 36, 37, 42, 43, 40, 41, 44, 45, 51, 48, 55, 54, 
53, 52, 58, 56, 63, 62, 60, 68, 69, 65, 66, 76, 77, 78, 74, 85, 87, 86, 81, 80, 
82, 92, 91, 90, 98, 96, 97], msgSize=0, size=67, parts=[0, 1, 2, 3, 4, 5, 6, 7, 
8, 11, 12, 13, 14, 17, 19, 21, 20, 23, 22, 24, 26, 29, 28, 31, 35, 32, 33, 39, 
36, 37, 42, 43, 40, 41, 44, 45, 51, 48, 55, 54, 53, 52, 58, 56, 63, 62, 60, 68, 
69, 65, 66, 76, 77, 78, 74, 85, 87, 86, 81, 80, 82, 92, 91, 90, 98, 96, 97], 
super=GridCacheMessage [msgId=100460, depInfo=null, err=null, 
skipPrepare=false, cacheId=-2100569601, cacheId=-2100569601]]]
java.lang.AssertionError: pageId = 0, tailId = 281556581089286
at 
org.apache.ignite.internal.processors.cache.database.freelist.PagesList.put(PagesList.java:~)
{noformat}





[jira] [Created] (IGNITE-5026) getOrCreateCaches() hangs if any exception in GridDhtPartitionsExchangeFuture.init()

2017-04-19 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-5026:
--

 Summary: getOrCreateCaches() hangs if any exception in 
GridDhtPartitionsExchangeFuture.init()
 Key: IGNITE-5026
 URL: https://issues.apache.org/jira/browse/IGNITE-5026
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 1.9, 2.0
Reporter: Alexandr Kuramshin
 Fix For: 2.1


Any exception thrown by {{GridDhtPartitionsExchangeFuture.init()}} causes an 
indefinite wait on the {{GridCompoundFuture}} returned by 
{{GridCacheProcessor.dynamicStartCaches()}}.

Reproduced by 
{{IgniteDynamicCacheStartSelfTest.testGetOrCreateCollectionExceptional()}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (IGNITE-4865) Non-informative error message on using GridClientOptimizedMarshaller with unknown task classes

2017-03-27 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4865:
--

 Summary: Non-informative error message on using 
GridClientOptimizedMarshaller with unknown task classes
 Key: IGNITE-4865
 URL: https://issues.apache.org/jira/browse/IGNITE-4865
 Project: Ignite
  Issue Type: Improvement
  Components: rest
Affects Versions: 2.0
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


On {{GridClientCompute.execute()}} I get a non-informative error if the task 
class is not present in {{classnames.properties}}. It occurs when 
{{GridClient}} is configured to use {{GridClientOptimizedMarshaller}}.

{noformat}
Closing NIO session because of unhandled exception [cls=class 
o.a.i.i.util.nio.GridNioException, msg=class o.a.i.IgniteCheckedException: 
Failed to deserialize object with given class loader: null]
{noformat}

There are two problems:
* The actual problem is hidden:
{noformat}
Caused by: java.lang.UnsupportedOperationException
at 
org.apache.ignite.internal.client.marshaller.optimized.GridClientOptimizedMarshaller$ClientMarshallerContext.className(GridClientOptimizedMarshaller.java:137)
at 
org.apache.ignite.internal.MarshallerContextAdapter.getClass(MarshallerContextAdapter.java:174)
at 
org.apache.ignite.marshaller.optimized.OptimizedMarshallerUtils.classDescriptor(OptimizedMarshallerUtils.java:266)
at 
org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:318)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
{noformat}
* Even after reading the cause, it is not clear what is wrong.

What to do:
* Log the stack trace every time.
* Throw UnsupportedOperationException with an informative message.
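
As a rough illustration of the second point, the sketch below throws an {{UnsupportedOperationException}} that names the unresolved type ID and hints at the likely cause. The class {{ClassNameLookupSketch}}, its methods and the message wording are assumptions made for this example; they are not the actual {{GridClientOptimizedMarshaller}} code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a marshaller context; not the Ignite class. */
class ClassNameLookupSketch {
    /** Known type IDs, standing in for the contents of classnames.properties. */
    private final Map<Integer, String> known = new HashMap<>();

    /** Registers a class name for a type ID. */
    ClassNameLookupSketch register(int typeId, String clsName) {
        known.put(typeId, clsName);

        return this;
    }

    /** Resolves a class name or fails with a message that explains what is missing. */
    String className(int typeId) {
        String clsName = known.get(typeId);

        if (clsName == null) {
            throw new UnsupportedOperationException(
                "Failed to resolve class name [typeId=" + typeId + "]. " +
                "The class is not registered; check that it is listed in classnames.properties.");
        }

        return clsName;
    }
}
```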





[jira] [Created] (IGNITE-4767) rollback exception hides the origin exception (e.g. commit)

2017-03-02 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4767:
--

 Summary: rollback exception hides the origin exception (e.g. 
commit)
 Key: IGNITE-4767
 URL: https://issues.apache.org/jira/browse/IGNITE-4767
 Project: Ignite
  Issue Type: Bug
  Components: cache, general
Affects Versions: 1.8
Reporter: Alexandr Kuramshin
 Fix For: 2.0


There are too many places in the code like:
{noformat}
try {
    return txFuture.get();
}
catch (IgniteCheckedException e) {
    tx.rollbackAsync();

    throw e;
}
{noformat}
where an error upon rollback hides the actual exception {{e}}.

This should be implemented the way try-with-resources does it, attaching the 
rollback failure to the original exception as a suppressed exception:
{noformat}
try {
    return txFuture.get();
}
catch (IgniteCheckedException e) {
    try {
        tx.rollbackAsync();
    }
    catch (Throwable inner) {
        // Keep the rollback failure without hiding the original exception.
        e.addSuppressed(inner);
    }

    throw e;
}
{noformat}
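
A self-contained variant of this pattern, runnable without Ignite, might look as follows. The class {{RollbackSuppressedSketch}} and the {{Runnable}} parameters are placeholders for the real commit and rollback calls, chosen only for illustration.

```java
/** Hypothetical sketch of the suppressed-exception pattern; not Ignite code. */
class RollbackSuppressedSketch {
    /**
     * Runs the commit step; on failure runs the rollback step and rethrows
     * the commit failure with any rollback failure attached as suppressed.
     */
    static void commitOrRollback(Runnable commit, Runnable rollback) {
        try {
            commit.run();
        }
        catch (RuntimeException e) {
            try {
                rollback.run();
            }
            catch (Throwable inner) {
                // The rollback failure is recorded but does not replace e.
                e.addSuppressed(inner);
            }

            throw e;
        }
    }
}
```

The caller still catches the commit failure and can inspect the rollback failure through {{Throwable.getSuppressed()}}.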






[jira] [Created] (IGNITE-4632) AffinityFunction unchecked exception handling (unassigned backup)

2017-01-31 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4632:
--

 Summary: AffinityFunction unchecked exception handling (unassigned 
backup)
 Key: IGNITE-4632
 URL: https://issues.apache.org/jira/browse/IGNITE-4632
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 1.8
Reporter: Alexandr Kuramshin
Priority: Minor


An {{AffinityFunction}} implementation may throw an unchecked exception upon 
assignment. In some cases additional processing should be performed when an 
affinity function method invocation throws an exception.

A special case is when a cache with backups is running and a node holding a 
primary partition leaves. The primary partition then remains unassigned if 
{{AffinityFunction.partition(Object)}} throws an exception. My suggestion is to 
shut down the node in such a case (as with SEGMENTED), because the cluster 
cannot work normally without the primary partition assigned.

{noformat}
Failed processing message [senderId=8a1ab9a3-786e-4601-ba22-efd380849d99, 
msg=GridDhtPartitionSupplyMessageV2 [updateSeq=16069, 
topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], missed=[16, 17, 33, 
22, 56, 10], clean=[0, 1, 2, 34, 3, 5, 7, 9, 45, 46, 49, 18, 50, 55, 25, 26, 
58, 29, 61], msgSize=0, size=19, parts=[0, 1, 2, 34, 3, 5, 7, 9, 45, 46, 49, 
18, 50, 55, 25, 26, 58, 29, 61], super=GridCacheMessage [msgId=70098615, 
depInfo=null, err=null, skipPrepare=false, cacheId=-148990687, 
cacheId=-148990687]]]
com.sbt.persistence.exceptions.DPLException: ParticleKeyMapper cannot process 
any objects other than ОУ. System error - contact the DPL technical support 
service
 at 
com.sbt.dpl.gridgain.ParticleAffinityFunction.partition(ParticleAffinityFunction.java:67)
 at 
org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.partition(GridCacheAffinityManager.java:219)
 at 
org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.partition(GridCacheAffinityManager.java:194)
 at 
org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.localNode(GridCacheAffinityManager.java:382)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.handleSupplyMessage(GridDhtPartitionDemander.java:680)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleSupplyMessage(GridDhtPreloader.java:390)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:395)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:385)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:758)
{noformat}





[jira] [Created] (IGNITE-4538) BinaryObjectImpl: lack of context information upon deserialization

2017-01-10 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4538:
--

 Summary: BinaryObjectImpl: lack of context information upon 
deserialization
 Key: IGNITE-4538
 URL: https://issues.apache.org/jira/browse/IGNITE-4538
 Project: Ignite
  Issue Type: Improvement
  Components: binary
Affects Versions: 1.8, 1.7
Reporter: Alexandr Kuramshin


When an error occurs, we do not know which cache was accessed, which type the 
BinaryClassDescriptor was used for, or which entry was accessed (the key of the 
entry should be logged subject to the *include sensitive* system property).

Such context information should be added by wrapping the inner exception at 
every key stack frame.
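
A minimal sketch of such wrapping is shown below. The helper {{ContextWrapSketch.withContext}} and the context strings are hypothetical and only illustrate how each layer could add its own details while keeping the original exception as the cause. A real implementation would also have to check the *include sensitive* setting before adding a key to the message.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of layered exception context; not Ignite code. */
class ContextWrapSketch {
    /**
     * Runs the body and, on failure, rethrows with the given context
     * in the message and the original exception as the cause.
     */
    static <T> T withContext(String ctx, Supplier<T> body) {
        try {
            return body.get();
        }
        catch (RuntimeException e) {
            throw new IllegalStateException("Failed to deserialize [" + ctx + ']', e);
        }
    }
}
```

Nesting calls, for example one at the cache level and one at the type level, produces a cause chain in which every frame names what it was working on.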

{noformat}
org.apache.ignite.binary.BinaryObjectException: Unexpected flag value [pos=24, 
expected=4, actual=9]
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.checkFlagNoHandles(BinaryReaderExImpl.java:1423)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.readLongNullable(BinaryReaderExImpl.java:723)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.readFixedType(BinaryFieldAccessor.java:677)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read(BinaryFieldAccessor.java:639)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:818)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1481)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryObjectImpl.deserializeValue(BinaryObjectImpl.java:717)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryObjectImpl.value(BinaryObjectImpl.java:143)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.CacheObjectContext.unwrapBinary(CacheObjectContext.java:272)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.CacheObjectContext.unwrapBinaryIfNeeded(CacheObjectContext.java:160)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.CacheObjectContext.unwrapBinaryIfNeeded(CacheObjectContext.java:147)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.GridCacheContext.unwrapBinaryIfNeeded(GridCacheContext.java:1706)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$PeekValueExpiryAwareIterator.advance(GridCacheQueryManager.java:2875)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$PeekValueExpiryAwareIterator.<init>(GridCacheQueryManager.java:2814)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$PeekValueExpiryAwareIterator.<init>(GridCacheQueryManager.java:2752)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$5.<init>(GridCacheQueryManager.java:863)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.scanIterator(GridCacheQueryManager.java:863)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.scanQueryLocal(GridCacheQueryManager.java:1436)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryAdapter.executeScanQuery(GridCacheQueryAdapter.java:552)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.igniteIterator(GridCacheAdapter.java:4115)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.igniteIterator(GridCacheAdapter.java:4092)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxy.iterator(IgniteCacheProxy.java:1979)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (IGNITE-4533) GridDhtPartitionsExchangeFuture stores unnecessary messages after processing done

2017-01-10 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4533:
--

 Summary: GridDhtPartitionsExchangeFuture stores unnecessary 
messages after processing done
 Key: IGNITE-4533
 URL: https://issues.apache.org/jira/browse/IGNITE-4533
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 1.8, 1.7
Reporter: Alexandr Kuramshin


After a GridDhtPartitionsExchangeFuture has completed, 
GridCachePartitionExchangeManager still stores it in the field 
ExchangeFutureSet exchFuts (to handle race conditions).

But the many GridDhtPartitionsSingleMessage objects stored in the field 
ConcurrentMap msgs are not needed after the future has been processed.

This map should be cleared at the end of the method onAllReceived().





[jira] [Created] (IGNITE-4496) Review all logging for sensitive data leak

2016-12-26 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4496:
--

 Summary: Review all logging for sensitive data leak
 Key: IGNITE-4496
 URL: https://issues.apache.org/jira/browse/IGNITE-4496
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


Although the sensitive logging option has been added and the toString() methods 
have been fixed, not all logging has been checked for sensitive data leaks.





[jira] [Created] (IGNITE-4485) CacheJdbcPojoStore returns unexpected BinaryObject upon loadCache()

2016-12-22 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4485:
--

 Summary: CacheJdbcPojoStore returns unexpected BinaryObject upon 
loadCache()
 Key: IGNITE-4485
 URL: https://issues.apache.org/jira/browse/IGNITE-4485
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 1.8, 1.7
Reporter: Alexandr Kuramshin


When calling loadCache(IgniteBiInClosure clo, Object... args), we sometimes get 
unexpected values of type BinaryObject in IgniteBiInClosure.apply(), even 
though the POJO value kind was previously registered for a well-known key type.

This happens because getOrCreateCacheMappings returns a HashMap, which reorders 
the entity mappings for the same key with different value kinds. When 
BinaryMarshaller is used, this map contains two mappings for the same key: 
POJO and BINARY.

A possible fix is to use LinkedHashMap, so that the POJO mapping is picked 
first.
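
The ordering guarantee behind the proposed fix can be shown with a small sketch. The map keys and the class {{MappingOrderSketch}} are made up for illustration and do not reflect the real structure returned by getOrCreateCacheMappings; the point is only that LinkedHashMap iterates in insertion order, while HashMap makes no such promise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of mapping order; not the CacheJdbcPojoStore code. */
class MappingOrderSketch {
    /** Builds mappings in registration order: POJO first, then BINARY. */
    static Map<String, String> mappings() {
        // LinkedHashMap iterates in insertion order, unlike HashMap.
        Map<String, String> m = new LinkedHashMap<>();

        m.put("Person/POJO", "POJO");
        m.put("Person/BINARY", "BINARY");

        return m;
    }

    /** Returns the value kind of the first mapping found during iteration. */
    static String firstKind(Map<String, String> m) {
        return m.values().iterator().next();
    }
}
```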





[jira] [Created] (IGNITE-4417) OptimizedMarshaller: show property path causing ClassNotFoundException

2016-12-12 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4417:
--

 Summary: OptimizedMarshaller: show property path causing 
ClassNotFoundException
 Key: IGNITE-4417
 URL: https://issues.apache.org/jira/browse/IGNITE-4417
 Project: Ignite
  Issue Type: Improvement
  Components: general
Reporter: Alexandr Kuramshin
Priority: Minor


When OptimizedMarshaller cannot unmarshal an object on the remote side because 
of a ClassNotFoundException, an IgniteCheckedException is thrown.

The stack trace shows the class loader's toString() value and the name of the 
class that was not found. This information is insufficient.

We should also know which field or property of the object causes the 
ClassNotFoundException. And if that object is nested inside another object, we 
should know the type of the enclosing object and its field or property as well.

For example: IgniteCheckedException: Failed to unmarshal an object ClassName1 
root.ClassName2 fieldName2.ClassName3 propName3. Given class loader: 
classLoaderToString.
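
One way to collect such a path is to keep a stack of field names while descending into the object graph. The sketch below is only an illustration under simplifying assumptions: nested objects are modelled as maps from field name to value, the marker class {{Missing}} simulates a field whose class cannot be loaded, and all names ({{FieldPathSketch}}, {{read}}) and the message format are hypothetical.

```java
import java.util.Deque;
import java.util.Map;

/** Hypothetical sketch of field path tracking; not OptimizedMarshaller code. */
class FieldPathSketch {
    /** Marker value that simulates a field whose class cannot be loaded. */
    static final class Missing {
        final String clsName;

        Missing(String clsName) {
            this.clsName = clsName;
        }
    }

    /** Walks nested maps (field name to value) and reports the path of a failing field. */
    static void read(Object val, Deque<String> path) {
        if (val instanceof Missing) {
            throw new IllegalStateException("Failed to unmarshal object, class not found: " +
                ((Missing)val).clsName + " [path=" + String.join(".", path) + ']');
        }

        if (val instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>)val).entrySet()) {
                // Push the field name before descending, pop it after a successful read.
                path.addLast(String.valueOf(e.getKey()));

                read(e.getValue(), path);

                path.removeLast();
            }
        }
    }
}
```

Calling {{read(root, new ArrayDeque<>())}} on a graph where the field propName3 inside fieldName2 is unresolvable yields a message ending in {{[path=fieldName2.propName3]}}.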





[jira] [Created] (IGNITE-4245) Get EXCEPTION_ACCESS_VIOLATION with OFFHEAP_TIERED cache

2016-11-18 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4245:
--

 Summary: Get EXCEPTION_ACCESS_VIOLATION with OFFHEAP_TIERED cache
 Key: IGNITE-4245
 URL: https://issues.apache.org/jira/browse/IGNITE-4245
 Project: Ignite
  Issue Type: Bug
Affects Versions: 1.7, 1.6, 1.8
Reporter: Alexandr Kuramshin


Get EXCEPTION_ACCESS_VIOLATION while iterating through local cache entries 
stored in an OFFHEAP_TIERED cache.

The test class and log are attached.

I've tried the same test on versions 1.6.11, 1.7.4 and 1.8.


