[jira] [Created] (IGNITE-7788) Data loss after cold restart with PDS and cache group change

2018-02-22 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7788:
--

 Summary: Data loss after cold restart with PDS and cache group 
change
 Key: IGNITE-7788
 URL: https://issues.apache.org/jira/browse/IGNITE-7788
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Reproduced by the improved test 
{{IgnitePdsCacheRestoreTest.testRestoreAndNewCache6}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7723) Data loss after node restart with PDS

2018-02-15 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7723:
--

 Summary: Data loss after node restart with PDS
 Key: IGNITE-7723
 URL: https://issues.apache.org/jira/browse/IGNITE-7723
 Project: Ignite
  Issue Type: Bug
  Components: general, persistence
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
 Attachments: IgnitePdsDataLossTest.java

A split-brain scenario with a topology validator is used to demonstrate possible data 
loss. The same result may be achieved with accidental network problems combined 
with a node restart.

See the reproducer {{IgnitePdsDataLossTest}} for details.
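
For context, a minimal sketch (not the attached {{IgnitePdsDataLossTest}}) of how a 
{{TopologyValidator}} can be wired into a cache configuration to reject updates when 
the visible topology shrinks, which is the mechanism used to simulate the split; the 
expected node count is illustrative.

{noformat}
import java.util.Collection;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.TopologyValidator;

public class MinSizeTopologyValidator implements TopologyValidator {
    private static final int EXPECTED_NODES = 4; // illustrative cluster size

    // Reject cache updates once fewer than half of the expected nodes are visible.
    @Override public boolean validate(Collection<ClusterNode> nodes) {
        return nodes.size() > EXPECTED_NODES / 2;
    }

    static CacheConfiguration<Integer, String> cacheCfg() {
        return new CacheConfiguration<Integer, String>("data")
            .setTopologyValidator(new MinSizeTopologyValidator());
    }
}
{noformat}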



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7634) Wrong NodeStoppingException on destroying cache

2018-02-06 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7634:
--

 Summary: Wrong NodeStoppingException on destroying cache
 Key: IGNITE-7634
 URL: https://issues.apache.org/jira/browse/IGNITE-7634
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Multiple {{NodeStoppingException}} errors are reported for concurrent cache operations 
when what is actually happening is that the cache is being destroyed:

{noformat}
Error during parallel index create/rebuild.
org.apache.ignite.internal.NodeStoppingException: Operation has been cancelled 
(node is stopping).
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:393)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$RebuldIndexFromHashClosure.apply(IgniteH2Indexing.java:2635)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateIndex(GridCacheMapEntry.java:3305)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processKey(SchemaIndexCacheVisitorImpl.java:243)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartition(SchemaIndexCacheVisitorImpl.java:206)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartitions(SchemaIndexCacheVisitorImpl.java:165)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.access$100(SchemaIndexCacheVisitorImpl.java:50)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl$AsyncWorker.body(SchemaIndexCacheVisitorImpl.java:316)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
{noformat}
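
A rough sketch (cache name and loop bounds are illustrative, not from the reproducer) 
of the kind of concurrent usage that hits this: a loader thread keeps putting entries 
while the cache is destroyed, and the failure surfaces as a node-stopping error even 
though the node keeps running.

{noformat}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class DestroyCacheRace {
    public static void main(String[] args) throws Exception {
        Ignite ignite = Ignition.start();

        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("test");

        Thread loader = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                try {
                    cache.put(i, "value-" + i);
                }
                catch (Exception e) {
                    // Currently reported as NodeStoppingException, although only the
                    // cache was destroyed and the node keeps running.
                    e.printStackTrace();

                    break;
                }
            }
        });

        loader.start();

        ignite.destroyCache("test"); // concurrent destroy while puts are in flight

        loader.join();
        ignite.close();
    }
}
{noformat}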



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7633) Multiple errors on accessing page store while destroying cache

2018-02-06 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7633:
--

 Summary: Multiple errors on accessing page store while destroying 
cache
 Key: IGNITE-7633
 URL: https://issues.apache.org/jira/browse/IGNITE-7633
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


One common exception:
{noformat}
Partition eviction failed, this can cause grid hang.
org.apache.ignite.IgniteException: Failed to get page store for the given cache 
ID (cache has not been started): -1903385190
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.destroyCacheDataStore(IgniteCacheOffheapManagerImpl.java:931)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.destroyCacheDataStore(GridDhtLocalPartition.java:772)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.finishDestroy(GridDhtLocalPartition.java:730)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearEvicting(GridDhtLocalPartition.java:702)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:762)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.ignite.IgniteCheckedException: Failed to get page store 
for the given cache ID (cache has not been started): -1903385190
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getStore(FilePageStoreManager.java:670)
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.onPartitionDestroyed(FilePageStoreManager.java:268)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.destroyCacheDataStore0(GridCacheOffheapManager.java:494)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.destroyCacheDataStore(IgniteCacheOffheapManagerImpl.java:928)
... 12 common frames omitted
{noformat}
And multiple others like this, for many pages:
{noformat}
There was an exception while updating tracking page: 000119a20001
org.apache.ignite.IgniteCheckedException: Failed to get page store for the 
given cache ID (cache has not been started): -1903385190
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getStore(FilePageStoreManager.java:670)
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:290)
at 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:277)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:608)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:528)
at 
org.gridgain.grid.internal.processors.cache.database.GridCacheSnapshotManager.onChangeTrackerPage(GridCacheSnapshotManager.java:1921)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$9.applyx(GridCacheDatabaseSharedManager.java:966)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$9.applyx(GridCacheDatabaseSharedManager.java:959)
at 
org.apache.ignite.internal.util.lang.GridInClosure3X.apply(GridInClosure3X.java:34)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1274)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:419)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:413)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:304)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.destroyCacheDataStore0

[jira] [Created] (IGNITE-7632) NPE in IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.updateIgfsMetrics()

2018-02-05 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7632:
--

 Summary: NPE in 
IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.updateIgfsMetrics()
 Key: IGNITE-7632
 URL: https://issues.apache.org/jira/browse/IGNITE-7632
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Occurs when a cache is destroyed while index rebuilding is in progress:

{noformat}
Partition eviction failed, this can cause grid hang.
java.lang.NullPointerException: null
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.updateIgfsMetrics(IgniteCacheOffheapManagerImpl.java:1576)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1403)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1368)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1312)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:368)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3224)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:895)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:753)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
...
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7631) Failed to clear page memory with AssertionError: Release pinned page

2018-02-05 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7631:
--

 Summary: Failed to clear page memory with AssertionError: Release 
pinned page
 Key: IGNITE-7631
 URL: https://issues.apache.org/jira/browse/IGNITE-7631
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


The following scenario produces the problem:

# The cluster is started and activated.
# A snapshot is restored.
# Index rebuilding is in progress.
# Caches are destroyed.
# Multiple NPEs occur.
# The following exception occurs:

{noformat}
Failed to clear page memory
org.apache.ignite.IgniteCheckedException: Compound exception for 
CountDownFuture.
at 
org.apache.ignite.internal.util.future.CountDownFuture.addError(CountDownFuture.java:72)
at 
org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:46)
at 
org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:28)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2449)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=000100f40007, effectivePageId=00f40007, grpId=321390040]
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
... 3 common frames omitted
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=000200019986, effectivePageId=00019986, grpId=-1903385190]
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
... 3 common frames omitted
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=0002c85c, effectivePageId=c85c, grpId=-1903385190]
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
... 3 common frames omitted
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=000232da, effectivePageId=32da, grpId=321390040]
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
... 3 common frames omitted
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=000200011d30, effectivePageId=00011d30, grpId=-1903385190]
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
... 3 common frames omitted
Suppressed: java.lang.AssertionError: Release pinned page: FullPageId 
[pageId=0002d346, effectivePageId=d346, grpId=-1903385190

[jira] [Created] (IGNITE-7630) NPE in SchemaIndexCacheVisitorImpl.processKey()

2018-02-05 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7630:
--

 Summary: NPE in SchemaIndexCacheVisitorImpl.processKey()
 Key: IGNITE-7630
 URL: https://issues.apache.org/jira/browse/IGNITE-7630
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Occurs after a cache is destroyed while index rebuilding is in progress:

{noformat}
[Thread] parallel-idx-worker-GridDhtColocatedCache [...]
[Emitter] o.a.i.i.p.q.s.SchemaIndexCacheVisitorImpl$AsyncWorker
[Message]  Error during parallel index create/rebuild.
java.lang.NullPointerException: null
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processKey(SchemaIndexCacheVisitorImpl.java:246)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartition(SchemaIndexCacheVisitorImpl.java:206)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartitions(SchemaIndexCacheVisitorImpl.java:165)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.access$100(SchemaIndexCacheVisitorImpl.java:50)
at 
org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl$AsyncWorker.body(SchemaIndexCacheVisitorImpl.java:316)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7629) NPE when Finished indexes rebuilding for cache

2018-02-05 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7629:
--

 Summary: NPE when Finished indexes rebuilding for cache
 Key: IGNITE-7629
 URL: https://issues.apache.org/jira/browse/IGNITE-7629
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Occurs after a cache is destroyed while index rebuilding is in progress:

{noformat}
Runtime error caught during grid runnable execution: GridWorker 
[name=index-rebuild-worker, igniteInstanceName=DPL_GRID%DplGridNodeName, 
finished=false, hashCode=1940633631, interrupted=false, 
runner=pub-#2054%DPL_GRID%DplGridNodeName%]
java.lang.NullPointerException: null
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1163)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1159)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
at 
org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:125)
at 
org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor$3.body(GridQueryProcessor.java:1678)
...
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7579) NPE in GridDhtLocalPartition.cacheMapHolder()

2018-01-30 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7579:
--

 Summary: NPE in GridDhtLocalPartition.cacheMapHolder()
 Key: IGNITE-7579
 URL: https://issues.apache.org/jira/browse/IGNITE-7579
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


The following scenario may occur:
 # Multiple nodes form an inactive cluster.
 # Cluster activation is performed.
 # Some nodes fail activation.
 # On the other nodes, caches will be stopped.
 # An NPE occurs as a consequence of {{GridDhtPreloader.evictPartitionAsync()}}:
{noformat}
Partition eviction failed, this can cause grid hang.
java.lang.NullPointerException: null
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.cacheMapHolder(GridDhtLocalPartition.java:253)
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:880)
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:753)
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
    at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
    at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
{noformat}
# Drop the failed nodes from the cluster.
# A subsequent activation attempt succeeds.
# The PDS appears to be corrupted as a consequence of the NPE.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7383) Failed to restore memory after cluster restart and activating from outdated node

2018-01-10 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7383:
--

 Summary: Failed to restore memory after cluster restart and 
activating from outdated node
 Key: IGNITE-7383
 URL: https://issues.apache.org/jira/browse/IGNITE-7383
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


Do the following steps to reproduce the problem (a rough test-style sketch follows the step list):

1) start nodes 0-1-2

2) stop node 2

3) create a new cache and put some data into it

4) stop remaining nodes 0-1

5) start nodes 0-1-2

6) activate the cluster from node 2
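
A rough test-style sketch of these steps (persistence setup and assertions are omitted; 
the cache name and entry count are illustrative; activation is done with 
{{Ignite.active(true)}} as in 2.3):

{noformat}
import org.apache.ignite.IgniteCache;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;

public class PdsActivateFromOutdatedNodeTest extends GridCommonAbstractTest {
    /** Rough reproducer of steps 1-6; persistence setup in getConfiguration() is omitted. */
    public void testActivateFromOutdatedNode() throws Exception {
        startGrids(3);                // 1) start nodes 0-1-2
        grid(0).active(true);         //    (activation is needed before creating caches with PDS)

        stopGrid(2);                  // 2) stop node 2

        IgniteCache<Integer, Integer> cache = grid(0).getOrCreateCache("new-cache");

        for (int i = 0; i < 100; i++) // 3) create a new cache and put some data into it
            cache.put(i, i);

        stopGrid(0);                  // 4) stop remaining nodes 0-1
        stopGrid(1);

        startGrids(3);                // 5) start nodes 0-1-2

        grid(2).active(true);         // 6) activate the cluster from node 2
    }
}
{noformat}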

Then two different results can be observed depending on which node is the coordinator:

a) node 2 is a coordinator:

{noformat}
Failed to activate node components 
[nodeId=42d762c7-b1e0-4283-939b-aeeb3c70, client=false, 
topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]
class org.apache.ignite.IgniteCheckedException: Failed to find cache group 
descriptor [grpId=3119]
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1602)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1544)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:570)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:820)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:583)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
{noformat}

and activation fails.

b) node 2 is NOT a coordinator:

we will get the same error as in the previous case, but the activation process will 
not fail, and then we will get "Failed to wait PME" after a number of 
assertions:

{noformat}
Failed to process message [senderId=a940742f-bf17-41b4-bfc2-728bee72, 
messageType=class 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
java.lang.AssertionError: -2100569601
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:733)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:2877)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:1935)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:116)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1810)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1798)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1798)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1484)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:131)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:327)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:307)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2627)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager

Re: IGNITE-7135 needs review

2017-12-27 Thread Alexandr Kuramshin
The ticket was assigned to me until it reached the Patch Available state.

So far no committer has taken responsibility to review and merge the PR.

2017-12-26 21:08 GMT+07:00 Dmitriy Setrakyan <dsetrak...@apache.org>:

> Here is the link to the ticket:
> https://issues.apache.org/jira/browse/IGNITE-7135
>
> For some odd reason, the ticket is in unassigned state. Alexander,
> shouldn't it be assigned to you?
>
> D.
>
> On Mon, Dec 25, 2017 at 11:58 PM, Alexandr Kuramshin <ein.nsk...@gmail.com
> >
> wrote:
>
> > Hello community!
> >
> > I've implemented IGNITE-7135 doing two improvements:
> >
> > 1) control remote node startup (successful or not) through
> > IgniteCluster.startNodes();
> >
> > 2) keep the first Java principle working "Compile once, run everywhere" -
> > from now running remotely on Windows also supported.
> >
> > Committers, please review.
> >
> > --
> > Thanks,
> > Alexandr Kuramshin
> >
>



-- 
Thanks,
Alexandr Kuramshin


IGNITE-7135 needs review

2017-12-25 Thread Alexandr Kuramshin
Hello community!

I've implemented IGNITE-7135 with two improvements:

1) report remote node startup status (successful or not) through
IgniteCluster.startNodes();

2) keep the Java principle "compile once, run everywhere" working: starting
nodes remotely on Windows is now also supported.

Committers, please review.

-- 
Thanks,
Alexandr Kuramshin


[jira] [Created] (IGNITE-7163) Validate connection from a pre-previous node

2017-12-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7163:
--

 Summary: Validate connection from a pre-previous node
 Key: IGNITE-7163
 URL: https://issues.apache.org/jira/browse/IGNITE-7163
 Project: Ignite
  Issue Type: Sub-task
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


If a pre-previous node connects to the local node with the previous node listed in 
the message's failed-nodes collection, additional steps should be taken:

# The connection with the previous node should be validated.
# If no message has been received from the previous node for a long time, the 
previous node should be considered failed and the pre-previous node's 
connection accepted.
# If the connection with the previous node is alive, then different scenarios are possible:
## Answer with a new result code, causing the pre-previous node to try to 
reconnect to the previous node.
## Break the connection with the pre-previous node, which lets the possible 
cluster split continue.
## Check connections with the nodes after the pre-previous node and delay the decision by 
answering RES_WAIT, to get a more predictable split and a stable topology.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-7162) Control discovery messages processing time

2017-12-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7162:
--

 Summary: Control discovery messages processing time
 Key: IGNITE-7162
 URL: https://issues.apache.org/jira/browse/IGNITE-7162
 Project: Ignite
  Issue Type: Sub-task
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


The majority of discovery message processing occurs in a single thread.

If processing some message takes significant time, it delays the processing of 
other messages and causes further undesirable effects on other protocols.

It is proposed to track the processing time on every node and the total processing time 
of any given message, and to log a warning if processing takes significant time; a minimal 
sketch of such a check follows.
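
A minimal sketch of such a check (threshold, logging target and method names are 
illustrative):

{noformat}
class TimedMessageProcessor {
    private static final long WARN_THRESHOLD_MS = 100; // illustrative threshold

    void process(Object msg, Runnable handler) {
        long start = System.nanoTime();

        handler.run(); // the existing single-threaded processing of the discovery message

        long durationMs = (System.nanoTime() - start) / 1_000_000;

        if (durationMs > WARN_THRESHOLD_MS)
            System.err.println("Discovery message processing took too long [msg=" + msg +
                ", durationMs=" + durationMs + ']');
    }
}
{noformat}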



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-7161) Detect self-freeze on remote node related operations with timeout

2017-12-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7161:
--

 Summary: Detect self-freeze on remote node related operations with 
timeout
 Key: IGNITE-7161
 URL: https://issues.apache.org/jira/browse/IGNITE-7161
 Project: Ignite
  Issue Type: Sub-task
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


After getting the next timeout from 
{{IgniteSpiOperationTimeoutHelper.nextTimeoutChunk()}} we start a network 
operation and expect it to end by a specific timestamp (or close to it).

We should take into account that a local thread freeze may have occurred. In 
such a situation the remote node should not be considered failed, and the local 
network operation has to be retried.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-7160) Ignore messages from not alive and failed nodes

2017-12-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7160:
--

 Summary: Ignore messages from not alive and failed nodes
 Key: IGNITE-7160
 URL: https://issues.apache.org/jira/browse/IGNITE-7160
 Project: Ignite
  Issue Type: Sub-task
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


The current implementation of {{ServerImpl}} accepts and processes messages from 
any remote node, even one that has failed or been removed from the ring.

It is proposed to process only specific messages (those that must be processed in the 
current node state). Some messages could be silently ignored, while receiving other 
undesirable messages should cause the remote socket to be disconnected.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-7158) TCP discovery improvement

2017-12-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7158:
--

 Summary: TCP discovery improvement
 Key: IGNITE-7158
 URL: https://issues.apache.org/jira/browse/IGNITE-7158
 Project: Ignite
  Issue Type: Improvement
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


The current TCP discovery implementation has several drawbacks which should be 
fixed.

See the sub-tasks for details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-7152) Failure detection timeout doesn't work on permanent send message errors causing infinite loop

2017-12-08 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7152:
--

 Summary: Failure detection timeout doesn't work on permanent send 
message errors causing infinite loop
 Key: IGNITE-7152
 URL: https://issues.apache.org/jira/browse/IGNITE-7152
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Priority: Critical
 Fix For: 2.4


This relates to the {{RingMessageWorker.sendMessageAcrossRing}} implementation.

{{IgniteSpiOperationTimeoutHelper}} is reinitialized every time the socket is 
successfully connected.

If an {{IOException}} or {{IgniteCheckedException}} occurs upon message send, the 
socket will be closed and the old {{IgniteSpiOperationTimeoutHelper}} will be used 
to reconnect.

But after a successful reconnect a new helper will be created and the cycle 
repeats. With a permanent send-message error this causes an infinite loop.

The only send errors that cause the loop to exit and the next node to be marked 
as failed are {{IgniteSpiOperationTimeoutException}}, {{SocketTimeoutException}} 
and {{SocketException}}.
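
A simplified illustration of the problematic control flow (not the actual 
{{RingMessageWorker}} code; the socket and send methods are placeholders):

{noformat}
// Simplified control flow; openSocket/writeMessage/closeSocket/newTimeoutHelper are placeholders.
while (true) {
    // A fresh timeout helper after every successful connect resets the failure-detection budget.
    timeoutHelper = newTimeoutHelper();

    try {
        sock = openSocket(addr, timeoutHelper);  // connect succeeds

        writeMessage(sock, msg, timeoutHelper);  // send permanently fails with IOException

        break;                                   // never reached
    }
    catch (IOException | IgniteCheckedException e) {
        closeSocket(sock);
        // Reconnect uses the old helper, but once it succeeds the loop starts over
        // with a new helper, so a permanent send error loops forever.
    }
}
{noformat}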



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-7135) IgniteCluster.startNodes() returns successful ClusterStartNodeResult even though the remote process fails

2017-12-07 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7135:
--

 Summary: IgniteCluster.startNodes() returns successful 
ClusterStartNodeResult even though the remote process fails
 Key: IGNITE-7135
 URL: https://issues.apache.org/jira/browse/IGNITE-7135
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
 Fix For: 2.4


After an unsuccessful start of three remote nodes with 
{{IgniteCluster#startNodes(Collection<Map<String,Object>>, Map<String,Object>, 
boolean, int, int)}} we get a {{Collection}} of three {{ClusterStartNodeResult}} 
elements, each reporting {{isSuccess()}} as true.

But the remote node startup log was
{noformat}
nohup: ignoring input
/data/teamcity/work/820be461cd64b574/bin/ignite.sh, ERROR:
The version of JAVA installed in JAVA_HOME=/usr/lib/jvm/java-9-oracle is 
incorrect.
Please point JAVA_HOME variable to installation of JDK 1.7 or JDK 1.8.
You can also download latest JDK at http://java.com/download
{noformat}
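
A sketch of the call that exposes the problem, assuming {{ignite}}, {{hosts}} and 
{{dflts}} are already prepared; every returned {{ClusterStartNodeResult}} reports 
success even when {{ignite.sh}} exits with the JAVA_HOME error above.

{noformat}
Collection<ClusterStartNodeResult> res =
    ignite.cluster().startNodes(hosts, dflts, false, 10_000, 1);

for (ClusterStartNodeResult r : res)
    // With the current behavior isSuccess() is true and getError() is empty
    // even though the remote process exited with the error above.
    System.out.println(r.getHostName() + " -> success=" + r.isSuccess() + ", error=" + r.getError());
{noformat}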



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-7134) Never-ending timeout in IgniteSpiOperationTimeoutHelper.nextTimeoutChunk()

2017-12-07 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-7134:
--

 Summary: Never-ending timeout in 
IgniteSpiOperationTimeoutHelper.nextTimeoutChunk()
 Key: IGNITE-7134
 URL: https://issues.apache.org/jira/browse/IGNITE-7134
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Priority: Critical
 Fix For: 2.4


{noformat}
org.apache.ignite.spi.IgniteSpiOperationTimeoutHelper#nextTimeoutChunk

long curTs = U.currentTimeMillis();

timeout = timeout - (curTs - lastOperStartTs);
{noformat}

The timeout will not be decreased at all if the delay between successive calls to 
{{nextTimeoutChunk()}} is smaller than the {{U.currentTimeMillis()}} discretization. Such 
behaviour is easily triggered by getting an error right after the 
{{nextTimeoutChunk()}} invocation and retrying immediately.

Only rare pairs of calls (the first right before the {{U.currentTimeMillis()}} value 
changes and the second right after it) actually decrease the timeout, so the actual 
{{IgniteSpiOperationTimeoutHelper}} timeout could be much bigger than the 
{{failureDetectionTimeout}}.

My suggestion is not to split {{failureDetectionTimeout}} between network operations, but 
to initialize the first operation timestamp at the first call to {{nextTimeoutChunk()}}, and 
then calculate the remaining timeout from the difference between the current timestamp and 
that first operation timestamp.
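
A sketch of the suggested calculation (field and flag names are illustrative, not the 
actual helper code):

{noformat}
private long firstOperStartTs; // 0 until the first call

public long nextTimeoutChunk(long dfltTimeout) throws IgniteSpiOperationTimeoutException {
    if (!failureDetectionTimeoutEnabled)
        return dfltTimeout;

    long now = U.currentTimeMillis();

    if (firstOperStartTs == 0)
        firstOperStartTs = now; // remember when the whole operation sequence started

    long remaining = failureDetectionTimeout - (now - firstOperStartTs);

    if (remaining <= 0)
        throw new IgniteSpiOperationTimeoutException("Network operation timed out.");

    return remaining;
}
{noformat}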



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6967) PME deadlock on reassigning service deployment

2017-11-20 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6967:
--

 Summary: PME deadlock on reassigning service deployment
 Key: IGNITE-6967
 URL: https://issues.apache.org/jira/browse/IGNITE-6967
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


When a topology change occurs during a service deployment, the discovery event 
listener calls {{GridServiceProcessor.reassign()}}, which acquires a lock on the 
utility cache (where the {{GridServiceAssignments}} are stored) and prevents PME 
from completing.

Stack traces:

{noformat}
Thread [name="test-runner-#186%service.IgniteServiceDynamicCachesSelfTest%", 
id=232, state=WAITING, blockCnt=0, waitCnt=8]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at 
o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at o.a.i.i.IgniteKernal.createCache(IgniteKernal.java:2841)
at 
o.a.i.i.processors.service.IgniteServiceDynamicCachesSelfTest.testDeployCalledBeforeCacheStart(IgniteServiceDynamicCachesSelfTest.java:140)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176)
at 
o.a.i.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000)
at 
o.a.i.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
at 
o.a.i.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915)
at java.lang.Thread.run(Thread.java:748)

Thread [name="srvc-deploy-#38%service.IgniteServiceDynamicCachesSelfTest0%", 
id=56, state=WAITING, blockCnt=5, waitCnt=9]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at 
o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at 
o.a.i.i.processors.cache.GridCacheContext.awaitStarted(GridCacheContext.java:443)
at 
o.a.i.i.processors.affinity.GridAffinityProcessor.affinityCache(GridAffinityProcessor.java:373)
at 
o.a.i.i.processors.affinity.GridAffinityProcessor.keysToNodes(GridAffinityProcessor.java:347)
at 
o.a.i.i.processors.affinity.GridAffinityProcessor.mapKeyToNode(GridAffinityProcessor.java:259)
at 
o.a.i.i.processors.service.GridServiceProcessor.reassign(GridServiceProcessor.java:1163)
at 
o.a.i.i.processors.service.GridServiceProcessor.access$2400(GridServiceProcessor.java:123)
at 
o.a.i.i.processors.service.GridServiceProcessor$TopologyListener$1.run0(GridServiceProcessor.java:1763)
at 
o.a.i.i.processors.service.GridServiceProcessor$DepRunnable.run(GridServiceProcessor.java:1976)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Locked synchronizers:
java.util.concurrent.ThreadPoolExecutor$Worker@27f723
{noformat}

Problematic code:
{noformat}
org.apache.ignite.internal.processors.service.GridServiceProcessor#reassign

try (GridNearTxLocal tx = cache.txStartEx(PESSIMISTIC, REPEATABLE_READ)) {
    GridServiceAssignmentsKey key = new GridServiceAssignmentsKey(cfg.getName());

    GridServiceAssignments oldAssigns = (GridServiceAssignments)cache.get(key);

    Map<UUID, Integer> cnts = new HashMap<>();

    if (affKey != null) {
        ClusterNode n = ctx.affinity().mapKeyToNode(cacheName, affKey, topVer);

        // WAIT HERE UNTIL PME FINISHED (INFINITELY)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6965) affinityCall() with key mapping may not be successful with AlwaysFailoverSpi when node left

2017-11-20 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6965:
--

 Summary: affinityCall() with key mapping may not be successful 
with AlwaysFailoverSpi when node left
 Key: IGNITE-6965
 URL: https://issues.apache.org/jira/browse/IGNITE-6965
 Project: Ignite
  Issue Type: Bug
  Components: cache, compute
Affects Versions: 2.3
Reporter: Alexandr Kuramshin


When doing {{affinityCall(cacheName, key, callable)}} there is a race between 
the affinity node leaving and then stopping, and {{AlwaysFailoverSpi}} reaching its 
maximum number of attempts.

Suppose the following sequence (more probable when {{grid2.order}} >> 
{{grid1.order}}):

1. {{grid1.affinityCall(cacheName, key, callable)}}
2. {{grid1}}: {{key}} mapped to the primary partition on {{grid2}}
3. {{grid2.stop()}}
4. {{grid1}} receives {{NODE_LEFT}} and updates {{discoCache}}
5. {{grid1}}: execution of {{callable}} fails with 'Failed to send job request 
because remote node left grid (if fail-over is enabled, will attempt fail-over 
to another node)'
6. {{grid1}}: {{AlwaysFailoverSpi}} max attempts reached.
7. {{grid1.affinityCall}} fails with 'Job failover failed because number of 
maximum failover attempts for affinity call is exceeded'
8. {{grid2}} receives the verified node-left message, then stops.

The patched {{CacheAffinityCallSelfTest}} reproduces the problem.
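
For reference, the call pattern in question, with an illustrative cache name, key 
and callable (not taken from the test):

{noformat}
Integer key = 42;

// If the node owning the key's primary partition stops between the mapping and the
// job send, AlwaysFailoverSpi may exhaust its attempts before the topology settles.
String res = grid1.compute().affinityCall("myCache", key, () -> "value for " + key);
{noformat}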



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6860) Lack of context information upon serializing and marshalling (writeObject and writeFields)

2017-11-10 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6860:
--

 Summary: Lack of context information upon serializing and 
marshalling (writeObject and writeFields)
 Key: IGNITE-6860
 URL: https://issues.apache.org/jira/browse/IGNITE-6860
 Project: Ignite
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
  Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
 Fix For: 2.4


Consider the following stack trace:

{noformat}
Caused by: org.apache.ignite.binary.BinaryObjectException: Failed to marshal 
object with optimized marshaller: 
[org.apache.logging.log4j.core.config.AppenderControl@302e61a8]
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:186)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496

[jira] [Created] (IGNITE-6858) Wait for exchange inside GridReduceQueryExecutor.query which never finishes due to opened transaction

2017-11-09 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6858:
--

 Summary: Wait for exchange inside GridReduceQueryExecutor.query 
which never finishes due to opened transaction
 Key: IGNITE-6858
 URL: https://issues.apache.org/jira/browse/IGNITE-6858
 Project: Ignite
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
  Components: sql
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Vladimir Ozerov
 Fix For: 2.4


Infinite waiting in the following loop:

{noformat}
for (int attempt = 0;; attempt++) {
    if (attempt != 0) {
        try {
            Thread.sleep(attempt * 10); // Wait for exchange.
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new CacheException("Query was interrupted.", e);
        }
    }
{noformat}

because the exchange will wait for partition eviction while a transaction remains 
open in a related thread:

{noformat}
at java.lang.Thread.sleep(Native Method)
at 
o.a.i.i.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:546)
at 
o.a.i.i.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1236)
at 
o.a.i.i.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
{noformat}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6636) BinaryStream position integer overflow

2017-10-16 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6636:
--

 Summary: BinaryStream position integer overflow
 Key: IGNITE-6636
 URL: https://issues.apache.org/jira/browse/IGNITE-6636
 Project: Ignite
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
  Components: general
Affects Versions: 2.2
Reporter: Alexandr Kuramshin


There were some issues with a negative {{BinaryAbstractStream#pos}} value.

We may get a stack trace like this:
{noformat}
java.lang.ArrayIndexOutOfBoundsException: -2142240123
at 
org.apache.ignite.internal.binary.streams.BinaryHeapOutputStream.writeByteAndShift(BinaryHeapOutputStream.java)
at 
org.apache.ignite.internal.binary.streams.BinaryAbstractOutputStream.writeByte(BinaryAbstractOutputStream.java)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java)
at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java)
{noformat}

The worst of it is that the {{ArrayIndexOutOfBoundsException}} is thrown 
on the next write to the stream, and by the time the stack unwinds we can't tell 
which object actually caused the overflow.

I suggest checking all updates to {{BinaryAbstractStream#pos}} and 
throwing an exception right after the position changes.
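
A minimal sketch (not the actual stream implementation) of the proposed fail-fast 
check, so the overflow is reported for the object that caused it:

{noformat}
class PositionGuard {
    private int pos;

    /** Advances the stream position by {@code cnt} bytes, failing fast on overflow. */
    int advance(int cnt) {
        int newPos = pos + cnt;

        if (newPos < 0)
            throw new IllegalStateException("Binary stream position overflow [pos=" + pos +
                ", shift=" + cnt + ']');

        return pos = newPos;
    }
}
{noformat}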



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6536) NPE on registerClassName() with MappedName

2017-10-02 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6536:
--

 Summary: NPE on registerClassName() with MappedName
 Key: IGNITE-6536
 URL: https://issues.apache.org/jira/browse/IGNITE-6536
 Project: Ignite
  Issue Type: Bug
  Components: binary
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
 Fix For: None


A {{NullPointerException}} occurs in 
{{org.apache.ignite.internal.MarshallerContextImpl#registerClassName}} when 
trying to compare {{mappedName.className()}} of an already existing {{typeId}} 
mapping with the new {{clsName}} that came in as a parameter.

Actually, 
{{org.apache.ignite.internal.processors.marshaller.MappedName#className}} must 
not be null, but it was. So we should validate the {{clsName}} passed to the {{MappedName}} 
constructor, to prevent the same NPE in the future.
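
A minimal sketch of the suggested constructor-level guard (not the actual 
{{MappedName}} source):

{noformat}
public MappedName(String className, boolean accepted) {
    if (className == null)
        throw new IllegalArgumentException("className must not be null");

    this.className = className;
    this.accepted = accepted;
}
{noformat}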



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6521) Review default JVM options for better performance

2017-09-28 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6521:
--

 Summary: Review default JVM options for better performance
 Key: IGNITE-6521
 URL: https://issues.apache.org/jira/browse/IGNITE-6521
 Project: Ignite
  Issue Type: Improvement
  Components: general, visor
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


Non-optimal recommendations are present in the Ignite startup scripts:

{noformat}
::
:: Uncomment the following GC settings if you see spikes in your throughput due 
to Garbage Collection.
::
:: set JVM_OPTS=%JVM_OPTS% -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:+UseTLAB -XX:NewSize=128m -XX:MaxNewSize=128m
:: set JVM_OPTS=%JVM_OPTS% -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=1024 
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60
{noformat}

Some utilities (like Visor) hang in continuous GC cycles when connected to 
large clusters (above one hundred nodes), even with a large heap (about 32 GB).

I'd like to propose removing these lines and modifying the default JVM_OPTS as follows:

{noformat}
set JVM_OPTS=-Xms1g -Xmx8g -XX:+UseG1GC -server -XX:+AggressiveOpts 
-XX:MaxPermSize=256m
{noformat}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6519) Race in SplitAwareTopologyValidator on activator and server node join

2017-09-28 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6519:
--

 Summary: Race in SplitAwareTopologyValidator on activator and 
server node join
 Key: IGNITE-6519
 URL: https://issues.apache.org/jira/browse/IGNITE-6519
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


The following sequence may occur:

1. {{SplitAwareTopologyValidator}} detects a split, goes to {{NOTVALID}} and returns 
false from {{validate()}}.

2. The activator node joins and {{SplitAwareTopologyValidator}} goes to {{REPAIRED}}.

3. A server node joins from the other DC, which makes {{SplitAwareTopologyValidator}} 
go to {{VALID}}.

4. Then the server node leaves the cluster and {{SplitAwareTopologyValidator}} 
should return false from {{validate()}} because of the next split.

But the current implementation makes {{SplitAwareTopologyValidator}} 
auto-{{REPAIRED}}. In effect, if the activator node is forgotten and never leaves 
the cluster, it may automatically repair a split many times, whereas this is supposed to 
be a manual operation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6499) Compact NULL fields binary representation

2017-09-26 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6499:
--

 Summary: Compact NULL fields binary representation
 Key: IGNITE-6499
 URL: https://issues.apache.org/jira/browse/IGNITE-6499
 Project: Ignite
  Issue Type: Improvement
  Components: binary
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
Assignee: Vladimir Ozerov


The current compact footer implementation writes an offset for every field in the 
schema. Depending on the serialized size of the object, an offset takes 1, 2 or 4 bytes.

Imagine an object in which some 100 fields are null. That takes from 100 to 400 bytes 
of overhead. For middle-sized objects (about 260 bytes) it doubles the memory 
usage. For small objects (about 40 bytes) the memory usage is increased by 
a factor of 3 or 4.

Two optimizations are proposed; both should be implemented, and the more compact 
representation should be selected dynamically upon object marshalling.

1. Write the field ID and offset for only the non-null fields in the footer.

2. Write a footer header, then field offsets for only the non-null fields, as 
follows:

[0] bit mask for first 8 fields, 0 - field is null, 1 - field is non-null
[1] cumulative sum of "1" bits
[2] bit mask for the next 8 fields
[3] cumulative sum of "1" bits
... and so on
[N1...N2] offset of first non-null field
[N3...N4] offset of next non-null field
... and so on

To read fields 0 to 7, we read the first footer byte, step 
through its bits, and either find the offset index for a non-null field or find that the 
field is null.

To read fields from 8 onward, we read two footer bytes, take the starting 
offset index from the cumulative-sum byte, and then step through the bits to find the offset 
index for a non-null field or find that the field is null.

This supports up to 255 non-null fields per binary object.

The overhead would be only about 24 bytes per 100 null fields instead of 200 bytes for 
the middle-sized object; a rough sketch of the lookup is shown below.
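
A rough sketch of the lookup under this proposal, assuming each 8-field group stores 
a presence mask byte followed by a byte with the cumulative count of non-null fields 
up to and including that group (my reading of the layout above):

{noformat}
class CompactFooterSketch {
    /**
     * @param footer Pairs of [presence mask, cumulative non-null count] per 8 fields.
     * @param fieldIdx Field index within the schema.
     * @return Index into the offsets section, or -1 if the field is null.
     */
    static int offsetIndex(byte[] footer, int fieldIdx) {
        int grp = fieldIdx / 8;
        int bit = fieldIdx % 8;

        int mask = footer[grp * 2] & 0xFF;

        if ((mask & (1 << bit)) == 0)
            return -1; // Field is null, no offset stored.

        // Non-null fields in all previous groups.
        int before = grp == 0 ? 0 : footer[(grp - 1) * 2 + 1] & 0xFF;

        // Plus non-null fields below this bit within the current group.
        return before + Integer.bitCount(mask & ((1 << bit) - 1));
    }
}
{noformat}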



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6491) Race in TopologyValidator.validate() and EVT_NODE_LEFT listener calls (split-brain activator)

2017-09-25 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6491:
--

 Summary: Race in TopologyValidator.validate() and EVT_NODE_LEFT 
listener calls (split-brain activator)
 Key: IGNITE-6491
 URL: https://issues.apache.org/jira/browse/IGNITE-6491
 Project: Ignite
  Issue Type: Bug
  Components: cache, general
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin
 Fix For: 2.2


The following incorrect cache {{validate}}/{{put}} sequence may occur.

On node left, a {{GridDhtPartitionsExchangeFuture}} will be generated by the 
{{disco-event-worker}} thread.

Then the {{exchange-worker}} thread does:

{noformat}
Split-brain detected [cacheName=test40, activatorTopVer=0, cacheTopVer=14]
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141)
at 
org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator.validate(IgniteTopologyValidatorGridSplitCacheTest.java:307)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCacheGroup(GridDhtTopologyFutureAdapter.java:64)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1456)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:115)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:450)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:668)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2278)
{noformat}

The result of the validation is stored in {{grpValidRes}} with the value {{false}}.

After some delay the {{disco-event-worker}} thread will do:

{noformat}
java.lang.Exception: Node is segment activator [cacheName=test40, 
activatorTopVer=14]
at 
org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141)
at 
org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:360)
at 
org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:349)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$UserListenerWrapper.onEvent(GridEventStorageManager.java:1463)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:859)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:844)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:341)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:307)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2478)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2684)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2507)
{noformat}

After this invocation the result of {{SplitAwareTopologyValidator.validate}} 
should change to {{true}}, but the validator has already been invoked and its result 
has been cached in {{grpValidRes}} with the value {{false}}.

So any subsequent call to {{cache.put}} fails:

{noformat}
Test failed.
java.lang.RuntimeException: tryPut() failed 
[gridName=cache.IgniteTopologyValidatorGridSplitCacheTest0]
at 
org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.tryPut(IgniteTopologyValidatorGridSplitCacheTest.java:262)
at 
org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.testTopologyValidator(IgniteTopologyValidatorGridSplitCacheTest.java:182)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176
{noformat}

[jira] [Created] (IGNITE-6347) Exception in GridDhtPartitionMap.readExternal

2017-09-11 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-6347:
--

 Summary: Exception in GridDhtPartitionMap.readExternal
 Key: IGNITE-6347
 URL: https://issues.apache.org/jira/browse/IGNITE-6347
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.1
Reporter: Alexandr Kuramshin
 Fix For: 2.1


Reading a partition state with {{id > Short.MAX_VALUE}} causes a negative value to be 
read by {{int part = in.readShort()}}.

{{in.readUnsignedShort()}} should be used instead.
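
A minimal standalone sketch of the difference (plain {{java.io.DataInput}} semantics, not 
Ignite code):

{noformat}
import java.io.*;

public class ShortReadDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        // A partition id larger than Short.MAX_VALUE.
        new DataOutputStream(bos).writeShort(40_000);

        DataInput in1 = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        DataInput in2 = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));

        System.out.println(in1.readShort());         // -25536: sign-extended, wrong id
        System.out.println(in2.readUnsignedShort()); // 40000: correct id
    }
}
{noformat}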



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5798) Logging Ignite configuration at startup

2017-07-21 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-5798:
--

 Summary: Logging Ignite configuration at startup
 Key: IGNITE-5798
 URL: https://issues.apache.org/jira/browse/IGNITE-5798
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexandr Kuramshin
 Fix For: 2.1


I've found that the IgniteConfiguration is not logged even when -DIGNITE_QUIET=false is set.

When we start Ignite with a path to the XML configuration, or with an InputStream, we have 
to ensure that all configuration options were properly read. We would also like to know the 
actual values of uninitialized configuration properties (default values), which are set only 
after Ignite has started.

Monitoring tools, like Visor or Web Console, do not show all configuration options. And even 
if they were updated to show all properties, every time new configuration options appear 
another tools update would be needed.

Logging the IgniteConfiguration at startup makes it possible to ensure that the right grid 
configuration has been applied and enables better user support based on log analysis.
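
As a workaround until this is implemented, the effective configuration can be dumped from a 
lifecycle bean right after the node starts. A minimal sketch, keeping in mind that 
{{IgniteConfiguration.toString()}} may not print every property (class name is illustrative):

{noformat}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteException;
import org.apache.ignite.lifecycle.LifecycleBean;
import org.apache.ignite.lifecycle.LifecycleEventType;
import org.apache.ignite.resources.IgniteInstanceResource;

public class ConfigDumpBean implements LifecycleBean {
    @IgniteInstanceResource
    private Ignite ignite;

    @Override public void onLifecycleEvent(LifecycleEventType evt) throws IgniteException {
        if (evt == LifecycleEventType.AFTER_NODE_START)
            // Dumps the post-defaults configuration of the started node.
            ignite.log().info("Effective configuration: " + ignite.configuration());
    }
}

// Registration: cfg.setLifecycleBeans(new ConfigDumpBean()); before Ignition.start(cfg).
{noformat}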



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5750) Format of uptime for metrics

2017-07-13 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-5750:
--

 Summary: Format of uptime for metrics
 Key: IGNITE-5750
 URL: https://issues.apache.org/jira/browse/IGNITE-5750
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.0
Reporter: Alexandr Kuramshin
Priority: Trivial
 Fix For: 2.1


Metrics for the local node show the uptime formatted as 00:00:00:000

But the last colon should be changed to a dot.

The right format is 00:00:00.000



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5251) Some JVM implementations may return null from getClassLoader()

2017-05-18 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-5251:
--

 Summary: Some JVM implementations may return null from 
getClassLoader()
 Key: IGNITE-5251
 URL: https://issues.apache.org/jira/browse/IGNITE-5251
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.0
 Environment: OpenJDK Runtime Environment (build 1.8.0_131-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
Reporter: Alexandr Kuramshin
 Fix For: 2.1


Starting an Ignite instance causes an NPE

{noformat}
java.lang.NullPointerException
at 
org.apache.ignite.internal.util.IgniteUtils.appendClassLoaderHash(IgniteUtils.java:4438)
at 
org.apache.ignite.internal.util.IgniteUtils.makeMBeanName(IgniteUtils.java:4418)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.registerFactoryMbean(IgnitionEx.java:2499)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1801)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1604)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1041)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:568)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:515)
at org.apache.ignite.Ignition.start(Ignition.java:322)
{noformat}

A method {{IgniteUtils.getClassLoader(Class cls)}} should be implemented which checks 
{{cls.getClassLoader()}} and, in the case of null, returns 
{{ClassLoader.getSystemClassLoader()}}.

All usages of {{Class.getClassLoader()}} should be replaced with 
{{IgniteUtils.getClassLoader()}}.
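
A minimal sketch of the proposed helper (the method name follows the description above, 
the rest is illustrative):

{noformat}
/** Returns the class loader of the given class, falling back to the system class loader. */
public static ClassLoader getClassLoader(Class<?> cls) {
    ClassLoader ldr = cls.getClassLoader();

    // Some JVM implementations return null for bootstrap-loaded classes.
    return ldr != null ? ldr : ClassLoader.getSystemClassLoader();
}
{noformat}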



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Add ability to enable and disable rebalancing per-node

2017-05-11 Thread Alexandr Kuramshin
to Nick,

could you please describe in more detail the use of DataStreamer (do you
use StreamReceiver)?

It seems that you are unnecessarily worried about synchronizing service startup with cache
rebalancing. A service should start quickly after a node has joined the topology, and it
will process all the data that the local partitions have collected in the moments before.
You may use a rebalance delay to minimize the amount of data collected before the service
has started.
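
For example, a minimal sketch of setting a rebalance delay (cache name, value type and the
30-second delay are illustrative, and an Ignite instance is assumed to be in scope):

CacheConfiguration<Integer, Object> cacheCfg = new CacheConfiguration<>("events");

// Postpone rebalancing after topology changes so that node-singleton services can start
// first (a negative value postpones it until rebalancing is triggered manually).
cacheCfg.setRebalanceDelay(30_000);

ignite.getOrCreateCache(cacheCfg);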

But if your service depends on external resources (another service), managing rebalancing
won't help you, because the external resource may become unavailable even after your
service has started and the rebalance has occurred. You can't un-rebalance partitions in
such a case. In addition, if some event cache must be supplied together with other caches
(storing additional data for service processing), there is always a gap between rebalancing
a partition of the first and of the last cache containing collocated data. I think you
should not worry about additional network calls while rebalancing is in progress.

to Sasha,

I think we need a configuration property enablePartitionExchange (in addition to the MBean
flag) to be able to disable partition exchange at node startup.

2017-05-06 2:51 GMT+07:00 npordash <nickpord...@gmail.com>:

> I can outline a use-case I have which may help define requirements for this
> task. For context, I was originally going to try and address the below
> use-case by disabling automatic rebalancing on a per-cache basis and use a
> cluster-wide task to orchestrate manual rebalancing; however, this issue
> sounds like it may provide a better approach.
>
> I have caches setup for the sole purpose of routing data to nodes via a
> Data
> Streamer. The logic in the streamer is simply to access a plugin on the
> data
> node which exposes a processing pipeline and runs the received cache
> entries
> through it. The data in this case is monitoring related and there is one
> cache (or logical stream) per data type (f.e. logs, events, metrics).
>
> The pipeline is composed of N services which are deployed as node
> singletons
> and have a service filter which targets a particular cache. These services
> can be deployed and un-deployed as processing requirements change or bugs
> are fixed without requiring clients to know or care about it.
>
> The catch here is that when nodes are added I don't want map partitions to
> rebalance to a new node until I know all of the necessary services are
> running, otherwise we may have a small window where data is processed
> through a pipeline that isn't completely initialized yet which would result
> in a data quality issue. Alternatively, I could have the pipeline raise an
> error which would cause the streamer to retry, but I'd like this to be
> handled more gracefully, if possible.
>
> In addition, it will probably be the case were these caches eventually have
> node filters so that we can isolate resources for these streams across
> different computes. This means that, for example, if we add a node only for
> metrics then deferring rebalancing should ideally only impact caches that
> would get assigned to that node.
>
> Going even further... so far we've talked about one cache which is used
> just
> for streaming, but at least one of the services would create its own set of
> caches as an in-memory storage layer which maintains an inverted index and
> time series data for elements coming through the stream. The storage caches
> in this case would only exist on nodes where the stream cache is and most
> of
> the write activity to these caches would be local since they would use the
> same affinity as the stream cache (if most writes were remote this wouldn't
> scale well). So... these caches would need to rebalance at the same time in
> order to minimize the possibility of additional network calls.
>
> The main concern I have is how to avoid the race condition of another node
> joining the topology _after_ it has been determined rebalancing should
> happen, but _before_ rebalancing is triggered. If this is controlled on a
> per-node (+cache) basis - as the ticket describes - it's probably a
> non-issue, but it's definitely an issue if it's only on a per-cache basis.
>
> -Nick
>
>
>
> --
> View this message in context: http://apache-ignite-
> developers.2346864.n4.nabble.com/Add-ability-to-enable-and-
> disable-rebalancing-per-node-tp17494p17529.html
> Sent from the Apache Ignite Developers mailing list archive at Nabble.com.
>



-- 
Thanks,
Alexandr Kuramshin


[jira] [Created] (IGNITE-5084) PagesList.put() assertion: pageId != tailId

2017-04-26 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-5084:
--

 Summary: PagesList.put() assertion: pageId != tailId
 Key: IGNITE-5084
 URL: https://issues.apache.org/jira/browse/IGNITE-5084
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.0
Reporter: Alexandr Kuramshin


Get an error upon rebalancing on topology update

{noformat}
Failed processing message [senderId=78a8f841-5d40-4ac7-b26b-f1b5e7f3faa0, 
msg=GridDhtPartitionSupplyMessageV2 [updateSeq=142, 
topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0], missed=null, 
clean=[0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 19, 21, 20, 23, 22, 24, 
26, 29, 28, 31, 35, 32, 33, 39, 36, 37, 42, 43, 40, 41, 44, 45, 51, 48, 55, 54, 
53, 52, 58, 56, 63, 62, 60, 68, 69, 65, 66, 76, 77, 78, 74, 85, 87, 86, 81, 80, 
82, 92, 91, 90, 98, 96, 97], msgSize=0, size=67, parts=[0, 1, 2, 3, 4, 5, 6, 7, 
8, 11, 12, 13, 14, 17, 19, 21, 20, 23, 22, 24, 26, 29, 28, 31, 35, 32, 33, 39, 
36, 37, 42, 43, 40, 41, 44, 45, 51, 48, 55, 54, 53, 52, 58, 56, 63, 62, 60, 68, 
69, 65, 66, 76, 77, 78, 74, 85, 87, 86, 81, 80, 82, 92, 91, 90, 98, 96, 97], 
super=GridCacheMessage [msgId=100460, depInfo=null, err=null, 
skipPrepare=false, cacheId=-2100569601, cacheId=-2100569601]]]
java.lang.AssertionError: pageId = 0, tailId = 281556581089286
at 
org.apache.ignite.internal.processors.cache.database.freelist.PagesList.put(PagesList.java:~)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (IGNITE-5026) getOrCreateCaches() hangs if any exception in GridDhtPartitionsExchangeFuture.init()

2017-04-19 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-5026:
--

 Summary: getOrCreateCaches() hangs if any exception in 
GridDhtPartitionsExchangeFuture.init()
 Key: IGNITE-5026
 URL: https://issues.apache.org/jira/browse/IGNITE-5026
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 1.9, 2.0
Reporter: Alexandr Kuramshin
 Fix For: 2.1


Any exception thrown by {{GridDhtPartitionsExchangeFuture.init()}} causes the 
{{GridCompoundFuture}} returned by {{GridCacheProcessor.dynamicStartCaches()}} to wait 
indefinitely.

Reproduced by 
{{IgniteDynamicCacheStartSelfTest.testGetOrCreateCollectionExceptional()}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (IGNITE-4865) Non-informative error message on using GridClientOptimizedMarshaller with unknown task classes

2017-03-27 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4865:
--

 Summary: Non-informative error message on using 
GridClientOptimizedMarshaller with unknown task classes
 Key: IGNITE-4865
 URL: https://issues.apache.org/jira/browse/IGNITE-4865
 Project: Ignite
  Issue Type: Improvement
  Components: rest
Affects Versions: 2.0
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


Upon {{GridClientCompute.execute()}} I get a non-informative error if the task 
class is not present in {{classnames.properties}}. It occurs when 
{{GridClient}} was configured to use {{GridClientOptimizedMarshaller}}.

{noformat}
Closing NIO session because of unhandled exception [cls=class 
o.a.i.i.util.nio.GridNioException, msg=class o.a.i.IgniteCheckedException: 
Failed to deserialize object with given class loader: null]
{noformat}

There are two problems:
* The actual problem is hidden
{noformat}
Caused by: java.lang.UnsupportedOperationException
at 
org.apache.ignite.internal.client.marshaller.optimized.GridClientOptimizedMarshaller$ClientMarshallerContext.className(GridClientOptimizedMarshaller.java:137)
at 
org.apache.ignite.internal.MarshallerContextAdapter.getClass(MarshallerContextAdapter.java:174)
at 
org.apache.ignite.marshaller.optimized.OptimizedMarshallerUtils.classDescriptor(OptimizedMarshallerUtils.java:266)
at 
org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:318)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
{noformat}
* Even after reading the cause we don't understand what is wrong.

What to do:
* Log the stack trace every time.
* Throw an UnsupportedOperationException with an informative message.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Inaccurate documentation about transactions

2017-03-06 Thread Alexandr Kuramshin
Yes, please.

https://issues.apache.org/jira/browse/IGNITE-4795

2017-02-28 2:22 GMT+07:00 Denis Magda <dma...@apache.org>:

> +1 to Alexander’s proposal.
>
> Alexander, could you wrap the discussion up creating a ticket with
> detailed explanation what to do?
>
> —
> Denis
>
> On Feb 27, 2017, at 9:01 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
>
> I like the idea of fixing the exception inheritance.
>
> On Mon, Feb 27, 2017 at 1:40 AM, Alexandr Kuramshin <ein.nsk...@gmail.com>
> wrote:
>
>> I think annotating the methods with @IgniteTransactional is redundant,
>> because they are already marked by "throws TransactionTimeoutException/Tr
>> ansactionRollbackException/TransactionHeuristicException".
>>
>> For example, the same approach was used in JavaBeans 1.01 specs [1] with
>> TooManyListenersException.
>>
>> The only thing I'd like to do: make all TransactionTimeoutException/Tr
>> ansactionRollbackException/TransactionHeuristicException are derived
>> from the same parent TransactionException. And declare all transactional
>> methods as "throws TransactionException" with consequent Javadoc update.
>>
>> [1] http://download.oracle.com/otndocs/jcp/7224-javabeans-1.
>> 01-fr-spec-oth-JSpec/
>>
>> 2017-02-18 1:07 GMT+07:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>>
>>> On Fri, Feb 17, 2017 at 3:35 AM, Andrey Gura <ag...@apache.org> wrote:
>>>
>>> > From my point of view @IgniteTransactional annotation is redundant
>>> > entity which will just confuse and lead to questions like "How to use
>>> > this annotation?" I think documention update is better way.
>>> >
>>>
>>> Why do you think it will be confusing? This annotation is suggested
>>> purely
>>> for documentation purposes, nothing else. Instead of adding documentation
>>> to every method, we just add the annotation. User can check the
>>> @IgniteTransactional javadoc to understand what this annotation means.
>>>
>>
>>
>>
>> --
>> Thanks,
>> Alexandr Kuramshin
>>
>
>
>


-- 
Thanks,
Alexandr Kuramshin


[jira] [Created] (IGNITE-4767) rollback exception hides the origin exception (e.g. commit)

2017-03-02 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4767:
--

 Summary: rollback exception hides the origin exception (e.g. 
commit)
 Key: IGNITE-4767
 URL: https://issues.apache.org/jira/browse/IGNITE-4767
 Project: Ignite
  Issue Type: Bug
  Components: cache, general
Affects Versions: 1.8
Reporter: Alexandr Kuramshin
 Fix For: 2.0


There are too many places in the code like:
{noformat}
try {
    return txFuture.get();
}
catch (IgniteCheckedException e) {
    tx.rollbackAsync();

    throw e;
}
{noformat}
where an error upon rollback hides the actual exception {{e}}.

This should be implemented the way try-with-resources does it:
{noformat}
try {
    return txFuture.get();
}
catch (IgniteCheckedException e) {
    try {
        tx.rollbackAsync();
    }
    catch (Throwable inner) {
        e.addSuppressed(inner);
    }

    throw e;
}
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Inaccurate documentation about transactions

2017-02-27 Thread Alexandr Kuramshin
I think annotating the methods with @IgniteTransactional is redundant,
because they are already marked by "throws
TransactionTimeoutException/TransactionRollbackException/TransactionHeuristicException".

For example, the same approach was used in JavaBeans 1.01 specs [1] with
TooManyListenersException.

The only thing I'd like to do: make
TransactionTimeoutException/TransactionRollbackException/TransactionHeuristicException
all derive from the same parent TransactionException, and declare all
transactional methods as "throws TransactionException" with a consequent
Javadoc update.

[1]
http://download.oracle.com/otndocs/jcp/7224-javabeans-1.01-fr-spec-oth-JSpec/
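
A minimal sketch of the proposed hierarchy (each class would live in its own file; only the
inheritance change is the point here, and IgniteException is org.apache.ignite.IgniteException):

public class TransactionException extends IgniteException {
    public TransactionException(String msg) { super(msg); }
}

public class TransactionTimeoutException extends TransactionException {
    public TransactionTimeoutException(String msg) { super(msg); }
}

// TransactionRollbackException and TransactionHeuristicException would change the same way,
// and transactional methods on IgniteCache would be declared as "throws TransactionException".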

2017-02-18 1:07 GMT+07:00 Dmitriy Setrakyan <dsetrak...@apache.org>:

> On Fri, Feb 17, 2017 at 3:35 AM, Andrey Gura <ag...@apache.org> wrote:
>
> > From my point of view @IgniteTransactional annotation is redundant
> > entity which will just confuse and lead to questions like "How to use
> > this annotation?" I think documention update is better way.
> >
>
> Why do you think it will be confusing? This annotation is suggested purely
> for documentation purposes, nothing else. Instead of adding documentation
> to every method, we just add the annotation. User can check the
> @IgniteTransactional javadoc to understand what this annotation means.
>



-- 
Thanks,
Alexandr Kuramshin


Inaccurate documentation about transactions

2017-02-14 Thread Alexandr Kuramshin
After doing some tests with transactions I've found that transactions do not work
as expected from the documentation [1].

First of all, it is written nowhere which methods of the cache are
transactional and which are not. Quite the contrary, after reading the
documentation we learn that each TRANSACTIONAL cache is fully
ACID-compliant, without exceptions.

Only after deep multi-threaded testing, and consulting with other developers,
did I learn that only the get and put methods run within the transaction,
while the iterator and query methods run outside of it, in an autonomous
transaction with the READ_COMMITTED isolation level.

Later I understood that only the methods throwing
TransactionTimeoutException/TransactionRollbackException/TransactionHeuristicException
are fully transactional. I think all methods on page [2] should be described
directly - are they transactional or not. By the way, why are these exceptions
not derived from a common base class, e.g. TransactionException?

Secondly, when using the transactional get() method inside a READ_COMMITTED
transaction we expect to get the committed value, as the documentation [1]
claims:

* READ_COMMITTED - Data is read without a lock and is never cached in the
transaction itself.

OK, but what about put()? After a put() of a new value, successive reads return
the new value, which is actually a DIRTY READ. Hence the value is cached within
the transaction. This behavior is not documented.
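
A minimal sketch of the behavior described above (cache name and values are illustrative,
an Ignite instance is assumed to be in scope):

IgniteCache<Integer, String> cache = ignite.cache("tx-cache");

try (Transaction tx = ignite.transactions().txStart(
    TransactionConcurrency.PESSIMISTIC, TransactionIsolation.READ_COMMITTED)) {

    cache.put(1, "new");

    // Returns "new" although the transaction has not been committed yet, i.e. the
    // written value is visible inside the same transaction despite READ_COMMITTED.
    String v = cache.get(1);

    tx.commit();
}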

[1] https://apacheignite.readme.io/docs/transactions

[2]
https://ignite.apache.org/releases/1.8.0/javadoc/org/apache/ignite/IgniteCache.html

-- 
Thanks,
Alexandr Kuramshin


[jira] [Created] (IGNITE-4632) AffinityFunction unchecked exception handling (unassigned backup)

2017-01-31 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4632:
--

 Summary: AffinityFunction unchecked exception handling (unassigned 
backup)
 Key: IGNITE-4632
 URL: https://issues.apache.org/jira/browse/IGNITE-4632
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 1.8
Reporter: Alexandr Kuramshin
Priority: Minor


{{AffinityFunction}} implementation may throw unchecked exception upon 
assignment. In some cases additional processing should be performed when 
affinity function method invocation throws an exception.

A special case is when a cache with backups is running, and a node holding a primary 
partition leaves. Then we get the primary partition unassigned if 
{{AffinityFunction.partition(Object)}} throws an exception. My suggestion is to shut down 
the node in such a case (like SEGMENTED), because the cluster cannot work normally without 
the primary partition assigned.

{noformat}
Failed processing message [senderId=8a1ab9a3-786e-4601-ba22-efd380849d99, 
msg=GridDhtPartitionSupplyMessageV2 [updateSeq=16069, 
topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], missed=[16, 17, 33, 
22, 56, 10], clean=[0, 1, 2, 34, 3, 5, 7, 9, 45, 46, 49, 18, 50, 55, 25, 26, 
58, 29, 61], msgSize=0, size=19, parts=[0, 1, 2, 34, 3, 5, 7, 9, 45, 46, 49, 
18, 50, 55, 25, 26, 58, 29, 61], super=GridCacheMessage [msgId=70098615, 
depInfo=null, err=null, skipPrepare=false, cacheId=-148990687, 
cacheId=-148990687]]]
com.sbt.persistence.exceptions.DPLException: ParticleKeyMapper cannot process any objects 
other than OU. System error - contact the DPL technical support service
 at 
com.sbt.dpl.gridgain.ParticleAffinityFunction.partition(ParticleAffinityFunction.java:67)
 at 
org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.partition(GridCacheAffinityManager.java:219)
 at 
org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.partition(GridCacheAffinityManager.java:194)
 at 
org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.localNode(GridCacheAffinityManager.java:382)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.handleSupplyMessage(GridDhtPartitionDemander.java:680)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleSupplyMessage(GridDhtPreloader.java:390)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:395)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:385)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:758)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Ignite configuration at runtime

2017-01-17 Thread Alexandr Kuramshin
Hi community!

I've found that IgniteConfiguration is not logged even when
-DIGNITE_QUIET=false

When we start Ignite with a path to the XML configuration, or with an InputStream, we have
to ensure that all configuration options were properly read. We would also like to know the
actual values of uninitialized configuration properties (default values), which are set
only after Ignite has started.

Monitoring tools, like Visor or Web Console, do not show all configuration
options. And even if they were updated to show all properties, every time
new configuration options appear another tools update would be needed.

So logging the IgniteConfiguration as a whole is a really needed improvement.

-- 
Thanks,
Alexandr Kuramshin


Empty cache memory overhead

2017-01-10 Thread Alexandr Kuramshin
Hi community,

I'd like to share my investigations about the subject.

Even if a cache is off-heap and contains no data, JVM heap memory is
consumed. I'm calling this effect "empty cache memory overhead"
("overhead" for short later on).

The size of the memory consumed depends on many factors, varying from 1
to 50 MB per cache on every node in the cluster.

There are real systems that use >1000 caches within the cluster. So the heap
memory consumed on each node will be 50 GB or more.

I've found that overhead mainly depends on this factors:

1) local partitions count assigned to the node by the affinity function;

1.a) total number of partitions of the affinity function;

1.b) number of backups;

2) IGNITE_ATOMIC_CACHE_DELETE_HISTORY_SIZE

3) IGNITE_AFFINITY_HISTORY_SIZE

After analyzing heap dumps and the sources I've found the following countable
objects on which the overhead depends:

1) First group.

GridDhtPartitionTopologyImpl = cache count

GridDhtLocalPartition = cache count * local partitions count

GridCircularBuffer$Item = cache count * local partitions count * item
factor (default 32).

Local partitions count = affinity function total partitions / node count *
(1 + number of backups)

Item factor = capacity of the map sized to store
IGNITE_ATOMIC_CACHE_DELETE_HISTORY_SIZE / affinity function partitions count
entries, but at least 20.

Real values:

GridDhtPartitionTopologyImpl = 1000
Affinity function total partitions = 1024
Node count = 16
Number of backups = 3
Local partitions count = 256
GridDhtLocalPartition = 256_000
GridCircularBuffer$Item = 8_192_000
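
As a quick check, these real values follow from the formulas above (assuming the default
item factor of 32 mentioned earlier):

Local partitions count = 1024 / 16 * (1 + 3) = 256
GridDhtLocalPartition = 1000 caches * 256 partitions = 256_000
GridCircularBuffer$Item = 256_000 * 32 = 8_192_000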

2) Second group.

GridAffinityAssignmentCache = cache count * node count

GridAffinityAssignment = cache count * node count * assignment factor

Assignment factor depends on topology version and
IGNITE_AFFINITY_HISTORY_SIZE, default 6-7.

Real values:

GridAffinityAssignmentCache = 16_000
GridAffinityAssignment  = 112_000

I think the implementation should be changed so that the object counts
depend on the cache data size. And small (or empty) caches should be
as lightweight as possible.

-- 
Thanks,
Alexandr Kuramshin


[jira] [Created] (IGNITE-4538) BinaryObjectImpl: lack of context information upon deserialization

2017-01-10 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4538:
--

 Summary: BinaryObjectImpl: lack of context information upon 
deserialization
 Key: IGNITE-4538
 URL: https://issues.apache.org/jira/browse/IGNITE-4538
 Project: Ignite
  Issue Type: Improvement
  Components: binary
Affects Versions: 1.8, 1.7
Reporter: Alexandr Kuramshin


When we get such an error we don't know which cache was accessed, which type of 
BinaryClassDescriptor was used, and which entry was accessed (the key of the entry should 
be logged with respect to the *include sensitive* system property).

Such context information should be appended by wrapping the inner exception at every key 
stack frame.

{noformat}
org.apache.ignite.binary.BinaryObjectException: Unexpected flag value [pos=24, 
expected=4, actual=9]
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.checkFlagNoHandles(BinaryReaderExImpl.java:1423)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.readLongNullable(BinaryReaderExImpl.java:723)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.readFixedType(BinaryFieldAccessor.java:677)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read(BinaryFieldAccessor.java:639)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:818)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1481)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryObjectImpl.deserializeValue(BinaryObjectImpl.java:717)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.binary.BinaryObjectImpl.value(BinaryObjectImpl.java:143)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.CacheObjectContext.unwrapBinary(CacheObjectContext.java:272)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.CacheObjectContext.unwrapBinaryIfNeeded(CacheObjectContext.java:160)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.CacheObjectContext.unwrapBinaryIfNeeded(CacheObjectContext.java:147)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.GridCacheContext.unwrapBinaryIfNeeded(GridCacheContext.java:1706)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$PeekValueExpiryAwareIterator.advance(GridCacheQueryManager.java:2875)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$PeekValueExpiryAwareIterator.(GridCacheQueryManager.java:2814)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$PeekValueExpiryAwareIterator.(GridCacheQueryManager.java:2752)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$5.(GridCacheQueryManager.java:863)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.scanIterator(GridCacheQueryManager.java:863)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.scanQueryLocal(GridCacheQueryManager.java:1436)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryAdapter.executeScanQuery(GridCacheQueryAdapter.java:552)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.igniteIterator(GridCacheAdapter.java:4115)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.igniteIterator(GridCacheAdapter.java:4092)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxy.iterator(IgniteCacheProxy.java:1979)
 ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (IGNITE-4533) GridDhtPartitionsExchangeFuture stores unnecessary messages after processing done

2017-01-10 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4533:
--

 Summary: GridDhtPartitionsExchangeFuture stores unnecessary 
messages after processing done
 Key: IGNITE-4533
 URL: https://issues.apache.org/jira/browse/IGNITE-4533
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 1.8, 1.7
Reporter: Alexandr Kuramshin


After a GridDhtPartitionsExchangeFuture has been completed, 
GridCachePartitionExchangeManager still stores it in the field ExchangeFutureSet exchFuts 
(for race condition handling).

But the many GridDhtPartitionsSingleMessage objects stored in the field 
ConcurrentMap<UUID, GridDhtPartitionsAbstractMessage> msgs are not needed after 
the future has been processed.

This map should be cleared at the end of the method onAllReceived().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (IGNITE-4496) Review all logging for sensitive data leak

2016-12-26 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4496:
--

 Summary: Review all logging for sensitive data leak
 Key: IGNITE-4496
 URL: https://issues.apache.org/jira/browse/IGNITE-4496
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin


While the sensitive logging option was added and toString() methods were fixed, not all 
logging has been checked for sensitive data leaks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Capacity Planning - Calculating Memory Usage

2016-12-25 Thread Alexandr Kuramshin
Hi Val,

I'm sorry, of course only @QuerySqlField(index = true) creates an index on an
object's field. Fields without indexes add no additional overhead.

A group index on multiple fields is a single index (isn't it?)

I don't understand what is still unclear.

Entry footprint  = key footprint + value footprint + entry overhead + index
overhead.

Index overhead depends on how many indices are enabled for the entry type.
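
For example, a minimal sketch (class and field names are illustrative):

import java.io.Serializable;
import java.util.Date;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class Person implements Serializable {
    @QuerySqlField(index = true)   // single-column index: adds index overhead per entry
    private String name;

    @QuerySqlField                 // queryable but not indexed: no index overhead
    private int age;

    @QuerySqlField(orderedGroups = {@QuerySqlField.Group(name = "surname_empl", order = 0)})
    private String surname;

    @QuerySqlField(orderedGroups = {@QuerySqlField.Group(name = "surname_empl", order = 1)})
    private Date emplDate;         // surname + emplDate together form one group index
}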

2016-12-23 2:06 GMT+07:00 Valentin Kulichenko <valentin.kuliche...@gmail.com
>:

> Alexandr,
>
> See my comments below.
>
> On Wed, Dec 21, 2016 at 7:01 PM, Alexandr Kuramshin <ein.nsk...@gmail.com>
> wrote:
>
> > Hi Val,
> >
> > the understanding is simple.
> >
> > When you enables the single index on entry class you get "First index
> > overhead" per entry.
> >
> > When you enables two indices on entry class you get "First index
> overhead"
> > + "Next index overhead" per entry.
> >
> > With three indices you get "First index overhead" + 2 * "Next index
> > overhead", and so on...
> >
>
> This should be explained in more detail, probably with some trivial
> example. Currently it's very unclear.
>
>
> >
> > Each annotated field with @QuerySqlField is an index, except multiple
> > fields annotated with @QuerySqlField.Group.
> >
>
> This actually confuses me a lot, because a field can be created with or
> without index? Can you please clarify? How much overhead is introduced by a
> field without index? With index? What about group indexes?
>
>
> >
> > Another way to defining indices is to use property "queryEntities" and
> it's
> > subproperty "indexes". See the article [1]
> >
> > [1] https://apacheignite.readme.io/docs/indexes
> >
> > 2016-12-20 8:38 GMT+07:00 Valentin Kulichenko <
> > valentin.kuliche...@gmail.com
> > >:
> >
> > > Alexandr,
> > >
> > > Can you please clarify what is "First index overhead" and "Next index
> > > overhead"? Generally, I think overhead provided by indexes should be
> > > described in more details, now it's not very clear what happens when
> > > indexes are added.
> > >
> > > Also the calculation example should be a separate section.
> > >
> > > -Val
> > >
> > > On Wed, Dec 14, 2016 at 1:07 AM, Alexandr Kuramshin <
> > ein.nsk...@gmail.com>
> > > wrote:
> > >
> > > > Thank you, Andrey,
> > > >
> > > > I'll do additional tests with expire policy and update the article.
> > > >
> > > > 2016-12-13 22:10 GMT+07:00 Andrey Mashenkov <
> > andrey.mashen...@gmail.com
> > > >:
> > > >
> > > > > Alexandr,
> > > > >
> > > > > In addition. If expire policy is configured, there is additional
> > > overhead
> > > > > to entries can be tracked by TtlManager.
> > > > > This overhead is OnHeap and does not depend on cache MemoryMode
> > (until
> > > > > Ignite-3840 will be in master).
> > > > >
> > > > > For now overhead is about 32-40 bytes (EntryWrapper itself) +
> (40-48)
> > > > bytes
> > > > > (ConcurrentSkipList node) per entry.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 13, 2016 at 10:37 AM, Alexandr Kuramshin <
> > > > ein.nsk...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hello, Igniters,
> > > > > >
> > > > > > I'd like to represent updated article [1] about the subject.
> > > > > >
> > > > > > And I'll very appreciate your comments and questions about it.
> > > > > >
> > > > > > Please review.
> > > > > >
> > > > > > [1] http://apacheignite.gridgain.org/docs/capacity-planning
> > > > > >
> > > > > > --
> > > > > > Thanks,
> > > > > > Alexandr Kuramshin
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > С уважением,
> > > > > Машенков Андрей Владимирович
> > > > > Тел. +7-921-932-61-82
> > > > >
> > > > > Best regards,
> > > > > Andrey V. Mashenkov
> > > > > Cerr: +7-921-932-61-82
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Alexandr Kuramshin
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Alexandr Kuramshin
> >
>



-- 
Thanks,
Alexandr Kuramshin


[jira] [Created] (IGNITE-4485) CacheJdbcPojoStore returns unexpected BinaryObject upon loadCache()

2016-12-22 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4485:
--

 Summary: CacheJdbcPojoStore returns unexpected BinaryObject upon 
loadCache()
 Key: IGNITE-4485
 URL: https://issues.apache.org/jira/browse/IGNITE-4485
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 1.8, 1.7
Reporter: Alexandr Kuramshin


When calling loadCache(IgniteBiInClosure clo, Object... args) we sometimes get unexpected 
values of type BinaryObject in IgniteBiInClosure.apply(), whereas the POJO value kind was 
registered previously for a well-known key type.

This happens because getOrCreateCacheMappings returns a HashMap which reorders entity 
mappings for the same key but with a different value kind. When BinaryMarshaller is used, 
this map contains two mappings for the same key - POJO and BINARY.

A possible fix is to use a LinkedHashMap, then the POJO mapping will be picked first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Capacity Planning - Calculating Memory Usage

2016-12-21 Thread Alexandr Kuramshin
Hi Val,

the understanding is simple.

When you enable a single index on an entry class you get "First index
overhead" per entry.

When you enable two indices on an entry class you get "First index overhead"
+ "Next index overhead" per entry.

With three indices you get "First index overhead" + 2 * "Next index
overhead", and so on...

Each field annotated with @QuerySqlField is an index, except multiple
fields annotated with @QuerySqlField.Group.

Another way of defining indices is to use the property "queryEntities" and its
subproperty "indexes". See the article [1]

[1] https://apacheignite.readme.io/docs/indexes

2016-12-20 8:38 GMT+07:00 Valentin Kulichenko <valentin.kuliche...@gmail.com
>:

> Alexandr,
>
> Can you please clarify what is "First index overhead" and "Next index
> overhead"? Generally, I think overhead provided by indexes should be
> described in more details, now it's not very clear what happens when
> indexes are added.
>
> Also the calculation example should be a separate section.
>
> -Val
>
> On Wed, Dec 14, 2016 at 1:07 AM, Alexandr Kuramshin <ein.nsk...@gmail.com>
> wrote:
>
> > Thank you, Andrey,
> >
> > I'll do additional tests with expire policy and update the article.
> >
> > 2016-12-13 22:10 GMT+07:00 Andrey Mashenkov <andrey.mashen...@gmail.com
> >:
> >
> > > Alexandr,
> > >
> > > In addition. If expire policy is configured, there is additional
> overhead
> > > to entries can be tracked by TtlManager.
> > > This overhead is OnHeap and does not depend on cache MemoryMode (until
> > > Ignite-3840 will be in master).
> > >
> > > For now overhead is about 32-40 bytes (EntryWrapper itself) + (40-48)
> > bytes
> > > (ConcurrentSkipList node) per entry.
> > >
> > >
> > >
> > > On Tue, Dec 13, 2016 at 10:37 AM, Alexandr Kuramshin <
> > ein.nsk...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hello, Igniters,
> > > >
> > > > I'd like to represent updated article [1] about the subject.
> > > >
> > > > And I'll very appreciate your comments and questions about it.
> > > >
> > > > Please review.
> > > >
> > > > [1] http://apacheignite.gridgain.org/docs/capacity-planning
> > > >
> > > > --
> > > > Thanks,
> > > > Alexandr Kuramshin
> > > >
> > >
> > >
> > >
> > > --
> > > С уважением,
> > > Машенков Андрей Владимирович
> > > Тел. +7-921-932-61-82
> > >
> > > Best regards,
> > > Andrey V. Mashenkov
> > > Cerr: +7-921-932-61-82
> > >
> >
> >
> >
> > --
> > Thanks,
> > Alexandr Kuramshin
> >
>



-- 
Thanks,
Alexandr Kuramshin


Re: Capacity Planning - Calculating Memory Usage

2016-12-14 Thread Alexandr Kuramshin
Thank you, Andrey,

I'll do additional tests with expire policy and update the article.

2016-12-13 22:10 GMT+07:00 Andrey Mashenkov <andrey.mashen...@gmail.com>:

> Alexandr,
>
> In addition. If expire policy is configured, there is additional overhead
> to entries can be tracked by TtlManager.
> This overhead is OnHeap and does not depend on cache MemoryMode (until
> Ignite-3840 will be in master).
>
> For now overhead is about 32-40 bytes (EntryWrapper itself) + (40-48) bytes
> (ConcurrentSkipList node) per entry.
>
>
>
> On Tue, Dec 13, 2016 at 10:37 AM, Alexandr Kuramshin <ein.nsk...@gmail.com
> >
> wrote:
>
> > Hello, Igniters,
> >
> > I'd like to represent updated article [1] about the subject.
> >
> > And I'll very appreciate your comments and questions about it.
> >
> > Please review.
> >
> > [1] http://apacheignite.gridgain.org/docs/capacity-planning
> >
> > --
> > Thanks,
> > Alexandr Kuramshin
> >
>
>
>
> --
> С уважением,
> Машенков Андрей Владимирович
> Тел. +7-921-932-61-82
>
> Best regards,
> Andrey V. Mashenkov
> Cerr: +7-921-932-61-82
>



-- 
Thanks,
Alexandr Kuramshin


Capacity Planning - Calculating Memory Usage

2016-12-12 Thread Alexandr Kuramshin
Hello, Igniters,

I'd like to present the updated article [1] about the subject.

And I'll very much appreciate your comments and questions about it.

Please review.

[1] http://apacheignite.gridgain.org/docs/capacity-planning

-- 
Thanks,
Alexandr Kuramshin


[jira] [Created] (IGNITE-4417) OptimizedMarshaller: show property path causing ClassNotFoundException

2016-12-12 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4417:
--

 Summary: OptimizedMarshaller: show property path causing 
ClassNotFoundException
 Key: IGNITE-4417
 URL: https://issues.apache.org/jira/browse/IGNITE-4417
 Project: Ignite
  Issue Type: Improvement
  Components: general
Reporter: Alexandr Kuramshin
Priority: Minor


When the OptimizedMarshaller cannot unmarshal an object on the remote side because of a 
ClassNotFoundException, an IgniteCheckedException is thrown.

We can see in the stack trace the class loader's toString() value and the name of the 
class which was not found. This information is insufficient.

We should also know which field or property of the object caused the 
ClassNotFoundException. And, if this object is contained inside another object, we should 
know the type of that object and its field or property as well.

For example: IgniteCheckedException: Failed to unmarshal an object ClassName1 
root.ClassName2 fieldName2.ClassName3 propName3. Given class loader: 
classLoaderToString.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: IgniteCache.loadCache improvement proposal

2016-11-22 Thread Alexandr Kuramshin
Val, Yakov,

Sorry for delay, I need time to think and to do some tests.

Anyway, extending the API and supplying a default implementation is good. It
makes the framework more flexible and usable.

But your proposed extension will not solve the problem that I have
raised. Please read the following with special attention.

The current implementation of IgniteCache.loadCache causes parallel execution of
IgniteCache.localLoadCache on each node in the cluster. It's a bad
implementation, but it's the *right semantics*.

You propose to extend IgniteCache.localLoadCache and use it to load data on
all the nodes. That's bad semantics, and it also leads to a bad implementation.
Please note why.

When you filter the data with the supplied IgniteBiPredicate, you may
access data that must be co-located. Hence, to load the data onto all the
nodes, you need access to all the related data partitioned across the cluster.
This leads to great network overhead and overload of the near caches.

And that is why I am wondering why the IgniteBiPredicate is executed for every
key supplied by Cache.loadCache, and not only for those keys which will be
stored on this node.

My opinion in conclusion.

localLoadCache should first filter a key by the affinity function and the
current cache topology, *then* invoke the predicate, and then store the
entity in the cache (possibly by invoking the supplied closure). All
associated partitions should be locked for the time of loading.

IgniteCache.loadCache should perform Cache.loadCache on one (or a few
more) nodes, then transfer entities to the remote nodes, and *then* invoke the
predicate and closure on the remote nodes.
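
A minimal sketch of the "filter by affinity first" idea inside localLoadCache (the Ignite
instance, the cache, the user predicate p, the CacheStore and its args are assumed to be in
scope; partition locking is left out):

Affinity<Object> aff = ignite.affinity(cache.getName());
ClusterNode locNode = ignite.cluster().localNode();

store.loadCache((key, val) -> {
    // 1. Filter by affinity first: skip keys this node does not own.
    if (!aff.isPrimaryOrBackup(locNode, key))
        return;

    // 2. Only then invoke the user predicate.
    if (p != null && !p.apply(key, val))
        return;

    // 3. Store the entry locally (a real implementation would write straight
    //    into the locked local partition instead of a plain put).
    cache.put(key, val);
}, args);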


2016-11-22 2:16 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com
>:

> Guys,
>
> I created a ticket for this:
> https://issues.apache.org/jira/browse/IGNITE-4255
>
> Feel free to provide comments.
>
> -Val
>
> On Sat, Nov 19, 2016 at 6:56 AM, Yakov Zhdanov <yzhda...@apache.org>
> wrote:
>
> > >
> > >
> > > Why not store the partition ID in the database and query only local
> > > partitions? Whatever approach we design with a DataStreamer will be
> > slower
> > > than this.
> > >
> >
> > Because this can be some generic DB. Imagine the app migrating to IMDG.
> >
> > I am pretty sure that in many cases approach with data streamer will be
> > faster and in many cases approach with multiple queries will be faster.
> And
> > the choice should depend on many factors. I like Val's suggestions. I
> think
> > he goes in the right direction.
> >
> > --Yakov
> >
>



-- 
Thanks,
Alexandr Kuramshin


Re: IgniteCache.loadCache improvement proposal

2016-11-18 Thread Alexandr Kuramshin
Dmitriy,

I'm not fully confident that the partition ID is the best approach in all
cases. Even if we have full access to the database structure, there are
other problems.

Assume we have a table PERSON (ID NUMBER, NAME VARCHAR, SURNAME VARCHAR,
AGE NUMBER, EMPL_DATE DATE). And we add our column PART NUMBER.

While we already have the indexes IDX1(NAME), IDX2(SURNAME), IDX3(AGE),
IDX4(EMPL_DATE), we have to add a new 2-column index IDX5(PART, EMPL_DATE)
for pre-loading at startup, for example, recently employed persons.

And if we'd like to query filtered data from the database, we'd also have
to create the other compound indexes IDX6(PART, NAME), IDX7(PART, SURNAME),
IDX8(PART, AGE). So we double the overhead imposed by indexes.

After these modifications to the database have been done and the PART column
is filled, what should we do to preload the data?

We should perform as many database queries as there are partitions stored on
the nodes. The number of queries would be 1024 with the default settings of
the affinity functions. Some calls may return no data at all, and those will
be wasted network round-trips. It may also be a problem for some databases
to efficiently perform that many parallel queries without degradation of
the total throughput.

The DataStreamer approach may be faster, but it should be tested.
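
For reference, a minimal sketch of per-partition loading inside CacheStore.loadCache()
(table and column names follow the example above; the Ignite instance, the JDBC connection
conn, the closure clo passed to loadCache and the personFromRow helper are assumptions or
illustrative; error handling is omitted):

int[] localParts = ignite.affinity("person").primaryPartitions(ignite.cluster().localNode());

for (int part : localParts) {
    try (PreparedStatement st = conn.prepareStatement(
        "SELECT ID, NAME, SURNAME, AGE, EMPL_DATE FROM PERSON WHERE PART = ?")) {
        st.setInt(1, part);

        try (ResultSet rs = st.executeQuery()) {
            while (rs.next())
                clo.apply(rs.getLong("ID"), personFromRow(rs));
        }
    }
}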

2016-11-16 16:40 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:

> On Wed, Nov 16, 2016 at 1:54 PM, Yakov Zhdanov <yzhda...@apache.org>
> wrote:
>
> > > On Wed, Nov 16, 2016 at 11:22 AM, Yakov Zhdanov <yzhda...@apache.org>
> > wrote:
> >
> > > > > Yakov, I agree that such scenario should be avoided. I also think
> > that
> >
> > > > > loadCache(...) method, as it is right now, provides a way to avoid
> > it.
> >
> > > >
> >
> > > > No, it does not.
> >
> > > >
> > > Yes it does :)
> >
> > No it doesn't. Load cache should either send a query to DB that filters
> all
> > the data on server side which, in turn, may result to full-scan of 2 Tb
> > data set dozens of times (equal to node count) or send a query that
> brings
> > the whole dataset to each node which is unacceptable as well.
> >
>
> Why not store the partition ID in the database and query only local
> partitions? Whatever approach we design with a DataStreamer will be slower
> than this.
>



-- 
Thanks,
Alexandr Kuramshin


[jira] [Created] (IGNITE-4245) Get EXCEPTION_ACCESS_VIOLATION with OFFHEAP_TIRED cache

2016-11-18 Thread Alexandr Kuramshin (JIRA)
Alexandr Kuramshin created IGNITE-4245:
--

 Summary: Get EXCEPTION_ACCESS_VIOLATION with OFFHEAP_TIRED cache
 Key: IGNITE-4245
 URL: https://issues.apache.org/jira/browse/IGNITE-4245
 Project: Ignite
  Issue Type: Bug
Affects Versions: 1.7, 1.6, 1.8
Reporter: Alexandr Kuramshin


I get an EXCEPTION_ACCESS_VIOLATION while iterating through the local cache entries 
stored in an OFFHEAP_TIERED cache.

The test class and log are attached.

I've tried the same test on the 1.6.11, 1.7.4 and 1.8 versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: IgniteCache.loadCache improvement proposal

2016-11-16 Thread Alexandr Kuramshin
in Kulichenko <
> >>>> valentin.kuliche...@gmail.com> wrote:
> >>>>>
> >>>>> It sounds like Aleksandr is basically proposing to support automatic
> >>>>> persistence [1] for loading through data streamer and we really don't
> >>>> have
> >>>>> this. However, I think I have more generic solution in mind.
> >>>>>
> >>>>> What if we add one more IgniteCache.loadCache overload like this:
> >>>>>
> >>>>> loadCache(@Nullable IgniteBiPredicate<K, V> p, IgniteBiInClosure<K,
> V>
> >>>>> clo, @Nullable
> >>>>> Object... args)
> >>>>>
> >>>>> It's the same as the existing one, but with the key-value closure
> >>>> provided
> >>>>> as a parameter. This closure will be passed to the
> CacheStore.loadCache
> >>>>> along with the arguments and will allow to override the logic that
> >>>> actually
> >>>>> saves the loaded entry in cache (currently this logic is always
> >> provided
> >>>> by
> >>>>> the cache itself and user can't control it).
> >>>>>
> >>>>> We can then provide the implementation of this closure that will
> >> create a
> >>>>> data streamer and call addData() within its apply() method.
> >>>>>
> >>>>> I see the following advantages:
> >>>>>
> >>>>> - Any existing CacheStore implementation can be reused to load
> through
> >>>>> streamer (our JDBC and Cassandra stores or anything else that user
> >>>> has).
> >>>>> - Loading code is always part of CacheStore implementation, so it's
> >>>> very
> >>>>> easy to switch between different ways of loading.
> >>>>> - User is not limited by two approaches we provide out of the box,
> >> they
> >>>>> can always implement a new one.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>> [1] https://apacheignite.readme.io/docs/automatic-persistence
> >>>>>
> >>>>> -Val
> >>>>>
> >>>>> On Tue, Nov 15, 2016 at 2:27 AM, Alexey Kuznetsov <
> >> akuznet...@apache.org
> >>>>>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi, All!
> >>>>>>
> >>>>>> I think we do not need to chage API at all.
> >>>>>>
> >>>>>> public void loadCache(@Nullable IgniteBiPredicate<K, V> p, @Nullable
> >>>>>> Object... args) throws CacheException;
> >>>>>>
> >>>>>> We could pass any args to loadCache();
> >>>>>>
> >>>>>> So we could create class
> >>>>>> IgniteCacheLoadDescriptor {
> >>>>>> some fields that will describe how to load
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> and modify POJO store to detect and use such arguments.
> >>>>>>
> >>>>>>
> >>>>>> All we need is to implement this and write good documentation and
> >>>> examples.
> >>>>>>
> >>>>>> Thoughts?
> >>>>>>
> >>>>>> On Tue, Nov 15, 2016 at 5:22 PM, Alexandr Kuramshin <
> >>>> ein.nsk...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi Vladimir,
> >>>>>>>
> >>>>>>> I don't offer any changes in API. Usage scenario is the same as it
> >> was
> >>>>>>> described in
> >>>>>>> https://apacheignite.readme.io/docs/persistent-store#
> >>>> section-loadcache-
> >>>>>>>
> >>>>>>> The preload cache logic invokes IgniteCache.loadCache() with some
> >>>>>>> additional arguments, depending on a CacheStore implementation, and
> >>>> then
> >>>>>>> the loading occurs in the way I've already described.
> >>>>>>>
> >>>>>>>
> >>>>>>> 2016-11-15 11:26 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> >>>>>>>
> >

Re: IgniteCache.loadCache improvement proposal

2016-11-15 Thread Alexandr Kuramshin
Hi Vladimir,

I don't propose any changes to the API. The usage scenario is the same as
described in
https://apacheignite.readme.io/docs/persistent-store#section-loadcache-

The preload-cache logic invokes IgniteCache.loadCache() with some
additional arguments, depending on the CacheStore implementation, and then
the loading occurs in the way I've already described.


2016-11-15 11:26 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Hi Alex,
>
> >>> Let's give the user the reusable code which is convenient, reliable and
> fast.
> Convenience - this is why I asked for example on how API can look like and
> how users are going to use it.
>
> Vladimir.
>
> On Tue, Nov 15, 2016 at 11:18 AM, Alexandr Kuramshin <ein.nsk...@gmail.com
> >
> wrote:
>
> > Hi all,
> >
> > I think the discussion goes a wrong direction. Certainly it's not a big
> > deal to implement some custom user logic to load the data into caches.
> But
> > Ignite framework gives the user some reusable code build on top of the
> > basic system.
> >
> > So the main question is: Why developers let the user to use convenient
> way
> > to load caches with totally non-optimal solution?
> >
> > We could talk too much about different persistence storage types, but
> > whenever we initiate the loading with IgniteCache.loadCache the current
> > implementation imposes much overhead on the network.
> >
> > Partition-aware data loading may be used in some scenarios to avoid this
> > network overhead, but the users are compelled to do additional steps to
> > achieve this optimization: adding the column to tables, adding compound
> > indices including the added column, write a peace of repeatable code to
> > load the data in different caches in fault-tolerant fashion, etc.
> >
> > Let's give the user the reusable code which is convenient, reliable and
> > fast.
> >
> > 2016-11-14 20:56 GMT+03:00 Valentin Kulichenko <
> > valentin.kuliche...@gmail.com>:
> >
> > > Hi Aleksandr,
> > >
> > > Data streamer is already outlined as one of the possible approaches for
> > > loading the data [1]. Basically, you start a designated client node or
> > > chose a leader among server nodes [1] and then use IgniteDataStreamer
> API
> > > to load the data. With this approach there is no need to have the
> > > CacheStore implementation at all. Can you please elaborate what
> > additional
> > > value are you trying to add here?
> > >
> > > [1] https://apacheignite.readme.io/docs/data-loading#
> ignitedatastreamer
> > > [2] https://apacheignite.readme.io/docs/leader-election
> > >
> > > -Val
> > >
> > > On Mon, Nov 14, 2016 at 8:23 AM, Dmitriy Setrakyan <
> > dsetrak...@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I just want to clarify a couple of API details from the original
> email
> > to
> > > > make sure that we are making the right assumptions here.
> > > >
> > > > *"Because of none keys are passed to the CacheStore.loadCache
> methods,
> > > the
> > > > > underlying implementation is forced to read all the data from the
> > > > > persistence storage"*
> > > >
> > > >
> > > > According to the javadoc, loadCache(...) method receives an optional
> > > > argument from the user. You can pass anything you like, including a
> > list
> > > of
> > > > keys, or an SQL where clause, etc.
> > > >
> > > > *"The partition-aware data loading approach is not a choice. It
> > requires
> > > > > persistence of the volatile data depended on affinity function
> > > > > implementation and settings."*
> > > >
> > > >
> > > > This is only partially true. While Ignite allows to plugin custom
> > > affinity
> > > > functions, the affinity function is not something that changes
> > > dynamically
> > > > and should always return the same partition for the same key.So, the
> > > > partition assignments are not volatile at all. If, in some very rare
> > > case,
> > > > the partition assignment logic needs to change, then you could update
> > the
> > > > partition assignments that you may have persisted elsewhere as well,
> > e.g.
> > > > database.
> > > >
> > > > D.
> > > >
> > > > On Mon, Nov 14, 2016 at 10:23 AM, Vladimir

Re: IgniteCache.loadCache improvement proposal

2016-11-15 Thread Alexandr Kuramshin
> > > Looks good for me.
> > > >
> > > > But I would suggest considering one more use case:
> > > >
> > > > If the user knows their data, they could split the loading manually.
> > > > For example: the Person table contains 10M rows.
> > > > The user could provide something like:
> > > > cache.loadCache(null,
> > > >     "Person", "select * from Person where id < 1_000_000",
> > > >     "Person", "select * from Person where id >= 1_000_000 and id < 2_000_000",
> > > >     ...
> > > >     "Person", "select * from Person where id >= 9_000_000 and id < 10_000_000");
> > > >
> > > > or maybe it could be some descriptor object like
> > > >
> > > > {
> > > >     sql: "select * from Person where id >= ? and id < ?",
> > > >     range: 0...10_000_000
> > > > }
> > > >
> > > > In this case the provided queries will be sent to as many nodes as there
> > > > are queries. The data will be loaded in parallel, and for keys that are
> > > > not local the data streamer should be used (as described in Alexandr's
> > > > description).
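To make the shape of such an API concrete, a purely hypothetical sketch of the descriptor variant; none of these types or overloads exist in Ignite today.

// Hypothetical descriptor, not an existing Ignite type.
public class RangeLoadDescriptor {
    final String cacheName;  // e.g. "Person"
    final String sql;        // parameterized range query
    final long from, to, step;

    public RangeLoadDescriptor(String cacheName, String sql, long from, long to, long step) {
        this.cacheName = cacheName;
        this.sql = sql;
        this.from = from;
        this.to = to;
        this.step = step;
    }
}

// Hypothetical call site: the ten resulting range queries would be sent to as
// many nodes, loaded in parallel, and non-local keys forwarded through a data
// streamer.
// cache.loadCache(new RangeLoadDescriptor("Person",
//     "select * from Person where id >= ? and id < ?", 0, 10_000_000, 1_000_000));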
> > > >
> > > > I think it is a good issue for Ignite 2.0
> > > >
> > > > Vova, Val - what do you think?
> > > >
> > > >
> > > > On Mon, Nov 14, 2016 at 4:01 PM, Alexandr Kuramshin <ein.nsk...@gmail.com>
> > > > wrote:
> > > >
> > > >> All right,
> > > >>
> > > >> Let's assume a simple scenario. When the IgniteCache.loadCache is
> > > invoked,
> > > >> we check whether the cache is not local, and if so, then we'll
> > initiate
> > > >> the
> > > >> new loading logic.
> > > >>
> > > >> First, we take a "streamer" node, it could be done by
> > > >> utilizing LoadBalancingSpi, or it may be configured statically, for
> > the
> > > >> reason that the streamer node is running on the same host as the
> > > >> persistence storage provider.
> > > >>
> > > >> After that we start the loading task on the streamer node which
> > > >> creates IgniteDataStreamer and loads the cache with
> > > CacheStore.loadCache.
> > > >> Every call to IgniteBiInClosure.apply simply
> > > >> invokes IgniteDataStreamer.addData.
> > > >>
> > > >> This implementation will completely relieve overhead on the
> > persistence
> > > >> storage provider. Network overhead is also decreased in the case of
> > > >> partitioned caches. For two nodes we get 1-1/2 times the data set
> > > >> transferred by the network (1 part will be transferred from the
> > > >> persistence storage to the streamer, and then 1/2 from the streamer
> > > >> node to the other node). For three nodes it will be 1-2/3, and so on,
> > > >> up to twice the amount of data on big clusters.
> > > >>
> > > >> I'd like to propose some additional optimization at this place. If we
> > > >> have the streamer node on the same machine as the persistence storage
> > > >> provider, then we completely relieve the network overhead as well. It
> > > >> could be some special daemon node dedicated to cache loading and
> > > >> assigned in the cache configuration, or an ordinary server node.
> > > >>
> > > >> Certainly, these calculations have been done under the assumption that
> > > >> we have an evenly partitioned cache with only primary copies (without
> > > >> backups). In the case of one backup (the most frequent case, I think),
> > > >> we get 2 times the data set transferred by the network on two nodes,
> > > >> 2-1/3 on three, 2-1/2 on four, and so on, up to three times the amount
> > > >> of data on big clusters. Hence it's still better than the current
> > > >> implementation. In the worst case with a fully replicated cache we take
> > > >> N+1 times the amount of data transferred by the
> > > >

Re: IgniteCache.loadCache improvement proposal

2016-11-14 Thread Alexandr Kuramshin
All right,

Let's assume a simple scenario. When IgniteCache.loadCache is invoked, we
check whether the cache is not local, and if so, we initiate the new loading
logic.

First, we take a "streamer" node, it could be done by
utilizing LoadBalancingSpi, or it may be configured statically, for the
reason that the streamer node is running on the same host as the
persistence storage provider.

After that we start the loading task on the streamer node, which creates an
IgniteDataStreamer and loads the cache with CacheStore.loadCache. Every call
to IgniteBiInClosure.apply simply invokes IgniteDataStreamer.addData.
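A minimal sketch of what such a loading task could look like under this proposal; "personCache" and the PersonStore class (a user-provided CacheStore) are assumptions for the example, and this is a proposal sketch rather than existing Ignite code.

import org.apache.ignite.*;
import org.apache.ignite.cache.store.CacheStore;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class StreamerLoadTask implements IgniteRunnable {
    @IgniteInstanceResource
    private transient Ignite ignite; // injected on the streamer node

    @Override public void run() {
        CacheStore<Long, String> store = new PersonStore(); // the user-provided store

        try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("personCache")) {
            // The store reads the persistence storage exactly once; every emitted
            // (key, value) pair is forwarded to the streamer, which routes it to
            // the nodes owning the key.
            store.loadCache((key, val) -> streamer.addData(key, val));
        }
    }
}

The IgniteCache.loadCache implementation could then simply submit this task to the chosen streamer node, e.g. ignite.compute(ignite.cluster().forNode(streamerNode)).run(new StreamerLoadTask()).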

This implementation completely relieves the overhead on the persistence
storage provider. Network overhead is also decreased in the case of
partitioned caches. For two nodes we get 1-1/2 times the data set transferred
over the network (1 part is transferred from the persistence storage to the
streamer, and then 1/2 from the streamer node to the other node). For three
nodes it will be 1-2/3, and so on, up to twice the amount of data on big
clusters.

I'd like to propose some additional optimization at this place. If we have
the streamer node on the same machine as the persistence storage provider,
then we completely relieve the network overhead as well. It could be some
special daemon node dedicated to cache loading and assigned in the cache
configuration, or an ordinary server node.

Certainly, these calculations have been done under the assumption that we
have an evenly partitioned cache with only primary copies (without backups).
In the case of one backup (the most frequent case, I think), we get 2 times
the data set transferred by the network on two nodes, 2-1/3 on three, 2-1/2
on four, and so on, up to three times the amount of data on big clusters.
Hence it's still better than the current implementation. In the worst case
with a fully replicated cache we take N+1 times the amount of data
transferred by the network (where N is the number of nodes in the cluster).
But it's not a problem in small clusters, and only a little overhead in big
clusters. And we still gain the persistence storage provider optimization.
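To summarize the estimates above in one expression, with N nodes, B configured backups, an even key distribution and a single streamer node:

    data transferred over the network ≈ 1 + (B + 1) * (N - 1) / N   (in units of the full data set)

which gives 1-1/2 for N = 2 without backups, 2 for N = 2 with one backup, and approaches B + 2 on large clusters, matching the figures above.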

Now let's take a more complex scenario. To achieve some level of parallelism,
we could split our cluster into several groups. It could be a parameter of
the IgniteCache.loadCache method or a cache configuration option. The number
of groups could be a fixed value, or it could be calculated dynamically from
the maximum number of nodes per group.

After splitting the whole cluster into groups, we take a streamer node in
each group and submit a cache-loading task similar to the single-streamer
scenario, except that only the keys that correspond to the cluster group
where the streamer node is running are passed to the
IgniteDataStreamer.addData method.
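A rough sketch of how the group splitting could work; the group count, the round-robin grouping rule, the cache name "personCache" and the PersonStore class are all assumptions made up for the example, and it targets a recent Ignite API rather than describing existing behaviour.

import java.util.*;
import org.apache.ignite.*;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.lang.IgniteRunnable;

public class GroupedLoad {
    public static void load(Ignite ignite, int groupCnt) {
        List<ClusterNode> servers = new ArrayList<>(ignite.cluster().forServers().nodes());

        for (int g = 0; g < groupCnt && g < servers.size(); g++) {
            // Round-robin assignment of server nodes to groups; the first node of
            // each group plays the streamer role.
            Set<UUID> groupIds = new HashSet<>();
            for (int i = g; i < servers.size(); i += groupCnt)
                groupIds.add(servers.get(i).id());

            ClusterNode streamerNode = servers.get(g);

            ignite.compute(ignite.cluster().forNode(streamerNode)).runAsync((IgniteRunnable)() -> {
                Ignite local = Ignition.localIgnite();
                Affinity<Long> aff = local.affinity("personCache");

                try (IgniteDataStreamer<Long, String> streamer = local.dataStreamer("personCache")) {
                    // Each streamer runs the store once but forwards only the keys
                    // whose primary node belongs to its own group.
                    new PersonStore().loadCache((key, val) -> {
                        if (groupIds.contains(aff.mapKeyToNode(key).id()))
                            streamer.addData(key, val);
                    });
                }
            });
        }
    }
}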

In this case the overhead grows with the level of parallelism rather than
with the total number of nodes in the whole cluster.

2016-11-11 15:37 GMT+03:00 Alexey Kuznetsov <akuznet...@apache.org>:

> Alexandr,
>
> Could you describe your proposal in more details?
> Especially in case with several nodes.
>
> On Fri, Nov 11, 2016 at 6:34 PM, Alexandr Kuramshin <ein.nsk...@gmail.com>
> wrote:
>
> > Hi,
> >
> > You know the CacheStore API that is commonly used for the read/write-through
> > relationship between the in-memory data and the persistence storage.
> >
> > There is also the IgniteCache.loadCache method for hot-loading the cache on
> > startup. Invocation of this method causes execution of CacheStore.loadCache
> > on all nodes storing the cache partitions. Because no keys are passed to the
> > CacheStore.loadCache methods, the underlying implementation is forced to read
> > all the data from the persistence storage, although only part of the data
> > will be stored on each node.
> >
> > So, the current implementation has two general drawbacks:
> >
> > 1. The persistence storage is forced to perform as many identical queries as
> > there are nodes in the cluster. Each query may involve a lot of additional
> > computation on the persistence storage server.
> >
> > 2. The network is forced to transfer much more data, which is obviously a
> > big disadvantage on large systems.
> >
> > The partition-aware data loading approach, described in
> > https://apacheignite.readme.io/docs/data-loading#section-partition-aware-data-loading
> > , is not an option. It requires persisting volatile data that depends on the
> > affinity function implementation and settings.
> >
> > I propose using something like IgniteDataStreamer inside the
> > IgniteCache.loadCache implementation.
> >
> >
> > --
> > Thanks,
> > Alexandr Kuramshin
> >
>
>
>
> --
> Alexey Kuznetsov
>



-- 
Thanks,
Alexandr Kuramshin


IgniteCache.loadCache improvement proposal

2016-11-11 Thread Alexandr Kuramshin
Hi,

You know the CacheStore API that is commonly used for the read/write-through
relationship between the in-memory data and the persistence storage.

There is also the IgniteCache.loadCache method for hot-loading the cache on
startup. Invocation of this method causes execution of CacheStore.loadCache
on all nodes storing the cache partitions. Because no keys are passed to the
CacheStore.loadCache methods, the underlying implementation is forced to read
all the data from the persistence storage, although only part of the data
will be stored on each node.

So, the current implementation has two general drawbacks:

1. The persistence storage is forced to perform as many identical queries as
there are nodes in the cluster. Each query may involve a lot of additional
computation on the persistence storage server.

2. The network is forced to transfer much more data, which is obviously a big
disadvantage on large systems.

The partition-aware data loading approach, described in
https://apacheignite.readme.io/docs/data-loading#section-partition-aware-data-loading
, is not an option. It requires persisting volatile data that depends on the
affinity function implementation and settings.
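For comparison, a rough sketch of what the documented partition-aware approach requires inside a CacheStore (the table, cache name and PART column are assumptions for the example); keeping that column in sync with the affinity function is exactly the extra work described above.

// Sketch only: assumes every Person row carries a PART column that was computed
// with the same affinity function the cluster uses and is kept in sync with it.
@Override public void loadCache(IgniteBiInClosure<Long, String> clo, Object... args) {
    Ignite ignite = Ignition.localIgnite();

    // Partitions owned by the local node (as primary or backup).
    int[] parts = ignite.affinity("personCache").allPartitions(ignite.cluster().localNode());

    try (Connection conn = DriverManager.getConnection("jdbc:postgresql://db-host/app");
         PreparedStatement stmt = conn.prepareStatement("select id, name from Person where part = ?")) {
        for (int part : parts) {
            stmt.setInt(1, part);

            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next())
                    clo.apply(rs.getLong(1), rs.getString(2));
            }
        }
    }
    catch (SQLException e) {
        throw new CacheLoaderException(e);
    }
}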

I propose using something like IgniteDataStreamer inside the
IgniteCache.loadCache implementation.


-- 
Thanks,
Alexandr Kuramshin