[jira] [Created] (IGNITE-7788) Data loss after cold restart with PDS and cache group change
Alexandr Kuramshin created IGNITE-7788:
--

Summary: Data loss after cold restart with PDS and cache group change
Key: IGNITE-7788
URL: https://issues.apache.org/jira/browse/IGNITE-7788
Project: Ignite
Issue Type: Bug
Components: persistence
Affects Versions: 2.3
Reporter: Alexandr Kuramshin

Reproduced by improved test {{IgnitePdsCacheRestoreTest.testRestoreAndNewCache6}}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7723) Data loss after node restart with PDS
Alexandr Kuramshin created IGNITE-7723:
--

Summary: Data loss after node restart with PDS
Key: IGNITE-7723
URL: https://issues.apache.org/jira/browse/IGNITE-7723
Project: Ignite
Issue Type: Bug
Components: general, persistence
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Attachments: IgnitePdsDataLossTest.java

A split-brain scenario with a topology validator is used to demonstrate possible data loss. The same result may be produced by accidental network problems combined with a node restart. See the reproducer {{IgnitePdsDataLossTest}} for details.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7634) Wrong NodeStoppingException on destroying cache
Alexandr Kuramshin created IGNITE-7634:
--

Summary: Wrong NodeStoppingException on destroying cache
Key: IGNITE-7634
URL: https://issues.apache.org/jira/browse/IGNITE-7634
Project: Ignite
Issue Type: Bug
Components: cache
Affects Versions: 2.3
Reporter: Alexandr Kuramshin

Multiple {{NodeStoppingException}}s are thrown on concurrent cache operations when the actual cause is that the cache is being destroyed, not that the node is stopping:

{noformat}
Error during parallel index create/rebuild.
org.apache.ignite.internal.NodeStoppingException: Operation has been cancelled (node is stopping).
    at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:393)
    at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$RebuldIndexFromHashClosure.apply(IgniteH2Indexing.java:2635)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateIndex(GridCacheMapEntry.java:3305)
    at org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processKey(SchemaIndexCacheVisitorImpl.java:243)
    at org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartition(SchemaIndexCacheVisitorImpl.java:206)
    at org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartitions(SchemaIndexCacheVisitorImpl.java:165)
    at org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.access$100(SchemaIndexCacheVisitorImpl.java:50)
    at org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl$AsyncWorker.body(SchemaIndexCacheVisitorImpl.java:316)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7633) Multiple errors on accessing page store while destroying cache
Alexandr Kuramshin created IGNITE-7633:
--

Summary: Multiple errors on accessing page store while destroying cache
Key: IGNITE-7633
URL: https://issues.apache.org/jira/browse/IGNITE-7633
Project: Ignite
Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin

One common exception:

{noformat}
Partition eviction failed, this can cause grid hang.
org.apache.ignite.IgniteException: Failed to get page store for the given cache ID (cache has not been started): -1903385190
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.destroyCacheDataStore(IgniteCacheOffheapManagerImpl.java:931)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.destroyCacheDataStore(GridDhtLocalPartition.java:772)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.finishDestroy(GridDhtLocalPartition.java:730)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearEvicting(GridDhtLocalPartition.java:702)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:762)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
    at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
    at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.ignite.IgniteCheckedException: Failed to get page store for the given cache ID (cache has not been started): -1903385190
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getStore(FilePageStoreManager.java:670)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.onPartitionDestroyed(FilePageStoreManager.java:268)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.destroyCacheDataStore0(GridCacheOffheapManager.java:494)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.destroyCacheDataStore(IgniteCacheOffheapManagerImpl.java:928)
    ... 12 common frames omitted
{noformat}

And multiple others, one per page:

{noformat}
There was an exception while updating tracking page: 000119a20001
org.apache.ignite.IgniteCheckedException: Failed to get page store for the given cache ID (cache has not been started): -1903385190
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getStore(FilePageStoreManager.java:670)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:290)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:277)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:608)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:528)
    at org.gridgain.grid.internal.processors.cache.database.GridCacheSnapshotManager.onChangeTrackerPage(GridCacheSnapshotManager.java:1921)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$9.applyx(GridCacheDatabaseSharedManager.java:966)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$9.applyx(GridCacheDatabaseSharedManager.java:959)
    at org.apache.ignite.internal.util.lang.GridInClosure3X.apply(GridInClosure3X.java:34)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1274)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:419)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:413)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:304)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.destroyCacheDataStore0
[jira] [Created] (IGNITE-7632) NPE in IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.updateIgfsMetrics()
Alexandr Kuramshin created IGNITE-7632:
--

Summary: NPE in IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.updateIgfsMetrics()
Key: IGNITE-7632
URL: https://issues.apache.org/jira/browse/IGNITE-7632
Project: Ignite
Issue Type: Bug
Components: cache
Affects Versions: 2.3
Reporter: Alexandr Kuramshin

Occurs when a cache is destroyed while index rebuilding is in progress:

{noformat}
Partition eviction failed, this can cause grid hang.
java.lang.NullPointerException: null
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.updateIgfsMetrics(IgniteCacheOffheapManagerImpl.java:1576)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1403)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1368)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1312)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:368)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3224)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:895)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:753)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
    at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
    at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
    ...
{noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7631) Failed to clear page memory with AssertionError: Release pinned page
Alexandr Kuramshin created IGNITE-7631:
--

Summary: Failed to clear page memory with AssertionError: Release pinned page
Key: IGNITE-7631
URL: https://issues.apache.org/jira/browse/IGNITE-7631
Project: Ignite
Issue Type: Bug
Components: cache
Affects Versions: 2.3
Reporter: Alexandr Kuramshin

The following scenario produces the problem:
# The cluster was started and activated.
# A snapshot has been restored.
# Index rebuilding is in progress.
# Caches are destroyed.
# Multiple NPEs occur.
# The following exception occurs:

{noformat}
Failed to clear page memory
org.apache.ignite.IgniteCheckedException: Compound exception for CountDownFuture.
    at org.apache.ignite.internal.util.future.CountDownFuture.addError(CountDownFuture.java:72)
    at org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:46)
    at org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:28)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2449)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    Suppressed: java.lang.AssertionError: Release pinned page: FullPageId [pageId=000100f40007, effectivePageId=00f40007, grpId=321390040]
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
        ... 3 common frames omitted
    Suppressed: java.lang.AssertionError: Release pinned page: FullPageId [pageId=000200019986, effectivePageId=00019986, grpId=-1903385190]
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
        ... 3 common frames omitted
    Suppressed: java.lang.AssertionError: Release pinned page: FullPageId [pageId=0002c85c, effectivePageId=c85c, grpId=-1903385190]
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
        ... 3 common frames omitted
    Suppressed: java.lang.AssertionError: Release pinned page: FullPageId [pageId=000232da, effectivePageId=32da, grpId=321390040]
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
        ... 3 common frames omitted
    Suppressed: java.lang.AssertionError: Release pinned page: FullPageId [pageId=000200011d30, effectivePageId=00011d30, grpId=-1903385190]
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.releaseFreePage(PageMemoryImpl.java:1593)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$PagePool.access$1900(PageMemoryImpl.java:1465)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$ClearSegmentRunnable.run(PageMemoryImpl.java:2440)
        ... 3 common frames omitted
    Suppressed: java.lang.AssertionError: Release pinned page: FullPageId [pageId=0002d346, effectivePageId=d346, grpId=-1903385190
[jira] [Created] (IGNITE-7630) NPE in SchemaIndexCacheVisitorImpl.processKey()
Alexandr Kuramshin created IGNITE-7630:
--

Summary: NPE in SchemaIndexCacheVisitorImpl.processKey()
Key: IGNITE-7630
URL: https://issues.apache.org/jira/browse/IGNITE-7630
Project: Ignite
Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin

Occurs after a cache is destroyed while index rebuilding is in progress:

{noformat}
[Thread] parallel-idx-worker-GridDhtColocatedCache [...]
[Emitter] o.a.i.i.p.q.s.SchemaIndexCacheVisitorImpl$AsyncWorker
[Message] Error during parallel index create/rebuild.
java.lang.NullPointerException: null
    at org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processKey(SchemaIndexCacheVisitorImpl.java:246)
    at org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartition(SchemaIndexCacheVisitorImpl.java:206)
    at org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartitions(SchemaIndexCacheVisitorImpl.java:165)
    at org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.access$100(SchemaIndexCacheVisitorImpl.java:50)
    at org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl$AsyncWorker.body(SchemaIndexCacheVisitorImpl.java:316)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7629) NPE when Finished indexes rebuilding for cache
Alexandr Kuramshin created IGNITE-7629:
--

Summary: NPE when Finished indexes rebuilding for cache
Key: IGNITE-7629
URL: https://issues.apache.org/jira/browse/IGNITE-7629
Project: Ignite
Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin

Occurs after a cache is destroyed while index rebuilding is in progress:

{noformat}
Runtime error caught during grid runnable execution: GridWorker [name=index-rebuild-worker, igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, hashCode=1940633631, interrupted=false, runner=pub-#2054%DPL_GRID%DplGridNodeName%]
java.lang.NullPointerException: null
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1163)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1159)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
    at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:125)
    at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
    at org.apache.ignite.internal.processors.query.GridQueryProcessor$3.body(GridQueryProcessor.java:1678)
    ...
{noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7579) NPE in GridDhtLocalPartition.cacheMapHolder()
Alexandr Kuramshin created IGNITE-7579:
--

Summary: NPE in GridDhtLocalPartition.cacheMapHolder()
Key: IGNITE-7579
URL: https://issues.apache.org/jira/browse/IGNITE-7579
Project: Ignite
Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin

The following scenario may occur:
# Multiple nodes form an inactive cluster.
# Cluster activation is performed.
# Some nodes fail the activation.
# On the other nodes, caches are stopped.
# An NPE occurs as a consequence of {{GridDhtPreloader.evictPartitionAsync()}}:

{noformat}
Partition eviction failed, this can cause grid hang.
java.lang.NullPointerException: null
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.cacheMapHolder(GridDhtLocalPartition.java:253)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:880)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:753)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
    at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
    at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
{noformat}

# The failed nodes are dropped from the cluster.
# The subsequent activation succeeds.
# PDS appears to be corrupted as a result of the NPE.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7383) Failed to restore memory after cluster restart and activating from outdated node
Alexandr Kuramshin created IGNITE-7383:
--

Summary: Failed to restore memory after cluster restart and activating from outdated node
Key: IGNITE-7383
URL: https://issues.apache.org/jira/browse/IGNITE-7383
Project: Ignite
Issue Type: Bug
Components: persistence
Affects Versions: 2.3
Reporter: Alexandr Kuramshin

Do the following steps to reproduce the problem:
1) start nodes 0-1-2
2) stop node 2
3) create a new cache and put some data into it
4) stop the remaining nodes 0-1
5) start nodes 0-1-2
6) activate the cluster from node 2

Two different outcomes are possible, depending on which node is the coordinator:

a) node 2 is the coordinator:

{noformat}
Failed to activate node components [nodeId=42d762c7-b1e0-4283-939b-aeeb3c70, client=false, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]
class org.apache.ignite.IgniteCheckedException: Failed to find cache group descriptor [grpId=3119]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1602)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1544)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:570)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:820)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:583)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
{noformat}

and activation fails.

b) node 2 is NOT the coordinator: we get the error from the previous case, but activation does not fail; we then get "Failed to wait PME" after a number of assertion errors:

{noformat}
Failed to process message [senderId=a940742f-bf17-41b4-bfc2-728bee72, messageType=class o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
java.lang.AssertionError: -2100569601
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:733)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:2877)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:1935)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:116)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1810)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1798)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1798)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1484)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:131)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:327)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:307)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2627)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager
Re: IGNITE-7135 needs review
The ticket was assigned to me until it reached the Patch Available state. So far no committer has taken responsibility to review and merge the PR.

2017-12-26 21:08 GMT+07:00 Dmitriy Setrakyan <dsetrak...@apache.org>:

> Here is the link to the ticket:
> https://issues.apache.org/jira/browse/IGNITE-7135
>
> For some odd reason, the ticket is in unassigned state. Alexander,
> shouldn't it be assigned to you?
>
> D.
>
> On Mon, Dec 25, 2017 at 11:58 PM, Alexandr Kuramshin <ein.nsk...@gmail.com>
> wrote:
>
> > Hello community!
> >
> > I've implemented IGNITE-7135 doing two improvements:
> >
> > 1) control remote node startup (successful or not) through
> > IgniteCluster.startNodes();
> >
> > 2) keep the first Java principle working "Compile once, run everywhere" -
> > from now running remotely on Windows also supported.
> >
> > Committers, please review.
> >
> > --
> > Thanks,
> > Alexandr Kuramshin

--
Thanks,
Alexandr Kuramshin
IGNITE-7135 needs review
Hello community!

I've implemented IGNITE-7135 with two improvements:

1) control of remote node startup (successful or not) through IgniteCluster.startNodes();

2) keeping the Java principle "compile once, run everywhere" working: running remotely on Windows is now also supported.

Committers, please review.

--
Thanks,
Alexandr Kuramshin
[jira] [Created] (IGNITE-7163) Validate connection from a pre-previous node
Alexandr Kuramshin created IGNITE-7163:
--

Summary: Validate connection from a pre-previous node
Key: IGNITE-7163
URL: https://issues.apache.org/jira/browse/IGNITE-7163
Project: Ignite
Issue Type: Sub-task
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin

If a pre-previous node connects to the local node with the previous node listed in the message's failed-nodes collection, additional steps should be taken:
# The connection with the previous node should be validated.
# If no message from the previous node has been received for a long time, the previous node should be considered failed and the pre-previous node's connection accepted.
# If the previous node's connection is alive, then different scenarios are possible:
## Answer with a new result code, causing the pre-previous node to try to reconnect to the previous node.
## Break the connection with the pre-previous node, allowing the possible cluster split to continue.
## Check the connections with the nodes after the pre-previous node and delay the decision by answering RES_WAIT, to get a more predictable split and a stable topology.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
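The decision procedure sketched in the ticket can be summarized as a small function. This is a hypothetical illustration only: the {{Decision}} enum, method names, and the silence-based liveness check are assumptions for the sketch, not Ignite's discovery API (only {{RES_WAIT}} comes from the ticket itself).

```java
/** Hypothetical sketch of the pre-previous-node connection decision (IGNITE-7163). */
public class PrePreviousNodeDecisionSketch {
    /** Possible outcomes from the ticket; the enum itself is illustrative, not Ignite API. */
    public enum Decision { ACCEPT_PRE_PREVIOUS, RETRY_PREVIOUS, RES_WAIT }

    /**
     * Decides what to do when a pre-previous node connects with the previous node
     * listed as failed: accept if the previous node has been silent longer than the
     * failure detection timeout; otherwise either tell the pre-previous node to retry
     * the previous node, or delay the decision (RES_WAIT) for a more stable split.
     */
    public static Decision decide(long msSincePrevNodeMsg,
                                  long failureDetectionTimeoutMs,
                                  boolean preferStableSplit) {
        if (msSincePrevNodeMsg > failureDetectionTimeoutMs)
            return Decision.ACCEPT_PRE_PREVIOUS; // previous node considered failed
        return preferStableSplit ? Decision.RES_WAIT : Decision.RETRY_PREVIOUS;
    }

    public static void main(String[] args) {
        // Previous node silent for 15s against a 10s timeout: accept the new connection.
        assert decide(15_000, 10_000, false) == Decision.ACCEPT_PRE_PREVIOUS;
        // Previous node still alive: ask the pre-previous node to reconnect to it.
        assert decide(1_000, 10_000, false) == Decision.RETRY_PREVIOUS;
        // Previous node alive, but we prefer a predictable split: delay the decision.
        assert decide(1_000, 10_000, true) == Decision.RES_WAIT;
    }
}
```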
[jira] [Created] (IGNITE-7162) Control discovery messages processing time
Alexandr Kuramshin created IGNITE-7162:
--

Summary: Control discovery messages processing time
Key: IGNITE-7162
URL: https://issues.apache.org/jira/browse/IGNITE-7162
Project: Ignite
Issue Type: Sub-task
Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin

Most discovery message processing occurs in a single thread. If processing a single message takes significant time, it delays the processing of other messages and has undesirable knock-on effects on other protocols.

It is proposed to track the processing time on every node, as well as the total processing time of any given message, and to log a warning when processing takes significant time.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
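The per-node part of this proposal amounts to wrapping message handling with a wall-clock measurement and logging when a threshold is exceeded. A minimal self-contained sketch; the class name, handler shape, and threshold value are all assumptions for illustration, not Ignite internals:

```java
import java.util.function.Consumer;

/** Hypothetical sketch of per-message discovery processing-time control (IGNITE-7162). */
public class DiscoveryTimingSketch {
    /** Threshold after which a warning is logged; the value is illustrative. */
    public static final long WARN_THRESHOLD_MS = 200;

    /**
     * Processes a message with the given handler, returns the elapsed wall-clock
     * time in milliseconds, and logs a warning when processing took too long.
     */
    public static <T> long processTimed(T msg, Consumer<T> handler) {
        long start = System.nanoTime();
        handler.accept(msg);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (elapsedMs > WARN_THRESHOLD_MS)
            System.err.println("Slow discovery message processing [msg=" + msg
                + ", elapsedMs=" + elapsedMs + ']');
        return elapsedMs;
    }

    public static void main(String[] args) throws Exception {
        // A fast handler stays under the threshold and logs nothing.
        long fast = processTimed("fastMsg", m -> { /* no-op */ });
        assert fast < WARN_THRESHOLD_MS;

        // A deliberately slow handler exceeds the threshold and triggers the warning.
        long slow = processTimed("slowMsg", m -> {
            try { Thread.sleep(WARN_THRESHOLD_MS + 50); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        assert slow > WARN_THRESHOLD_MS;
    }
}
```

Tracking the *total* processing time of a message across the whole ring would additionally require carrying an accumulated-time field inside the message itself, which each node increments before forwarding.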
[jira] [Created] (IGNITE-7161) Detect self-freeze on remote node related operations with timeout
Alexandr Kuramshin created IGNITE-7161:
--

Summary: Detect self-freeze on remote node related operations with timeout
Key: IGNITE-7161
URL: https://issues.apache.org/jira/browse/IGNITE-7161
Project: Ignite
Issue Type: Sub-task
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin

After getting the next timeout from {{IgniteSpiOperationTimeoutHelper.nextTimeoutChunk()}}, we start a network operation and expect it to end at (or near) a specific timestamp. We should take into account that the local thread may have been frozen; in such a situation the remote node should not be considered failed, and the local network operation has to be retried.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
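One way to read this ticket: if an operation overran its timeout by far more than the timeout itself allows, the overshoot is more plausibly a local freeze (e.g. a long GC pause) than a remote failure. A minimal sketch of that heuristic; the class, method, and slack constant are assumptions for illustration, not Ignite API:

```java
/** Hypothetical sketch of local-freeze detection around a timed network operation (IGNITE-7161). */
public class FreezeDetectionSketch {
    /** Overshoot beyond which we suspect a local thread freeze; the value is illustrative. */
    public static final long FREEZE_SLACK_MS = 500;

    /**
     * Returns true if the operation overran its timeout by more than the slack,
     * suggesting the *local* thread was frozen (e.g. a GC pause): the remote node
     * should not be blamed, and the operation should be retried instead.
     */
    public static boolean localFreezeSuspected(long opStartMs, long opEndMs, long timeoutMs) {
        long overshoot = (opEndMs - opStartMs) - timeoutMs;
        return overshoot > FREEZE_SLACK_MS;
    }

    public static void main(String[] args) {
        // Finished roughly on time: a genuine timeout, no freeze suspected.
        assert !localFreezeSuspected(0, 1_050, 1_000);
        // Took 10s against a 1s timeout: almost certainly a local freeze; retry.
        assert localFreezeSuspected(0, 10_000, 1_000);
    }
}
```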
[jira] [Created] (IGNITE-7160) Ignore messages from not alive and failed nodes
Alexandr Kuramshin created IGNITE-7160:
--

Summary: Ignore messages from not alive and failed nodes
Key: IGNITE-7160
URL: https://issues.apache.org/jira/browse/IGNITE-7160
Project: Ignite
Issue Type: Sub-task
Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin

The current implementation of {{ServerImpl}} accepts and processes messages from any remote node, even one that has failed or been removed from the ring. It is proposed to process only the messages that are valid in the current node state: some messages could be silently ignored, while receiving other undesirable messages should cause the remote socket to be disconnected.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7158) TCP discovery improvement
Alexandr Kuramshin created IGNITE-7158:
--

Summary: TCP discovery improvement
Key: IGNITE-7158
URL: https://issues.apache.org/jira/browse/IGNITE-7158
Project: Ignite
Issue Type: Improvement
Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Assignee: Alexandr Kuramshin

The current TCP discovery implementation has several drawbacks that should be fixed. See the sub-tasks for details.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7152) Failure detection timeout don't work on permanent send message errors causing infinite loop
Alexandr Kuramshin created IGNITE-7152:
--

Summary: Failure detection timeout don't work on permanent send message errors causing infinite loop
Key: IGNITE-7152
URL: https://issues.apache.org/jira/browse/IGNITE-7152
Project: Ignite
Issue Type: Bug
Components: general
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Priority: Critical
Fix For: 2.4

Relates to the {{RingMessageWorker.sendMessageAcrossRing}} implementation.

{{IgniteSpiOperationTimeoutHelper}} is reinitialized every time the socket is successfully connected. If an {{IOException}} or {{IgniteCheckedException}} occurs while sending a message, the socket is closed and the old {{IgniteSpiOperationTimeoutHelper}} is used to reconnect. But after a successful reconnect a new helper is created, and the cycle repeats. With a permanent send error this causes an infinite loop.

The only send errors that break out of the loop and lead to failing the next node are {{IgniteSpiOperationTimeoutException}}, {{SocketTimeoutException}} and {{SocketException}}.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
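The loop described above can be reduced to a tiny model: if the deadline is re-created after every successful reconnect, a permanent send error never exhausts the timeout. A self-contained sketch under stated assumptions (the {{TimeoutHelper}} class, timings, and iteration cap are illustrative stand-ins, not Ignite's actual {{IgniteSpiOperationTimeoutHelper}}):

```java
/** Hypothetical model of why per-reconnect helper reinitialization loops forever (IGNITE-7152). */
public class RingSendTimeoutSketch {
    /** Simplified stand-in for IgniteSpiOperationTimeoutHelper: a fixed absolute deadline. */
    static final class TimeoutHelper {
        final long deadlineMs;
        TimeoutHelper(long nowMs, long failureDetectionTimeoutMs) {
            deadlineMs = nowMs + failureDetectionTimeoutMs;
        }
        boolean expired(long nowMs) { return nowMs >= deadlineMs; }
    }

    /**
     * Simulates the send loop against a node whose sends always fail. When
     * {@code resetHelperOnReconnect} is true (the buggy behavior), a fresh helper
     * is created after every successful reconnect, so the deadline moves forward
     * forever and the loop only stops at the iteration cap. Returns the attempt count.
     */
    public static int attemptsUntilGiveUp(boolean resetHelperOnReconnect, int cap) {
        long now = 0;
        long failureDetectionTimeout = 1_000;
        TimeoutHelper helper = new TimeoutHelper(now, failureDetectionTimeout);
        int attempts = 0;
        while (attempts < cap) {
            attempts++;
            now += 300; // each reconnect + failed send consumes some wall-clock time
            if (resetHelperOnReconnect)
                helper = new TimeoutHelper(now, failureDetectionTimeout); // deadline keeps moving
            if (helper.expired(now))
                break; // intended behavior: give up and fail the next node
        }
        return attempts;
    }

    public static void main(String[] args) {
        assert attemptsUntilGiveUp(true, 100) == 100; // buggy: deadline never reached, hits the cap
        assert attemptsUntilGiveUp(false, 100) == 4;  // fixed: a single deadline expires after 4 tries
    }
}
```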
[jira] [Created] (IGNITE-7135) IgniteCluster.startNodes() returns successful ClusterStartNodeResult even though the remote process fails
Alexandr Kuramshin created IGNITE-7135:
--

Summary: IgniteCluster.startNodes() returns successful ClusterStartNodeResult even though the remote process fails
Key: IGNITE-7135
URL: https://issues.apache.org/jira/browse/IGNITE-7135
Project: Ignite
Issue Type: Bug
Affects Versions: 2.3
Reporter: Alexandr Kuramshin
Fix For: 2.4

After an unsuccessful start of three remote nodes with {{IgniteCluster#startNodes(Collection<Map<String,Object>>, Map<String,Object>, boolean, int, int)}} we get a {{Collection}} with three elements, each of which returns true from {{isSuccess()}}. But the remote node startup log was:

{noformat}
nohup: ignoring input
/data/teamcity/work/820be461cd64b574/bin/ignite.sh, ERROR:
The version of JAVA installed in JAVA_HOME=/usr/lib/jvm/java-9-oracle is incorrect.
Please point JAVA_HOME variable to installation of JDK 1.7 or JDK 1.8.
You can also download latest JDK at http://java.com/download
{noformat}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7134) Never-ending timeout in IgniteSpiOperationTimeoutHelper.nextTimeoutChunk()
Alexandr Kuramshin created IGNITE-7134: -- Summary: Never-ending timeout in IgniteSpiOperationTimeoutHelper.nextTimeoutChunk() Key: IGNITE-7134 URL: https://issues.apache.org/jira/browse/IGNITE-7134 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.3 Reporter: Alexandr Kuramshin Priority: Critical Fix For: 2.4 {noformat} org.apache.ignite.spi.IgniteSpiOperationTimeoutHelper#nextTimeoutChunk long curTs = U.currentTimeMillis(); timeout = timeout - (curTs - lastOperStartTs); {noformat} The timeout is not decreased at all if the delay between successive calls to nextTimeoutChunk() is smaller than the U.currentTimeMillis() discretization. This behaviour is easy to hit when an error occurs right after the nextTimeoutChunk() invocation and a retry follows. Only rare calls (the first right before a U.currentTimeMillis() tick and the second right after it) decrease the timeout, so the actual IgniteSpiOperationTimeoutHelper timeout can be much bigger than the failureDetectionTimeout. My suggestion is not to split failureDetectionTimeout between network operations, but to initialize the first operation timestamp on the first call to nextTimeoutChunk(), and then calculate the timeout as the difference between the current timestamp and the first operation timestamp. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
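A minimal sketch of the suggested behaviour (class and method names are illustrative, and the current time is passed in explicitly to keep the sketch testable): the remaining budget is computed from the first operation's timestamp, so rapid retries cannot keep it from shrinking.

```java
// Sketch of the suggested fix: the budget shrinks with wall-clock time since
// the FIRST call, regardless of how often nextTimeoutChunk() is invoked.
public class ElapsedTimeoutHelper {
    private final long totalTimeout;
    private boolean started;
    private long firstOperTs;

    public ElapsedTimeoutHelper(long failureDetectionTimeout) {
        this.totalTimeout = failureDetectionTimeout;
    }

    /** Returns the remaining timeout; throws once the whole budget has elapsed. */
    public long nextTimeoutChunk(long nowMs) {
        if (!started) {
            started = true;
            firstOperTs = nowMs; // remember the very first operation timestamp
        }

        long left = totalTimeout - (nowMs - firstOperTs);

        if (left <= 0)
            throw new IllegalStateException("Failure detection timeout reached.");

        return left;
    }

    public static void main(String[] args) {
        ElapsedTimeoutHelper h = new ElapsedTimeoutHelper(1000);
        System.out.println(h.nextTimeoutChunk(100)); // full budget on the first call
        System.out.println(h.nextTimeoutChunk(700)); // 600 ms elapsed, 400 left
    }
}
```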
[jira] [Created] (IGNITE-6967) PME deadlock on reassigning service deployment
Alexandr Kuramshin created IGNITE-6967: -- Summary: PME deadlock on reassigning service deployment Key: IGNITE-6967 URL: https://issues.apache.org/jira/browse/IGNITE-6967 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.3 Reporter: Alexandr Kuramshin When a topology change occurs during service deployment, the discovery event listener calls {{GridServiceProcessor.reassign()}}, which acquires a lock on the utility cache (where the GridServiceAssignments are stored) and thereby prevents PME from completing. Stack traces: {noformat} Thread [name="test-runner-#186%service.IgniteServiceDynamicCachesSelfTest%", id=232, state=WAITING, blockCnt=0, waitCnt=8] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177) at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140) at o.a.i.i.IgniteKernal.createCache(IgniteKernal.java:2841) at o.a.i.i.processors.service.IgniteServiceDynamicCachesSelfTest.testDeployCalledBeforeCacheStart(IgniteServiceDynamicCachesSelfTest.java:140) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at junit.framework.TestCase.runTest(TestCase.java:176) at o.a.i.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000) at o.a.i.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132) at o.a.i.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915) at java.lang.Thread.run(Thread.java:748) Thread [name="srvc-deploy-#38%service.IgniteServiceDynamicCachesSelfTest0%", id=56, state=WAITING, blockCnt=5, waitCnt=9] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177) at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140) at o.a.i.i.processors.cache.GridCacheContext.awaitStarted(GridCacheContext.java:443) at o.a.i.i.processors.affinity.GridAffinityProcessor.affinityCache(GridAffinityProcessor.java:373) at o.a.i.i.processors.affinity.GridAffinityProcessor.keysToNodes(GridAffinityProcessor.java:347) at o.a.i.i.processors.affinity.GridAffinityProcessor.mapKeyToNode(GridAffinityProcessor.java:259) at o.a.i.i.processors.service.GridServiceProcessor.reassign(GridServiceProcessor.java:1163) at o.a.i.i.processors.service.GridServiceProcessor.access$2400(GridServiceProcessor.java:123) at o.a.i.i.processors.service.GridServiceProcessor$TopologyListener$1.run0(GridServiceProcessor.java:1763) at o.a.i.i.processors.service.GridServiceProcessor$DepRunnable.run(GridServiceProcessor.java:1976) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) Locked synchronizers: java.util.concurrent.ThreadPoolExecutor$Worker@27f723 {noformat} Problematic code: {noformat} org.apache.ignite.internal.processors.service.GridServiceProcessor#reassign try (GridNearTxLocal tx = cache.txStartEx(PESSIMISTIC, REPEATABLE_READ)) { GridServiceAssignmentsKey key = new GridServiceAssignmentsKey(cfg.getName()); GridServiceAssignments oldAssigns = (GridServiceAssignments)cache.get(key); Map<UUID, Integer> cnts = new HashMap<>(); if (affKey != null) { ClusterNode n = ctx.affinity().mapKeyToNode(cacheName, affKey, topVer); // WAIT HERE UNTIL PME FINISHED (INFINITELY) {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6965) affinityCall() with key mapping may not be successful with AlwaysFailoverSpi when node left
Alexandr Kuramshin created IGNITE-6965: -- Summary: affinityCall() with key mapping may not be successful with AlwaysFailoverSpi when node left Key: IGNITE-6965 URL: https://issues.apache.org/jira/browse/IGNITE-6965 Project: Ignite Issue Type: Bug Components: cache, compute Affects Versions: 2.3 Reporter: Alexandr Kuramshin When doing {{affinityCall(cacheName, key, callable)}} there is a race between the affinity node leaving and then being stopped, and {{AlwaysFailoverSpi}} reaching its maximum attempts. Suppose the following sequence (more probable when {{grid2.order}} >> {{grid1.order}}): 1. {{grid1.affinityCall(cacheName, key, callable)}} 2. {{grid1}}: {{key}} is mapped to the primary partition on {{grid2}} 3. {{grid2.stop()}} 4. {{grid1}} receives {{NODE_LEFT}} and updates {{discoCache}} 5. {{grid1}}: execution of {{callable}} fails with 'Failed to send job request because remote node left grid (if fail-over is enabled, will attempt fail-over to another node' 6. {{grid1}}: {{AlwaysFailoverSpi}} max attempts reached. 7. {{grid1.affinityCall}} fails with 'Job failover failed because number of maximum failover attempts for affinity call is exceeded' 8. {{grid2}} receives the verified node-left message and then stops. The patched {{CacheAffinityCallSelfTest}} reproduces the problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6860) Lack of context information upon serializing and marshalling (writeObject and writeFields)
Alexandr Kuramshin created IGNITE-6860: -- Summary: Lack of context information upon serializing and marshalling (writeObject and writeFields) Key: IGNITE-6860 URL: https://issues.apache.org/jira/browse/IGNITE-6860 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: general Affects Versions: 2.3 Reporter: Alexandr Kuramshin Fix For: 2.4 Having the stack trace {noformat} Caused by: org.apache.ignite.binary.BinaryObjectException: Failed to marshal object with optimized marshaller: [org.apache.logging.log4j.core.config.AppenderControl@302e61a8] at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:186) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134) at org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496) at org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160) at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663) at org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134) at org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496) at org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160) at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663) at org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793) at 
org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134) at org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496) at org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160) at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663) at org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134) at org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496) at org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160) at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663) at org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134) at org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496) at org.apache.ignite.internal.binary.BinaryWriterExImpl.writeObjectField(BinaryWriterExImpl.java:1160) at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.write(BinaryFieldAccessor.java:663) at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:793) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:206) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134) at org.apache.ignite.internal.binary.BinaryWriterExImpl.doWriteObject(BinaryWriterExImpl.java:496
[jira] [Created] (IGNITE-6858) Wait for exchange inside GridReduceQueryExecutor.query which never finishes due to opened transaction
Alexandr Kuramshin created IGNITE-6858: -- Summary: Wait for exchange inside GridReduceQueryExecutor.query which never finishes due to opened transaction Key: IGNITE-6858 URL: https://issues.apache.org/jira/browse/IGNITE-6858 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: sql Affects Versions: 2.3 Reporter: Alexandr Kuramshin Assignee: Vladimir Ozerov Fix For: 2.4 Infinite waiting in the loop {noformat} for (int attempt = 0;; attempt++) { if (attempt != 0) { try { Thread.sleep(attempt * 10); // Wait for exchange. } catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new CacheException("Query was interrupted.", e); } } {noformat} because the exchange waits for partition eviction while a transaction is open in a related thread {noformat} at java.lang.Thread.sleep(Native Method) at o.a.i.i.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:546) at o.a.i.i.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1236) at o.a.i.i.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95) {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6636) BinaryStream position integer overflow
Alexandr Kuramshin created IGNITE-6636: -- Summary: BinaryStream position integer overflow Key: IGNITE-6636 URL: https://issues.apache.org/jira/browse/IGNITE-6636 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: general Affects Versions: 2.2 Reporter: Alexandr Kuramshin There have been issues with a negative {{BinaryAbstractStream#pos}} value. We may get a stack trace like this {noformat} java.lang.ArrayIndexOutOfBoundsException: -2142240123 at org.apache.ignite.internal.binary.streams.BinaryHeapOutputStream.writeByteAndShift(BinaryHeapOutputStream.java) at org.apache.ignite.internal.binary.streams.BinaryAbstractOutputStream.writeByte(BinaryAbstractOutputStream.java) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java) at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java) {noformat} The worst of it is that the {{ArrayIndexOutOfBoundsException}} is thrown on the next write to the stream, and upon stack unwinding we cannot know which object actually caused the overflow. I suggest checking all updates to {{BinaryAbstractStream#pos}} and throwing an exception right after the change. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
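A sketch of the suggested fail-fast check (a hypothetical shift() method, not the real BinaryAbstractStream API): every position update is validated immediately, so the offending object is still on the stack when the error is raised.

```java
// Fail-fast position update: detects int overflow at the moment pos is
// advanced, instead of leaving a negative pos for a later array access.
public class CheckedStreamPos {
    private int pos;

    /** Advances the position by cnt (assumed >= 0); throws on overflow. */
    public int shift(int cnt) {
        int newPos = pos + cnt;

        if (newPos < pos) // wrapped past Integer.MAX_VALUE
            throw new IllegalStateException(
                "Binary stream position overflow [pos=" + pos + ", cnt=" + cnt + ']');

        pos = newPos;
        return pos;
    }

    public static void main(String[] args) {
        CheckedStreamPos s = new CheckedStreamPos();
        System.out.println(s.shift(10)); // prints 10
    }
}
```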
[jira] [Created] (IGNITE-6536) NPE on registerClassName() with MappedName
Alexandr Kuramshin created IGNITE-6536: -- Summary: NPE on registerClassName() with MappedName Key: IGNITE-6536 URL: https://issues.apache.org/jira/browse/IGNITE-6536 Project: Ignite Issue Type: Bug Components: binary Affects Versions: 2.1 Reporter: Alexandr Kuramshin Fix For: None {{NullPointerException}} occurs in {{org.apache.ignite.internal.MarshallerContextImpl#registerClassName}} when comparing the {{mappedName.className()}} of an already existing {{typeId}} mapping with the new {{clsName}} that has come as a parameter. {{org.apache.ignite.internal.processors.marshaller.MappedName#className}} must not be null, but it was. So we should check {{clsName}} in the {{MappedName}} constructor, to prevent the same NPEs in the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
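The suggested constructor guard could look like this (a simplified stand-in for the real {{MappedName}} class, not its actual source):

```java
import java.util.Objects;

// Simplified stand-in for o.a.i.internal.processors.marshaller.MappedName:
// reject null class names at construction time, so registerClassName() never
// sees a mapping whose className() is null.
public class MappedNameSketch {
    private final String clsName;

    public MappedNameSketch(String clsName) {
        // Fail fast here instead of with an NPE later during comparison.
        this.clsName = Objects.requireNonNull(clsName, "clsName");
    }

    public String className() {
        return clsName;
    }

    public static void main(String[] args) {
        System.out.println(new MappedNameSketch("java.lang.String").className());
    }
}
```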
[jira] [Created] (IGNITE-6521) Review default JVM options for better performance
Alexandr Kuramshin created IGNITE-6521: -- Summary: Review default JVM options for better performance Key: IGNITE-6521 URL: https://issues.apache.org/jira/browse/IGNITE-6521 Project: Ignite Issue Type: Improvement Components: general, visor Affects Versions: 2.1 Reporter: Alexandr Kuramshin Assignee: Alexandr Kuramshin Non-optimal recommendations are present in the Ignite startup scripts {noformat} :: :: Uncomment the following GC settings if you see spikes in your throughput due to Garbage Collection. :: :: set JVM_OPTS=%JVM_OPTS% -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:NewSize=128m -XX:MaxNewSize=128m :: set JVM_OPTS=%JVM_OPTS% -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=1024 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60 {noformat} Some utilities (like Visor) hang in continuous GC when connected to large clusters (above one hundred nodes), even with a large heap (about 32 GB). I'd like to propose removing these lines and modifying the default JVM_OPTS as follows {noformat} set JVM_OPTS=-Xms1g -Xmx8g -XX:+UseG1GC -server -XX:+AggressiveOpts -XX:MaxPermSize=256m {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6519) Race in SplitAwareTopologyValidator on activator and server node join
Alexandr Kuramshin created IGNITE-6519: -- Summary: Race in SplitAwareTopologyValidator on activator and server node join Key: IGNITE-6519 URL: https://issues.apache.org/jira/browse/IGNITE-6519 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 2.1 Reporter: Alexandr Kuramshin Assignee: Alexandr Kuramshin The following sequence may occur: 1. {{SplitAwareTopologyValidator}} detects a split, goes {{NOTVALID}} and returns false from {{validate()}} 2. An activator node joins and {{SplitAwareTopologyValidator}} goes {{REPAIRED}} 3. A server node joins from the other DC, which makes {{SplitAwareTopologyValidator}} go {{VALID}} 4. Then the server node leaves the cluster, and {{SplitAwareTopologyValidator}} should return false from {{validate()}} because of the next split. But the current implementation makes {{SplitAwareTopologyValidator}} auto-{{REPAIRED}}. In effect, if the activator node is forgotten and never leaves the cluster, it may automatically repair a split many times — yet repairing is supposed to be a manual operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6499) Compact NULL fields binary representation
Alexandr Kuramshin created IGNITE-6499: -- Summary: Compact NULL fields binary representation Key: IGNITE-6499 URL: https://issues.apache.org/jira/browse/IGNITE-6499 Project: Ignite Issue Type: Improvement Components: binary Affects Versions: 2.1 Reporter: Alexandr Kuramshin Assignee: Vladimir Ozerov The current compact footer implementation writes an offset for every field in the schema. Depending on the serialized size of an object, an offset may be 1, 2 or 4 bytes. Imagine an object in which some 100 fields are null: that is 100 to 400 bytes of overhead. For middle-sized objects (about 260 bytes) it doubles the memory usage. For small objects (about 40 bytes) the memory usage is increased by a factor of 3 or 4. Two optimizations are proposed; both should be implemented, and the more optimal one selected dynamically upon object marshalling. 1. Write the field ID and offset for non-null fields only in the footer. 2. Write a footer header, then field offsets for non-null fields only, as follows
[0] bit mask for the first 8 fields, 0 - field is null, 1 - field is non-null
[1] cumulative sum of "1" bits
[2] bit mask for the next 8 fields
[3] cumulative sum of "1" bits
... and so on
[N1...N2] offset of the first non-null field
[N3...N4] offset of the next non-null field
... and so on
If we want to read fields 0 to 7, we read the first footer byte, step through its bits and either find the offset index for a non-null field or find that the field is null. If we want to read fields from 8 on, we read two footer bytes, take the start offset from the first byte, and then step through the bits in the same way. This supports up to 255 non-null fields per binary object. The overhead would be only 24 bytes per 100 null fields instead of 200 bytes for the middle-sized object. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
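The lookup for option 2 can be sketched as follows. The exact footer layout here is an assumption based on the description above (one mask byte plus one cumulative-count byte per 8 fields, followed by offsets for non-null fields only); the class and method names are illustrative.

```java
// Field lookup against the proposed bit-mask footer. The footer array here
// holds only the (mask, cumulative count) header pairs; offsets are assumed
// to be stored separately and indexed by the value returned below.
public class CompactFooterLookup {
    /**
     * Returns the index into the offsets array for fieldIdx,
     * or -1 when the field is null (no offset stored for it).
     */
    public static int offsetIndex(byte[] footer, int fieldIdx) {
        int group = fieldIdx / 8;               // which (mask, count) pair
        int mask = footer[group * 2] & 0xFF;    // bit mask of these 8 fields
        // Non-null fields in all preceding groups (cumulative count byte).
        int before = group == 0 ? 0 : footer[group * 2 - 1] & 0xFF;
        int bit = fieldIdx % 8;

        if ((mask & (1 << bit)) == 0)
            return -1; // field is null

        // Plus non-null fields preceding fieldIdx inside its own group.
        return before + Integer.bitCount(mask & ((1 << bit) - 1));
    }

    public static void main(String[] args) {
        byte[] footer = { 0b101, 2, 0b1, 3 }; // fields 0, 2 and 8 are non-null
        System.out.println(offsetIndex(footer, 8));
    }
}
```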
[jira] [Created] (IGNITE-6491) Race in TopologyValidator.validate() and EVT_NODE_LEFT listener calls (split-brain activator)
Alexandr Kuramshin created IGNITE-6491: -- Summary: Race in TopologyValidator.validate() and EVT_NODE_LEFT listener calls (split-brain activator) Key: IGNITE-6491 URL: https://issues.apache.org/jira/browse/IGNITE-6491 Project: Ignite Issue Type: Bug Components: cache, general Affects Versions: 2.1 Reporter: Alexandr Kuramshin Assignee: Alexandr Kuramshin Fix For: 2.2 The following wrong cache {{validate}}/{{put}} sequence may occur. On node left, a {{GridDhtPartitionsExchangeFuture}} will be generated by the {{disco-event-worker}} thread. Then the {{exchange-worker}} thread does {noformat} Split-brain detected [cacheName=test40, activatorTopVer=0, cacheTopVer=14] at org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141) at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator.validate(IgniteTopologyValidatorGridSplitCacheTest.java:307) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCacheGroup(GridDhtTopologyFutureAdapter.java:64) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1456) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:115) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:450) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:668) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2278) {noformat} The result of validation is stored in {{grpValidRes}} with the value {{false}}. 
After some delay the {{disco-event-worker}} thread will do {noformat} java.lang.Exception: Node is segment activator [cacheName=test40, activatorTopVer=14] at org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141) at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:360) at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:349) at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$UserListenerWrapper.onEvent(GridEventStorageManager.java:1463) at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:859) at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:844) at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:341) at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:307) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2478) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2684) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2507) {noformat} After this invocation the result of {{SplitAwareTopologyValidator.validate}} should change to {{true}}, but it was already invoked and the result has been cached in {{grpValidRes}} with the value {{false}}. So any subsequent call to {{cache.put}} fails {noformat} Test failed. 
java.lang.RuntimeException: tryPut() failed [gridName=cache.IgniteTopologyValidatorGridSplitCacheTest0] at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.tryPut(IgniteTopologyValidatorGridSplitCacheTest.java:262) at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.testTopologyValidator(IgniteTopologyValidatorGridSplitCacheTest.java:182) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at junit.framework.TestCase.runTest(TestCase.java:176
[jira] [Created] (IGNITE-6347) Exception in GridDhtPartitionMap.readExternal
Alexandr Kuramshin created IGNITE-6347: -- Summary: Exception in GridDhtPartitionMap.readExternal Key: IGNITE-6347 URL: https://issues.apache.org/jira/browse/IGNITE-6347 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.1 Reporter: Alexandr Kuramshin Fix For: 2.1 Reading a partition state with {{id > Short.MAX_VALUE}} causes a negative value to be read in {{int part = in.readShort()}}. {{in.readUnsignedShort()}} should be used instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
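The sign issue is easy to demonstrate with plain {{DataInput}} (a standalone illustration, not the actual GridDhtPartitionMap code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Writes a partition id above Short.MAX_VALUE as 16 bits and reads it back
// both ways: readShort() yields a negative value, readUnsignedShort() the
// original id.
public class PartIdDemo {
    /** Returns { readShort() result, readUnsignedShort() result }. */
    public static int[] readBoth(int partId) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeShort(partId); // low 16 bits only
            byte[] bytes = bos.toByteArray();

            int signed = new DataInputStream(new ByteArrayInputStream(bytes)).readShort();
            int unsigned = new DataInputStream(new ByteArrayInputStream(bytes)).readUnsignedShort();

            return new int[] { signed, unsigned };
        }
        catch (IOException e) {
            throw new AssertionError(e); // in-memory streams never throw
        }
    }

    public static void main(String[] args) {
        int[] r = readBoth(40000);
        System.out.println("readShort: " + r[0] + ", readUnsignedShort: " + r[1]);
    }
}
```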
[jira] [Created] (IGNITE-5798) Logging Ignite configuration at startup
Alexandr Kuramshin created IGNITE-5798: -- Summary: Logging Ignite configuration at startup Key: IGNITE-5798 URL: https://issues.apache.org/jira/browse/IGNITE-5798 Project: Ignite Issue Type: Improvement Reporter: Alexandr Kuramshin Fix For: 2.1 I've found that IgniteConfiguration is not logged even when -DIGNITE_QUIET=false is set. When starting Ignite with a path to an XML file, or with an InputStream, we have to ensure that all configuration options were read properly. We would also like to know the actual values of uninitialized configuration properties (the defaults), which are set only after Ignite has started. Monitoring tools, like Visor or WebConsole, do not show all configuration options. And even if they are updated to show all properties, another tools update will be needed whenever new configuration options appear. Logging IgniteConfiguration at startup makes it possible to ensure that the right grid configuration has been applied, and leads to better user support based on log analysis. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-5750) Format of uptime for metrics
Alexandr Kuramshin created IGNITE-5750: -- Summary: Format of uptime for metrics Key: IGNITE-5750 URL: https://issues.apache.org/jira/browse/IGNITE-5750 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.0 Reporter: Alexandr Kuramshin Priority: Trivial Fix For: 2.1 Metrics for the local node show uptime formatted as 00:00:00:000, but the last colon should be a dot. The right format is 00:00:00.000 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
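For illustration, the expected formatting (a dot, not a colon, before the milliseconds) can be produced like this — a standalone snippet, not the actual metrics code:

```java
// Formats an uptime in milliseconds as 00:00:00.000.
public class UptimeFormat {
    public static String format(long ms) {
        return String.format("%02d:%02d:%02d.%03d",
            ms / 3_600_000,        // hours
            ms / 60_000 % 60,      // minutes
            ms / 1_000 % 60,       // seconds
            ms % 1_000);           // milliseconds
    }

    public static void main(String[] args) {
        System.out.println(format(3_661_234)); // prints 01:01:01.234
    }
}
```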
[jira] [Created] (IGNITE-5251) Some JVM implementations may return null from getClassLoader()
Alexandr Kuramshin created IGNITE-5251: -- Summary: Some JVM implementations may return null from getClassLoader() Key: IGNITE-5251 URL: https://issues.apache.org/jira/browse/IGNITE-5251 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.0 Environment: OpenJDK Runtime Environment (build 1.8.0_131-b11) OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode) Reporter: Alexandr Kuramshin Fix For: 2.1 Starting an Ignite instance causes the NPE {noformat} java.lang.NullPointerException at org.apache.ignite.internal.util.IgniteUtils.appendClassLoaderHash(IgniteUtils.java:4438) at org.apache.ignite.internal.util.IgniteUtils.makeMBeanName(IgniteUtils.java:4418) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.registerFactoryMbean(IgnitionEx.java:2499) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1801) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1604) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1041) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:568) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:515) at org.apache.ignite.Ignition.start(Ignition.java:322) {noformat} An {{IgniteUtils.getClassLoader(Class cls)}} method should be implemented that checks {{cls.getClassLoader()}} and, in the case of null, returns {{ClassLoader.getSystemClassLoader()}}. All usages of {{Class.getClassLoader()}} should be replaced with {{IgniteUtils.getClassLoader()}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
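A minimal sketch of the proposed helper (the real method would live in {{IgniteUtils}}; this standalone version just shows the null fallback):

```java
// Null-safe class-loader lookup: classes loaded by the bootstrap loader
// (e.g. java.lang.String) report null from getClassLoader(), so fall back
// to the system class loader.
public class ClassLoaderUtil {
    public static ClassLoader classLoader(Class<?> cls) {
        ClassLoader ldr = cls.getClassLoader();

        return ldr != null ? ldr : ClassLoader.getSystemClassLoader();
    }

    public static void main(String[] args) {
        System.out.println(classLoader(String.class));          // never null
        System.out.println(classLoader(ClassLoaderUtil.class)); // app loader
    }
}
```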
Re: Add ability to enable and disable rebalancing per-node
to Nick, could you please describe the use of DataStreamer in more detail (do you use StreamReceiver)? It seems that you are unnecessarily concerned about synchronous service startup and cache rebalancing. The service should start quickly after the node has joined the topology, and it will process all the data collected by local partitions in the moments before. You may use a rebalance delay to minimize the amount of data collected before the service has started. But if your service depends on external resources (another service), managing rebalancing won't help you, because the external resource may become unavailable even after your service has started and rebalancing has occurred. You can't un-rebalance partitions in such a case. In addition, if some event cache should be supplied together with other caches (storing additional data for service processing), there is always a gap between rebalancing a partition of the first and of the last cache containing collocated data. I think you should not worry about additional network calls while rebalancing is in progress. to Sasha, I think we need a configuration property enablePartitionExchange (in addition to the MBean flag) to have the ability to disable partition exchange at node startup. 2017-05-06 2:51 GMT+07:00 npordash <nickpord...@gmail.com>: > I can outline a use-case I have which may help define requirements for this > task. For context, I was originally going to try and address the below > use-case by disabling automatic rebalancing on a per-cache basis and use a > cluster-wide task to orchestrate manual rebalancing; however, this issue > sounds like it may provide a better approach. > > I have caches setup for the sole purpose of routing data to nodes via a > Data > Streamer. The logic in the streamer is simply to access a plugin on the > data > node which exposes a processing pipeline and runs the received cache > entries > through it. The data in this case is monitoring related and there is one > cache (or logical stream) per data type (f.e. 
logs, events, metrics). > > The pipeline is composed of N services which are deployed as node > singletons > and have a service filter which targets a particular cache. These services > can be deployed and un-deployed as processing requirements change or bugs > are fixed without requiring clients to know or care about it. > > The catch here is that when nodes are added I don't want map partitions to > rebalance to a new node until I know all of the necessary services are > running, otherwise we may have a small window where data is processed > through a pipeline that isn't completely initialized yet which would result > in a data quality issue. Alternatively, I could have the pipeline raise an > error which would cause the streamer to retry, but I'd like this to be > handled more gracefully, if possible. > > In addition, it will probably be the case were these caches eventually have > node filters so that we can isolate resources for these streams across > different computes. This means that, for example, if we add a node only for > metrics then deferring rebalancing should ideally only impact caches that > would get assigned to that node. > > Going even further... so far we've talked about one cache which is used > just > for streaming, but at least one of the services would create its own set of > caches as an in-memory storage layer which maintains an inverted index and > time series data for elements coming through the stream. The storage caches > in this case would only exist on nodes where the stream cache is and most > of > the write activity to these caches would be local since they would use the > same affinity as the stream cache (if most writes were remote this wouldn't > scale well). So... these caches would need to rebalance at the same time in > order to minimize the possibility of additional network calls. 
> > The main concern I have is how to avoid the race condition of another node > joining the topology _after_ it has been determined rebalancing should > happen, but _before_ rebalancing is triggered. If this is controlled on a > per-node (+cache) basis - as the ticket describes - it's probably a > non-issue, but it's definitely an issue if it's only on a per-cache basis. > > -Nick > > > > -- > View this message in context: http://apache-ignite- > developers.2346864.n4.nabble.com/Add-ability-to-enable-and- > disable-rebalancing-per-node-tp17494p17529.html > Sent from the Apache Ignite Developers mailing list archive at Nabble.com. > -- Thanks, Alexandr Kuramshin
[jira] [Created] (IGNITE-5084) PagesList.put() assertion: pageId != tailId
Alexandr Kuramshin created IGNITE-5084: -- Summary: PagesList.put() assertion: pageId != tailId Key: IGNITE-5084 URL: https://issues.apache.org/jira/browse/IGNITE-5084 Project: Ignite Issue Type: Bug Components: general Affects Versions: 2.0 Reporter: Alexandr Kuramshin Get an error upon rebalancing on topology update {noformat} Failed processing message [senderId=78a8f841-5d40-4ac7-b26b-f1b5e7f3faa0, msg=GridDhtPartitionSupplyMessageV2 [updateSeq=142, topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0], missed=null, clean=[0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 19, 21, 20, 23, 22, 24, 26, 29, 28, 31, 35, 32, 33, 39, 36, 37, 42, 43, 40, 41, 44, 45, 51, 48, 55, 54, 53, 52, 58, 56, 63, 62, 60, 68, 69, 65, 66, 76, 77, 78, 74, 85, 87, 86, 81, 80, 82, 92, 91, 90, 98, 96, 97], msgSize=0, size=67, parts=[0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 19, 21, 20, 23, 22, 24, 26, 29, 28, 31, 35, 32, 33, 39, 36, 37, 42, 43, 40, 41, 44, 45, 51, 48, 55, 54, 53, 52, 58, 56, 63, 62, 60, 68, 69, 65, 66, 76, 77, 78, 74, 85, 87, 86, 81, 80, 82, 92, 91, 90, 98, 96, 97], super=GridCacheMessage [msgId=100460, depInfo=null, err=null, skipPrepare=false, cacheId=-2100569601, cacheId=-2100569601]]] java.lang.AssertionError: pageId = 0, tailId = 281556581089286 at org.apache.ignite.internal.processors.cache.database.freelist.PagesList.put(PagesList.java:~) {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (IGNITE-5026) getOrCreateCaches() hangs if any exception in GridDhtPartitionsExchangeFuture.init()
Alexandr Kuramshin created IGNITE-5026: -- Summary: getOrCreateCaches() hangs if any exception in GridDhtPartitionsExchangeFuture.init() Key: IGNITE-5026 URL: https://issues.apache.org/jira/browse/IGNITE-5026 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 1.9, 2.0 Reporter: Alexandr Kuramshin Fix For: 2.1 Any exception thrown by {{GridDhtPartitionsExchangeFuture.init()}} causes the {{GridCompoundFuture}} returned by {{GridCacheProcessor.dynamicStartCaches()}} to wait indefinitely. Reproduced by {{IgniteDynamicCacheStartSelfTest.testGetOrCreateCollectionExceptional()}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
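The failure mode in this ticket — a future that is never completed when init() throws, so waiters hang forever — can be sketched with plain java.util.concurrent primitives. This is an illustrative analogy of the fix pattern, not Ignite's actual exchange code:

```java
import java.util.concurrent.CompletableFuture;

public class ExchangeInitSketch {
    // Stand-in for GridDhtPartitionsExchangeFuture.init(); here it always fails.
    public static void init() throws Exception {
        throw new Exception("exchange init failed");
    }

    // Fix pattern: propagate the init() failure into the future that
    // getOrCreateCaches() waits on, so callers fail fast instead of hanging.
    public static CompletableFuture<Void> startCaches() {
        CompletableFuture<Void> fut = new CompletableFuture<>();
        try {
            init();
            fut.complete(null);
        }
        catch (Exception e) {
            // Without this line, fut.get() would block forever -- the reported bug.
            fut.completeExceptionally(e);
        }
        return fut;
    }
}
```

The key point is that every exit path of init() must leave the compound future in a completed state.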
[jira] [Created] (IGNITE-4865) Non-informative error message on using GridClientOptimizedMarshaller with unknown task classes
Alexandr Kuramshin created IGNITE-4865: -- Summary: Non-informative error message on using GridClientOptimizedMarshaller with unknown task classes Key: IGNITE-4865 URL: https://issues.apache.org/jira/browse/IGNITE-4865 Project: Ignite Issue Type: Improvement Components: rest Affects Versions: 2.0 Reporter: Alexandr Kuramshin Assignee: Alexandr Kuramshin Upon {{GridClientCompute.execute()}} I get a non-informative error if a task class is not present in {{classnames.properties}}. It occurs when {{GridClient}} is configured to use {{GridClientOptimizedMarshaller}}. {noformat} Closing NIO session because of unhandled exception [cls=class o.a.i.i.util.nio.GridNioException, msg=class o.a.i.IgniteCheckedException: Failed to deserialize object with given class loader: null] {noformat} There are two problems: * The actual problem is hidden {noformat} Caused by: java.lang.UnsupportedOperationException at org.apache.ignite.internal.client.marshaller.optimized.GridClientOptimizedMarshaller$ClientMarshallerContext.className(GridClientOptimizedMarshaller.java:137) at org.apache.ignite.internal.MarshallerContextAdapter.getClass(MarshallerContextAdapter.java:174) at org.apache.ignite.marshaller.optimized.OptimizedMarshallerUtils.classDescriptor(OptimizedMarshallerUtils.java:266) at org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:318) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:367) {noformat} * Even after reading the cause we don't understand what is wrong What to do: * Log the stack trace every time * Throw UnsupportedOperationException with an informative message. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
Re: Inaccurate documentation about transactions
Yes, please. https://issues.apache.org/jira/browse/IGNITE-4795 2017-02-28 2:22 GMT+07:00 Denis Magda <dma...@apache.org>: > +1 to Alexander’s proposal. > > Alexander, could you wrap the discussion up creating a ticket with > detailed explanation what to do? > > — > Denis > > On Feb 27, 2017, at 9:01 AM, Dmitriy Setrakyan <dsetrak...@apache.org> > wrote: > > I like the idea of fixing the exception inheritance. > > On Mon, Feb 27, 2017 at 1:40 AM, Alexandr Kuramshin <ein.nsk...@gmail.com> > wrote: > >> I think annotating the methods with @IgniteTransactional is redundant, >> because they are already marked by "throws TransactionTimeoutException/Tr >> ansactionRollbackException/TransactionHeuristicException". >> >> For example, the same approach was used in JavaBeans 1.01 specs [1] with >> TooManyListenersException. >> >> The only thing I'd like to do: make all TransactionTimeoutException/Tr >> ansactionRollbackException/TransactionHeuristicException are derived >> from the same parent TransactionException. And declare all transactional >> methods as "throws TransactionException" with consequent Javadoc update. >> >> [1] http://download.oracle.com/otndocs/jcp/7224-javabeans-1. >> 01-fr-spec-oth-JSpec/ >> >> 2017-02-18 1:07 GMT+07:00 Dmitriy Setrakyan <dsetrak...@apache.org>: >> >>> On Fri, Feb 17, 2017 at 3:35 AM, Andrey Gura <ag...@apache.org> wrote: >>> >>> > From my point of view @IgniteTransactional annotation is redundant >>> > entity which will just confuse and lead to questions like "How to use >>> > this annotation?" I think documention update is better way. >>> > >>> >>> Why do you think it will be confusing? This annotation is suggested >>> purely >>> for documentation purposes, nothing else. Instead of adding documentation >>> to every method, we just add the annotation. User can check the >>> @IgniteTransactional javadoc to understand what this annotation means. >>> >> >> >> >> -- >> Thanks, >> Alexandr Kuramshin >> > > > -- Thanks, Alexandr Kuramshin
[jira] [Created] (IGNITE-4767) rollback exception hides the origin exception (e.g. commit)
Alexandr Kuramshin created IGNITE-4767: -- Summary: rollback exception hides the origin exception (e.g. commit) Key: IGNITE-4767 URL: https://issues.apache.org/jira/browse/IGNITE-4767 Project: Ignite Issue Type: Bug Components: cache, general Affects Versions: 1.8 Reporter: Alexandr Kuramshin Fix For: 2.0 There are too many code places like: {noformat} try { return txFuture.get(); } catch (IgniteCheckedException e) { tx.rollbackAsync(); throw e; } {noformat} where an error upon rollback hides the actual exception {{e}}. This should be implemented the way try-with-resources does it: {noformat} try { return txFuture.get(); } catch (IgniteCheckedException e) { try { tx.rollbackAsync(); } catch (Throwable inner) { e.addSuppressed(inner); } throw e; } {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
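The suggested fix is the standard addSuppressed pattern that try-with-resources uses. A minimal self-contained sketch (the commit/rollback calls are simulated here, not the real Ignite transaction API):

```java
public class RollbackSuppression {
    // Simulated rollback that itself fails.
    public static void rollback() throws Exception {
        throw new Exception("rollback failed");
    }

    // On commit failure, attach the rollback failure as suppressed so the
    // original exception is not lost.
    public static Exception commitThenRollback() {
        try {
            throw new Exception("commit failed"); // stand-in for txFuture.get() failing
        }
        catch (Exception e) {
            try {
                rollback();
            }
            catch (Throwable inner) {
                e.addSuppressed(inner);
            }
            return e; // in real code: throw e;
        }
    }
}
```

The caller still sees the commit failure as the primary exception, with the rollback failure attached via getSuppressed().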
Re: Inaccurate documentation about transactions
I think annotating the methods with @IgniteTransactional is redundant, because they are already marked by "throws TransactionTimeoutException/TransactionRollbackException/TransactionHeuristicException". For example, the same approach was used in JavaBeans 1.01 specs [1] with TooManyListenersException. The only thing I'd like to do: make all TransactionTimeoutException/TransactionRollbackException/TransactionHeuristicException are derived from the same parent TransactionException. And declare all transactional methods as "throws TransactionException" with consequent Javadoc update. [1] http://download.oracle.com/otndocs/jcp/7224-javabeans-1.01-fr-spec-oth-JSpec/ 2017-02-18 1:07 GMT+07:00 Dmitriy Setrakyan <dsetrak...@apache.org>: > On Fri, Feb 17, 2017 at 3:35 AM, Andrey Gura <ag...@apache.org> wrote: > > > From my point of view @IgniteTransactional annotation is redundant > > entity which will just confuse and lead to questions like "How to use > > this annotation?" I think documention update is better way. > > > > Why do you think it will be confusing? This annotation is suggested purely > for documentation purposes, nothing else. Instead of adding documentation > to every method, we just add the annotation. User can check the > @IgniteTransactional javadoc to understand what this annotation means. > -- Thanks, Alexandr Kuramshin
Inaccurate documentation about transactions
After doing some tests with transactions I've found that transactions do not work as expected from reading the documentation [1]. First of all, it is nowhere written which methods of the cache are transactional and which are not. Quite the contrary, after reading the documentation we learn that each TRANSACTIONAL cache is fully ACID-compliant without exceptions. Only after deep multi-threaded testing, and consulting with other developers, did I learn that only the get and put methods run within the transaction, while the iterator and query methods run outside it (in an autonomous transaction) with the READ_COMMITTED isolation level. Later I understood that only the methods throwing TransactionTimeoutException/TransactionRollbackException/TransactionHeuristicException are fully transactional. I think all methods on page [2] should be explicitly documented as transactional or not. By the way, why are these exceptions not derived from a common base class, e.g. TransactionException? Second, using the transactional get() method inside a READ_COMMITTED transaction we expect to get the committed value, as the documentation [1] claims: * READ_COMMITTED - Data is read without a lock and is never cached in the transaction itself. Ok, but what about put()? After a put() of a new value, subsequent reads return the new value, which is actually a DIRTY READ. Hence the value is cached within the transaction. This behavior is not documented. [1] https://apacheignite.readme.io/docs/transactions [2] https://ignite.apache.org/releases/1.8.0/javadoc/org/apache/ignite/IgniteCache.html -- Thanks, Alexandr Kuramshin
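The common-parent idea raised at the end of the message can be sketched as follows. The class names mirror the proposal, not the actual Ignite 1.8 API (where these three exceptions have no shared TransactionException parent):

```java
public class TxExceptionHierarchy {
    public static class TransactionException extends RuntimeException {
        public TransactionException(String msg) { super(msg); }
    }

    public static class TransactionTimeoutException extends TransactionException {
        public TransactionTimeoutException(String msg) { super(msg); }
    }

    public static class TransactionRollbackException extends TransactionException {
        public TransactionRollbackException(String msg) { super(msg); }
    }

    public static class TransactionHeuristicException extends TransactionException {
        public TransactionHeuristicException(String msg) { super(msg); }
    }

    // With a common parent, callers can handle all transactional failures uniformly.
    public static String classify(RuntimeException e) {
        return e instanceof TransactionException ? "transactional" : "other";
    }
}
```

Transactional methods could then declare a single "throws TransactionException" and remain recognizable as transactional from the signature alone.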
[jira] [Created] (IGNITE-4632) AffinityFunction unchecked exception handling (unassigned backup)
Alexandr Kuramshin created IGNITE-4632: -- Summary: AffinityFunction unchecked exception handling (unassigned backup) Key: IGNITE-4632 URL: https://issues.apache.org/jira/browse/IGNITE-4632 Project: Ignite Issue Type: Bug Components: general Affects Versions: 1.8 Reporter: Alexandr Kuramshin Priority: Minor An {{AffinityFunction}} implementation may throw an unchecked exception upon assignment. In some cases additional processing should be performed when an affinity function method invocation throws an exception. A special case is when a cache with backups is running and a node with a primary partition leaves. Then the primary partition is left unassigned if {{AffinityFunction.partition(Object)}} throws an exception. My suggestion is to shut down the node in such a case (like SEGMENTED), because the cluster cannot work normally without the primary partition assigned. {noformat} Failed processing message [senderId=8a1ab9a3-786e-4601-ba22-efd380849d99, msg=GridDhtPartitionSupplyMessageV2 [updateSeq=16069, topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], missed=[16, 17, 33, 22, 56, 10], clean=[0, 1, 2, 34, 3, 5, 7, 9, 45, 46, 49, 18, 50, 55, 25, 26, 58, 29, 61], msgSize=0, size=19, parts=[0, 1, 2, 34, 3, 5, 7, 9, 45, 46, 49, 18, 50, 55, 25, 26, 58, 29, 61], super=GridCacheMessage [msgId=70098615, depInfo=null, err=null, skipPrepare=false, cacheId=-148990687, cacheId=-148990687]]] com.sbt.persistence.exceptions.DPLException: ParticleKeyMapper не может обратывать никаких других объектов кроме ОУ. 
Системная ошибка - обратитесь в службу технической поддержки DPL at com.sbt.dpl.gridgain.ParticleAffinityFunction.partition(ParticleAffinityFunction.java:67) at org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.partition(GridCacheAffinityManager.java:219) at org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.partition(GridCacheAffinityManager.java:194) at org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.localNode(GridCacheAffinityManager.java:382) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.handleSupplyMessage(GridDhtPartitionDemander.java:680) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleSupplyMessage(GridDhtPreloader.java:390) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:395) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:385) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:758) {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
Ignite configuration at runtime
Hi community! I've found that IgniteConfiguration is not logged even when -DIGNITE_QUIET=false. When starting Ignite with a path to the XML, or an InputStream, we have to ensure that all configuration options were properly read. We would also like to know the actual values of uninitialized configuration properties (default values), which are set only after Ignite has started. Monitoring tools, like Visor or WebConsole, do not show all configuration options. And even if they are updated to show all properties, another tools update will be needed whenever new configuration options appear. So logging the IgniteConfiguration as a whole is a really needed improvement. -- Thanks, Alexandr Kuramshin
Empty cache memory overhead
Hi community, I'd like to share my investigations on the subject. Even if a cache is off-heap and contains no data, JVM heap memory is consumed. I'm calling this effect "empty cache memory overhead" ("overhead" below for short). The size of the memory consumed depends on many factors, varying from 1 to 50 Mb per cache on every node in the cluster. There are real systems that use >1000 caches within the cluster, so the heap memory consumed on each node will be 50 Gb or more. I've found that the overhead mainly depends on these factors: 1) local partitions count assigned to the node by the affinity function; 1.a) total number of partitions of the affinity function; 1.b) number of backups; 2) IGNITE_ATOMIC_CACHE_DELETE_HISTORY_SIZE 3) IGNITE_AFFINITY_HISTORY_SIZE After analyzing heap dumps and the sources I've found the following countable objects on which the overhead depends: 1) First group. GridDhtPartitionTopologyImpl = cache count GridDhtLocalPartition = cache count * local partitions count GridCircularBuffer$Item = cache count * local partitions count * item factor (default 32). Local partitions count = affinity function total partitions / node count * (1 + number of backups) Item factor = map capacity for storing -> IGNITE_ATOMIC_CACHE_DELETE_HISTORY_SIZE / affinity function partitions count, but minimum 20. Real values: GridDhtPartitionTopologyImpl = 1000 Affinity function total partitions = 1024 Node count = 16 Number of backups = 3 Local partitions count = 256 GridDhtLocalPartition = 256_000 GridCircularBuffer$Item = 8_192_000 2) Second group. GridAffinityAssignmentCache = cache count * node count GridAffinityAssignment = cache count * node count * assignment factor Assignment factor depends on topology version and IGNITE_AFFINITY_HISTORY_SIZE, default 6-7. Real values: GridAffinityAssignmentCache = 16_000 GridAffinityAssignment = 112_000 I think the implementation should be changed so that the object counts depend on the cache data size. And small (or empty) caches should be as lightweight as possible. -- Thanks, Alexandr Kuramshin
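The counting rules above can be checked with simple arithmetic. This sketch just encodes the formulas from the message and reproduces the "real values" example:

```java
public class CacheOverheadEstimate {
    // Local partitions per node = total partitions / node count * (1 + backups).
    public static long localPartitions(int totalParts, int nodes, int backups) {
        return (long) totalParts / nodes * (1 + backups);
    }

    // GridDhtLocalPartition instances = cache count * local partitions count.
    public static long localPartitionObjects(int caches, long localParts) {
        return caches * localParts;
    }

    // GridCircularBuffer$Item instances = cache count * local partitions * item factor.
    public static long circularBufferItems(int caches, long localParts, int itemFactor) {
        return caches * localParts * itemFactor;
    }
}
```

With 1024 total partitions, 16 nodes, 3 backups and 1000 caches this yields 256 local partitions, 256,000 GridDhtLocalPartition objects and 8,192,000 buffer items, matching the figures quoted above.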
[jira] [Created] (IGNITE-4538) BinaryObjectImpl: lack of context information upon deserialization
Alexandr Kuramshin created IGNITE-4538: -- Summary: BinaryObjectImpl: lack of context information upon deserialization Key: IGNITE-4538 URL: https://issues.apache.org/jira/browse/IGNITE-4538 Project: Ignite Issue Type: Improvement Components: binary Affects Versions: 1.8, 1.7 Reporter: Alexandr Kuramshin From such an error we don't know which cache was accessed, which BinaryClassDescriptor type was used, or which entry was accessed (the key of the entry should be logged with respect to the *include sensitive* system property). Such context information should be appended by wrapping the inner exception at every key stack frame. {noformat} org.apache.ignite.binary.BinaryObjectException: Unexpected flag value [pos=24, expected=4, actual=9] at org.apache.ignite.internal.binary.BinaryReaderExImpl.checkFlagNoHandles(BinaryReaderExImpl.java:1423) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.binary.BinaryReaderExImpl.readLongNullable(BinaryReaderExImpl.java:723) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.readFixedType(BinaryFieldAccessor.java:677) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read(BinaryFieldAccessor.java:639) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:818) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1481) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.binary.BinaryObjectImpl.deserializeValue(BinaryObjectImpl.java:717) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.binary.BinaryObjectImpl.value(BinaryObjectImpl.java:143) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at 
org.apache.ignite.internal.processors.cache.CacheObjectContext.unwrapBinary(CacheObjectContext.java:272) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.CacheObjectContext.unwrapBinaryIfNeeded(CacheObjectContext.java:160) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.CacheObjectContext.unwrapBinaryIfNeeded(CacheObjectContext.java:147) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.GridCacheContext.unwrapBinaryIfNeeded(GridCacheContext.java:1706) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$PeekValueExpiryAwareIterator.advance(GridCacheQueryManager.java:2875) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$PeekValueExpiryAwareIterator.(GridCacheQueryManager.java:2814) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$PeekValueExpiryAwareIterator.(GridCacheQueryManager.java:2752) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$5.(GridCacheQueryManager.java:863) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.scanIterator(GridCacheQueryManager.java:863) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.scanQueryLocal(GridCacheQueryManager.java:1436) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.query.GridCacheQueryAdapter.executeScanQuery(GridCacheQueryAdapter.java:552) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.GridCacheAdapter.igniteIterator(GridCacheAdapter.java:4115) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.igniteIterator(GridCacheAdapter.java:4092) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.iterator(IgniteCacheProxy.java:1979) ~[ignite-core-1.10.1.ea7.jar:1.10.1.ea7] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (IGNITE-4533) GridDhtPartitionsExchangeFuture stores unnecessary messages after processing done
Alexandr Kuramshin created IGNITE-4533: -- Summary: GridDhtPartitionsExchangeFuture stores unnecessary messages after processing done Key: IGNITE-4533 URL: https://issues.apache.org/jira/browse/IGNITE-4533 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 1.8, 1.7 Reporter: Alexandr Kuramshin After a GridDhtPartitionsExchangeFuture has been completed, GridCachePartitionExchangeManager still stores it in the field ExchangeFutureSet exchFuts (for race condition handling). But the many GridDhtPartitionsSingleMessage objects stored in the field ConcurrentMap<UUID, GridDhtPartitionsAbstractMessage> msgs are not needed after the future has been processed. This map should be cleared at the end of the onAllReceived() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (IGNITE-4496) Review all logging for sensitive data leak
Alexandr Kuramshin created IGNITE-4496: -- Summary: Review all logging for sensitive data leak Key: IGNITE-4496 URL: https://issues.apache.org/jira/browse/IGNITE-4496 Project: Ignite Issue Type: Improvement Reporter: Alexandr Kuramshin Assignee: Alexandr Kuramshin While the sensitive-logging option has been added and toString() methods have been fixed, not all logging has been checked for sensitive data leaks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Capacity Planning - Calculating Memory Usage
Hi Val, I'm sorry, of course only @QuerySqlField(index = true) makes an index on objects field. Fields without indexes make none additional overhead. Group index on multiple fields is a one index (isn't it?) I don't understand what is still unclear. Entry footprint = key footprint + value footprint + entry overhead + index overhead. Index overhead depends on how many indices are enabled for the entry type. 2016-12-23 2:06 GMT+07:00 Valentin Kulichenko <valentin.kuliche...@gmail.com >: > Alexandr, > > See my comments below. > > On Wed, Dec 21, 2016 at 7:01 PM, Alexandr Kuramshin <ein.nsk...@gmail.com> > wrote: > > > Hi Val, > > > > the understanding is simple. > > > > When you enables the single index on entry class you get "First index > > overhead" per entry. > > > > When you enables two indices on entry class you get "First index > overhead" > > + "Next index overhead" per entry. > > > > With three indices you get "First index overhead" + 2 * "Next index > > overhead", and so on... > > > > This should be explained in more detail, probably with some trivial > example. Currently it's very unclear. > > > > > > Each annotated field with @QuerySqlField is an index, except multiple > > fields annotated with @QuerySqlField.Group. > > > > This actually confuses me a lot, because a field can be created with or > without index? Can you please clarify? How much overhead is introduced by a > field without index? With index? What about group indexes? > > > > > > Another way to defining indices is to use property "queryEntities" and > it's > > subproperty "indexes". See the article [1] > > > > [1] https://apacheignite.readme.io/docs/indexes > > > > 2016-12-20 8:38 GMT+07:00 Valentin Kulichenko < > > valentin.kuliche...@gmail.com > > >: > > > > > Alexandr, > > > > > > Can you please clarify what is "First index overhead" and "Next index > > > overhead"? 
Generally, I think overhead provided by indexes should be > > > described in more details, now it's not very clear what happens when > > > indexes are added. > > > > > > Also the calculation example should be a separate section. > > > > > > -Val > > > > > > On Wed, Dec 14, 2016 at 1:07 AM, Alexandr Kuramshin < > > ein.nsk...@gmail.com> > > > wrote: > > > > > > > Thank you, Andrey, > > > > > > > > I'll do additional tests with expire policy and update the article. > > > > > > > > 2016-12-13 22:10 GMT+07:00 Andrey Mashenkov < > > andrey.mashen...@gmail.com > > > >: > > > > > > > > > Alexandr, > > > > > > > > > > In addition. If expire policy is configured, there is additional > > > overhead > > > > > to entries can be tracked by TtlManager. > > > > > This overhead is OnHeap and does not depend on cache MemoryMode > > (until > > > > > Ignite-3840 will be in master). > > > > > > > > > > For now overhead is about 32-40 bytes (EntryWrapper itself) + > (40-48) > > > > bytes > > > > > (ConcurrentSkipList node) per entry. > > > > > > > > > > > > > > > > > > > > On Tue, Dec 13, 2016 at 10:37 AM, Alexandr Kuramshin < > > > > ein.nsk...@gmail.com > > > > > > > > > > > wrote: > > > > > > > > > > > Hello, Igniters, > > > > > > > > > > > > I'd like to represent updated article [1] about the subject. > > > > > > > > > > > > And I'll very appreciate your comments and questions about it. > > > > > > > > > > > > Please review. > > > > > > > > > > > > [1] http://apacheignite.gridgain.org/docs/capacity-planning > > > > > > > > > > > > -- > > > > > > Thanks, > > > > > > Alexandr Kuramshin > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > С уважением, > > > > > Машенков Андрей Владимирович > > > > > Тел. +7-921-932-61-82 > > > > > > > > > > Best regards, > > > > > Andrey V. 
Mashenkov > > > > > Cerr: +7-921-932-61-82 > > > > > > > > > > > > > > > > > > > > > -- > > > > Thanks, > > > > Alexandr Kuramshin > > > > > > > > > > > > > > > -- > > Thanks, > > Alexandr Kuramshin > > > -- Thanks, Alexandr Kuramshin
[jira] [Created] (IGNITE-4485) CacheJdbcPojoStore returns unexpected BinaryObject upon loadCache()
Alexandr Kuramshin created IGNITE-4485: -- Summary: CacheJdbcPojoStore returns unexpected BinaryObject upon loadCache() Key: IGNITE-4485 URL: https://issues.apache.org/jira/browse/IGNITE-4485 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 1.8, 1.7 Reporter: Alexandr Kuramshin When calling loadCache(IgniteBiInClosure clo, Object... args), we sometimes get unexpected values of type BinaryObject in IgniteBiInClosure.apply(), whereas the POJO value kind was previously registered for a well-known key type. This happens because getOrCreateCacheMappings returns a HashMap which reorders entity mappings for the same key but with different value kinds. When BinaryMarshaller is used, this map contains two mappings for the same key - POJO and BINARY. A possible fix is to use LinkedHashMap, so the POJO mapping is picked first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
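The ordering issue is easy to reproduce with plain collections. In this simplified stand-in (the composite keys and mapping names are made up for illustration), LinkedHashMap deterministically yields the first-inserted (POJO) mapping, while HashMap gives no such guarantee:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MappingOrder {
    // Register two mappings for the same key type (POJO first, BINARY second)
    // and return whichever the map's iteration order yields first.
    public static String firstKind(Map<String, String> mappings) {
        mappings.put("PersonKey|POJO", "pojo-mapping");
        mappings.put("PersonKey|BINARY", "binary-mapping");
        return mappings.values().iterator().next();
    }
}
```

Swapping the HashMap for a LinkedHashMap is exactly the one-line fix the ticket proposes.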
Re: Capacity Planning - Calculating Memory Usage
Hi Val, the understanding is simple. When you enables the single index on entry class you get "First index overhead" per entry. When you enables two indices on entry class you get "First index overhead" + "Next index overhead" per entry. With three indices you get "First index overhead" + 2 * "Next index overhead", and so on... Each annotated field with @QuerySqlField is an index, except multiple fields annotated with @QuerySqlField.Group. Another way to defining indices is to use property "queryEntities" and it's subproperty "indexes". See the article [1] [1] https://apacheignite.readme.io/docs/indexes 2016-12-20 8:38 GMT+07:00 Valentin Kulichenko <valentin.kuliche...@gmail.com >: > Alexandr, > > Can you please clarify what is "First index overhead" and "Next index > overhead"? Generally, I think overhead provided by indexes should be > described in more details, now it's not very clear what happens when > indexes are added. > > Also the calculation example should be a separate section. > > -Val > > On Wed, Dec 14, 2016 at 1:07 AM, Alexandr Kuramshin <ein.nsk...@gmail.com> > wrote: > > > Thank you, Andrey, > > > > I'll do additional tests with expire policy and update the article. > > > > 2016-12-13 22:10 GMT+07:00 Andrey Mashenkov <andrey.mashen...@gmail.com > >: > > > > > Alexandr, > > > > > > In addition. If expire policy is configured, there is additional > overhead > > > to entries can be tracked by TtlManager. > > > This overhead is OnHeap and does not depend on cache MemoryMode (until > > > Ignite-3840 will be in master). > > > > > > For now overhead is about 32-40 bytes (EntryWrapper itself) + (40-48) > > bytes > > > (ConcurrentSkipList node) per entry. > > > > > > > > > > > > On Tue, Dec 13, 2016 at 10:37 AM, Alexandr Kuramshin < > > ein.nsk...@gmail.com > > > > > > > wrote: > > > > > > > Hello, Igniters, > > > > > > > > I'd like to represent updated article [1] about the subject. 
> > > > > > > > And I'll very appreciate your comments and questions about it. > > > > > > > > Please review. > > > > > > > > [1] http://apacheignite.gridgain.org/docs/capacity-planning > > > > > > > > -- > > > > Thanks, > > > > Alexandr Kuramshin > > > > > > > > > > > > > > > > -- > > > С уважением, > > > Машенков Андрей Владимирович > > > Тел. +7-921-932-61-82 > > > > > > Best regards, > > > Andrey V. Mashenkov > > > Cerr: +7-921-932-61-82 > > > > > > > > > > > -- > > Thanks, > > Alexandr Kuramshin > > > -- Thanks, Alexandr Kuramshin
Re: Capacity Planning - Calculating Memory Usage
Thank you, Andrey, I'll do additional tests with expire policy and update the article. 2016-12-13 22:10 GMT+07:00 Andrey Mashenkov <andrey.mashen...@gmail.com>: > Alexandr, > > In addition. If expire policy is configured, there is additional overhead > to entries can be tracked by TtlManager. > This overhead is OnHeap and does not depend on cache MemoryMode (until > Ignite-3840 will be in master). > > For now overhead is about 32-40 bytes (EntryWrapper itself) + (40-48) bytes > (ConcurrentSkipList node) per entry. > > > > On Tue, Dec 13, 2016 at 10:37 AM, Alexandr Kuramshin <ein.nsk...@gmail.com > > > wrote: > > > Hello, Igniters, > > > > I'd like to represent updated article [1] about the subject. > > > > And I'll very appreciate your comments and questions about it. > > > > Please review. > > > > [1] http://apacheignite.gridgain.org/docs/capacity-planning > > > > -- > > Thanks, > > Alexandr Kuramshin > > > > > > -- > С уважением, > Машенков Андрей Владимирович > Тел. +7-921-932-61-82 > > Best regards, > Andrey V. Mashenkov > Cerr: +7-921-932-61-82 > -- Thanks, Alexandr Kuramshin
Capacity Planning - Calculating Memory Usage
Hello, Igniters, I'd like to present the updated article [1] on the subject. I'd very much appreciate your comments and questions about it. Please review. [1] http://apacheignite.gridgain.org/docs/capacity-planning -- Thanks, Alexandr Kuramshin
[jira] [Created] (IGNITE-4417) OptimizedMarshaller: show property path causing ClassNotFoundException
Alexandr Kuramshin created IGNITE-4417: -- Summary: OptimizedMarshaller: show property path causing ClassNotFoundException Key: IGNITE-4417 URL: https://issues.apache.org/jira/browse/IGNITE-4417 Project: Ignite Issue Type: Improvement Components: general Reporter: Alexandr Kuramshin Priority: Minor When the OptimizedMarshaller cannot unmarshal an object on the remote side because of a ClassNotFoundException, an IgniteCheckedException is thrown. In the stack trace we can see the class loader's toString() value and the name of the class which was not found. This information is insufficient. We should also know which field or property of the object caused the ClassNotFoundException. And if this object is contained inside another object, we should know the type of that object and its field or property as well. For example: IgniteCheckedException: Failed to unmarshal an object ClassName1 root.ClassName2 fieldName2.ClassName3 propName3. Given class loader: classLoaderToString. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
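One way to carry the property path is to wrap the failure at each nesting level while unmarshalling. A toy sketch of the wrapping idea (all names here are hypothetical; this is not the OptimizedMarshaller code):

```java
public class FieldPathContext {
    // Simulated field read that fails at the innermost level.
    public static Object readLeaf() {
        throw new RuntimeException("Class not found: com.example.Missing");
    }

    // Wrap the inner failure, prepending "Owner.field" so the chain of causes
    // spells out the full property path down to the missing class.
    public static Object readField(String owner, String field) {
        try {
            return readLeaf();
        }
        catch (RuntimeException e) {
            throw new RuntimeException("Failed to unmarshal field " + owner + "." + field, e);
        }
    }
}
```

Applied recursively, the outermost exception message would read like the ticket's example: object type, then field, then nested type, and so on down to the class that was not found.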
Re: IgniteCache.loadCache improvement proposal
Val, Yakov, Sorry for the delay, I needed time to think and to do some tests. Anyway, extending the API and supplying a default implementation is good. It makes frameworks more flexible and usable. But your proposed extension will not solve the problem that I have raised. Please read the following with special attention. The current implementation of IgniteCache.loadCache causes parallel execution of IgniteCache.localLoadCache on each node in the cluster. It's a bad implementation, but it's the *right semantic*. You propose to extend IgniteCache.localLoadCache and use it to load data on all the nodes. That's a bad semantic. But it also leads to a bad implementation. Please note why. When you filter the data with the supplied IgniteBiPredicate, you may access data that must be co-located. Hence, to load the data to all the nodes, you need access to all the related data partitioned across the cluster. This leads to great network overhead and near-cache overload. And that is why I am wondering that IgniteBiPredicate is executed for every key supplied by Cache.loadCache, and not only for those keys which will be stored on this node. My opinion in conclusion: localLoadCache should first filter a key by the affinity function and the current cache topology, *then* invoke the predicate, and then store the entity in the cache (possibly by invoking the supplied closure). All associated partitions should be locked for the duration of loading. IgniteCache.loadCache should perform Cache.loadCache on one (or some more) nodes, then transfer entities to the remote nodes, *then* invoke the predicate and closure on the remote nodes. 2016-11-22 2:16 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com >: > Guys, > > I created a ticket for this: > https://issues.apache.org/jira/browse/IGNITE-4255 > > Feel free to provide comments. 
> > -Val > > On Sat, Nov 19, 2016 at 6:56 AM, Yakov Zhdanov <yzhda...@apache.org> > wrote: > > > > > > > > > > Why not store the partition ID in the database and query only local > > > partitions? Whatever approach we design with a DataStreamer will be > > slower > > > than this. > > > > > > > Because this can be some generic DB. Imagine the app migrating to IMDG. > > > > I am pretty sure that in many cases approach with data streamer will be > > faster and in many cases approach with multiple queries will be faster. > And > > the choice should depend on many factors. I like Val's suggestions. I > think > > he goes in the right direction. > > > > --Yakov > > > -- Thanks, Alexandr Kuramshin
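The ordering argued for above (affinity filter first, user predicate second) can be sketched as a toy model. This is not Ignite code: the affinity function is a simplified `|hash| mod partitions`, and the names are hypothetical; the point is only that the predicate runs exclusively for keys the local node owns.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Toy model of localLoadCache ordering: filter each key through the affinity
// function for the local node FIRST, and only then invoke the user's
// predicate, so a potentially expensive predicate runs only for keys this
// node will actually store.
public class LocalLoadOrdering {
    // Simplified affinity: partition = |hash| mod partition count.
    static int partition(Object key, int partitions) {
        return Math.abs(key.hashCode() % partitions);
    }

    // Returns the number of predicate invocations; entries passing both the
    // affinity filter and the predicate end up in localStore.
    static <K, V> int localLoad(Map<K, V> source, Set<Integer> localParts, int partitions,
                                BiPredicate<K, V> pred, Map<K, V> localStore) {
        int predCalls = 0;
        for (Map.Entry<K, V> e : source.entrySet()) {
            if (!localParts.contains(partition(e.getKey(), partitions)))
                continue;                                   // affinity filter first
            predCalls++;                                    // predicate only for local keys
            if (pred.test(e.getKey(), e.getValue()))
                localStore.put(e.getKey(), e.getValue());
        }
        return predCalls;
    }
}
```

With 10 keys, 4 partitions and 2 local partitions, the predicate fires for roughly half the keys instead of all of them, which is the efficiency argument being made.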
Re: IgniteCache.loadCache improvement proposal
Dmitriy, I am not fully confident that the partition ID is the best approach in all cases. Even if we have full access to the database structure, there are other problems. Assume we have a table PERSON (ID NUMBER, NAME VARCHAR, SURNAME VARCHAR, AGE NUMBER, EMPL_DATE DATE), and we add our column PART NUMBER. While we already have indexes IDX1(NAME), IDX2(SURNAME), IDX3(AGE), IDX4(EMPL_DATE), we have to add a new two-column index IDX5(PART, EMPL_DATE) for pre-loading at startup, for example, recently employed persons. And if we'd like to query filtered data from the database, we'd also have to create the other compound indexes IDX6(PART, NAME), IDX7(PART, SURNAME), IDX8(PART, AGE). So we double the overhead imposed by indexes. After these modifications to the database are done and the PART column is filled, what should we do to preload the data? We would have to perform as many database queries as there are partitions stored on the nodes; with the default affinity function settings that would be 1024 queries. Some calls may not return any data at all, making them wasted network round-trips. It may also be a problem for some databases to perform that many parallel queries without degrading total throughput. The DataStreamer approach may be faster, but it should be tested. 2016-11-16 16:40 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>: > On Wed, Nov 16, 2016 at 1:54 PM, Yakov Zhdanov <yzhda...@apache.org> > wrote: > > > > On Wed, Nov 16, 2016 at 11:22 AM, Yakov Zhdanov <yzhda...@apache.org> > > wrote: > > > > > > > Yakov, I agree that such scenario should be avoided. I also think > > that > > > > > > > loadCache(...) method, as it is right now, provides a way to avoid > > it. > > > > > > > > > > > > No, it does not. > > > > > > > > > Yes it does :) > > > > No it doesn't. 
Load cache should either send a query to DB that filters > all > > the data on server side which, in turn, may result to full-scan of 2 Tb > > data set dozens of times (equal to node count) or send a query that > brings > > the whole dataset to each node which is unacceptable as well. > > > > Why not store the partition ID in the database and query only local > partitions? Whatever approach we design with a DataStreamer will be slower > than this. > -- Thanks, Alexandr Kuramshin
[jira] [Created] (IGNITE-4245) Get EXCEPTION_ACCESS_VIOLATION with OFFHEAP_TIERED cache
Alexandr Kuramshin created IGNITE-4245: -- Summary: Get EXCEPTION_ACCESS_VIOLATION with OFFHEAP_TIERED cache Key: IGNITE-4245 URL: https://issues.apache.org/jira/browse/IGNITE-4245 Project: Ignite Issue Type: Bug Affects Versions: 1.7, 1.6, 1.8 Reporter: Alexandr Kuramshin I get an EXCEPTION_ACCESS_VIOLATION while iterating through local cache entries stored in an OFFHEAP_TIERED cache. Test class and log are attached. I've tried the same test on the 1.6.11, 1.7.4 and 1.8 versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: IgniteCache.loadCache improvement proposal
in Kulichenko < > >>>> valentin.kuliche...@gmail.com> wrote: > >>>>> > >>>>> It sounds like Aleksandr is basically proposing to support automatic > >>>>> persistence [1] for loading through data streamer and we really don't > >>>> have > >>>>> this. However, I think I have more generic solution in mind. > >>>>> > >>>>> What if we add one more IgniteCache.loadCache overload like this: > >>>>> > >>>>> loadCache(@Nullable IgniteBiPredicate<K, V> p, IgniteBiInClosure<K, > V> > >>>>> clo, @Nullable > >>>>> Object... args) > >>>>> > >>>>> It's the same as the existing one, but with the key-value closure > >>>> provided > >>>>> as a parameter. This closure will be passed to the > CacheStore.loadCache > >>>>> along with the arguments and will allow to override the logic that > >>>> actually > >>>>> saves the loaded entry in cache (currently this logic is always > >> provided > >>>> by > >>>>> the cache itself and user can't control it). > >>>>> > >>>>> We can then provide the implementation of this closure that will > >> create a > >>>>> data streamer and call addData() within its apply() method. > >>>>> > >>>>> I see the following advantages: > >>>>> > >>>>> - Any existing CacheStore implementation can be reused to load > through > >>>>> streamer (our JDBC and Cassandra stores or anything else that user > >>>> has). > >>>>> - Loading code is always part of CacheStore implementation, so it's > >>>> very > >>>>> easy to switch between different ways of loading. > >>>>> - User is not limited by two approaches we provide out of the box, > >> they > >>>>> can always implement a new one. > >>>>> > >>>>> Thoughts? > >>>>> > >>>>> [1] https://apacheignite.readme.io/docs/automatic-persistence > >>>>> > >>>>> -Val > >>>>> > >>>>> On Tue, Nov 15, 2016 at 2:27 AM, Alexey Kuznetsov < > >> akuznet...@apache.org > >>>>> > >>>>> wrote: > >>>>> > >>>>>> Hi, All! > >>>>>> > >>>>>> I think we do not need to chage API at all. 
> >>>>>> > >>>>>> public void loadCache(@Nullable IgniteBiPredicate<K, V> p, @Nullable > >>>>>> Object... args) throws CacheException; > >>>>>> > >>>>>> We could pass any args to loadCache(); > >>>>>> > >>>>>> So we could create class > >>>>>> IgniteCacheLoadDescriptor { > >>>>>> some fields that will describe how to load > >>>>>> } > >>>>>> > >>>>>> > >>>>>> and modify POJO store to detect and use such arguments. > >>>>>> > >>>>>> > >>>>>> All we need is to implement this and write good documentation and > >>>> examples. > >>>>>> > >>>>>> Thoughts? > >>>>>> > >>>>>> On Tue, Nov 15, 2016 at 5:22 PM, Alexandr Kuramshin < > >>>> ein.nsk...@gmail.com> > >>>>>> wrote: > >>>>>> > >>>>>>> Hi Vladimir, > >>>>>>> > >>>>>>> I don't offer any changes in API. Usage scenario is the same as it > >> was > >>>>>>> described in > >>>>>>> https://apacheignite.readme.io/docs/persistent-store# > >>>> section-loadcache- > >>>>>>> > >>>>>>> The preload cache logic invokes IgniteCache.loadCache() with some > >>>>>>> additional arguments, depending on a CacheStore implementation, and > >>>> then > >>>>>>> the loading occurs in the way I've already described. > >>>>>>> > >>>>>>> > >>>>>>> 2016-11-15 11:26 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>: > >>>>>>> > >
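The overload Val proposes above can be sketched with stand-in types. This is only an illustration of the contract, not Ignite's real API (the actual signatures use IgniteBiPredicate/IgniteBiInClosure, and a real streamer-backed closure would call IgniteDataStreamer.addData): the store pushes each loaded entry into a caller-supplied closure, so the same CacheStore code can feed either direct cache puts or a data streamer.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiPredicate;

// Sketch of loadCache(p, clo, args) with stand-in types: the closure decides
// what happens to each loaded entry, instead of the cache hard-coding it.
public class LoadCacheOverloadSketch {
    // Stand-in for CacheStore.loadCache(clo, args).
    interface Store<K, V> {
        void loadCache(BiConsumer<K, V> clo, Object... args);
    }

    // Stand-in "streamer" closure: batches entries, as a streamer-backed
    // implementation would before flushing them via addData().
    static class MockStreamer<K, V> implements BiConsumer<K, V> {
        final List<Map.Entry<K, V>> batch = new ArrayList<>();
        @Override public void accept(K k, V v) {
            batch.add(new AbstractMap.SimpleEntry<>(k, v));
        }
    }

    // The proposed overload: apply the optional predicate, then hand each
    // surviving entry to the supplied closure.
    static <K, V> void loadCache(Store<K, V> store, BiPredicate<K, V> p,
                                 BiConsumer<K, V> clo, Object... args) {
        store.loadCache((k, v) -> {
            if (p == null || p.test(k, v))
                clo.accept(k, v);
        }, args);
    }
}
```

The advantage Val lists falls out of this shape: any existing store implementation works unchanged, and swapping the closure swaps the loading strategy.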
Re: IgniteCache.loadCache improvement proposal
Hi Vladimir, I don't propose any changes in the API. The usage scenario is the same as described in https://apacheignite.readme.io/docs/persistent-store#section-loadcache- The cache preloading logic invokes IgniteCache.loadCache() with some additional arguments, depending on the CacheStore implementation, and then the loading occurs in the way I've already described. 2016-11-15 11:26 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>: > Hi Alex, > > >>> Let's give the user the reusable code which is convenient, reliable and > fast. > Convenience - this is why I asked for an example of how the API can look and > how users are going to use it. > > Vladimir. > > On Tue, Nov 15, 2016 at 11:18 AM, Alexandr Kuramshin <ein.nsk...@gmail.com > > > wrote: > > > Hi all, > > > > I think the discussion is going in a wrong direction. Certainly it's not a big > > deal to implement some custom user logic to load the data into caches. But > > the Ignite framework gives the user reusable code built on top of the > > basic system. > > > > So the main question is: why let the user use a convenient way > > to load caches that has a totally non-optimal implementation? > > > > We could talk at length about different persistence storage types, but > > whenever we initiate the loading with IgniteCache.loadCache, the current > > implementation imposes much overhead on the network. > > > > Partition-aware data loading may be used in some scenarios to avoid this > > network overhead, but users are compelled to take additional steps to > > achieve this optimization: adding the column to tables, adding compound > > indexes including the added column, writing a piece of repeatable code to > > load the data into different caches in a fault-tolerant fashion, etc. > > > > Let's give the user reusable code which is convenient, reliable and > > fast. 
> > > > 2016-11-14 20:56 GMT+03:00 Valentin Kulichenko < > > valentin.kuliche...@gmail.com>: > > > > > Hi Aleksandr, > > > > > > Data streamer is already outlined as one of the possible approaches for > > > loading the data [1]. Basically, you start a designated client node or > > > chose a leader among server nodes [1] and then use IgniteDataStreamer > API > > > to load the data. With this approach there is no need to have the > > > CacheStore implementation at all. Can you please elaborate what > > additional > > > value are you trying to add here? > > > > > > [1] https://apacheignite.readme.io/docs/data-loading# > ignitedatastreamer > > > [2] https://apacheignite.readme.io/docs/leader-election > > > > > > -Val > > > > > > On Mon, Nov 14, 2016 at 8:23 AM, Dmitriy Setrakyan < > > dsetrak...@apache.org> > > > wrote: > > > > > > > Hi, > > > > > > > > I just want to clarify a couple of API details from the original > email > > to > > > > make sure that we are making the right assumptions here. > > > > > > > > *"Because of none keys are passed to the CacheStore.loadCache > methods, > > > the > > > > > underlying implementation is forced to read all the data from the > > > > > persistence storage"* > > > > > > > > > > > > According to the javadoc, loadCache(...) method receives an optional > > > > argument from the user. You can pass anything you like, including a > > list > > > of > > > > keys, or an SQL where clause, etc. > > > > > > > > *"The partition-aware data loading approach is not a choice. It > > requires > > > > > persistence of the volatile data depended on affinity function > > > > > implementation and settings."* > > > > > > > > > > > > This is only partially true. While Ignite allows to plugin custom > > > affinity > > > > functions, the affinity function is not something that changes > > > dynamically > > > > and should always return the same partition for the same key.So, the > > > > partition assignments are not volatile at all. 
If, in some very rare > > > case, > > > > the partition assignment logic needs to change, then you could update > > the > > > > partition assignments that you may have persisted elsewhere as well, > > e.g. > > > > database. > > > > > > > > D. > > > > > > > > On Mon, Nov 14, 2016 at 10:23 AM, Vladimir
Re: IgniteCache.loadCache improvement proposal
> > > Looks good to me. > > > > But I would suggest considering one more use-case: > > > > If the user knows their data, they could manually split the loading. > > > > For example: table Persons contains 10M rows. > > > > The user could provide something like: > > > > cache.loadCache(null, "Person", "select * from Person where id < > > > > 1_000_000", > > > > "Person", "select * from Person where id >= 1_000_000 and id < > > > 2_000_000", > > > > > > > > "Person", "select * from Person where id >= 9_000_000 and id < > > > 10_000_000", > > > > ); > > > > or maybe it could be some descriptor object like > > > > { > > > > sql: "select * from Person where id >= ? and id < ?" > > > > range: 0...10_000_000 > > > > } > > > > In this case the provided queries will be sent to as many nodes as there are > > > > queries, > > > > and the data will be loaded in parallel; for keys that are not local a > > data > > > > streamer > > > > should be used (as in Alexandr's description). > > > > I think it is a good issue for Ignite 2.0. > > > > Vova, Val - what do you think? > > > > > > > > On Mon, Nov 14, 2016 at 4:01 PM, Alexandr Kuramshin < > > > ein.nsk...@gmail.com> > > > > wrote: > > > > > > > >> All right, > > > >> > > > >> Let's assume a simple scenario. When the IgniteCache.loadCache is > > > invoked, > > > >> we check whether the cache is not local, and if so, then we'll > > initiate > > > >> the > > > >> new loading logic. > > > >> > > > >> First, we take a "streamer" node, it could be done by > > > >> utilizing LoadBalancingSpi, or it may be configured statically, for > > the > > > >> reason that the streamer node is running on the same host as the > > > >> persistence storage provider. > > > >> > > > >> After that we start the loading task on the streamer node which > > > >> creates IgniteDataStreamer and loads the cache with > > > CacheStore.loadCache. 
> > > >> Every call to IgniteBiInClosure.apply simply > > > >> invokes IgniteDataStreamer.addData. > > > >> > > > >> This implementation will completely relieve overhead on the > > persistence > > > >> storage provider. Network overhead is also decreased in the case of > > > >> partitioned caches. For two nodes we get 1-1/2 amount of data > > > transferred > > > >> by the network (1 part well be transferred from the persistence > > storage > > > to > > > >> the streamer, and then 1/2 from the streamer node to the another > > node). > > > >> For > > > >> three nodes it will be 1-2/3 and so on, up to the two times amount > of > > > data > > > >> on the big clusters. > > > >> > > > >> I'd like to propose some additional optimization at this place. If > we > > > have > > > >> the streamer node on the same machine as the persistence storage > > > provider, > > > >> then we completely relieve the network overhead as well. It could > be a > > > >> some > > > >> special daemon node for the cache loading assigned in the cache > > > >> configuration, or an ordinary sever node as well. > > > >> > > > >> Certainly this calculations have been done in assumption that we > have > > > even > > > >> partitioned cache with only primary nodes (without backups). In the > > case > > > >> of > > > >> one backup (the most frequent case I think), we get 2 amount of data > > > >> transferred by the network on two nodes, 2-1/3 on three, 2-1/2 on > > four, > > > >> and > > > >> so on up to the three times amount of data on the big clusters. > Hence > > > it's > > > >> still better than the current implementation. In the worst case > with a > > > >> fully replicated cache we take N+1 amount of data transferred by the > > > >
Re: IgniteCache.loadCache improvement proposal
All right, let's assume a simple scenario. When IgniteCache.loadCache is invoked, we check whether the cache is non-local, and if so, we initiate the new loading logic. First, we pick a "streamer" node. This could be done by utilizing LoadBalancingSpi, or the node may be configured statically, for instance because the streamer node runs on the same host as the persistence storage provider. After that we start the loading task on the streamer node, which creates an IgniteDataStreamer and loads the cache with CacheStore.loadCache; every call to IgniteBiInClosure.apply simply invokes IgniteDataStreamer.addData. This implementation completely relieves the overhead on the persistence storage provider. Network overhead is also decreased in the case of partitioned caches. For two nodes we transfer 1 1/2 times the data set over the network (the full data set is transferred from the persistence storage to the streamer, and then 1/2 of it from the streamer node to the other node). For three nodes it will be 1 2/3, and so on, approaching twice the data set on big clusters. I'd like to propose an additional optimization here. If the streamer node is on the same machine as the persistence storage provider, then we relieve that part of the network overhead as well. It could be a special daemon node for cache loading assigned in the cache configuration, or an ordinary server node. Certainly, these calculations assume an evenly partitioned cache with only primary copies (no backups). In the case of one backup (the most frequent case, I think), we transfer 2 times the data set over the network on two nodes, 2 1/3 on three, 2 1/2 on four, and so on, approaching three times the data set on big clusters. Hence it's still better than the current implementation. In the worst case, a fully replicated cache, we transfer N+1 times the data set over the network (where N is the number of nodes in the cluster). 
But that's not a problem in small clusters, and only a little overhead in big clusters. And we still gain the persistence storage provider optimization. Now let's take a more complex scenario. To achieve some level of parallelism, we could split our cluster into several groups. This could be a parameter of the IgniteCache.loadCache method or a cache configuration option. The number of groups could be a fixed value, or it could be calculated dynamically from the maximum number of nodes per group. After splitting the whole cluster into groups, we pick a streamer node in each group and submit a cache-loading task similar to the single-streamer scenario, except that only the keys belonging to the streamer node's cluster group are passed to IgniteDataStreamer.addData. In this case the overhead grows with the level of parallelism, rather than with the number of nodes in the whole cluster. 2016-11-11 15:37 GMT+03:00 Alexey Kuznetsov <akuznet...@apache.org>: > Alexandr, > > Could you describe your proposal in more details? > Especially in case with several nodes. > > On Fri, Nov 11, 2016 at 6:34 PM, Alexandr Kuramshin <ein.nsk...@gmail.com> > wrote: > > > Hi, > > > > You know CacheStore API that is commonly used for read/write-through > > relationship of the in-memory data with the persistence storage. > > > > There is also IgniteCache.loadCache method for hot-loading the cache on > > startup. Invocation of this method causes execution of > CacheStore.loadCache > > on the all nodes storing the cache partitions. Because of none keys are > > passed to the CacheStore.loadCache methods, the underlying implementation > > is forced to read all the data from the persistence storage, but only > part > > of the data will be stored on each node. > > > > So, the current implementation have two general drawbacks: > > > > 1. Persistence storage is forced to perform as many identical queries as > > many nodes on the cluster. 
Each query may involve much additional > > computation on the persistence storage server. > > > > 2. Network is forced to transfer much more data, so obviously the big > > disadvantage on large systems. > > > > The partition-aware data loading approach, described in > > https://apacheignite.readme.io/docs/data-loading#section- > > partition-aware-data-loading > > , is not a choice. It requires persistence of the volatile data depended > on > > affinity function implementation and settings. > > > > I propose using something like IgniteDataStreamer inside > > IgniteCache.loadCache implementation. > > > > > > -- > > Thanks, > > Alexandr Kuramshin > > > > > > -- > Alexey Kuznetsov > -- Thanks, Alexandr Kuramshin
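The transfer arithmetic in the scenario above can be captured in a few lines. This is my reading of the quoted figures, a simplified model rather than an official formula: one streamer node pulls the full data set from the store (1 unit), then forwards each of the (backups + 1) copies that do not land on the streamer itself, i.e. a (nodes - 1) / nodes fraction of each copy.

```java
// Back-of-the-envelope model of network transfer, in units of the data-set
// size, for the single-streamer loading scenario described in the thread.
public class StreamerTransferModel {
    static double transferUnits(int nodes, int backups) {
        // 1 unit: persistence store -> streamer node.
        // (backups + 1) * (nodes - 1) / nodes: streamer -> the other owners.
        return 1.0 + (backups + 1) * (nodes - 1) / (double) nodes;
    }
}
```

This reproduces the figures in the message: 1 1/2 for two nodes without backups, 1 2/3 for three, 2 for two nodes with one backup, 2 1/2 for four. For a fully replicated cache (backups = nodes - 1) the model gives N units rather than the N+1 quoted above; either way, the qualitative conclusion that replication is the worst case stands.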
IgniteCache.loadCache improvement proposal
Hi, You know the CacheStore API that is commonly used for the read/write-through relationship between in-memory data and the persistence storage. There is also the IgniteCache.loadCache method for hot-loading the cache on startup. Invocation of this method causes execution of CacheStore.loadCache on all the nodes storing the cache partitions. Because no keys are passed to the CacheStore.loadCache methods, the underlying implementation is forced to read all the data from the persistence storage, although only part of the data will be stored on each node. So, the current implementation has two general drawbacks: 1. The persistence storage is forced to perform as many identical queries as there are nodes in the cluster. Each query may involve much additional computation on the persistence storage server. 2. The network is forced to transfer much more data than necessary, which is an obvious disadvantage on large systems. The partition-aware data loading approach, described in https://apacheignite.readme.io/docs/data-loading#section-partition-aware-data-loading , is not a choice. It requires persisting volatile data that depends on the affinity function implementation and settings. I propose using something like IgniteDataStreamer inside the IgniteCache.loadCache implementation. -- Thanks, Alexandr Kuramshin
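Drawback 1 can be made concrete with a toy simulation (not Ignite code; the `row % nodes` affinity is a hypothetical simplification): every node runs the same unfiltered CacheStore.loadCache scan, so the store serves nodes * rows reads even though only rows entries are kept cluster-wide.

```java
// Toy cost model of the current loadCache behavior: each node scans the
// whole store but keeps only the rows its (simplified) affinity owns.
public class CurrentLoadCacheCost {
    // Returns { rowsReadFromStore, rowsKeptAcrossCluster }.
    static long[] cost(int nodes, int rows) {
        long read = 0, kept = 0;
        for (int node = 0; node < nodes; node++)
            for (int row = 0; row < rows; row++) {
                read++;                      // each node scans every row
                if (row % nodes == node)
                    kept++;                  // but stores only its own share
            }
        return new long[] { read, kept };
    }
}
```

For a 4-node cluster and 100 rows the store serves 400 reads for 100 stored entries; the redundancy factor equals the node count, which is the overhead the proposal aims to remove.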