[jira] [Created] (IGNITE-12089) JVM is halted after this error during rolling restart of a cluster
temp2 created IGNITE-12089:
--
Summary: JVM is halted after this error during rolling restart of a cluster
Key: IGNITE-12089
URL: https://issues.apache.org/jira/browse/IGNITE-12089
Project: Ignite
Issue Type: Bug
Affects Versions: 2.6
Reporter: temp2

JVM is halted after this error during rolling restart of a cluster. The exception is:

[2019-08-20 17:22:10,901][ERROR][ttl-cleanup-worker-#155][] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Runtime failure on bounds: [lower=PendingRow [], upper=PendingRow []]]]
org.apache.ignite.IgniteException: Runtime failure on bounds: [lower=PendingRow [], upper=PendingRow []]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:971) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:950) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expire(IgniteCacheOffheapManagerImpl.java:1022) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:197) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:137) [ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [ignite-core-2.6.0.jar:2.6.0]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
	at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:148) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.tree.PendingRow.initKey(PendingRow.java:72) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.tree.PendingEntriesTree.getRow(PendingEntriesTree.java:118) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.tree.PendingEntriesTree.getRow(PendingEntriesTree.java:31) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer(BPlusTree.java:4660) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.init(BPlusTree.java:4562) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.access$5300(BPlusTree.java:4501) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetCursor.notFound(BPlusTree.java:2633) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:293) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4816) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4801) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:158) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:332) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1140) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1149) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doFind(BPlusTree.java:1107) ~[ignite-core-2.6.0.jar:2.6.0]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$15800(BPlusTree.java:83) ~[ignite
[jira] [Assigned] (IGNITE-12087) Transactional putAll - significant performance drop on big batches of entries.
[ https://issues.apache.org/jira/browse/IGNITE-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eduard Shangareev reassigned IGNITE-12087:
--
Assignee: Eduard Shangareev

> Transactional putAll - significant performance drop on big batches of entries.
> --
>
> Key: IGNITE-12087
> URL: https://issues.apache.org/jira/browse/IGNITE-12087
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Reporter: Pavel Pereslegin
> Assignee: Eduard Shangareev
> Priority: Major
>
> After IGNITE-5227 have been fixed I found significant performance drop in
> putAll operation.
> Insertion of 30_000 entries before IGNITE-5227 took ~1 second.
> After IGNITE-5227 - 130 seconds (~100x slower).
> I checked a different batch size:
> 10_000 - 10 seconds
> 20_000 - 48 seconds
> 30_000 - 130 seconds
> and I was not able to wait for the result of 100_000 entries.
> Reproducer
> {code:java}
> public class CheckPutAll extends GridCommonAbstractTest {
>     @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
>         IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);
>
>         CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME);
>
>         ccfg.setAtomicityMode(TRANSACTIONAL);
>
>         cfg.setCacheConfiguration(ccfg);
>
>         return cfg;
>     }
>
>     @Test
>     public void check() throws Exception {
>         int cnt = 30_000;
>
>         Map data = new HashMap<>(U.capacity(cnt));
>
>         for (int i = 0; i < cnt; i++)
>             data.put(i, i);
>
>         Ignite node0 = startGrid(0);
>
>         IgniteCache cache0 = node0.cache(DEFAULT_CACHE_NAME);
>
>         cache0.putAll(data);
>     }
> }{code}

-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-7285) Add default query timeout
[ https://issues.apache.org/jira/browse/IGNITE-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911729#comment-16911729 ] Ivan Pavlukhin commented on IGNITE-7285: [~amashenkov], could you please step in and continue the review? Unfortunately, for a couple of weeks I have limited access to my computer and cannot do a review in a timely manner. > Add default query timeout > - > > Key: IGNITE-7285 > URL: https://issues.apache.org/jira/browse/IGNITE-7285 > Project: Ignite > Issue Type: Improvement > Components: cache, sql >Affects Versions: 2.3 >Reporter: Valentin Kulichenko >Assignee: Saikat Maitra >Priority: Major > Labels: sql-stability > Fix For: 2.8 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently it's possible to provide timeout only on query level. It would be > very useful to have default timeout value provided on cache startup. Let's > add {{CacheConfiguration#defaultQueryTimeout}} configuration property. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-7285) Add default query timeout
[ https://issues.apache.org/jira/browse/IGNITE-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911725#comment-16911725 ]

Ivan Pavlukhin commented on IGNITE-7285:

Hi [~samaitra], I suspect that not all places where a timeout should be passed are covered yet. To me it looks more natural to calculate a query timeout (taking the default one into account) upon {{QueryParameters}} initialization. The following points are worth considering for an implementation:
1. Reduce the number of places where the _default timeout_ is read from the configuration (ideally there should be only one place).
2. Be able to disable the timeout for a particular query ({{SqlFieldsQuery.setTimeout(0)}}) when a non-zero default timeout is configured.

> Add default query timeout
> -
>
> Key: IGNITE-7285
> URL: https://issues.apache.org/jira/browse/IGNITE-7285
> Project: Ignite
> Issue Type: Improvement
> Components: cache, sql
> Affects Versions: 2.3
> Reporter: Valentin Kulichenko
> Assignee: Saikat Maitra
> Priority: Major
> Labels: sql-stability
> Fix For: 2.8
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Currently it's possible to provide timeout only on query level. It would be
> very useful to have default timeout value provided on cache startup. Let's
> add {{CacheConfiguration#defaultQueryTimeout}} configuration property.

-- This message was sent by Atlassian Jira (v8.3.2#803003)
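The two review points above boil down to a single resolution rule: an explicit per-query timeout always wins, including an explicit 0 that disables the timeout, and the configured default applies only when the query left the timeout unset. A minimal sketch of that rule follows; the class name, method name, and the "not set" sentinel are assumptions for illustration, not Ignite's actual API, and the real {{QueryParameters}} wiring is up to the implementation.

```java
/**
 * Illustrative sketch of resolving an effective query timeout from a
 * per-query setting and a cache-level default. All names and the "unset"
 * sentinel here are hypothetical, not Ignite's actual API.
 */
public class QueryTimeoutResolver {
    /** Sentinel meaning "the query did not set a timeout" (assumption). */
    public static final int TIMEOUT_NOT_SET = -1;

    /**
     * @param queryTimeout   Per-query timeout in ms; 0 disables the timeout,
     *                       TIMEOUT_NOT_SET means "fall back to the default".
     * @param defaultTimeout Cache-level default in ms (0 = no default).
     * @return Effective timeout in ms (0 = no timeout).
     */
    public static int resolve(int queryTimeout, int defaultTimeout) {
        // An explicit value on the query (including an explicit 0) wins,
        // which is exactly point 2: setTimeout(0) disables the timeout
        // even when a non-zero default is configured.
        if (queryTimeout != TIMEOUT_NOT_SET)
            return queryTimeout;

        // Otherwise fall back to the configured default (point 1: this is
        // the single place where the default is consulted).
        return defaultTimeout;
    }

    public static void main(String[] args) {
        // Default applies only when the query left the timeout unset.
        System.out.println(resolve(TIMEOUT_NOT_SET, 5000)); // 5000
        // An explicit 0 disables the timeout despite the default.
        System.out.println(resolve(0, 5000)); // 0
    }
}
```

Keeping this rule in one helper would address point 1, since every call site resolves the timeout through the same function.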
[jira] [Commented] (IGNITE-12080) Add extended logging for rebalance
[ https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911630#comment-16911630 ]

Maxim Muzafarov commented on IGNITE-12080:
--

Folks, I've looked through the changes and have a few questions regarding the implementation.
* Why is a static utils class used for collecting rebalance info? Why not, for instance, the DiagnosticProcessor (introduced recently)?
* After IGNITE-3195 is merged there will be no reason to collect statistics about the `rebalance topic`; it will be replaced with the thread pools.
* Do you have any benchmarks with the `printRebalanceStatistics` property enabled? Since the rebalance procedure can run for 4-8 hours, it is necessary to check and analyze JVM metrics (GC, used heap, etc.). We can have thousands of Supply-Demand messages, and for each of them we hold a `RebalanceMessageStatistics` in the heap until the rebalance procedure finishes.
* The printed statistics are not in a human-readable format. Is that user-friendly? Moreover, it is up to the implementation to print statistics the right way in the logs. I think we don't need any abbreviations (e.g. `writeAliasesRebalanceStatistics`) to decode the logs.
* Do we have a TC execution with the `printRebalanceStatistics` property enabled on all suites? It seems to me we can get a `NullPointerException` in some cases.
* Why is `RebalanceMessageStatistics` needed? I don't think that holding `sndMsgTime` for each message will be useful for rebalance statistics at all. The same goes for `rcvMsgTime`.
* I think `ReceivePartitionStatistics.msgSize` will be the same in 98% of cases. Do we need it?
* Do we need `PartitionStatistics` at all? Can the same value be obtained from the `onRebalanceKeyReceived` metric at the end of the rebalance procedure?

Please do not merge the PR until all these issues are resolved.
> Add extended logging for rebalance > -- > > Key: IGNITE-12080 > URL: https://issues.apache.org/jira/browse/IGNITE-12080 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > We should log all information about finished rebalance on demander node. > I'd have in log: > h3. Total information: > # Rebalance duration, rebalance start time/rebalance finish time > # How many partitions were processed in each topic (number of paritions, > number of entries, number of bytes) > # How many nodes were suppliers in rebalance (nodeId, number of supplied > paritions, number of supplied entries, number of bytes, duraton of getting > and processing partitions from supplier) > h3. Information per cache group: > # Rebalance duration, rebalance start time/rebalance finish time > # How many partitions were processed in each topic (number of paritions, > number of entries, number of bytes) > # How many nodes were suppliers in rebalance (nodeId, number of supplied > paritions, list of partition ids with PRIMARY/BACKUP flag, number of supplied > entries, number of bytes, duraton of getting and processing partitions from > supplier) > # Information about each partition distribution (list of nodeIds with > primary/backup flag and marked supplier nodeId) > h3. Information per supplier node: > # How many paritions were requested: > #* Total number > #* Primary/backup distribution (number of primary partitions, number of > backup partitions) > #* Total number of entries > #* Total size partitions in bytes > # How many paritions were requested per cache group: > #* Number of requested partitions > #* Number of entries in partitions > #* Total size of partitions in bytes > #* List of requested partitions with size in bytes, count entries, primary or > backup partition flag -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-9638) .NET: JVM keeps track of CLR Threads, even when they are finished
[ https://issues.apache.org/jira/browse/IGNITE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911566#comment-16911566 ] Pavel Tupitsyn commented on IGNITE-9638: [~ilyak] we could do that, as well as provide a way to detach manually. Both of those approaches are bad usability, but maybe they make sense in the short term while we work on the proper fix. > .NET: JVM keeps track of CLR Threads, even when they are finished > -- > > Key: IGNITE-9638 > URL: https://issues.apache.org/jira/browse/IGNITE-9638 > Project: Ignite > Issue Type: Bug > Components: platforms >Affects Versions: 2.6 >Reporter: Ilya Kasnacheev >Assignee: Pavel Tupitsyn >Priority: Major > Labels: .NET > Fix For: 2.8 > > Attachments: IgniteRepro.zip > > > When you create a Thread in C#, JVM creates corresponding thread > "Thread-" which is visible in jstack. When C# joins this thread, it is > not removed from JVM and is kept around. This means that jstack may show > thousands of threads which are not there. Reproducer is attached. It is > presumed that memory will be exhausted eventually. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12088) Cache or template name should be validated before attempt to start
Pavel Kovalenko created IGNITE-12088:
--
Summary: Cache or template name should be validated before attempt to start
Key: IGNITE-12088
URL: https://issues.apache.org/jira/browse/IGNITE-12088
Project: Ignite
Issue Type: Bug
Components: cache
Reporter: Pavel Kovalenko
Fix For: 2.8

If a cache name is too long, it can make it impossible to create the work directory for that cache:

{noformat}
[2019-08-20 19:35:42,139][ERROR][exchange-worker-#172%node1%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Failed to initialize cache working directory (failed to create, make sure the work folder has correct permissions): /home/gridgain/projects/incubator-ignite/work/db/node1/cache-CacheConfiguration [name=ccfg3staticTemplate*, grpName=null, memPlcName=null, storeConcurrentLoadAllThreshold=5, rebalancePoolSize=1, rebalanceTimeout=1, evictPlc=null, evictPlcFactory=null, onheapCache=false, sqlOnheapCache=false, sqlOnheapCacheMaxSize=0, evictFilter=null, eagerTtl=true, dfltLockTimeout=0, nearCfg=null, writeSync=null, storeFactory=null, storeKeepBinary=false, loadPrevVal=false, aff=null, cacheMode=PARTITIONED, atomicityMode=null, backups=6, invalidate=false, tmLookupClsName=null, rebalanceMode=ASYNC, rebalanceOrder=0, rebalanceBatchSize=524288, rebalanceBatchesPrefetchCnt=2, maxConcurrentAsyncOps=500, sqlIdxMaxInlineSize=-1, writeBehindEnabled=false, writeBehindFlushSize=10240, writeBehindFlushFreq=5000, writeBehindFlushThreadCnt=1, writeBehindBatchSize=512, writeBehindCoalescing=true, maxQryIterCnt=1024, affMapper=null, rebalanceDelay=0, rebalanceThrottle=0, interceptor=null, longQryWarnTimeout=3000, qryDetailMetricsSz=0, readFromBackup=true, nodeFilter=null, sqlSchema=null,
sqlEscapeAll=false, cpOnRead=true, topValidator=null, partLossPlc=IGNORE, qryParallelism=1, evtsDisabled=false, encryptionEnabled=false, diskPageCompression=null, diskPageCompressionLevel=null]0]] class org.apache.ignite.IgniteCheckedException: Failed to initialize cache working directory (failed to create, make sure the work folder has correct permissions): /home/gridgain/projects/incubator-ignite/work/db/node1/cache-CacheConfiguration [name=ccfg3staticTemplate*, grpName=null, memPlcName=null, storeConcurrentLoadAllThreshold=5, rebalancePoolSize=1, rebalanceTimeout=1, evictPlc=null, evictPlcFactory=null, onheapCache=false, sqlOnheapCache=false, sqlOnheapCacheMaxSize=0, evictFilter=null, eagerTtl=true, dfltLockTimeout=0, nearCfg=null, writeSync=null, storeFactory=null, storeKeepBinary=false, loadPrevVal=false, aff=null, cacheMode=PARTITIONED, atomicityMode=null, backups=6, invalidate=false, tmLookupClsName=null, rebalanceMode=ASYNC, rebalanceOrder=0, rebalanceBatchSize=524288, rebalanceBatchesPrefetchCnt=2, maxConcurrentAsyncOps=500, sqlIdxMaxInlineSize=-1, writeBehindEnabled=false, writeBehindFlushSize=10240, writeBehindFlushFreq=5000, writeBehindFlushThreadCnt=1, writeBehindBatchSize=512, writeBehindCoalescing=true, maxQryIterCnt=1024, affMapper=null, rebalanceDelay=0, rebalanceThrottle=0, interceptor=null, longQryWarnTimeout=3000, qryDetailMetricsSz=0, readFromBackup=true, nodeFilter=null, sqlSchema=null, sqlEscapeAll=false, cpOnRead=true, topValidator=null, partLossPlc=IGNORE, qryParallelism=1, evtsDisabled=false, encryptionEnabled=false, diskPageCompression=null, diskPageCompressionLevel=null]0 at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.checkAndInitCacheWorkDir(FilePageStoreManager.java:769) at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.checkAndInitCacheWorkDir(FilePageStoreManager.java:748) at 
org.apache.ignite.internal.processors.cache.CachesRegistry.persistCacheConfigurations(CachesRegistry.java:289) at org.apache.ignite.internal.processors.cache.CachesRegistry.registerAllCachesAndGroups(CachesRegistry.java:264) at org.apache.ignite.internal.processors.cache.CachesRegistry.update(CachesRegistry.java:202) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:850) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:1306) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:846) at org.apache.ignite.internal.processors.c
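The validation the ticket asks for could be done up-front, before any exchange work starts, by checking that the cache work-directory name fits the filesystem's file-name limit. A minimal sketch, assuming a hypothetical helper name and the 255-byte per-name limit typical of Linux filesystems; the actual limit and the place where Ignite should perform the check are up to the implementation:

```java
import java.nio.charset.StandardCharsets;

/** Sketch of up-front cache/template name validation (hypothetical helper, not Ignite's actual check). */
public class CacheNameValidator {
    /** Typical per-file-name limit on Linux filesystems, in bytes (assumption). */
    private static final int MAX_FILE_NAME_BYTES = 255;

    /** Prefix used for cache work directories, as seen in the log above ("cache-..."). */
    private static final String CACHE_DIR_PREFIX = "cache-";

    /**
     * @param cacheName Cache or template name to validate.
     * @return true if a work directory named "cache-&lt;name&gt;" would fit
     *         within the file-name length limit.
     */
    public static boolean isValidLength(String cacheName) {
        // Measure in bytes, not chars: non-ASCII names take more bytes in UTF-8.
        int dirNameBytes = (CACHE_DIR_PREFIX + cacheName)
            .getBytes(StandardCharsets.UTF_8).length;

        return dirNameBytes <= MAX_FILE_NAME_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(isValidLength("myCache"));       // true
        System.out.println(isValidLength("x".repeat(300))); // false
    }
}
```

Rejecting such a name at cache-start validation time would produce a clear user-facing error instead of the CRITICAL_ERROR failure above.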
[jira] [Updated] (IGNITE-12088) Cache or template name should be validated before attempt to start
[ https://issues.apache.org/jira/browse/IGNITE-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Kovalenko updated IGNITE-12088: - Labels: usability (was: ) > Cache or template name should be validated before attempt to start > -- > > Key: IGNITE-12088 > URL: https://issues.apache.org/jira/browse/IGNITE-12088 > Project: Ignite > Issue Type: Bug > Components: cache >Reporter: Pavel Kovalenko >Priority: Critical > Labels: usability > Fix For: 2.8 > > > If set too long cache name it can be a cause of impossibility to create work > directory for that cache: > {noformat} > [2019-08-20 > 19:35:42,139][ERROR][exchange-worker-#172%node1%][IgniteTestResources] > Critical system error detected. Will be handled accordingly to configured > handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler > [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, > SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext > [type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Failed to > initialize cache working directory (failed to create, make sure the work > folder has correct permissions): > /home/gridgain/projects/incubator-ignite/work/db/node1/cache-CacheConfiguration > [name=ccfg3staticTemplate*, grpName=null, memPlcName=null, > storeConcurrentLoadAllThreshold=5, rebalancePoolSize=1, > rebalanceTimeout=1, evictPlc=null, evictPlcFactory=null, > onheapCache=false, sqlOnheapCache=false, sqlOnheapCacheMaxSize=0, > evictFilter=null, eagerTtl=true, dfltLockTimeout=0, nearCfg=null, > writeSync=null, storeFactory=null, storeKeepBinary=false, loadPrevVal=false, > aff=null, cacheMode=PARTITIONED, atomicityMode=null, backups=6, > invalidate=false, tmLookupClsName=null, rebalanceMode=ASYNC, > rebalanceOrder=0, rebalanceBatchSize=524288, rebalanceBatchesPrefetchCnt=2, > maxConcurrentAsyncOps=500, sqlIdxMaxInlineSize=-1, writeBehindEnabled=false, > writeBehindFlushSize=10240, writeBehindFlushFreq=5000, > writeBehindFlushThreadCnt=1, 
writeBehindBatchSize=512, > writeBehindCoalescing=true, maxQryIterCnt=1024, affMapper=null, > rebalanceDelay=0, rebalanceThrottle=0, interceptor=null, > longQryWarnTimeout=3000, qryDetailMetricsSz=0, readFromBackup=true, > nodeFilter=null, sqlSchema=null, sqlEscapeAll=false, cpOnRead=true, > topValidator=null, partLossPlc=IGNORE, qryParallelism=1, evtsDisabled=false, > encryptionEnabled=false, diskPageCompression=null, > diskPageCompressionLevel=null]0]] > class org.apache.ignite.IgniteCheckedException: Failed to initialize cache > working directory (failed to create, make sure the work folder has correct > permissions): > /home/gridgain/projects/incubator-ignite/work/db/node1/cache-CacheConfiguration > [name=ccfg3staticTemplate*, grpName=null, memPlcName=null, > storeConcurrentLoadAllThreshold=5, rebalancePoolSize=1, > rebalanceTimeout=1, evictPlc=null, evictPlcFactory=null, > onheapCache=false, sqlOnheapCache=false, sqlOnheapCacheMaxSize=0, > evictFilter=null, eagerTtl=true, dfltLockTimeout=0, nearCfg=null, > writeSync=null, storeFactory=null, storeKeepBinary=false, loadPrevVal=false, > aff=null, cacheMode=PARTITIONED, atomicityMode=null, backups=6, > invalidate=false, tmLookupClsName=null, rebalanceMode=ASYNC, > rebalanceOrder=0, rebalanceBatchSize=524288, rebalanceBatchesPrefetchCnt=2, > maxConcurrentAsyncOps=500, sqlIdxMaxInlineSize=-1, writeBehindEnabled=false, > writeBehindFlushSize=10240, writeBehindFlushFreq=5000, > writeBehindFlushThreadCnt=1, writeBehindBatchSize=512, > writeBehindCoalescing=true, maxQryIterCnt=1024, affMapper=null, > rebalanceDelay=0, rebalanceThrottle=0, interceptor=null, > longQryWarnTimeout=3000, qryDetailMetricsSz=0, readFromBackup=true, > nodeFilter=null, sqlSchema=null, sqlEscapeAll=false, cpOnRead=true, > topValidator=null, partLossPlc=IGNORE, qryParallelism=1, evtsDisabled=false, > encryptionEnabled=false, diskPageCompression=null, > diskPageCompressionLevel=null]0 > at > 
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.checkAndInitCacheWorkDir(FilePageStoreManager.java:769) > at > org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.checkAndInitCacheWorkDir(FilePageStoreManager.java:748) > at > org.apache.ignite.internal.processors.cache.CachesRegistry.persistCacheConfigurations(CachesRegistry.java:289) > at > org.apache.ignite.internal.processors.cache.CachesRegistry.registerAllCachesAndGroups(CachesRegistry.java:264) > at > org.apache.ignite.internal.processors.cache.CachesRegistry.update(CachesRegistry.java:202) > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedMa
[jira] [Commented] (IGNITE-12080) Add extended logging for rebalance
[ https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911539#comment-16911539 ] Eduard Shangareev commented on IGNITE-12080: Looks good. Thank you for your contribution. > Add extended logging for rebalance > -- > > Key: IGNITE-12080 > URL: https://issues.apache.org/jira/browse/IGNITE-12080 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > > We should log all information about finished rebalance on demander node. > I'd have in log: > h3. Total information: > # Rebalance duration, rebalance start time/rebalance finish time > # How many partitions were processed in each topic (number of paritions, > number of entries, number of bytes) > # How many nodes were suppliers in rebalance (nodeId, number of supplied > paritions, number of supplied entries, number of bytes, duraton of getting > and processing partitions from supplier) > h3. Information per cache group: > # Rebalance duration, rebalance start time/rebalance finish time > # How many partitions were processed in each topic (number of paritions, > number of entries, number of bytes) > # How many nodes were suppliers in rebalance (nodeId, number of supplied > paritions, list of partition ids with PRIMARY/BACKUP flag, number of supplied > entries, number of bytes, duraton of getting and processing partitions from > supplier) > # Information about each partition distribution (list of nodeIds with > primary/backup flag and marked supplier nodeId) > h3. 
Information per supplier node: > # How many paritions were requested: > #* Total number > #* Primary/backup distribution (number of primary partitions, number of > backup partitions) > #* Total number of entries > #* Total size partitions in bytes > # How many paritions were requested per cache group: > #* Number of requested partitions > #* Number of entries in partitions > #* Total size of partitions in bytes > #* List of requested partitions with size in bytes, count entries, primary or > backup partition flag -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-9638) .NET: JVM keeps track of CLR Threads, even when they are finished
[ https://issues.apache.org/jira/browse/IGNITE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911534#comment-16911534 ] Ilya Kasnacheev commented on IGNITE-9638: - Maybe we should have an option where every thread is detached every time we need to return? > .NET: JVM keeps track of CLR Threads, even when they are finished > -- > > Key: IGNITE-9638 > URL: https://issues.apache.org/jira/browse/IGNITE-9638 > Project: Ignite > Issue Type: Bug > Components: platforms >Affects Versions: 2.6 >Reporter: Ilya Kasnacheev >Assignee: Pavel Tupitsyn >Priority: Major > Labels: .NET > Fix For: 2.8 > > Attachments: IgniteRepro.zip > > > When you create a Thread in C#, JVM creates corresponding thread > "Thread-" which is visible in jstack. When C# joins this thread, it is > not removed from JVM and is kept around. This means that jstack may show > thousands of threads which are not there. Reproducer is attached. It is > presumed that memory will be exhausted eventually. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (IGNITE-5227) StackOverflowError in GridCacheMapEntry#checkOwnerChanged()
[ https://issues.apache.org/jira/browse/IGNITE-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Pavlov updated IGNITE-5227:
---
Fix Version/s: 2.8

> StackOverflowError in GridCacheMapEntry#checkOwnerChanged()
> ---
>
> Key: IGNITE-5227
> URL: https://issues.apache.org/jira/browse/IGNITE-5227
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 1.6
> Reporter: Alexey Goncharuk
> Assignee: Stepachev Maksim
> Priority: Critical
> Fix For: 2.8
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> A simple test reproducing this error:
> {code}
> /**
>  * @throws Exception if failed.
>  */
> public void testBatchUnlock() throws Exception {
>     startGrid(0);
>
>     grid(0).createCache(new CacheConfiguration<String, Integer>(DEFAULT_CACHE_NAME)
>         .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL));
>
>     try {
>         final CountDownLatch releaseLatch = new CountDownLatch(1);
>
>         IgniteInternalFuture fut = GridTestUtils.runAsync(new Callable() {
>             @Override public Object call() throws Exception {
>                 IgniteCache cache = grid(0).cache(null);
>
>                 Lock lock = cache.lock("key");
>
>                 try {
>                     lock.lock();
>
>                     releaseLatch.await();
>                 }
>                 finally {
>                     lock.unlock();
>                 }
>
>                 return null;
>             }
>         });
>
>         Map putMap = new LinkedHashMap<>();
>
>         putMap.put("key", "trigger");
>
>         for (int i = 0; i < 10_000; i++)
>             putMap.put("key-" + i, "value");
>
>         IgniteCache asyncCache = grid(0).cache(null).withAsync();
>
>         asyncCache.putAll(putMap);
>
>         IgniteFuture resFut = asyncCache.future();
>
>         Thread.sleep(1000);
>
>         releaseLatch.countDown();
>
>         fut.get();
>
>         resFut.get();
>     }
>     finally {
>         stopAllGrids();
>     }
> }
> {code}
> We should replace a recursive call with a simple iteration over the linked
> list.

-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-5227) StackOverflowError in GridCacheMapEntry#checkOwnerChanged()
[ https://issues.apache.org/jira/browse/IGNITE-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911525#comment-16911525 ]

Dmitriy Pavlov commented on IGNITE-5227:

Folks, why do you resolve the ticket without a fix version? Some day it becomes a pain in the neck for the release manager to find out where the fix is.

> StackOverflowError in GridCacheMapEntry#checkOwnerChanged()
> ---
>
> Key: IGNITE-5227
> URL: https://issues.apache.org/jira/browse/IGNITE-5227
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 1.6
> Reporter: Alexey Goncharuk
> Assignee: Stepachev Maksim
> Priority: Critical
> Time Spent: 40m
> Remaining Estimate: 0h
>
> A simple test reproducing this error:
> {code}
> /**
>  * @throws Exception if failed.
>  */
> public void testBatchUnlock() throws Exception {
>     startGrid(0);
>
>     grid(0).createCache(new CacheConfiguration<String, Integer>(DEFAULT_CACHE_NAME)
>         .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL));
>
>     try {
>         final CountDownLatch releaseLatch = new CountDownLatch(1);
>
>         IgniteInternalFuture fut = GridTestUtils.runAsync(new Callable() {
>             @Override public Object call() throws Exception {
>                 IgniteCache cache = grid(0).cache(null);
>
>                 Lock lock = cache.lock("key");
>
>                 try {
>                     lock.lock();
>
>                     releaseLatch.await();
>                 }
>                 finally {
>                     lock.unlock();
>                 }
>
>                 return null;
>             }
>         });
>
>         Map putMap = new LinkedHashMap<>();
>
>         putMap.put("key", "trigger");
>
>         for (int i = 0; i < 10_000; i++)
>             putMap.put("key-" + i, "value");
>
>         IgniteCache asyncCache = grid(0).cache(null).withAsync();
>
>         asyncCache.putAll(putMap);
>
>         IgniteFuture resFut = asyncCache.future();
>
>         Thread.sleep(1000);
>
>         releaseLatch.countDown();
>
>         fut.get();
>
>         resFut.get();
>     }
>     finally {
>         stopAllGrids();
>     }
> }
> {code}
> We should replace a recursive call with a simple iteration over the linked
> list.

-- This message was sent by Atlassian Jira (v8.3.2#803003)
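The fix suggested at the end of the ticket description, replacing the recursive call with a simple iteration over the linked list, can be sketched as follows. The node type and field names here are simplified stand-ins for illustration, not the actual GridCacheMapEntry chain:

```java
/** Simplified stand-in for a chained entry (not the real GridCacheMapEntry). */
class ChainNode {
    ChainNode next;
    boolean ownerChanged;

    ChainNode(ChainNode next, boolean ownerChanged) {
        this.next = next;
        this.ownerChanged = ownerChanged;
    }
}

public class IterativeCheck {
    /**
     * Iterative version of walking the chain: a plain loop uses constant
     * stack space, so a chain of tens of thousands of entries (as in the
     * 10_000-entry putAll test above) cannot trigger a StackOverflowError
     * the way a per-node recursive call does.
     */
    public static int countOwnerChanged(ChainNode head) {
        int cnt = 0;

        for (ChainNode n = head; n != null; n = n.next) {
            if (n.ownerChanged)
                cnt++;
        }

        return cnt;
    }

    public static void main(String[] args) {
        // Build a chain far longer than a typical thread stack could recurse over.
        ChainNode head = null;

        for (int i = 0; i < 100_000; i++)
            head = new ChainNode(head, i % 2 == 0);

        System.out.println(countOwnerChanged(head)); // 50000
    }
}
```

The equivalent recursive formulation would make one stack frame per node, which is exactly what overflows on large putAll batches.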
[jira] [Commented] (IGNITE-12087) Transactional putAll - significant performance drop on big batches of entries.
[ https://issues.apache.org/jira/browse/IGNITE-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911520#comment-16911520 ]

Dmitriy Pavlov commented on IGNITE-12087:

[~mstepachev] could you take a look at this issue?

> Transactional putAll - significant performance drop on big batches of entries.
> --
>
> Key: IGNITE-12087
> URL: https://issues.apache.org/jira/browse/IGNITE-12087
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Reporter: Pavel Pereslegin
> Priority: Major
>
> After IGNITE-5227 have been fixed I found significant performance drop in
> putAll operation.
> Insertion of 30_000 entries before IGNITE-5227 took ~1 second.
> After IGNITE-5227 - 130 seconds (~100x slower).
> I checked a different batch size:
> 10_000 - 10 seconds
> 20_000 - 48 seconds
> 30_000 - 130 seconds
> and I was not able to wait for the result of 100_000 entries.
> Reproducer
> {code:java}
> public class CheckPutAll extends GridCommonAbstractTest {
>     @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
>         IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);
>
>         CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME);
>
>         ccfg.setAtomicityMode(TRANSACTIONAL);
>
>         cfg.setCacheConfiguration(ccfg);
>
>         return cfg;
>     }
>
>     @Test
>     public void check() throws Exception {
>         int cnt = 30_000;
>
>         Map data = new HashMap<>(U.capacity(cnt));
>
>         for (int i = 0; i < cnt; i++)
>             data.put(i, i);
>
>         Ignite node0 = startGrid(0);
>
>         IgniteCache cache0 = node0.cache(DEFAULT_CACHE_NAME);
>
>         cache0.putAll(data);
>     }
> }{code}

-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-9638) .NET: JVM keeps track of CLR Threads, even when they are finished
[ https://issues.apache.org/jira/browse/IGNITE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911517#comment-16911517 ] Pavel Tupitsyn commented on IGNITE-9638: [~ilyak] I don't think so - again, the problem is that you can't detach some other thread, only the current one. > .NET: JVM keeps track of CLR Threads, even when they are finished > -- > > Key: IGNITE-9638 > URL: https://issues.apache.org/jira/browse/IGNITE-9638 > Project: Ignite > Issue Type: Bug > Components: platforms >Affects Versions: 2.6 >Reporter: Ilya Kasnacheev >Assignee: Pavel Tupitsyn >Priority: Major > Labels: .NET > Fix For: 2.8 > > Attachments: IgniteRepro.zip > > > When you create a Thread in C#, JVM creates corresponding thread > "Thread-" which is visible in jstack. When C# joins this thread, it is > not removed from JVM and is kept around. This means that jstack may show > thousands of threads which are not there. Reproducer is attached. It is > presumed that memory will be exhausted eventually. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (IGNITE-12087) Transactional putAll - significant performance drop on big batches of entries.
[ https://issues.apache.org/jira/browse/IGNITE-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-12087: -- Description: After IGNITE-5227 have been fixed I found significant performance drop in putAll operation. Insertion of 30_000 entries before IGNITE-5227 took ~1 second. After IGNITE-5227 - 130 seconds (~100x slower). I checked a different batch size: 10_000 - 10 seconds 20_000 - 48 seconds 30_000 - 130 seconds and I was not able to wait for the result of 100_000 entries. Reproducer {code:java} public class CheckPutAll extends GridCommonAbstractTest { @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME); ccfg.setAtomicityMode(TRANSACTIONAL); cfg.setCacheConfiguration(ccfg); return cfg; } @Test public void check() throws Exception { int cnt = 30_000; Map data = new HashMap<>(U.capacity(cnt)); for (int i = 0; i < cnt; i++) data.put(i, i); Ignite node0 = startGrid(0); IgniteCache cache0 = node0.cache(DEFAULT_CACHE_NAME); cache0.putAll(data); } }{code} was: After IGNITE-5227 have been fixed I found significant performance drop in putAll operation. Insertion of 30_000 entries before IGNITE-5227 took ~1 second. After IGNITE-5227 - 130 seconds (~100x slower). I checked a different batch size: 10_000 - 10 seconds 20_000 - 48 seconds 30_000 - 130 seconds and I was not able to wait for the result of 100_000 entries. 
Reproducer {code:java} public class CheckPutAll extends GridCommonAbstractTest { @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME); ccfg.setAtomicityMode(TRANSACTIONAL); cfg.setCacheConfiguration(ccfg); return cfg; } @Test public void check() throws Exception { int cnt = 30_000; // Prepare data. Map data = new HashMap<>(U.capacity(cnt)); for (int i = 0; i < cnt; i++) data.put(i, i); // Start 1 node. Ignite node0 = startGrid(0); IgniteCache cache0 = node0.cache(DEFAULT_CACHE_NAME); // Load data. cache0.putAll(data); } }{code} > Transactional putAll - significant performance drop on big batches of entries. > -- > > Key: IGNITE-12087 > URL: https://issues.apache.org/jira/browse/IGNITE-12087 > Project: Ignite > Issue Type: Bug > Components: cache >Reporter: Pavel Pereslegin >Priority: Major > > After IGNITE-5227 have been fixed I found significant performance drop in > putAll operation. > Insertion of 30_000 entries before IGNITE-5227 took ~1 second. > After IGNITE-5227 - 130 seconds (~100x slower). > I checked a different batch size: > 10_000 - 10 seconds > 20_000 - 48 seconds > 30_000 - 130 seconds > and I was not able to wait for the result of 100_000 entries. 
> Reproducer > {code:java} > public class CheckPutAll extends GridCommonAbstractTest { > @Override protected IgniteConfiguration getConfiguration(String > igniteInstanceName) throws Exception { > IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); > CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME); > ccfg.setAtomicityMode(TRANSACTIONAL); > cfg.setCacheConfiguration(ccfg); > return cfg; > } > @Test > public void check() throws Exception { > int cnt = 30_000; > Map data = new HashMap<>(U.capacity(cnt)); > for (int i = 0; i < cnt; i++) > data.put(i, i); > Ignite node0 = startGrid(0); > IgniteCache cache0 = > node0.cache(DEFAULT_CACHE_NAME); > cache0.putAll(data); > } > }{code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (IGNITE-12087) Transactional putAll - significant performance drop on big batches of entries.
[ https://issues.apache.org/jira/browse/IGNITE-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-12087: -- Description: After IGNITE-5227 have been fixed I found significant performance drop in putAll operation. Insertion of 30_000 entries before IGNITE-5227 took ~1 second. After IGNITE-5227 - 130 seconds (~100x slower). I checked a different batch size: 10_000 - 10 seconds 20_000 - 48 seconds 30_000 - 130 seconds and I was not able to wait for the result of 100_000 entries. Reproducer {code:java} public class CheckPutAll extends GridCommonAbstractTest { @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME); ccfg.setAtomicityMode(TRANSACTIONAL); cfg.setCacheConfiguration(ccfg); return cfg; } @Test public void check() throws Exception { int cnt = 30_000; // Prepare data. Map data = new HashMap<>(U.capacity(cnt)); for (int i = 0; i < cnt; i++) data.put(i, i); // Start 1 node. Ignite node0 = startGrid(0); IgniteCache cache0 = node0.cache(DEFAULT_CACHE_NAME); // Load data. cache0.putAll(data); } }{code} was: After IGNITE-5227 have been fixed I found significant performance drop in putAll operation. Insertion of 30_000 entries before IGNITE-5227 took ~1 second. After IGNITE-5227 - 130 seconds (~100x slower). I checked a different batch size: 10_000 - 10 seconds 20_000 - 48 seconds 30_000 - 130 seconds and I was not able to wait for the result of 100_000 entries. 
Reproducer {code:java} public class CheckPutAll extends GridCommonAbstractTest { @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME); ccfg.setAtomicityMode(TRANSACTIONAL); cfg.setCacheConfiguration(ccfg); return cfg; } @Test public void check() throws Exception { int cnt = 30_000; // Prepare data. Map data = new HashMap<>(U.capacity(cnt)); for (int i = 0; i < cnt; i++) data.put(i, i); // Start 1 node. Ignite node0 = startGrid(0); node0.cluster().active(true); node0.cluster().baselineAutoAdjustTimeout(0); IgniteCache cache0 = node0.cache(DEFAULT_CACHE_NAME); // Load data. cache0.putAll(data); } }{code} > Transactional putAll - significant performance drop on big batches of entries. > -- > > Key: IGNITE-12087 > URL: https://issues.apache.org/jira/browse/IGNITE-12087 > Project: Ignite > Issue Type: Bug > Components: cache >Reporter: Pavel Pereslegin >Priority: Major > > After IGNITE-5227 have been fixed I found significant performance drop in > putAll operation. > Insertion of 30_000 entries before IGNITE-5227 took ~1 second. > After IGNITE-5227 - 130 seconds (~100x slower). > I checked a different batch size: > 10_000 - 10 seconds > 20_000 - 48 seconds > 30_000 - 130 seconds > and I was not able to wait for the result of 100_000 entries. > Reproducer > {code:java} > public class CheckPutAll extends GridCommonAbstractTest { > @Override protected IgniteConfiguration getConfiguration(String > igniteInstanceName) throws Exception { > IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); > CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME); > ccfg.setAtomicityMode(TRANSACTIONAL); > cfg.setCacheConfiguration(ccfg); > return cfg; > } > @Test > public void check() throws Exception { > int cnt = 30_000; > // Prepare data. 
> Map data = new HashMap<>(U.capacity(cnt)); > for (int i = 0; i < cnt; i++) > data.put(i, i); > // Start 1 node. > Ignite node0 = startGrid(0); > IgniteCache cache0 = > node0.cache(DEFAULT_CACHE_NAME); > // Load data. > cache0.putAll(data); > } > }{code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12087) Transactional putAll - significant performance drop on big batches of entries.
Pavel Pereslegin created IGNITE-12087: - Summary: Transactional putAll - significant performance drop on big batches of entries. Key: IGNITE-12087 URL: https://issues.apache.org/jira/browse/IGNITE-12087 Project: Ignite Issue Type: Bug Components: cache Reporter: Pavel Pereslegin After IGNITE-5227 has been fixed, I found a significant performance drop in the putAll operation. Inserting 30_000 entries before IGNITE-5227 took ~1 second; after IGNITE-5227 it takes 130 seconds (~100x slower). I checked different batch sizes: 10_000 - 10 seconds, 20_000 - 48 seconds, 30_000 - 130 seconds, and I was not able to wait for the result of 100_000 entries. Reproducer {code:java} public class CheckPutAll extends GridCommonAbstractTest { @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME); ccfg.setAtomicityMode(TRANSACTIONAL); cfg.setCacheConfiguration(ccfg); return cfg; } @Test public void check() throws Exception { int cnt = 30_000; // Prepare data. Map<Integer, Integer> data = new HashMap<>(U.capacity(cnt)); for (int i = 0; i < cnt; i++) data.put(i, i); // Start 1 node. Ignite node0 = startGrid(0); node0.cluster().active(true); node0.cluster().baselineAutoAdjustTimeout(0); IgniteCache<Integer, Integer> cache0 = node0.cache(DEFAULT_CACHE_NAME); // Load data. cache0.putAll(data); } }{code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
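While a regression like the one above stands, one common mitigation is to split a single huge transactional putAll into smaller batches, since the reported timings grow super-linearly with batch size. The helper below is a self-contained sketch of such batching; the final `cache.putAll(batch)` calls are only indicated in a comment because they would need a running Ignite node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative workaround sketch (not part of Ignite): split one large map
// into insertion-ordered batches, each loaded with its own putAll call.
final class BatchSplitter {
    /** Splits {@code data} into batches of at most {@code batchSize} entries. */
    static <K, V> List<Map<K, V>> split(Map<K, V> data, int batchSize) {
        List<Map<K, V>> batches = new ArrayList<>();
        Map<K, V> cur = new LinkedHashMap<>();

        for (Map.Entry<K, V> e : data.entrySet()) {
            cur.put(e.getKey(), e.getValue());

            if (cur.size() == batchSize) {
                batches.add(cur);
                cur = new LinkedHashMap<>();
            }
        }

        if (!cur.isEmpty())
            batches.add(cur);

        // Каждый batch would then be loaded separately:
        // for (Map<K, V> batch : batches) cache.putAll(batch);
        return batches;
    }
}
```

Note this changes transactional semantics (each batch commits independently), so it is only appropriate when per-batch atomicity is acceptable.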
[jira] [Updated] (IGNITE-12071) Test failures after IGNITE-9562 fix in IGFS suite
[ https://issues.apache.org/jira/browse/IGNITE-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Pavlov updated IGNITE-12071: Fix Version/s: (was: 2.7.6) 2.8 > Test failures after IGNITE-9562 fix in IGFS suite > - > > Key: IGNITE-12071 > URL: https://issues.apache.org/jira/browse/IGNITE-12071 > Project: Ignite > Issue Type: Test >Reporter: Dmitriy Pavlov >Assignee: Eduard Shangareev >Priority: Blocker > Fix For: 2.8 > > > https://lists.apache.org/thread.html/50375927a1375189c0aeec7dcaabc43ba83b7acee94524a3483d0c1b@%3Cdev.ignite.apache.org%3E > Unfortunately, since https://issues.apache.org/jira/browse/IGNITE-9562 is > planned for 2.7.6, it is a blocker for the release > *New test failure in master-nightly > IgfsCachePerBlockLruEvictionPolicySelfTest.testFilePrimary > https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8890685422557348790&branch=%3Cdefault%3E&tab=testDetails > *New test failure in master-nightly > IgfsCachePerBlockLruEvictionPolicySelfTest.testFileDualExclusion > https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=3724804704021179739&branch=%3Cdefault%3E&tab=testDetails > Changes that may have led to the failure were done by > - eduard shangareev > https://ci.ignite.apache.org/viewModification.html?modId=889258 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IGNITE-12071) Test failures after IGNITE-9562 fix in IGFS suite
[ https://issues.apache.org/jira/browse/IGNITE-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Pavlov resolved IGNITE-12071. - Resolution: Fixed Not reproduced in 2.7.6; resolving the issue. The fix was committed as part of another change for master: https://github.com/apache/ignite/pull/6765/files#diff-3b0297f8e0e757b6b5ede921d629c6b5R608 > Test failures after IGNITE-9562 fix in IGFS suite > - > > Key: IGNITE-12071 > URL: https://issues.apache.org/jira/browse/IGNITE-12071 > Project: Ignite > Issue Type: Test >Reporter: Dmitriy Pavlov >Assignee: Eduard Shangareev >Priority: Blocker > Fix For: 2.8 > > > https://lists.apache.org/thread.html/50375927a1375189c0aeec7dcaabc43ba83b7acee94524a3483d0c1b@%3Cdev.ignite.apache.org%3E > Unfortunately, since https://issues.apache.org/jira/browse/IGNITE-9562 is > planned for 2.7.6, it is a blocker for the release > *New test failure in master-nightly > IgfsCachePerBlockLruEvictionPolicySelfTest.testFilePrimary > https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8890685422557348790&branch=%3Cdefault%3E&tab=testDetails > *New test failure in master-nightly > IgfsCachePerBlockLruEvictionPolicySelfTest.testFileDualExclusion > https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=3724804704021179739&branch=%3Cdefault%3E&tab=testDetails > Changes that may have led to the failure were done by > - eduard shangareev > https://ci.ignite.apache.org/viewModification.html?modId=889258 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-11470) Views don't show in Dbeaver
[ https://issues.apache.org/jira/browse/IGNITE-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911419#comment-16911419 ] Yury Gerzhedovich commented on IGNITE-11470: [~amashenkov], [~pkouznet], guys, could you please review the patch? > Views don't show in Dbeaver > --- > > Key: IGNITE-11470 > URL: https://issues.apache.org/jira/browse/IGNITE-11470 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Yury Gerzhedovich >Assignee: Yury Gerzhedovich >Priority: Major > Labels: iep-29 > Fix For: 2.8 > > Time Spent: 4.5h > Remaining Estimate: 0h > > No views are shown in the Database Navigator tab. As of now we should see at > least the SQL system views. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-11470) Views don't show in Dbeaver
[ https://issues.apache.org/jira/browse/IGNITE-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911417#comment-16911417 ] Ignite TC Bot commented on IGNITE-11470: {panel:title=Branch: [pull/6456/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} [TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4520011&buildTypeId=IgniteTests24Java8_RunAll] > Views don't show in Dbeaver > --- > > Key: IGNITE-11470 > URL: https://issues.apache.org/jira/browse/IGNITE-11470 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Yury Gerzhedovich >Assignee: Yury Gerzhedovich >Priority: Major > Labels: iep-29 > Fix For: 2.8 > > Time Spent: 4.5h > Remaining Estimate: 0h > > No views are shown in the Database Navigator tab. As of now we should see at > least the SQL system views. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (IGNITE-11393) Create IgniteLinkTaglet.toString() implementation for Java9+
[ https://issues.apache.org/jira/browse/IGNITE-11393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Pavlov reassigned IGNITE-11393: --- Assignee: (was: Dmitriy Pavlov) > Create IgniteLinkTaglet.toString() implementation for Java9+ > > > Key: IGNITE-11393 > URL: https://issues.apache.org/jira/browse/IGNITE-11393 > Project: Ignite > Issue Type: Improvement >Reporter: Dmitriy Pavlov >Priority: Major > > New implementation was added according to the new Java API for Javadoc. > But the main method kept empty, need to implement toString() to process > IgniteLink annotation -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (IGNITE-12082) [Release] Update versions for pre-build DEB/RPM and describe how to set these versions
[ https://issues.apache.org/jira/browse/IGNITE-12082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Pavlov updated IGNITE-12082: Summary: [Release] Update versions for pre-build DEB/RPM and describe how to set these versions (was: [Release] Automate version assignment to pre-build DEB/RPM or describe how to set packages version to RC version) > [Release] Update versions for pre-build DEB/RPM and describe how to set these > versions > -- > > Key: IGNITE-12082 > URL: https://issues.apache.org/jira/browse/IGNITE-12082 > Project: Ignite > Issue Type: Bug >Reporter: Dmitriy Pavlov >Assignee: Dmitriy Pavlov >Priority: Major > Fix For: 2.7.6 > > > Problem: > https://ci.ignite.apache.org/viewLog.html?buildTypeId=Releases_ApacheIgniteMain_ReleaseBuild&buildId=4513186&branch_Releases_ApacheIgniteMain_ReleaseBuild=ignite-2.7.6 > RC 0 for 2.7.6. the build was successful, but versions for packages remain > unchanged > https://cwiki.apache.org/confluence/display/IGNITE/Release+Process does not > require Release manager to update versions, but pre-build DEB & RPM keeps > version from the previous release. > Solution 1 (manual): > We need to add a new step > https://cwiki.apache.org/confluence/display/IGNITE/Release+Process#ReleaseProcess-4.1.Updatereleasebranchversionsandyearincopyrightmessages > e.g. 4.1.4, where will ask a release manager to update versions. > May be similar with commit > https://gitbox.apache.org/repos/asf?p=ignite.git;a=commit;h=84c2dac5103a448bdaee88cb8290fd6e05a435bb > Solution 2 (automatic) > patch ./scripts/update-versions.sh to set packages version to current project > version. This will not require any actions from the release manager since > versions will be updated at step 4.1 with other assemblies versions. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IGNITE-12002) [TTL] Some expired data remains in memory even with eager TTL enabled
[ https://issues.apache.org/jira/browse/IGNITE-12002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philippe Anes resolved IGNITE-12002. Fix Version/s: 2.8 Resolution: Fixed > [TTL] Some expired data remains in memory even with eager TTL enabled > - > > Key: IGNITE-12002 > URL: https://issues.apache.org/jira/browse/IGNITE-12002 > Project: Ignite > Issue Type: Bug > Components: cache, general >Affects Versions: 2.7 > Environment: Running on MacOS 10.12.6 > OpenJDK 11 > Ignite v2.7.0 > >Reporter: Philippe Anes >Priority: Major > Fix For: 2.8 > > > Create an ignite client (in client mode false) and put some data (10k > entries/values) to it with very small expiration time (~20s) and TTL enabled. > Each time the thread is running it'll remove all the entries that expired, > but after few attempts this thread is not removing all the expired entries, > some of them are staying in memory and are not removed by this thread > execution. > That means we got some expired data in memory, and it's something we want to > avoid. > Please can you confirm that is a real issue or just misuse/configuration of > my test? > Thanks for your feedback. > > To reproduce: > Git repo: [https://github.com/panes/ignite-sample] > Run MyIgniteLoadRunnerTest.run() to reproduce the issue described on top. > (Global setup: Writing 10k entries of 64octets each with TTL 10s) -- This message was sent by Atlassian Jira (v8.3.2#803003)
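The eager-TTL cleanup the issue above relies on can be modeled in miniature: entries carry an expiration timestamp, a structure ordered by that timestamp holds the pending expirations, and a cleanup pass removes everything whose deadline has passed. This is only a conceptual sketch loosely mirroring what a TTL cleanup worker does (the real `GridCacheTtlManager` works against an off-heap `PendingEntries` tree); all names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Toy model of eager TTL: pending expirations ordered by deadline, and an
// expire() pass that drains everything whose deadline is <= "now".
final class TtlCache {
    private final Map<String, String> data = new HashMap<>();
    /** Each element is {expireTime, entryId}, ordered by expireTime. */
    private final PriorityQueue<long[]> pending =
        new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
    private final Map<Long, String> keyById = new HashMap<>();
    private long seq;

    void put(String key, String val, long expireTime) {
        data.put(key, val);
        long id = seq++;
        keyById.put(id, key);
        pending.add(new long[] {expireTime, id});
    }

    /** Removes every entry expired at {@code now}; returns how many were removed. */
    int expire(long now) {
        int n = 0;

        while (!pending.isEmpty() && pending.peek()[0] <= now) {
            long[] e = pending.poll();
            data.remove(keyById.remove(e[1]));
            n++;
        }

        return n;
    }

    int size() { return data.size(); }
}
```

The bug report above amounts to some entries surviving the `expire()` pass; in this model that would mean the pending structure and the data map have drifted out of sync.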
[jira] [Comment Edited] (IGNITE-12061) Silently fail while try to recreate already existing index with differ inline_size.
[ https://issues.apache.org/jira/browse/IGNITE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911357#comment-16911357 ] Dmitriy Pavlov edited comment on IGNITE-12061 at 8/20/19 1:41 PM: -- [~zstan] thank you for contribution, [~jooger], [~Pavlukhin], thank you for review, [~amashenkov], thank you for review and merging ticket. [~zstan] unfortunately I can't automatically cherry-pick fix to 2.7.6. Could you please prepare 2-7-6 based branch and create new PR for it? An example is https://issues.apache.org/jira/browse/IGNITE-9562 and PR https://github.com/apache/ignite/pull/6781 Also, since there is a risk of introducing bugs during merge, I suggest running TC Run All on the resulting branch. was (Author: dpavlov): [~amashenkov], thank you for review and merging ticket. [~zstan] unfortunately I can't automatically cherry-pick fix to 2.7.6. Could you please prepare 2-7-6 based branch and create new PR for it? An example is https://issues.apache.org/jira/browse/IGNITE-9562 and PR https://github.com/apache/ignite/pull/6781 Also, since there is a risk of introducing bugs during merge, I suggest running TC Run All on the resulting branch. > Silently fail while try to recreate already existing index with differ > inline_size. > --- > > Key: IGNITE-12061 > URL: https://issues.apache.org/jira/browse/IGNITE-12061 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.5, 2.7, 2.7.5 >Reporter: Stanilovsky Evgeny >Assignee: Stanilovsky Evgeny >Priority: Major > Fix For: 2.7.6 > > Time Spent: 5.5h > Remaining Estimate: 0h > > INLINE_SIZE differ from previous value is not correctly sets. > 1. create index idx0(c1, c2) > 2. drop idx0 > 3. create index idx0(c1, c2) inline_size 100; > inline_size remains the same, in this case default = 10. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-12061) Silently fail while try to recreate already existing index with differ inline_size.
[ https://issues.apache.org/jira/browse/IGNITE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911357#comment-16911357 ] Dmitriy Pavlov commented on IGNITE-12061: - [~amashenkov], thank you for review and merging ticket. [~zstan] unfortunately I can't automatically cherry-pick fix to 2.7.6. Could you please prepare 2-7-6 based branch and create new PR for it? An example is https://issues.apache.org/jira/browse/IGNITE-9562 and PR https://github.com/apache/ignite/pull/6781 Also, since there is a risk of introducing bugs during merge, I suggest running TC Run All on the resulting branch. > Silently fail while try to recreate already existing index with differ > inline_size. > --- > > Key: IGNITE-12061 > URL: https://issues.apache.org/jira/browse/IGNITE-12061 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.5, 2.7, 2.7.5 >Reporter: Stanilovsky Evgeny >Assignee: Stanilovsky Evgeny >Priority: Major > Fix For: 2.7.6 > > Time Spent: 5.5h > Remaining Estimate: 0h > > INLINE_SIZE differ from previous value is not correctly sets. > 1. create index idx0(c1, c2) > 2. drop idx0 > 3. create index idx0(c1, c2) inline_size 100; > inline_size remains the same, in this case default = 10. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Reopened] (IGNITE-12061) Silently fail while try to recreate already existing index with differ inline_size.
[ https://issues.apache.org/jira/browse/IGNITE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Pavlov reopened IGNITE-12061: - > Silently fail while try to recreate already existing index with differ > inline_size. > --- > > Key: IGNITE-12061 > URL: https://issues.apache.org/jira/browse/IGNITE-12061 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.5, 2.7, 2.7.5 >Reporter: Stanilovsky Evgeny >Assignee: Stanilovsky Evgeny >Priority: Major > Fix For: 2.7.6 > > Time Spent: 5.5h > Remaining Estimate: 0h > > INLINE_SIZE differ from previous value is not correctly sets. > 1. create index idx0(c1, c2) > 2. drop idx0 > 3. create index idx0(c1, c2) inline_size 100; > inline_size remains the same, in this case default = 10. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-10619) Add support files transmission between nodes over connection via CommunicationSpi
[ https://issues.apache.org/jira/browse/IGNITE-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911274#comment-16911274 ] Maxim Muzafarov commented on IGNITE-10619: -- According to the benchmark, there is no performance drop. ||FULL_SYNC||master d9fde7ee1562a84f26e92cd94dd0ac09738414ff||IGNITE-10619 7d1b87a67d6e115bff41eb80e7755a070b3f32ac||delta (%)|| |IgnitePutBenchmark|446568.01|449137.06|0.58| |IgnitePutGetBenchmark|326479.18|332820.69|1.91| > Add support files transmission between nodes over connection via > CommunicationSpi > - > > Key: IGNITE-10619 > URL: https://issues.apache.org/jira/browse/IGNITE-10619 > Project: Ignite > Issue Type: Sub-task > Components: persistence >Reporter: Maxim Muzafarov >Assignee: Maxim Muzafarov >Priority: Major > Labels: iep-28 > Fix For: 2.8 > > Time Spent: 10.5h > Remaining Estimate: 0h > > Partition preloader must support cache partition file relocation from one > cluster node to another (the zero-copy algorithm [1] is assumed to be used by > default). To achieve this, the file transfer machinery must be implemented in > Apache Ignite over Communication SPI. 
> _CommunicationSpi_ > Ignite's Communication SPI must support: > * establishing channel connections to the remote node on an arbitrary topic > (GridTopic) with a predefined processing policy; > * listening for incoming channel creation events and registering connection > handlers on the particular node; > * an arbitrary set of channel parameters on connection handshake; > _FileTransmitProcessor_ > The file transmission manager must support: > * using different approaches of incoming data handling – buffered and direct > (the zero-copy approach of FileChannel#transferTo); > * transferring data in chunks of a predefined size, saving intermediate > results; > * re-establishing the connection if an error occurs and continuing the file > upload/download; > * limiting connection bandwidth (upload and download) at runtime; -- This message was sent by Atlassian Jira (v8.3.2#803003)
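The "zero-copy approach of FileChannel#transferTo" mentioned in the requirements above asks the OS to move bytes from a file to a channel without copying them through user space. A minimal standalone sketch (plain JDK, no Ignite types) copying one file to another looks like this; note that `transferTo` may transfer fewer bytes than requested, so it must be called in a loop:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal zero-copy file transfer via FileChannel#transferTo.
final class ZeroCopy {
    /** Copies src to dst, looping until all bytes are transferred. */
    static void copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                 StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            long size = in.size();

            // transferTo may move fewer bytes than asked for in one call.
            while (pos < size)
                pos += in.transferTo(pos, size - pos, out);
        }
    }
}
```

In the actual feature, the target would be a socket channel to the remote node rather than a local file, but the `transferTo` loop is the same.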
[jira] [Commented] (IGNITE-9638) .NET: JVM keeps track of CLR Threads, even when they are finished
[ https://issues.apache.org/jira/browse/IGNITE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911266#comment-16911266 ] Ilya Kasnacheev commented on IGNITE-9638: - Is it possible to have a thread garbage collection, where threads will be detached periodically? > .NET: JVM keeps track of CLR Threads, even when they are finished > -- > > Key: IGNITE-9638 > URL: https://issues.apache.org/jira/browse/IGNITE-9638 > Project: Ignite > Issue Type: Bug > Components: platforms >Affects Versions: 2.6 >Reporter: Ilya Kasnacheev >Assignee: Pavel Tupitsyn >Priority: Major > Labels: .NET > Fix For: 2.8 > > Attachments: IgniteRepro.zip > > > When you create a Thread in C#, JVM creates corresponding thread > "Thread-" which is visible in jstack. When C# joins this thread, it is > not removed from JVM and is kept around. This means that jstack may show > thousands of threads which are not there. Reproducer is attached. It is > presumed that memory will be exhausted eventually. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-12083) [Release] Change release scripts according pre-build DEB/RPM folders
[ https://issues.apache.org/jira/browse/IGNITE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911263#comment-16911263 ] Peter Ivanov commented on IGNITE-12083: --- Looks good to me. > [Release] Change release scripts according pre-build DEB/RPM folders > > > Key: IGNITE-12083 > URL: https://issues.apache.org/jira/browse/IGNITE-12083 > Project: Ignite > Issue Type: Bug >Reporter: Dmitriy Pavlov >Assignee: Dmitriy Pavlov >Priority: Major > Fix For: 2.7.6 > > Time Spent: 1h > Remaining Estimate: 0h > > Problem: > svn: E02: Can't stat '/mnt/c/dev_env/release-2.7.6-rc0/packaging/pkg': No > such file or directory > Solution 1: > change release scripts accordingly, PR#5 > Solution 2: > change folder from 'packages' to packaging on Teamcity. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-12061) Silently fail while try to recreate already existing index with differ inline_size.
[ https://issues.apache.org/jira/browse/IGNITE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911260#comment-16911260 ] Ignite TC Bot commented on IGNITE-12061: {panel:title=Branch: [pull/6770/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} [TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4518671&buildTypeId=IgniteTests24Java8_RunAll] > Silently fail while try to recreate already existing index with differ > inline_size. > --- > > Key: IGNITE-12061 > URL: https://issues.apache.org/jira/browse/IGNITE-12061 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.5, 2.7, 2.7.5 >Reporter: Stanilovsky Evgeny >Assignee: Stanilovsky Evgeny >Priority: Major > Fix For: 2.7.6 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > INLINE_SIZE differ from previous value is not correctly sets. > 1. create index idx0(c1, c2) > 2. drop idx0 > 3. create index idx0(c1, c2) inline_size 100; > inline_size remains the same, in this case default = 10. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IGNITE-10808) Discovery message queue may build up with TcpDiscoveryMetricsUpdateMessage
[ https://issues.apache.org/jira/browse/IGNITE-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911258#comment-16911258 ] Sergey Chugunov commented on IGNITE-10808: -- [~dmekhanikov], Along with the MetricsUpdate message, your change also affects TcpDiscoveryClientAckResponse, which will no longer be processed with priority over other messages. This may be risky. Could you check what the consequences are for client nodes' stability if acks are delivered to them with some delay? Thanks. > Discovery message queue may build up with TcpDiscoveryMetricsUpdateMessage > -- > > Key: IGNITE-10808 > URL: https://issues.apache.org/jira/browse/IGNITE-10808 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.7 >Reporter: Stanislav Lukyanov >Assignee: Denis Mekhanikov >Priority: Major > Labels: discovery > Fix For: 2.8 > > Attachments: IgniteMetricsOverflowTest.java > > > A node receives a new metrics update message every `metricsUpdateFrequency` > milliseconds, and the message will be put at the top of the queue (because it > is a high priority message). > If processing one message takes more than `metricsUpdateFrequency` then > multiple `TcpDiscoveryMetricsUpdateMessage` will be in the queue. A long > enough delay (e.g. caused by a network glitch or GC) may lead to the queue > building up tens of metrics update messages which are essentially useless to > be processed. Finally, if processing a message on average takes a little more > than `metricsUpdateFrequency` (even for a relatively short period of time, > say, for a minute due to network issues) then the message worker will end up > processing only the metrics updates and the cluster will essentially hang. > Reproducer is attached. In the test, the queue first builds up and then is > torn down very slowly, causing "Failed to wait for PME" messages. 
> Need to change ServerImpl's SocketReader not to put another metrics update > message to the top of the queue if it already has one (or to replace the one at > the top with the new one). -- This message was sent by Atlassian Jira (v8.3.2#803003)
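The proposed fix above — collapsing consecutive metrics-update messages at the head of the queue instead of stacking them — can be sketched with a plain deque. This is an illustrative stand-in, not the actual `ServerImpl.SocketReader` code: the `Msg` type and the `"METRICS_UPDATE"` tag are hypothetical substitutes for the real discovery message classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of queue deduplication: a high-priority metrics update replaces an
// existing metrics update at the head rather than being added on top of it.
final class DiscoveryQueue {
    record Msg(String type, long id) {}

    private final Deque<Msg> queue = new ArrayDeque<>();

    void addHighPriority(Msg msg) {
        // Keep only the newest metrics update at the head of the queue.
        if ("METRICS_UPDATE".equals(msg.type())
            && !queue.isEmpty()
            && "METRICS_UPDATE".equals(queue.peekFirst().type()))
            queue.pollFirst();

        queue.addFirst(msg);
    }

    int size() { return queue.size(); }

    Msg peek() { return queue.peekFirst(); }
}
```

With this policy the queue holds at most one pending metrics update at its head, so a slow message worker can no longer be starved by a backlog of stale updates.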
[jira] [Updated] (IGNITE-12071) Test failures after IGNITE-9562 fix in IGFS suite
[ https://issues.apache.org/jira/browse/IGNITE-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Pavlov updated IGNITE-12071: Ignite Flags: (was: Docs Required) > Test failures after IGNITE-9562 fix in IGFS suite > - > > Key: IGNITE-12071 > URL: https://issues.apache.org/jira/browse/IGNITE-12071 > Project: Ignite > Issue Type: Test >Reporter: Dmitriy Pavlov >Assignee: Eduard Shangareev >Priority: Blocker > Fix For: 2.7.6 > > > https://lists.apache.org/thread.html/50375927a1375189c0aeec7dcaabc43ba83b7acee94524a3483d0c1b@%3Cdev.ignite.apache.org%3E > Unfortunately, since https://issues.apache.org/jira/browse/IGNITE-9562 is > planned for 2.7.6, it is a blocker for the release. > *New test failure in master-nightly: > IgfsCachePerBlockLruEvictionPolicySelfTest.testFilePrimary > https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8890685422557348790&branch=%3Cdefault%3E&tab=testDetails > *New test failure in master-nightly: > IgfsCachePerBlockLruEvictionPolicySelfTest.testFileDualExclusion > https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=3724804704021179739&branch=%3Cdefault%3E&tab=testDetails > Changes that may have led to the failure were made by > - Eduard Shangareev > https://ci.ignite.apache.org/viewModification.html?modId=889258
[jira] [Updated] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME
[ https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Pavlov updated IGNITE-9562: --- Labels: 2.7.6-rc1 (was: ) > Destroyed cache that resurrected on an old offline node breaks PME > -- > > Key: IGNITE-9562 > URL: https://issues.apache.org/jira/browse/IGNITE-9562 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.5 >Reporter: Pavel Kovalenko >Assignee: Eduard Shangareev >Priority: Critical > Labels: 2.7.6-rc1 > Fix For: 2.7.6 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Given: > 2 nodes, persistence enabled. > 1) Stop 1 node > 2) Destroy the cache through a client > 3) Start the stopped node > When the stopped node joins the cluster, it starts all caches that it had seen > before stopping. > If that cache was destroyed cluster-wide, this breaks the crash > recovery process or PME. > Root cause: we don't start/collect caches from the stopped node on the other > part of the cluster. > In the case of PARTITIONED cache mode, that scenario breaks crash recovery: > {noformat} > java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0] > at > org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321) > at > 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at java.lang.Thread.run(Thread.java:748) > {noformat} > In case of REPLICATED cache mode that scenario breaks PME coordinator process: > {noformat} > [2018-09-12 > 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager] > Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee2511, > messageType=class > o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage] > java.lang.AssertionError: 3080586 > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439) > at > 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2261) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2249) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353) > at
[jira] [Updated] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME
[ https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Pavlov updated IGNITE-9562: --- Fix Version/s: (was: 2.8) > Destroyed cache that resurrected on an old offline node breaks PME > -- > > Key: IGNITE-9562 > URL: https://issues.apache.org/jira/browse/IGNITE-9562 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.5 >Reporter: Pavel Kovalenko >Assignee: Eduard Shangareev >Priority: Critical > Fix For: 2.7.6 > > Time Spent: 2h 40m > Remaining Estimate: 0h >
[jira] [Commented] (IGNITE-12061) Silently fail while try to recreate already existing index with differ inline_size.
[ https://issues.apache.org/jira/browse/IGNITE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911221#comment-16911221 ] Stanilovsky Evgeny commented on IGNITE-12061: - [~amashenkov] can you merge it? Thanks! > Silently fail while try to recreate already existing index with differ > inline_size. > --- > > Key: IGNITE-12061 > URL: https://issues.apache.org/jira/browse/IGNITE-12061 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.5, 2.7, 2.7.5 >Reporter: Stanilovsky Evgeny >Assignee: Stanilovsky Evgeny >Priority: Major > Fix For: 2.7.6 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > An INLINE_SIZE that differs from the previous value is not applied on recreation. > 1. create index idx0(c1, c2) > 2. drop idx0 > 3. create index idx0(c1, c2) inline_size 100; > inline_size remains the same, in this case the default = 10.
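The shape of this bug can be sketched with a tiny metadata registry. The class and method names below (`IndexRegistrySketch`, `createIndex`, `dropIndexBuggy`) are hypothetical, not Ignite's actual index code; the sketch only illustrates how a drop that fails to purge cached index metadata makes the subsequent CREATE INDEX silently reuse the stale inline size.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (assumed names, not Ignite internals): a registry that
// caches the inline size per index name. The buggy drop leaves the cached
// entry behind, so a later create with a different inline_size has no effect.
public class IndexRegistrySketch {
    /** Default inline size mentioned in the ticket. */
    public static final int DFLT_INLINE_SIZE = 10;

    private final Map<String, Integer> inlineSizes = new HashMap<>();

    /** Creates the index and returns the inline size that actually takes effect. */
    public int createIndex(String name, int requestedInlineSize) {
        // If a cached entry survived a drop, it wins over the requested size.
        return inlineSizes.computeIfAbsent(name, k -> requestedInlineSize);
    }

    /** Buggy drop: forgets to purge the cached metadata. */
    public void dropIndexBuggy(String name) {
        // no-op: the cached inline size is left behind
    }

    /** Fixed drop: purges the metadata so a later create starts fresh. */
    public void dropIndexFixed(String name) {
        inlineSizes.remove(name);
    }
}
```

Running the ticket's scenario against the buggy drop reproduces the symptom (the recreated index keeps inline size 10); the fixed drop lets inline_size 100 take effect.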
[jira] [Commented] (IGNITE-12061) Silently fail while try to recreate already existing index with differ inline_size.
[ https://issues.apache.org/jira/browse/IGNITE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911218#comment-16911218 ] Stanilovsky Evgeny commented on IGNITE-12061: - *Release notes:* Fixed a bug with the inability to change the inline_size of an existing index after dropping it and recreating it with a different one. > Silently fail while try to recreate already existing index with differ > inline_size. > --- > > Key: IGNITE-12061 > URL: https://issues.apache.org/jira/browse/IGNITE-12061 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.5, 2.7, 2.7.5 >Reporter: Stanilovsky Evgeny >Assignee: Stanilovsky Evgeny >Priority: Major > Fix For: 2.7.6 >
[jira] [Commented] (IGNITE-12061) Silently fail while try to recreate already existing index with differ inline_size.
[ https://issues.apache.org/jira/browse/IGNITE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911200#comment-16911200 ] Ignite TC Bot commented on IGNITE-12061: {panel:title=Branch: [pull/6770/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} [TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4518671&buildTypeId=IgniteTests24Java8_RunAll] > Silently fail while try to recreate already existing index with differ > inline_size. > --- > > Key: IGNITE-12061 > URL: https://issues.apache.org/jira/browse/IGNITE-12061 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.5, 2.7, 2.7.5 >Reporter: Stanilovsky Evgeny >Assignee: Stanilovsky Evgeny >Priority: Major > Fix For: 2.7.6 > > Time Spent: 5h 10m > Remaining Estimate: 0h >
[jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME
[ https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911183#comment-16911183 ] Ignite TC Bot commented on IGNITE-9562: --- {panel:title=Branch: [pull/6781/head] Base: [ignite-2.7.6] : Possible Blockers (6)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1} {color:#d04437}Platform C++ (Linux Clang){color} [[tests 0 Exit Code , Failure on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4516785]] {color:#d04437}Platform .NET (Inspections)*{color} [[tests 0 Failure on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4516787]] {color:#d04437}Platform C++ (Linux)*{color} [[tests 0 Exit Code , Failure on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4516789]] {color:#d04437}PDS 1{color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=4516793]] * IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyCachesAbruptly - Test has low fail rate in base branch 0,0% and is not flaky * IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyGroupCachesAbruptly - Test has low fail rate in base branch 0,0% and is not flaky {color:#d04437}Platform C++ (Win x64 / Release){color} [[tests 0 BuildFailureOnMessage |https://ci.ignite.apache.org/viewLog.html?buildId=4516797]] {panel} [TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4514248&buildTypeId=IgniteTests24Java8_RunAll] > Destroyed cache that resurrected on an old offline node breaks PME > -- > > Key: IGNITE-9562 > URL: https://issues.apache.org/jira/browse/IGNITE-9562 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.5 >Reporter: Pavel Kovalenko >Assignee: Eduard Shangareev >Priority: Critical > Fix For: 2.8, 2.7.6 > > Time Spent: 40m > Remaining Estimate: 0h >
[jira] [Commented] (IGNITE-12080) Add extended logging for rebalance
[ https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911129#comment-16911129 ] Kirill Tkalenko commented on IGNITE-12080: -- [~EdShangGG] Please code review. > Add extended logging for rebalance > -- > > Key: IGNITE-12080 > URL: https://issues.apache.org/jira/browse/IGNITE-12080 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > > We should log all information about a finished rebalance on the demander node. > I'd like to have in the log: > h3. Total information: > # Rebalance duration, rebalance start time/rebalance finish time > # How many partitions were processed in each topic (number of partitions, > number of entries, number of bytes) > # How many nodes were suppliers in rebalance (nodeId, number of supplied > partitions, number of supplied entries, number of bytes, duration of getting > and processing partitions from the supplier) > h3. Information per cache group: > # Rebalance duration, rebalance start time/rebalance finish time > # How many partitions were processed in each topic (number of partitions, > number of entries, number of bytes) > # How many nodes were suppliers in rebalance (nodeId, number of supplied > partitions, list of partition ids with PRIMARY/BACKUP flag, number of supplied > entries, number of bytes, duration of getting and processing partitions from > the supplier) > # Information about each partition distribution (list of nodeIds with > primary/backup flag and marked supplier nodeId) > h3. Information per supplier node: > # How many partitions were requested: > #* Total number > #* Primary/backup distribution (number of primary partitions, number of > backup partitions) > #* Total number of entries > #* Total size of partitions in bytes > # How many partitions were requested per cache group: > #* Number of requested partitions > #* Number of entries in partitions > #* Total size of partitions in bytes > #* List of requested partitions with size in bytes, entry count, and primary or > backup partition flag
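The per-supplier counters listed above can be accumulated with a small stats object and rendered as one summary line when rebalance finishes. This is a hedged sketch under assumed names (`RebalanceStatsSketch`, `onPartitionReceived`, `summary`), not the implementation the ticket will produce:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-supplier rebalance statistics accumulator (assumed names,
// not Ignite's implementation): count partitions, entries, and bytes received
// from each supplier node and render a log-friendly summary.
public class RebalanceStatsSketch {
    /** Per-supplier counters. */
    public static final class SupplierStats {
        long partitions;
        long entries;
        long bytes;
    }

    private final Map<String, SupplierStats> bySupplier = new HashMap<>();

    /** Records one partition received from the given supplier node. */
    public void onPartitionReceived(String supplierNodeId, long entries, long bytes) {
        SupplierStats s = bySupplier.computeIfAbsent(supplierNodeId, k -> new SupplierStats());
        s.partitions++;
        s.entries += entries;
        s.bytes += bytes;
    }

    /** Renders a one-line summary suitable for the rebalance-finished log. */
    public String summary(String supplierNodeId) {
        SupplierStats s = bySupplier.get(supplierNodeId);
        return String.format("supplier=%s, partitions=%d, entries=%d, bytes=%d",
            supplierNodeId, s.partitions, s.entries, s.bytes);
    }
}
```

The same pattern extends to per-cache-group maps and primary/backup breakdowns by keying on (group, supplier) pairs instead of the supplier alone.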
[jira] [Updated] (IGNITE-12080) Add extended logging for rebalance
[ https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Tkalenko updated IGNITE-12080: - Reviewer: Eduard Shangareev (was: Vladislav Pyatkov) > Add extended logging for rebalance > -- > > Key: IGNITE-12080 > URL: https://issues.apache.org/jira/browse/IGNITE-12080 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major >
[jira] [Commented] (IGNITE-12080) Add extended logging for rebalance
[ https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1696#comment-1696 ] Vladislav Pyatkov commented on IGNITE-12080: [~ktkale...@gridgain.com] looks good to me. > Add extended logging for rebalance > -- > > Key: IGNITE-12080 > URL: https://issues.apache.org/jira/browse/IGNITE-12080 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major >
[jira] [Updated] (IGNITE-12086) Ignite pod keeps crashing and failed to recover the node
[ https://issues.apache.org/jira/browse/IGNITE-12086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radhakrupa updated IGNITE-12086: Description: Ignite has been deployed on Kubernetes; there are 3 replicas of the server pod. The pods were up and running fine for 9 days. We have created 180 inventory tables and 204 transactional tables. The data has been inserted using the PyIgnite client via the cache.put() method. This is a very slow operation because each insert is committed one at a time, so it is not able to do bulk-style inserts. PyIgnite was inserting into about 20 of the inventory tables simultaneously (20 different threads/processes). The cluster did not stay stable: after 9 days, one of the pods crashed and failed to recover. Below is the error log: {"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"ERROR","system":"ignite-service","time":"2019-08-16T17:13:34,769Z","logger":"GridCachePartitionExchangeManager","timezone":"UTC","log":"Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=6b5f6c50-a8c9-4b04-a461-49bfd0112eb0, cachesToClose=null, startCaches=[BgwService]] java.lang.NullPointerException| at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesChanges(CacheAffinitySharedManager.java:635)| at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:391)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2475)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2620)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)| at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)| at java.lang.Thread.run(Thread.java:748)"} {"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"WARN","system":"ignite-service","time":"2019-08-16T17:13:36,724Z","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","log":"Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start."} The error report file and ignite-config.xml has been attached for your info. Heap Memory and RAM Configurations are as below on each of the ignite server container: Heap Memory: 32gb RAM: 64GB Default memory region: cpu: 4 Persistence volume wal_storage_size: 10GB persistence_storage_size: 10GB was: Ignite has been deployed on the kubernets , there are 3 replicas of server pod. The pods were up and running fine for 9 days. We have created 180 invent tables and 204 transactional tables. The data has been inserted using the PyIgnite client using the cache.put() method. This is a very slow operation because PyIgnite is very slow. Each insert is committed one at a time, so it is not able to do bulk-style inserts. The PyIgnite was inserting about 20 of the inventory tables simultaneously (20 different threads/processes). The cluster was nowhere stable after 9days, one of the pod crashed and failed to recover. 
Below is the error log: {"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"ERROR","system":"ignite-service","time":"2019-08-16T17:13:34,769Z","logger":"GridCachePartitionExchangeManager","timezone":"UTC","log":"Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=6b5f6c50-a8c9-4b04-a461-49bfd0112eb0, cachesToClose=null, startCaches=[BgwService]] java.lang.NullPointerException| at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesChanges(CacheAffinitySharedManager.java:635)| at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:391)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2475)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2620)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)| at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)| at java.lang.Thread.run(Thread.java:748)"} \{"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"WARN","system":"ignite-service","time":"2019-08-16T17:13:36,724Z","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","log":"Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start."} The error report file and ignite-
[jira] [Created] (IGNITE-12086) Ignite pod keeps crashing and failed to recover the node
radhakrupa created IGNITE-12086: --- Summary: Ignite pod keeps crashing and failed to recover the node Key: IGNITE-12086 URL: https://issues.apache.org/jira/browse/IGNITE-12086 Project: Ignite Issue Type: Bug Affects Versions: 2.7 Reporter: radhakrupa Attachments: hs_err_pid116.log, ignite-config.xml Ignite has been deployed on Kubernetes; there are 3 replicas of the server pod. The pods were up and running fine for 9 days. We have created 180 inventory tables and 204 transactional tables. The data has been inserted using the PyIgnite client via the cache.put() method. This is a very slow operation because each insert is committed one at a time, so it is not able to do bulk-style inserts. PyIgnite was inserting into about 20 of the inventory tables simultaneously (20 different threads/processes). The cluster did not stay stable: after 9 days, one of the pods crashed and failed to recover. Below is the error log: {"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"ERROR","system":"ignite-service","time":"2019-08-16T17:13:34,769Z","logger":"GridCachePartitionExchangeManager","timezone":"UTC","log":"Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=6b5f6c50-a8c9-4b04-a461-49bfd0112eb0, cachesToClose=null, startCaches=[BgwService]] java.lang.NullPointerException| at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesChanges(CacheAffinitySharedManager.java:635)| at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:391)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2475)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2620)| at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)| at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)| at java.lang.Thread.run(Thread.java:748)"} {"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"WARN","system":"ignite-service","time":"2019-08-16T17:13:36,724Z","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","log":"Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start."} The error report file and ignite-config.xml have been attached for your info. Heap memory and RAM configuration on each Ignite server container: Heap Memory: 32 GB RAM: 64 GB Default memory region: cpu: 4 Persistence volume wal_storage_size: 10 GB persistence_storage_size: 10 GB -- This message was sent by Atlassian Jira (v8.3.2#803003)
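The report mentions that PyIgnite commits each cache.put() one at a time, which forces one network round trip per entry. A minimal sketch of batching those inserts with the pyignite client's put_all() instead (the cache name 'inventory', host, and batch size are assumptions for illustration; a running Ignite node on the default thin-client port 10800 is required for the load itself):

```python
def chunked(pairs, size):
    """Split a dict of key/value pairs into dicts of at most `size` entries."""
    batch = {}
    for key, value in pairs.items():
        batch[key] = value
        if len(batch) == size:
            yield batch
            batch = {}
    if batch:
        yield batch


def bulk_load(pairs, batch_size=1000):
    # Imported here so the pure batching helper above works without pyignite.
    from pyignite import Client

    client = Client()
    client.connect('127.0.0.1', 10800)  # default thin-client port (assumed host)
    try:
        cache = client.get_or_create_cache('inventory')  # hypothetical cache name
        for batch in chunked(pairs, batch_size):
            cache.put_all(batch)  # one round trip per batch, not per entry
    finally:
        client.close()
```

Calling `bulk_load({i: f'item-{i}' for i in range(10_000)})` against a live node would then issue 10 round trips instead of 10,000; this does not change Ignite's transactional guarantees, only the network amortization.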
[jira] [Comment Edited] (IGNITE-12080) Add extended logging for rebalance
[ https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911065#comment-16911065 ] Kirill Tkalenko edited comment on IGNITE-12080 at 8/20/19 7:53 AM: --- [~v.pyatkov] Please code review. was (Author: ktkale...@gridgain.com): [~antonovsergey93] Please code review. > Add extended logging for rebalance > -- > > Key: IGNITE-12080 > URL: https://issues.apache.org/jira/browse/IGNITE-12080 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > > We should log all information about a finished rebalance on the demander node. > I'd like to have in the log: > h3. Total information: > # Rebalance duration, rebalance start time/rebalance finish time > # How many partitions were processed in each topic (number of partitions, > number of entries, number of bytes) > # How many nodes were suppliers in rebalance (nodeId, number of supplied > partitions, number of supplied entries, number of bytes, duration of getting > and processing partitions from supplier) > h3. Information per cache group: > # Rebalance duration, rebalance start time/rebalance finish time > # How many partitions were processed in each topic (number of partitions, > number of entries, number of bytes) > # How many nodes were suppliers in rebalance (nodeId, number of supplied > partitions, list of partition ids with PRIMARY/BACKUP flag, number of supplied > entries, number of bytes, duration of getting and processing partitions from > supplier) > # Information about each partition distribution (list of nodeIds with > primary/backup flag and marked supplier nodeId) > h3. 
Information per supplier node: > # How many partitions were requested: > #* Total number > #* Primary/backup distribution (number of primary partitions, number of > backup partitions) > #* Total number of entries > #* Total size of partitions in bytes > # How many partitions were requested per cache group: > #* Number of requested partitions > #* Number of entries in partitions > #* Total size of partitions in bytes > #* List of requested partitions with size in bytes, entry count, and a primary or > backup partition flag
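The per-supplier statistics the ticket asks for are simple running aggregates. A hypothetical Python sketch of their shape (the actual implementation would be in Ignite's Java codebase; the class and field names here are illustrative only, not Ignite API):

```python
from dataclasses import dataclass


@dataclass
class SupplierStats:
    """Running totals for one supplier node during a rebalance (illustrative)."""
    node_id: str
    partitions: int = 0
    entries: int = 0
    bytes_total: int = 0
    duration_ms: int = 0

    def add_partition(self, entries: int, size_bytes: int, duration_ms: int):
        """Record one supplied partition in the running totals."""
        self.partitions += 1
        self.entries += entries
        self.bytes_total += size_bytes
        self.duration_ms += duration_ms

    def summary(self) -> str:
        """One log line per supplier, as the ticket proposes."""
        return (f"supplier={self.node_id} partitions={self.partitions} "
                f"entries={self.entries} bytes={self.bytes_total} "
                f"duration={self.duration_ms}ms")
```

One `SupplierStats` per supplier nodeId, updated as each partition arrives and printed when the rebalance future completes, would cover the "Information per supplier node" section; the per-cache-group breakdown would nest the same structure under a group key.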
[jira] [Updated] (IGNITE-12080) Add extended logging for rebalance
[ https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Tkalenko updated IGNITE-12080: - Reviewer: Vladislav Pyatkov (was: Sergey Antonov)
[jira] [Commented] (IGNITE-12080) Add extended logging for rebalance
[ https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911062#comment-16911062 ] Ignite TC Bot commented on IGNITE-12080: {panel:title=Branch: [pull/6785/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} [TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4506939&buildTypeId=IgniteTests24Java8_RunAll]