[jira] [Created] (IGNITE-14447) Invalid meta page can be used after index re-creation
Ivan Bessonov created IGNITE-14447:
--------------------------------------

Summary: Invalid meta page can be used after index re-creation
Key: IGNITE-14447
URL: https://issues.apache.org/jira/browse/IGNITE-14447
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
Fix For: 2.11

Consider the following scenario:
* A user creates index "A".
* Ignite allocates page 0x1234 as the index meta page and writes it to the index roots tree.
* The index is populated and the query entity is written to disk.
* A checkpoint is triggered and the index pages (including the root) are written to disk.
* The user drops the index.
* The tree is deallocated, the meta page is removed from the roots tree, and the query entity without the index is written to disk. No logical record is written for the roots tree.
* The node crashes without a checkpoint being marked.
* The node restarts. Since the query entity does not contain index "A", the index tree is not created.
* The user deletes some entries, then attempts to create index "A" again.
* Since the node did not trigger a checkpoint before the crash and no logical record was written, the roots tree contains an obsolete tree with links pointing to non-existent data (namely, index "A" still refers to page 0x1234).
* Depending on the allocation pattern and on whether assertions are enabled, the node will either fail with an assertion or crash the JVM.

Fundamentally, the issue is caused by an inconsistency between the index roots tree and the query entity. Ideally, we should move cache configuration to the page memory subsystem, but this may be a big change. We should check whether writing a logical record on index drop, which would run the index cleanup on recovery, mitigates the issue (in other words, the index cleanup persistent task should be triggered even if no checkpoint was marked after the query entity is persisted).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
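The scenario in IGNITE-14447 can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Ignite code: the class `StaleIndexRootSketch` and all of its methods are hypothetical, the roots tree is modeled as a map that reaches "disk" only on checkpoint, and the query entity is modeled as a set that is persisted immediately.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, not Ignite code) of the roots tree vs. query entity mismatch. */
public class StaleIndexRootSketch {
    /** Roots tree as last written by a checkpoint: index name -> meta page id. */
    private final Map<String, Long> rootsOnDisk = new HashMap<>();

    /** Query entity: assumed to be persisted immediately on every change. */
    private final Set<String> queryEntityOnDisk = new HashSet<>();

    /** In-memory roots tree; changes are lost on crash unless checkpointed. */
    private Map<String, Long> rootsInMemory = new HashMap<>();

    public void createIndex(String name, long metaPageId) {
        rootsInMemory.put(name, metaPageId);
        queryEntityOnDisk.add(name);
    }

    public void dropIndex(String name) {
        // No logical record is written, so this removal is only in memory.
        rootsInMemory.remove(name);
        queryEntityOnDisk.remove(name);
    }

    public void checkpoint() {
        rootsOnDisk.clear();
        rootsOnDisk.putAll(rootsInMemory);
    }

    public void crashAndRestart() {
        // Memory is lost; the roots tree is restored from the last checkpoint.
        rootsInMemory = new HashMap<>(rootsOnDisk);
    }

    /** True if the roots tree still references an index the query entity no longer knows. */
    public boolean hasStaleRoot(String name) {
        return rootsInMemory.containsKey(name) && !queryEntityOnDisk.contains(name);
    }

    public Long rootOf(String name) {
        return rootsInMemory.get(name);
    }
}
```

In this model, calling `createIndex("A", 0x1234L)`, `checkpoint()`, `dropIndex("A")` and `crashAndRestart()` leaves `hasStaleRoot("A")` true, which corresponds to the obsolete entry described in the ticket.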
[jira] [Created] (IGNITE-14442) IgniteRunner fails with NPE after REST module was broken by incompatible changes.
Ivan Bessonov created IGNITE-14442:
--------------------------------------

Summary: IgniteRunner fails with NPE after REST module was broken by incompatible changes.
Key: IGNITE-14442
URL: https://issues.apache.org/jira/browse/IGNITE-14442
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
[jira] [Created] (IGNITE-14372) Fix REST json configuration update requests
Ivan Bessonov created IGNITE-14372:
--------------------------------------

Summary: Fix REST json configuration update requests
Key: IGNITE-14372
URL: https://issues.apache.org/jira/browse/IGNITE-14372
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
[jira] [Created] (IGNITE-14371) Fix REST json representation for configuration
Ivan Bessonov created IGNITE-14371:
--------------------------------------

Summary: Fix REST json representation for configuration
Key: IGNITE-14371
URL: https://issues.apache.org/jira/browse/IGNITE-14371
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

The REST code is completely broken; it's time to fix it, at least partially.
[jira] [Created] (IGNITE-14302) Generated configuration classes break PMD suite in REST module
Ivan Bessonov created IGNITE-14302:
--------------------------------------

Summary: Generated configuration classes break PMD suite in REST module
Key: IGNITE-14302
URL: https://issues.apache.org/jira/browse/IGNITE-14302
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

https://ci.ignite.apache.org/buildConfiguration/ignite3_Tests_SanityChecks_Pmd?branch=pull%2F65=overview=builds#all-projects
[jira] [Created] (IGNITE-14279) Introduce "sendWithResponse" into network API
Ivan Bessonov created IGNITE-14279:
--------------------------------------

Summary: Introduce "sendWithResponse" into network API
Key: IGNITE-14279
URL: https://issues.apache.org/jira/browse/IGNITE-14279
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
Fix For: 3.0.0-alpha2

{noformat}
/**
 * Asynchronously sends a message with the same guarantees as {@link #send(NetworkMember, Object)} and
 * returns a response (RPC style).
 *
 * @param member Network member which should receive the message.
 * @param msg A message.
 * @param timeout Timeout for waiting for the response, in milliseconds.
 * @param <T> Expected response type.
 * @return A future holding the response or error if the expected response was not received.
 */
<T> CompletableFuture<T> sendWithResponse(NetworkMember member, Object msg, long timeout);
{noformat}
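To make the intended RPC-style contract concrete, here is a minimal in-process sketch. It is an assumption-laden illustration, not the Ignite 3 network API: the class `SendWithResponseSketch` is hypothetical, the remote member is replaced by a local handler function, and the timeout is implemented with `CompletableFuture#orTimeout` (Java 9+).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical in-process stand-in for a request/response network call. */
public class SendWithResponseSketch {
    /** Stand-in for the remote member: maps a request message to a response. */
    private final Function<Object, Object> remoteHandler;

    public SendWithResponseSketch(Function<Object, Object> remoteHandler) {
        this.remoteHandler = remoteHandler;
    }

    /**
     * Sends a message and returns a future with the response.
     * The future fails with a TimeoutException if no response arrives in time.
     */
    @SuppressWarnings("unchecked")
    public <T> CompletableFuture<T> sendWithResponse(Object msg, long timeoutMs) {
        return CompletableFuture
            .<T>supplyAsync(() -> (T) remoteHandler.apply(msg)) // "remote" processing
            .orTimeout(timeoutMs, TimeUnit.MILLISECONDS);       // response deadline
    }
}
```

A caller would chain on the returned future (for example with `thenAccept`) rather than block; a late response surfaces as a `TimeoutException` wrapped in a `CompletionException` when joining.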
[jira] [Created] (IGNITE-14230) Port DynamicConfiguration to new underlying configuration framework.
Ivan Bessonov created IGNITE-14230:
--------------------------------------

Summary: Port DynamicConfiguration to new underlying configuration framework.
Key: IGNITE-14230
URL: https://issues.apache.org/jira/browse/IGNITE-14230
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
[jira] [Created] (IGNITE-14194) Multiple storages support for configuration
Ivan Bessonov created IGNITE-14194:
--------------------------------------

Summary: Multiple storages support for configuration
Key: IGNITE-14194
URL: https://issues.apache.org/jira/browse/IGNITE-14194
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

Currently we have a single hardcoded storage; we should fix that.
[jira] [Created] (IGNITE-14193) Initialize configuration tree with default values on first start
Ivan Bessonov created IGNITE-14193:
--------------------------------------

Summary: Initialize configuration tree with default values on first start
Key: IGNITE-14193
URL: https://issues.apache.org/jira/browse/IGNITE-14193
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov

Conceptually, we have the following picture: every possible configuration has a non-null value. The problem is choosing the exact moment at which values not initialized by the user are saved. This routine must be part of the node lifecycle, of course, but its implementation is nontrivial and is used exclusively in the lifecycle, which means that it can't be implemented as part of some other, more abstract task.
[jira] [Created] (IGNITE-14145) ConfigurationUtil should be moved to internal package, visitor should be refactored.
Ivan Bessonov created IGNITE-14145:
--------------------------------------

Summary: ConfigurationUtil should be moved to internal package, visitor should be refactored.
Key: IGNITE-14145
URL: https://issues.apache.org/jira/browse/IGNITE-14145
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

See a comment in IGNITE-14121. Also, we should add return values to the configuration visitor and split Config(root=true) from Config(root=false) for simplicity.
[jira] [Created] (IGNITE-14121) Implement ability to generate configuration trees from arbitrary sources
Ivan Bessonov created IGNITE-14121:
--------------------------------------

Summary: Implement ability to generate configuration trees from arbitrary sources
Key: IGNITE-14121
URL: https://issues.apache.org/jira/browse/IGNITE-14121
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

A prototype is already present here: [https://github.com/apache/ignite-3/pull/34/files]

Now we need to adapt it to the current configuration code and implement automatic generation of the construction methods' implementations.
[jira] [Created] (IGNITE-14102) Create escaping and searching util methods for configuration framework
Ivan Bessonov created IGNITE-14102:
--------------------------------------

Summary: Create escaping and searching util methods for configuration framework
Key: IGNITE-14102
URL: https://issues.apache.org/jira/browse/IGNITE-14102
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

Right off the bat, I can think of two useful things to do:
* escaping / unescaping;
* a replacement for BaseSelectors#find that'll work on new trees.
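The escaping half of IGNITE-14102 could look roughly like the sketch below. The ticket does not specify the format, so everything here is an assumption: keys are taken to be joined with dots into a path, a backslash is taken as the escape character, and the class `KeyEscaping` with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical escaping helpers for dot-separated configuration paths. */
public class KeyEscaping {
    /** Escapes backslashes and dots so a key can be embedded in a dotted path. */
    public static String escape(String key) {
        return key.replace("\\", "\\\\").replace(".", "\\.");
    }

    /** Reverses {@link #escape(String)}. */
    public static String unescape(String key) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            // A backslash makes the next character literal.
            if (c == '\\' && i + 1 < key.length())
                c = key.charAt(++i);
            sb.append(c);
        }
        return sb.toString();
    }

    /** Joins raw keys into an escaped dotted path. */
    public static String join(List<String> keys) {
        List<String> escaped = new ArrayList<>();
        for (String key : keys)
            escaped.add(escape(key));
        return String.join(".", escaped);
    }

    /** Splits a dotted path on unescaped dots and returns the raw keys. */
    public static List<String> split(String path) {
        List<String> keys = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '\\' && i + 1 < path.length())
                cur.append(path.charAt(++i));
            else if (c == '.') {
                keys.add(cur.toString());
                cur.setLength(0);
            }
            else
                cur.append(c);
        }
        keys.add(cur.toString());
        return keys;
    }
}
```

With these helpers, `split(join(keys))` returns the original keys even when a key itself contains a dot or a backslash.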
[jira] [Created] (IGNITE-14087) Implement code generation for interfaces introduced in IGNITE-14062
Ivan Bessonov created IGNITE-14087:
--------------------------------------

Summary: Implement code generation for interfaces introduced in IGNITE-14062
Key: IGNITE-14087
URL: https://issues.apache.org/jira/browse/IGNITE-14087
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
[jira] [Created] (IGNITE-14062) Create basic classes and interfaces for traversable configuration tree.
Ivan Bessonov created IGNITE-14062:
--------------------------------------

Summary: Create basic classes and interfaces for traversable configuration tree.
Key: IGNITE-14062
URL: https://issues.apache.org/jira/browse/IGNITE-14062
Project: Ignite
Issue Type: Sub-task
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

Prototype code is presented in this PR: https://github.com/apache/ignite-3/pull/34
[jira] [Created] (IGNITE-13986) Proof of concept - SWIM group membership protocol for discovery
Ivan Bessonov created IGNITE-13986:
--------------------------------------

Summary: Proof of concept - SWIM group membership protocol for discovery
Key: IGNITE-13986
URL: https://issues.apache.org/jira/browse/IGNITE-13986
Project: Ignite
Issue Type: New Feature
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

IEP-61 mentions that the discovery protocol will be updated. We need to experiment with the mentioned options to determine whether they match our needs:

[http://www.cs.cornell.edu/Info/Projects/Spinglass/public_pdfs/SWIM.pdf]

[https://github.com/scalecube/scalecube-cluster]
[jira] [Created] (IGNITE-13833) PersistenceBasicCompatibilityTest lacks recent releases
Ivan Bessonov created IGNITE-13833:
--------------------------------------

Summary: PersistenceBasicCompatibilityTest lacks recent releases
Key: IGNITE-13833
URL: https://issues.apache.org/jira/browse/IGNITE-13833
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
[jira] [Created] (IGNITE-13832) disco-notifier-worker handles IgniteInterruptedCheckedException incorrectly
Ivan Bessonov created IGNITE-13832:
--------------------------------------

Summary: disco-notifier-worker handles IgniteInterruptedCheckedException incorrectly
Key: IGNITE-13832
URL: https://issues.apache.org/jira/browse/IGNITE-13832
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

DiscoveryMessageNotifierWorker#body handles InterruptedException correctly, but if it catches IgniteInterruptedCheckedException it follows different logic, which is incorrect. I believe all interruption exceptions should be handled in the same way.

{code:java}
[org.gridgain:gridgain-compatibility] [2020-04-13 08:19:15,109][ERROR][disco-notifier-worker-#69754%top2_node_rcv%][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Failed to wait for handling disconnect event.]]
[08:19:15]W: [org.gridgain:gridgain-compatibility] class org.apache.ignite.IgniteException: Failed to wait for handling disconnect event.
[08:19:15]W: [org.gridgain:gridgain-compatibility] at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.awaitDisconnectEvent(GridDiscoveryManager.java:3128)
[08:19:15]W: [org.gridgain:gridgain-compatibility] at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.access$6400(GridDiscoveryManager.java:2793)
[08:19:15]W: [org.gridgain:gridgain-compatibility] at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:868)
[08:19:15]W: [org.gridgain:gridgain-compatibility] at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:519)
[08:19:15]W: [org.gridgain:gridgain-compatibility] at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2686)
[08:19:15]W: [org.gridgain:gridgain-compatibility] at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2724)
[08:19:15]W: [org.gridgain:gridgain-compatibility] at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
[08:19:15]W: [org.gridgain:gridgain-compatibility] at java.lang.Thread.run(Thread.java:748)
[08:19:15]W: [org.gridgain:gridgain-compatibility] Caused by: class org.apache.ignite.internal.IgniteInterruptedCheckedException: Got interrupted while waiting for future to complete.
[08:19:15]W: [org.gridgain:gridgain-compatibility] at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:185)
[08:19:15]W: [org.gridgain:gridgain-compatibility] at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
[08:19:15]W: [org.gridgain:gridgain-compatibility] at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.awaitDisconnectEvent(GridDiscoveryManager.java:3125)
[08:19:15]W: [org.gridgain:gridgain-compatibility] ... 7 more
{code}
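One way to treat both kinds of interruption identically, as IGNITE-13832 suggests, is a single multi-catch branch. The sketch below is only an illustration: `InterruptedCheckedException` is a hypothetical stand-in for Ignite's `IgniteInterruptedCheckedException`, and `runStep` is an invented helper, not the actual worker code or the committed fix.

```java
/** Sketch of uniform interruption handling; all names here are hypothetical. */
public class InterruptHandlingSketch {
    /** Stand-in for a checked "interrupted" wrapper exception. */
    public static class InterruptedCheckedException extends Exception {
        public InterruptedCheckedException(String msg) {
            super(msg);
        }
    }

    /** A unit of work that may be interrupted in either of the two ways. */
    public interface Step {
        void run() throws InterruptedException, InterruptedCheckedException;
    }

    /**
     * Runs one step of a worker loop.
     *
     * @return true if the step completed, false if it was interrupted.
     */
    public static boolean runStep(Step step) {
        try {
            step.run();
            return true;
        }
        catch (InterruptedException | InterruptedCheckedException e) {
            // Same path for both: restore the interrupt flag and report a normal stop,
            // instead of escalating one of them as a critical failure.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The design choice is that the caller sees one outcome (interrupted) and the thread's interrupt flag is preserved in both cases, so the outer loop can shut down cleanly.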
[jira] [Created] (IGNITE-13823) WAL iterators require WRITE permissions
Ivan Bessonov created IGNITE-13823:
--------------------------------------

Summary: WAL iterators require WRITE permissions
Key: IGNITE-13823
URL: https://issues.apache.org/jira/browse/IGNITE-13823
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

org.apache.ignite.internal.processors.cache.persistence.wal.FileDescriptor#toIO uses default permissions, i.e. "CREATE, READ, WRITE"
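For comparison, the standard NIO API can open a file for reading only, which works even when the process has no write permission on it. The sketch below uses plain `java.nio` rather than Ignite's file IO abstractions; the class `ReadOnlySegmentReader` is hypothetical and is not the fix applied in the ticket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical reader that opens a segment file with READ access only. */
public class ReadOnlySegmentReader {
    /** Reads the whole file without requesting CREATE or WRITE access. */
    public static byte[] readFully(Path segment) {
        // Only READ is requested, so a read-only file or directory is sufficient.
        try (FileChannel ch = FileChannel.open(segment, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());

            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0)
                    break; // End of file reached earlier than expected.
            }

            return buf.array();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```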
[jira] [Created] (IGNITE-13814) Long restorePartitionStates triggers FailureHandler on node startup
Ivan Bessonov created IGNITE-13814:
--------------------------------------

Summary: Long restorePartitionStates triggers FailureHandler on node startup
Key: IGNITE-13814
URL: https://issues.apache.org/jira/browse/IGNITE-13814
Project: Ignite
Issue Type: Bug
Environment: {noformat}
Thread [name="sys-stripe-4-#5%EPE_CLUSTER_PERF%", id=24, state=WAITING, blockCnt=4, waitCnt=70836]
    at java.base@11.0.8/jdk.internal.misc.Unsafe.park(Native Method)
    at java.base@11.0.8/java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
    at app//o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:186)
    at app//o.a.i.i.util.future.GridFutureAdapter.getUninterruptibly(GridFutureAdapter.java:154)
    at app//o.a.i.i.processors.cache.persistence.file.AsyncFileIO.read(AsyncFileIO.java:128)
    at app//o.a.i.i.processors.cache.persistence.file.AbstractFileIO$2.run(AbstractFileIO.java:89)
    at app//o.a.i.i.processors.cache.persistence.file.AbstractFileIO.fully(AbstractFileIO.java:52)
    at app//o.a.i.i.processors.cache.persistence.file.AbstractFileIO.readFully(AbstractFileIO.java:87)
    at app//o.a.i.i.processors.cache.persistence.file.FilePageStore.readWithFailover(FilePageStore.java:794)
    at app//o.a.i.i.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:418)
    at app//o.a.i.i.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:519)
    at app//o.a.i.i.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:503)
    at app//o.a.i.i.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:874)
    at app//o.a.i.i.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:700)
    at app//o.a.i.i.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:689)
    at app//o.a.i.i.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:157)
    at app//o.a.i.i.processors.cache.persistence.freelist.PagesList.init(PagesList.java:274)
    at app//o.a.i.i.processors.cache.persistence.freelist.AbstractFreeList.<init>(AbstractFreeList.java:390)
    at app//o.a.i.i.processors.cache.persistence.freelist.CacheFreeList.<init>(CacheFreeList.java:57)
    at app//o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore$1.<init>(GridCacheOffheapManager.java:1806)
    at app//o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.init0(GridCacheOffheapManager.java:1805)
    at app//o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.init(GridCacheOffheapManager.java:2130)
    at app//o.a.i.i.processors.cache.persistence.GridCacheOffheapManager.restorePartitionStates(GridCacheOffheapManager.java:544)
    at app//o.a.i.i.processors.cache.GridCacheProcessor$CacheRecoveryLifecycle.lambda$restorePartitionStates$0(GridCacheProcessor.java:5253)
    at app//o.a.i.i.processors.cache.GridCacheProcessor$CacheRecoveryLifecycle$$Lambda$633/0x000800717040.run(Unknown Source)
    at app//o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
    at app//o.a.i.i.util.worker.GridWorker.run(GridWorker.java:119)
    at java.base@11.0.8/java.lang.Thread.run(Thread.java:834)
{noformat}
In this case warm-up is on, but the client also reports that this happens without warm-up.

I don't think that restoring partition states should trigger the failure handler (FH). It may take a lot of time with PDS. Also, why do we run it in the striped pool? If two large caches get the same stripe, the restore time doubles.
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

The following would be printed to the log:
{noformat}
[2020-10-30T17:32:26,190][WARN ][grid-timeout-worker-#22%EPE_CLUSTER_PERF%][] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=sys-stripe-4, igniteInstanceName=EPE_CLUSTER_PERF, finished=false, heartbeatTs=1604104192954]]]
org.apache.ignite.IgniteException: GridWorker [name=sys-stripe-4, igniteInstanceName=EPE_CLUSTER_PERF, finished=false, heartbeatTs=1604104192954]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1859) [ignite-core-8.7.28.jar:8.7.28]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1854) [ignite-core-8.7.28.jar:8.7.28]
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
[jira] [Created] (IGNITE-13813) SKIP_GARBAGE WAL compression doesn't work for binary recovery
Ivan Bessonov created IGNITE-13813:
--------------------------------------

Summary: SKIP_GARBAGE WAL compression doesn't work for binary recovery
Key: IGNITE-13813
URL: https://issues.apache.org/jira/browse/IGNITE-13813
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

{noformat}
class org.apache.ignite.IgniteCheckedException: Failed to apply page snapshot
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$performBinaryMemoryRestore$14(GridCacheDatabaseSharedManager.java:2419)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApplyPage$18(GridCacheDatabaseSharedManager.java:2603)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApply$19(GridCacheDatabaseSharedManager.java:2641)
    at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AssertionError: 4096
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyPageSnapshot(GridCacheDatabaseSharedManager.java:2671)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$performBinaryMemoryRestore$14(GridCacheDatabaseSharedManager.java:2412)
    ... 5 more
{noformat}
[jira] [Created] (IGNITE-13812) CheckpointEntry is read from WAL right after its creation.
Ivan Bessonov created IGNITE-13812:
--------------------------------------

Summary: CheckpointEntry is read from WAL right after its creation.
Key: IGNITE-13812
URL: https://issues.apache.org/jira/browse/IGNITE-13812
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

{noformat}
[2020-07-31 16:33:15,545][INFO ][pitr-ctx-exec-#304][WalStateManager] WAL logging disabled
[2020-07-31 16:33:15,545][INFO ][db-checkpoint-thread-#152][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=e1a57b48-1610-4280-a3e2-4d808a5f0343, pages=64, markPos=FileWALPointer [idx=5, fileOff=45749881, len=186791], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=49ms, pagesWrite=0ms, fsync=5ms, total=79ms]
[2020-07-31 16:33:15,546][INFO ][pitr-ctx-exec-#304][GridRecoveryProcessor] Start apply segment idx=1
[2020-07-31 16:33:16,012][INFO ][pitr-ctx-exec-#304][GridRecoveryProcessor] Segment idx=1 applied
[2020-07-31 16:33:16,373][INFO ][pitr-ctx-exec-#304][GridRecoveryProcessor] Segment idx=2 applied
[2020-07-31 16:33:16,553][ERROR][db-checkpoint-thread-#152][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.ClassCastException: class o.a.i.i.pagemem.wal.record.MemoryRecoveryRecord cannot be cast to class o.a.i.i.pagemem.wal.record.CheckpointRecord (o.a.i.i.pagemem.wal.record.MemoryRecoveryRecord and o.a.i.i.pagemem.wal.record.CheckpointRecord are in unnamed module of loader 'app')]]
java.lang.ClassCastException: class org.apache.ignite.internal.pagemem.wal.record.MemoryRecoveryRecord cannot be cast to class org.apache.ignite.internal.pagemem.wal.record.CheckpointRecord (org.apache.ignite.internal.pagemem.wal.record.MemoryRecoveryRecord and org.apache.ignite.internal.pagemem.wal.record.CheckpointRecord are in unnamed module of loader 'app')
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.initIfNeeded(CheckpointEntry.java:353)
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.access$300(CheckpointEntry.java:245)
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.initIfNeeded(CheckpointEntry.java:124)
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.groupState(CheckpointEntry.java:106)
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.addCpToEarliestCpMap(CheckpointHistory.java:246)
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.addCheckpoint(CheckpointHistory.java:179)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:4221)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3732)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3621)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
    at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
[jira] [Created] (IGNITE-13811) ServerImpl#pingNode(InetSocketAddress, UUID, UUID) fails to ping nodes with unresolved addresses
Ivan Bessonov created IGNITE-13811:
--------------------------------------

Summary: ServerImpl#pingNode(InetSocketAddress, UUID, UUID) fails to ping nodes with unresolved addresses
Key: IGNITE-13811
URL: https://issues.apache.org/jira/browse/IGNITE-13811
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

The wrong key is removed from the map. The future is registered under the original address:
{code:java}
pingMap.putIfAbsent(addr, fut)
{code}
The address variable is then replaced with a resolved address:
{code:java}
if (addr.isUnresolved())
    addr = new InetSocketAddress(InetAddress.getByName(addr.getHostName()), addr.getPort());
{code}
So the removal uses a different key than the one that was inserted:
{code:java}
boolean b = pingMap.remove(addr, fut);

assert b;
{code}
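The effect can be reproduced outside Ignite, because an unresolved `InetSocketAddress` is never equal to a resolved one. The sketch below is a standalone illustration: the class `PingMapKeySketch` and its methods are hypothetical, the future is replaced by a plain object, and `pingFixed` shows one possible remedy (remembering the original key), which is not necessarily the fix chosen for the ticket.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Standalone reproduction of removing a map entry by a reassigned key. */
public class PingMapKeySketch {
    private final ConcurrentMap<InetSocketAddress, Object> pingMap = new ConcurrentHashMap<>();

    /** Mirrors the reported pattern: addr is reassigned between put and remove. */
    public boolean pingBuggy(InetSocketAddress addr) {
        Object fut = new Object();
        pingMap.putIfAbsent(addr, fut);

        if (addr.isUnresolved())
            addr = resolve(addr);

        // For an unresolved input this looks up a different key and returns false.
        return pingMap.remove(addr, fut);
    }

    /** One possible remedy: keep the key that was actually inserted. */
    public boolean pingFixed(InetSocketAddress addr) {
        Object fut = new Object();
        InetSocketAddress key = addr;
        pingMap.putIfAbsent(key, fut);

        if (addr.isUnresolved())
            addr = resolve(addr); // The resolved address would be used for the connection.

        return pingMap.remove(key, fut);
    }

    /** Number of entries left behind in the map. */
    public int pendingPings() {
        return pingMap.size();
    }

    private static InetSocketAddress resolve(InetSocketAddress addr) {
        try {
            return new InetSocketAddress(InetAddress.getByName(addr.getHostName()), addr.getPort());
        }
        catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With an address created through `InetSocketAddress.createUnresolved(...)`, `pingBuggy` returns false and leaves a stale entry in the map, while `pingFixed` returns true and leaves the map empty.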
[jira] [Created] (IGNITE-13808) Control.sh validate_indexes throws CorruptedTreeException and fails server node during check
Ivan Bessonov created IGNITE-13808:
--------------------------------------

Summary: Control.sh validate_indexes throws CorruptedTreeException and fails server node during check
Key: IGNITE-13808
URL: https://issues.apache.org/jira/browse/IGNITE-13808
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

A CorruptedTreeException thrown during the validate_indexes command invokes the failure handler and stops the server node:
{code:java}
[21:44:26,257][WARNING][pool-5-thread-2][ValidateIndexesClosure] Current progress of ValidateIndexesClosure: checked integrity of 1 index partitions of 14 cache groups
[21:44:26,852][SEVERE][pool-5-thread-16][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on bounds: [lower=null, upper=null
class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on bounds: [lower=null, upper=null]]
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5126)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1029)
    at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.find(H2TreeIndex.java:243)
    at org.apache.ignite.internal.visor.verify.ValidateIndexesClosure.processIndex(ValidateIndexesClosure.java:651)
    at org.apache.ignite.internal.visor.verify.ValidateIndexesClosure.access$200(ValidateIndexesClosure.java:93)
    at org.apache.ignite.internal.visor.verify.ValidateIndexesClosure$4.call(ValidateIndexesClosure.java:631)
    at org.apache.ignite.internal.visor.verify.ValidateIndexesClosure$4.call(ValidateIndexesClosure.java:629)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 11
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:987)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1014)
    ... 9 more
Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 11
    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:203)
    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:104)
    at org.apache.ignite.internal.processors.query.h2.database.H2RowFactory.getRow(H2RowFactory.java:61)
    at org.apache.ignite.internal.processors.query.h2.database.H2Tree.createRowFromLink(H2Tree.java:246)
    at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:126)
    at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:36)
    at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:264)
    at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:56)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer(BPlusTree.java:4808)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.init(BPlusTree.java:4710)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.access$5000(BPlusTree.java:4646)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:976)
    ... 10 more
Caused by: java.lang.IllegalStateException: Item not found: 11
    at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:341)
    at
[jira] [Created] (IGNITE-13802) GridCacheOffheapManager#addPartitions ignores candidate pages count for index partition
Ivan Bessonov created IGNITE-13802:
--------------------------------------

Summary: GridCacheOffheapManager#addPartitions ignores candidate pages count for index partition
Key: IGNITE-13802
URL: https://issues.apache.org/jira/browse/IGNITE-13802
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

It also marks the page as dirty despite doing nothing with it.
[jira] [Created] (IGNITE-13795) java.nio.file.InvalidPathException: Illegal char <:> at lock page on windows
Ivan Bessonov created IGNITE-13795:
--------------------------------------

Summary: java.nio.file.InvalidPathException: Illegal char <:> at lock page on windows
Key: IGNITE-13795
URL: https://issues.apache.org/jira/browse/IGNITE-13795
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

{code:java}
Exception in thread "Thread-1" java.nio.file.InvalidPathException: Illegal char <:> at index 109: C:\BuildAgent\work\d501ae8146bd8253\i2test\var\suite-thin_clients\art-gg-ult\work\diagnostic\page_lock_dump_0:0:0:0:0:0:0:1,127.0.0.1,172.23.240.1,172.25.2.217:47500_2020_06_22_17_24_06_377
    at sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
    at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
    at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
    at sun.nio.fs.WindowsPath.parse(WindowsPath.java:94)
    at sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:255)
    at java.io.File.toPath(File.java:2234)
    at org.apache.ignite.internal.processors.cache.persistence.diagnostic.pagelocktracker.dumpprocessors.ToFileDumpProcessor.saveToFile(ToFileDumpProcessor.java:69)
    at org.apache.ignite.internal.processors.cache.persistence.diagnostic.pagelocktracker.dumpprocessors.ToFileDumpProcessor.toFileDump(ToFileDumpProcessor.java:53)
    at org.apache.ignite.internal.processors.cache.persistence.diagnostic.pagelocktracker.PageLockTrackerManager.onHangThreads(PageLockTrackerManager.java:123)
    at org.apache.ignite.internal.processors.cache.persistence.diagnostic.pagelocktracker.SharedPageLockTracker$TimeOutWorker.run(SharedPageLockTracker.java:385)
{code}
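The trace shows a dump file name built from node addresses, whose colons are not allowed in Windows file names. One generic way to avoid this is to sanitize the name before turning it into a path. The sketch below is an assumption, not the fix applied in Ignite: the class `DumpFileNames` is hypothetical and the replacement character `_` is an arbitrary choice.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that makes a generated file name safe on Windows. */
public class DumpFileNames {
    /** Characters that Windows does not allow in a file name. */
    private static final Pattern ILLEGAL = Pattern.compile("[<>:\"/\\\\|?*]");

    /** Replaces every illegal character with an underscore. */
    public static String sanitize(String fileName) {
        return ILLEGAL.matcher(fileName).replaceAll("_");
    }
}
```

The helper must be applied to the file name only, not to the full path, since it would also replace path separators and the drive colon.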
[jira] [Created] (IGNITE-13786) PDS defragmentation can inflate index size
Ivan Bessonov created IGNITE-13786:
--------------------------------------

Summary: PDS defragmentation can inflate index size
Key: IGNITE-13786
URL: https://issues.apache.org/jira/browse/IGNITE-13786
Project: Ignite
Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov

For huge caches, defragmentation can increase the size of indexes. The reason is that we only append new data to index trees and never insert into the middle, which leads to under-utilization of B+Tree page space.
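The under-utilization can be demonstrated with a small leaf-level simulation. This is a sketch under stated assumptions, not Ignite's BPlusTree: the class `LeafFillSimulation` is hypothetical, only leaf pages are modeled, keys are assumed unique, and an overflowing page is assumed to split in half.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy simulation of leaf page fill under a split-in-half policy (hypothetical). */
public class LeafFillSimulation {
    /**
     * Inserts the keys in the given order and returns the average leaf fill factor.
     *
     * @param keys Unique keys in insertion order.
     * @param capacity Maximum number of keys per leaf page.
     */
    public static double fillFactor(List<Integer> keys, int capacity) {
        List<List<Integer>> pages = new ArrayList<>();
        pages.add(new ArrayList<>());

        for (int key : keys) {
            // Target page: the last page whose first key is <= key, or page 0.
            int p = 0;
            while (p + 1 < pages.size() && pages.get(p + 1).get(0) <= key)
                p++;

            List<Integer> page = pages.get(p);
            int pos = Collections.binarySearch(page, key);
            page.add(pos < 0 ? -pos - 1 : pos, key);

            // On overflow, move the upper half into a new right sibling.
            if (page.size() > capacity) {
                int mid = page.size() / 2;
                List<Integer> right = new ArrayList<>(page.subList(mid, page.size()));
                page.subList(mid, page.size()).clear();
                pages.add(p + 1, right);
            }
        }

        return keys.size() / (double) (pages.size() * capacity);
    }
}
```

With ascending keys every split leaves a half-full page behind that is never touched again, so the fill factor stays close to 50%, whereas inserting the same keys in random order keeps refilling older pages and ends noticeably higher.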
[jira] [Created] (IGNITE-13743) Defragmentation JMX API for schedule/cancel/status
Ivan Bessonov created IGNITE-13743: -- Summary: Defragmentation JMX API for schedule/cancel/status Key: IGNITE-13743 URL: https://issues.apache.org/jira/browse/IGNITE-13743 Project: Ignite Issue Type: Sub-task Reporter: Ivan Bessonov Assignee: Semyon Danilov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13742) Fix failed WalModeChangeAdvancedSelfTest.testMaintenanceIsSkippedIfWasFixedManuallyOnDowntime
Ivan Bessonov created IGNITE-13742: -- Summary: Fix failed WalModeChangeAdvancedSelfTest.testMaintenanceIsSkippedIfWasFixedManuallyOnDowntime Key: IGNITE-13742 URL: https://issues.apache.org/jira/browse/IGNITE-13742 Project: Ignite Issue Type: Sub-task Reporter: Ivan Bessonov https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=5803772702668480758=testDetails -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13709) Control.sh API - status
Ivan Bessonov created IGNITE-13709: -- Summary: Control.sh API - status Key: IGNITE-13709 URL: https://issues.apache.org/jira/browse/IGNITE-13709 Project: Ignite Issue Type: Sub-task Reporter: Ivan Bessonov Assignee: Ivan Bessonov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13697) Control.sh API - schedule & cancel
Ivan Bessonov created IGNITE-13697: -- Summary: Control.sh API - schedule & cancel Key: IGNITE-13697 URL: https://issues.apache.org/jira/browse/IGNITE-13697 Project: Ignite Issue Type: Sub-task Reporter: Ivan Bessonov Assignee: Ivan Bessonov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13597) Execution timeout in PDS 2
Ivan Bessonov created IGNITE-13597: -- Summary: Execution timeout in PDS 2 Key: IGNITE-13597 URL: https://issues.apache.org/jira/browse/IGNITE-13597 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Pds2/5677092?buildTab=log=3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13266) PDS (Indexing) fails with 'Exit code 137"
Ivan Bessonov created IGNITE-13266: -- Summary: PDS (Indexing) fails with 'Exit code 137" Key: IGNITE-13266 URL: https://issues.apache.org/jira/browse/IGNITE-13266 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing_IgniteTests24Java8=%3Cdefault%3E=buildTypeHistoryList] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13246) Implement EVT_BASELINE_XXX events
Ivan Bessonov created IGNITE-13246: -- Summary: Implement EVT_BASELINE_XXX events Key: IGNITE-13246 URL: https://issues.apache.org/jira/browse/IGNITE-13246 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov To notify external tools, we need the events EVT_BASELINE_CHANGED, EVT_BASELINE_AUTO_ADJUST_ENABLED_CHANGED and EVT_BASELINE_AUTO_ADJUST_AWAITING_TIME_CHANGED, so that baseline info on the UI is updated correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13242) LocalWalModeChangeDuringRebalancingSelfTest.testDataClearedAfterRestartWithDisabledWal fails
Ivan Bessonov created IGNITE-13242: -- Summary: LocalWalModeChangeDuringRebalancingSelfTest.testDataClearedAfterRestartWithDisabledWal fails Key: IGNITE-13242 URL: https://issues.apache.org/jira/browse/IGNITE-13242 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-5966400795288779246=testDetails] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13235) Deadlock in IgniteServiceProcessor
Ivan Bessonov created IGNITE-13235: -- Summary: Deadlock in IgniteServiceProcessor Key: IGNITE-13235 URL: https://issues.apache.org/jira/browse/IGNITE-13235 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.9 {code:java} "main" #1 prio=5 os_prio=0 tid=0x7ff9ac00f000 nid=0x86d in Object.wait() [0x7ff9b418b000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.ignite.internal.util.worker.GridWorker.join(GridWorker.java:242) - locked <0x000776ee2028> (a java.lang.Object) at org.apache.ignite.internal.util.IgniteUtils.join(IgniteUtils.java:5009) at org.apache.ignite.internal.processors.service.ServiceDeploymentManager.stopProcessing(ServiceDeploymentManager.java:145) at org.apache.ignite.internal.processors.service.IgniteServiceProcessor.stopProcessor(IgniteServiceProcessor.java:261) at org.apache.ignite.internal.processors.service.IgniteServiceProcessor.onKernalStop(IgniteServiceProcessor.java:248) at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2466) at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2414) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2577) - locked <0x000776424138> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2540) at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:333) at org.apache.ignite.Ignition.stop(Ignition.java:221) at org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1225) at org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1268) at org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1246) at org.apache.ignite.events.ClusterActivationStartedEventTest.afterTest(ClusterActivationStartedEventTest.java:41) at 
org.apache.ignite.testframework.junits.GridAbstractTest.cleanUpTestEnviroment(GridAbstractTest.java:701) at org.apache.ignite.testframework.junits.GridAbstractTest.runTest(GridAbstractTest.java:2165) at org.apache.ignite.testframework.junits.GridAbstractTest.access$600(GridAbstractTest.java:172) at org.apache.ignite.testframework.junits.GridAbstractTest$2.evaluate(GridAbstractTest.java:207) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.apache.ignite.testframework.junits.SystemPropertiesRule.lambda$methodStatement$1(SystemPropertiesRule.java:109) at org.apache.ignite.testframework.junits.SystemPropertiesRule$$Lambda$6/167185492.evaluate(Unknown Source) at org.apache.ignite.testframework.junits.DelegatingJUnitStatement.evaluate(DelegatingJUnitStatement.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.apache.ignite.testframework.junits.GridAbstractTest.evaluateInsideFixture(GridAbstractTest.java:2669) at org.apache.ignite.testframework.junits.GridAbstractTest.access$500(GridAbstractTest.java:172) at org.apache.ignite.testframework.junits.GridAbstractTest$BeforeFirstAndAfterLastTestRule$1.evaluate(GridAbstractTest.java:2649) at org.apache.ignite.testframework.junits.SystemPropertiesRule.lambda$classStatement$0(SystemPropertiesRule.java:93) at org.apache.ignite.testframework.junits.SystemPropertiesRule$$Lambda$2/1879492184.evaluate(Unknown Source) at 
org.apache.ignite.testframework.junits.DelegatingJUnitStatement.evaluate(DelegatingJUnitStatement.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runner.JUnitCore.run(JUnitCore.java:137) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at
[jira] [Created] (IGNITE-13156) Continuous query filter deployment hungs discovery thread
Ivan Bessonov created IGNITE-13156: -- Summary: Continuous query filter deployment hungs discovery thread Key: IGNITE-13156 URL: https://issues.apache.org/jira/browse/IGNITE-13156 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov A continuous query starts with a custom discovery event. The handler of that event is executed synchronously in the discovery thread. Even worse, the message itself is mutable, so it blocks the ring. Inside the handler there is a p2p request for a resource from another node, which can be quite time-consuming. After https://issues.apache.org/jira/browse/IGNITE-12438 or similar tasks, this could even lead to a deadlock. All IO operations must be removed from discovery handlers. {code:java} at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2099) at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2099) at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2231) at org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.sendResourceRequest(GridDeploymentCommunication.java:456) at org.apache.ignite.internal.managers.deployment.GridDeploymentClassLoader.sendResourceRequest(GridDeploymentClassLoader.java:793) at org.apache.ignite.internal.managers.deployment.GridDeploymentClassLoader.getResourceAsStreamEx(GridDeploymentClassLoader.java:745) at org.apache.ignite.internal.managers.deployment.GridDeploymentPerVersionStore.checkLoadRemoteClass(GridDeploymentPerVersionStore.java:729) at org.apache.ignite.internal.managers.deployment.GridDeploymentPerVersionStore.getDeployment(GridDeploymentPerVersionStore.java:314) at org.apache.ignite.internal.managers.deployment.GridDeploymentManager.getGlobalDeployment(GridDeploymentManager.java:498) at org.apache.ignite.internal.GridEventConsumeHandler.p2pUnmarshal(GridEventConsumeHandler.java:416) at 
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.processStartRequest(GridContinuousProcessor.java:1423) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.access$400(GridContinuousProcessor.java:117) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:220) at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:211) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:670) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:533) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2635) at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2673) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
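The requirement "all IO operations must be removed from discovery handlers" can be sketched roughly as follows. This is a generic illustration that uses only java.util.concurrent; it is not Ignite code. `DiscoveryOffload`, `onCustomEvent` and the thread name "p2p-loader" are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class DiscoveryOffload {
    /** Dedicated pool for blocking work, so the notifying thread is never blocked. */
    private final ExecutorService ioPool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "p2p-loader");
        t.setDaemon(true);
        return t;
    });

    /**
     * Meant to be called from the (hypothetical) discovery thread.
     * Returns immediately; the blocking load runs on the IO pool.
     */
    <T> CompletableFuture<T> onCustomEvent(Supplier<T> blockingLoad) {
        return CompletableFuture.supplyAsync(blockingLoad, ioPool);
    }

    void shutdown() {
        ioPool.shutdown();
    }

    public static void main(String[] args) {
        DiscoveryOffload offload = new DiscoveryOffload();

        // The supplier stands in for a slow remote class or resource request.
        String worker = offload.onCustomEvent(() -> Thread.currentThread().getName()).join();

        System.out.println("Loaded on thread: " + worker);
        offload.shutdown();
    }
}
```

The handler only schedules the work and returns, so a slow or unreachable peer delays the continuous query start but not the processing of other discovery messages.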
[jira] [Created] (IGNITE-13062) DistributedMetaStoragePersistentTest.testJoinNodeWithLongerHistory failed
Ivan Bessonov created IGNITE-13062: -- Summary: DistributedMetaStoragePersistentTest.testJoinNodeWithLongerHistory failed Key: IGNITE-13062 URL: https://issues.apache.org/jira/browse/IGNITE-13062 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov The reason is a race between the transition future and the cluster state that this future modifies. {code:java} java.lang.AssertionError at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.publicApiActiveStateAsync(GridClusterStateProcessor.java:333) at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.publicApiActiveState(GridClusterStateProcessor.java:295) at org.apache.ignite.internal.IgniteKernal.checkClusterState(IgniteKernal.java:4074) at org.apache.ignite.internal.IgniteKernal.internalCache(IgniteKernal.java:2692) at org.apache.ignite.testframework.junits.common.GridCommonAbstractTest$1.call(GridCommonAbstractTest.java:398) at org.apache.ignite.testframework.junits.common.GridCommonAbstractTest$1.call(GridCommonAbstractTest.java:394) at org.apache.ignite.testframework.junits.GridAbstractTest.executeOnLocalOrRemoteJvm(GridAbstractTest.java:2036) at org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.nearEnabled(GridCommonAbstractTest.java:393) at org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.dht(GridCommonAbstractTest.java:337) at org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.awaitPartitionMapExchange(GridCommonAbstractTest.java:775) at org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.awaitPartitionMapExchange(GridCommonAbstractTest.java:577) at org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.awaitPartitionMapExchange(GridCommonAbstractTest.java:562) at org.apache.ignite.internal.processors.metastorage.DistributedMetaStoragePersistentTest.testJoinNodeWithLongerHistory(DistributedMetaStoragePersistentTest.java:179) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2127) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13058) GridCommandHandlerTest.testKillHangingRemoteTransactions failed
Ivan Bessonov created IGNITE-13058: -- Summary: GridCommandHandlerTest.testKillHangingRemoteTransactions failed Key: IGNITE-13058 URL: https://issues.apache.org/jira/browse/IGNITE-13058 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov The test may fail if not all clients have completed the local cache start routine. {code:java} at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertNotNull(Assert.java:712) at org.junit.Assert.assertNotNull(Assert.java:722) at org.apache.ignite.testframework.junits.JUnitAssertAware.assertNotNull(JUnitAssertAware.java:178) at org.apache.ignite.util.GridCommandHandlerTest.testKillHangingRemoteTransactions(GridCommandHandlerTest.java:1044) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2127) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13050) ClusterGroup that is recomputed on topology change
Ivan Bessonov created IGNITE-13050: -- Summary: ClusterGroup that is recomputed on topology change Key: IGNITE-13050 URL: https://issues.apache.org/jira/browse/IGNITE-13050 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Currently, ClusterGroup comes in two flavors: one is a static set of UUIDs that will not change, the other is a predicate that is recomputed over ALL nodes on EVERY operation. This has bitten our client because the recomputation of ClusterGroup happens in the tcp-communication thread, clogging it and delaying every operation in the cluster. This is a major problem. It would be nice to have a predicate-based ClusterGroup that is recomputed once per topology affinity change. Bonus points if it precisely tracks the current topology with zero delay or overrun. It would also be nice to upgrade the firstNode/lastNode predicates to that mechanism, since they are currently static: the topology changes but the firstNode/lastNode projections don't, so they may point to an absent node. -- This message was sent by Atlassian Jira (v8.3.4#803005)
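A minimal sketch of the requested behavior, assuming the caller can supply a monotonically changing topology version. It is a standalone illustration and does not use the Ignite ClusterGroup API. `CachedNodeGroup`, `nodes` and `recomputations` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.UUID;
import java.util.function.Predicate;

final class CachedNodeGroup {
    private final Predicate<UUID> filter;

    private long cachedTopVer = -1;
    private List<UUID> cachedNodes = Collections.emptyList();
    private int recomputations;

    CachedNodeGroup(Predicate<UUID> filter) {
        this.filter = filter;
    }

    /** Re-evaluates the predicate only when the topology version differs from the cached one. */
    synchronized List<UUID> nodes(long topVer, Collection<UUID> allNodes) {
        if (topVer != cachedTopVer) {
            List<UUID> res = new ArrayList<>();

            for (UUID id : allNodes) {
                if (filter.test(id))
                    res.add(id);
            }

            cachedNodes = Collections.unmodifiableList(res);
            cachedTopVer = topVer;
            recomputations++;
        }

        return cachedNodes;
    }

    /** Number of times the predicate was evaluated over the whole topology. */
    synchronized int recomputations() {
        return recomputations;
    }
}
```

With this shape, repeated operations on an unchanged topology cost one version comparison instead of a full scan over all nodes.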
[jira] [Created] (IGNITE-13048) WAL FSYNC mode doesn't work with disabled archiver
Ivan Bessonov created IGNITE-13048: -- Summary: WAL FSYNC mode doesn't work with disabled archiver Key: IGNITE-13048 URL: https://issues.apache.org/jira/browse/IGNITE-13048 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov {noformat} Caused by: org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to initialize WAL log segment (WAL segment size change is not supported in 'DEFAULT' WAL mode) [filePath=/home/vsisko/gridgain/backend-work/work/db/wal/web_console_data/0001.wal, fileSize=24313258, configSize=10] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkFiles(FileWriteAheadLogManager.java:2427) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkOrPrepareFiles(FileWriteAheadLogManager.java:1404) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:435) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:60) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:841) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1717) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1020) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2039) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1731) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1157) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:677) 
~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:602) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.Ignition.start(Ignition.java:322) ~[ignite-core-8.7.5.jar:8.7.5] at org.apache.ignite.console.config.GridConfiguration.igniteInstance(GridConfiguration.java:38) ~[classes/:?] at org.apache.ignite.console.config.GridConfiguration$$EnhancerBySpringCGLIB$$b50da981.CGLIB$igniteInstance$0() ~[classes/:?] at org.apache.ignite.console.config.GridConfiguration$$EnhancerBySpringCGLIB$$b50da981$$FastClassBySpringCGLIB$$d486ae88.invoke() ~[classes/:?] at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228) ~[spring-core-4.3.23.RELEASE.jar:4.3.23.RELEASE] at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:358) ~[spring-context-4.3.23.RELEASE.jar:4.3.23.RELEASE] at org.apache.ignite.console.config.GridConfiguration$$EnhancerBySpringCGLIB$$b50da981.igniteInstance() ~[classes/:?] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_222] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_222] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_222] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222] at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:162) ~[spring-beans-4.3.23.RELEASE.jar:4.3.23.RELEASE] at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:588) ~[spring-beans-4.3.23.RELEASE.jar:4.3.23.RELEASE] ... 83 more{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12885) Checkpoint thread executes partitions fsync in single thread
Ivan Bessonov created IGNITE-12885: -- Summary: Checkpoint thread executes partitions fsync in single thread Key: IGNITE-12885 URL: https://issues.apache.org/jira/browse/IGNITE-12885 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov It should use "asyncRunner" if one is configured; this will speed up checkpoints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
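The idea can be sketched with plain java.nio and java.util.concurrent, assuming "asyncRunner" behaves like an optional thread pool. This is an illustration and not the checkpointer code. `ParallelFsync` and `syncAll` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class ParallelFsync {
    /**
     * Forces every file to disk. Uses {@code asyncRunner} when it is not null,
     * otherwise falls back to the calling thread. Returns the number of files synced.
     */
    static int syncAll(List<Path> files, ExecutorService asyncRunner) {
        if (asyncRunner == null) {
            for (Path f : files)
                sync(f);

            return files.size();
        }

        List<Future<?>> futs = new ArrayList<>();

        for (Path f : files)
            futs.add(asyncRunner.submit(() -> sync(f)));

        // Wait for all fsyncs; the checkpoint may only be marked finished after that.
        for (Future<?> fut : futs) {
            try {
                fut.get();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }

        return futs.size();
    }

    private static void sync(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```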
[jira] [Created] (IGNITE-12877) "restorePartitionStates" always logs all meta pages into WAL
Ivan Bessonov created IGNITE-12877: -- Summary: "restorePartitionStates" always logs all meta pages into WAL Key: IGNITE-12877 URL: https://issues.apache.org/jira/browse/IGNITE-12877 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov {noformat} 2020-01-31T21:09:27,203 [INFO ][main][org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager] - Finished applying WAL changes [updatesApplied=11897, time=183531 ms] 2020-01-31T21:09:27,203 [INFO ][main][org.apache.ignite.internal.processors.cache.GridCacheProcessor] - Restoring partition state for local groups. 2020-01-31T21:17:49,692 [INFO ][main][org.apache.ignite.internal.processors.cache.GridCacheProcessor] - Finished restoring partition state for local groups [groupsProcessed=32, partitionsProcessed=9310, time=502498ms] {noformat} The main issue is that org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager#updateState unconditionally returns true. "stateId" is almost never equal to "-1". UPDATE: that wasn’t the only problem; please see the fix itself for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12875) Implement "EVT_CLUSTER_STATE_CHANGE_STARTED" event
Ivan Bessonov created IGNITE-12875: -- Summary: Implement "EVT_CLUSTER_STATE_CHANGE_STARTED" event Key: IGNITE-12875 URL: https://issues.apache.org/jira/browse/IGNITE-12875 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12874) Possible NPE in GridDiscoveryManager#cacheGroupAffinityNode
Ivan Bessonov created IGNITE-12874: -- Summary: Possible NPE in GridDiscoveryManager#cacheGroupAffinityNode Key: IGNITE-12874 URL: https://issues.apache.org/jira/browse/IGNITE-12874 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov If "grpId" is invalid, the method throws an NPE instead of returning false. -- This message was sent by Atlassian Jira (v8.3.4#803005)
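The expected behavior can be sketched as a null-safe lookup. The class below is a standalone model and not the GridDiscoveryManager code: `GroupAffinityCheck`, its internal map and `register` are hypothetical, and only the method name `cacheGroupAffinityNode` is taken from the issue title.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

final class GroupAffinityCheck {
    /** Hypothetical registry: cache group ID to the set of its affinity node IDs. */
    private final Map<Integer, Set<UUID>> affNodesByGrp = new HashMap<>();

    void register(int grpId, Set<UUID> nodeIds) {
        affNodesByGrp.put(grpId, nodeIds);
    }

    /** Returns false for an unknown group ID instead of dereferencing a null entry. */
    boolean cacheGroupAffinityNode(UUID nodeId, int grpId) {
        Set<UUID> nodes = affNodesByGrp.get(grpId);

        return nodes != null && nodes.contains(nodeId);
    }
}
```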
[jira] [Created] (IGNITE-12839) IGNITE-12789 broke WALRecordSerializationTest
Ivan Bessonov created IGNITE-12839: -- Summary: IGNITE-12789 broke WALRecordSerializationTest Key: IGNITE-12839 URL: https://issues.apache.org/jira/browse/IGNITE-12839 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-3192056576753991319=%3Cdefault%3E=testDetails] Sorry, too bad that I skipped it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12789) Tracking page repairing has no WAL record associated with it
Ivan Bessonov created IGNITE-12789: -- Summary: Tracking page repairing has no WAL record associated with it Key: IGNITE-12789 URL: https://issues.apache.org/jira/browse/IGNITE-12789 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov org.apache.ignite.internal.processors.cache.persistence.tree.io.TrackingPageIO#resetCorruptFlag(long) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12726) Cache names can't be used as part of DistributedMetaStorage keys
Ivan Bessonov created IGNITE-12726: -- Summary: Cache names can't be used as part of DistributedMetaStorage keys Key: IGNITE-12726 URL: https://issues.apache.org/jira/browse/IGNITE-12726 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov The issue was discovered during the implementation of IGNITE-12721. Here's a short version of the description: * local MetaStorage can't handle keys that have more than 64 bytes in their "byte[]" representation. Since DistributedMetaStorage uses it and adds some specific prefixes on top, we have a strict limit on the key length. Just to be clear: it simply won't work; IGNITE-12721 only adds a valid exception and a meaningful error message to the API. Recently IGNITE-11987 from [IEP-35] was merged to master and the 2.8 release branch, and it does exactly what's written in the title: it adds the cache name as a part of the key. So, if you use a long cache name in, for example, the test "org.apache.ignite.internal.metric.MetricsConfigurationTest#testConfigRemovedOnCacheRemove", you'll get AssertionErrors in the log. By "long" I mean about 50 symbols. This should not happen. I see two options here: * leave everything as it is and change the key format; * modify MetaStorage so that it can handle longer keys. I prefer this one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
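The limit described above can be sketched as a byte-length check on the full key. The 64-byte limit comes from the issue text; the UTF-8 encoding, the prefix handling, and the names `MetaStorageKeys` and `validate` are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;

final class MetaStorageKeys {
    /** Limit taken from the issue description. */
    static final int MAX_KEY_BYTES = 64;

    /**
     * Builds the full key from a hypothetical prefix and a user-supplied name and
     * rejects it if its byte[] form exceeds the limit. UTF-8 is an assumption.
     */
    static String validate(String prefix, String name) {
        String key = prefix + name;
        int len = key.getBytes(StandardCharsets.UTF_8).length;

        if (len > MAX_KEY_BYTES) {
            throw new IllegalArgumentException(
                "Key is too long [key=" + key + ", bytes=" + len + ", max=" + MAX_KEY_BYTES + ']');
        }

        return key;
    }
}
```

Because the prefix counts against the same budget, a cache name of about 50 characters can already push the key over the limit, which matches the failure described in the issue.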
[jira] [Created] (IGNITE-12638) Classes persisted by DistributedMetaStorage are not IgniteDTO
Ivan Bessonov created IGNITE-12638: -- Summary: Classes persisted by DistributedMetaStorage are not IgniteDTO Key: IGNITE-12638 URL: https://issues.apache.org/jira/browse/IGNITE-12638 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.8 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12576) [IEP-35] TCP communication metrics use node ID instead of consistent ID
Ivan Bessonov created IGNITE-12576: -- Summary: [IEP-35] TCP communication metrics use node ID instead of consistent ID Key: IGNITE-12576 URL: https://issues.apache.org/jira/browse/IGNITE-12576 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov TcpCommunicationMetricsListener uses nodeId in metric names; consistentId should be used instead. Also, all metrics for a registry should be created at once, before the registry is added to GridMetricManager. There is no need for separate initialization of the sent and received counters. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12542) Some tests failed after due to incompatible changes in IGNITE-12108 and IGNITE-11987
Ivan Bessonov created IGNITE-12542: -- Summary: Some tests failed after due to incompatible changes in IGNITE-12108 and IGNITE-11987 Key: IGNITE-12542 URL: https://issues.apache.org/jira/browse/IGNITE-12542 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_ComputeGrid?branch=%3Cdefault%3E=overview=builds] [https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Basic1?branch=%3Cdefault%3E=overview=builds] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12515) GridMultinodeRedeploySharedModeSelfTest.testSharedMode fails sometimes
Ivan Bessonov created IGNITE-12515: -- Summary: GridMultinodeRedeploySharedModeSelfTest.testSharedMode fails sometimes Key: IGNITE-12515 URL: https://issues.apache.org/jira/browse/IGNITE-12515 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.9 Exception in org.apache.ignite.internal.managers.deployment.GridDeploymentPerVersionStore#searchDeploymentCache -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12514) WAL don't flush several last records in LOG-ONLY/FSYNC mode if flush ptr=null
Ivan Bessonov created IGNITE-12514: -- Summary: WAL don't flush several last records in LOG-ONLY/FSYNC mode if flush ptr=null Key: IGNITE-12514 URL: https://issues.apache.org/jira/browse/IGNITE-12514 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.9 In the current implementation, the last flush pointer is thread-local. If one thread adds new records and another thread calls wal.flush(null), that flush may not cover the records added by the first thread, because with a null flush pointer the calling thread flushes only up to the last record that it added itself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
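The effect can be reproduced with a toy model that tracks only record positions. It is a sketch of the described behavior and not the FileWriteAheadLogManager code. `FlushPointerDemo` and all of its members are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

final class FlushPointerDemo {
    /** Position of the last record written by any thread. */
    private final AtomicLong lastWritten = new AtomicLong();

    /** Position of the last record written by the current thread. */
    private final ThreadLocal<Long> lastWrittenByThread = ThreadLocal.withInitial(() -> 0L);

    long log() {
        long pos = lastWritten.incrementAndGet();
        lastWrittenByThread.set(pos);
        return pos;
    }

    /** Problematic variant: a null pointer resolves to the caller's own last record. */
    long flushTargetThreadLocal() {
        return lastWrittenByThread.get();
    }

    /** Alternative: a null pointer resolves to the last record written by any thread. */
    long flushTargetShared() {
        return lastWritten.get();
    }

    /** Writes three records on another thread, then resolves both targets on the caller thread. */
    static long[] demo() {
        FlushPointerDemo wal = new FlushPointerDemo();

        Thread writer = new Thread(() -> {
            wal.log();
            wal.log();
            wal.log();
        });

        writer.start();

        try {
            writer.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return new long[] {wal.flushTargetThreadLocal(), wal.flushTargetShared()};
    }
}
```

In `demo()` the calling thread has written nothing, so the thread-local target stays at 0 while the shared target is 3: a flush driven by the thread-local value would skip all three records.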
[jira] [Created] (IGNITE-12506) Deadlock in DistributedMetaStoragePersistentTest.testUnstableTopology
Ivan Bessonov created IGNITE-12506: -- Summary: Deadlock in DistributedMetaStoragePersistentTest.testUnstableTopology Key: IGNITE-12506 URL: https://issues.apache.org/jira/browse/IGNITE-12506 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.9 {code:java} "wal-file-archiver%metastorage.DistributedMetaStoragePersistentTest4-#51609%metastorage.DistributedMetaStoragePersistentTest4%@88463" prio=5 tid=0xf889 nid=NA waiting for monitor entry java.lang.Thread.State: BLOCKED waiting for dms-writer-thread-#51614%metastorage.DistributedMetaStoragePersistentTest4%@88460 to release lock on <0x159fe> (a org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver$2.apply(FileWriteAheadLogManager.java:2042) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver$2.apply(FileWriteAheadLogManager.java:2040) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkFiles(FileWriteAheadLogManager.java:2538) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.access$3000(FileWriteAheadLogManager.java:157) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.allocateRemainingFiles(FileWriteAheadLogManager.java:2032) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1806) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at java.lang.Thread.run(Thread.java:748) {code} {code:java} "dms-writer-thread-#51614%metastorage.DistributedMetaStoragePersistentTest4%@88460" prio=5 tid=0xf88e nid=NA waiting java.lang.Thread.State: WAITING blocks 
wal-file-archiver%metastorage.DistributedMetaStoragePersistentTest4-#51609%metastorage.DistributedMetaStoragePersistentTest4%@88463 at java.lang.Object.wait(Object.java:-1) at java.lang.Object.wait(Object.java:502) at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentCurrentStateStorage.nextAbsoluteSegmentIndex(SegmentCurrentStateStorage.java:107) at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentAware.nextAbsoluteSegmentIndex(SegmentAware.java:66) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.nextAbsoluteSegmentIndex(FileWriteAheadLogManager.java:1918) - locked <0x159fe> (a org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.access$1100(FileWriteAheadLogManager.java:1687) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.pollNextFile(FileWriteAheadLogManager.java:1575) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.initNextWriteHandle(FileWriteAheadLogManager.java:1387) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.rollOver(FileWriteAheadLogManager.java:1258) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:875) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:796) at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$WriteRowHandler.addRow(AbstractFreeList.java:207) at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$WriteRowHandler.run(AbstractFreeList.java:158) at 
org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$WriteRowHandler.run(AbstractFreeList.java:138) at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:292) at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:318) at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:516) at org.apache.ignite.internal.processors.cache.persistence.metastorage.MetastorageRowStore.addRow(MetastorageRowStore.java:72) at org.apache.ignite.internal.processors.cache.persistence.metastorage.MetaStorage.writeRaw(MetaStorage.java:419) - locked <0x159ff> (a org.apache.ignite.internal.processors.cache.persistence.metastorage.MetaStorage) at org.apache.ignite.internal.processors.cache.persistence.metastorage.MetaStorage.write(MetaStorage.java:396) at
[jira] [Created] (IGNITE-12499) Node took a long time to start after kill
Ivan Bessonov created IGNITE-12499: -- Summary: Node took a long time to start after kill Key: IGNITE-12499 URL: https://issues.apache.org/jira/browse/IGNITE-12499 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.9 Test scenario: 1) Start a 4-node cluster 2) Activate it 3) Load 1k rows into each cache 4) Stop a node 5) Bring it back without its index.bin files 6) Wait until it starts. Somehow the first node takes a long time to start, while the log shows "Waiting for topology snapshot: server(s) 4/4, client(s) 0/*, timeout 1166/1800 sec". [10:47:21,360][INFO][main][G] Node started : [stage="Configure system pool" (129 ms),stage="Start managers" (440 ms),stage="Configure binary metadata" (86 ms),stage="Start processors" (39341 ms),stage="Start 'GridGain' plugin" (16 ms),stage="Init and start regions" (210 ms),stage="Restore binary memory" (228224 ms),stage="Restore logical state" (859694 ms),stage="Finish recovery" (8938 ms),stage="Join topology" (6024 ms),stage="Await transition" (16 ms),stage="Await exchange" (14855 ms),stage="Total time" (1157973 ms)] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12491) Eliminate contention on ConcurrentHashMap.size()
Ivan Bessonov created IGNITE-12491: -- Summary: Eliminate contention on ConcurrentHashMap.size() Key: IGNITE-12491 URL: https://issues.apache.org/jira/browse/IGNITE-12491 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.9 Methods that invoke checkpointReadLock/checkpointReadUnlock spend a lot of time calculating the number of dirty pages. This becomes visible with several hundred data regions: every persistent operation costs hundreds of size() invocations on ConcurrentHashMap. -- This message was sent by Atlassian Jira (v8.3.4#803005)
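One possible way to avoid the repeated size() calls, shown only as a sketch: keep a single shared counter that is updated when a page becomes dirty or clean, so the read-lock path reads one number instead of walking every region. The class names below (DirtyPageTrackerSketch, RegionSketch) are hypothetical and are not Ignite classes; the ticket does not say which approach the actual fix takes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a data region that tracks its own dirty pages. */
final class RegionSketch {
    /** Dirty page ids of this region. */
    private final Set<Long> dirtyPages = ConcurrentHashMap.newKeySet();

    /** Counter shared by all regions. */
    private final LongAdder totalDirty;

    RegionSketch(LongAdder totalDirty) {
        this.totalDirty = totalDirty;
    }

    /** Marks a page dirty; the shared counter changes only on a real state change. */
    void markDirty(long pageId) {
        if (dirtyPages.add(pageId))
            totalDirty.increment();
    }

    /** Marks a page clean again, e.g. after it was written by a checkpoint. */
    void clearDirty(long pageId) {
        if (dirtyPages.remove(pageId))
            totalDirty.decrement();
    }
}

/** Hypothetical owner of all regions; an assumption made for illustration. */
final class DirtyPageTrackerSketch {
    private final LongAdder totalDirty = new LongAdder();

    RegionSketch newRegion() {
        return new RegionSketch(totalDirty);
    }

    /** Cost does not grow with the number of regions, unlike summing size() per region. */
    long dirtyPages() {
        return totalDirty.sum();
    }
}
```

LongAdder.sum() is not an atomic snapshot under concurrent updates, which is probably acceptable for a throttling-style check but would need review for anything that requires an exact value.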
[jira] [Created] (IGNITE-12488) Fix JavaDocs in DistributedMetaStorage
Ivan Bessonov created IGNITE-12488: -- Summary: Fix JavaDocs in DistributedMetaStorage Key: IGNITE-12488 URL: https://issues.apache.org/jira/browse/IGNITE-12488 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.9 Some information is obsolete after https://issues.apache.org/jira/browse/IGNITE-12109 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12487) Inconsistent GridIoManager API for sendToGridTopic(Collection nodes) and sendToGridTopic(UUID nodeId)
Ivan Bessonov created IGNITE-12487: -- Summary: Inconsistent GridIoManager API for sendToGridTopic(Collection nodes) and sendToGridTopic(UUID nodeId) Key: IGNITE-12487 URL: https://issues.apache.org/jira/browse/IGNITE-12487 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.9 Method {{ctx.io().sendToGridTopic(Collection nodes, )}} throws the exception "Internal Ignite code should never call the method with local node in a node list." But at the same time {{ctx.io().sendToGridTopic(((IgniteEx)ignite).localNode().id(), ...)}} works without any exception. From my point of view, we should not throw the exception. Processing messages in a common listener is much more convenient than writing the same code twice, once for remote nodes and once for the local node. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12486) Truncation of archived WAL segments doesn't work
Ivan Bessonov created IGNITE-12486: -- Summary: Truncation of archived WAL segments doesn't work Key: IGNITE-12486 URL: https://issues.apache.org/jira/browse/IGNITE-12486 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Index calculation is wrong in FileWriteAheadLogManager#rollOver. It leads to unexpected and faulty WAL segments truncation and data corruption as a result. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12485) DiscoveryEvent make event message lazy initialization
Ivan Bessonov created IGNITE-12485: -- Summary: DiscoveryEvent make event message lazy initialization Key: IGNITE-12485 URL: https://issues.apache.org/jira/browse/IGNITE-12485 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.9 In GridDiscoveryManager$DiscoveryWorker#recordEvent() we set a message on each event: "msg " + clusterNode. Invoking toString() on a ClusterNode implementation can be expensive. I think the event message could be generated lazily from the event type and the cluster node. -- This message was sent by Atlassian Jira (v8.3.4#803005)
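A minimal sketch of the lazy-message idea, assuming the event can hold a supplier instead of a prebuilt string. LazyEventSketch is a hypothetical class for illustration, not the real DiscoveryEvent, and the actual change may look different.

```java
import java.util.function.Supplier;

/** Hypothetical event whose message is built on first access instead of at creation time. */
final class LazyEventSketch {
    /** Builds the message, e.g. from the event type and the node. */
    private final Supplier<String> msgSupplier;

    /** Cached message; null until somebody asks for it. */
    private volatile String msg;

    LazyEventSketch(Supplier<String> msgSupplier) {
        this.msgSupplier = msgSupplier;
    }

    /**
     * Returns the message, computing it at most once per thread race.
     * Under contention the supplier may run more than once, which is harmless
     * as long as it is side-effect free.
     */
    String message() {
        String m = msg;

        if (m == null) {
            m = msgSupplier.get();
            msg = m;
        }

        return m;
    }
}
```

With this shape, an event that is recorded but never logged or inspected would never pay for the expensive toString() call.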
[jira] [Created] (IGNITE-12119) Peer Class Loading has no retries
Ivan Bessonov created IGNITE-12119: -- Summary: Peer Class Loading has no retries Key: IGNITE-12119 URL: https://issues.apache.org/jira/browse/IGNITE-12119 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.8 That's it. Peer Class Loading has a short timeout and no retries, and if it fails, loading of the class will not be reattempted. I believe this is in part because GridDeploymentClassLoader is a class loader. If it throws ClassNotFoundException when asked to load a class, the JVM will take notice and not reattempt to load this class, even if the error was transient. Proposed amendments: * Increase timeouts, introduce immediate retries. * See if we can report a transient class loading issue to the JVM. * If all of that fails, we need to mark the class loader as invalid when the timeout occurs, phase out its usage and create a new class loader which will reattempt to load this class later. Please note that extended waiting in a class loader is not recommended because it can cause the grid to stall. -- This message was sent by Atlassian Jira (v8.3.2#803003)
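The "immediate retries" amendment could look roughly like the sketch below: retry the remote fetch a bounded number of times and only surface ClassNotFoundException once all attempts have failed. ClassBytesSource and PeerLoadRetrySketch are hypothetical names, and treating IOException as the transient failure is an assumption; none of this is taken from GridDeploymentClassLoader.

```java
import java.io.IOException;

/** Hypothetical source of class bytes, e.g. a remote node; not an Ignite interface. */
interface ClassBytesSource {
    byte[] fetch(String clsName) throws IOException;
}

final class PeerLoadRetrySketch {
    /**
     * Fetches class bytes, retrying immediately on failures that are assumed transient.
     * ClassNotFoundException is thrown only after every attempt has failed, so a single
     * timeout does not make the JVM give up on the class.
     */
    static byte[] fetchWithRetries(ClassBytesSource src, String clsName, int maxAttempts)
        throws ClassNotFoundException {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");

        IOException last = null;

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return src.fetch(clsName);
            }
            catch (IOException e) {
                last = e; // Assumed transient: try again without waiting.
            }
        }

        throw new ClassNotFoundException(clsName, last);
    }
}
```

The attempt count should stay small, in line with the note above that extended waiting inside a class loader can stall the grid.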
[jira] [Created] (IGNITE-12109) The distributed metastorage must support read/write operation on an inactive cluster.
Ivan Bessonov created IGNITE-12109: -- Summary: The distributed metastorage must support read/write operation on an inactive cluster. Key: IGNITE-12109 URL: https://issues.apache.org/jira/browse/IGNITE-12109 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov The metastorage isn't able to propagate value on an inactive cluster. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12108) [IEP-35] Migrate Communication Metrics.
Ivan Bessonov created IGNITE-12108: -- Summary: [IEP-35] Migrate Communication Metrics. Key: IGNITE-12108 URL: https://issues.apache.org/jira/browse/IGNITE-12108 Project: Ignite Issue Type: New Feature Reporter: Ivan Bessonov Assignee: Ivan Bessonov
||*Name*||*Description*||
|communication.tcp.outboundMessagesQueueSize|Number of messages waiting to be sent|
|communication.tcp.sentBytes|Total number of bytes sent by current node|
|communication.tcp.receivedBytes|Total number of bytes received by current node|
|communication.tcp.sentMessagesCount|Total number of messages sent by current node|
|communication.tcp.receivedMessagesCount|Total number of messages received by current node|
|communication.tcp.sentMessagesByType.|Total number of messages with given type sent by current node|
|communication.tcp.receivedMessagesByType.|Total number of messages with given type received by current node|
|communication.tcp..sentMessagesToNode|Total number of messages sent by current node to the given node|
|communication.tcp..receivedMessagesFromNode|Total number of messages received by current node from the given node|
-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-11998) Fix DataPageScan for fragmented pages.
Ivan Bessonov created IGNITE-11998: -- Summary: Fix DataPageScan for fragmented pages. Key: IGNITE-11998 URL: https://issues.apache.org/jira/browse/IGNITE-11998 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Fix For: 2.8 Fragmented pages crash JVM when accessed by DataPageScan scanner/query optimized scanner. It happens when scanner accesses data in later chunk in fragmented entry but treats it like the first one, expecting length of the payload, which is absent and replaced with raw entry data. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IGNITE-11963) Remove ContinuousQueryDeserializationErrorOnNodeJoinTest
Ivan Bessonov created IGNITE-11963: -- Summary: Remove ContinuousQueryDeserializationErrorOnNodeJoinTest Key: IGNITE-11963 URL: https://issues.apache.org/jira/browse/IGNITE-11963 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov Test ContinuousQueryDeserializationErrorOnNodeJoinTest is invalid after IGNITE-11914 and should be removed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11931) Rewrite @WithSystemProperty handling using JUnit rules.
Ivan Bessonov created IGNITE-11931: -- Summary: Rewrite @WithSystemProperty handling using JUnit rules. Key: IGNITE-11931 URL: https://issues.apache.org/jira/browse/IGNITE-11931 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 3.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11882) Bugs related to SPI & tests fixes
Ivan Bessonov created IGNITE-11882: -- Summary: Bugs related to SPI & tests fixes Key: IGNITE-11882 URL: https://issues.apache.org/jira/browse/IGNITE-11882 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov This issue contains fixes for several issues: * Checkpointer thread waits too long on pending futures without a heartbeat. * Ignite build date is always shown in the current timezone. UTC should be used. * Examples have a troublesome "jackson" dependency version and can't be run as a result. * Baseline in discovery cache might be inconsistent with the actual baseline on in-memory clusters. * Distributed metastorage triggers the failure handler on thread interruption while the node is stopping. * Sometimes node restore fails with no segments to read, but there are no useful logs for diagnostics. * Spring test suite has issues: ** Exchange manager may invoke the failure processor on node stop with NodeStoppingException. ** KillerLifecycleBean should wait for all nodes to start before stopping a node; otherwise starting the second node may cause G.start() to return null. * IgnitePdsCorruptedStoreTest.testReadOnlyMetaStore fails when run under root permissions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11864) Log FileNotFoundException on restore if no segments were found.
Ivan Bessonov created IGNITE-11864: -- Summary: Log FileNotFoundException on restore if no segments were found. Key: IGNITE-11864 URL: https://issues.apache.org/jira/browse/IGNITE-11864 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11861) GridEventConsumeSelfTest.testMultithreadedWithNodeRestart fails on TC
Ivan Bessonov created IGNITE-11861: -- Summary: GridEventConsumeSelfTest.testMultithreadedWithNodeRestart fails on TC Key: IGNITE-11861 URL: https://issues.apache.org/jira/browse/IGNITE-11861 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=4911099288413140059=testDetails] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11858) IgniteClientRejoinTest.testClientsReconnectAfterStart is flaky
Ivan Bessonov created IGNITE-11858: -- Summary: IgniteClientRejoinTest.testClientsReconnectAfterStart is flaky Key: IGNITE-11858 URL: https://issues.apache.org/jira/browse/IGNITE-11858 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=594525236246121383=testDetails_IgniteTests24Java8=%3Cdefault%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11841) Dump page history info in FailureHandler on CorruptedTreeException
Ivan Bessonov created IGNITE-11841: -- Summary: Dump page history info in FailureHandler on CorruptedTreeException Key: IGNITE-11841 URL: https://issues.apache.org/jira/browse/IGNITE-11841 Project: Ignite Issue Type: Sub-task Reporter: Ivan Bessonov Assignee: Ivan Bessonov -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11822) Wrong DistributedMetaStorage feature id
Ivan Bessonov created IGNITE-11822: -- Summary: Wrong DistributedMetaStorage feature id Key: IGNITE-11822 URL: https://issues.apache.org/jira/browse/IGNITE-11822 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11706) DistributedMetaStoragePersistentTest.testConflictingData is flaky in zookeeper suite.
Ivan Bessonov created IGNITE-11706: -- Summary: DistributedMetaStoragePersistentTest.testConflictingData is flaky in zookeeper suite. Key: IGNITE-11706 URL: https://issues.apache.org/jira/browse/IGNITE-11706 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=4285807788261365029=testDetails_IgniteTests24Java8=%3Cdefault%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11702) GridCacheNearOnlyTopologySelfTest.testNodeLeave is flaky.
Ivan Bessonov created IGNITE-11702: -- Summary: GridCacheNearOnlyTopologySelfTest.testNodeLeave is flaky. Key: IGNITE-11702 URL: https://issues.apache.org/jira/browse/IGNITE-11702 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=5748284805523586815=testDetails] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11684) CacheSerializableTransactionsTest#testGetRemoveTxNearCache2 (and 1) is flacky
Ivan Bessonov created IGNITE-11684: -- Summary: CacheSerializableTransactionsTest#testGetRemoveTxNearCache2 (and 1) is flacky Key: IGNITE-11684 URL: https://issues.apache.org/jira/browse/IGNITE-11684 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=4876515758596068461=testDetails_IgniteTests24Java8=%3Cdefault%3E] The problem occurs when two optimistic transactions are executed from the same client with a near cache, where one transaction removes a key and the other updates it. In this scenario the near node can be removed from the "readers" list if the "remove" transaction completes before the "update" transaction. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11683) DistributedMetaStoragePersistentTest#testClientReconnect hangs sometimes.
Ivan Bessonov created IGNITE-11683: -- Summary: DistributedMetaStoragePersistentTest#testClientReconnect hangs sometimes. Key: IGNITE-11683 URL: https://issues.apache.org/jira/browse/IGNITE-11683 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov The problem occurs right after this line: {code:java} assertTrue(GridTestUtils.waitForCondition(() -> metastorage(1).getUpdatesCount() == expUpdatesCnt, 15_000)); {code} Client node might not be fully reconnected yet. Adding following line resolves the problem in the particular test: {code:java} grid(1).cluster().clientReconnectFuture().get(); {code} I don't consider this a proper fix. Stopping the client that hasn't finished its reconnect shouldn't result in infinite waiting (or deadlock). Client node should be stopped successfully. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11681) Some DistributedMetaStorageTest test methods constantly fail in master
Ivan Bessonov created IGNITE-11681: -- Summary: Some DistributedMetaStorageTest test methods constantly fail in master Key: IGNITE-11681 URL: https://issues.apache.org/jira/browse/IGNITE-11681 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=1994017253676952364=testDetails] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11612) http.GridHttpDeploymentSelfTest always fails
Ivan Bessonov created IGNITE-11612: -- Summary: http.GridHttpDeploymentSelfTest always fails Key: IGNITE-11612 URL: https://issues.apache.org/jira/browse/IGNITE-11612 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.8 [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=8313984068492573325=testDetails] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11574) Exchange on NodeLeft event hangs when cluster is in transition state
Ivan Bessonov created IGNITE-11574: -- Summary: Exchange on NodeLeft event hangs when cluster is in transition state Key: IGNITE-11574 URL: https://issues.apache.org/jira/browse/IGNITE-11574 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Attachments: ExchangeDeadlockTest.java The problem is in this code (GridCachePartitionExchangeManager#start0) : {code:java} if (cache.state().transition()) { if (log.isDebugEnabled()) log.debug("Adding pending event: " + evt); pendingEvts.add(new PendingDiscoveryEvent(evt, cache)); }{code} Problem occurs when "setBaseline" and "nodeLeft" events happen simultaneously (+ some undetermined conditions). "nodeLeft" provokes exchange, and while that exchange isn't finished "setBaseline" is invoked. This moves cluster into a transition state and "CacheAffinityChangeMessage" from the exchange cannot be processed properly. At the same time "setBaseline" cannot be completed before the exchange, so we have a deadlock. Reproducer attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11560) @WithSystemProperty annotation breaks some existing tests.
Ivan Bessonov created IGNITE-11560: -- Summary: @WithSystemProperty annotation breaks some existing tests. Key: IGNITE-11560 URL: https://issues.apache.org/jira/browse/IGNITE-11560 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-4555192785549771867=%3Cdefault%3E=testDetails -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11416) DistributedMetaStorage improvements
Ivan Bessonov created IGNITE-11416: -- Summary: DistributedMetaStorage improvements Key: IGNITE-11416 URL: https://issues.apache.org/jira/browse/IGNITE-11416 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov We need following improvements: * do not write the same value twice in a row, this would lead to history pollution; * add "putAll" functionality on binary level, not in public API yet. This would simplify the migration in future; * do not use "*HistoryItem" class for everything, this is not conventional; * retrieve "dmsVer" from cluster on handshake, this would help to reduce joining node DataBag size drastically; * add "isEmpty()" or "long getVersion()" method to metastorage, will be helpful for components that use it; * there has to be an ability to read data on client nodes, maybe write as well, not sure yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
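The first item in the list above (not writing the same value twice in a row) can be illustrated with a small sketch. DedupMetastoreSketch is a hypothetical in-memory stand-in, not DistributedMetaStorageImpl; it only shows the comparison that keeps a repeated write out of the history.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical in-memory store with an update history; an assumption for illustration. */
final class DedupMetastoreSketch {
    /** Current value per key. */
    private final Map<String, byte[]> values = new HashMap<>();

    /** Keys of applied updates, in order; stands in for the real history. */
    private final List<String> history = new ArrayList<>();

    /**
     * Writes a value unless it equals the current one.
     *
     * @return true if the write was applied and recorded in the history.
     */
    synchronized boolean write(String key, byte[] val) {
        Objects.requireNonNull(key, "key");
        Objects.requireNonNull(val, "val");

        // Same bytes as the current value: skip to avoid polluting the history.
        if (Arrays.equals(values.get(key), val))
            return false;

        values.put(key, val.clone());
        history.add(key);

        return true;
    }

    synchronized int historySize() {
        return history.size();
    }
}
```

Comparing raw bytes matches the suggestion to work on the binary level; whether the real implementation compares marshalled bytes or deserialized values is not stated in the ticket.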
[jira] [Created] (IGNITE-11390) DistributedMetaStorage start is incorrect for in-memory cluster
Ivan Bessonov created IGNITE-11390: -- Summary: DistributedMetaStorage start is incorrect for in-memory cluster Key: IGNITE-11390 URL: https://issues.apache.org/jira/browse/IGNITE-11390 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov "onReadyForRead" is invoked in "start" method, which makes it impossible to register any listeners after "DistributedMetaStorageImpl" is started. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11388) Fix UnsupportedOperationException in MarshallerContextImpl
Ivan Bessonov created IGNITE-11388: -- Summary: Fix UnsupportedOperationException in MarshallerContextImpl Key: IGNITE-11388 URL: https://issues.apache.org/jira/browse/IGNITE-11388 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Method org.apache.ignite.internal.MarshallerContextImpl#registerClassName(byte, int, java.lang.String) should be properly implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11362) New protocol version is absent in baseline autoadjustment visor args
Ivan Bessonov created IGNITE-11362: -- Summary: New protocol version is absent in baseline autoadjustment visor args Key: IGNITE-11362 URL: https://issues.apache.org/jira/browse/IGNITE-11362 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov org.apache.ignite.internal.visor.VisorDataTransferObject#getProtocolVersion method should be overridden when updating visor classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11347) DistributedMetaStoragePersistentTest.testUnstableTopology is flaky
Ivan Bessonov created IGNITE-11347: -- Summary: DistributedMetaStoragePersistentTest.testUnstableTopology is flaky Key: IGNITE-11347 URL: https://issues.apache.org/jira/browse/IGNITE-11347 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.8 https://ci.ignite.apache.org/project.html?tab=testDetails=IgniteTests24Java8=-976745117458855384=1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11323) Reduce boilerplate "System.setProperty" code in tests
Ivan Bessonov created IGNITE-11323: -- Summary: Reduce boilerplate "System.setProperty" code in tests Key: IGNITE-11323 URL: https://issues.apache.org/jira/browse/IGNITE-11323 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov There are many examples in tests where some property gets a new value in "beforeTestsStarted"/"beforeTest"/"beginning of test method" and then gets its previous value back in "afterTestsStopped"/"afterTest"/"finally block of test method". This approach leads to excessive code that can be avoided. I suggest implementing an annotation "WithSystemProperty" (the name is open to discussion) that will allow us to write this: {code:java} @Test @WithSystemProperty(key = IGNITE_SKIP_CONFIGURATION_CONSISTENCY_CHECK, value = "true") public void testSkipCheckConsistencyFlagEnabled() throws Exception { ... } {code} instead of this: {code:java} @Test public void testSkipCheckConsistencyFlagEnabled() throws Exception { String backup = System.setProperty(IGNITE_SKIP_CONFIGURATION_CONSISTENCY_CHECK, "true"); try { ... } finally { if (backup != null) System.setProperty(IGNITE_SKIP_CONFIGURATION_CONSISTENCY_CHECK, backup); else System.clearProperty(IGNITE_SKIP_CONFIGURATION_CONSISTENCY_CHECK); } } {code} It should also be possible to use this annotation on a test class, so that the new system property value is used in all of its test methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
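How such an annotation might be processed can be sketched without any test framework: read the annotation from the method (falling back to the class), set the property, run the method, and restore the previous value. SystemPropertySketch, SampleTestSketch and the "sketch.demo.flag" key are hypothetical; the annotation shape (key, value) follows the proposal above, and the real implementation in the Ignite test framework may differ.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Annotation with the shape proposed in the ticket. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
@interface WithSystemProperty {
    String key();

    String value();
}

/** Hypothetical runner helper; a real version would hook into the test framework. */
final class SystemPropertySketch {
    /** Runs a no-arg method with the annotated property set, then restores the old value. */
    static void runWithProperty(Object test, String mtdName) throws Exception {
        Method mtd = test.getClass().getDeclaredMethod(mtdName);

        mtd.setAccessible(true);

        // Method-level annotation wins; otherwise fall back to the class-level one.
        WithSystemProperty ann = mtd.getAnnotation(WithSystemProperty.class);

        if (ann == null)
            ann = test.getClass().getAnnotation(WithSystemProperty.class);

        if (ann == null) {
            mtd.invoke(test);

            return;
        }

        String backup = System.setProperty(ann.key(), ann.value());

        try {
            mtd.invoke(test);
        }
        finally {
            if (backup != null)
                System.setProperty(ann.key(), backup);
            else
                System.clearProperty(ann.key());
        }
    }
}

/** Hypothetical test class used only to exercise the helper. */
final class SampleTestSketch {
    /** Property value observed while the test method was running. */
    String seen;

    @WithSystemProperty(key = "sketch.demo.flag", value = "true")
    void testFlagEnabled() {
        seen = System.getProperty("sketch.demo.flag");
    }
}
```

The sketch handles a single annotation per element; supporting several properties on one test would need a repeatable or container annotation.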
[jira] [Created] (IGNITE-11313) Cluster hangs on cache invoke with binary objects creation
Ivan Bessonov created IGNITE-11313: -- Summary: Cluster hangs on cache invoke with binary objects creation Key: IGNITE-11313 URL: https://issues.apache.org/jira/browse/IGNITE-11313 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Creating of binary objects in entry processors in parallel with continuous queries may lead to deadlock: {code:java} [2019-02-11 18:52:50,129][WARN ][grid-timeout-worker-#39] >>> Possible starvation in striped pool. Thread name: sys-stripe-13-#14 Queue: [] Deadlock: false Completed: 1 Thread [name="sys-stripe-13-#14", id=33, state=WAITING, blockCnt=3, waitCnt=3] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178) at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141) at o.a.i.i.MarshallerContextImpl.registerClassName(MarshallerContextImpl.java:284) at o.a.i.i.binary.BinaryContext.registerUserClassName(BinaryContext.java:1202) at o.a.i.i.binary.builder.BinaryObjectBuilderImpl.serializeTo(BinaryObjectBuilderImpl.java:366) at o.a.i.i.binary.builder.BinaryObjectBuilderImpl.build(BinaryObjectBuilderImpl.java:189) at o.a.i.scenario.InvokeTask$MyEntryProcessor.process(InvokeTask.java:106) at o.a.i.i.processors.cache.EntryProcessorResourceInjectorProxy.process(EntryProcessorResourceInjectorProxy.java:68) at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onEntriesLocked(GridDhtTxPrepareFuture.java:446) at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1302) at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:713) at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1103) at o.a.i.i.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:405) at 
o.a.i.i.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:569) at o.a.i.i.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:367) at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:171) at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:156) at o.a.i.i.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:118) at o.a.i.i.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:198) at o.a.i.i.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:196) at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1129) at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:594) at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393) at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319) at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109) at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308) at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569) at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197) at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:127) at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1093) at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:505) at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11293) NPE in TcpCommunicationSpi
Ivan Bessonov created IGNITE-11293: -- Summary: NPE in TcpCommunicationSpi Key: IGNITE-11293 URL: https://issues.apache.org/jira/browse/IGNITE-11293 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Sometimes this exception occurs in the "stopAllGrids" method in tests: {code:java} java.lang.NullPointerException at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2787) at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2717) at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1648) at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1757) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.sendOrderedMessage(GridCacheIoManager.java:1303) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.lambda$null$1(GridDhtPartitionDemander.java:505) at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6892) at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} The reason is an interruption thrown from TcpCommunicationSpi#reserveClient: the SPI is already in an invalid state at this point, so "log" and some other fields are null. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11264) JVM crash in OffheapReadWriteLock#tryWriteLock
Ivan Bessonov created IGNITE-11264: -- Summary: JVM crash in OffheapReadWriteLock#tryWriteLock Key: IGNITE-11264 URL: https://issues.apache.org/jira/browse/IGNITE-11264 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Eduard Shangareev Attachments: hs_err_pid19407.log The JVM crashes at the end of IgniteClusterActivateDeactivateTest#testClientReconnectClusterActivateInProgress. The test was run in "Until Failure" mode in IDEA. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11236) Add Distributed Metastorage to the list of IgniteFeatures
Ivan Bessonov created IGNITE-11236: -- Summary: Add Distributed Metastorage to the list of IgniteFeatures Key: IGNITE-11236 URL: https://issues.apache.org/jira/browse/IGNITE-11236 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Add Distributed Metastorage to the list of IgniteFeatures -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11188) Optimize baseline autoadjustment for in-memory clusters with zero timeout
Ivan Bessonov created IGNITE-11188: -- Summary: Optimize baseline autoadjustment for in-memory clusters with zero timeout Key: IGNITE-11188 URL: https://issues.apache.org/jira/browse/IGNITE-11188 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.8 In current implementation (IGNITE-8571) zero-timeout case initiates two partition map exchanges on join/leave node events. This could be improved so that baseline is updated at the same time as join/leave event processing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11111) DistributedMetaStoragePersistentTest doesn't work on Zookeeper SPI
Ivan Bessonov created IGNITE-11111: -- Summary: DistributedMetaStoragePersistentTest doesn't work on Zookeeper SPI Key: IGNITE-11111 URL: https://issues.apache.org/jira/browse/IGNITE-11111 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Acceptance of joining node data is implemented in #collectGridNodeData, which is wrong because with Zookeeper SPI this method is invoked only on the coordinator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11109) DistributedMetaStorageTest should be moved into different test suite
Ivan Bessonov created IGNITE-11109: -- Summary: DistributedMetaStorageTest should be moved into different test suite Key: IGNITE-11109 URL: https://issues.apache.org/jira/browse/IGNITE-11109 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov DistributedMetaStorageTest should be moved into different test suite: https://ci.ignite.apache.org/viewLog.html?buildId=2930324=buildResultsDiv=IgniteTests24Java8_DiskPageCompressions#testNameId5631188409568136213
[jira] [Created] (IGNITE-11108) Zookeeper handles DataBags differently
Ivan Bessonov created IGNITE-11108: -- Summary: Zookeeper handles DataBags differently Key: IGNITE-11108 URL: https://issues.apache.org/jira/browse/IGNITE-11108 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Eduard Shangareev While trying to run DistributedMetaStoragePersistentTest in the Zookeeper Discovery suite, I found that GridComponent#validateNode(ClusterNode, JoiningNodeDiscoveryData) is never invoked, so node validation might work incorrectly.
[jira] [Created] (IGNITE-11066) Start MetaStorage for write before activation
Ivan Bessonov created IGNITE-11066: -- Summary: Start MetaStorage for write before activation Key: IGNITE-11066 URL: https://issues.apache.org/jira/browse/IGNITE-11066 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Distributed metastorage (IGNITE-10640) requires the local metastorage to be writable in order to work properly. Early update messages therefore have to wait until the local metastorage becomes writable, which can hang the whole cluster for some time. As a first step towards fixing this problem, it is proposed to start the local metastorage in read-write mode before activation.
[jira] [Created] (IGNITE-11063) IgniteClusterActivateDeactivateTestWithPersistence#testDeactivateDuringEvictionAndRebalance has NPE inside of it.
Ivan Bessonov created IGNITE-11063: -- Summary: IgniteClusterActivateDeactivateTestWithPersistence#testDeactivateDuringEvictionAndRebalance has NPE inside of it. Key: IGNITE-11063 URL: https://issues.apache.org/jira/browse/IGNITE-11063 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov
{code:java}
[2019-01-24 16:23:19,561][ERROR][sys-#221%cache.IgniteClusterActivateDeactivateTestWithPersistence2%][PartitionsEvictManager] Partition eviction failed, this can cause grid hang.
java.lang.NullPointerException
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:668)
    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:1106)
    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.tryClear(GridDhtLocalPartition.java:910)
    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$PartitionEvictionTask.run(PartitionsEvictManager.java:415)
    at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6873)
    at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
{code:java}
[2019-01-24 16:28:30,862][ERROR][sys-#221%cache.IgniteClusterActivateDeactivateTestWithPersistence2%][PartitionsEvictManager] Partition eviction failed, this can cause grid hang.
java.lang.NullPointerException
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:2566)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:2544)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:2122)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:634)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:4391)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:652)
    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:1106)
    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.tryClear(GridDhtLocalPartition.java:910)
    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$PartitionEvictionTask.run(PartitionsEvictManager.java:415)
    at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6873)
    at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
These errors are caused by the lack of synchronization between partition eviction and deactivation. The test is green despite all the errors.
[jira] [Created] (IGNITE-10945) Document Baseline auto-adjust feature
Ivan Bessonov created IGNITE-10945: -- Summary: Document Baseline auto-adjust feature Key: IGNITE-10945 URL: https://issues.apache.org/jira/browse/IGNITE-10945 Project: Ignite Issue Type: Task Components: documentation Reporter: Ivan Bessonov Assignee: Artem Budnikov

From IGNITE-8571:

Currently there is only one way to change the BLAT: update it manually via console.sh or the API. We need to add the ability to change it automatically, adjusting it to the current topology. So, I propose 3 new parameters to tune this feature:

1. autoAdjustEnabled - true/false flag that switches between manual baseline control and automatic baseline adjustment.
2. autoAdjustTimeout - the time to wait after the actual topology change. The wait is reset if a new discovery event (node join/exit) happens.
3. autoAdjustMaxTimeout - the time to wait from the first discovery event in the chain. Once it is reached, the BLAT is changed right away, whether or not another node join/exit has happened.

The API needs to change as follows:

1. org.apache.ignite.IgniteCluster
*Add*
isBaselineAutoAdjustEnabled()
setBaselineAutoAdjustEnabled(boolean enabled);
setBaselineAutoAdjustTimeout(long timeoutInMs);
setBaselineAutoAdjustMaxTimeout(long timeoutInMs);

2. org.apache.ignite.configuration.IgniteConfiguration
*Add*
IgniteConfiguration setBaselineAutoAdjustEnabled(boolean enabled);
IgniteConfiguration setBaselineAutoAdjustTimeout(long timeoutInMs);
IgniteConfiguration setBaselineAutoAdjustMaxTimeout(long timeoutInMs);

Also, we need to ensure that all nodes have the same parameters, and we should be able to survive the coordinator leaving while the parameters are being changed.

For IGNITE-8575:

Proposed API format for control.sh:
{{--baseline autoadjust disable}}
{{--baseline autoadjust enable }}
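The interplay between autoAdjustTimeout and autoAdjustMaxTimeout quoted above can be illustrated with a small sketch. It is not Ignite code: the class BaselineAdjustSchedule and its methods are hypothetical names invented for this example, and it only models the timing rule (the wait restarts on every discovery event but never extends past the maximum counted from the first event in the chain).

```java
/**
 * Hypothetical helper (not part of Ignite) that models the timing rule only:
 * adjust after `timeout` ms of quiet, but no later than `maxTimeout` ms
 * after the first discovery event in the current chain.
 */
final class BaselineAdjustSchedule {
    private final long timeout;       // models autoAdjustTimeout, ms
    private final long maxTimeout;    // models autoAdjustMaxTimeout, ms
    private long firstEventTime = -1; // first event in the current chain, -1 if none
    private long lastEventTime = -1;  // most recent event in the current chain

    BaselineAdjustSchedule(long timeout, long maxTimeout) {
        this.timeout = timeout;
        this.maxTimeout = maxTimeout;
    }

    /** Records a node join/exit event observed at time `now` (ms). */
    void onDiscoveryEvent(long now) {
        if (firstEventTime < 0)
            firstEventTime = now;
        lastEventTime = now;
    }

    /** Time at which the baseline should be adjusted, or -1 if nothing is pending. */
    long adjustTime() {
        if (firstEventTime < 0)
            return -1;
        return Math.min(lastEventTime + timeout, firstEventTime + maxTimeout);
    }

    /** Returns true and ends the current chain if the adjustment is due at `now`. */
    boolean tryAdjust(long now) {
        if (firstEventTime < 0 || now < adjustTime())
            return false;
        firstEventTime = -1;
        lastEventTime = -1;
        return true;
    }

    public static void main(String[] args) {
        BaselineAdjustSchedule s = new BaselineAdjustSchedule(1000, 2500);
        s.onDiscoveryEvent(0);
        System.out.println(s.adjustTime()); // 1000: one timeout after the only event
        s.onDiscoveryEvent(800);
        System.out.println(s.adjustTime()); // 1800: the wait was reset by the second event
        s.onDiscoveryEvent(1700);
        System.out.println(s.adjustTime()); // 2500: capped by the max timeout
    }
}
```

In this model a zero timeout makes the adjustment due at the moment of the event itself.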
[jira] [Created] (IGNITE-10640) Create cluster-wide MetaStorage analogue
Ivan Bessonov created IGNITE-10640: -- Summary: Create cluster-wide MetaStorage analogue Key: IGNITE-10640 URL: https://issues.apache.org/jira/browse/IGNITE-10640 Project: Ignite Issue Type: New Feature Reporter: Ivan Bessonov Assignee: Ivan Bessonov Issues like IGNITE-8571 require the ability to store and update some properties consistently across the whole cluster. It is proposed to implement a generic way of doing this. Main requirements: * read / write / delete; * surviving node / cluster restart; * consistency; * ability to add listeners on changing properties. The first implementation is going to be based on the local MetaStorage to guarantee data persistence. The existing MetaStorage API is subject to change as well.
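To make the requirements above more concrete, here is a minimal single-process sketch of the read / write / delete / listener surface. All names (InMemoryPropertyStore, Listener, onUpdate) are hypothetical and are not the Ignite API. The sketch deliberately leaves out the hard parts of the proposal: persistence across restarts and cluster-wide consistency.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a property store with change listeners. */
final class InMemoryPropertyStore {
    /** Called after a property changes; newVal is null when the key was removed. */
    interface Listener {
        void onUpdate(String key, String oldVal, String newVal);
    }

    private final Map<String, String> data = new HashMap<>();
    private final List<Listener> listeners = new ArrayList<>();

    /** Returns the current value, or null if the key is absent. */
    synchronized String read(String key) {
        return data.get(key);
    }

    /** Stores the value and notifies listeners with the previous value. */
    synchronized void write(String key, String val) {
        String old = data.put(key, val);
        fire(key, old, val);
    }

    /** Removes the key if present and notifies listeners. */
    synchronized void remove(String key) {
        if (data.containsKey(key)) {
            String old = data.remove(key);
            fire(key, old, null);
        }
    }

    /** Registers a listener for all subsequent changes. */
    synchronized void listen(Listener lsnr) {
        listeners.add(lsnr);
    }

    private void fire(String key, String oldVal, String newVal) {
        for (Listener l : listeners)
            l.onUpdate(key, oldVal, newVal);
    }
}
```

A caller would register a listener with listen(...) and then observe every later write or remove, which is roughly what a feature such as baseline auto-adjust would need in order to react to a changed property.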
[jira] [Created] (IGNITE-10475) Introduce IDEA async debugger annotations.
Ivan Bessonov created IGNITE-10475: -- Summary: Introduce IDEA async debugger annotations. Key: IGNITE-10475 URL: https://issues.apache.org/jira/browse/IGNITE-10475 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov The "JetBrains Java Annotation" library introduced the "@Async" annotation in version 16: https://www.jetbrains.com/help/idea/async-stacktraces.html Since we now use this version, we may as well integrate "@Async" into "IgniteFuture" and maybe other suitable classes.
[jira] [Created] (IGNITE-10369) PDS 4 hangs on TC
Ivan Bessonov created IGNITE-10369: -- Summary: PDS 4 hangs on TC Key: IGNITE-10369 URL: https://issues.apache.org/jira/browse/IGNITE-10369 Project: Ignite Issue Type: Test Reporter: Ivan Bessonov Assignee: Ivan Bessonov [https://ci.ignite.apache.org/viewLog.html?buildId=2365697=buildResultsDiv=IgniteTests24Java8_Pds4] org.apache.ignite.internal.processors.cache.IgniteClusterActivateDeactivateTestWithPersistenceAndMemoryReuse#testClientJoinsWhenActivationIsInProgress hangs on client connection.
[jira] [Created] (IGNITE-10321) Bug CacheContinuousWithTransformerReplicatedSelfTest.LocalEventListener causes certain tests to flack.
Ivan Bessonov created IGNITE-10321: -- Summary: Bug CacheContinuousWithTransformerReplicatedSelfTest.LocalEventListener causes certain tests to flack. Key: IGNITE-10321 URL: https://issues.apache.org/jira/browse/IGNITE-10321 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.8 Known problems: https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-5285455933531531639=testDetails https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=579266511269744969=testDetails