[jira] [Created] (IGNITE-14197) Checkpoint thread can't take checkpoint write lock because it waits for parked threads to complete their work
Anton Kalashnikov created IGNITE-14197: -- Summary: Checkpoint thread can't take checkpoint write lock because it waits for parked threads to complete their work Key: IGNITE-14197 URL: https://issues.apache.org/jira/browse/IGNITE-14197 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov When write throttling is enabled and the node parks, for example, a data streamer thread, that thread still holds the checkpoint read lock, which leads to long pauses while waiting for the checkpoint write lock: [2020-07-23 07:09:21,614][INFO ][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a, startPtr=FileWALPointer [idx=56913, fileOff=10362905, len=41972], checkpointBeforeLockTime=1983ms, *checkpointLockWait=812117ms*, checkpointListenersExecuteTime=90ms, checkpointLockHoldTime=93ms, walCpRecordFsyncDuration=123ms, writeCheckpointEntryDuration=4ms, splitAndSortCpPagesDuration=4155ms, pages=10516815, reason='too big size of WAL without checkpoint'] All operations are blocked at this moment. 
Sometimes, it can lead to a complete disaster: Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855* {quote}“data-streamer-stripe-78-#175” #209 prio=5 os_prio=0 tid=0x7f6161d6a800 nid=0xf932 waiting on condition [0x7f5c292d1000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483) at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394) at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369) at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894) at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3441) at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:770) at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2278) at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:139) at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7104) at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:966) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559) at
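The hang above can be illustrated in miniature with a plain ReentrantReadWriteLock standing in for the checkpoint lock (a simplified sketch, not Ignite's actual implementation): while a throttled thread is parked still holding the read lock, the checkpointer's write-lock attempt cannot succeed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockDemo {
    /**
     * Models the hang: a "streamer" thread is parked (throttled) while still
     * holding the read lock, so the "checkpointer" cannot take the write lock.
     * Returns whether the write lock was acquired within the timeout.
     */
    static boolean checkpointerCanLock() {
        ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock();
        CountDownLatch readLocked = new CountDownLatch(1);

        Thread streamer = new Thread(() -> {
            cpLock.readLock().lock();       // checkpoint read lock taken...
            readLocked.countDown();
            // ...then the thread is parked by throttling, lock still held.
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(500));
            cpLock.readLock().unlock();
        });
        streamer.start();

        try {
            readLocked.await();
            // The checkpointer's write-lock attempt stalls until the parked reader wakes up.
            boolean locked = cpLock.writeLock().tryLock(50, TimeUnit.MILLISECONDS);
            if (locked)
                cpLock.writeLock().unlock();
            streamer.join();
            return locked;
        }
        catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

In the real log above the same situation lasted for over 800 seconds because the park timeout itself was huge.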
[jira] [Created] (IGNITE-14110) Create networking module
Anton Kalashnikov created IGNITE-14110: -- Summary: Create networking module Key: IGNITE-14110 URL: https://issues.apache.org/jira/browse/IGNITE-14110 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov We need to create a networking module with an initial API and a simple implementation, to be improved further. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14092) Design network address resolver
Anton Kalashnikov created IGNITE-14092: -- Summary: Design network address resolver Key: IGNITE-14092 URL: https://issues.apache.org/jira/browse/IGNITE-14092 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov We need to design a network address resolver / IP finder / discovery mechanism that helps choose the right IP/port for a connection. Perhaps we don't need such a service at all, but that should be explicitly agreed upon. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14091) Implement messaging service
Anton Kalashnikov created IGNITE-14091: -- Summary: Implement messaging service Key: IGNITE-14091 URL: https://issues.apache.org/jira/browse/IGNITE-14091 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov We need to implement the ability to send/receive messages to/from network members:
* there is a requirement to be able to send idempotent messages with very weak guarantees:
** no delivery guarantees required;
** multiple copies of the same message might be sent;
** no need for any kind of acknowledgement;
* there is another requirement for the common case:
** a message must be sent exactly once, with an acknowledgement that it has actually been received (not necessarily processed);
** messages must be received in the same order they were sent.
These message types might utilize the current recovery protocol with acks every 32 (or so) messages. This setting must be flexible enough so that we won't get OOM on big topologies. -- This message was sent by Atlassian Jira (v8.3.4#803005)
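The ack-batching rule for the second tier ("acks every 32 or so messages") could be sketched as a small receiver-side counter. AckBatcher is a hypothetical helper, not an existing Ignite class:

```java
/** Hypothetical ack batching: acknowledge every N received messages. */
public class AckBatcher {
    private final int ackEvery;
    private long received;

    public AckBatcher(int ackEvery) {
        this.ackEvery = ackEvery;
    }

    /** Returns true when an acknowledgement should be sent for this message. */
    public boolean onReceived() {
        received++;
        return received % ackEvery == 0;
    }
}
```

Making the batch size configurable is what gives the flexibility mentioned above: smaller batches bound the sender's unacked buffer on big topologies at the cost of more ack traffic.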
[jira] [Created] (IGNITE-14090) Networking API
Anton Kalashnikov created IGNITE-14090: -- Summary: Networking API Key: IGNITE-14090 URL: https://issues.apache.org/jira/browse/IGNITE-14090 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov We need to design a convenient public API for the networking module that allows getting information about network members and sending messages to / receiving messages from them. Draft:
{noformat}
public interface NetworkService {
    static NetworkService create(NetworkConfiguration cfg);

    void shutdown() throws ???;

    NetworkMember localMember();

    Collection<NetworkMember> remoteMembers();

    void weakSend(NetworkMember member, Message msg);

    Future<?> guaranteedSend(NetworkMember member, Message msg);

    void listenMembers(MembershipListener lsnr);

    void listenMessages(Consumer<Message> lsnr);
}

public interface MembershipListener {
    void onAppeared(NetworkMember member);

    void onDisappeared(NetworkMember member);

    void onAcceptedByGroup(List<NetworkMember> remoteMembers);
}

public interface NetworkMember {
    UUID id();
}
{noformat}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14089) Override scalecube internal message by custom one
Anton Kalashnikov created IGNITE-14089: -- Summary: Override scalecube internal message by custom one Key: IGNITE-14089 URL: https://issues.apache.org/jira/browse/IGNITE-14089 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov There is some custom logic in the networking module, like a specific handshake, message recovery, etc., which requires specific messages, but at the same time the default scalecube behaviour should keep working correctly. So we need to layer one logic over the other. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14088) Implement scalecube transport API over netty
Anton Kalashnikov created IGNITE-14088: -- Summary: Implement scalecube transport API over netty Key: IGNITE-14088 URL: https://issues.apache.org/jira/browse/IGNITE-14088 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov scalecube has its own netty inside, but the idea is to integrate our extended netty into it. This will help us support more features, like our own handshake, marshalling, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14086) Implement retry of establishing connection if it was lost
Anton Kalashnikov created IGNITE-14086: -- Summary: Implement retry of establishing connection if it was lost Key: IGNITE-14086 URL: https://issues.apache.org/jira/browse/IGNITE-14086 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov We need to implement a retry of establishing the connection. It is not clear which way is better to implement this, because the current implementation is too difficult to configure (number of retries, several retry-time properties). So we need to think of a better way to configure it, and then implement it. Perhaps scalecube (gossip protocol) already does all the work and we should do nothing here. Need to recheck. -- This message was sent by Atlassian Jira (v8.3.4#803005)
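One simpler configuration scheme, sketched under the assumption that two knobs (base delay and a cap) are enough, is capped exponential backoff. Backoff is a hypothetical helper, not part of any existing retry implementation:

```java
/** Hypothetical two-knob retry policy: exponential backoff with a cap. */
public class Backoff {
    /**
     * Delay before the given retry attempt (0-based): baseMs * 2^attempt,
     * capped at maxMs. The shift is clamped to avoid long overflow.
     */
    static long delayMs(int attempt, long baseMs, long maxMs) {
        long d = baseMs << Math.min(attempt, 20);
        return Math.min(d, maxMs);
    }
}
```

Two parameters replace the current mix of retry counts and several time properties; retrying simply stops once a separate total-timeout budget is exhausted.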
[jira] [Created] (IGNITE-14085) Implement message recovery protocol over handshake
Anton Kalashnikov created IGNITE-14085: -- Summary: Implement message recovery protocol over handshake Key: IGNITE-14085 URL: https://issues.apache.org/jira/browse/IGNITE-14085 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov The central idea of the recovery protocol is the same as in the current implementation. So we need to implement a similar idea with a recovery descriptor. This means that information about the last sent/received messages should be exchanged during the handshake, and according to this information, messages which were not received should be sent one more time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
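The descriptor described above might look roughly like this (a hypothetical sketch; the real recovery descriptor in Ignite 2.x is considerably more involved): the sender buffers messages until they are acknowledged, and on reconnect replays everything past the index the peer reports in the handshake.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recovery descriptor: buffers sent messages until acknowledged. */
public class RecoveryDescriptor<M> {
    private final ArrayDeque<M> unacked = new ArrayDeque<>();
    private long lastSentIdx;   // index of the last message sent
    private long lastAckedIdx;  // index the peer confirmed receiving

    /** Records an outgoing message so it can be replayed if the connection drops. */
    public long onSent(M msg) {
        unacked.addLast(msg);
        return ++lastSentIdx;
    }

    /** Peer acknowledged everything up to idx (inclusive); drop those messages. */
    public void onAcked(long idx) {
        while (lastAckedIdx < idx && !unacked.isEmpty()) {
            unacked.pollFirst();
            lastAckedIdx++;
        }
    }

    /** On reconnect the peer reports its last received index; replay the rest. */
    public List<M> messagesToReplay(long peerLastReceivedIdx) {
        onAcked(peerLastReceivedIdx);
        return new ArrayList<>(unacked);
    }
}
```

With batched acks (every 32 or so messages, as in IGNITE-14091) the unacked buffer stays small in the steady state.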
[jira] [Created] (IGNITE-14084) Integrate direct marshalling to networking
Anton Kalashnikov created IGNITE-14084: -- Summary: Integrate direct marshalling to networking Key: IGNITE-14084 URL: https://issues.apache.org/jira/browse/IGNITE-14084 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Direct marshalling can be extracted from Ignite 2.x and integrated into Ignite 3.0. It helps avoid extra data copies when sending/receiving messages. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14083) Add SSL support to networking
Anton Kalashnikov created IGNITE-14083: -- Summary: Add SSL support to networking Key: IGNITE-14083 URL: https://issues.apache.org/jira/browse/IGNITE-14083 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov We need to add the ability to establish an SSL connection. It looks like it should not be a problem, but at the very least we need to design a configuration which allows managing SSL (path to certificate, password, etc.). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14082) Implementation of handshake for new connection
Anton Kalashnikov created IGNITE-14082: -- Summary: Implementation of handshake for new connection Key: IGNITE-14082 URL: https://issues.apache.org/jira/browse/IGNITE-14082 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov We need to implement the handshake after netty establishes the connection. Perhaps it makes sense to use netty handlers. During the handshake, we need to exchange the instanceId between the two endpoints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14081) Networking module
Anton Kalashnikov created IGNITE-14081: -- Summary: Networking module Key: IGNITE-14081 URL: https://issues.apache.org/jira/browse/IGNITE-14081 Project: Ignite Issue Type: New Feature Reporter: Anton Kalashnikov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14055) Deadlock in timeoutObjectProcessor between 'send message' & 'handshake timeout'
Anton Kalashnikov created IGNITE-14055: -- Summary: Deadlock in timeoutObjectProcessor between 'send message' & 'handshake timeout' Key: IGNITE-14055 URL: https://issues.apache.org/jira/browse/IGNITE-14055 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov The cluster hangs after JVM pauses on one of the server nodes. Scenario: 1. Start three server nodes with put operations using StartServerWithTxPuts. 2. Emulate JVM freezes on one server node by running the attached script: {{sh freeze.sh}} 3. Wait until the script has finished. Result: The cluster hangs on tx put operations. The first server node continuously prints: {noformat} [2020-11-03 09:36:01,719][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57714][2020-11-03 09:36:01,720][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3][2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57716][2020-11-03 09:36:01,922][INFO ][grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3][2020-11-03 09:36:02,124][INFO ][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57718][2020-11-03 09:36:02,125][INFO 
][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3][2020-11-03 09:36:02,326][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57720][2020-11-03 09:36:02,327][INFO ][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3][2020-11-03 09:36:02,528][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:57722][2020-11-03 09:36:02,529][INFO ][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=5defd32f-5bdb-4b9e-8a6e-5ee268edac42, locNodeOrder=1, rmtNode=07583a9d-36c8-4100-a69c-8cbd26ca82c9, rmtNodeOrder=3] {noformat} The second node prints long-running transactions in the prepared state, ignoring the default tx timeout: {noformat} [2020-11-03 09:36:46,199][WARN ][sys-#83%56b4f715-82d6-4d63-ba99-441ffcd673b4%][diagnostic] >>> Future [startTime=09:33:08.496, curTime=09:36:46.181, fut=GridNearTxFinishFuture [futId=425decc8571-4ce98554-8c56-4daf-a7a9-5b9bff52fa08, tx=GridNearTxLocal [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping [entries=LinkedHashSet [IgniteTxEntry [txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], cacheId=-923393186], val=TxEntryValueHolder [val=CacheObjectByteArrayImpl 
[arrLen=1048576], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=833, val=833, hasValBytes=true], val=null, ver=GridCacheVersion [topVer=0, order=0, nodeOrder=0], hash=833, extras=null, flags=0]]], prepared=0, locked=false, nodeId=07583a9d-36c8-4100-a69c-8cbd26ca82c9, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion [topVer=215865159, order=1604385188157, nodeOrder=2]]], explicitLock=false, queryUpdate=false,
[jira] [Created] (IGNITE-13972) Clear the item id before moving the page to the reuse bucket
Anton Kalashnikov created IGNITE-13972: -- Summary: Clear the item id before moving the page to the reuse bucket Key: IGNITE-13972 URL: https://issues.apache.org/jira/browse/IGNITE-13972 Project: Ignite Issue Type: Task Reporter: Anton Kalashnikov There is an assert - 'Incorrectly recycled pageId in reuse bucket:' (org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList#takeEmptyPage). This assert sometimes fails. The reason is not clear, because the same condition is checked before putting the page into the reuse bucket. (Perhaps we have more than one link to this page?) There is an idea to reset the item id to 1 before putting a page into the reuse bucket, in order to reduce the number of possible invariants that can break this assert. This is already true for all data pages, but the item id can still be greater than 1 if it is not a data page (e.g. an inner page). After that, we can change this assert from checking a range to checking equality to 1, which should theoretically help us detect the problem faster. Maybe it is also not a bad idea to set the item id to an otherwise impossible value (e.g. 0 or 255). Then we can add an assert on every take from the free list checking that the item id is greater than 0; if that is false, it means we have a link to a reuse-bucket page from a bucket which is not the reuse bucket, which is a bug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
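The recycling idea can be sketched with a hypothetical link layout (the bit widths below are illustrative, not Ignite's actual PageIdUtils layout): forcing the item id to 1 on recycle lets the takeEmptyPage assert check strict equality instead of a range.

```java
/** Hypothetical link layout: top 8 bits hold the item id, low 56 bits the page id. */
public class PageLinks {
    static final int ITEM_ID_SHIFT = 56;
    static final long PAGE_ID_MASK = (1L << ITEM_ID_SHIFT) - 1;

    static long link(long pageId, int itemId) {
        return ((long) itemId << ITEM_ID_SHIFT) | (pageId & PAGE_ID_MASK);
    }

    static int itemId(long link) {
        return (int) (link >>> ITEM_ID_SHIFT);
    }

    static long pageId(long link) {
        return link & PAGE_ID_MASK;
    }

    /**
     * Recycle for the reuse bucket: force the item id to 1 so a later
     * take can assert strict equality rather than a valid range.
     */
    static long recycle(long link) {
        return link(pageId(link), 1);
    }
}
```

The same helpers show the stricter check: a page taken from any non-reuse free list should never carry an item id of 0 (or whatever impossible value is chosen), so an assert on itemId(link) there would catch a stray link to a recycled page early.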
[jira] [Created] (IGNITE-13843) Wrapper/Converter for primitive configuration
Anton Kalashnikov created IGNITE-13843: -- Summary: Wrapper/Converter for primitive configuration Key: IGNITE-13843 URL: https://issues.apache.org/jira/browse/IGNITE-13843 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Do we need the ability to use a complex type such as InternetAddress as a wrapper over some string property? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13842) Creating the new configuration on old cluster
Anton Kalashnikov created IGNITE-13842: -- Summary: Creating the new configuration on old cluster Key: IGNITE-13842 URL: https://issues.apache.org/jira/browse/IGNITE-13842 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Do we need the ability to create a new configuration/property on a running cluster? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13841) Cluster bootstrapping
Anton Kalashnikov created IGNITE-13841: -- Summary: Cluster bootstrapping Key: IGNITE-13841 URL: https://issues.apache.org/jira/browse/IGNITE-13841 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov What should cluster bootstrapping look like? What is the format of the files? What is the right moment for applying the configuration? What is the state of the cluster before it is applied? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13840) Rethink API of Init*, change* classes
Anton Kalashnikov created IGNITE-13840: -- Summary: Rethink API of Init*, change* classes Key: IGNITE-13840 URL: https://issues.apache.org/jira/browse/IGNITE-13840 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Right now, the API of the Init*, change* classes looks too heavy and contains a lot of boilerplate code. We need to think about how to simplify it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13837) Configuration initialization
Anton Kalashnikov created IGNITE-13837: -- Summary: Configuration initialization Key: IGNITE-13837 URL: https://issues.apache.org/jira/browse/IGNITE-13837 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov We need to think about what the first initialization of a node/cluster should look like. What is the format of the initial properties (json/hocon, etc.)? How should they be handled? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13836) Multiple property roots support
Anton Kalashnikov created IGNITE-13836: -- Summary: Multiple property roots support Key: IGNITE-13836 URL: https://issues.apache.org/jira/browse/IGNITE-13836 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Right now, Configurator is able to manage only one root, and it looks like that is not enough. The current idea is to provide the ability to maintain multiple property roots, which allows other modules to create their own roots as needed, e.g.:
* indexing.query.bufferSize
* persistence.pageSize
NB: there is no local/cluster root, because it looks like local/cluster shouldn't be part of the path at all. Perhaps it should be a storage-specific feature rather than a property-path-specific one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
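Multiple roots could be dispatched by the first path segment; ConfigRoots is a hypothetical sketch of such a registry, not the actual Configurator API:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical root registry: each module registers its own configuration root. */
public class ConfigRoots {
    private final Map<String, Object> roots = new HashMap<>();

    /** A module (e.g. indexing, persistence) registers its root once. */
    public void registerRoot(String name, Object root) {
        if (roots.putIfAbsent(name, root) != null)
            throw new IllegalStateException("Root already registered: " + name);
    }

    /** Resolves e.g. "indexing.query.bufferSize" by its first path segment. */
    public Object rootFor(String propertyPath) {
        return roots.get(propertyPath.split("\\.", 2)[0]);
    }
}
```

Keeping the local/cluster distinction out of the path, as noted above, means this lookup stays purely structural and the storage layer decides where a root's values live.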
[jira] [Created] (IGNITE-13720) Defragmentation parallelism implementation
Anton Kalashnikov created IGNITE-13720: -- Summary: Defragmentation parallelism implementation Key: IGNITE-13720 URL: https://issues.apache.org/jira/browse/IGNITE-13720 Project: Ignite Issue Type: Sub-task Components: persistence Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Defragmentation is executed in a single thread right now. It makes sense to defragment the partitions of one group in parallel. Several parameters will be added to the defragmentation configuration:
* checkpointThreadPoolSize - the size of the thread pool used by the checkpointer for writing defragmented pages to disk.
* executionThreadPoolSize - the size of the thread pool, i.e. how many partitions at most can be defragmented at the same time.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
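The executionThreadPoolSize parameter could bound the parallelism roughly like this (a sketch; defragmentPartition is a placeholder for the real per-partition work):

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelDefrag {
    /**
     * Defragments each partition of a group on a bounded pool, so at most
     * executionThreadPoolSize partitions are processed at the same time.
     * Returns the set of partitions processed.
     */
    static Set<Integer> defragment(List<Integer> partitions, int executionThreadPoolSize) {
        Set<Integer> done = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(executionThreadPoolSize);

        for (int part : partitions)
            pool.submit(() -> done.add(defragmentPartition(part)));

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return done;
    }

    /** Placeholder for the real per-partition work (copying live entries). */
    static int defragmentPartition(int part) {
        return part;
    }
}
```

The checkpointThreadPoolSize knob would size a separate pool on the write side, so page copying and disk writes scale independently.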
[jira] [Created] (IGNITE-13684) Rewrite PageIo resolver from static to explicit dependency
Anton Kalashnikov created IGNITE-13684: -- Summary: Rewrite PageIo resolver from static to explicit dependency Key: IGNITE-13684 URL: https://issues.apache.org/jira/browse/IGNITE-13684 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Assignee: Ivan Bessonov Right now, ignite has a static pageIo resolver which does not allow substituting a different implementation when needed. So the current implementation needs to be rewritten to support this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13683) Added MVCC validation to ValidateIndexesClosure
Anton Kalashnikov created IGNITE-13683: -- Summary: Added MVCC validation to ValidateIndexesClosure Key: IGNITE-13683 URL: https://issues.apache.org/jira/browse/IGNITE-13683 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Assignee: Semyon Danilov MVCC indexes validation should be added to ValidateIndexesClosure -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13682) Added generic to maintenance mode feature
Anton Kalashnikov created IGNITE-13682: -- Summary: Added generic to maintenance mode feature Key: IGNITE-13682 URL: https://issues.apache.org/jira/browse/IGNITE-13682 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov MaintenanceAction has no generic type parameter right now, which leads to parameterization (raw type) problems. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13681) Non markers checkpoint implementation
Anton Kalashnikov created IGNITE-13681: -- Summary: Non markers checkpoint implementation Key: IGNITE-13681 URL: https://issues.apache.org/jira/browse/IGNITE-13681 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov We need to implement a new version of the checkpoint which will be simpler than the current one. The main differences compared to the current checkpoint:
* It doesn't perform any write operations to the WAL.
* It doesn't create checkpoint markers.
* It should be possible to configure a checkpoint listener on an exact data region only.
This checkpoint will be helpful for defragmentation and for recovery (it is not possible to use the current checkpoint during recovery right now). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13569) Disabled archiving + walCompactionEnabled probably breaks reading from WAL on server restart
Anton Kalashnikov created IGNITE-13569: -- Summary: Disabled archiving + walCompactionEnabled probably breaks reading from WAL on server restart Key: IGNITE-13569 URL: https://issues.apache.org/jira/browse/IGNITE-13569 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov
* Start a cluster with 4 server nodes
* Preload
* Start 4 clients
* Start transactional loading
* Wait 10 sec
While loading, for each node in the server nodes: kill -9 the node, wait 20 sec, return the node back, wait 20 sec. Wal + Wal_archive - lab40, lab41 - /storage/hdd/aromantsov/GG-18739 It looks like the node can't read all the WAL files that were generated before the node was brought back: {noformat} [12:50:27,001][SEVERE][wal-file-compressor-%null%-1-#71][FileWriteAheadLogManager] Compression of WAL segment [idx=0] was skipped due to unexpected error class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: /storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/.wal at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) [12:50:27,001][SEVERE][wal-file-compressor-%null%-0-#69][FileWriteAheadLogManager] Compression of WAL segment [idx=2] was skipped due to unexpected error class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: /storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0002.wal at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076) at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.access$4800(FileWriteAheadLogManager.java:2019) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressor.body(FileWriteAheadLogManager.java:1995) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) [12:50:27,001][SEVERE][wal-file-compressor-%null%-3-#73][FileWriteAheadLogManager] Compression of WAL segment [idx=3] was skipped due to unexpected error class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: /storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0003.wal at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) [12:50:27,001][SEVERE][wal-file-compressor-%null%-2-#72][FileWriteAheadLogManager] Compression of WAL segment [idx=1] was skipped due to unexpected error class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: /storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0001.wal at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) [12:50:27,002][SEVERE][wal-file-compressor-%null%-1-#71][FileWriteAheadLogManager] Compression of WAL segment 
[idx=4] was skipped due to unexpected error class org.apache.ignite.IgniteCheckedException: WAL archive segment is missing: /storage/ssd/aromantsov/tiden/snapshots-190514-121520/test_pitr/node_1_1/0004.wal at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:2076) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:2054) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) [12:50:27,002][SEVERE][wal-file-compressor-%null%-0-#69][FileWriteAheadLogManager]
[jira] [Created] (IGNITE-13562) Prototype dynamic configuration
Anton Kalashnikov created IGNITE-13562: -- Summary: Prototype dynamic configuration Key: IGNITE-13562 URL: https://issues.apache.org/jira/browse/IGNITE-13562 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Assignee: Semyon Danilov The main target is to add a new configuration module with a framework that allows us to create dynamic properties (node-local and cluster-wide?). The framework should provide the following:
* Describing a rule for the schema by which public and private property classes would be generated
* Implementing generation of public and private classes from the schema
* Describing a view of public POJOs (update/insert/get) to interact with properties in a type-safe way
* Converting properties from HOCON to the inner view
-- This message was sent by Atlassian Jira (v8.3.4#803005)
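The generated, type-safe views could look roughly like this hand-written stand-in (hypothetical; the class and property names are illustrative), wrapping a flat property map such as one produced by parsing HOCON:

```java
import java.util.Map;

/** Hypothetical stand-in for a generated, type-safe configuration view. */
public class LocalNodeView {
    /** Flat inner view, e.g. "node.port" -> 47500, produced from HOCON. */
    private final Map<String, Object> flat;

    public LocalNodeView(Map<String, Object> flat) {
        this.flat = flat;
    }

    // Accessors the framework would generate from the schema:
    public String name() {
        return (String) flat.get("node.name");
    }

    public int port() {
        return ((Number) flat.get("node.port")).intValue();
    }
}
```

Generating such classes from a schema, rather than writing them by hand, is what keeps the public API and the inner HOCON-derived view from drifting apart.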
[jira] [Created] (IGNITE-13511) Unified configuration
Anton Kalashnikov created IGNITE-13511: -- Summary: Unified configuration Key: IGNITE-13511 URL: https://issues.apache.org/jira/browse/IGNITE-13511 Project: Ignite Issue Type: New Feature Reporter: Anton Kalashnikov https://cwiki.apache.org/confluence/display/IGNITE/IEP-55+Unified+Configuration -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13500) Checkpoint read lock fails if it is taken under write lock while the node is stopping
Anton Kalashnikov created IGNITE-13500: -- Summary: Checkpoint read lock fails if it is taken under write lock while the node is stopping Key: IGNITE-13500 URL: https://issues.apache.org/jira/browse/IGNITE-13500 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov org.apache.ignite.internal.processors.cache.index.BasicIndexTest#testDynamicIndexesDropWithPersistence {noformat} [2020-09-30 15:09:26,085][ERROR][db-checkpoint-thread-#371%index.BasicIndexTest0%][Checkpointer] Runtime error caught during grid runnable execution: GridWorker [name=db-checkpoint-thread, igniteInstanceName=index.BasicIndexTest0, finished=false, heartbeatTs=1601467766063, hashCode=963964001, interrupted=false, runner=db-checkpoint-thread-#371%index.BasicIndexTest0%] class org.apache.ignite.IgniteException: Failed to perform cache update: node is stopping. at org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:396) at org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:263) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) Caused by: class org.apache.ignite.IgniteException: Failed to perform cache update: node is stopping. 
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:128) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1298) at org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.removeDurableBackgroundTask(DurableBackgroundTasksProcessor.java:245) at org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.onMarkCheckpointBegin(DurableBackgroundTasksProcessor.java:277) at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointBegin(CheckpointWorkflow.java:274) at org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:387) ... 3 more Caused by: class org.apache.ignite.internal.NodeStoppingException: Failed to perform cache update: node is stopping. ... 9 more {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13368) Speed base throttling unexpectedly degraded to zero
Anton Kalashnikov created IGNITE-13368: -- Summary: Speed base throttling unexpectedly degraded to zero Key: IGNITE-13368 URL: https://issues.apache.org/jira/browse/IGNITE-13368 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov New test failure in master PagesWriteThrottleSmokeTest.testThrottle https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=2808794487465215609=%3Cdefault%3E=testDetails Throttling degraded to zero. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13207) Checkpointer code refactoring: Splitting GridCacheDatabaseSharedManager and Checkpointer
Anton Kalashnikov created IGNITE-13207: -- Summary: Checkpointer code refactoring: Splitting GridCacheDatabaseSharedManager and Checkpointer Key: IGNITE-13207 URL: https://issues.apache.org/jira/browse/IGNITE-13207 Project: Ignite Issue Type: Sub-task Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13080) Incorrect hash calculation for binaryObject in case of deduplication
Anton Kalashnikov created IGNITE-13080: -- Summary: Incorrect hash calculation for binaryObject in case of deduplication Key: IGNITE-13080 URL: https://issues.apache.org/jira/browse/IGNITE-13080 Project: Ignite Issue Type: Bug Components: binary Reporter: Anton Kalashnikov Let's suppose we have the following two classes (the implementation of SubKey doesn't matter here): {noformat} public static class Key { private SubKey subKey; } public static class Value { private SubKey subKey; private Key key; } {noformat} If subKey is the same object in Key and Value, and we do the following: {noformat} SubKey subKey = new SubKey(); Key key = new Key(subKey); Value value = new Value(subKey, key); cache.put(key, value); assert cache.size() == 1; // true BinaryObject keyAsBinaryObject = cache.get(key).field("key"); cache.put(keyAsBinaryObject, value); // cache.size() should still be 1 but it becomes 2 assert cache.size() == 1; // false: we now have two different keys, which is wrong {noformat} We get two different records instead of one. Reason: when we put the raw class Key into the cache, Ignite converts it to a binary object (literally a byte array), then calculates the hash over this byte array and stores it in the object. When we put the raw class Value, the same thing happens, but since Value holds two references to the same object (subKey), deduplication occurs. This means that the first time we meet an object, we save it as-is and remember its location; if we meet the same object again, instead of saving all its bytes we mark that place as a HANDLE and record only the offset at which the saved object can be found. After that, when we extract an inner object (Key) from the BinaryObject of Value, we don't get a new BinaryObject with a new byte array; instead, we get a BinaryObject backed by the same byte array plus an offset showing where the requested value (Key) starts.
And when we try to store this object in the cache, Ignite does it incorrectly: first, the byte array contains a HANDLE mark with an offset instead of the real bytes of the inner object, which is already wrong; on top of that, the hash is also calculated incorrectly. Problem: right now, Ignite isn't able to store a BinaryObject that contains a HANDLE, and as I understand it, this is not easy to fix. Maybe it makes sense to explicitly forbid working with a BinaryObject like the one described above, but of course this is debatable. Workaround: we can change the order of fields in Value, like this: {noformat} public static class Value { private Key key; private SubKey subKey; } {noformat} After that, subKey is inlined inside key, and subKey inside Value is represented as a HANDLE. Alternatively, we can rebuild the object: {noformat} keyAsBinaryObject.toBuilder().build(); {noformat} During this procedure, all HANDLEs are restored to real objects. -- This message was sent by Atlassian Jira (v8.3.4#803005)
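To make the hash mismatch concrete, here is a self-contained simulation (not Ignite's real binary format; the HANDLE marker byte and layout are invented for illustration): serializing the same inner object twice with full inlining produces different bytes, and therefore a different array hash, than serializing it once plus a HANDLE offset.

```java
import java.util.Arrays;

// Toy model of the dedup problem: same logical content, different byte layout,
// hence different byte-level hashes. Not Ignite's actual wire format.
class HandleHashDemo {
    static final byte HANDLE_MARK = (byte)0x65; // illustrative marker byte

    /** Serialize two copies of `payload`, inlining both occurrences. */
    static byte[] writeInlined(byte[] payload) {
        byte[] out = new byte[payload.length * 2];
        System.arraycopy(payload, 0, out, 0, payload.length);
        System.arraycopy(payload, 0, out, payload.length, payload.length);
        return out;
    }

    /** Serialize two copies of `payload`, deduplicating the second as a handle. */
    static byte[] writeWithHandle(byte[] payload) {
        byte[] out = new byte[payload.length + 2];
        System.arraycopy(payload, 0, out, 0, payload.length);
        out[payload.length] = HANDLE_MARK;
        out[payload.length + 1] = 0; // offset pointing back at the first copy
        return out;
    }

    public static void main(String[] args) {
        byte[] subKey = {1, 2, 3, 4};
        int inlinedHash = Arrays.hashCode(writeInlined(subKey));
        int handleHash = Arrays.hashCode(writeWithHandle(subKey));
        // Same logical object, different byte arrays: the hashes disagree,
        // which is why the cache ends up with two "different" keys.
        System.out.println(inlinedHash == handleHash); // false
    }
}
```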
[jira] [Created] (IGNITE-13041) PDS (Indexing) is failed with 137 code
Anton Kalashnikov created IGNITE-13041: -- Summary: PDS (Indexing) is failed with 137 code Key: IGNITE-13041 URL: https://issues.apache.org/jira/browse/IGNITE-13041 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Process exited with code 137 https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_PdsIndexing?branch=%3Cdefault%3E=overview=builds -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12817) Streamer threads don't update timestamp
Anton Kalashnikov created IGNITE-12817: -- Summary: Streamer threads don't update timestamp Key: IGNITE-12817 URL: https://issues.apache.org/jira/browse/IGNITE-12817 Project: Ignite Issue Type: Bug Components: streaming Reporter: Anton Kalashnikov Scenario: 1. Start 3 data nodes 2. Start load with a streamer on 6 clients 3. Start data nodes restarter Result: Keys weren't loaded in all (1000) caches. In the server node log I see: {noformat} [2019-07-17 16:52:36,881][ERROR][tcp-disco-msg-worker-#2] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-7, blockedFor=16s] [2019-07-17 16:52:36,883][WARN ][tcp-disco-msg-worker-#2] Thread [name="data-streamer-stripe-7-#24", id=43, state=WAITING, blockCnt=111, waitCnt=169964] [2019-07-17 16:52:36,885][ERROR][tcp-disco-msg-worker-#2] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]]] org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069] at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1838) ~[ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1833) ~[ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:230) ~[ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.5.9.jar:2.5.9] at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2804) ~[ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7568) [ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2866) [ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7506) [ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.5.9.jar:2.5.9] {noformat} The problem is in the data streamer threads: they should update their progress timestamps, but they don't. -- This message was sent by Atlassian Jira (v8.3.4#803005)
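The fix direction can be sketched with a toy heartbeat model, assuming a watchdog that compares each worker's last heartbeat timestamp against a blocked-worker timeout. The names below (HeartbeatSketch, isBlocked) are illustrative; the real logic lives in Ignite's GridWorker/WorkersRegistry.

```java
import java.util.concurrent.atomic.AtomicLong;

// A long-running worker must refresh its heartbeat from inside its work loop,
// otherwise the watchdog flags it as SYSTEM_WORKER_BLOCKED even though it is
// making progress. Timestamps are passed in explicitly to keep this testable.
class HeartbeatSketch {
    private final AtomicLong heartbeatTs;
    private final long blockedTimeoutMs;

    HeartbeatSketch(long blockedTimeoutMs, long nowMs) {
        this.blockedTimeoutMs = blockedTimeoutMs;
        this.heartbeatTs = new AtomicLong(nowMs);
    }

    /** The worker should call this periodically, even while throttled/parked. */
    void updateHeartbeat(long nowMs) { heartbeatTs.set(nowMs); }

    /** Watchdog check: a stale heartbeat makes the worker look blocked. */
    boolean isBlocked(long nowMs) { return nowMs - heartbeatTs.get() > blockedTimeoutMs; }

    public static void main(String[] args) {
        HeartbeatSketch worker = new HeartbeatSketch(10_000, 0);
        // Without heartbeat updates, 16 seconds of work trips the watchdog
        // (compare "blockedFor=16s" in the log above).
        System.out.println(worker.isBlocked(16_000)); // true
        // With a heartbeat refresh mid-work, the same 16 seconds pass quietly.
        worker.updateHeartbeat(8_000);
        System.out.println(worker.isBlocked(16_000)); // false
    }
}
```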
[jira] [Created] (IGNITE-12801) Possible extra page release when throttling and checkpoint thread store it concurrently
Anton Kalashnikov created IGNITE-12801: -- Summary: Possible extra page release when throttling and checkpoint thread store it concurrently Key: IGNITE-12801 URL: https://issues.apache.org/jira/browse/IGNITE-12801 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov * A user thread acquires the page on write release * The checkpoint thread sees that the page was acquired * The throttling thread sees that the page was acquired * The checkpoint thread saves the page to disk and releases it * The throttling thread sees that the page was already saved but nonetheless releases it again - this is not ok. {noformat} java.lang.AssertionError: null at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.copyPageForCheckpoint(PageMemoryImpl.java:1181) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.checkpointWritePage(PageMemoryImpl.java:1160) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$WriteCheckpointPages.writePages(GridCacheDatabaseSharedManager.java:4868) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$WriteCheckpointPages.run(GridCacheDatabaseSharedManager.java:4792) ... 3 common frames omitted {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
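One general way to make such a race harmless is to let only a single thread win the save-and-release step, e.g. with a compare-and-set. The sketch below is illustrative plain Java, not the actual Ignite page-memory code, and SingleReleasePage is an invented name.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Idempotent save-and-release: of the two racing threads (checkpointer vs.
// throttling helper), only the CAS winner stores and releases the page, so a
// double release like the AssertionError above cannot happen.
class SingleReleasePage {
    private final AtomicBoolean saved = new AtomicBoolean(false);
    private int releaseCount;

    /** Called concurrently by the checkpoint thread and the throttling thread. */
    boolean saveAndRelease() {
        if (!saved.compareAndSet(false, true))
            return false; // another thread already stored and released this page
        releaseCount++;   // safe: only the CAS winner reaches this line
        return true;
    }

    int releaseCount() { return releaseCount; }

    public static void main(String[] args) {
        SingleReleasePage page = new SingleReleasePage();
        boolean first = page.saveAndRelease();  // checkpoint thread wins
        boolean second = page.saveAndRelease(); // throttling thread loses the CAS
        System.out.println(first + " " + second + " releases=" + page.releaseCount());
    }
}
```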
[jira] [Created] (IGNITE-12714) Absence of default value of IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT
Anton Kalashnikov created IGNITE-12714: -- Summary: Absence of default value of IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT Key: IGNITE-12714 URL: https://issues.apache.org/jira/browse/IGNITE-12714 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Scenario: 1. Start 3 data nodes 2. Start load with a streamer on 6 clients 3. Start data nodes restarter Result: Keys weren't loaded in all (1000) caches. In the server node log I see: {noformat} [2019-07-17 16:52:36,881][ERROR][tcp-disco-msg-worker-#2] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-7, blockedFor=16s] [2019-07-17 16:52:36,883][WARN ][tcp-disco-msg-worker-#2] Thread [name="data-streamer-stripe-7-#24", id=43, state=WAITING, blockCnt=111, waitCnt=169964] [2019-07-17 16:52:36,885][ERROR][tcp-disco-msg-worker-#2] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]]] org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069] at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1838) ~[ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1833) ~[ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:230) ~[ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.5.9.jar:2.5.9] at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2804) ~[ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7568) [ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2866) [ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7506) [ignite-core-2.5.9.jar:2.5.9] at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.5.9.jar:2.5.9] {noformat} Logs: ftp://gg@172.25.2.50/poc-tester-logs/1723/log-2019-07-17-17-33-23 Log with dumps: ftp://gg@172.25.2.50/poc-tester-logs/1723/log-2019-07-17-17-33-23/servers/172.25.1.12/poc-tester-server-172.25.1.12-id-0-2019-07-17-16-46-58.log-1-2019-07-17.log.gz *Solution:* Increase timeout to 2 min org.apache.ignite.IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT -- This message was sent by Atlassian Jira (v8.3.4#803005)
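The proposed solution can be applied by setting the system property named in the ticket (value in milliseconds; 120000 = 2 minutes), typically as a JVM flag: -DIGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT=120000. A minimal demonstration using System.setProperty:

```java
// Raise the blocked-worker watchdog timeout to 2 minutes. The property name
// comes straight from the ticket
// (org.apache.ignite.IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT);
// the value is interpreted in milliseconds. In production this is normally
// passed on the JVM command line rather than set programmatically.
class WorkerTimeoutExample {
    public static void main(String[] args) {
        System.setProperty("IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT", "120000");
        System.out.println(System.getProperty("IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT"));
    }
}
```

Note the property must be set before the node starts; the watchdog reads it at startup.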
[jira] [Created] (IGNITE-12713) [Suite] PDS 1 flaky failed on TC
Anton Kalashnikov created IGNITE-12713: -- Summary: [Suite] PDS 1 flaky failed on TC Key: IGNITE-12713 URL: https://issues.apache.org/jira/browse/IGNITE-12713 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov IgnitePdsTestSuite: BPlusTreeReuseListPageMemoryImplTest.testIterateConcurrentPutRemove_2 IgnitePdsTestSuite: BPlusTreeReuseListPageMemoryImplTest.testMassiveRemove2_false -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12712) NPE in checkpoint thread
Anton Kalashnikov created IGNITE-12712: -- Summary: NPE in checkpoint thread Key: IGNITE-12712 URL: https://issues.apache.org/jira/browse/IGNITE-12712 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov An NPE occurred in the checkpoint thread (rarely reproduced): {noformat} [2019-11-04 20:54:58,018][INFO ][sys-#50][GridDhtPartitionsExchangeFuture] Received full message, will finish exchange [node=1784645d-3bef-44fe-8288-e0c16202f5e3, resVer=AffinityTopologyVersion [topVer=4, minorTopVer=9]] [2019-11-04 20:54:58,023][INFO ][sys-#50][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], resVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], err=null] [2019-11-04 20:54:58,029][INFO ][sys-#50][GridCacheProcessor] Finish proxy initialization, cacheName=SQL_PUBLIC_T8, localNodeId=5b153e14-70f2-4408-a125-584752532ebd [2019-11-04 20:54:58,030][INFO ][sys-#50][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=5b153e14-70f2-4408-a125-584752532ebd, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode [id=1784645d-3bef-44fe-8288-e0c16202f5e3, consistentId=1, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1572890071469, loc=false, ver=8.7.8#20191101-sha1:e344ed04, isClient=false], done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=4, minorTopVer=9]] [2019-11-04 20:54:58,030][INFO ][sys-#50][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], resVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters initialization" (0 ms), stage="Update caches registry" (0 ms), stage="Start caches" (52 ms), stage="Affinity initialization on cache group start" (1 ms), 
stage="Determine exchange type" (0 ms), stage="Preloading notification" (0 ms), stage="WAL history reservation" (0 ms), stage="Wait partitions release" (1 ms), stage="Wait partitions release latch" (5 ms), stage="Wait partitions release" (0 ms), stage="Restore partition states" (7 ms), stage="After states restored callback" (10 ms), stage="Waiting for Full message" (59 ms), stage="Affinity recalculation" (0 ms), stage="Full map updating" (4 ms), stage="Exchange done" (7 ms), stage="Total time" (146 ms)] [2019-11-04 20:54:58,030][INFO ][sys-#50][GridDhtPartitionsExchangeFuture] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], resVer=AffinityTopologyVersion [topVer=4, minorTopVer=9], stage="Affinity initialization on cache group start [grp=SQL_PUBLIC_T8]" (1 ms) (parent=Affinity initialization on cache group start), stage="Restore partition states [grp=SQL_PUBLIC_T8]" (6 ms) (parent=Restore partition states), stage="Restore partition states [grp=ignite-sys-cache]" (3 ms) (parent=Restore partition states), stage="Restore partition states [grp=cache_group_3]" (0 ms) (parent=Restore partition states)] [2019-11-04 20:54:58,037][INFO ][exchange-worker-#45][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=4, minorTopVer=9], force=false, evt=DISCOVERY_CUSTOM_EVT, node=1784645d-3bef-44fe-8288-e0c16202f5e3] [2019-11-04 20:54:58,713][INFO ][db-checkpoint-thread-#53][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=82969270-b1a5-4480-9513-3af65bab0e17, startPtr=FileWALPointer [idx=0, fileOff=3550077, len=12350], checkpointBeforeLockTime=8ms, checkpointLockWait=4ms, checkpointListenersExecuteTime=56ms, checkpointLockHoldTime=61ms, walCpRecordFsyncDuration=4ms, writeCheckpointEntryDuration=8ms, splitAndSortCpPagesDuration=1ms, pages=178, reason='timeout'] [2019-11-04 20:54:58,914][INFO ][exchange-worker-#45][time] Started exchange init 
[topVer=AffinityTopologyVersion [topVer=4, minorTopVer=10], crd=false, evt=DISCOVERY_CUSTOM_EVT, evtNode=1784645d-3bef-44fe-8288-e0c16202f5e3, customEvt=DynamicCacheChangeBatch [id=8b06d873e61-af9e27a6-8fe9-4da1-bc0a-d19cd0eabd36, reqs=ArrayList [DynamicCacheChangeRequest [cacheName=SQL_PUBLIC_T9, hasCfg=true, nodeId=1784645d-3bef-44fe-8288-e0c16202f5e3, clientStartOnly=false, stop=false, destroy=false, disabledAfterStartfalse]], exchangeActions=ExchangeActions [startCaches=[SQL_PUBLIC_T9], stopCaches=null, startGrps=[cache_group_4], stopGrps=[], resetParts=null, stateChangeRequest=null], startCaches=false], allowMerge=false] [2019-11-04 20:54:58,930][INFO ][exchange-worker-#45][PageMemoryImpl] Started page memory [memoryAllocated=200.0 MiB, pages=49630, tableSize=3.9 MiB, checkpointBuffer=200.0 MiB] [2019-11-04
[jira] [Created] (IGNITE-12709) Server latch initialized after client latch in Zookeeper discovery
Anton Kalashnikov created IGNITE-12709: -- Summary: Server latch initialized after client latch in Zookeeper discovery Key: IGNITE-12709 URL: https://issues.apache.org/jira/browse/IGNITE-12709 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov The coordinator node missed the latch message from the client because it didn't receive the message that triggered the exchange. This leads to an infinite wait for an answer from the coordinator. {noformat} [2019-10-23 12:49:42,110]\[ERROR]\[sys-#39470%continuous.GridEventConsumeSelfTest0%]\[GridIoManager] An error occurred processing the message \[msg=GridIoMessage \[plc=2, topic=TOPIC_EXCHANGE, topicOrd=31, ordered=false, timeout=0, skipOnTimeout=false, msg=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.LatchAckMessage@7699f4f2], nodeId=857a40a8-f384-4740-816c-dd54d3a1]. class org.apache.ignite.IgniteException: Topology AffinityTopologyVersion \[topVer=54, minorTopVer=0] not found in discovery history ; consider increasing IGNITE_DISCOVERY_HISTORY_SIZE property. 
Current value is -1 at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.aliveNodesForTopologyVer(ExchangeLatchManager.java:292) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.getLatchCoordinator(ExchangeLatchManager.java:334) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.processAck(ExchangeLatchManager.java:379) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.lambda$new$0(ExchangeLatchManager.java:119) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1632) at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1252) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:143) at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1143) at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [2019-10-23 12:50:02,106]\[WARN ]\[exchange-worker-#39517%continuous.GridEventConsumeSelfTest1%]\[GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: ClientLatch \[coordinator=ZookeeperClusterNode \[id=760ca6b5-f30b-4c40-81b1-5b602c20, addrs=\[127.0.0.1], order=1, loc=false, client=false], ackSent=true, super=CompletableLatch \[id=CompletableLatchUid \[id=exchange, topVer=AffinityTopologyVersion \[topVer=54, minorTopVer=0 [2019-10-23 12:50:02,192]\[WARN ]\[exchange-worker-#39469%continuous.GridEventConsumeSelfTest0%]\[GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: 
ServerLatch \[permits=1, pendingAcks=HashSet \[06c3094b-c1f3-4fe8-81e8-22cb6602], super=CompletableLatch \[id=CompletableLatchUid \[id=exchange, topVer=AffinityTopologyVersion \[topVer=54, minorTopVer=0 {noformat} Reproduced by org.apache.ignite.internal.processors.continuous.GridEventConsumeSelfTest#testMultithreadedWithNodeRestart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12653) Add example of baseline auto-adjust feature
Anton Kalashnikov created IGNITE-12653: -- Summary: Add example of baseline auto-adjust feature Key: IGNITE-12653 URL: https://issues.apache.org/jira/browse/IGNITE-12653 Project: Ignite Issue Type: Task Components: examples Reporter: Anton Kalashnikov Work on Phase II of IEP-4 (Baseline topology) [1] has finished, so it makes sense to implement some examples of "Baseline auto-adjust" [2]. "Baseline auto-adjust" implements a mechanism that automatically adjusts the baseline to the current topology after a node join/left event. It is needed because when a node leaves the grid and nobody changes the baseline manually, data can be lost (when more nodes leave the grid, depending on the backup factor), yet permanently watching the grid is not always possible/desirable. In many cases, auto-adjusting the baseline after some timeout is very helpful. Distributed metastore [3] (already done): first of all, we need the ability to store configuration data consistently and cluster-wide. Ignite doesn't have any specific API for such configurations, and we don't want many similar implementations of the same feature in our code. After some thought, it was proposed to implement it as a kind of distributed metastorage that gives the ability to store any data in it. The first implementation is based on the existing local metastorage API for persistent clusters (in-memory clusters will store data in memory). Write/remove operations use the Discovery SPI to send updates to the cluster, which guarantees the order of updates and the fact that all existing (alive) nodes have handled the update message. To find out which node has the latest data, there is a "version" value of the distributed metastorage. All update history up to some point in the past is stored along with the data, so when an outdated node connects to the cluster it will receive all the missing data and apply it locally. 
If there's not enough history stored, or the joining node is clean, it will receive a snapshot of the distributed metastorage, so there won't be inconsistencies. Baseline auto-adjust: Main scenario: - There is a grid whose baseline is equal to the current topology - A new node joins the grid, or some node leaves (fails) - The new mechanism detects this event and adds a baseline-change task to a queue with the configured timeout - If a new event happens before the baseline is changed, the task is removed from the queue and a new task is added - When the timeout expires, the task tries to set a new baseline corresponding to the current topology First of all, we need to add two parameters [4]: - baselineAutoAdjustEnabled - enables/disables the "Baseline auto-adjust" feature - baselineAutoAdjustTimeout - the timeout after which the baseline should be changed These parameters are cluster-wide and can be changed at runtime because they are stored in the distributed metastore. Restrictions: - This mechanism handles events only on an active grid - It is enabled by default for in-memory nodes and disabled for persistent nodes - If lost partitions are detected, this feature is disabled - If the baseline was adjusted manually while baselineNodes != gridNodes, an exception is thrown [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-4+Baseline+topology+for+caches [2] https://issues.apache.org/jira/browse/IGNITE-8571 [3] https://issues.apache.org/jira/browse/IGNITE-10640 [4] https://issues.apache.org/jira/browse/IGNITE-8573 -- This message was sent by Atlassian Jira (v8.3.4#803005)
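The queue-with-timeout behaviour in the main scenario above is essentially a debounce: every topology event cancels the pending baseline change and re-arms the timer. A minimal sketch using a ScheduledExecutorService (illustrative only, not the Ignite implementation; the class name is invented):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Debounce sketch: the baseline is adjusted only after the topology has been
// quiet for the full timeout, matching the described task-queue behaviour.
class BaselineAutoAdjustSketch {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMs;
    private final Runnable changeBaseline;
    private ScheduledFuture<?> pending;

    BaselineAutoAdjustSketch(long timeoutMs, Runnable changeBaseline) {
        this.timeoutMs = timeoutMs;
        this.changeBaseline = changeBaseline;
    }

    /** Called on every node join/left event. */
    synchronized void onTopologyEvent() {
        if (pending != null)
            pending.cancel(false); // drop the previously queued task
        pending = scheduler.schedule(changeBaseline, timeoutMs, TimeUnit.MILLISECONDS);
    }

    void shutdown() { scheduler.shutdown(); }

    public static void main(String[] args) throws Exception {
        AtomicInteger adjustments = new AtomicInteger();
        BaselineAutoAdjustSketch sketch = new BaselineAutoAdjustSketch(200, adjustments::incrementAndGet);
        sketch.onTopologyEvent();  // node joins
        Thread.sleep(100);
        sketch.onTopologyEvent();  // second event before the timeout: re-arm
        Thread.sleep(600);         // quiet period longer than the timeout
        System.out.println(adjustments.get()); // baseline changed exactly once
        sketch.shutdown();
    }
}
```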
[jira] [Created] (IGNITE-12652) Add example of failure handling
Anton Kalashnikov created IGNITE-12652: -- Summary: Add example of failure handling Key: IGNITE-12652 URL: https://issues.apache.org/jira/browse/IGNITE-12652 Project: Ignite Issue Type: Task Components: examples Reporter: Anton Kalashnikov Ignite has the following feature - https://apacheignite.readme.io/docs/critical-failures-handling - but there is no example of how to use it correctly, so it would be good to add some examples. Also, Ignite has DiagnosticProcessor, which is invoked when the failure handler is triggered. Maybe it is a good idea to add to this example some samples of diagnostic work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12647) Get rid of IGFS and Hadoop Accelerator
Anton Kalashnikov created IGNITE-12647: -- Summary: Get rid of IGFS and Hadoop Accelerator Key: IGNITE-12647 URL: https://issues.apache.org/jira/browse/IGNITE-12647 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov There is no single committer who maintains these integrations; they are no longer tested and, moreover, the community stopped providing binaries for them with the Ignite 2.6.0 release (look for the In-Memory Hadoop Accelerator table). So it makes sense to get rid of IGFS and the Hadoop Accelerator. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12631) Incorrect rewriting wal record type in marshalled mode during iteration
Anton Kalashnikov created IGNITE-12631: -- Summary: Incorrect rewriting wal record type in marshalled mode during iteration Key: IGNITE-12631 URL: https://issues.apache.org/jira/browse/IGNITE-12631 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov The failure happens during iteration over a WAL record that was written in marshalled mode when RecordType#ordinal != RecordType#index {noformat} [16:46:58,800][SEVERE][pitr-ctx-exec-#399][GridRecoveryProcessor] Fail scan wal log for recovery localNodeConstId=node_1_1 class org.apache.ignite.IgniteCheckedException: Failed to read WAL record at position: 45905 size: -1 at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.handleRecordException(AbstractWalRecordsIterator.java:292) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.handleRecordException(FileWriteAheadLogManager.java:3302) at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:243) at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advance(AbstractWalRecordsIterator.java:154) at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:123) at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:52) at org.apache.ignite.internal.util.GridCloseableIteratorAdapter.nextX(GridCloseableIteratorAdapter.java:41) at org.apache.ignite.internal.util.lang.GridIteratorAdapter.next(GridIteratorAdapter.java:35) ... 
7 more Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read WAL record at position: 45905 size: -1 at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:394) at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer.readRecord(RecordV2Serializer.java:235) at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:243) ... 12 more Caused by: java.io.IOException: Unknown record type: null, expected pointer [idx=2, offset=45905] at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer$2.readWithHeaders(RecordV2Serializer.java:122) at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:373) ... 14 more Suppressed: class org.apache.ignite.internal.processors.cache.persistence.wal.crc.IgniteDataIntegrityViolationException: val: 1445348818 writtenCrc: 374280888 at org.apache.ignite.internal.processors.cache.persistence.wal.io.FileInput$Crc32CheckingFileInput.close(FileInput.java:106) at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:380) ... 14 more {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12594) Deadlock between GridCacheDataStore#purgeExpiredInternal and GridNearTxLocal#enlistWriteEntry
Anton Kalashnikov created IGNITE-12594: -- Summary: Deadlock between GridCacheDataStore#purgeExpiredInternal and GridNearTxLocal#enlistWriteEntry Key: IGNITE-12594 URL: https://issues.apache.org/jira/browse/IGNITE-12594 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov The deadlock is reproduced occasionally in PDS3 suite and can be seen in the thread dump below. One thread attempts to unwind evicts, acquires checkpoint read lock and then locks {{GridCacheMapEntry}}. Another thread does {{GridCacheMapEntry#unswap}}, determines that the entry is expired and acquires checkpoint read lock to remove the entry from the store. We should not acquire checkpoint read lock inside of a locked {{GridCacheMapEntry}}. {code:java}Thread [name="updater-1", id=29900, state=WAITING, blockCnt=2, waitCnt=4450] Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2fc51685, ownerName=null, ownerId=-1] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1632) <- CP read lock at o.a.i.i.processors.cache.GridCacheMapEntry.onExpired(GridCacheMapEntry.java:4081) at o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:559) at o.a.i.i.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:519) <- locked entry at 
o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWriteEntry(GridNearTxLocal.java:1437) at o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.enlistWrite(GridNearTxLocal.java:1303) at o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync0(GridNearTxLocal.java:957) at o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.putAllAsync(GridNearTxLocal.java:491) at o.a.i.i.processors.cache.GridCacheAdapter$29.inOp(GridCacheAdapter.java:2526) at o.a.i.i.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:4727) at o.a.i.i.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:3740) at o.a.i.i.processors.cache.GridCacheAdapter.putAll0(GridCacheAdapter.java:2524) at o.a.i.i.processors.cache.GridCacheAdapter.putAll(GridCacheAdapter.java:2513) at o.a.i.i.processors.cache.IgniteCacheProxyImpl.putAll(IgniteCacheProxyImpl.java:1264) at o.a.i.i.processors.cache.GatewayProtectedCacheProxy.putAll(GatewayProtectedCacheProxy.java:863) at o.a.i.i.processors.cache.persistence.IgnitePdsContinuousRestartTest$1.call(IgnitePdsContinuousRestartTest.java:291) at o.a.i.testframework.GridTestThread.run(GridTestThread.java:83) Locked synchronizers: java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7 Thread [name="sys-stripe-0-#24086%persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy0%", id=29617, state=WAITING, blockCnt=2, waitCnt=65381] Lock [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@762613f7, ownerName=updater-1, ownerId=29900] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at 
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) <- lock entry at o.a.i.i.processors.cache.GridCacheMapEntry.lockEntry(GridCacheMapEntry.java:5017) at o.a.i.i.processors.cache.GridCacheMapEntry.markObsoleteVersion(GridCacheMapEntry.java:2799) at o.a.i.i.processors.cache.distributed.dht.topology.GridDhtLocalPartition.removeVersionedEntry(GridDhtLocalPartition.java:392) at
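Stripped to its essence, this is a classic lock-order inversion: one path takes the checkpoint read lock and then the entry lock, the other takes them in reverse. A minimal plain-JDK sketch of the consistent ordering the issue suggests (all names are hypothetical stand-ins: `cpLock` for the checkpoint lock, `entryLock` for the `GridCacheMapEntry` lock):

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {
    static final ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock();
    static final ReentrantLock entryLock = new ReentrantLock();

    // Deadlock-prone path from the dump: the entry is locked first
    // (unswap -> onExpired), and only then is checkpointReadLock() taken.
    // The evict-unwinding path does the opposite, so the two can block each other.

    // Fix sketched here: take the checkpoint read lock before the entry lock
    // on every path, so both threads agree on the order.
    static void removeExpired(Runnable removeFromStore) {
        cpLock.readLock().lock();          // 1. checkpoint read lock
        try {
            entryLock.lock();              // 2. entry lock
            try {
                removeFromStore.run();     // remove the expired entry
            } finally {
                entryLock.unlock();
            }
        } finally {
            cpLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        int[] removed = {0};
        removeExpired(() -> removed[0]++);
        System.out.println("removed=" + removed[0]);
    }
}
```

With a single agreed order there is no cycle in the wait-for graph, so the two threads from the dump can no longer deadlock.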
[jira] [Created] (IGNITE-12593) Corruption of B+Tree caused by byte array values and TTL
Anton Kalashnikov created IGNITE-12593: -- Summary: Corruption of B+Tree caused by byte array values and TTL Key: IGNITE-12593 URL: https://issues.apache.org/jira/browse/IGNITE-12593 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov It seems that the following set of parameters may lead to a corruption of B+Tree: - persistence is enabled - TTL is enabled - Expiry policy - AccessedExpiryPolicy 1 sec. - cache value type is byte[] - all caches belong to the same cache group Example of the stack trace: {code:java} [2019-07-16 21:13:19,288][ERROR][sys-stripe-2-#46%db.IgnitePdsWithTtlDeactivateOnHighloadTest1%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-1237460590, val2=281586645860358]], msg=Runtime failure on search row: SearchRow [key=KeyCacheObjectImpl [part=26, val=378, hasValBytes=true], hash=378, cacheId=-1806498247 class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-1237460590, val2=281586645860358]], msg=Runtime failure on search row: SearchRow [key=KeyCacheObjectImpl [part=26, val=378, hasValBytes=true], hash=378, cacheId=-1806498247]] at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5910) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1859) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662) at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2410) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:445) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2309) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2570) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2030) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1848) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3235) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1141) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392) at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1558) at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1186) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125) at
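The parameter set from the description can be sketched as a configuration fragment using the public Ignite 2.x API (the cache name "c1" and group name "sameGroup" are arbitrary; this is an illustration, not the original reproducer):

```java
import java.util.concurrent.TimeUnit;
import javax.cache.expiry.AccessedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Persistence enabled for the default data region.
IgniteConfiguration cfg = new IgniteConfiguration()
    .setDataStorageConfiguration(new DataStorageConfiguration()
        .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
            .setPersistenceEnabled(true)));

// byte[] values, 1-second AccessedExpiryPolicy, all caches in one group.
CacheConfiguration<Integer, byte[]> ccfg = new CacheConfiguration<Integer, byte[]>("c1")
    .setGroupName("sameGroup")
    .setEagerTtl(true)
    .setExpiryPolicyFactory(AccessedExpiryPolicy.factoryOf(
        new Duration(TimeUnit.SECONDS, 1)));
```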
[jira] [Created] (IGNITE-12463) Inconsistency of checkpoint progress future with its state
Anton Kalashnikov created IGNITE-12463: -- Summary: Inconsistency of checkpoint progress future with its state Key: IGNITE-12463 URL: https://issues.apache.org/jira/browse/IGNITE-12463 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov The checkpoint futures (start, finish) need to be reorganized so that each future matches its corresponding state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12460) Cluster fails to find the node by consistent ID
Anton Kalashnikov created IGNITE-12460: -- Summary: Cluster fails to find the node by consistent ID Key: IGNITE-12460 URL: https://issues.apache.org/jira/browse/IGNITE-12460 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Steps to reproduce 1: * Start a cluster of three nodes * Navigate to the Baseline screen * Start one more node * Include it into the baseline * Hit the 'Save' button Expected: * Success alert, the node enters the baseline Actual: * An exception is thrown and displayed Steps to reproduce 2: # Start a topology with 2 nodes. # Activate the cluster. # Start a third node. # Stop the second node. # Try to add the third node to the baseline in Web Console. Also reproduced with the *control.sh --baseline set* command. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12459) Searching checkpoint record in WAL doesn't work with segment compaction
Anton Kalashnikov created IGNITE-12459: -- Summary: Searching checkpoint record in WAL doesn't work with segment compaction Key: IGNITE-12459 URL: https://issues.apache.org/jira/browse/IGNITE-12459 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov During iteration over the WAL, the resulting tuple obeys two invariants: * WALPointer is equal to WALRecord.position() when the segment is uncompacted * WALPointer is not equal to WALRecord.position() when the segment is compacted Unfortunately, the second invariant is broken in FileWriteAheadLogManager#read(WALPointer ptr) -- This message was sent by Atlassian Jira (v8.3.4#803005)
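The two invariants can be written down as a tiny predicate over one iterator tuple (a plain-Java sketch; `WalPtr` is a hypothetical stand-in for FileWALPointer):

```java
// Minimal stand-in for a WAL pointer: segment index plus file offset.
final class WalPtr {
    final long idx;
    final int fileOff;

    WalPtr(long idx, int fileOff) {
        this.idx = idx;
        this.fileOff = fileOff;
    }

    boolean same(WalPtr other) {
        return idx == other.idx && fileOff == other.fileOff;
    }
}

public class WalIterInvariant {
    // Invariant for one (pointer, record) tuple produced by the iterator:
    //   uncompacted segment -> pointer == record.position()
    //   compacted segment   -> pointer != record.position()
    static boolean holds(boolean segmentCompacted, WalPtr iterPtr, WalPtr recPos) {
        return segmentCompacted ? !iterPtr.same(recPos) : iterPtr.same(recPos);
    }

    public static void main(String[] args) {
        WalPtr p = new WalPtr(56913, 10362905);
        System.out.println(holds(false, p, new WalPtr(56913, 10362905))); // true
        System.out.println(holds(true, p, new WalPtr(56913, 4096)));      // true
    }
}
```

The bug report says FileWriteAheadLogManager#read(WALPointer ptr) violates the second (compacted) case of this predicate.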
[jira] [Created] (IGNITE-12227) Default auto-adjust baseline enabled flag calculated incorrectly in some cases
Anton Kalashnikov created IGNITE-12227: -- Summary: Default auto-adjust baseline enabled flag calculated incorrectly in some cases Key: IGNITE-12227 URL: https://issues.apache.org/jira/browse/IGNITE-12227 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov baselineAutoAdjustEnabled can end up different on different nodes because the default value is calculated locally on each node and takes only the local configuration into account. This can happen for the following reasons: * If the IGNITE_BASELINE_AUTO_ADJUST_ENABLED flag is set to a different value on different nodes, the cluster hangs because the baseline calculation finishes in an unpredictable state on each node. * If the cluster is in mixed mode (both in-memory and persistent nodes), the flag is sometimes set to different values because the calculation does not consider the remote nodes' configuration. Possible solution (both points required): * Get rid of IGNITE_BASELINE_AUTO_ADJUST_ENABLED and replace it with an explicit call of IgniteCluster#baselineAutoAdjustEnabled where required (tests only). * Calculate the default value on the first started node as early as possible (instead of on activation), and always store this value in the distributed metastorage (unlike what happens now). This means that instead of awaiting activation, the default value would be calculated by the first started node. -- This message was sent by Atlassian Jira (v8.3.4#803005)
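The second part of the proposed solution amounts to a publish-once pattern: the first started node writes the default into shared storage and everyone else adopts it. A sketch with a ConcurrentMap standing in for the distributed metastorage (all names hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AutoAdjustDefault {
    static final String KEY = "baselineAutoAdjustEnabled";

    // The first started node publishes the default exactly once; every later
    // node reads the stored value instead of recomputing it from local config.
    static boolean resolve(ConcurrentMap<String, Boolean> metastorage, boolean localDefault) {
        Boolean prev = metastorage.putIfAbsent(KEY, localDefault);
        return prev != null ? prev : localDefault;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Boolean> meta = new ConcurrentHashMap<>();
        boolean n1 = resolve(meta, true);   // first node wins
        boolean n2 = resolve(meta, false);  // later node adopts the stored value
        System.out.println(n1 + " " + n2);  // prints: true true
    }
}
```

Because the stored value, not the local calculation, is authoritative, nodes with different IGNITE_BASELINE_AUTO_ADJUST_ENABLED settings or mixed in-memory/persistent configurations can no longer diverge.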
[jira] [Created] (IGNITE-12179) Test and javadoc fixes
Anton Kalashnikov created IGNITE-12179: -- Summary: Test and javadoc fixes Key: IGNITE-12179 URL: https://issues.apache.org/jira/browse/IGNITE-12179 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Some javadoc package descriptions are missing: * org.apache.ignite.spi.communication.tcp.internal * org.apache.ignite.spi.discovery.zk * org.apache.ignite.spi.discovery.zk.internal * org.apache.ignite.ml.structures.partition * org.gridgain.grid.persistentstore.snapshot.file.copy Other fixes: * Unclear CLEANUP_RESTARTING_CACHES command in the snapshot utility * Unclear error when connecting to a secure cluster (SSL + Auth) * Update a log message to avoid confusing the user * *.testTtlNoTx is flaky on TC * TcpCommunicationSpiFreezingClientTest failed * TcpCommunicationSpiFaultyClientSslTest.testNotAcceptedConnection failed * testCacheIdleVerifyPrintLostPartitions failed -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12154) Test testCheckpointFailBeforeMarkEntityWrite fails in compression suite
Anton Kalashnikov created IGNITE-12154: -- Summary: Test testCheckpointFailBeforeMarkEntityWrite fails in compression suite Key: IGNITE-12154 URL: https://issues.apache.org/jira/browse/IGNITE-12154 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov CheckpointFailBeforeWriteMarkTest.testCheckpointFailBeforeMarkEntityWrite https://ci.ignite.apache.org/viewLog.html?buildId=4584051=IgniteTests24Java8_DiskPageCompressions=buildResultsDiv_IgniteTests24Java8=%3Cdefault%3E -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12121) Double checkpoint triggering due to incorrect place of update current checkpoint
Anton Kalashnikov created IGNITE-12121: -- Summary: Double checkpoint triggering due to incorrect place of update current checkpoint Key: IGNITE-12121 URL: https://issues.apache.org/jira/browse/IGNITE-12121 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov The current checkpoint is updated in an incorrect place, which triggers the checkpoint twice. This can lead to two checkpoints running one after another if the checkpoint trigger was 'too many dirty pages'. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-11982) Fix bugs of pds
Anton Kalashnikov created IGNITE-11982: -- Summary: Fix bugs of pds Key: IGNITE-11982 URL: https://issues.apache.org/jira/browse/IGNITE-11982 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Fixed PDS crashes: * Fail during logical recovery * JVM crash in all compatibility LFS tests * WAL segments serialization problem * Unable to read last WAL record after crash during checkpoint * Node failed on detecting storage block size if page compression enabled on many caches * Can not change baseline for in-memory cluster * SqlFieldsQuery DELETE FROM causes JVM crash * Fixed IgniteCheckedException: Compound exception for CountDownFuture. Fixed tests: * WalCompactionAndPageCompressionTest * IgnitePdsRestartAfterFailedToWriteMetaPageTest.test * GridPointInTimeRecoveryRebalanceTest.testRecoveryNotFailsIfWalSomewhereEnab * IgniteClusterActivateDeactivateTest.testDeactivateSimple_5_Servers_5_Clients_Fro * IgniteCacheReplicatedQuerySelfTest.testNodeLeft * .NET tests Optimizations: * Replace TcpDiscoveryNode with nodeId in TcpDiscoveryMessages * Failures to deserialize discovery data should be handled by a failure handler * Optimize GridToStringBuilder -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IGNITE-11969) Incorrect DefaultConcurrencyLevel value in .net test
Anton Kalashnikov created IGNITE-11969: -- Summary: Incorrect DefaultConcurrencyLevel value in .net test Key: IGNITE-11969 URL: https://issues.apache.org/jira/browse/IGNITE-11969 Project: Ignite Issue Type: Test Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Incorrect DefaultConcurrencyLevel value in the .NET test after the default configuration in Java was changed -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11892) Incorrect assert in wal scanner test
Anton Kalashnikov created IGNITE-11892: -- Summary: Incorrect assert in wal scanner test Key: IGNITE-11892 URL: https://issues.apache.org/jira/browse/IGNITE-11892 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov https://ci.ignite.apache.org/viewLog.html?buildId=4038516=IgniteTests24Java8_Pds2 {noformat} junit.framework.AssertionFailedError: Next WAL record :: Record : PAGE_RECORD - Unable to convert to string representation. at org.apache.ignite.internal.processors.cache.persistence.wal.scanner.WalScannerTest.shouldDumpToFileFoundRecord(WalScannerTest.java:254) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11818) Support JMX/control.sh for debug page info
Anton Kalashnikov created IGNITE-11818: -- Summary: Support JMX/control.sh for debug page info Key: IGNITE-11818 URL: https://issues.apache.org/jira/browse/IGNITE-11818 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Support JMX/control.sh for debug page info -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11816) Debug processor for dump page history info
Anton Kalashnikov created IGNITE-11816: -- Summary: Debug processor for dump page history info Key: IGNITE-11816 URL: https://issues.apache.org/jira/browse/IGNITE-11816 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Debug processor for dump page history info -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11782) WAL iterator for collecting per-pageId info
Anton Kalashnikov created IGNITE-11782: -- Summary: WAL iterator for collecting per-pageId info Key: IGNITE-11782 URL: https://issues.apache.org/jira/browse/IGNITE-11782 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Implement a WAL iterator that collects per-pageId info (the page is a root) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11678) Forbidding joining persistence node to in-memory cluster
Anton Kalashnikov created IGNITE-11678: -- Summary: Forbidding joining persistence node to in-memory cluster Key: IGNITE-11678 URL: https://issues.apache.org/jira/browse/IGNITE-11678 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Forbid a persistent node from joining an in-memory cluster when baseline auto-adjust is enabled and the timeout is equal to 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11650) Communication worker doesn't kick client node after expired idleConnTimeout
Anton Kalashnikov created IGNITE-11650: -- Summary: Communication worker doesn't kick client node after expired idleConnTimeout Key: IGNITE-11650 URL: https://issues.apache.org/jira/browse/IGNITE-11650 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Reproduced by TcpCommunicationSpiFreezingClientTest.testFreezingClient {noformat} java.lang.AssertionError: Client node must be kicked from topology at org.junit.Assert.fail(Assert.java:88) at org.apache.ignite.testframework.junits.JUnitAssertAware.fail(JUnitAssertAware.java:49) at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpiFreezingClientTest.testFreezingClient(TcpCommunicationSpiFreezingClientTest.java:122) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2102) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11627) Test CheckpointFreeListTest.testRestoreFreeListCorrectlyAfterRandomStop always fails in DiskCompression suite
Anton Kalashnikov created IGNITE-11627: -- Summary: Test CheckpointFreeListTest.testRestoreFreeListCorrectlyAfterRandomStop always fails in DiskCompression suite Key: IGNITE-11627 URL: https://issues.apache.org/jira/browse/IGNITE-11627 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=5828425958400232265=testDetails_IgniteTests24Java8=%3Cdefault%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11605) Incorrect check condition in BinaryTypeRegistrationTest.shouldSendOnlyOneMetadataMessage
Anton Kalashnikov created IGNITE-11605: -- Summary: Incorrect check condition in BinaryTypeRegistrationTest.shouldSendOnlyOneMetadataMessage Key: IGNITE-11605 URL: https://issues.apache.org/jira/browse/IGNITE-11605 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov BinaryTypeRegistrationTest.shouldSendOnlyOneMetadataMessage is flaky. {noformat} java.lang.AssertionError: Expected :1 Actual :2 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:94) at org.apache.ignite.internal.processors.cache.BinaryTypeRegistrationTest.shouldSendOnlyOneMetadataMessage(BinaryTypeRegistrationTest.java:106) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2102) at java.lang.Thread.run(Thread.java:748) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11590) NPE during onKernalStop in mvcc processor
Anton Kalashnikov created IGNITE-11590: -- Summary: NPE during onKernalStop in mvcc processor Key: IGNITE-11590 URL: https://issues.apache.org/jira/browse/IGNITE-11590 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov IgniteProjectionStartStopRestartSelfTest#testStopNodesByIds {noformat} java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106) at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097) at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorFailed(MvccProcessorImpl.java:527) at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onKernalStop(MvccProcessorImpl.java:459) at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2335) at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2283) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2570) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2533) at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:330) at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:297) at org.apache.ignite.Ignition.stop(Ignition.java:200) at org.apache.ignite.internal.IgniteProjectionStartStopRestartSelfTest.afterTest(IgniteProjectionStartStopRestartSelfTest.java:190) at org.apache.ignite.testframework.junits.GridAbstractTest.tearDown(GridAbstractTest.java:1804) at org.apache.ignite.testframework.junits.JUnit3TestLegacySupport.runTestCase(JUnit3TestLegacySupport.java:70) at org.apache.ignite.testframework.junits.GridAbstractTest$2.evaluate(GridAbstractTest.java:185) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.apache.ignite.testframework.junits.GridAbstractTest.evaluateInsideFixture(GridAbstractTest.java:2579) at org.apache.ignite.testframework.junits.GridAbstractTest.access$500(GridAbstractTest.java:152) at org.apache.ignite.testframework.junits.GridAbstractTest$BeforeFirstAndAfterLastTestRule$1.evaluate(GridAbstractTest.java:2559) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11569) Enable baseline auto-adjust by default only for empty cluster
Anton Kalashnikov created IGNITE-11569: -- Summary: Enable baseline auto-adjust by default only for empty cluster Key: IGNITE-11569 URL: https://issues.apache.org/jira/browse/IGNITE-11569 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Baseline auto-adjust needs to be enabled by default only for an empty cluster -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11545) Logging baseline auto-adjust
Anton Kalashnikov created IGNITE-11545: -- Summary: Logging baseline auto-adjust Key: IGNITE-11545 URL: https://issues.apache.org/jira/browse/IGNITE-11545 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Some extra logging needs to be added to the baseline auto-adjust process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11391) Test on free list freezes sometimes
Anton Kalashnikov created IGNITE-11391: -- Summary: Test on free list freezes sometimes Key: IGNITE-11391 URL: https://issues.apache.org/jira/browse/IGNITE-11391 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov CheckpointFreeListTest#testRestoreFreeListCorrectlyAfterRandomStop - freezes sometimes CheckpointFreeListTest.testFreeListRestoredCorrectly - flaky -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11382) Stop managers from all caches before caches stop
Anton Kalashnikov created IGNITE-11382: -- Summary: Stop managers from all caches before caches stop Key: IGNITE-11382 URL: https://issues.apache.org/jira/browse/IGNITE-11382 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov It is required to stop all cache managers before stopping these caches -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11377) Display time to baseline auto-adjust event in console.sh
Anton Kalashnikov created IGNITE-11377: -- Summary: Display time to baseline auto-adjust event in console.sh Key: IGNITE-11377 URL: https://issues.apache.org/jira/browse/IGNITE-11377 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov It is required to add information about the next auto-adjust event. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11297) Improving read of hot variables in WAL
Anton Kalashnikov created IGNITE-11297: -- Summary: Improving read of hot variables in WAL Key: IGNITE-11297 URL: https://issues.apache.org/jira/browse/IGNITE-11297 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov It looks like it is not necessary to mark some variables in FileWriteAheadLogManager as volatile, because they are initialized only once on start but are read very frequently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
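The optimization being proposed can be illustrated with a plain-Java sketch (field and method names are hypothetical, not the actual FileWriteAheadLogManager members):

```java
public class WalHotPath {
    // A field written once on start and then only read can be a plain
    // final instead of a volatile, avoiding a memory barrier on every read.
    private final long segmentSize;

    public WalHotPath(long segmentSize) {
        this.segmentSize = segmentSize;
    }

    // For fields that must stay volatile, a single read hoisted into a
    // local variable still avoids repeated volatile loads inside a hot loop.
    public long sumSegmentOffsets(long[] positions) {
        long segSize = this.segmentSize; // one read, then a plain local
        long sum = 0;
        for (long pos : positions)
            sum += pos % segSize;
        return sum;
    }
}
```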
[jira] [Created] (IGNITE-10720) Decrease time to save metadata during checkpoint
Anton Kalashnikov created IGNITE-10720: -- Summary: Decrease time to save metadata during checkpoint Key: IGNITE-10720 URL: https://issues.apache.org/jira/browse/IGNITE-10720 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Looks like it is not necessary to save all metadata (like the free list) under the checkpoint write lock, because sometimes this takes too long. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
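The idea can be sketched with plain JDK locks (hypothetical names; `cpLock` stands in for the checkpoint lock, `freeListBuckets` for the metadata): hold the write lock only long enough to take a consistent snapshot, then do the slow save outside it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MetadataSaveSketch {
    final ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock();
    final List<Long> freeListBuckets = new CopyOnWriteArrayList<>();
    final List<Long> saved = new ArrayList<>();

    void checkpointMetadata() {
        List<Long> snapshot;
        cpLock.writeLock().lock();
        try {
            snapshot = new ArrayList<>(freeListBuckets); // cheap copy under lock
        } finally {
            cpLock.writeLock().unlock();                 // cache ops resume here
        }
        saved.addAll(snapshot);                          // slow persistence, lock-free
    }

    public static void main(String[] args) {
        MetadataSaveSketch s = new MetadataSaveSketch();
        s.freeListBuckets.add(42L);
        s.checkpointMetadata();
        System.out.println(s.saved);
    }
}
```

The expensive I/O no longer extends the window during which all cache operations are blocked behind the checkpoint write lock.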
[jira] [Created] (IGNITE-10636) Deadlock on stopping node due to segmentation
Anton Kalashnikov created IGNITE-10636: -- Summary: Deadlock on stopping node due to segmentation Key: IGNITE-10636 URL: https://issues.apache.org/jira/browse/IGNITE-10636 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov * The node has "put" operations in progress * The node detects segmentation * The node calls the failure handler (StopNodeFailureHandler) to stop itself * The failure handler tries to take the GridKernalGateway write lock but waits for all operations to finish * GridNearTxLocal uninterruptibly awaits the rollbackNearTxLocalAsync future The failure handler waits: {noformat} Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2370ac7a, ownerName=null, ownerId=-1] [03:24:53] : [Step 4/5] at sun.misc.Unsafe.park(Native Method) [03:24:53] : [Step 4/5] at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) [03:24:53] : [Step 4/5] at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934) [03:24:53] : [Step 4/5] at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247) [03:24:53] : [Step 4/5] at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115) [03:24:53] : [Step 4/5] at o.a.i.i.util.StripedCompositeReadWriteLock$WriteLock.tryLock(StripedCompositeReadWriteLock.java:220) [03:24:53] : [Step 4/5] at o.a.i.i.GridKernalGatewayImpl.tryWriteLock(GridKernalGatewayImpl.java:143) [03:24:53] : [Step 4/5] at o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:2313) [03:24:53] : [Step 4/5] at o.a.i.i.IgniteKernal.stop(IgniteKernal.java:2230) [03:24:53] : [Step 4/5] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2613) [03:24:53] : [Step 4/5] - locked o.a.i.i.IgnitionEx$IgniteNamedInstance@41294371 [03:24:53] : [Step 4/5] at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2576) [03:24:53] : [Step 4/5] at o.a.i.i.IgnitionEx.stop(IgnitionEx.java:379) [03:24:53] : [Step 4/5] at
o.a.i.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36) [03:24:53] : [Step 4/5] at java.lang.Thread.run(Thread.java:748) {noformat} Put await: {noformat} java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178) at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.close(GridNearTxLocal.java:4358) at org.apache.ignite.internal.processors.cache.GridCacheSharedContext.endTx(GridCacheSharedContext.java:1017) at org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.close(TransactionProxyImpl.java:329) at org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$3.run(GridCacheAbstractNodeRestartSelfTest.java:782) at java.lang.Thread.run(Thread.java:748) {noformat} Reproduced by GridCacheAbstractNodeRestartSelfTest#testRestartWithPutTenNodesTwoBackups and other tests from this class -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10622) Undelivered ensure message to some nodes
Anton Kalashnikov created IGNITE-10622:
--
Summary: Undelivered ensure message to some nodes
Key: IGNITE-10622
URL: https://issues.apache.org/jira/browse/IGNITE-10622
Project: Ignite
Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

We have the following case:
* A grid of 5 nodes (node1, node2, node3, node4, node5)
* node1 detects that node4 has failed and sends a NodeFailed message to node2
* node2 sends the NodeFailed message about node4 to node3
* node3 accepts the message but does not handle it because it has also failed
* node1 detects that node3 has failed and sends a NodeFailed message to node2
* node2 selects a new next node (node4) and sends the NodeFailed message about node3 to it

As a result, node4 receives only the second NodeFailed message and never the first one. The same applies to any other ensure messages that are sent one after another.
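The step list above boils down to a re-send rule: when the next node in the ring changes, the whole backlog of pending (unacknowledged) ensure messages must be replayed to the new next node, not only the most recent one. A hedged sketch of the pattern with hypothetical names (not the actual TcpDiscovery implementation):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of pending ensure messages in the discovery ring. */
public class EnsureMessageBacklog {
    private final Deque<String> pending = new ArrayDeque<>(); // sent but not yet acknowledged

    /** A message stays in the backlog until the next node acknowledges it. */
    void send(String msg) {
        pending.addLast(msg);
    }

    /** Buggy behavior from the ticket: only the newest message reaches the new next node. */
    List<String> resendLatestOnly() {
        List<String> out = new ArrayList<>();
        out.add(pending.peekLast());
        return out;
    }

    /** Required behavior: replay the whole unacknowledged backlog, in order. */
    List<String> resendAll() {
        return new ArrayList<>(pending);
    }
}
```

In the ticket's scenario, node2's backlog holds two NodeFailed messages when node3 dies; resending only the latest is exactly what loses the first one on node4.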
[jira] [Created] (IGNITE-10522) Remote node has not joined
Anton Kalashnikov created IGNITE-10522:
--
Summary: Remote node has not joined
Key: IGNITE-10522
URL: https://issues.apache.org/jira/browse/IGNITE-10522
Project: Ignite
Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

Sometimes tests fail because of "Remote node has not joined".
Suite: PDS (Indexing)
Example test: IgniteWalRecoveryWithCompactionTest.testLargeRandomCrash
[jira] [Created] (IGNITE-10509) Rollback exception instead of timeout exception
Anton Kalashnikov created IGNITE-10509:
--
Summary: Rollback exception instead of timeout exception
Key: IGNITE-10509
URL: https://issues.apache.org/jira/browse/IGNITE-10509
Project: Ignite
Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

It looks like there is a race on changing the transaction state between marking the transaction timed out and setting the final state.
Reproducer: TxRollbackOnTimeoutNearCacheTest.testEnlistManyWrite
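The race described here is the classic "two writers of one state field" problem: if the timeout path and the rollback path both overwrite the state, the user can observe a rollback error where a timeout exception was expected. A hedged sketch of the usual fix pattern, with a hypothetical enum rather than the real GridNearTxLocal state machine: make the ACTIVE-to-terminal transition a compare-and-set, so exactly one path wins and the loser reports the winner's outcome.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of an exclusive transaction-state transition. */
public class TxStateModel {
    enum State { ACTIVE, TIMED_OUT, ROLLED_BACK }

    final AtomicReference<State> state = new AtomicReference<>(State.ACTIVE);

    /** Timeout path: succeeds only if the tx is still ACTIVE. */
    boolean markTimedOut() {
        return state.compareAndSet(State.ACTIVE, State.TIMED_OUT);
    }

    /** Rollback path: must not overwrite TIMED_OUT, otherwise the caller
     *  reports a rollback exception instead of a timeout exception. */
    boolean rollback() {
        return state.compareAndSet(State.ACTIVE, State.ROLLED_BACK);
    }
}
```

Whichever CAS succeeds first fixes the terminal state; the losing path can then inspect the state and surface the correct exception type.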
[jira] [Created] (IGNITE-10491) Out of memory: unable to create new native thread(test150Clients)
Anton Kalashnikov created IGNITE-10491:
--
Summary: Out of memory: unable to create new native thread (test150Clients)
Key: IGNITE-10491
URL: https://issues.apache.org/jira/browse/IGNITE-10491
Project: Ignite
Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

IgniteCache150ClientsTest.test150Clients
https://ci.ignite.apache.org/viewLog.html?buildId=2424817=buildResultsDiv=IgniteTests24Java8_Cache6
[jira] [Created] (IGNITE-10423) Hangs grid-nio-worker-tcp-comm
Anton Kalashnikov created IGNITE-10423:
--
Summary: Hangs grid-nio-worker-tcp-comm
Key: IGNITE-10423
URL: https://issues.apache.org/jira/browse/IGNITE-10423
Project: Ignite
Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

{noformat}
[2018-11-24 04:49:34,736][ERROR][tcp-disco-msg-worker-#89615%replicated.GridCacheReplicatedNodeRestartSelfTest2%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=grid-nio-worker-tcp-comm-1, blockedFor=11s]
[2018-11-24 04:49:44,894][WARN ][tcp-disco-msg-worker-#89615%replicated.GridCacheReplicatedNodeRestartSelfTest2%][G] Thread [name="grid-nio-worker-tcp-comm-1-#454082%replicated.GridCacheReplicatedNodeRestartSelfTest2%", id=562184, state=RUNNABLE, blockCnt=1, waitCnt=0]
[2018-11-24 04:49:44,897][ERROR][tcp-disco-msg-worker-#89615%replicated.GridCacheReplicatedNodeRestartSelfTest2%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=SingletonSet [SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=replicated.GridCacheReplicatedNodeRestartSelfTest2, finished=false, heartbeatTs=1543034984889]]]
{noformat}
[jira] [Created] (IGNITE-10156) Invalid conversion of DynamicCacheDescriptor to StoredCacheData
Anton Kalashnikov created IGNITE-10156:
--
Summary: Invalid conversion of DynamicCacheDescriptor to StoredCacheData
Key: IGNITE-10156
URL: https://issues.apache.org/jira/browse/IGNITE-10156
Project: Ignite
Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

DynamicCacheDescriptor is converted to StoredCacheData incorrectly in CacheRegistry#persistCacheConfigurations.
[jira] [Created] (IGNITE-10111) Affinity doesn't recalculate after lost partitions
Anton Kalashnikov created IGNITE-10111:
--
Summary: Affinity doesn't recalculate after lost partitions
Key: IGNITE-10111
URL: https://issues.apache.org/jira/browse/IGNITE-10111
Project: Ignite
Issue Type: Bug
Reporter: Anton Kalashnikov
Attachments: AffinityLostPartitionTest.java

Case:
1) Start 3 data nodes and activate the cluster with a cache with 1 backup and PartitionLossPolicy.READ_ONLY_SAFE.
2) Start a client and add data to the cache. Stop the client.
3) Stop DN2 and clear its pds and wal.
4) Start DN2. Rebalance will start.
5) During rebalance, stop DN3. At this moment some partitions on DN2 are marked as LOST.
6) Start DN3. In fact all the data has come back, but instead of DN3, affinity uses DN2, which holds the (lost) partitions with some data missing.

Reproducer is attached.
[jira] [Created] (IGNITE-9962) Unhandled exception during BatchCacheChangeRequest
Anton Kalashnikov created IGNITE-9962:
--
Summary: Unhandled exception during BatchCacheChangeRequest
Key: IGNITE-9962
URL: https://issues.apache.org/jira/browse/IGNITE-9962
Project: Ignite
Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

The node hangs if an exception is thrown in GridQueryProcessor#onCacheChangeRequested.
[jira] [Created] (IGNITE-9909) Merge FsyncWalManager and WalManager
Anton Kalashnikov created IGNITE-9909:
--
Summary: Merge FsyncWalManager and WalManager
Key: IGNITE-9909
URL: https://issues.apache.org/jira/browse/IGNITE-9909
Project: Ignite
Issue Type: Improvement
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

We currently have two similar WAL managers, FileWriteAheadLogManager and FsyncModeFileWriteAheadLogManager, and because of the duplication they are hard to maintain. The unique parts should be extracted so that only one manager remains.
[jira] [Created] (IGNITE-9761) Deadlock SegmentArchivedStorage <-> SegmentLockStorage
Anton Kalashnikov created IGNITE-9761:
--
Summary: Deadlock SegmentArchivedStorage <-> SegmentLockStorage
Key: IGNITE-9761
URL: https://issues.apache.org/jira/browse/IGNITE-9761
Project: Ignite
Issue Type: Bug
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

{noformat}
Found one Java-level deadlock:
=============================
"wal-file-archiver%cache.IgniteClusterActivateDeactivateTestWithPersistence2-#11729%cache.IgniteClusterActivateDeactivateTestWithPersistence2%":
  waiting to lock monitor 0x7fa33c0121e8 (object 0xf7142560, a org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentLockStorage),
  which is held by "exchange-worker-#11646%cache.IgniteClusterActivateDeactivateTestWithPersistence2%"
"exchange-worker-#11646%cache.IgniteClusterActivateDeactivateTestWithPersistence2%":
  waiting to lock monitor 0x7fa3503b6058 (object 0xf7142578, a org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentArchivedStorage),
  which is held by "wal-file-archiver%cache.IgniteClusterActivateDeactivateTestWithPersistence2-#11729%cache.IgniteClusterActivateDeactivateTestWithPersistence2%"

Java stack information for the threads listed above:
===================================================
"wal-file-archiver%cache.IgniteClusterActivateDeactivateTestWithPersistence2-#11729%cache.IgniteClusterActivateDeactivateTestWithPersistence2%":
    at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentLockStorage.locked(SegmentLockStorage.java:41)
    - waiting to lock <0xf7142560> (a org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentLockStorage)
    at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentArchivedStorage.markAsMovedToArchive(SegmentArchivedStorage.java:101)
    - locked <0xf7142578> (a org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentArchivedStorage)
    at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentAware.markAsMovedToArchive(SegmentAware.java:91)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1643)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
"exchange-worker-#11646%cache.IgniteClusterActivateDeactivateTestWithPersistence2%":
    at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentArchivedStorage.onSegmentUnlocked(SegmentArchivedStorage.java:135)
    - waiting to lock <0xf7142578> (a org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentArchivedStorage)
    at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentArchivedStorage$$Lambda$2/2113450692.accept(Unknown Source)
    at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentObservable.lambda$notifyObservers$0(SegmentObservable.java:44)
    at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentObservable$$Lambda$6/688404745.accept(Unknown Source)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentObservable.notifyObservers(SegmentObservable.java:44)
    - locked <0xf7142560> (a org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentLockStorage)
    at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentLockStorage.releaseWorkSegment(SegmentLockStorage.java:74)
    - locked <0xf7142560> (a org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentLockStorage)
    at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentAware.releaseWorkSegment(SegmentAware.java:226)
    at org.apache.ignite.internal.processors.cache.persistence.wal.io.LockedReadFileInput.ensure(LockedReadFileInput.java:81)
    at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readSegmentHeader(RecordV1Serializer.java:260)
    at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.initReadHandle(AbstractWalRecordsIterator.java:381)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.initReadHandle(FileWriteAheadLogManager.java:2942)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.advanceSegment(FileWriteAheadLogManager.java:3024)
    at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advance(AbstractWalRecordsIterator.java:163)
{noformat}
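The dump shows a lock-order inversion: the archiver holds the SegmentArchivedStorage monitor and waits for SegmentLockStorage, while the exchange worker holds the SegmentLockStorage monitor (inside notifyObservers) and a callback then wants SegmentArchivedStorage. One common remedy is to fire observer callbacks only after the monitor has been released. A sketch with hypothetical simplified classes (not the actual SegmentAware code):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/**
 * Hypothetical model of the fix pattern: mutate state under the monitor,
 * but notify observers outside of it, so a callback may safely take other
 * monitors without creating a lock-order cycle.
 */
public class SegmentObservableModel {
    private final List<Consumer<Long>> observers = new CopyOnWriteArrayList<>();
    private long lastReleased = -1;

    void addObserver(Consumer<Long> obs) {
        observers.add(obs);
    }

    void releaseWorkSegment(long idx) {
        synchronized (this) {                    // state change stays under the monitor...
            lastReleased = idx;
        }
        for (Consumer<Long> obs : observers)     // ...but callbacks run after it is released,
            obs.accept(idx);                     // so they can lock other objects freely
    }

    synchronized long lastReleased() {
        return lastReleased;
    }
}
```

CopyOnWriteArrayList lets the observer list be iterated safely without holding the monitor, which is what makes the out-of-lock notification possible.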
[jira] [Created] (IGNITE-9760) NPE is possible during WAL flushing for FSYNC mode
Anton Kalashnikov created IGNITE-9760: - Summary: NPE is possible during WAL flushing for FSYNC mode Key: IGNITE-9760 URL: https://issues.apache.org/jira/browse/IGNITE-9760 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov {noformat} class org.apache.ignite.IgniteCheckedException: Failed to update keys (retry update if possible).: [9483] at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7409) at org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:261) at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:172) at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141) at org.apache.ignite.testframework.GridTestUtils.lambda$runMultiThreadedAsync$96d302c5$1(GridTestUtils.java:853) at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385) at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:349) at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:337) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:497) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:476) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:464) at org.apache.ignite.testframework.GridTestUtils.lambda$runAsync$2(GridTestUtils.java:1005) at org.apache.ignite.testframework.GridTestUtils$7.call(GridTestUtils.java:1295) at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86) Caused by: org.apache.ignite.cache.CachePartialUpdateException: Failed to update keys (retry update if possible).: [9483] at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1307) at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1742) at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1092) at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:820) at org.apache.ignite.internal.processors.cache.persistence.db.wal.WalRolloverRecordLoggingTest.lambda$testAvoidInfinityWaitingOnRolloverOfSegment$0(WalRolloverRecordLoggingTest.java:119) ... 2 more Caused by: class org.apache.ignite.internal.processors.cache.CachePartialUpdateCheckedException: Failed to update keys (retry update if possible).: [9483] at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.onPrimaryError(GridNearAtomicAbstractUpdateFuture.java:397) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.onPrimaryResponse(GridNearAtomicSingleUpdateFuture.java:253) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture$1.apply(GridNearAtomicAbstractUpdateFuture.java:303) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture$1.apply(GridNearAtomicAbstractUpdateFuture.java:300) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1855) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:299) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:483) at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:443) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1153) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:611) at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2449) at
[jira] [Created] (IGNITE-9729) Ability to start GridQueryProcessor in parallel
Anton Kalashnikov created IGNITE-9729:
--
Summary: Ability to start GridQueryProcessor in parallel
Key: IGNITE-9729
URL: https://issues.apache.org/jira/browse/IGNITE-9729
Project: Ignite
Issue Type: Improvement
Reporter: Anton Kalashnikov

After the [StartCachesInParallel|https://issues.apache.org/jira/browse/IGNITE-8006] task we can start caches in parallel, but GridQueryProcessor is a bottleneck because it has to be started sequentially, for the following reasons:
* checking indexes for duplicates (and other checks) requires the same order on every node;
* onCacheStart and createSchema take a lot of mutexes;
* there may be other reasons.

After this task, GridCacheProcessor#prepareStartCaches should be rewritten.
[jira] [Created] (IGNITE-9441) Failed to read WAL record at position
Anton Kalashnikov created IGNITE-9441:
--
Summary: Failed to read WAL record at position
Key: IGNITE-9441
URL: https://issues.apache.org/jira/browse/IGNITE-9441
Project: Ignite
Issue Type: Test
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

IgnitePdsAtomicCacheHistoricalRebalancingTest.testPartitionCounterConsistencyOnUnstableTopology
IgnitePdsAtomicCacheHistoricalRebalancingTest.testTopologyChangesWithConstantLoad
IgnitePdsTxHistoricalRebalancingTest.testPartitionCounterConsistencyOnUnstableTopology
[jira] [Created] (IGNITE-9424) Partition equal to -1 during insert to atomic cache
Anton Kalashnikov created IGNITE-9424: - Summary: Partition equal to -1 during insert to atomic cache Key: IGNITE-9424 URL: https://issues.apache.org/jira/browse/IGNITE-9424 Project: Ignite Issue Type: Test Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Reproduced by IgnitePdsThreadInterruptionTest.testInterruptsOnWALWrite {noformat} org.apache.ignite.cache.CachePartialUpdateException: Failed to update keys (retry update if possible).: [31108] at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1261) at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1740) at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1090) at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:817) at org.apache.ignite.internal.processors.cache.persistence.db.file.IgnitePdsThreadInterruptionTest$3.run(IgnitePdsThreadInterruptionTest.java:208) at java.lang.Thread.run(Thread.java:748) Caused by: class org.apache.ignite.internal.processors.cache.CachePartialUpdateCheckedException: Failed to update keys (retry update if possible).: [31108] at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.onPrimaryError(GridNearAtomicAbstractUpdateFuture.java:397) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.onPrimaryResponse(GridNearAtomicSingleUpdateFuture.java:253) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture$1.apply(GridNearAtomicAbstractUpdateFuture.java:303) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture$1.apply(GridNearAtomicAbstractUpdateFuture.java:300) at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.map(GridDhtAtomicAbstractUpdateFuture.java:394) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1865) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1664) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:299) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:483) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:443) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1153) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:611) at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2430) at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2407) at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1087) ... 3 more Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to update keys. 
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.UpdateErrors.addFailedKey(UpdateErrors.java:108) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicUpdateResponse.addFailedKey(GridNearAtomicUpdateResponse.java:329) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2623) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:1942) at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1776) ... 13 more Suppressed: class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: org.apache.ignite.internal.processors.cache.tree.SearchRow@371d7ce1 at
[jira] [Created] (IGNITE-9407) Node hangs when it is stopped from several clients at the same time
Anton Kalashnikov created IGNITE-9407: - Summary: Node is hang when it was stopping from several client in one time Key: IGNITE-9407 URL: https://issues.apache.org/jira/browse/IGNITE-9407 Project: Ignite Issue Type: Test Reporter: Anton Kalashnikov Reproduced by IgniteChangeGlobalStateTest#testFailGetLock {noformat} [2018-08-27 19:00:29,463][ERROR][sys-#32068%node0-backUp-client%][GridClosureProcessor] Closure execution failed with error. [22:00:29]W: [org.apache.ignite:ignite-core] java.lang.AssertionError: ignite-sys-cache [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.cache.GridCacheProcessor.internalCacheEx(GridCacheProcessor.java:3847) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.cache.GridCacheProcessor.utilityCache(GridCacheProcessor.java:3829) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.service.GridServiceProcessor.updateUtilityCache(GridServiceProcessor.java:298) [22:00:29] : [Step 3/4] [2018-08-27 19:00:29,463][INFO ][sys-#32069%node2-backUp-client%][GridCacheProcessor] Stopped cache [cacheName=ignite-sys-cache] [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:241) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.service.GridServiceProcessor.onActivate(GridServiceProcessor.java:397) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$6.run(GridClusterStateProcessor.java:1151) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6756) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827) [22:00:29]W: [org.apache.ignite:ignite-core]at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [22:00:29]W: [org.apache.ignite:ignite-core]at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [22:00:29]W: [org.apache.ignite:ignite-core]at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [22:00:29]W: [org.apache.ignite:ignite-core]at java.lang.Thread.run(Thread.java:748) [22:00:29]W: [org.apache.ignite:ignite-core] [2018-08-27 19:00:29,469][ERROR][sys-#32068%node0-backUp-client%][GridClosureProcessor] Runtime error caught during grid runnable execution: GridWorker [name=closure-proc-worker, igniteInstanceName=node0-backUp-client, finished=false, hashCode=669424318, interrupted=false, runner=sys-#32068%node0-backUp-client%] [22:00:29]W: [org.apache.ignite:ignite-core] java.lang.AssertionError: ignite-sys-cache [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.cache.GridCacheProcessor.internalCacheEx(GridCacheProcessor.java:3847) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.cache.GridCacheProcessor.utilityCache(GridCacheProcessor.java:3829) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.service.GridServiceProcessor.updateUtilityCache(GridServiceProcessor.java:298) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:241) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.service.GridServiceProcessor.onActivate(GridServiceProcessor.java:397) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$6.run(GridClusterStateProcessor.java:1151) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6756) [22:00:29]W: [org.apache.ignite:ignite-core]at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827) [22:00:29]W: [org.apache.ignite:ignite-core]at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [22:00:29]W: [org.apache.ignite:ignite-core]at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [22:00:29]W:
[jira] [Created] (IGNITE-9402) IgnitePdsDiskErrorsRecoveringTest.testRecoveringOnWALWritingFail2 because of LogOnly mode
Anton Kalashnikov created IGNITE-9402:
--
Summary: IgnitePdsDiskErrorsRecoveringTest.testRecoveringOnWALWritingFail2 because of LogOnly mode
Key: IGNITE-9402
URL: https://issues.apache.org/jira/browse/IGNITE-9402
Project: Ignite
Issue Type: Test
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

IgnitePdsDiskErrorsRecoveringTest.testRecoveringOnWALWritingFail2 fails because it can lose the last WAL data that has not been flushed yet.
[jira] [Created] (IGNITE-9391) Incorrectly calculated estimated rebalancing finish time
Anton Kalashnikov created IGNITE-9391:
--
Summary: Incorrectly calculated estimated rebalancing finish time
Key: IGNITE-9391
URL: https://issues.apache.org/jira/browse/IGNITE-9391
Project: Ignite
Issue Type: Test
Reporter: Anton Kalashnikov

It looks like either the test CacheGroupsMetricsRebalanceTest.testRebalanceEstimateFinishTime is incorrect, or we have a bug in the calculation of the estimated rebalancing finish time.
[jira] [Created] (IGNITE-9327) Client nodes hang because client reconnect is not handled
Anton Kalashnikov created IGNITE-9327:
--
Summary: Client nodes hang because client reconnect is not handled
Key: IGNITE-9327
URL: https://issues.apache.org/jira/browse/IGNITE-9327
Project: Ignite
Issue Type: Test
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

Reproduced by IgniteCacheClientReconnectTest#testClientInForceServerModeStopsOnExchangeHistoryExhaustion.
If IgniteNeedReconnectException happens, we should stop the node if reconnect is not supported. But after https://issues.apache.org/jira/browse/IGNITE-8673 we started to ignore this case.
[jira] [Created] (IGNITE-9307) Node hangs when it is stopped during eviction
Anton Kalashnikov created IGNITE-9307: - Summary: Node is hang when it was stopping during eviction Key: IGNITE-9307 URL: https://issues.apache.org/jira/browse/IGNITE-9307 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov {noformat} "main" #1 prio=5 os_prio=0 tid=0x7f0ae800e000 nid=0x2e26 waiting on condition [0x7f0aef33] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177) at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140) at org.apache.ignite.internal.processors.cache.distributed.dht.PartitionsEvictManager$GroupEvictionContext.awaitFinish(PartitionsEvictManager.java:362) at org.apache.ignite.internal.processors.cache.distributed.dht.PartitionsEvictManager$GroupEvictionContext$$Lambda$203/1143143890.accept(Unknown Source) at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597) at org.apache.ignite.internal.processors.cache.distributed.dht.PartitionsEvictManager$GroupEvictionContext.awaitFinishAll(PartitionsEvictManager.java:348) at org.apache.ignite.internal.processors.cache.distributed.dht.PartitionsEvictManager$GroupEvictionContext.access$100(PartitionsEvictManager.java:265) at org.apache.ignite.internal.processors.cache.distributed.dht.PartitionsEvictManager.onCacheGroupStopped(PartitionsEvictManager.java:103) at org.apache.ignite.internal.processors.cache.CacheGroupContext.stopGroup(CacheGroupContext.java:725) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopCacheGroup(GridCacheProcessor.java:2366) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopCacheGroup(GridCacheProcessor.java:2359) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopCaches(GridCacheProcessor.java:959) at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:924) at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2206) at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2081) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2594) - locked <0xf39b8770> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557) at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374) at org.apache.ignite.Ignition.stop(Ignition.java:225) at org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1153) at org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1196) at org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1174) at org.apache.ignite.internal.processors.query.h2.IgniteSqlQueryMinMaxTest.afterTest(IgniteSqlQueryMinMaxTest.java:55) at org.apache.ignite.testframework.junits.GridAbstractTest.tearDown(GridAbstractTest.java:1766) at org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.tearDown(GridCommonAbstractTest.java:503) at junit.framework.TestCase.runBare(TestCase.java:146) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369) at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at
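The trace above shows stop() parked in GridFutureAdapter.get() with no timeout while awaitFinishAll walks the per-group eviction futures. A minimal sketch of the bounded-wait alternative, using hypothetical names and CompletableFuture in place of Ignite's GridFutureAdapter:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch: bound the wait for outstanding eviction tasks during node stop. */
public class EvictionStopSketch {
    /** One future per partition eviction task, keyed by partition id. */
    private final ConcurrentHashMap<Integer, CompletableFuture<Void>> evictions =
        new ConcurrentHashMap<>();

    public void register(int partId, CompletableFuture<Void> fut) {
        evictions.put(partId, fut);
    }

    /**
     * On cache group stop: wait a bounded time for each eviction and cancel
     * stragglers, instead of parking forever the way an untimed get() does.
     */
    public void awaitFinishAll(long timeoutMs) {
        evictions.forEach((partId, fut) -> {
            try {
                fut.get(timeoutMs, TimeUnit.MILLISECONDS);
            }
            catch (TimeoutException e) {
                fut.cancel(true); // Eviction did not finish in time; let stop() proceed.
            }
            catch (Exception e) {
                // Interrupted or failed eviction; stop must still make progress.
            }
        });
    }
}
```

This is only an illustration of the shape of the fix (a deadline plus cancellation on the stop path), not the actual PartitionsEvictManager change.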
[jira] [Created] (IGNITE-9268) Hangs on await offheap read lock
Anton Kalashnikov created IGNITE-9268: - Summary: Hangs on await offheap read lock Key: IGNITE-9268 URL: https://issues.apache.org/jira/browse/IGNITE-9268 Project: Ignite Issue Type: Test Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov While a thread was awaiting the read lock, the node failed and the failure handler started stopping it, so nobody can wake up the awaiting thread. {noformat} Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@65067d90, ownerName=null, ownerId=-1] [12:24:51] : [Step 3/4] at sun.misc.Unsafe.park(Native Method) [12:24:51] : [Step 3/4] at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) [12:24:51] : [Step 3/4] at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) [12:24:51] : [Step 3/4] at o.a.i.i.util.OffheapReadWriteLock.waitAcquireReadLock(OffheapReadWriteLock.java:435) [12:24:51] : [Step 3/4] at o.a.i.i.util.OffheapReadWriteLock.readLock(OffheapReadWriteLock.java:142) [12:24:51] : [Step 3/4] at o.a.i.i.pagemem.impl.PageMemoryNoStoreImpl.readLock(PageMemoryNoStoreImpl.java:463) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.util.PageHandler.readLock(PageHandler.java:185) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:157) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.DataStructure.read(DataStructure.java:334) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2348) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) 
[12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) [12:24:51] : [Step 3/4] at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2360) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
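The parked thread sits in OffheapReadWriteLock.waitAcquireReadLock on an untimed Condition.await(), so once the node fails there is no signal left to deliver. A hedged sketch of the timed-wait-plus-stop-flag pattern, simplified to an in-heap boolean in place of the real off-heap lock word (all names hypothetical):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch: a lock wait that notices node stop instead of parking forever. */
public class InterruptibleLockWaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private final AtomicBoolean stopping = new AtomicBoolean();
    private boolean readable; // stands in for the off-heap lock word state

    /** Returns true if the read lock became available, false on stop/timeout. */
    public boolean waitAcquireReadLock(long timeoutMs) throws InterruptedException {
        lock.lock();
        try {
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            // Timed wait in a loop: a plain await() here is exactly what
            // leaves the thread parked forever once nobody will signal.
            while (!readable) {
                if (stopping.get() || nanos <= 0L)
                    return false;
                nanos = released.awaitNanos(nanos);
            }
            return true;
        }
        finally {
            lock.unlock();
        }
    }

    /** Called by the failure handler: wake every waiter so stop can proceed. */
    public void markStopping() {
        stopping.set(true);
        lock.lock();
        try { released.signalAll(); } finally { lock.unlock(); }
    }

    /** Simulates the writer releasing the lock. */
    public void makeReadable() {
        lock.lock();
        try { readable = true; released.signalAll(); } finally { lock.unlock(); }
    }
}
```

The key design point is that every waiter re-checks an external stopping flag on each wakeup, so a failure handler can drain waiters with a single signalAll.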
[jira] [Created] (IGNITE-9250) Replace CacheAffinitySharedManager.CachesInfo by ClusterCachesInfo
Anton Kalashnikov created IGNITE-9250: - Summary: Replace CacheAffinitySharedManager.CachesInfo by ClusterCachesInfo Key: IGNITE-9250 URL: https://issues.apache.org/jira/browse/IGNITE-9250 Project: Ignite Issue Type: Improvement Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov The registered caches (and groups) are currently duplicated: they are held in ClusterCachesInfo, the main storage, and also in CacheAffinitySharedManager.CachesInfo. This is redundant and can lead to inconsistency of the cache info.
[jira] [Created] (IGNITE-9004) Failed to move temp file during segment creation
Anton Kalashnikov created IGNITE-9004: - Summary: Failed to move temp file during segment creation Key: IGNITE-9004 URL: https://issues.apache.org/jira/browse/IGNITE-9004 Project: Ignite Issue Type: Test Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Reproduced by the Activate/Deactivate suite, for example IgniteChangeGlobalStateTest#testStopPrimaryAndActivateFromClientNode {noformat} class org.apache.ignite.internal.pagemem.wal.StorageException: Failed to move temp file to a regular WAL segment file: /data/teamcity/work/c182b70f2dfa6507/work/IgniteChangeGlobalStateTest/db/wal/node1/0002.wal [13:56:05]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.createFile(FileWriteAheadLogManager.java:1446) [13:56:05]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkFiles(FileWriteAheadLogManager.java:2269) [13:56:05]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.access$4500(FileWriteAheadLogManager.java:143) [13:56:05]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.allocateRemainingFiles(FileWriteAheadLogManager.java:1862) [13:56:05]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1606) [13:56:05]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [13:56:05]W: [org.apache.ignite:ignite-core] at java.lang.Thread.run(Thread.java:748) [13:56:05]W: [org.apache.ignite:ignite-core] Caused by: java.nio.file.NoSuchFileException: /data/teamcity/work/c182b70f2dfa6507/work/IgniteChangeGlobalStateTest/db/wal/node1/0002.wal.tmp [13:56:05]W: [org.apache.ignite:ignite-core] at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) [13:56:05]W: [org.apache.ignite:ignite-core] at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) [13:56:05]W: [org.apache.ignite:ignite-core] at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) [13:56:05]W: [org.apache.ignite:ignite-core] at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409) [13:56:05]W: [org.apache.ignite:ignite-core] at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) [13:56:05]W: [org.apache.ignite:ignite-core] at java.nio.file.Files.move(Files.java:1395) [13:56:05]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.createFile(FileWriteAheadLogManager.java:1442) [13:56:05]W: [org.apache.ignite:ignite-core] ... 6 more {noformat}
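The failure is on the rename step of a create-temp-then-move sequence: the segment is pre-allocated under a .tmp name and atomically moved into place, and the NoSuchFileException means the temp file was already gone at move time (e.g. the work directory was wiped concurrently). A minimal sketch of that pattern (hypothetical class and method names, not Ignite's actual code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch of the temp-then-atomic-move pattern used for WAL segment creation. */
public class WalSegmentCreateSketch {
    /** Pre-allocates a segment under a .tmp name, then renames it into place. */
    public static Path createSegment(Path dir, String name, int sizeBytes) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        Path segment = dir.resolve(name);

        // Write the full segment under the temp name so readers never observe
        // a partially allocated file under the real segment name.
        Files.write(tmp, new byte[sizeBytes]);

        // Files.move throws NoSuchFileException (as in the trace above) if the
        // temp file disappeared between allocation and the rename.
        Files.move(tmp, segment, StandardCopyOption.ATOMIC_MOVE);
        return segment;
    }
}
```

The atomic move is what makes the scheme crash-safe: after a restart, any leftover .tmp files can simply be deleted and re-created, while a file under the real segment name is known to be fully allocated.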
[jira] [Created] (IGNITE-8998) Client hangs after merge exchange
Anton Kalashnikov created IGNITE-8998: - Summary: Client hangs after merge exchange Key: IGNITE-8998 URL: https://issues.apache.org/jira/browse/IGNITE-8998 Project: Ignite Issue Type: Test Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Reproduced by CacheExchangeMergeTest#testConcurrentStartServersAndClients
[jira] [Created] (IGNITE-8969) Unable to await partitions release latch
Anton Kalashnikov created IGNITE-8969: - Summary: Unable to await partitions release latch Key: IGNITE-8969 URL: https://issues.apache.org/jira/browse/IGNITE-8969 Project: Ignite Issue Type: Test Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov Unable to await the partitions release latch within the timeout for a ClientLatch after this node became the latch coordinator when the old latch coordinator failed. Reproduced by TcpDiscoverySslSelfTest.testNodeShutdownOnRingMessageWorkerStartNotFinished, TcpDiscoverySslTrustedSelfTest.testNodeShutdownOnRingMessageWorkerStartNotFinished
[jira] [Created] (IGNITE-8953) Test fail: Bind address already in use(TcpDiscoverySpiFailureTimeoutSelfTest)
Anton Kalashnikov created IGNITE-8953: - Summary: Test fail: Bind address already in use(TcpDiscoverySpiFailureTimeoutSelfTest) Key: IGNITE-8953 URL: https://issues.apache.org/jira/browse/IGNITE-8953 Project: Ignite Issue Type: Test Reporter: Anton Kalashnikov During execution of beforeTestsStarted in TcpDiscoverySpiFailureTimeoutSelfTest and TcpDiscoverySpiSelfTest, registration of the MBean server failed with the error "Bind address already in use", but the tests continue to execute because of a try-catch block.
[jira] [Created] (IGNITE-8940) Activation job hangs(IgniteChangeGlobalStateTest#testFailGetLock)
Anton Kalashnikov created IGNITE-8940: - Summary: Activation job hangs(IgniteChangeGlobalStateTest#testFailGetLock) Key: IGNITE-8940 URL: https://issues.apache.org/jira/browse/IGNITE-8940 Project: Ignite Issue Type: Test Reporter: Anton Kalashnikov
given:
# A cluster of 3 nodes that should fail activation with "can't get lock"
# 3 clients connected to the cluster
when:
# Try to activate the cluster from one of the clients
# The activation job starts executing on one of the 3 server nodes
# The activation triggers an exchange
# The exchange finishes with an error because the lock cannot be acquired
then: If the job is executed on the coordinator, its future finishes successfully; otherwise the job hangs.
expected: The job finishes on any node.
why it happens: During handling of GridDhtPartitionsFullMessage on a non-coordinator node, execution finishes before the job's future completes because of the if clause (GridDhtPartitionsExchangeFuture#3230). The likely cause is https://issues.apache.org/jira/browse/IGNITE-8657
[jira] [Created] (IGNITE-8872) WAL scanner for crash recovery
Anton Kalashnikov created IGNITE-8872: - Summary: WAL scanner for crash recovery Key: IGNITE-8872 URL: https://issues.apache.org/jira/browse/IGNITE-8872 Project: Ignite Issue Type: Task Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov
[jira] [Created] (IGNITE-8739) Implement WA for TCP communication related to hanging on descriptor reservation
Anton Kalashnikov created IGNITE-8739: - Summary: Implement WA for TCP communication related to hanging on descriptor reservation Key: IGNITE-8739 URL: https://issues.apache.org/jira/browse/IGNITE-8739 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov