[jira] [Created] (IGNITE-8755) NegativeArraySizeException when trying to serialize a humongous object in GridClientOptimizedMarshaller
Ivan Daschinskiy created IGNITE-8755: Summary: NegativeArraySizeException when trying to serialize a humongous object in GridClientOptimizedMarshaller Key: IGNITE-8755 URL: https://issues.apache.org/jira/browse/IGNITE-8755 Project: Ignite Issue Type: Bug Components: binary Affects Versions: 2.5 Reporter: Ivan Daschinskiy Fix For: 2.6 When trying to serialize a humongous object in GridClientOptimizedMarshaller, a NegativeArraySizeException is thrown. See below:
{code:java}
java.io.IOException: class org.apache.ignite.IgniteCheckedException: Failed to serialize object: GridClientResponse [clientId=null, reqId=0, destId=null, status=0, errMsg=null, result=org.apache.ignite.internal.processors.rest.protocols.tcp.TcpRestParserSelfTest$HugeObject@60a582c1]
	at org.apache.ignite.internal.client.marshaller.optimized.GridClientOptimizedMarshaller.marshal(GridClientOptimizedMarshaller.java:101)
	at org.apache.ignite.internal.processors.rest.protocols.tcp.TcpRestParserSelfTest.testHugeObject(TcpRestParserSelfTest.java:103)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at junit.framework.TestCase.runTest(TestCase.java:176)
	at org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2086)
	at org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:140)
	at org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:2001)
	at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to serialize object: GridClientResponse [clientId=null, reqId=0, destId=null, status=0, errMsg=null, result=org.apache.ignite.internal.processors.rest.protocols.tcp.TcpRestParserSelfTest$HugeObject@60a582c1]
	at org.apache.ignite.internal.marshaller.optimized.OptimizedMarshaller.marshal0(OptimizedMarshaller.java:206)
	at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:58)
	at org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:10059)
	at org.apache.ignite.internal.client.marshaller.optimized.GridClientOptimizedMarshaller.marshal(GridClientOptimizedMarshaller.java:88)
	... 10 more
Caused by: java.lang.NegativeArraySizeException
	at org.apache.ignite.internal.util.io.GridUnsafeDataOutput.requestFreeSize(GridUnsafeDataOutput.java:131)
	at org.apache.ignite.internal.util.io.GridUnsafeDataOutput.write(GridUnsafeDataOutput.java:166)
	at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.write(OptimizedObjectOutputStream.java:142)
	at org.apache.ignite.internal.processors.rest.protocols.tcp.TcpRestParserSelfTest$HugeObject.writeExternal(TcpRestParserSelfTest.java:122)
	at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeExternalizable(OptimizedObjectOutputStream.java:319)
	at org.apache.ignite.internal.marshaller.optimized.OptimizedClassDescriptor.write(OptimizedClassDescriptor.java:814)
	at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeObject0(OptimizedObjectOutputStream.java:242)
	at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeObjectOverride(OptimizedObjectOutputStream.java:159)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344)
	at org.apache.ignite.internal.processors.rest.client.message.GridClientResponse.writeExternal(GridClientResponse.java:103)
	at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeExternalizable(OptimizedObjectOutputStream.java:319)
	at org.apache.ignite.internal.marshaller.optimized.OptimizedClassDescriptor.write(OptimizedClassDescriptor.java:814)
	at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeObject0(OptimizedObjectOutputStream.java:242)
	at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeObjectOverride(OptimizedObjectOutputStream.java:159)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344)
	at org.apache.ignite.internal.marshaller.optimized.OptimizedMarshaller.marshal0(OptimizedMarshaller.java:201)
{code}
The main cause is that GridClientOptimizedMarshaller marshals the object through OptimizedMarshaller without a backing OutputStream, so an arithmetic overflow occurs in GridUnsafeDataOutput.requestFreeSize.
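The NegativeArraySizeException here is the classic symptom of 32-bit size arithmetic wrapping around when a buffer is grown. A minimal, self-contained illustration of that overflow pattern (hypothetical demo code, not Ignite's actual GridUnsafeDataOutput implementation): doubling a capacity that is already at 1 GiB wraps an int to a negative value, and allocating an array of that size throws.

```java
/**
 * Minimal illustration of the int-overflow pattern behind a
 * NegativeArraySizeException when a byte buffer is grown by doubling.
 * Hypothetical demo code, not Ignite's actual implementation.
 */
public class GrowOverflowDemo {
    public static void main(String[] args) {
        int cap = 1 << 30;      // 1 GiB capacity, already huge
        int doubled = cap << 1; // 2 GiB does not fit in an int: wraps to Integer.MIN_VALUE

        System.out.println(doubled); // -2147483648

        try {
            byte[] buf = new byte[doubled]; // negative array size
            System.out.println(buf.length);
        }
        catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException, as in the report");
        }
    }
}
```

A growth routine that caps the request against a maximum array size, or does the arithmetic in long, avoids this failure mode.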
[jira] [Created] (IGNITE-8820) Add ability to accept changing txTimeoutOnPartitionMapExchange while waiting for pending transactions.
Ivan Daschinskiy created IGNITE-8820: Summary: Add ability to accept changing txTimeoutOnPartitionMapExchange while waiting for pending transactions. Key: IGNITE-8820 URL: https://issues.apache.org/jira/browse/IGNITE-8820 Project: Ignite Issue Type: Improvement Affects Versions: 2.5 Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.6 Currently, if an ExchangeFuture is waiting with an old value of txTimeoutOnPartitionMapExchange, a new value is not accepted until the next exchange starts. Sometimes applying the new value immediately is very useful (e.g. when the current timeout is too long and a shorter one must take effect at once). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
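The improvement amounts to re-reading the timeout inside the wait loop rather than capturing it once when the exchange starts. A rough sketch of that behaviour (class and method names here are illustrative, not Ignite's API):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

/**
 * Sketch of a wait loop that honors runtime changes of a timeout.
 * Illustrative names only; this is not Ignite's actual exchange code.
 */
public class DynamicTimeoutWait {
    /** Dynamically updatable timeout, playing the role of txTimeoutOnPartitionMapExchange. */
    private final AtomicLong timeoutMs = new AtomicLong(10_000);

    public void setTimeout(long ms) {
        timeoutMs.set(ms);
    }

    /**
     * Waits until {@code pendingTxsDone} reports true; returns false on timeout.
     * The timeout is re-read on every iteration, so a newly applied shorter
     * value takes effect immediately instead of at the next exchange.
     */
    public boolean awaitPendingTxs(BooleanSupplier pendingTxsDone) throws InterruptedException {
        long start = System.nanoTime();

        while (!pendingTxsDone.getAsBoolean()) {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

            if (elapsedMs >= timeoutMs.get())
                return false; // timed out against the *current* timeout value

            Thread.sleep(10);
        }

        return true;
    }
}
```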
[jira] [Created] (IGNITE-8624) Add test coverage for NPE in TTL Manager [IGNITE-7972]
Ivan Daschinskiy created IGNITE-8624: Summary: Add test coverage for NPE in TTL Manager [IGNITE-7972] Key: IGNITE-8624 URL: https://issues.apache.org/jira/browse/IGNITE-8624 Project: Ignite Issue Type: Test Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Add test coverage (a reproducer) for the [IGNITE-7972] case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8869) PartitionsExchangeOnDiscoveryHistoryOverflowTest
Ivan Daschinskiy created IGNITE-8869: Summary: PartitionsExchangeOnDiscoveryHistoryOverflowTest Key: IGNITE-8869 URL: https://issues.apache.org/jira/browse/IGNITE-8869 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Ivan Daschinskiy Fix For: 2.6 After the introduction of exchange latches, PartitionsExchangeOnDiscoveryHistoryOverflowTest hangs permanently. In the current implementation, ExchangeLatchManager retrieves alive nodes from the discovery cache for a specific affinity topology version and fails because the discovery history is too short. This fails the exchange worker, and NoOpFailureHandler therefore leaves the node in a hanging state. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8429) Unexpected error during incorrect WAL segment decompression causes node termination.
Ivan Daschinskiy created IGNITE-8429: Summary: Unexpected error during incorrect WAL segment decompression causes node termination. Key: IGNITE-8429 URL: https://issues.apache.org/jira/browse/IGNITE-8429 Project: Ignite Issue Type: Bug Components: persistence Affects Versions: 2.5 Reporter: Ivan Daschinskiy Fix For: 2.5 The file decompressor fails due to an incorrect (zero-length) archived segment:
{noformat}
2018-04-30 00:00:02.811 [ERROR][wal-file-decompressor%DPL_GRID%DplGridNodeName][org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread wal-file-decompressor%DPL_GRID%DplGridNodeName is terminated unexpectedly]]
java.lang.IllegalStateException: Thread wal-file-decompressor%DPL_GRID%DplGridNodeName is terminated unexpectedly
	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDecompressor.run(FileWriteAheadLogManager.java:2104)
2018-04-30 00:00:02.812 [ERROR][wal-file-decompressor%DPL_GRID%DplGridNodeName][org.apache.ignite.Ignite] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread wal-file-decompressor%DPL_GRID%DplGridNodeName is terminated unexpectedly]]
{noformat}
The zero-length segment can be reproduced as follows:
{noformat}
$ touch 0754.wal
$ zip 0754.wal.zip 0754.wal
$ ls -l
-rw-rw-r-- 1 dmitriy dmitriy   0 May  1 16:40 0754.wal
-rw-rw-r-- 1 dmitriy dmitriy 190 May  1 16:46 0754.wal.zip
Archive:  /tmp/temp/0754.wal.zip
 Length   Method    Size  Cmpr    Date    Time    CRC-32   Name
--------  ------  ------- ---- ---------- -----  --------  ----
       0  Stored        0   0% 2018-05-01 16:40            0754.wal
--------          -------  ---                             -------
       0                0   0%                             1 file
{noformat}
We should handle this situation gracefully: print a message in the log and continue with the next segment. We also should handle "skipped" segments and not delete them in deleteObsoleteRawSegments(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
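The proposed soft handling boils down to detecting the zero-length entry before handing the segment to the decompressor and skipping it with a log message. A sketch under assumed names (this is not Ignite's actual FileDecompressor code):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/**
 * Sketch: detect an empty archived WAL segment and skip it instead of
 * letting the decompressor thread die. Illustrative, not Ignite's code.
 */
public class WalSegmentCheck {
    /** Returns true if the archive's (single) entry is zero-length and should be skipped. */
    static boolean isEmptySegment(File zipFile) throws Exception {
        try (ZipFile zf = new ZipFile(zipFile)) {
            ZipEntry entry = zf.entries().nextElement();

            if (entry.getSize() == 0) {
                // Soft handling: log and let the caller move on to the next segment.
                System.err.println("Skipping empty archived WAL segment: " + entry.getName());
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Reproduce the report: zip up a zero-length 0754.wal.
        File zip = File.createTempFile("0754.wal", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zip))) {
            zos.putNextEntry(new ZipEntry("0754.wal")); // no bytes written
            zos.closeEntry();
        }

        System.out.println(isEmptySegment(zip)); // segment is skipped, not fatal
        zip.delete();
    }
}
```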
[jira] [Created] (IGNITE-8920) Node should be failed when indices are corrupted during tx finish.
Ivan Daschinskiy created IGNITE-8920: Summary: Node should be failed when indices are corrupted during tx finish. Key: IGNITE-8920 URL: https://issues.apache.org/jira/browse/IGNITE-8920 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Ivan Daschinskiy Fix For: 2.7 While a transaction is being processed after receiving a finish request (IgniteTxHandler.finish), the node should be failed by the FailureHandler if the page content of the indices is corrupted. Currently this case is not handled properly and causes long-running transactions across the grid. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9183) Proper handling of UUID columns that are added by DDL.
Ivan Daschinskiy created IGNITE-9183: Summary: Proper handling of UUID columns that are added by DDL. Key: IGNITE-9183 URL: https://issues.apache.org/jira/browse/IGNITE-9183 Project: Ignite Issue Type: Bug Components: sql Affects Versions: 2.6, 2.5 Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.7 Currently, if we add a new UUID column through DDL, it is saved to the schema as byte[]. So it is impossible to use it in DML without placeholders, or to put values through the cache API without converting the UUID to byte[]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9192) Dump statistics of processing IO messages.
Ivan Daschinskiy created IGNITE-9192: Summary: Dump statistics of processing IO messages. Key: IGNITE-9192 URL: https://issues.apache.org/jira/browse/IGNITE-9192 Project: Ignite Issue Type: Improvement Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy When debugging various performance problems, it is crucial to understand which messages are being processed and how long their processing takes. When enabled, these statistics should be collected and dumped to the log at a predefined frequency. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9023) LinkageError or ClassNotFoundException should not be swallowed by GridDeploymentCommunication while processing a deployment request.
Ivan Daschinskiy created IGNITE-9023: Summary: LinkageError or ClassNotFoundException should not be swallowed by GridDeploymentCommunication while processing a deployment request. Key: IGNITE-9023 URL: https://issues.apache.org/jira/browse/IGNITE-9023 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Ivan Daschinskiy Fix For: 2.7 In the current implementation, any error thrown in GridDeploymentCommunication#processResourceRequest is silently ignored. Any error should be logged and sent to the client. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
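The fix is essentially "catch, log, and report" instead of swallowing. A compact sketch of that shape, with deliberately made-up interface names (the real GridDeploymentCommunication API differs):

```java
/**
 * Sketch of non-swallowing error handling for a peer class-loading request.
 * All names here are hypothetical; only the catch/log/report shape matters.
 */
public class ResourceRequestHandler {
    public interface Responder {
        void send(String resource, String errMsg);
    }

    public void processResourceRequest(String clsName, ClassLoader ldr, Responder resp) {
        try {
            Class<?> cls = Class.forName(clsName, false, ldr);

            resp.send(cls.getName(), null); // success: ship the resource back
        }
        catch (ClassNotFoundException | LinkageError e) {
            // Do NOT swallow: log locally and report the failure to the requester.
            System.err.println("Failed to process resource request [cls=" + clsName + "]: " + e);
            resp.send(null, e.toString());
        }
    }
}
```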
[jira] [Created] (IGNITE-8945) Stored cache data files corruption when node stops abruptly.
Ivan Daschinskiy created IGNITE-8945: Summary: Stored cache data files corruption when node stops abruptly. Key: IGNITE-8945 URL: https://issues.apache.org/jira/browse/IGNITE-8945 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.7 When a node is halted while saving stored cache data, the content of this file can be corrupted. 1. An additional check should be implemented in FilePageStoreManager.readCacheData (print the name of the corrupted file). 2. In storeCacheData we need to serialize StoredCacheData to a temp file and then swap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
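Point 2 is the standard write-to-temp-then-rename pattern. A sketch under assumed file names (not Ignite's FilePageStoreManager code):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Sketch of crash-safe persistence of cache data: write to a temp file,
 * then atomically swap it into place. Illustrative, not Ignite's code.
 */
public class AtomicStore {
    static void storeCacheData(Path target, byte[] serialized) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");

        // A halt here leaves only a stray .tmp; the previous file stays intact.
        Files.write(tmp, serialized);

        // Atomic on the same file system: readers see either the old or the
        // new content, never a half-written file.
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("store");
        Path data = dir.resolve("cache_data.dat");

        storeCacheData(data, "v1".getBytes(StandardCharsets.UTF_8));
        storeCacheData(data, "v2".getBytes(StandardCharsets.UTF_8));

        System.out.println(new String(Files.readAllBytes(data), StandardCharsets.UTF_8)); // v2
    }
}
```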
[jira] [Created] (IGNITE-8975) Invalid initialization of compressed archived WAL segment when WAL compression is switched off.
Ivan Daschinskiy created IGNITE-8975: Summary: Invalid initialization of compressed archived WAL segment when WAL compression is switched off. Key: IGNITE-8975 URL: https://issues.apache.org/jira/browse/IGNITE-8975 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.7 After restarting a node with WAL compression disabled while a compressed WAL archive is present, the current implementation of FileWriteAheadLogManager ignores the existing compressed WAL segment and initializes a brand-new empty one. This causes the following error:
{code:java}
2018-07-05 16:14:25.761 [ERROR][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.c.CheckpointHistory] Failed to process checkpoint: CheckpointEntry [id=8dc4b1cc-dedd-4a57-8748-f5a7ecfd389d, timestamp=1530785506909, ptr=FileWALPointer [idx=4520, fileOff=860507725, len=691515]]
org.apache.ignite.IgniteCheckedException: Failed to find checkpoint record at the given WAL pointer: FileWALPointer [idx=4520, fileOff=860507725, len=691515]
	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.initIfNeeded(CheckpointEntry.java:346)
	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.access$300(CheckpointEntry.java:231)
	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.initIfNeeded(CheckpointEntry.java:123)
	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.groupState(CheckpointEntry.java:105)
	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.isCheckpointApplicableForGroup(CheckpointHistory.java:377)
	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.searchAndReserveCheckpoints(CheckpointHistory.java:304)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.reserveHistoryForExchange(GridCacheDatabaseSharedManager.java:1614)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1139)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:724)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2477)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2357)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
	at java.lang.Thread.run(Thread.java:745)
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8203) Interrupting task can cause node fail with PersistenceStorageIOException.
Ivan Daschinskiy created IGNITE-8203: Summary: Interrupting task can cause node fail with PersistenceStorageIOException. Key: IGNITE-8203 URL: https://issues.apache.org/jira/browse/IGNITE-8203 Project: Ignite Issue Type: Bug Components: persistence Affects Versions: 2.4 Reporter: Ivan Daschinskiy Fix For: 2.6 Attachments: GridFailNodesOnCanceledTaskTest.java Interrupting a task that performs simple cache operations (e.g. get, put) can cause a PersistenceStorageIOException. The main cause of this failure is the lack of proper handling of InterruptedException in FilePageStore.init() etc. This causes FileChannel.write() to throw ClosedByInterruptException, and so on. PersistenceStorageIOException is a critical failure and typically makes the node stop. A reproducer is attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8120) Improve test coverage of rebalance failing
Ivan Daschinskiy created IGNITE-8120: Summary: Improve test coverage of rebalance failing Key: IGNITE-8120 URL: https://issues.apache.org/jira/browse/IGNITE-8120 Project: Ignite Issue Type: Test Components: general Affects Versions: 2.4 Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10242) NPE in GridDhtPartitionDemander#handleSupplyMessage when concurrently rebalancing and stopping cache in same cache group.
Ivan Daschinskiy created IGNITE-10242: Summary: NPE in GridDhtPartitionDemander#handleSupplyMessage when concurrently rebalancing and stopping cache in same cache group. Key: IGNITE-10242 URL: https://issues.apache.org/jira/browse/IGNITE-10242 Project: Ignite Issue Type: Bug Affects Versions: 2.6, 2.5 Reporter: Ivan Daschinskiy Fix For: 2.8 An NPE occurs in GridDhtPartitionDemander#handleSupplyMessage when a cache in the same cache group is concurrently rebalanced and stopped. A reproducer is attached.
{noformat}
java.lang.NullPointerException
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.preloadEntry(GridDhtPartitionDemander.java:893)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.handleSupplyMessage(GridDhtPartitionDemander.java:772)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleSupplyMessage(GridDhtPreloader.java:331)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:411)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:401)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1058)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:583)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:101)
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9854) NullPointerException in PageMemoryImpl.refreshOutdatedPages during removing from segCheckpointPages
Ivan Daschinskiy created IGNITE-9854: Summary: NullPointerException in PageMemoryImpl.refreshOutdatedPages during removing from segCheckpointPages Key: IGNITE-9854 URL: https://issues.apache.org/jira/browse/IGNITE-9854 Project: Ignite Issue Type: Bug Components: persistence Affects Versions: 2.6 Reporter: Ivan Daschinskiy Fix For: 2.8 Because a segment's segCheckpointPages can be concurrently set to null outside the segment write lock (i.e. in PageMemoryImpl#finishCheckpoint), a NullPointerException is possible. This causes immediate node failure. An example stack trace is attached (the failure happened during iteration in a rebalance supplier).
{code:java}
java.lang.NullPointerException: null
	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.refreshOutdatedPage(PageMemoryImpl.java:840)
	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.access$5100(PageMemoryImpl.java:120)
	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.removePageForReplacement(PageMemoryImpl.java:2175)
	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.access$900(PageMemoryImpl.java:1841)
	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:686)
	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:627)
	at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:140)
	at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102)
	at org.apache.ignite.internal.processors.cache.tree.DataRow.<init>(DataRow.java:54)
	at org.apache.ignite.internal.processors.cache.tree.CacheDataRowStore.dataRow(CacheDataRowStore.java:73)
	at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(CacheDataTree.java:146)
	at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(CacheDataTree.java:41)
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer(BPlusTree.java:4660)
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.nextPage(BPlusTree.java:4760)
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:4689)
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
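The failure pattern is a check-then-act race on a field that another thread nulls out: every extra read of the field is a chance to observe null. A tiny sketch of the hazard and the usual local-variable fix (hypothetical classes, not PageMemoryImpl itself):

```java
import java.util.Collections;
import java.util.Set;

/**
 * Sketch of the race class behind the NPE: a volatile field reset to null by
 * another thread between a null check and its use. Hypothetical code.
 */
public class SegmentDemo {
    static class Segment {
        /** Plays the role of segCheckpointPages; may be nulled concurrently. */
        volatile Set<Long> checkpointPages = Collections.singleton(1L);

        /** Racy: the field is read twice; the second read may observe null. */
        boolean racyContains(long pageId) {
            return checkpointPages != null && checkpointPages.contains(pageId);
        }

        /** Safe: one read into a local; later concurrent writes cannot affect it. */
        boolean safeContains(long pageId) {
            Set<Long> cp = checkpointPages;
            return cp != null && cp.contains(pageId);
        }
    }

    public static void main(String[] args) {
        Segment seg = new Segment();
        System.out.println(seg.safeContains(1L));  // true

        seg.checkpointPages = null;                // simulates finishCheckpoint
        System.out.println(seg.safeContains(1L));  // false, and no NPE
    }
}
```

Taking a snapshot only helps if the operation tolerates a slightly stale view; the alternative is performing the reset under the same segment write lock.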
[jira] [Created] (IGNITE-9452) Correct GridInternalTaskUnusedWalSegmentsTest after merging IGNITE-6552
Ivan Daschinskiy created IGNITE-9452: Summary: Correct GridInternalTaskUnusedWalSegmentsTest after merging IGNITE-6552 Key: IGNITE-9452 URL: https://issues.apache.org/jira/browse/IGNITE-9452 Project: Ignite Issue Type: Test Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.8 After merging IGNITE-6552, GridInternalTaskUnusedWalSegmentsTest needs a small correction. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9658) Add ability to disable memory deallocation on deactivation.
Ivan Daschinskiy created IGNITE-9658: Summary: Add ability to disable memory deallocation on deactivation. Key: IGNITE-9658 URL: https://issues.apache.org/jira/browse/IGNITE-9658 Project: Ignite Issue Type: Improvement Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.8 Currently, on some systems (e.g. RHEL 7.4), we can see the process freeze during massive UNSAFE.freeMemory calls. This behaviour can lead to segmentation of the node, especially when ZookeeperDiscoverySpi is used. There should be an ability to disable memory deallocation during cluster deactivation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10997) Add new property to DataRegionMetrics: empty pages count in reuseList.
Ivan Daschinskiy created IGNITE-10997: Summary: Add new property to DataRegionMetrics: empty pages count in reuseList. Key: IGNITE-10997 URL: https://issues.apache.org/jira/browse/IGNITE-10997 Project: Ignite Issue Type: Improvement Reporter: Ivan Daschinskiy Fix For: 2.8 In order to estimate the available space in a data region, a new property should be added to the data region metrics: the empty pages count from org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList#emptyDataPages. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
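Given such a metric, estimating reusable free space is simple arithmetic: roughly the empty pages count multiplied by the configured page size. A toy calculation with assumed values (4096 bytes matches Ignite's default page size; the numbers are otherwise hypothetical):

```java
/**
 * Back-of-the-envelope use of the proposed metric: reusable space in a data
 * region is roughly emptyDataPages * pageSize. Values here are hypothetical.
 */
public class FreeSpaceEstimate {
    static long freeBytes(long emptyDataPages, long pageSize) {
        return emptyDataPages * pageSize;
    }

    public static void main(String[] args) {
        long emptyDataPages = 10_000; // would come from the new DataRegionMetrics property
        long pageSize = 4096;         // default Ignite page size

        System.out.println(freeBytes(emptyDataPages, pageSize)); // 40960000
    }
}
```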
[jira] [Created] (IGNITE-10339) Connection to cluster failed in control.sh while using --cache
Ivan Daschinskiy created IGNITE-10339: Summary: Connection to cluster failed in control.sh while using --cache Key: IGNITE-10339 URL: https://issues.apache.org/jira/browse/IGNITE-10339 Project: Ignite Issue Type: Bug Affects Versions: 2.6 Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10406) .NET Failed to run ScanQuery with custom filter after server node restart
Ivan Daschinskiy created IGNITE-10406: Summary: .NET Failed to run ScanQuery with custom filter after server node restart Key: IGNITE-10406 URL: https://issues.apache.org/jira/browse/IGNITE-10406 Project: Ignite Issue Type: Bug Reporter: Ivan Daschinskiy Scenario:
1. Start the server.
2. Start the client.
3. Restart the server and wait for the client to reconnect to it.
4. Put some data into the cache and run a ScanQuery with a custom filter.
Stack trace:
{code:java}
class org.apache.ignite.IgniteCheckedException: Failed to inject resource [method=setIgniteInstance, target=org.apache.ignite.internal.processors.platform.cache.PlatformCacheEntryFilterImpl@6225c21c, rsrc=IgniteKernal [cfg=IgniteConfiguration [igniteInstanceName=CashflowCluster, pubPoolSize=8, svcPoolSize=8, callbackPoolSize=8, stripedPoolSize=8, sysPoolSize=8, mgmtPoolSize=4, igfsPoolSize=4, dataStreamerPoolSize=8, utilityCachePoolSize=8, utilityCacheKeepAliveTime=6, p2pPoolSize=2, qryPoolSize=8, igniteHome=C:\Job\fd-tasks\7404\IgniteTests2\packages\Apache.Ignite.2.6.0, igniteWorkDir=C:\Job\fd-tasks\7404\IgniteTests2\packages\Apache.Ignite.2.6.0\work, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@49993335, nodeId=3f4aadd9-01b3-4ffe-b629-895fb6ac886f, marsh=org.apache.ignite.internal.binary.BinaryMarshaller@77a57272, marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=5000, sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=1, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@65b1c1e3], reconCnt=10, reconDelay=2000, maxAckTimeout=60, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null], segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=1, commSpi=TcpCommunicationSpi [connectGate=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$ConnectGateway@4737110c, connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$6@bce0ed4, enableForcibleNodeKill=false, enableTroubleshootingLog=false, srvLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2@11c20519, locAddr=null, locHost=0.0.0.0/0.0.0.0, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=60, connTimeout=5000, maxConnTimeout=60, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=0, slowClientQueueLimit=0, nioSrvr=GridNioServer [selectorSpins=0, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@6839fd4e, directMode=true], GridConnectionBytesVerifyFilter], lsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2@11c20519, closed=false, directBuf=true, tcpNoDelay=true, sockSndBuf=32768, sockRcvBuf=32768, writeTimeout=2000, idleTimeout=60, skipWrite=false, skipRead=false, locAddr=0.0.0.0/0.0.0.0:47100, order=LITTLE_ENDIAN, sndQueueLimit=0, directMode=true, metricsLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationMetricsListener@4e41089d, sslFilter=null, msgQueueLsnr=null, readerMoveCnt=0, writerMoveCnt=0, readWriteSelectorsAssign=false], shmemSrv=null, usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000, lsnr=org.apache.ignite.internal.managers.communication.GridIoManager$2@432d2e4e, boundTcpPort=47100, boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch@70beb599[Count = 0], stopping=false, metricsLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationMetricsListener@4e41089d], evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@32a068d1, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [lsnr=org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore$LocalDeploymentListener@3c6df856], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@282003e1, addrRslvr=null, clientMode=false, rebalanceThreadPoolSize=1, txCfg=org.apache.ignite.configuration.TransactionConfiguration@7fad8c79, cacheSanityCheckEnabled=true, discoStartupDelay=6, deployMode=SHARED, p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=1, clientFailureDetectionTimeout=3, metricsLogFreq=6, hadoopCfg=null, connectorCfg=org.apache.ignite.configuration.ConnectorConfiguration@71a794e5, odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=PlatformDotNetConfiguration [binaryCfg=null], binaryCfg=null,
[jira] [Created] (IGNITE-11400) Rebalancing caches with TTL enabled can cause data corruption.
Ivan Daschinskiy created IGNITE-11400: Summary: Rebalancing caches with TTL enabled can cause data corruption. Key: IGNITE-11400 URL: https://issues.apache.org/jira/browse/IGNITE-11400 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Ivan Daschinskiy During or just after rebalancing caches with TTL enabled, data corruption can occur while the ttl-cleanup-worker purges expired data. See details in the log:
{code:java}
[15:24:01,677][INFO ][sys-#49%datafabric-dev-21.example.com%][GridDhtPartitionDemander] Started rebalance routine [M2_PRODUCT_CACHE, supplier=14c0d3aa-6720-4c7f-a0e5-3ae1a00948b6, topic=0, fullPartitions=[1, 55, 112, 153, 170, 175, 204, 236, 247, 331, 347, 417, 473, 503, 514, 524, 551, 745, 748, 752, 762, 803, 816, 831, 851, 877, 928, 939], histPartitions=[]]
[15:24:02,031][ERROR][ttl-cleanup-worker-#39%datafabric-dev-21.example.com%][GridCacheTtlManager] Failed to process entry expiration: class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on bounds: [lower=null, upper=PendingRow []]
class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on bounds: [lower=null, upper=PendingRow []]
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1000)
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:979)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:1957)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:1913)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:861)
	at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
	at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:142)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Invalid object type: 0
	at org.apache.ignite.internal.processors.cacheobject.IgniteCacheObjectProcessorImpl.toKeyCacheObject(IgniteCacheObjectProcessorImpl.java:166)
	at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.toKeyCacheObject(CacheObjectBinaryProcessorImpl.java:980)
	at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.readFullRow(CacheDataRowAdapter.java:299)
	at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:159)
	at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102)
	at org.apache.ignite.internal.processors.cache.tree.PendingRow.initKey(PendingRow.java:72)
	at org.apache.ignite.internal.processors.cache.tree.PendingEntriesTree.getRow(PendingEntriesTree.java:118)
	at org.apache.ignite.internal.processors.cache.tree.PendingEntriesTree.getRow(PendingEntriesTree.java:31)
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer(BPlusTree.java:4702)
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.init(BPlusTree.java:4604)
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.access$5000(BPlusTree.java:4543)
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:956)
	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:988)
	... 8 more
[15:24:02,348][INFO ][sys-#52%datafabric-dev-21.example.com%][GridDhtPartitionDemander] Started rebalance routine [CART_CACHE, supplier=70be2776-9e8f-4940-8a07-5e3c0ad43bdd, topic=0, fullPartitions=[561, 897], histPartitions=[]]
[15:24:02,439][INFO ][sys-#48%datafabric-dev-21.example.com%][GridDhtPartitionDemander] Completed (final) rebalancing [grp=CART_CACHE, supplier=14c0d3aa-6720-4c7f-a0e5-3ae1a00948b6, topVer=AffinityTopologyVersion [topVer=921, minorTopVer=0], progress=5/5, time=95 ms]
[15:24:02,558][ERROR][ttl-cleanup-worker-#39%datafabric-dev-21.example.com%][GridCacheTtlManager] Failed to process entry expiration: class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on bounds: [lower=null, upper=PendingRow []] class
[jira] [Created] (IGNITE-11364) Segmenting node can cause ring topology broke
Ivan Daschinskiy created IGNITE-11364: - Summary: Segmenting node can cause ring topology broke Key: IGNITE-11364 URL: https://issues.apache.org/jira/browse/IGNITE-11364 Project: Ignite Issue Type: Bug Affects Versions: 2.7, 2.6, 2.5 Reporter: Ivan Daschinskiy Fix For: 2.8 Segmenting a node by a partial network drop, i.e. by applying iptables rules, can break the ring topology. Scenario: each machine runs two nodes, a client and a server. Let's draw a diagram (only server nodes for brevity; they were started before the clients): => grid915 => ... => grid947 => grid945 => grid703 => ..skip 12 nodes...=> grid952 => grid946. On the grid945 machine we drop incoming/outgoing connections with iptables. During the ongoing connection drop, grid945 sends a TcpDiscoveryStatusCheckMessage, but cannot deliver it to grid703 and the 12 nodes mentioned above; some subsequent node accepts it with a collection of failedNodes (the 13 nodes above). This message was received by grid947, which skips these 13 nodes in org.apache.ignite.spi.discovery.tcp.ServerImpl.RingMessageWorker#sendMessageAcrossRing. So we see this situation in the topology: .. => grid947 => grid952 ^ // grid703=>=>grid662 These nodes are not considered failed by the topology. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11139) Remove deprecated snapshot tags from PageMetaIO.
Ivan Daschinskiy created IGNITE-11139: - Summary: Remove deprecated snapshot tags from PageMetaIO. Key: IGNITE-11139 URL: https://issues.apache.org/jira/browse/IGNITE-11139 Project: Ignite Issue Type: Improvement Reporter: Ivan Daschinskiy Fix For: 3.0 After resolving IGNITE-9672, the unnecessary methods should be removed from PageMetaIO. The corresponding PageDeltaRecords should also be removed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-12897) Add .NET api to enabling SQL indexing for existing cache.
Ivan Daschinskiy created IGNITE-12897: - Summary: Add .NET api to enabling SQL indexing for existing cache. Key: IGNITE-12897 URL: https://issues.apache.org/jira/browse/IGNITE-12897 Project: Ignite Issue Type: Bug Components: platforms Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Add a .NET API for enabling SQL indexing on an existing cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12999) Fix broken ZookeeperDiscoverySpiSslTest.testIgniteSslWrongPort
Ivan Daschinskiy created IGNITE-12999: - Summary: Fix broken ZookeeperDiscoverySpiSslTest.testIgniteSslWrongPort Key: IGNITE-12999 URL: https://issues.apache.org/jira/browse/IGNITE-12999 Project: Ignite Issue Type: Test Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy After merging [IGNITE-12992|https://issues.apache.org/jira/browse/IGNITE-12992] to master, the above-mentioned test, which was initially broken, started to fail in master. This is because the actual zk connection string was set rather than a wrong one, so the node joins and the assertion fails. The fix is trivial. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12992) Fix multijvm failing tests in ZookeeperDiscoverySpiTestSuite3
Ivan Daschinskiy created IGNITE-12992: - Summary: Fix multijvm failing tests in ZookeeperDiscoverySpiTestSuite3 Key: IGNITE-12992 URL: https://issues.apache.org/jira/browse/IGNITE-12992 Project: Ignite Issue Type: Test Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13043) Fix compilation error in Ignite C++, when boost version is greater than 1.70
Ivan Daschinskiy created IGNITE-13043: - Summary: Fix compilation error in Ignite C++, when boost version is greater than 1.70 Key: IGNITE-13043 URL: https://issues.apache.org/jira/browse/IGNITE-13043 Project: Ignite Issue Type: Bug Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix a compilation issue in TeamcityBoostLogFormatter when the boost version is greater than 1.70 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13042) Update SSL certificates in C++ test suites to more secure signature
Ivan Daschinskiy created IGNITE-13042: - Summary: Update SSL certificates in C++ test suites to more secure signature Key: IGNITE-13042 URL: https://issues.apache.org/jira/browse/IGNITE-13042 Project: Ignite Issue Type: Test Components: platforms Reporter: Ivan Daschinskiy When a modern OpenSSL is used (e.g. OpenSSL 1.1.1f, the default on Ubuntu 20.04), the provided certificates are not accepted, because they use the sha1WithRSAEncryption signature algorithm, which is widely considered weak. So the certificates need to be renewed. Example error: {code} Connecting to 127.0.0.1:0 140246535644992:error:140AB18E:SSL routines:SSL_CTX_use_certificate:ca md too weak:../ssl/ssl_rsa.c:310: Failed to connect :Can not set client certificate file for secure connection: path /home/ivandasch/ignite/modules/platforms/cpp/thin-client-test/config/ssl/client_full.pem {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13430) Create minimal documentation for ducktape tests
Ivan Daschinskiy created IGNITE-13430: - Summary: Create minimal documentation for ducktape tests Key: IGNITE-13430 URL: https://issues.apache.org/jira/browse/IGNITE-13430 Project: Ignite Issue Type: Task Components: documentation Reporter: Ivan Daschinskiy Assignee: Sergei Ryzhov Create minimal quickstart documentation in {{README.md}}. The documentation should contain the following: # Requirements for development # Requirements for running tests locally # Exact steps to run tests locally (the full suite, a particular suite, a particular test) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13429) Implement integration tests for control.sh transactions management
Ivan Daschinskiy created IGNITE-13429: - Summary: Implement integration tests for control.sh transactions management Key: IGNITE-13429 URL: https://issues.apache.org/jira/browse/IGNITE-13429 Project: Ignite Issue Type: Test Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13508) Test scenario of two-phased rebalance (PDS reduce)
Ivan Daschinskiy created IGNITE-13508: - Summary: Test scenario of two-phased rebalance (PDS reduce) Key: IGNITE-13508 URL: https://issues.apache.org/jira/browse/IGNITE-13508 Project: Ignite Issue Type: Test Reporter: Ivan Daschinskiy Let us assume a cluster of 16 affinity nodes. Let's divide the cluster into 4 equal cells: each node in a cell has the same node attribute {{CELL=CELL_}} Caches started on the nodes should have an affinity function with this backup filter: {code:java} public class CellularAffinityBackupFilter implements IgniteBiPredicate<ClusterNode, List<ClusterNode>> { private static final long serialVersionUID = 1L; private final String attrName; public CellularAffinityBackupFilter(String attrName) { this.attrName = attrName; } @Override public boolean apply(ClusterNode candidate, List<ClusterNode> previouslySelected) { for (ClusterNode node : previouslySelected) return Objects.equals(candidate.attribute(attrName), node.attribute(attrName)); return true; } } {code} Steps: * Preparations. 1. Start all 4 cells. 2. Load data into a cache with the above-mentioned affinity function and record the PDS size on all nodes. 3. Delete 80% of the data and record the PDS size on all nodes. * Phase 1 1. Stop two nodes in each cell (half of all nodes in total) and clean their PDS. 2. Start the cleaned nodes, preserving consistent IDs and cell attributes. 3. Wait for rebalancing to finish. * Phase 2 Run steps 1-3 of Phase 1 on the other half of the cluster. * Verifications 1. Check that the PDS size is reduced (compared to step 3). 2. Check data consistency (idle_verify --dump). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13540) Exchange worker, waiting for new task from queue, considered as blocked.
Ivan Daschinskiy created IGNITE-13540: - Summary: Exchange worker, waiting for new task from queue, considered as blocked. Key: IGNITE-13540 URL: https://issues.apache.org/jira/browse/IGNITE-13540 Project: Ignite Issue Type: Bug Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Waiting for a new task in ExchangeWorker#body is currently not marked as a blocking section. So if the network timeout (the timeout for polling a task from the queue) is greater than the system worker blocked timeout, the exchange worker thread is considered blocked. Sometimes this is reported in the logs a few seconds after the PME has actually finished {noformat} [2020-10-06 16:55:45,939][INFO ][exchange-worker-#50][org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager1] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=1], force=false, evt=DISCOVERY_CUSTOM_EVT, node=163fd0f0-b9a4-4317-a28f-f7dbdb776076] [2020-10-06 16:55:48,822][ERROR][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][org.apache.ignite.internal.util.typedef.G1] Blocked system-critical thread has been detected. 
This can lead to cluster-wide undefined behaviour [workerName=partition-exchanger, threadName=exchange-worker-#50, blockedFor=2s] [2020-10-06 16:55:48,824][WARN ][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][org.apache.ignite.internal.util.typedef.G1] Thread [name="exchange-worker-#50", id=90, state=TIMED_WAITING, blockCnt=20, waitCnt=48] Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14f29e0e, ownerName=null, ownerId=-1] [2020-10-06 16:55:48,827][WARN ][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][root1] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=null, finished=false, heartbeatTs=1601992545941]]] class org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=null, finished=false, heartbeatTs=1601992545941] at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1860) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1855) at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234) at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:299) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13572) Duplicates in select query during partition eviction.
Ivan Daschinskiy created IGNITE-13572: - Summary: Duplicates in select query during partition eviction. Key: IGNITE-13572 URL: https://issues.apache.org/jira/browse/IGNITE-13572 Project: Ignite Issue Type: Bug Affects Versions: 2.8.1, 2.9 Reporter: Ivan Daschinskiy Scenario: # Start 2 nodes with an indexed atomic partitioned cache with 0 backups. # Load a sufficient amount of data (or emulate slow removal from the index). # Start another node. # Perform SELECT * FROM . A reproducer is attached -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13575) Invalid blocking section in GridNioWorker and GridNioClientWorker leads to false positive blocking thread detection
Ivan Daschinskiy created IGNITE-13575: - Summary: Invalid blocking section in GridNioWorker and GridNioClientWorker leads to false positive blocking thread detection Key: IGNITE-13575 URL: https://issues.apache.org/jira/browse/IGNITE-13575 Project: Ignite Issue Type: Bug Affects Versions: 2.8.1, 2.9 Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy If {{IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT}} is less than 2000 ms, then a simple {{epoll_wait}} for 2000 ms on an idle cluster is considered a critical failure. We should surround {{selector.select}} with {{blockingSectionBegin}} and {{blockingSectionEnd}} instead of calling {{updateHeartbeat}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
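The blocking-section idea above can be illustrated with a small language-agnostic sketch (Python here for brevity; the method names mirror Ignite's {{blockingSectionBegin}}/{{blockingSectionEnd}}, but the registry itself is a hypothetical stand-in, not Ignite's {{WorkersRegistry}}): the watchdog flags a worker whose heartbeat is stale, unless the worker has declared that it is inside a legitimate long wait such as {{selector.select}}.

```python
import time

BLOCKED_TIMEOUT = 2.0  # analogue of IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT, in seconds

class Worker:
    def __init__(self):
        self.heartbeat = time.monotonic()
        self.in_blocking_section = False

    def update_heartbeat(self):
        self.heartbeat = time.monotonic()

    def blocking_section_begin(self):
        # Declare a legitimate long wait (e.g. epoll_wait / selector.select).
        self.in_blocking_section = True

    def blocking_section_end(self):
        self.in_blocking_section = False
        self.update_heartbeat()

def is_blocked(worker, now):
    # The watchdog skips declared blocking sections; without this, any
    # select() longer than BLOCKED_TIMEOUT is reported as a critical failure.
    if worker.in_blocking_section:
        return False
    return now - worker.heartbeat > BLOCKED_TIMEOUT

w = Worker()
stale = w.heartbeat + 5.0          # pretend 5 s passed inside select()
assert is_blocked(w, stale)        # updating the heartbeat beforehand is not enough

w.blocking_section_begin()
assert not is_blocked(w, stale)    # a declared wait never trips the watchdog
w.blocking_section_end()
```

This is why wrapping the wait in a blocking section, rather than calling the heartbeat update before it, removes the false positive: the heartbeat inevitably goes stale during a long wait, while the section flag stays valid for the wait's whole duration.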
[jira] [Created] (IGNITE-13564) Improve SYSTEM_WORKER_BLOCKED reporting.
Ivan Daschinskiy created IGNITE-13564: - Summary: Improve SYSTEM_WORKER_BLOCKED reporting. Key: IGNITE-13564 URL: https://issues.apache.org/jira/browse/IGNITE-13564 Project: Ignite Issue Type: Improvement Affects Versions: 2.8.1, 2.9 Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.10 Currently, reporting of system thread blocking has major drawbacks. 1. Since blocking is detected by another thread, the failure handler receives incomplete information about the problem. {{FailureContext}} has only two fields -- {{type}} and {{err}} -- and the Throwable {{err}} is generated in the detector thread, so the context of the original problem is lost. 2. We do not print the full stack trace of the blocked thread in {{org.apache.ignite.internal.worker.WorkersRegistry#onIdle}}. 3. The current approach doesn't work when there is only one thread in the registry; this case isn't checked, which can cause a single thread to loop infinitely calling {{onIdle}}. These drawbacks can lead to a complete loss of information about the blocked system thread. I suggest: 1. Add another parameter, {{worker}}, to {{FailureContext}}. 2. Fix thread dump printing. 3. Add an assertion for the case when there is only one system thread in the registry. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13495) ZookeeperDiscoverySpiMBeanImpl#getCoordinator can return invalid node as coordinator
Ivan Daschinskiy created IGNITE-13495: - Summary: ZookeeperDiscoverySpiMBeanImpl#getCoordinator can return invalid node as coordinator Key: IGNITE-13495 URL: https://issues.apache.org/jira/browse/IGNITE-13495 Project: Ignite Issue Type: Bug Affects Versions: 2.9 Reporter: Ivan Daschinskiy Fix For: 2.10 Due to an invalid algorithm in {{org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoveryImpl#getCoordinator}}, sometimes an invalid coordinator can be returned. Consider the scenario: 1. Start server #1 2. Start a client 3. Start server #2 4. Stop server #1 After this, {{ZookeeperDiscoverySpiMBeanImpl#getCoordinator}} returns the client as coordinator, because it is the oldest node in the topology. We should rewrite {{org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoveryImpl#getCoordinator}} to return the *oldest server*, not just any node. -- This message was sent by Atlassian Jira (v8.3.4#803005)
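The fixed selection rule is easy to sketch: filter clients out first, then take the node with the smallest join order. A minimal illustration in Python (a hypothetical model, not Ignite's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Node:
    id: str
    order: int       # discovery join order: lower order == older node
    is_client: bool

def coordinator(nodes):
    """Pick the oldest *server* node, ignoring clients entirely.
    Returns None if the topology has no servers."""
    servers = [n for n in nodes if not n.is_client]
    return min(servers, key=lambda n: n.order, default=None)

# The scenario from the issue, after server #1 (order=1) has stopped:
topology = [
    Node("client", order=2, is_client=True),    # oldest remaining node overall
    Node("server2", order=3, is_client=False),
]
# The broken logic (min by order over all nodes) would pick the client;
# the fixed logic picks the oldest server.
assert coordinator(topology).id == "server2"
```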
[jira] [Created] (IGNITE-13481) Decorators @version_if and @ignite_versions injects incorrect variables.
Ivan Daschinskiy created IGNITE-13481: - Summary: Decorators @version_if and @ignite_versions injects incorrect variables. Key: IGNITE-13481 URL: https://issues.apache.org/jira/browse/IGNITE-13481 Project: Ignite Issue Type: Bug Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Sometimes these decorators inject variables incorrectly, especially when mixed. We need to fix the corner cases and check them in unit tests. As a side effect, introduce unit testing in the ducktests module -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13491) Fix incorrect topology snapshot logger output about coordinator change.
Ivan Daschinskiy created IGNITE-13491: - Summary: Fix incorrect topology snapshot logger output about coordinator change. Key: IGNITE-13491 URL: https://issues.apache.org/jira/browse/IGNITE-13491 Project: Ignite Issue Type: Bug Reporter: Ivan Daschinskiy Currently, the logic in {{org.apache.ignite.internal.managers.discovery.GridDiscoveryManager#topologySnapshotMessage}} has a major drawback: in the condition we don't check that a failed node, whose order is less than the oldest server node's, is actually a server node. So we can see an invalid message about a coordinator change even though the previous node was a client. Reproducer: 1. Start server #1 2. Start a client 3. Start server #2 4. Stop server #1 and the client We will see in the logs of server #2 something like this: {{[2020-09-29 10:41:25,909][INFO ][disco-event-worker-#150%tcp.TcpDiscoverySpiMBeanTest2%][GridDiscoveryManager] Coordinator changed [prev=TcpDiscoveryNode [id=371896fb-f612-4640-bfcd-cef6d281, consistentId=371896fb-f612-4640-bfcd-cef6d281, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:0], discPort=0, order=2, intOrder=2, lastExchangeTime=1601365285287, loc=false, ver=2.10.0#20200929-sha1:, *isClient=true*], cur=TcpDiscoveryNode [id=9d90f4b0-1374-4147-b7a7-d821f002, consistentId=127.0.0.1:47501, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47501], discPort=47501, order=3, intOrder=3, lastExchangeTime=1601365285900, loc=true, ver=2.10.0#20200929-sha1:, isClient=false]]}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13078) C++: Add CMake build support
Ivan Daschinskiy created IGNITE-13078: - Summary: C++: Add CMake build support Key: IGNITE-13078 URL: https://issues.apache.org/jira/browse/IGNITE-13078 Project: Ignite Issue Type: Improvement Components: platforms Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.9 Currently, it is hard to build Ignite C++: the build process differs between Windows and Linux, there is no build support on Mac OS X (quite a popular OS among developers), and there is practically no IDE support except Visual Studio on Windows. I'd suggest migrating to the CMake build system. It is very popular among open source projects, including within the Apache Software Foundation. Notable users: Apache Mesos, Apache ZooKeeper (the C client offers CMake as an alternative to autoconf, and as the only option on Windows), Apache Kafka (librdkafka, the C/C++ client), and Apache Thrift. The popular column-oriented database ClickHouse also uses CMake. CMake is widely supported by many IDEs on various platforms, notably Visual Studio, CLion, Xcode, QtCreator, and KDevelop. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13459) Document new build process for Ignite C++
Ivan Daschinskiy created IGNITE-13459: - Summary: Document new build process for Ignite C++ Key: IGNITE-13459 URL: https://issues.apache.org/jira/browse/IGNITE-13459 Project: Ignite Issue Type: Task Components: documentation Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.9 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13321) Control utility doesn't print results to stdout.
Ivan Daschinskiy created IGNITE-13321: - Summary: Control utility doesn't print results to stdout. Key: IGNITE-13321 URL: https://issues.apache.org/jira/browse/IGNITE-13321 Project: Ignite Issue Type: Bug Components: control.sh Affects Versions: 2.10 Reporter: Ivan Daschinskiy Fix For: 2.10 After merging [IGNITE-13123|https://issues.apache.org/jira/browse/IGNITE-13123], {{control.sh}} doesn't work properly in either dev mode or release mode. Specifically, no output is printed to stdout. For example, the incorrect output for {{control.sh --activate}} after the commit is: {code:sh} Control utility [ver. 2.9.0-SNAPSHOT#20200803-sha1:DEV] 2020 Copyright(C) Apache Software Foundation User: ivandasch Time: 2020-08-03T17:21:06.246 Command [BASELINE] started Arguments: --baseline Failed to execute baseline command='collect' Latest topology update failed. Connection to cluster failed. Latest topology update failed. Command [BASELINE] finished with code: 2 Control utility has completed execution at: 2020-08-03T17:21:09.613 Execution time: 3367 ms {code} The correct output for {{control.sh --activate}} before the commit is: {code} Control utility [ver. 2.8.1#20200521-sha1:86422096], 2020 Copyright(C) Apache Software Foundation, User: ducker, Time: 2020-08-03T14:23:55.793, Command [BASELINE] started, Arguments: --host ducker04 --baseline set ducker02,ducker03,ducker04 --yes, , Cluster state: active, Current topology version: 3, Baseline auto adjustment disabled: softTimeout=30 Current topology version: 3 (Coordinator: ConsistentId=ducker02, Order=1) Baseline nodes: ConsistentId=ducker02, State=ONLINE, Order=1, ConsistentId=ducker03, State=ONLINE, Order=2, ConsistentId=ducker04, State=ONLINE, Order=3, , "Number of baseline nodes: 3\n", "\n", "Other nodes not found.\n", "Command [BASELINE] finished with code: 0\n", "Control utility has completed execution at: 2020-08-03T14:23:57.351\n", "Execution time: 1558 ms\n" {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13328) Control.sh bash script swallow return code of CommandHandler and always return 0
Ivan Daschinskiy created IGNITE-13328: - Summary: Control.sh bash script swallow return code of CommandHandler and always return 0 Key: IGNITE-13328 URL: https://issues.apache.org/jira/browse/IGNITE-13328 Project: Ignite Issue Type: Bug Affects Versions: 2.8.1, 2.8 Reporter: Ivan Daschinskiy Fix For: 2.9 After merging [IGNITE-12367|https://issues.apache.org/jira/browse/IGNITE-12367], control.sh always returns 0, despite the fact that CommandHandler returns the correct code. For example, on Ignite 2.8.1: {code} Failed to execute baseline command='collect' Latest topology update failed. Connection to cluster failed. Latest topology update failed. Command [BASELINE] finished with code: 2 Control utility has completed execution at: 2020-08-05T15:01:34.123 Execution time: 26627 ms >>> echo $? 0 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
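The bug class is easy to reproduce in a reduced wrapper (a hypothetical sketch, not the actual control.sh script): any command executed after the Java invocation overwrites {{$?}}, so the wrapper must capture the handler's exit code immediately and re-raise it.

```shell
#!/bin/sh
# Hypothetical stand-in for the Java CommandHandler invocation; fails with code 2.
command_handler() { return 2; }

broken_wrapper() {
    command_handler
    # This echo succeeds, so $? is now 0 -- the handler's code is swallowed.
    echo "Control utility has completed execution" > /dev/null
}

fixed_wrapper() {
    command_handler
    rc=$?   # capture the handler's exit code before running anything else
    echo "Control utility has completed execution" > /dev/null
    return $rc
}

broken_wrapper; broken_rc=$?
fixed_wrapper;  fixed_rc=$?
echo "broken=$broken_rc fixed=$fixed_rc"
```

A function (or script) returns the exit status of its last command, which is why the trailing echo in the broken variant masks the failure.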
[jira] [Created] (IGNITE-13176) C++: Remove autotools build after merging CMake
Ivan Daschinskiy created IGNITE-13176: - Summary: C++: Remove autotools build after merging CMake Key: IGNITE-13176 URL: https://issues.apache.org/jira/browse/IGNITE-13176 Project: Ignite Issue Type: Improvement Components: platforms Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.9 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13187) Jar hell in classpath leads to failed tests in C++ and .NET suites
Ivan Daschinskiy created IGNITE-13187: - Summary: Jar hell in classpath leads to failed tests in C++ and .NET suites Key: IGNITE-13187 URL: https://issues.apache.org/jira/browse/IGNITE-13187 Project: Ignite Issue Type: Test Components: platforms Affects Versions: 2.8.1 Environment: Apache Ignite TC. Reporter: Ivan Daschinskiy On some agents, tests and examples started failing with this call trace: {code:java} [13:53:52]java.lang.NoSuchFieldError: logger [13:53:52] at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:723) [13:53:52] at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:881) [13:53:52] at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:551) [13:53:52] at org.apache.ignite.internal.util.spring.IgniteSpringHelperImpl.applicationContext(IgniteSpringHelperImpl.java:381) [13:53:52] at org.apache.ignite.internal.util.spring.IgniteSpringHelperImpl.loadConfigurations(IgniteSpringHelperImpl.java:104) [13:53:52] at org.apache.ignite.internal.util.spring.IgniteSpringHelperImpl.loadConfigurations(IgniteSpringHelperImpl.java:98) [13:53:52] at org.apache.ignite.internal.IgnitionEx.loadConfigurations(IgnitionEx.java:709) [13:53:52] at org.apache.ignite.internal.IgnitionEx.loadConfiguration(IgnitionEx.java:767) [13:53:52] at org.apache.ignite.internal.processors.platform.PlatformIgnition.configuration(PlatformIgnition.java:152) [13:53:52] at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:67) {code} The main reason for the failure is jar hell. When .NET or C++ tests are started with IGNITE_NATIVE_TEST_CLASSPATH set to true, the source directory is iterated and files (libs, target/classes, etc.) are added to the classpath. But none of readdir(), FindNextFileA() or Directory.EnumerateDirectories() guarantees any ordering. 
And spring-data-2.0 and spring-data-2.2 bring different versions of Spring, so jar hell occurs and the tests fail. -- This message was sent by Atlassian Jira (v8.3.4#803005)
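The fix for this class of problem is to make the classpath scan deterministic by sorting directory entries explicitly, since the OS-level listing order is unspecified. A minimal sketch (Python for illustration; directory names are hypothetical examples, not the actual Ignite layout):

```python
import os
import tempfile

def classpath_entries(root):
    """Collect jar paths in a deterministic order. os.listdir (like readdir)
    gives no ordering guarantee; with duplicate classes on the classpath the
    first entry wins, so an unsorted scan makes the winner agent-dependent."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order of subdirectories
        entries += sorted(os.path.join(dirpath, f)
                          for f in filenames if f.endswith(".jar"))
    return entries

with tempfile.TemporaryDirectory() as root:
    # Two module directories carrying different Spring versions.
    for d, jar in [("spring-data-2.0", "spring-core-5.0.jar"),
                   ("spring-data-2.2", "spring-core-5.2.jar")]:
        os.makedirs(os.path.join(root, d))
        open(os.path.join(root, d, jar), "w").close()
    cp = classpath_entries(root)
    # Every agent now produces the same order, whatever readdir() returns.
    assert [os.path.basename(p) for p in cp] == \
        ["spring-core-5.0.jar", "spring-core-5.2.jar"]
```

Sorting does not remove the duplicate classes, but it makes which copy wins reproducible across agents, which turns a flaky suite into a deterministic one that can then be fixed properly.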
[jira] [Created] (IGNITE-13291) Remove unnecessary dependency to curator-client from ZookeeperDiscoverySpi
Ivan Daschinskiy created IGNITE-13291: - Summary: Remove unnecessary dependency to curator-client from ZookeeperDiscoverySpi Key: IGNITE-13291 URL: https://issues.apache.org/jira/browse/IGNITE-13291 Project: Ignite Issue Type: Improvement Components: zookeeper Affects Versions: 2.9 Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.10 Currently, I suppose by mistake, we use {{org.apache.curator.utils.PathUtils#validatePath(java.lang.String)}} from {{curator-client}} in {{ZookeeperDiscoverySpi}}. Generally, this discovery implementation doesn't depend on the Curator framework at all, except in some test code. We should remove this dependency and add this utility method to our codebase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13292) Remove unneeded ZkPinger from ZookeeperDiscovery
Ivan Daschinskiy created IGNITE-13292: - Summary: Remove unneeded ZkPinger from ZookeeperDiscovery Key: IGNITE-13292 URL: https://issues.apache.org/jira/browse/IGNITE-13292 Project: Ignite Issue Type: Improvement Components: zookeeper Affects Versions: 2.9 Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.10 We need to remove the unneeded {{ZkPinger}}, introduced in [IGNITE-9683|https://issues.apache.org/jira/browse/IGNITE-9683], from our codebase. This pinger was introduced to solve issues with server node segmentation when the cluster is deactivated. The root cause of those issues is the strange all-thread freeze when a huge amount of memory is deallocated with {{Unsafe.freeMemory}}; such a freeze can last a minute or more. The pinger is proven not to solve the problem at all. The working solution to this problem was introduced in [IGNITE-9658|https://issues.apache.org/jira/browse/IGNITE-9658] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13308) C++: Thin client transactions
Ivan Daschinskiy created IGNITE-13308: - Summary: C++: Thin client transactions Key: IGNITE-13308 URL: https://issues.apache.org/jira/browse/IGNITE-13308 Project: Ignite Issue Type: Improvement Components: platforms Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Fix For: 2.10 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13967) Refactor and improve performance of python thin client marshaller
Ivan Daschinskiy created IGNITE-13967: - Summary: Refactor and improve performance of python thin client marshaller Key: IGNITE-13967 URL: https://issues.apache.org/jira/browse/IGNITE-13967 Project: Ignite Issue Type: Improvement Components: thin client Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy The currently implemented serialization has a questionable design and suffers from several problems: 1. It is tightly coupled with the Client object. 2. It doesn't use the protocol feature that the total length of a message is in the header, so it constantly loads data from the Client instead of iterating over a byte array. 3. It uses some tricky hacks, and sometimes a new connection is created when deserializing an object. 4. It constantly allocates bytes (immutable data structures). I suggest rewriting serialization and deserialization: 1. Pass a specific SerDe context + BytesIO to the corresponding methods. 2. The context can be sync or async and contains specific flags and methods for loading/uploading binary object schemas. 3. Refactor the Client to retrieve the full packet from the socket at once, then pass the full packet further. These steps can significantly improve performance, reduce the number of allocations, and lay the foundation for implementing an asyncio version of the client. -- This message was sent by Atlassian Jira (v8.3.4#803005)
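Point 2 above is the classic length-prefixed framing pattern: read the header, learn the total length, read the whole payload at once, then parse it from an in-memory buffer. A minimal sketch (generic illustration only; the 4-byte little-endian length field is an assumption for the example, not a statement of the actual Ignite thin-client wire format):

```python
import io
import struct

def read_packet(sock_recv):
    """Read one length-prefixed packet: a 4-byte little-endian length
    followed by the payload. sock_recv(n) must behave like socket.recv,
    returning at most n bytes (possibly fewer)."""
    def recv_exact(n):
        buf = bytearray()
        while len(buf) < n:
            chunk = sock_recv(n - len(buf))
            if not chunk:
                raise ConnectionError("socket closed mid-packet")
            buf += chunk
        return bytes(buf)

    (length,) = struct.unpack("<i", recv_exact(4))
    # The whole payload is read in one pass and handed over as a BytesIO,
    # so the parser iterates over memory instead of pulling field-by-field
    # from the connection.
    return io.BytesIO(recv_exact(length))

# Simulate a socket that delivers tiny 2-byte chunks (partial reads).
wire = io.BytesIO(struct.pack("<i", 5) + b"hello" + b"trailing-garbage")
packet = read_packet(lambda n: wire.read(min(n, 2)))
assert packet.read() == b"hello"
```

Because the parser only ever sees a complete in-memory packet, it needs no reference to the Client or the connection, which is exactly the decoupling the issue proposes.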
[jira] [Created] (IGNITE-13872) Hardcoded retry timeout in ZookeeperClient
Ivan Daschinskiy created IGNITE-13872: - Summary: Hardcoded retry timeout in ZookeeperClient Key: IGNITE-13872 URL: https://issues.apache.org/jira/browse/IGNITE-13872 Project: Ignite Issue Type: Bug Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Currently, the retry timeout is hardcoded (2000 ms) in ZookeeperClient. We should calculate this timeout using some strategy depending on the session timeout. -- This message was sent by Atlassian Jira (v8.3.4#803005)
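One possible strategy (an illustrative assumption, not the committed fix): derive the retry timeout from the session timeout with exponential backoff, capped so that several retries still fit within one session.

```python
def retry_timeout(session_timeout_ms, attempt, base_ms=200, cap_divisor=3):
    """Exponential backoff derived from the session timeout instead of a
    hardcoded 2000 ms: base * 2^attempt, capped at a fraction of the session
    timeout so that multiple retries fit inside a single session."""
    cap = session_timeout_ms / cap_divisor
    return min(base_ms * (2 ** attempt), cap)

assert retry_timeout(30_000, 0) == 200       # first retry is fast
assert retry_timeout(30_000, 10) == 10_000   # capped at a third of the session
assert retry_timeout(6_000, 5) == 2_000      # short sessions get short retries
```

With a fixed 2000 ms timeout, a session timeout below ~6000 ms leaves room for at most a couple of retries before the session expires; tying the cap to the session timeout keeps the retry budget proportional.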
[jira] [Created] (IGNITE-13903) Python thin client tests automation.
Ivan Daschinskiy created IGNITE-13903: - Summary: Python thin client tests automation. Key: IGNITE-13903 URL: https://issues.apache.org/jira/browse/IGNITE-13903 Project: Ignite Issue Type: Improvement Components: python Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy It would be nice to further improve the development process of the python thin client: 1. Add docker-compose.yml to simplify local development. 2. Add tox.ini to automate test running. 3. Integrate a travis-ci build. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13882) Support configurable install root and work root for ducktape
Ivan Daschinskiy created IGNITE-13882: - Summary: Support configurable install root and work root for ducktape Key: IGNITE-13882 URL: https://issues.apache.org/jira/browse/IGNITE-13882 Project: Ignite Issue Type: Test Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13690) Failed to init coordinator caches on concurrent start of nodes with different cache configurations.
Ivan Daschinskiy created IGNITE-13690: - Summary: Failed to init coordinator caches on concurrent start of nodes with different cache configurations. Key: IGNITE-13690 URL: https://issues.apache.org/jira/browse/IGNITE-13690 Project: Ignite Issue Type: Bug Affects Versions: 2.9 Reporter: Ivan Daschinskiy Scenario: 1. Simultaneously start nodes with different cache configurations (for simplicity, let the client nodes have configured caches and the servers none). 2. When processing the first exchange, the coordinator will fail with: {code:java} [2020-11-10 13:23:57,232][ERROR][start-node-1][DifferentCacheConfigurationConcurrentStart0] Got exception while starting (will rollback startup routine). java.lang.AssertionError: Invalid exchange futures state [cur=6, total=7] at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$17.applyx(CacheAffinitySharedManager.java:1964) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$17.applyx(CacheAffinitySharedManager.java:1935) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.lambda$forAllRegisteredCacheGroups$e0a6939d$1(CacheAffinitySharedManager.java:1265) at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11157) at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11059) at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11039) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.forAllRegisteredCacheGroups(CacheAffinitySharedManager.java:1264) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initCoordinatorCaches(CacheAffinitySharedManager.java:1935) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCoordinatorCaches(GridDhtPartitionsExchangeFuture.java:716) at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:850) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3175) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3021) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) {code} The main reason is a race on creating {{LocalJoinCachesContext}}, so the local join caches differ from the caches registered by other nodes. Reproducers for zk and ring discovery are attached. NB! This is not always reproducible; to increase the probability of failure, add a sleep in {{GridDhtPartitionsExchangeFuture#init}} {code:java} public void init(boolean newCrd) throws IgniteInterruptedCheckedException { if (newCrd) U.sleep(500); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-13699) Support new metrics framework in ZookeeperDiscovery
Ivan Daschinskiy created IGNITE-13699: - Summary: Support new metrics framework in ZookeeperDiscovery Key: IGNITE-13699 URL: https://issues.apache.org/jira/browse/IGNITE-13699 Project: Ignite Issue Type: Improvement Components: zookeeper Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy
[jira] [Created] (IGNITE-13911) asyncio version of python ignite thin client
Ivan Daschinskiy created IGNITE-13911: - Summary: asyncio version of python ignite thin client Key: IGNITE-13911 URL: https://issues.apache.org/jira/browse/IGNITE-13911 Project: Ignite Issue Type: Improvement Components: python Reporter: Ivan Daschinskiy Assignee: Ivan Daschinskiy Currently, asyncio is the default event loop and coroutine engine for Python 3.6+. This approach can drastically improve the performance of IO-bound tasks, so it is important to implement an asyncio version of the Python Ignite thin client. The old synchronous version should remain and share common code with the asyncio version.
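One way to let the synchronous and asyncio clients share common code, as the ticket proposes, is to keep the wire codec free of I/O so both transports reuse it. Everything below (the frame layout, `encode_request`, `SyncConnection`, `AioConnection`) is an illustrative assumption, not the actual pyignite internals.

```python
import asyncio
import struct

# Hypothetical sketch: the protocol codec does no I/O, so the old blocking
# client and a new asyncio client can both delegate to it.

def encode_request(op_code: int, payload: bytes) -> bytes:
    # Length-prefixed frame: 4-byte length, 2-byte op code, then the payload.
    body = struct.pack('<h', op_code) + payload
    return struct.pack('<i', len(body)) + body

def decode_request(frame: bytes):
    length, = struct.unpack_from('<i', frame, 0)
    op_code, = struct.unpack_from('<h', frame, 4)
    return op_code, frame[6:4 + length]

class SyncConnection:
    """Old blocking client: would write the frame to a plain socket."""
    def send(self, op_code, payload):
        return encode_request(op_code, payload)

class AioConnection:
    """New asyncio client: would await an asyncio StreamWriter instead."""
    async def send(self, op_code, payload):
        return encode_request(op_code, payload)

sync_frame = SyncConnection().send(1000, b'cache-name')
aio_frame = asyncio.run(AioConnection().send(1000, b'cache-name'))
assert sync_frame == aio_frame  # both paths go through the shared codec
```

Only the thin `send` wrappers differ between the two clients; all serialization logic stays in one place.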
[jira] [Created] (IGNITE-13994) Rebalance huge cache for in-memory cluster
Ivan Daschinskiy created IGNITE-13994: - Summary: Rebalance huge cache for in-memory cluster Key: IGNITE-13994 URL: https://issues.apache.org/jira/browse/IGNITE-13994 Project: Ignite Issue Type: Test Reporter: Ivan Daschinskiy There is some evidence that rebalancing a huge cache without rebalance throttling can cause an OOM on the supplier. We need to cover this scenario. Scenario: 1. Start two nodes and one replicated cache with a data region much larger than the heap. 2. Stop one of the nodes. 3. Load an amount of data into the cache almost equal to the size of the data region. 4. Start the node again. The goal is to run experiments with the following parameters: 1. Heap size 2. Cache size 3. Rebalance batch size 4. Rebalance throttle
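The experiment matrix above is a plain Cartesian product of the four parameters; a minimal sketch follows. Every concrete value below is an illustrative assumption, not taken from the ticket.

```python
from itertools import product

# Hypothetical parameter grid for the rebalance experiment; the concrete
# sizes are illustrative assumptions only.
heap_sizes = ['1G', '2G']                 # JVM heap
cache_sizes = ['8G', '16G']               # data loaded into the cache
batch_sizes = [512 * 1024, 1024 * 1024]   # rebalance batch size, bytes
throttles = [0, 100]                      # rebalance throttle, ms

experiments = [
    {'heap': h, 'cache': c, 'batch': b, 'throttle': t}
    for h, c, b, t in product(heap_sizes, cache_sizes, batch_sizes, throttles)
]
print(len(experiments))  # 2 * 2 * 2 * 2 = 16 runs
```

Each dict would then be mapped onto the node configuration for one test run.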
[jira] [Created] (IGNITE-14003) OOM on creating rebalance iterator while rebalancing cache with large values.
Ivan Daschinskiy created IGNITE-14003: - Summary: OOM on creating rebalance iterator while rebalancing cache with large values. Key: IGNITE-14003 URL: https://issues.apache.org/jira/browse/IGNITE-14003 Project: Ignite Issue Type: Bug Affects Versions: 2.9.1, 2.8.1, 2.9 Reporter: Ivan Daschinskiy Scenario: 1. Start a replicated cache on an Ignite node (memory region approx. 6 GB, heap 1 GB). 2. Load a significant amount of data into the cache with values of approx. 200 KB (~20K kv pairs). 3. Start another node. The first node (the supplier) crashes with an OOM while initializing the rebalance iterator. The main reason: all values pointed to from a B+Tree leaf are loaded into a buffer in BPlusTree#ForwardCursor, and for a replicated cache, 512 iterators (one per partition) are created at once. A reproducer is attached.
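A back-of-the-envelope estimate shows why buffering values in all cursors at once exhausts a small heap. Only the 512 iterators, the ~200 KB value size, and the 1 GB heap come from the scenario; the rows-per-leaf figure is a pure assumption for illustration.

```python
# Rough estimate: suppose each of the 512 ForwardCursor instances created at
# once buffers the values referenced from one B+Tree leaf page.
cursors = 512                    # one per partition, per the ticket
value_size = 200 * 1024          # ~200 KB values, as in the scenario
rows_per_leaf = 10               # ASSUMED links per leaf page, for illustration

buffered = cursors * rows_per_leaf * value_size
print(buffered / 1024 ** 3)      # ~0.98 GB buffered -- roughly the whole 1 GB heap
```

Even under modest assumptions the buffered values alone approach the entire heap, matching the observed OOM.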
[jira] [Created] (IGNITE-14046) Update ducktape version to 0.8.2 in ignite-ducktape module
Ivan Daschinskiy created IGNITE-14046: - Summary: Update ducktape version to 0.8.2 in ignite-ducktape module Key: IGNITE-14046 URL: https://issues.apache.org/jira/browse/IGNITE-14046 Project: Ignite Issue Type: Task Reporter: Ivan Daschinskiy
[jira] [Created] (IGNITE-14074) Add ability to skip affinity tests for testing with older ignite versions.
Ivan Daschinskiy created IGNITE-14074: - Summary: Add ability to skip affinity tests for testing with older ignite versions. Key: IGNITE-14074 URL: https://issues.apache.org/jira/browse/IGNITE-14074 Project: Ignite Issue Type: Task Components: python, thin client Reporter: Ivan Daschinskiy
[jira] [Created] (IGNITE-14154) Remove test test_unsupported_affinity_cache_operation_routed_to_random_node
Ivan Daschinskiy created IGNITE-14154: - Summary: Remove test test_unsupported_affinity_cache_operation_routed_to_random_node Key: IGNITE-14154 URL: https://issues.apache.org/jira/browse/IGNITE-14154 Project: Ignite Issue Type: Test Reporter: Ivan Daschinskiy Currently this test is essentially the same as test_replicated_cache_operation_routed_to_random_node, but it requires a custom affinity function that was introduced only in 2.9.1. I suggest removing it.
[jira] [Created] (IGNITE-14167) Concurrency issues in reconnect and too short backoff strategy for reconnect timeout.
Ivan Daschinskiy created IGNITE-14167: - Summary: Concurrency issues in reconnect and too short backoff strategy for reconnect timeout. Key: IGNITE-14167 URL: https://issues.apache.org/jira/browse/IGNITE-14167 Project: Ignite Issue Type: Bug Components: python, thin client Reporter: Ivan Daschinskiy Currently the code in the Connection class is not properly synchronized, and the socket can be set to None while reconnecting and sending requests simultaneously. Also, the reconnection backoff is too short (8 Fibonacci sequence items totaling only 33 sec). These issues lead to flaky tests in the affinity suite.
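The backoff arithmetic can be checked directly. Assuming the delays follow the Fibonacci sequence starting from 0 (which is what matches the 33-second total mentioned above), a minimal sketch looks like this:

```python
# Sketch of the criticized backoff: successive Fibonacci delays, in seconds.
def fib_backoff(attempts: int):
    delays, a, b = [], 0, 1
    for _ in range(attempts):
        delays.append(a)
        a, b = b, a + b
    return delays

delays = fib_backoff(8)
print(delays, sum(delays))  # [0, 1, 1, 2, 3, 5, 8, 13] 33 -- only ~33 s in total
```

Eight attempts over half a minute is easily exhausted by a node restart, which is why the affinity tests flap.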
[jira] [Created] (IGNITE-14072) Remove copy-paste of response for different versions
Ivan Daschinskiy created IGNITE-14072: - Summary: Remove copy-paste of response for different versions Key: IGNITE-14072 URL: https://issues.apache.org/jira/browse/IGNITE-14072 Project: Ignite Issue Type: Task Components: python, thin client Reporter: Ivan Daschinskiy Currently there is a lot of common code in the classes Response140, SqlResponse140, Response130 and SqlResponse130. This should be fixed.
[jira] [Created] (IGNITE-14429) Python thin client cache.get_size works not as expected and PeekModes are incorrect.
Ivan Daschinskiy created IGNITE-14429: - Summary: Python thin client cache.get_size works not as expected and PeekModes are incorrect. Key: IGNITE-14429 URL: https://issues.apache.org/jira/browse/IGNITE-14429 Project: Ignite Issue Type: Bug Components: python, thin client Reporter: Ivan Daschinskiy 1. PeekModes is now a ByteArray, so the class variables should be changed. Currently these values are incorrect (they look like masks); they should be changed to ordinal values in order to resemble the Java enum. 2. By default, peek_modes in get_size should be None, not 0: if 0 is passed, the behaviour is not that of PeekModes.ALL but of PeekModes.PRIMARY.
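A sketch of the proposed fix: ordinal constants mirroring the order of Java's CachePeekMode enum (ALL, NEAR, PRIMARY, BACKUP, ONHEAP, OFFHEAP), and None rather than 0 as the get_size default. The class layout and helper below are illustrative, not the actual pyignite source.

```python
# Ordinal values matching Java's CachePeekMode enum, replacing the old
# mask-like constants.
class PeekModes:
    ALL = 0
    NEAR = 1
    PRIMARY = 2
    BACKUP = 3
    ONHEAP = 4
    OFFHEAP = 5

def get_size(peek_modes=None):
    # None means "no modes sent", so the server counts all entries.  An
    # explicit 0 would instead send the single concrete mode PeekModes.ALL,
    # which is a different request -- hence None, not 0, as the default.
    return [] if peek_modes is None else list(peek_modes)
```

With mask-like values, PeekModes.PRIMARY would collide with whatever bit pattern 2 previously meant, which is the observed misbehaviour.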
[jira] [Created] (IGNITE-14418) Document asyncio version of python ignite thin client
Ivan Daschinskiy created IGNITE-14418: - Summary: Document asyncio version of python ignite thin client Key: IGNITE-14418 URL: https://issues.apache.org/jira/browse/IGNITE-14418 Project: Ignite Issue Type: New Feature Components: python, thin client Reporter: Ivan Daschinskiy
[jira] [Created] (IGNITE-14444) Move affinity calculation and storage to client
Ivan Daschinskiy created IGNITE-14444: - Summary: Move affinity calculation and storage to client Key: IGNITE-14444 URL: https://issues.apache.org/jira/browse/IGNITE-14444 Project: Ignite Issue Type: Improvement Components: python Reporter: Ivan Daschinskiy In the current implementation, affinity storage and affinity calculation are located in the cache. This is not optimal: 1. the affinity is not shared between Cache instances with the same name; 2. affinity mapping requests are made per cache and add extra load; 3. if we start implementing transactions or an expiry policy, this can be an issue. I propose moving the affinity storage to Client and AioClient.
[jira] [Created] (IGNITE-14472) Performance drop on primitive operations.
Ivan Daschinskiy created IGNITE-14472: - Summary: Performance drop on primitive operations. Key: IGNITE-14472 URL: https://issues.apache.org/jira/browse/IGNITE-14472 Project: Ignite Issue Type: Bug Components: python, thin client Affects Versions: python-0.4.0 Reporter: Ivan Daschinskiy The reason for the performance drop: the header struct of Response is not cached (it is now an instance variable, whereas earlier it was a class variable). The performance drop is approx. 15%.
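The regression described above can be illustrated with Python's struct module; the class names and format string below are assumptions, not the actual pyignite source.

```python
import struct

class ResponseSlow:
    """Regressed variant: the header Struct is recompiled per response."""
    def __init__(self):
        self.header = struct.Struct('<hqi')   # rebuilt for every instance

class ResponseFast:
    """Previous variant: the header Struct is compiled once, at class level."""
    header = struct.Struct('<hqi')            # shared by all instances

# Every ResponseSlow pays the format-compilation cost on construction, while
# ResponseFast instances all share one precompiled object.
assert ResponseFast().header is ResponseFast.header
assert ResponseSlow().header is not ResponseSlow().header
```

Since a response object is created per request, recompiling the format string on the hot path plausibly accounts for a double-digit-percent slowdown on primitive operations.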
[jira] [Created] (IGNITE-14422) Version management for ducktape.
Ivan Daschinskiy created IGNITE-14422: - Summary: Version management for ducktape. Key: IGNITE-14422 URL: https://issues.apache.org/jira/browse/IGNITE-14422 Project: Ignite Issue Type: Improvement Reporter: Ivan Daschinskiy I propose the following: 1. Add to the `update-versions` task a sub-task that bumps the version in `ignitetests.__init__.py` (i.e. `2.11-SNAPSHOT` to `2.11-dev`). 2. Change `ignitetests.versions.DEV` to `IgniteVersion(ignitetests.__version__)`. This automatically sets `DEV` as the latest version.
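The first step amounts to a trivial string rewrite of the version in `ignitetests.__init__.py`; the helper name below is hypothetical.

```python
# Hypothetical helper for the proposed sub-task: turn a Maven-style snapshot
# version into the ducktape dev version (e.g. 2.11-SNAPSHOT -> 2.11-dev).
def bump_dev_version(version: str) -> str:
    return version.replace('-SNAPSHOT', '-dev')

print(bump_dev_version('2.11-SNAPSHOT'))  # 2.11-dev
```

With the bumped `__version__` in place, `DEV = IgniteVersion(ignitetests.__version__)` then tracks the latest version without manual edits.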
[jira] [Created] (IGNITE-14245) Infinite loop while trying to get affinity mapping on failed node
Ivan Daschinskiy created IGNITE-14245: - Summary: Infinite loop while trying to get affinity mapping on failed node Key: IGNITE-14245 URL: https://issues.apache.org/jira/browse/IGNITE-14245 Project: Ignite Issue Type: Bug Components: python, thin client Reporter: Ivan Daschinskiy Currently, it is possible to enter an infinite loop while trying to reconnect to a failed node when requesting the affinity mapping.
[jira] [Created] (IGNITE-14240) Handle authentication error on python thin client properly
Ivan Daschinskiy created IGNITE-14240: - Summary: Handle authentication error on python thin client properly Key: IGNITE-14240 URL: https://issues.apache.org/jira/browse/IGNITE-14240 Project: Ignite Issue Type: Bug Components: python, thin client Reporter: Ivan Daschinskiy
[jira] [Created] (IGNITE-14186) Develop C module for python thin client to speedup hashcode calculation
Ivan Daschinskiy created IGNITE-14186: - Summary: Develop C module for python thin client to speedup hashcode calculation Key: IGNITE-14186 URL: https://issues.apache.org/jira/browse/IGNITE-14186 Project: Ignite Issue Type: Improvement Components: python, thin client Reporter: Ivan Daschinskiy Pure Python calculation of the hashcode is very slow, which leads to inadequate performance of simple operations. For example, putting an object with 1 MB of data takes 500 ms. After rewriting the hashcode calculation in C, the operation takes only 7 ms.
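A pure-Python sketch of a Java-style 31-based rolling hash shows why this is slow: hashing 1 MB means about a million iterations of interpreted bytecode, which a C module replaces with one tight native loop. The function below illustrates the technique and is not the exact pyignite implementation.

```python
# Java-style 31-based hash over bytes, reduced to a signed 32-bit value as in
# Java.  A 1 MB payload runs this interpreted loop ~10^6 times -- hence the
# ~500 ms put; the same loop compiled as C finishes in single-digit ms.
def hashcode(data: bytes) -> int:
    h = 0
    for b in data:
        h = (31 * h + b) & 0xFFFFFFFF
    # Fold the unsigned 32-bit accumulator into Java's signed int range.
    return h - 0x100000000 if h >= 0x80000000 else h

print(hashcode(b'ignite'))
```

The per-byte multiply-add is identical in C; only the interpreter overhead disappears, which matches the reported 500 ms to 7 ms improvement.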