[jira] [Created] (IGNITE-10852) [Documentation] - Add details on public API behaviour
Alexander Gerus created IGNITE-10852: Summary: [Documentation] - Add details on public API behaviour Key: IGNITE-10852 URL: https://issues.apache.org/jira/browse/IGNITE-10852 Project: Ignite Issue Type: Improvement Components: documentation Affects Versions: 2.7, 2.6, 2.5, 2.4 Reporter: Alexander Gerus The current public API documentation has some specification gaps. If a method is not executed successfully, it is not clear what the user code should do. A good practice is to describe all API exceptions that user code can handle, together with the recommended actions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
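The recommendation above can be illustrated with a short sketch. This is not Ignite's actual API; the CacheClient class and its exceptions are hypothetical, chosen only to show the proposed documentation style (every exception a caller may need to handle, plus a recommended action):

```java
/**
 * Hypothetical sketch (not actual Ignite API) of the documentation style the
 * ticket asks for: each exception the caller can see is listed with a
 * recommended action.
 */
public class CacheClient {
    private boolean closed;

    /**
     * Puts a value into the cache.
     *
     * @throws IllegalArgumentException if {@code key} is null; the caller
     *         should validate the input and must not retry the call as-is.
     * @throws IllegalStateException if the client is closed; the caller
     *         should reconnect and then retry the operation.
     */
    public void put(String key, String val) {
        if (key == null)
            throw new IllegalArgumentException("key must not be null");

        if (closed)
            throw new IllegalStateException("client is closed; reconnect and retry");

        // ... store the entry ...
    }

    /** Closes the client; subsequent puts fail with IllegalStateException. */
    public void close() {
        closed = true;
    }
}
```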
[jira] [Updated] (IGNITE-6477) Add cache index metric to represent index size
[ https://issues.apache.org/jira/browse/IGNITE-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-6477: Affects Version/s: 2.5 > Add cache index metric to represent index size > -- > > Key: IGNITE-6477 > URL: https://issues.apache.org/jira/browse/IGNITE-6477 > Project: Ignite > Issue Type: Bug > Components: general > Affects Versions: 1.8, 1.9, 2.0, 2.1, 2.5 > Reporter: Alexander Belyak > Priority: Minor > Labels: iep-29 > > Currently we can't estimate the space used by a particular cache index. Let's add it!
[jira] [Updated] (IGNITE-6477) Add cache index metric to represent index size
[ https://issues.apache.org/jira/browse/IGNITE-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-6477: Labels: iep-29 (was: ) > Add cache index metric to represent index size > -- > > Key: IGNITE-6477 > URL: https://issues.apache.org/jira/browse/IGNITE-6477 > Project: Ignite > Issue Type: Bug > Components: general > Affects Versions: 1.8, 1.9, 2.0, 2.1 > Reporter: Alexander Belyak > Priority: Minor > Labels: iep-29 > > Currently we can't estimate the space used by a particular cache index. Let's add it!
[jira] [Updated] (IGNITE-10385) NPE in CachePartitionPartialCountersMap.toString
[ https://issues.apache.org/jira/browse/IGNITE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-10385: - Priority: Blocker (was: Major) > NPE in CachePartitionPartialCountersMap.toString > > > Key: IGNITE-10385 > URL: https://issues.apache.org/jira/browse/IGNITE-10385 > Project: Ignite > Issue Type: Bug > Components: general >Affects Versions: 2.4 >Reporter: Anton Kurbanov >Priority: Blocker > > {noformat} > Failed to reinitialize local partitions (preloading will be stopped) > org.apache.ignite.IgniteException: null > at > org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1032) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:868) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.managers.communication.GridIoMessage.toString(GridIoMessage.java:358) > ~[ignite-core-2.4.10.jar:2.4.10] > at java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_171] > at java.lang.StringBuilder.append(StringBuilder.java:131) ~[?:1.8.0_171] > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2653) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2586) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1714) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1160) > ~[ignite-core-2.4.10.jar:2.4.10] > at > 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendLocalPartitions(GridDhtPartitionsExchangeFuture.java:1399) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendPartitions(GridDhtPartitionsExchangeFuture.java:1506) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1139) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:703) > [ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2379) > [ignite-core-2.4.10.jar:2.4.10] > at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > [ignite-core-2.4.10.jar:2.4.10] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171] > Caused by: org.apache.ignite.IgniteException > at > org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1032) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:830) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:787) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:889) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage.toString(GridDhtPartitionsSingleMessage.java:551) > ~[ignite-core-2.4.10.jar:2.4.10] > at 
java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_171] > at > org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:943) > ~[ignite-core-2.4.10.jar:2.4.10] > at > org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1009) > ~[ignite-core-2.4.10.jar:2.4.10] > ... 16 more > Caused by: java.lang.NullPointerException > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.CachePartitionPartialCountersMap.toString(CachePartitionPartialCountersMap.java:231) > ~[ignite-core-2.4.10.jar:2.4.10] > at java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_171] > at
[jira] [Updated] (IGNITE-9525) Ignite + Informatica Integration
[ https://issues.apache.org/jira/browse/IGNITE-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-9525: Attachment: Ignite_Informatica_Integration.pdf > Ignite + Informatica Integration > > > Key: IGNITE-9525 > URL: https://issues.apache.org/jira/browse/IGNITE-9525 > Project: Ignite > Issue Type: Task > Components: documentation > Reporter: Prachi Garg > Assignee: Pavel Vinokurov > Priority: Major > Fix For: 2.7 > > Attachments: Ignite_Informatica_Integration.pdf > > > Mentioned in https://cwiki.apache.org/confluence/display/IGNITE/Required+Docs
[jira] [Updated] (IGNITE-9676) Ignite as storage in Spring Session
[ https://issues.apache.org/jira/browse/IGNITE-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-9676: Fix Version/s: 2.7 > Ignite as storage in Spring Session > --- > > Key: IGNITE-9676 > URL: https://issues.apache.org/jira/browse/IGNITE-9676 > Project: Ignite > Issue Type: New Feature > Reporter: Anton Kurbanov > Assignee: Anton Kurbanov > Priority: Minor > Fix For: 2.7 > > > Implement a repository backed by Ignite for session clustering with Spring Session.
[jira] [Updated] (IGNITE-8879) Blinking baseline node sometimes unable to connect to cluster
[ https://issues.apache.org/jira/browse/IGNITE-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-8879: Priority: Critical (was: Major) > Blinking baseline node sometimes unable to connect to cluster > - > > Key: IGNITE-8879 > URL: https://issues.apache.org/jira/browse/IGNITE-8879 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Assignee: Vladislav Pyatkov >Priority: Critical > Attachments: IGNITE-8879.zip > > > Almost the same scenario as in IGNITE-8874 but node left baseline while > blinking > All caches with 2 backups > 4 nodes in cluster > # Start cluster, load data > # Start transactional loading (8 threads, 100 ops/second put/get in each op) > # Repeat 10 times: kill one node, remove from baseline, start node again > (*with no LFS clean*), wait for rebalance > # Check idle_verify, check data corruption > > At some point killed node unable to start and join cluster because of error > (Attachments info: grid.1.node2.X.log - blinking node logs, X - iteration > counter from step 3) > {code:java} > 080ee8-END.bin] > [2018-06-26 19:01:43,039][INFO ][main][PageMemoryImpl] Started page memory > [memoryAllocated=100.0 MiB, pages=24800, tableSize=1.9 MiB, > checkpointBuffer=100.0 MiB] > [2018-06-26 19:01:43,039][INFO ][main][GridCacheDatabaseSharedManager] > Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=583691, > len=119], lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], > lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8] > [2018-06-26 19:01:43,050][INFO ][main][GridCacheDatabaseSharedManager] Found > last checkpoint marker [cpId=7fca4dbb-8f01-4b63-95e2-43283b080ee8, > pos=FileWALPointer [idx=0, fileOff=583691, len=119]] > [2018-06-26 19:01:43,082][INFO ][main][FileWriteAheadLogManager] Stopping WAL > iteration due to an exception: EOF at position [100] expected to read [1] > bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0] > 
[2018-06-26 19:01:43,219][WARN ][main][FileWriteAheadLogManager] WAL segment > tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual > state : {Index=3602879702215753728,Offset=775434544} ] > [2018-06-26 19:01:43,243][INFO ][main][GridCacheDatabaseSharedManager] > Applying lost cache updates since last checkpoint record > [lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], > lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8] > [2018-06-26 19:01:43,246][INFO ][main][FileWriteAheadLogManager] Stopping WAL > iteration due to an exception: EOF at position [100] expected to read [1] > bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0] > [2018-06-26 19:01:43,336][WARN ][main][FileWriteAheadLogManager] WAL segment > tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual > state : {Index=3602879702215753728,Offset=775434544} ] > [2018-06-26 19:01:43,336][INFO ][main][GridCacheDatabaseSharedManager] > Finished applying WAL changes [updatesApplied=0, time=101ms] > [2018-06-26 19:01:43,450][INFO > ][main][GridSnapshotAwareClusterStateProcessorImpl] Restoring history for > BaselineTopology[id=4] > [2018-06-26 19:01:43,454][ERROR][main][IgniteKernal] Exception during start > processors, node will be stopped and close connections > class org.apache.ignite.IgniteCheckedException: Failed to start processor: > GridProcessorAdapter [] > at > org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769) > at > org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) > at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) > at > org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) > at 
org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) > at org.apache.ignite.Ignition.start(Ignition.java:352) > at > org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) > Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of > BaselineTopology history has failed, expected history item not found for id=1 > at > org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54) > at >
[jira] [Created] (IGNITE-9495) Update version for org.apache.lucene lucene-queryparser : 5.5.2
Alexander Gerus created IGNITE-9495: --- Summary: Update version for org.apache.lucene lucene-queryparser : 5.5.2 Key: IGNITE-9495 URL: https://issues.apache.org/jira/browse/IGNITE-9495 Project: Ignite Issue Type: Improvement Affects Versions: 2.6, 2.5, 2.4 Reporter: Alexander Gerus Update the version of org.apache.lucene. Current version: lucene-queryparser : 5.5.2. New version: later than 7.1.
[jira] [Created] (IGNITE-9295) Add Warning message for multiple data streamers
Alexander Gerus created IGNITE-9295: --- Summary: Add Warning message for multiple data streamers Key: IGNITE-9295 URL: https://issues.apache.org/jira/browse/IGNITE-9295 Project: Ignite Issue Type: Improvement Reporter: Alexander Gerus Fix For: 2.7 DataStreamer is designed to allocate as many resources as are available. If a user starts more than one instance per cache, streaming can slow down significantly due to the extra resource consumption. The proposal is to add a warning message to the application log when two or more data streamers per cache are detected. Warning text: "DataStreamer is already running. For best performance please use single instance". The warning should be printed only once when the case is detected.
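A minimal sketch of the proposed once-only warning, in plain Java; StreamerRegistry and its method names are illustrative, not part of the Ignite codebase:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: count streamers per cache and emit the warning text
// exactly once, the first time a second streamer appears for a cache.
public class StreamerRegistry {
    private final Map<String, AtomicInteger> perCache = new ConcurrentHashMap<>();
    private final Map<String, Boolean> warned = new ConcurrentHashMap<>();

    /** Registers a streamer; returns the warning text if it should be logged now, else null. */
    public String register(String cacheName) {
        int cnt = perCache.computeIfAbsent(cacheName, k -> new AtomicInteger()).incrementAndGet();

        // putIfAbsent returns null only for the first thread to record the warning,
        // which guarantees the message is printed at most once per cache.
        if (cnt >= 2 && warned.putIfAbsent(cacheName, Boolean.TRUE) == null)
            return "DataStreamer is already running. For best performance please use single instance";

        return null;
    }
}
```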
[jira] [Assigned] (IGNITE-8987) Ignite hangs during getting of atomic structure after autoactivation
[ https://issues.apache.org/jira/browse/IGNITE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus reassigned IGNITE-8987: --- Assignee: Roman Guseinov (was: Alexey Goncharuk)
> Ignite hangs during getting of atomic structure after autoactivation
>
> Key: IGNITE-8987
> URL: https://issues.apache.org/jira/browse/IGNITE-8987
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.4
> Reporter: Andrey Aleksandrov
> Assignee: Roman Guseinov
> Priority: Major
> Fix For: 2.7
>
> Attachments: reproducer.java
>
> I investigated the use case with autoactivation and creation of an IgniteAtomicSequence. It hangs on the awaitInitialization() method if it is called after the last node from the BLT was started.
> Steps to reproduce:
> First iteration:
> Do the following in one thread:
> 1) Start server 1
> 2) Start server 2
> 3) Activate the cluster
> 4) Create the IgniteAtomicSequence using the following code:
> IgniteAtomicSequence igniteAtomicSequence = ignite.atomicSequence(
>     "TestName",
>     atomicConfiguration,
>     10,
>     true);
> Second iteration:
> 1) Start server 1
> 2) Start server 2 (Autoactivation will be started)
> 3) Get the IgniteAtomicSequence using the following code:
> IgniteAtomicSequence igniteAtomicSequence = ignite.atomicSequence(
>     "TestName",
>     10,
>     true); // could be false because TestName was already created in iteration 1
> In this case, we hang in the awaitInitialization() method in DataStructureProcessor.getAtomic().
> If I add a sleep timeout between steps 2 and 3 of the second iteration, everything works fine. Looks like we have a race here.
[jira] [Updated] (IGNITE-9196) SQL: Memory leak in MapNodeResults
[ https://issues.apache.org/jira/browse/IGNITE-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-9196: Priority: Blocker (was: Major)
> SQL: Memory leak in MapNodeResults
>
> Key: IGNITE-9196
> URL: https://issues.apache.org/jira/browse/IGNITE-9196
> Project: Ignite
> Issue Type: Bug
> Components: sql
> Affects Versions: 2.6
> Reporter: Denis Mekhanikov
> Priority: Blocker
>
> When the size of a SQL query result set is a multiple of {{Query#pageSize}}, the {{MapQueryResult}} is never closed and removed from the {{MapNodeResults#res}} collection.
> The following code leads to an OOME when run with a 1Gb heap:
> {code:java}
> public class MemLeakRepro {
>     public static void main(String[] args) {
>         Ignition.start(getConfiguration("server"));
>
>         try (Ignite client = Ignition.start(getConfiguration("client").setClientMode(true))) {
>             IgniteCache<Integer, Person> cache = startPeopleCache(client);
>
>             int pages = 10;
>             int pageSize = 1024;
>
>             for (int i = 0; i < pages * pageSize; i++) {
>                 Person p = new Person("Person #" + i, 25);
>                 cache.put(i, p);
>             }
>
>             for (int i = 0; i < 1_000_000; i++) {
>                 if (i % 1000 == 0)
>                     System.out.println("Select iteration #" + i);
>
>                 Query<List<?>> qry = new SqlFieldsQuery("select * from people");
>                 qry.setPageSize(pageSize);
>
>                 QueryCursor<List<?>> cursor = cache.query(qry);
>                 cursor.getAll();
>                 cursor.close();
>             }
>         }
>     }
>
>     private static IgniteConfiguration getConfiguration(String instanceName) {
>         IgniteConfiguration igniteCfg = new IgniteConfiguration();
>         igniteCfg.setIgniteInstanceName(instanceName);
>
>         TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
>         discoSpi.setIpFinder(new TcpDiscoveryVmIpFinder(true));
>
>         return igniteCfg;
>     }
>
>     private static IgniteCache<Integer, Person> startPeopleCache(Ignite node) {
>         CacheConfiguration<Integer, Person> cacheCfg = new CacheConfiguration<>("cache");
>
>         QueryEntity qe = new QueryEntity(Integer.class, Person.class);
>         qe.setTableName("people");
>         cacheCfg.setQueryEntities(Collections.singleton(qe));
>         cacheCfg.setSqlSchema("PUBLIC");
>
>         return node.getOrCreateCache(cacheCfg);
>     }
>
>     public static class Person {
>         @QuerySqlField
>         private String name;
>
>         @QuerySqlField
>         private int age;
>
>         public Person(String name, int age) {
>             this.name = name;
>             this.age = age;
>         }
>     }
> }
> {code}
>
> At the same time it works perfectly fine when there are, for example, {{pages * pageSize - 1}} records in the cache instead.
> The reason is that the {{MapQueryResult#fetchNextPage(...)}} method doesn't return true when the result set size is a multiple of the page size.
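The edge case described in IGNITE-9196 can be demonstrated without Ignite at all. The sketch below (plain Java, hypothetical names, not Ignite internals) contrasts a "short page means done" heuristic, which never fires when the total row count is an exact multiple of the page size, with an explicit row-count check:

```java
// Minimal model of the paging pitfall: if "last page" is inferred as
// "the page came back shorter than pageSize", a result set whose size is an
// exact multiple of pageSize never produces a short page, so the consumer
// never learns the cursor is finished and its resources are never released.
public class Paging {
    /** Buggy heuristic: "a short page means we're done". */
    static boolean lastByShortPage(int rowsInPage, int pageSize) {
        return rowsInPage < pageSize;
    }

    /** Correct check: compare rows delivered so far against the known total. */
    static boolean lastByCount(int delivered, int total) {
        return delivered >= total;
    }

    /** Counts fetches until the chosen "last page" rule fires; -1 if it never does. */
    static int fetchesUntilDone(int total, int pageSize, boolean useShortPageRule) {
        int delivered = 0;
        int fetches = 0;

        while (delivered < total) {
            int rows = Math.min(pageSize, total - delivered);
            delivered += rows;
            fetches++;

            boolean done = useShortPageRule
                ? lastByShortPage(rows, pageSize)
                : lastByCount(delivered, total);

            if (done)
                return fetches;
        }

        return -1; // the short-page rule never fired: the cursor leaks
    }
}
```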
[jira] [Assigned] (IGNITE-9178) Partition lost events are not triggered if multiple nodes leave the cluster
[ https://issues.apache.org/jira/browse/IGNITE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus reassigned IGNITE-9178: --- Assignee: Pavel Vinokurov > Partition lost events are not triggered if multiple nodes leave the cluster > - > > Key: IGNITE-9178 > URL: https://issues.apache.org/jira/browse/IGNITE-9178 > Project: Ignite > Issue Type: Bug > Components: cache > Affects Versions: 2.4 > Reporter: Pavel Vinokurov > Assignee: Pavel Vinokurov > Priority: Major > > If multiple nodes leave the cluster simultaneously, the departed partitions are removed from GridDhtPartitionTopologyImpl#node2part without being added to leftNode2Part in the GridDhtPartitionTopologyImpl#update method. > Thus GridDhtPartitionTopologyImpl#detectLostPartitions can't detect the lost partitions.
[jira] [Updated] (IGNITE-5103) TcpDiscoverySpi ignores maxMissedClientHeartbeats property
[ https://issues.apache.org/jira/browse/IGNITE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-5103: Priority: Critical (was: Major) > TcpDiscoverySpi ignores maxMissedClientHeartbeats property > -- > > Key: IGNITE-5103 > URL: https://issues.apache.org/jira/browse/IGNITE-5103 > Project: Ignite > Issue Type: Bug > Components: general > Affects Versions: 1.9 > Reporter: Valentin Kulichenko > Assignee: Evgenii Zhuravlev > Priority: Critical > Fix For: 2.7 > > Attachments: TcpDiscoveryClientSuspensionSelfTest.java > > > Test scenario is the following: > * Start one or more servers. > * Start a client node. > * Suspend the client process using the {{-SIGSTOP}} signal. > * Wait for {{maxMissedClientHeartbeats*heartbeatFrequency}}. > * The client node is expected to be removed from the topology, but server nodes don't do that. > Attached is a unit test reproducing the same by stopping the heartbeat sender thread on the client.
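The semantics the test above expects can be sketched in a few lines of plain Java; HeartbeatTracker is a hypothetical illustration, not Ignite's discovery code:

```java
// Hypothetical model of maxMissedClientHeartbeats: the server samples the
// client once per heartbeatFrequency tick, and after maxMissed consecutive
// missed heartbeats the client should be considered failed and removed
// from the topology.
public class HeartbeatTracker {
    private final int maxMissed;
    private int missed;

    public HeartbeatTracker(int maxMissedClientHeartbeats) {
        this.maxMissed = maxMissedClientHeartbeats;
    }

    /** Called once per heartbeatFrequency interval. */
    public void tick(boolean heartbeatReceived) {
        // Any received heartbeat resets the counter; only consecutive misses count.
        missed = heartbeatReceived ? 0 : missed + 1;
    }

    /** True once maxMissedClientHeartbeats intervals passed with no heartbeat. */
    public boolean failed() {
        return missed >= maxMissed;
    }
}
```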
[jira] [Updated] (IGNITE-9178) Partition lost events are not triggered if multiple nodes leave the cluster
[ https://issues.apache.org/jira/browse/IGNITE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-9178: Priority: Critical (was: Major) > Partition lost events are not triggered if multiple nodes leave the cluster > - > > Key: IGNITE-9178 > URL: https://issues.apache.org/jira/browse/IGNITE-9178 > Project: Ignite > Issue Type: Bug > Components: cache > Affects Versions: 2.4 > Reporter: Pavel Vinokurov > Assignee: Pavel Vinokurov > Priority: Critical > > If multiple nodes leave the cluster simultaneously, the departed partitions are removed from GridDhtPartitionTopologyImpl#node2part without being added to leftNode2Part in the GridDhtPartitionTopologyImpl#update method. > Thus GridDhtPartitionTopologyImpl#detectLostPartitions can't detect the lost partitions.
[jira] [Updated] (IGNITE-9068) Node fails to stop when CacheObjectBinaryProcessor.addMeta() is executed inside guard()/unguard()
[ https://issues.apache.org/jira/browse/IGNITE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-9068: Priority: Critical (was: Major) > Node fails to stop when CacheObjectBinaryProcessor.addMeta() is executed > inside guard()/unguard() > - > > Key: IGNITE-9068 > URL: https://issues.apache.org/jira/browse/IGNITE-9068 > Project: Ignite > Issue Type: Bug > Components: binary, managed services >Affects Versions: 2.5 >Reporter: Ilya Kasnacheev >Assignee: Ilya Lantukh >Priority: Critical > Labels: test > Fix For: 2.7 > > Attachments: GridServiceDeadlockTest.java, MyService.java > > > When addMeta is called in e.g. service deployment it us executed inside > guard()/unguard() > If node will be stopped at this point, Ignite.stop() will hang. > Consider the following thread dump: > {code} > "Thread-1" #57 prio=5 os_prio=0 tid=0x7f7780005000 nid=0x7f26 runnable > [0x7f766cbef000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0005cb7b0468> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115) > at > org.apache.ignite.internal.util.StripedCompositeReadWriteLock$WriteLock.tryLock(StripedCompositeReadWriteLock.java:220) > at > org.apache.ignite.internal.GridKernalGatewayImpl.tryWriteLock(GridKernalGatewayImpl.java:143) > // Waiting for lock to cancel futures of BinaryMetadataTransport > at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2171) > at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2094) > at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2545) > - locked <0x0005cb423f00> (a > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2508) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.run(IgnitionEx.java:2033) > "test-runner-#1%service.GridServiceDeadlockTest%" #13 prio=5 os_prio=0 > tid=0x7f77b87d5800 nid=0x7eb8 waiting on condition [0x7f778cdfc000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) > // May never return if there's discovery problems > at > org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140) > at > org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.addMeta(CacheObjectBinaryProcessorImpl.java:463) > at > org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$2.addMeta(CacheObjectBinaryProcessorImpl.java:188) > at > org.apache.ignite.internal.binary.BinaryContext.registerUserClassDescriptor(BinaryContext.java:802) > at > org.apache.ignite.internal.binary.BinaryContext.registerClassDescriptor(BinaryContext.java:761) > at > org.apache.ignite.internal.binary.BinaryContext.descriptorForClass(BinaryContext.java:627) > at > org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:174) > at > org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:157) > at > org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:144) > at > org.apache.ignite.internal.binary.GridBinaryMarshaller.marshal(GridBinaryMarshaller.java:254) > at > org.apache.ignite.internal.binary.BinaryMarshaller.marshal0(BinaryMarshaller.java:82) > at > 
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:58) > at > org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:10069) > at > org.apache.ignite.internal.processors.service.GridServiceProcessor.prepareServiceConfigurations(GridServiceProcessor.java:570) > at > org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:622) > at >
[jira] [Updated] (IGNITE-9184) Cluster hangs during concurrent node restart and continuous query registration
[ https://issues.apache.org/jira/browse/IGNITE-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-9184: Ignite Flags: (was: Docs Required) > Cluster hangs during concurrent node restart and continuous query registration > - > > Key: IGNITE-9184 > URL: https://issues.apache.org/jira/browse/IGNITE-9184 > Project: Ignite > Issue Type: Bug > Components: general > Affects Versions: 2.6 > Reporter: Mikhail Cherkasov > Assignee: Dmitriy Govorukhin > Priority: Blocker > Fix For: 2.7 > > Attachments: StressTest.java, logs, stacktrace > > > Please check the attached test case and stack trace. > I can see a "Failed to wait for initial partition map exchange" message.
[jira] [Assigned] (IGNITE-9184) Cluster hangs during concurrent node restart and continuous query registration
[ https://issues.apache.org/jira/browse/IGNITE-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus reassigned IGNITE-9184: --- Assignee: Dmitriy Govorukhin > Cluster hangs during concurrent node restart and continuous query registration > - > > Key: IGNITE-9184 > URL: https://issues.apache.org/jira/browse/IGNITE-9184 > Project: Ignite > Issue Type: Bug > Components: general > Affects Versions: 2.6 > Reporter: Mikhail Cherkasov > Assignee: Dmitriy Govorukhin > Priority: Blocker > Fix For: 2.7 > > Attachments: StressTest.java, logs, stacktrace > > > Please check the attached test case and stack trace. > I can see a "Failed to wait for initial partition map exchange" message.
[jira] [Updated] (IGNITE-9184) Cluster hangs during concurrent node restart and continuous query registration
[ https://issues.apache.org/jira/browse/IGNITE-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-9184: Priority: Blocker (was: Critical) > Cluster hangs during concurrent node restart and continuous query registration > - > > Key: IGNITE-9184 > URL: https://issues.apache.org/jira/browse/IGNITE-9184 > Project: Ignite > Issue Type: Bug > Components: general > Affects Versions: 2.6 > Reporter: Mikhail Cherkasov > Priority: Blocker > Fix For: 2.7 > > Attachments: StressTest.java, logs, stacktrace > > > Please check the attached test case and stack trace. > I can see a "Failed to wait for initial partition map exchange" message.
[jira] [Created] (IGNITE-9112) Pre-touch for Ignite off-heap memory
Alexander Gerus created IGNITE-9112: --- Summary: Pre-touch for Ignite off-heap memory Key: IGNITE-9112 URL: https://issues.apache.org/jira/browse/IGNITE-9112 Project: Ignite Issue Type: New Feature Affects Versions: 2.6, 2.5, 2.4 Reporter: Alexander Gerus At the moment Ignite off-heap memory is allocated in the operating system's virtual memory, not physical memory: the allocation is recorded in an internal data structure so the range cannot be used by any other process, but not a single page is allocated in physical memory until it is actually accessed. When Ignite needs the memory, the operating system allocates pages on demand. The proposal is to add an option to Ignite that touches every single byte of the maximum off-heap region with a '0', so that the memory is allocated in physical memory in addition to being reserved in virtual memory. A similar option is available in the JVM: {{-XX:+AlwaysPreTouch}}
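What such a pre-touch pass could look like can be sketched with a plain direct ByteBuffer; the PreTouch class and the 4 KiB page size are assumptions for illustration, mirroring the behaviour of the JVM's -XX:+AlwaysPreTouch rather than any actual Ignite option:

```java
import java.nio.ByteBuffer;

// Illustrative sketch of a pre-touch pass: writing one byte into every OS
// page of an off-heap region forces the pages to be materialized in physical
// memory up front, instead of being faulted in lazily on first access.
public class PreTouch {
    static final int PAGE_SIZE = 4096; // typical OS page size; an assumption here

    /** Touches every page of the buffer; returns the number of pages touched. */
    static int preTouch(ByteBuffer offHeap) {
        int pages = 0;

        for (int pos = 0; pos < offHeap.capacity(); pos += PAGE_SIZE) {
            offHeap.put(pos, (byte) 0); // the first write faults the page in
            pages++;
        }

        return pages;
    }

    public static void main(String[] args) {
        ByteBuffer region = ByteBuffer.allocateDirect(1 << 20); // 1 MiB off-heap region
        System.out.println("Touched pages: " + preTouch(region)); // 1 MiB / 4 KiB = 256
    }
}
```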
[jira] [Assigned] (IGNITE-9068) Node fails to stop when CacheObjectBinaryProcessor.addMeta() is executed inside guard()/unguard()
[ https://issues.apache.org/jira/browse/IGNITE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus reassigned IGNITE-9068: --- Assignee: Pavel Kovalenko (was: Ilya Lantukh) > Node fails to stop when CacheObjectBinaryProcessor.addMeta() is executed > inside guard()/unguard() > - > > Key: IGNITE-9068 > URL: https://issues.apache.org/jira/browse/IGNITE-9068 > Project: Ignite > Issue Type: Bug > Components: binary, managed services >Affects Versions: 2.5 >Reporter: Ilya Kasnacheev >Assignee: Pavel Kovalenko >Priority: Major > Labels: test > Attachments: GridServiceDeadlockTest.java > > > When addMeta is called in e.g. service deployment it us executed inside > guard()/unguard() > If node will be stopped at this point, Ignite.stop() will hang. > Consider the following thread dump: > {code} > "Thread-1" #57 prio=5 os_prio=0 tid=0x7f7780005000 nid=0x7f26 runnable > [0x7f766cbef000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0005cb7b0468> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115) > at > org.apache.ignite.internal.util.StripedCompositeReadWriteLock$WriteLock.tryLock(StripedCompositeReadWriteLock.java:220) > at > org.apache.ignite.internal.GridKernalGatewayImpl.tryWriteLock(GridKernalGatewayImpl.java:143) > // Waiting for lock to cancel futures of BinaryMetadataTransport > at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2171) > at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2094) > at > 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2545) > - locked <0x0005cb423f00> (a > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2508) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.run(IgnitionEx.java:2033) > "test-runner-#1%service.GridServiceDeadlockTest%" #13 prio=5 os_prio=0 > tid=0x7f77b87d5800 nid=0x7eb8 waiting on condition [0x7f778cdfc000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) > // May never return if there's discovery problems > at > org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140) > at > org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.addMeta(CacheObjectBinaryProcessorImpl.java:463) > at > org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$2.addMeta(CacheObjectBinaryProcessorImpl.java:188) > at > org.apache.ignite.internal.binary.BinaryContext.registerUserClassDescriptor(BinaryContext.java:802) > at > org.apache.ignite.internal.binary.BinaryContext.registerClassDescriptor(BinaryContext.java:761) > at > org.apache.ignite.internal.binary.BinaryContext.descriptorForClass(BinaryContext.java:627) > at > org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:174) > at > org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:157) > at > org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:144) > at > org.apache.ignite.internal.binary.GridBinaryMarshaller.marshal(GridBinaryMarshaller.java:254) > at > org.apache.ignite.internal.binary.BinaryMarshaller.marshal0(BinaryMarshaller.java:82) > at > 
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:58) > at > org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:10069) > at > org.apache.ignite.internal.processors.service.GridServiceProcessor.prepareServiceConfigurations(GridServiceProcessor.java:570) > at > org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:622) > at >
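The thread dump above reduces to a read/write-lock ordering problem: the service thread blocks on a metadata future while still holding the kernal gateway read lock taken by guard(), so the stopping thread can never acquire the write lock. A minimal standalone sketch of that pattern (class and method names are invented for illustration; this is not Ignite's actual gateway code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the hang: a "service" thread holds the gateway read lock while
// blocked on a future that only discovery can complete; stop() then cannot
// take the write lock. All names here are hypothetical, not Ignite API.
public class GuardDeadlockSketch {
    static final ReentrantReadWriteLock gateway = new ReentrantReadWriteLock();

    // Stand-in for Ignite.stop()'s tryWriteLock(): true iff the write lock
    // could be taken within the timeout.
    static boolean tryStop(long timeoutMs) {
        try {
            if (gateway.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                gateway.writeLock().unlock();
                return true;
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch guarded = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread svc = new Thread(() -> {
            gateway.readLock().lock();       // guard()
            try {
                guarded.countDown();
                release.await();             // stands in for future.get() in addMeta()
            } catch (InterruptedException ignored) {
            } finally {
                gateway.readLock().unlock(); // unguard()
            }
        });
        svc.start();
        guarded.await();

        System.out.println("stop while guarded: " + tryStop(200)); // the hang: false
        release.countDown();
        svc.join();
        System.out.println("stop after unguard: " + tryStop(200)); // true
    }
}
```

In the reported bug the future completes only via discovery, so the read lock is never released and stop() parks forever instead of merely timing out as in this sketch.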
[jira] [Updated] (IGNITE-8828) Detecting and stopping unresponsive nodes during Partition Map Exchange
[ https://issues.apache.org/jira/browse/IGNITE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-8828: Ignite Flags: Docs Required > Detecting and stopping unresponsive nodes during Partition Map Exchange > --- > > Key: IGNITE-8828 > URL: https://issues.apache.org/jira/browse/IGNITE-8828 > Project: Ignite > Issue Type: Improvement > Components: general >Reporter: Sergey Chugunov >Assignee: Ilya Lantukh >Priority: Major > Labels: iep-25 > Original Estimate: 264h > Remaining Estimate: 264h > > During the PME process the coordinator (1) gathers local partition maps from all > nodes and (2) sends the calculated full partition map back to all nodes in the > topology. > However, if one or more nodes fail to send their local information on step 1 for any > reason, the PME process hangs, blocking all operations. The only solution is > to manually identify and stop the nodes which failed to send their info to the coordinator. > This should be done by the coordinator itself: if it does not receive local > partition maps from some nodes in time, it should check that stopping these > nodes won't lead to data loss and then stop them forcibly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
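The data-loss check the ticket proposes can be sketched as a pure function on the coordinator side (all names are hypothetical and invented for illustration; Ignite's real PME code is not structured this way): given the set of nodes that responded in time and the partition ownership map, force-stopping the late nodes is allowed only if every partition still has an owner among the responsive nodes.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed coordinator-side check: after the
// single-map timeout expires, unresponsive nodes may be stopped only if
// every partition keeps at least one copy on a node that did respond.
public class PmeWatchdogSketch {
    /** Nodes that missed the timeout and are safe to force-stop. */
    static Set<String> nodesToStop(Set<String> topology,
                                   Set<String> responded,
                                   Map<Integer, Set<String>> partitionOwners) {
        Set<String> late = new HashSet<>(topology);
        late.removeAll(responded);

        // Data-loss check: each partition must retain an owner among the
        // responsive nodes, otherwise stopping the laggards loses data.
        for (Set<String> owners : partitionOwners.values()) {
            boolean copyAlive = false;
            for (String owner : owners)
                if (responded.contains(owner))
                    copyAlive = true;
            if (!copyAlive)
                return Collections.emptySet(); // refuse: would cause data loss
        }
        return late;
    }
}
```

Returning the empty set models the "don't stop anything" branch; a real implementation would also have to handle baseline topology and backup counts, which this sketch deliberately ignores.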
[jira] [Updated] (IGNITE-8908) NPE on discovery message processing
[ https://issues.apache.org/jira/browse/IGNITE-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-8908: Priority: Critical (was: Major) > NPE on discovery message processing > --- > > Key: IGNITE-8908 > URL: https://issues.apache.org/jira/browse/IGNITE-8908 > Project: Ignite > Issue Type: Bug > Components: general >Reporter: Mikhail Cherkasov >Priority: Critical > Attachments: ContinuousQueryTask.txt, RegisterTask.txt, > ServiceTask.txt > > > To reproduce the problem we do the following steps: > 1) start 4 server nodes > 2) start client nodes: ServiceTask, RegisterTask, ContinuousQueryTask > 3) restart 3 of the 4 server nodes. > The following exception is observed in the logs: > [2018-07-02 10:15:48,199][ERROR]tcp-disco-msg-worker-#2 Failed to notify > direct custom event listener: MetadataUpdateAcceptedMessage > [id=cae4cd95461-6f8d75e0-424e-4a8f-8f20-c34a9d55e44f, typeId=-372239526, > acceptedVer=1, duplicated=false] > java.lang.NullPointerException: null > at > org.apache.ignite.internal.processors.cache.binary.BinaryMetadataTransport$MetadataUpdateAcceptedListener.onCustomEvent(BinaryMetadataTransport.java:451) > ~[ignite-core-2.x.x.jar:2.x.x] > at > org.apache.ignite.internal.processors.cache.binary.BinaryMetadataTransport$MetadataUpdateAcceptedListener.onCustomEvent(BinaryMetadataTransport.java:418) > ~[ignite-core-2.x.x.jar:2.x.x] > at > org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:695) > [ignite-core-2.x.x.jar:2.x.x] > at > org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:577) > [ignite-core-2.x.x.jar:2.x.x] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:5453) > [ignite-core-2.x.x.jar:2.x.x] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:5279) > 
[ignite-core-2.x.x.jar:2.x.x] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:5313) > [ignite-core-2.x.x.jar:2.x.x] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2739) > [ignite-core-2.x.x.jar:2.x.x] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2531) > [ignite-core-2.x.x.jar:2.x.x] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6730) > [ignite-core-2.x.x.jar:2.x.x] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2614) > [ignite-core-2.x.x.jar:2.x.x] > at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) > [ignite-core-2.x.x.jar:2.x.x] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8676) Possible data loss after stopping/starting several nodes at the same time
[ https://issues.apache.org/jira/browse/IGNITE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523436#comment-16523436 ] Alexander Gerus commented on IGNITE-8676: - Cannot be reproduced after the following fixes: * https://issues.apache.org/jira/browse/IGNITE-8339 * https://issues.apache.org/jira/browse/IGNITE-8122 * https://issues.apache.org/jira/browse/IGNITE-8405 > Possible data loss after stopping/starting several nodes at the same time > > > Key: IGNITE-8676 > URL: https://issues.apache.org/jira/browse/IGNITE-8676 > Project: Ignite > Issue Type: Bug > Components: persistence >Affects Versions: 2.4 >Reporter: Andrey Aleksandrov >Assignee: Stanislav Lukyanov >Priority: Critical > Fix For: 2.6 > > Attachments: DataLossTest.zip, Ignite8676Test.java, > image-2018-06-01-12-34-54-320.png, image-2018-06-01-13-12-47-218.png, > image-2018-06-01-13-15-17-437.png > > > Steps to reproduce: > 1) Start 3 data nodes (DN1, DN2, DN3) with the configuration that contains the > cache with a node filter for only these three nodes and 1 backup (see the > configuration in the attachment). > 2) Activate the cluster. Now you should have 3 nodes in the BLT. > 3) Start a new server node (SN). Now you should have 3 nodes in the BLT and 1 node > not in the baseline. > 4) Using some node, load about 1 (or more) entities into the cache. > 5) Check that the number of primary partitions equals the number of backup partitions. > !image-2018-06-01-12-34-54-320.png! > 6) Now stop DN3 and SN. After that, start them at the same time. > 7) When DN3 and SN are online, check whether the number of primary partitions > (PN) equals the number of backup partitions (BN). > 7.1) If PN == BN => go to step 6). > 7.2) If PN != BN => go to step 8). > > !image-2018-06-01-13-12-47-218.png! > 8) Deactivate the cluster with control.sh. > 9) Activate the cluster with control.sh. > Now you should see the data loss. > !image-2018-06-01-13-15-17-437.png! 
> Notes: > 1) Stops/starts should be done at the same time. > 2) Consistent IDs for the nodes should be constant. > Now you should see the data loss. > Also, I provide a reproducer that can often (but not always) reproduce this > issue. Clean the working directory and restart the reproducer if > there is no data loss in an iteration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IGNITE-8676) Possible data loss after stopping/starting several nodes at the same time
[ https://issues.apache.org/jira/browse/IGNITE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus resolved IGNITE-8676. - Resolution: Cannot Reproduce > Possible data loss after stopping/starting several nodes at the same time > > > Key: IGNITE-8676 > URL: https://issues.apache.org/jira/browse/IGNITE-8676 > Project: Ignite > Issue Type: Bug > Components: persistence >Affects Versions: 2.4 >Reporter: Andrey Aleksandrov >Assignee: Stanislav Lukyanov >Priority: Critical > Fix For: 2.6 > > Attachments: DataLossTest.zip, Ignite8676Test.java, > image-2018-06-01-12-34-54-320.png, image-2018-06-01-13-12-47-218.png, > image-2018-06-01-13-15-17-437.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (IGNITE-8676) Possible data loss after stopping/starting several nodes at the same time
[ https://issues.apache.org/jira/browse/IGNITE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus updated IGNITE-8676: Comment: was deleted (was: Assigned on Stan as solution for the issue is known and should be merged to affected 2.4 master) > Possible data loss after stopping/starting several nodes at the same time > > > Key: IGNITE-8676 > URL: https://issues.apache.org/jira/browse/IGNITE-8676 > Project: Ignite > Issue Type: Bug > Components: persistence >Affects Versions: 2.4 >Reporter: Andrey Aleksandrov >Assignee: Stanislav Lukyanov >Priority: Critical > Fix For: 2.6 > > Attachments: DataLossTest.zip, Ignite8676Test.java, > image-2018-06-01-12-34-54-320.png, image-2018-06-01-13-12-47-218.png, > image-2018-06-01-13-15-17-437.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-8676) Possible data loss after stopping/starting several nodes at the same time
[ https://issues.apache.org/jira/browse/IGNITE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus reassigned IGNITE-8676: --- Assignee: Stanislav Lukyanov Assigned to Stan as the solution for the issue is known and should be merged to the affected 2.4 master > Possible data loss after stopping/starting several nodes at the same time > > > Key: IGNITE-8676 > URL: https://issues.apache.org/jira/browse/IGNITE-8676 > Project: Ignite > Issue Type: Bug > Components: persistence >Affects Versions: 2.4 >Reporter: Andrey Aleksandrov >Assignee: Stanislav Lukyanov >Priority: Critical > Fix For: 2.6 > > Attachments: DataLossTest.zip, Ignite8676Test.java, > image-2018-06-01-12-34-54-320.png, image-2018-06-01-13-12-47-218.png, > image-2018-06-01-13-15-17-437.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-8740) Support reuse of already initialized Ignite in IgniteSpringBean
[ https://issues.apache.org/jira/browse/IGNITE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus reassigned IGNITE-8740: --- Assignee: Amir Akhmedov (was: Alexander Gerus) > Support reuse of already initialized Ignite in IgniteSpringBean > --- > > Key: IGNITE-8740 > URL: https://issues.apache.org/jira/browse/IGNITE-8740 > Project: Ignite > Issue Type: Improvement > Components: spring >Affects Versions: 2.4 >Reporter: Ilya Kasnacheev >Assignee: Amir Akhmedov >Priority: Blocker > Fix For: 2.6 > > > See > http://apache-ignite-users.70518.x6.nabble.com/IgniteSpringBean-amp-Ignite-SpringTransactionManager-broken-with-2-4-td21667.html#a21724 > (there's patch available) > The idea is to introduce a workaround for users hit by IGNITE-6555, which > unfortunately broke some scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-8740) Support reuse of already initialized Ignite in IgniteSpringBean
[ https://issues.apache.org/jira/browse/IGNITE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Gerus reassigned IGNITE-8740: --- Assignee: Alexander Gerus (was: Amir Akhmedov) > Support reuse of already initialized Ignite in IgniteSpringBean > --- > > Key: IGNITE-8740 > URL: https://issues.apache.org/jira/browse/IGNITE-8740 > Project: Ignite > Issue Type: Improvement > Components: spring >Affects Versions: 2.4 >Reporter: Ilya Kasnacheev >Assignee: Alexander Gerus >Priority: Blocker > Fix For: 2.6 > > > See > http://apache-ignite-users.70518.x6.nabble.com/IgniteSpringBean-amp-Ignite-SpringTransactionManager-broken-with-2-4-td21667.html#a21724 > (there's patch available) > The idea is to introduce a workaround for users hit by IGNITE-6555, which > unfortunately broke some scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-8524) Document consistency check utilities
[ https://issues.apache.org/jira/browse/IGNITE-8524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482680#comment-16482680 ] Alexander Gerus edited comment on IGNITE-8524 at 5/22/18 9:58 AM: -- [~ivan.glukos], Hi Ivan, do you have any forecasted date for the task to be completed? Many thanks was (Author: agerus): [~ivan.glukos], Hi Ivan, do you have any forecasted date for the task to be completed? Our client is waiting for this spec. Many thanks > Document consistency check utilities > > > Key: IGNITE-8524 > URL: https://issues.apache.org/jira/browse/IGNITE-8524 > Project: Ignite > Issue Type: Task > Components: documentation >Reporter: Denis Magda >Assignee: Ivan Rakov >Priority: Critical > Fix For: 2.5 > > > Ignite 2.5 will go with special consistency check utilities that, for > instance, ensure that the data stays consistent across backups, indexes are > correct and many other things. More details can be taken from here: > * https://issues.apache.org/jira/browse/IGNITE-8277 > * https://issues.apache.org/jira/browse/IGNITE-7467 > Here is an empty page that is created for the documentation: > https://apacheignite.readme.io/docs/consistency-check-utilities -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-8530) Exchange hangs during start/stop stress test
[ https://issues.apache.org/jira/browse/IGNITE-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482566#comment-16482566 ] Alexander Gerus edited comment on IGNITE-8530 at 5/22/18 9:55 AM: -- [~akalashnikov] Can you please help with the analysis of this issue? was (Author: agerus): [~akalashnikov] Can you please help with analysis for the issue. It is really critical for multiple clients > Exchange hangs during start/stop stress test > > > Key: IGNITE-8530 > URL: https://issues.apache.org/jira/browse/IGNITE-8530 > Project: Ignite > Issue Type: Bug > Components: general >Affects Versions: 2.4 >Reporter: Mikhail Cherkasov >Assignee: Anton Kalashnikov >Priority: Major > Attachments: LocalRunner.java, Main2.java > > > Please see the attached test: it first starts N_CORES*2+2 nodes and then > starts N_CORES*2 threads with a while(true) loop that close and start > nodes with a small random pause. > After a couple of minutes it hangs with "Failed to wait for partition map > exchange". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8524) Document consistency check utilities
[ https://issues.apache.org/jira/browse/IGNITE-8524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482680#comment-16482680 ] Alexander Gerus commented on IGNITE-8524: - [~ivan.glukos], Hi Ivan, do you have any forecasted date for the task to be completed? Our client is waiting for this spec. Many thanks > Document consistency check utilities > > > Key: IGNITE-8524 > URL: https://issues.apache.org/jira/browse/IGNITE-8524 > Project: Ignite > Issue Type: Task > Components: documentation >Reporter: Denis Magda >Assignee: Ivan Rakov >Priority: Critical > Fix For: 2.5 > > > Ignite 2.5 will go with special consistency check utilities that, for > instance, ensure that the data stays consistent across backups, indexes are > correct and many other things. More details can be taken from here: > * https://issues.apache.org/jira/browse/IGNITE-8277 > * https://issues.apache.org/jira/browse/IGNITE-7467 > Here is an empty page that is created for the documentation: > https://apacheignite.readme.io/docs/consistency-check-utilities -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8530) Exchange hangs during start/stop stress test
[ https://issues.apache.org/jira/browse/IGNITE-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482566#comment-16482566 ] Alexander Gerus commented on IGNITE-8530: - [~akalashnikov] Can you please help with the analysis of this issue? It is really critical for multiple clients > Exchange hangs during start/stop stress test > > > Key: IGNITE-8530 > URL: https://issues.apache.org/jira/browse/IGNITE-8530 > Project: Ignite > Issue Type: Bug > Components: general >Affects Versions: 2.4 >Reporter: Mikhail Cherkasov >Assignee: Anton Kalashnikov >Priority: Major > Attachments: LocalRunner.java, Main2.java > > > Please see the attached test: it first starts N_CORES*2+2 nodes and then > starts N_CORES*2 threads with a while(true) loop that close and start > nodes with a small random pause. > After a couple of minutes it hangs with "Failed to wait for partition map > exchange". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
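The attached test's structure, with Ignite stubbed out, looks roughly like this (startNode/stopNode are invented stand-ins, not Ignite API, and the endless while(true) loop is bounded so the sketch terminates):

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Shape of the restart stress test: start N_CORES*2+2 nodes, then have
// N_CORES*2 threads each restart "their" node in a loop with a small
// random pause. All node operations are stubbed counters here.
public class RestartStressSketch {
    static final AtomicInteger running = new AtomicInteger();

    static void startNode(int idx) { running.incrementAndGet(); }
    static void stopNode(int idx)  { running.decrementAndGet(); }

    static void run(int iterations) {
        int cores = Runtime.getRuntime().availableProcessors();

        // 1. Start N_CORES*2+2 nodes up front.
        for (int i = 0; i < cores * 2 + 2; i++)
            startNode(i);

        // 2. N_CORES*2 threads restart nodes concurrently.
        Thread[] workers = new Thread[cores * 2];
        for (int t = 0; t < workers.length; t++) {
            final int idx = t;
            workers[t] = new Thread(() -> {
                Random rnd = new Random();
                for (int i = 0; i < iterations; i++) {
                    stopNode(idx);
                    startNode(idx);
                    try { Thread.sleep(rnd.nextInt(5)); }
                    catch (InterruptedException e) { return; }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    public static void main(String[] args) {
        run(5);
        System.out.println("nodes still running: " + running.get());
    }
}
```

In the real test the nodes are Ignite instances and the restart loop runs forever; the reported hang shows up after a couple of minutes as "Failed to wait for partition map exchange" once concurrent restarts stall the exchange.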