[jira] [Created] (IGNITE-9768) Network partition leads to failures in Ignite's atomic data types.
Mo created IGNITE-9768:
--

Summary: Network partition leads to failures in Ignite's atomic data types.
Key: IGNITE-9768
URL: https://issues.apache.org/jira/browse/IGNITE-9768
Project: Ignite
Issue Type: Bug
Affects Versions: 2.4
Reporter: Mo

Creating a network partition in a replicated Ignite cluster leads to two independent clusters, each of which operates independently of the other, even after the network partition is healed.

Setup: 3 servers (s1,s2,s3), two clients (c1,c2). A partition is created: {(s1,s2,c1),(s3,c2)}.

--> At this point two independent clusters form: one containing s1 and s2, the other containing s3. The two never rejoin, even after the partition is healed. This leads to faulty atomic types in Ignite.

Affected data types:
* *Atomic Sequence*: An incrementAndGet operation on *s3* will not affect the sequence on *s1* and *s2* (even after the partition is healed).
* *AtomicLong* and *AtomicRef*: Operations such as incrementAndGet and compareAndSet on *s3* will not be reflected on *s1* and *s2* even after the partition heals, which leads to faulty results for clients connected to those servers.
* *CountDownLatch*: A countDown operation on the latch in *s3* will not be reflected on the other servers.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
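For reference, the operations exercised above can be sketched with Ignite's public atomics API. This is a minimal client-side illustration, not code from the report; the names ("seq", "ctr") and the config file path are assumptions.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicLong;
import org.apache.ignite.IgniteAtomicSequence;
import org.apache.ignite.Ignition;

public class AtomicsSketch {
    public static void main(String[] args) {
        // A client connected to one side of the partition (e.g. c2 -> s3).
        Ignite ignite = Ignition.start("client-config.xml");

        // Create-or-get the shared atomics (initial value 0, create = true).
        IgniteAtomicSequence seq = ignite.atomicSequence("seq", 0, true);
        IgniteAtomicLong ctr = ignite.atomicLong("ctr", 0, true);

        // Once the split-brain forms, these updates stay on s3's side only
        // and are never seen by clients of s1/s2, even after the heal.
        seq.incrementAndGet();
        ctr.incrementAndGet();
        ctr.compareAndSet(1, 2);
    }
}
```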
[jira] [Created] (IGNITE-9767) Network partition leads to failures in Ignite's semaphore
Mo created IGNITE-9767:
--

Summary: Network partition leads to failures in Ignite's semaphore
Key: IGNITE-9767
URL: https://issues.apache.org/jira/browse/IGNITE-9767
Project: Ignite
Issue Type: Bug
Affects Versions: 2.4
Reporter: Mo

Creating a network partition in a replicated Ignite cluster leads to two independent clusters, each of which operates independently of the other, even after the network partition is healed.

Setup: 3 servers (s1,s2,s3), two clients (c1,c2). A partition is created: {(s1,s2,c1),(s3,c2)}.

--> At this point two independent clusters form: one containing s1 and s2, the other containing s3. The two never rejoin, even after the partition is healed. This leads to a faulty semaphore on both sides of the partition. For example, if a semaphore with one permit is created in the cluster, after creating a network partition and healing it, both *c1* and *c2* can acquire that one permit.

System config: {{Release acquired permits if node, that owned them, left topology: set to true}}
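The scenario can be sketched with Ignite's distributed semaphore API; this is an illustrative fragment (semaphore name and config path are assumptions), not the reporter's code.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteSemaphore;
import org.apache.ignite.Ignition;

public class SemaphoreSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("client-config.xml");

        // One permit; failoverSafe = true corresponds to "release acquired
        // permits if the node that owned them left topology". Last flag
        // creates the semaphore if it does not yet exist.
        IgniteSemaphore sem = ignite.semaphore("sem", 1, true, true);

        // With the cluster split in two, a client on each side can acquire
        // the single permit -- mutual exclusion is violated.
        sem.acquire();
        try {
            // critical section
        } finally {
            sem.release();
        }
    }
}
```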
[jira] [Created] (IGNITE-9766) Network partition leads to failures in Ignite's set
Mo created IGNITE-9766:
--

Summary: Network partition leads to failures in Ignite's set
Key: IGNITE-9766
URL: https://issues.apache.org/jira/browse/IGNITE-9766
Project: Ignite
Issue Type: Bug
Affects Versions: 2.4
Reporter: Mo

Creating a network partition in a replicated Ignite cluster leads to two independent clusters, each of which operates independently of the other, even after the network partition is healed.

Setup: 3 servers (s1,s2,s3), two clients (c1,c2). A partition is created: {(s1,s2,c1),(s3,c2)}.

--> At this point two independent clusters form: one containing s1 and s2, the other containing s3. The two never rejoin, even after the partition is healed. This leads to a faulty set on both sides of the partition. For example, adding an element to the set in *s3* will not add that element to *s1* and *s2*, even after the partition is healed. This leads to data unavailability.
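A minimal sketch of the add operation described above, using Ignite's collection API (the set name and config path are illustrative assumptions):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteSet;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CollectionConfiguration;

public class SetSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("client-config.xml");

        // Create-or-get a distributed set with default collection settings.
        IgniteSet<String> set = ignite.set("set", new CollectionConfiguration());

        // Added on s3's side of the partition; per the report, this element
        // never becomes visible to clients of s1/s2, even after the heal.
        set.add("x");
    }
}
```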
[jira] [Created] (IGNITE-9765) Network partition leads to failures in Ignite's queue
Mo created IGNITE-9765:
--

Summary: Network partition leads to failures in Ignite's queue
Key: IGNITE-9765
URL: https://issues.apache.org/jira/browse/IGNITE-9765
Project: Ignite
Issue Type: Bug
Affects Versions: 2.4
Reporter: Mo

Creating a network partition in a replicated Ignite cluster leads to two independent clusters, each of which operates independently of the other, even after the network partition is healed.

Setup: 3 servers (s1,s2,s3), two clients (c1,c2). A partition is created: {(s1,s2,c1),(s3,c2)}.

--> At this point two independent clusters form: one containing s1 and s2, the other containing s3. The two never rejoin, even after the partition is healed.

Affected operations:
* *Queue add*: Inserting an element into *s3*'s queue will not be propagated to *s1* and *s2* even after the partition is healed. This leads to data unavailability.
* *Queue remove*: Removing an element from the queue in *s3* will not be executed on the other servers. This leads to reappearance of deleted data.
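The two affected operations can be sketched with Ignite's distributed queue API; queue name and config path below are illustrative assumptions, not details from the report.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteQueue;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CollectionConfiguration;

public class QueueSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("client-config.xml");

        // Capacity 0 means an unbounded queue.
        IgniteQueue<String> queue =
            ignite.queue("q", 0, new CollectionConfiguration());

        queue.add("x"); // add on s3's side: not propagated across the split
        queue.poll();   // remove on s3's side: not executed on s1/s2, so
                        // the "deleted" element reappears for their clients
    }
}
```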
[jira] [Created] (IGNITE-9762) Network partition leads to failures in Ignite's cache
Mo created IGNITE-9762:
--

Summary: Network partition leads to failures in Ignite's cache
Key: IGNITE-9762
URL: https://issues.apache.org/jira/browse/IGNITE-9762
Project: Ignite
Issue Type: Bug
Components: cache
Affects Versions: 2.4
Reporter: Mo

Creating a network partition in a replicated Ignite cluster leads to two independent clusters, each of which operates independently of the other, even after the network partition is healed.

Setup: 3 servers (s1,s2,s3), two clients (c1,c2). A partition is created: {(s1,s2,c1),(s3,c2)}.

--> At this point two independent clusters form: one containing s1 and s2, the other containing s3. The two never rejoin, even after the partition is healed. This leads to a faulty cache on both sides of the partition:
* *Stale reads*: An update to the cache on one side of the partition will not be propagated to the other side; hence, future reads on the other side (using the updated key) will be stale.
* *Data unavailability*: A new element inserted into the cache on one side of the partition will not be added to the other side, even after the partition is healed. This results in data unavailability for clients connected to the servers on the other side of the partition.

These are the settings used for the replicated cache:
cfg.setCacheMode(CacheMode.REPLICATED);
cfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
cfg.setReadFromBackup(false);
cfg.setPartitionLossPolicy(PartitionLossPolicy.READ_ONLY_SAFE);
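Put in context, the settings above amount to roughly the following self-contained fragment (cache name, key/value types, client config path, and the put/get calls are illustrative assumptions added around the reported configuration):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("client-config.xml");

        // Exactly the settings from the report, wrapped in a configuration.
        CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("repl");
        cfg.setCacheMode(CacheMode.REPLICATED);
        cfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
        cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        cfg.setReadFromBackup(false);
        cfg.setPartitionLossPolicy(PartitionLossPolicy.READ_ONLY_SAFE);

        IgniteCache<Integer, String> cache = ignite.getOrCreateCache(cfg);

        cache.put(1, "v1"); // update on one side of the split...
        cache.get(1);       // ...reads on the other side stay stale
    }
}
```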
[jira] [Created] (IGNITE-8883) Semaphore fails on network partitioning 2
Mo created IGNITE-8883:
--

Summary: Semaphore fails on network partitioning 2
Key: IGNITE-8883
URL: https://issues.apache.org/jira/browse/IGNITE-8883
Project: Ignite
Issue Type: Bug
Components: data structures
Reporter: Mo

Scenario: Three servers (s1,s2,s3), two clients (c1,c2). A semaphore with one permit is created.

Config:
{{Release acquired permits if the node that owned them left topology: set to true}}

Steps:
# c2 acquires the permit.
# Network failure happens, isolating c2 from the rest of the nodes for a period of time.
# Network heals.
# c2 releases the permit.
# c2 acquires the permit.
# c1 tries to acquire the permit but fails (exception).
[jira] [Created] (IGNITE-8882) Semaphore fails on network partitioning 1
Mo created IGNITE-8882:
--

Summary: Semaphore fails on network partitioning 1
Key: IGNITE-8882
URL: https://issues.apache.org/jira/browse/IGNITE-8882
Project: Ignite
Issue Type: Bug
Components: data structures
Reporter: Mo

Scenario: Three servers (s1,s2,s3), four clients (c1,c2,c3,c4). A semaphore with one permit is created.

Config:
1. {{Release acquired permits if the node that owned them left topology: set to false}}
2. TCP discovery mode: on

Steps:
# c2 acquires the permit.
# Network failure happens, isolating s1, s2, c1, and c3 from s3, c2, and c4 (i.e., {(s1,s2,c1,c3),(s3,c2,c4)}).
# c2 releases the permit.
# c1 and c3 try to acquire the permit, but fail (an exception is thrown).
[jira] [Created] (IGNITE-8881) Semaphore hangs on network partitioning
Mo created IGNITE-8881:
--

Summary: Semaphore hangs on network partitioning
Key: IGNITE-8881
URL: https://issues.apache.org/jira/browse/IGNITE-8881
Project: Ignite
Issue Type: Bug
Components: data structures
Affects Versions: 2.4
Reporter: Mo

Scenario: Three servers (s1,s2,s3), two clients (c1,c2). A semaphore with one permit is created.

Config:
1. {{Release acquired permits if the node that owned them left topology: set to false}}
2. TCP discovery mode: on

Steps:
# c2 acquires the permit.
# Network failure happens, isolating c2 from the rest of the nodes for a period of time, i.e., {(s1,s2,s3,c1),(c2)}.
# Network heals.
# c2 tries to release the permit but hangs.
[jira] [Created] (IGNITE-8593) The semaphore's isBroken function doesn't work properly.
Mo created IGNITE-8593:
--

Summary: The semaphore's isBroken function doesn't work properly.
Key: IGNITE-8593
URL: https://issues.apache.org/jira/browse/IGNITE-8593
Project: Ignite
Issue Type: Bug
Components: data structures
Affects Versions: 2.4
Reporter: Mo

Scenario: Three servers (s1,s2,s3), two clients (c1,c2). A semaphore with one permit is created.

Config:
{{Release acquired permits if node, that owned them, left topology: set to false}}

# c2 acquires the permit.
# Network failure happens, isolating c2 from the rest of the nodes for a period of time.
# Network heals.
# c2 releases the permit.
# c2 acquires the permit.
# Calling semaphore.isBroken() returns false on both c1 and c2.
# c1 tries to acquire the permit but fails.
# Now calling isBroken() returns true on both c1 and c2.

I think isBroken() should return true before a client tries and fails to acquire a permit (i.e., already in step 6), rather than only after a failed acquire, because in the latter case, what purpose does the isBroken() function serve?
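The complaint in the last paragraph can be sketched as follows (an illustrative fragment; the semaphore name and config path are assumptions):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteSemaphore;
import org.apache.ignite.Ignition;

public class IsBrokenSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("client-config.xml");

        // failoverSafe = false, matching the config above.
        IgniteSemaphore sem = ignite.semaphore("sem", 1, false, true);

        // Step 6: after the partition/heal/re-acquire sequence, this still
        // reports false on both clients, even though the semaphore's state
        // is already inconsistent...
        System.out.println(sem.isBroken());

        // ...and it only flips to true after an acquire attempt has failed.
        try {
            sem.acquire();
        } catch (Exception e) {
            System.out.println(sem.isBroken()); // only now reports broken
        }
    }
}
```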
[jira] [Created] (IGNITE-8592) Network partitions lead to two independent clusters
Mo created IGNITE-8592:
--

Summary: Network partitions lead to two independent clusters
Key: IGNITE-8592
URL: https://issues.apache.org/jira/browse/IGNITE-8592
Project: Ignite
Issue Type: Bug
Affects Versions: 2.4
Reporter: Mo

Creating a network partition in a replicated Ignite cluster leads to two independent clusters, each of which operates independently of the other, even after the network partition is healed.

Setup: 3 servers (s1,s2,s3), two clients (c1,c2). A partition is created: {(s1,s2,c1),(s3,c2)}.

--> At this point two independent clusters form: one containing s1 and s2, the other containing s3. The two never rejoin, even after the partition is healed. This creates different kinds of problems for the different data structures Ignite provides, such as the cache (stale reads and data unavailability), atomic types (AtomicReference, AtomicLong), etc.

These are the settings used for the replicated cache:
cfg.setCacheMode(CacheMode.REPLICATED);
cfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
cfg.setReadFromBackup(false);
cfg.setPartitionLossPolicy(PartitionLossPolicy.READ_ONLY_SAFE);