[jira] [Updated] (CASSANDRA-14800) Avoid referencing DatabaseDescriptor in ProtocolVersion
[ https://issues.apache.org/jira/browse/CASSANDRA-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-14800:

    Resolution: Fixed
    Status: Resolved (was: Ready to Commit)

committed as {{bd0cef9a369ae9245b45040796a6e10f51e522ce}}, thanks!

> Avoid referencing DatabaseDescriptor in ProtocolVersion
> ---
>
> Key: CASSANDRA-14800
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14800
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Minor
> Fix For: 4.0
>
>
> We should not reference {{DatabaseDescriptor}} in {{ProtocolVersion}} as it
> is used outside of Cassandra (for example when handling full query logs)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
cassandra git commit: Avoid using DatabaseDescriptor in ProtocolVersion
Repository: cassandra
Updated Branches:
  refs/heads/trunk 47a10649d -> bd0cef9a3

Avoid using DatabaseDescriptor in ProtocolVersion

Patch by marcuse; reviewed by Sam Tunnicliffe for CASSANDRA-14800

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bd0cef9a
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bd0cef9a
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bd0cef9a

Branch: refs/heads/trunk
Commit: bd0cef9a369ae9245b45040796a6e10f51e522ce
Parents: 47a1064
Author: Marcus Eriksson
Authored: Wed Oct 3 10:12:54 2018 +0200
Committer: Marcus Eriksson
Committed: Thu Oct 4 08:23:49 2018 +0200

--
 src/java/org/apache/cassandra/transport/Client.java      | 2 +-
 src/java/org/apache/cassandra/transport/Frame.java       | 2 +-
 .../org/apache/cassandra/transport/ProtocolVersion.java  | 6 ++
 .../org/apache/cassandra/audit/FullQueryLoggerTest.java  | 4 ++--
 .../apache/cassandra/transport/ProtocolVersionTest.java  | 10 +-
 .../src/org/apache/cassandra/fqltool/FQLQueryReader.java | 2 +-
 .../src/org/apache/cassandra/fqltool/commands/Dump.java  | 2 +-
 7 files changed, 13 insertions(+), 15 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bd0cef9a/src/java/org/apache/cassandra/transport/Client.java
--
diff --git a/src/java/org/apache/cassandra/transport/Client.java b/src/java/org/apache/cassandra/transport/Client.java
index f7ed272..eacd5a9 100644
--- a/src/java/org/apache/cassandra/transport/Client.java
+++ b/src/java/org/apache/cassandra/transport/Client.java
@@ -292,7 +292,7 @@ public class Client extends SimpleClient
         // Parse options.
         String host = args[0];
         int port = Integer.parseInt(args[1]);
-        ProtocolVersion version = args.length == 3 ? ProtocolVersion.decode(Integer.parseInt(args[2])) : ProtocolVersion.CURRENT;
+        ProtocolVersion version = args.length == 3 ? ProtocolVersion.decode(Integer.parseInt(args[2]), DatabaseDescriptor.getNativeTransportAllowOlderProtocols()) : ProtocolVersion.CURRENT;
 
         EncryptionOptions encryptionOptions = new EncryptionOptions();
         System.out.println("CQL binary protocol console " + host + "@" + port + " using native protocol version " + version);

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bd0cef9a/src/java/org/apache/cassandra/transport/Frame.java
--
diff --git a/src/java/org/apache/cassandra/transport/Frame.java b/src/java/org/apache/cassandra/transport/Frame.java
index d6a1cbc..d3c810b 100644
--- a/src/java/org/apache/cassandra/transport/Frame.java
+++ b/src/java/org/apache/cassandra/transport/Frame.java
@@ -174,7 +174,7 @@ public class Frame
             int firstByte = buffer.getByte(idx++);
             Message.Direction direction = Message.Direction.extractFromVersion(firstByte);
             int versionNum = firstByte & PROTOCOL_VERSION_MASK;
-            ProtocolVersion version = ProtocolVersion.decode(versionNum);
+            ProtocolVersion version = ProtocolVersion.decode(versionNum, DatabaseDescriptor.getNativeTransportAllowOlderProtocols());
 
             // Wait until we have the complete header
             if (readableBytes < Header.LENGTH)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/bd0cef9a/src/java/org/apache/cassandra/transport/ProtocolVersion.java
--
diff --git a/src/java/org/apache/cassandra/transport/ProtocolVersion.java b/src/java/org/apache/cassandra/transport/ProtocolVersion.java
index e1f634c..546983f 100644
--- a/src/java/org/apache/cassandra/transport/ProtocolVersion.java
+++ b/src/java/org/apache/cassandra/transport/ProtocolVersion.java
@@ -26,8 +26,6 @@ import java.util.Optional;
 
 import org.apache.commons.lang3.ArrayUtils;
 
-import org.apache.cassandra.config.DatabaseDescriptor;
-
 /**
  * The native (CQL binary) protocol version.
  *
@@ -95,7 +93,7 @@ public enum ProtocolVersion implements Comparable
         return versions;
     }
 
-    public static ProtocolVersion decode(int versionNum)
+    public static ProtocolVersion decode(int versionNum, boolean allowOlderProtocols)
     {
         ProtocolVersion ret = versionNum >= MIN_SUPPORTED_VERSION.num && versionNum <= MAX_SUPPORTED_VERSION.num
                               ? SUPPORTED_VERSIONS[versionNum - MIN_SUPPORTED_VERSION.num]
@@ -116,7 +114,7 @@ public enum ProtocolVersion implements Comparable
             throw new ProtocolException(invalidVersionMessage(versionNum), MAX_SUPPORTED_VERSION);
         }
 
-        if (!DatabaseDescriptor.getNativeTransportAllowOlderProtocols() && ret.isSmallerThan(CURREN
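To illustrate the idea behind this commit, the sketch below shows the general shape of a decode method that receives the allow-older-protocols flag as a parameter instead of reading it from global configuration. It is a simplified stand-in and not the committed code: the enum name `SketchProtocolVersion`, its three constants, the choice of `CURRENT`, and the use of `IllegalArgumentException` are all assumptions made for the example.

```java
// Simplified, hypothetical stand-in for a protocol version enum.
// The constants and CURRENT below are illustrative, not Cassandra's real values.
enum SketchProtocolVersion
{
    V3(3), V4(4), V5(5);

    static final SketchProtocolVersion CURRENT = V4;

    final int num;

    SketchProtocolVersion(int num)
    {
        this.num = num;
    }

    // The caller supplies the flag, so this type needs no access to server
    // configuration and can be loaded by standalone tools.
    static SketchProtocolVersion decode(int versionNum, boolean allowOlderProtocols)
    {
        for (SketchProtocolVersion version : values())
        {
            if (version.num != versionNum)
                continue;
            if (!allowOlderProtocols && version.num < CURRENT.num)
                throw new IllegalArgumentException("Protocol version " + versionNum + " is older than current and older protocols are disabled");
            return version;
        }
        throw new IllegalArgumentException("Unsupported protocol version " + versionNum);
    }

    public static void main(String[] args)
    {
        // A server would pass its configured setting; an offline tool can pass true.
        System.out.println(decode(3, true));
        System.out.println(decode(4, false));
    }
}
```

With this shape, a server-side caller passes its own configuration value at the call site, while an offline tool such as a log reader can pass a constant.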
[jira] [Comment Edited] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637835#comment-16637835 ]

Dinesh Joshi edited comment on CASSANDRA-14747 at 10/4/18 6:13 AM:
---

[~jolynch] this is pretty cool! I think it would make sense to set all tunable knobs to default and see the impact. Then we can start tuning the parameters to arrive at sensible defaults. We should also document the findings.

was (Author: djoshi3):
[~jolynch] this is pretty cool! I think it would make sense to set all tunable knobs to default and see the impact.

> Evaluate 200 node, compression=none, encryption=none, coalescing=off
> -
>
> Key: CASSANDRA-14747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Major
> Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png,
> 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg,
> 4.0.15-after-sndbuf-fix.svg, 4.0.7-before-my-changes.svg,
> 4.0_errors_showing_heap_pressure.txt,
> 4.0_heap_histogram_showing_many_MessageOuts.txt,
> i-0ed2acd2dfacab7c1-after-looping-fixes.svg,
> trunk_vs_3.0.17_latency_under_load.png,
> ttop_NettyOutbound-Thread_spinning.txt,
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg,
> useast1e-i-08635fa1631601538_flamegraph_96node.svg,
> useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes,
> useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no
> compression, no encryption, no coalescing).
[jira] [Comment Edited] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637835#comment-16637835 ]

Dinesh Joshi edited comment on CASSANDRA-14747 at 10/4/18 6:05 AM:
---

[~jolynch] this is pretty cool! I think it would make sense to set all tunable knobs to default and see the impact.

was (Author: djoshi3):
[~jolynch] this is pretty cool!

> Evaluate 200 node, compression=none, encryption=none, coalescing=off
> -
>
> Key: CASSANDRA-14747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Major
> Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png,
> 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg,
> 4.0.15-after-sndbuf-fix.svg, 4.0.7-before-my-changes.svg,
> 4.0_errors_showing_heap_pressure.txt,
> 4.0_heap_histogram_showing_many_MessageOuts.txt,
> i-0ed2acd2dfacab7c1-after-looping-fixes.svg,
> trunk_vs_3.0.17_latency_under_load.png,
> ttop_NettyOutbound-Thread_spinning.txt,
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg,
> useast1e-i-08635fa1631601538_flamegraph_96node.svg,
> useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes,
> useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no
> compression, no encryption, no coalescing).
[jira] [Commented] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637835#comment-16637835 ]

Dinesh Joshi commented on CASSANDRA-14747:
--

[~jolynch] this is pretty cool!

> Evaluate 200 node, compression=none, encryption=none, coalescing=off
> -
>
> Key: CASSANDRA-14747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Major
> Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png,
> 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg,
> 4.0.15-after-sndbuf-fix.svg, 4.0.7-before-my-changes.svg,
> 4.0_errors_showing_heap_pressure.txt,
> 4.0_heap_histogram_showing_many_MessageOuts.txt,
> i-0ed2acd2dfacab7c1-after-looping-fixes.svg,
> trunk_vs_3.0.17_latency_under_load.png,
> ttop_NettyOutbound-Thread_spinning.txt,
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg,
> useast1e-i-08635fa1631601538_flamegraph_96node.svg,
> useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes,
> useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no
> compression, no encryption, no coalescing).
[jira] [Commented] (CASSANDRA-14804) Running repair on multiple nodes in parallel could halt entire repair
[ https://issues.apache.org/jira/browse/CASSANDRA-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637811#comment-16637811 ]

Jaydeepkumar Chovatia commented on CASSANDRA-14804:
---

[~bdeggleston] [~jjirsa] Could you please see if this analysis makes sense or not?

> Running repair on multiple nodes in parallel could halt entire repair
> --
>
> Key: CASSANDRA-14804
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14804
> Project: Cassandra
> Issue Type: Bug
> Components: Repair
> Reporter: Jaydeepkumar Chovatia
> Priority: Major
> Fix For: 3.0.18
>
>
> Possible deadlock if we run repair on multiple nodes at the same time. We
> have come across a situation in production in which if we repair multiple
> nodes at the same time then repair hangs forever. Here are the details:
>
> Time t1
> {{node-1}} has issued repair command to {{node-2}} but due to some reason
> {{node-2}} didn't receive request hence {{node-1}} is awaiting at
> [prepareForRepair |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/ActiveRepairService.java#L333]
> for 1 hour *with lock*
>
> Time t2
> {{node-2}} sent prepare repair request to {{node-1}}, some exception
> occurred on {{node-1}} and it is trying to cleanup parent session
> [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L172]
> but {{node-1}} cannot get lock as 1 hour of time has not yet elapsed (above
> one)
>
> snippet of jstack on {{node-1}}
> {quote}"Thread-888" #262588 daemon prio=5 os_prio=0 waiting on condition
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for (a java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:332)
> - locked <> (a org.apache.cassandra.service.ActiveRepairService)
> at org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:214)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$9/864248990.run(Unknown Source)
> at java.lang.Thread.run(Thread.java:748)
>
> "AntiEntropyStage:1" #1789 daemon prio=5 os_prio=0 waiting for monitor entry []
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:421)
> - waiting to lock <> (a org.apache.cassandra.service.ActiveRepairService)
> at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$9/864248990.run(Unknown Source)
> at java.lang.Thread.run(Thread.java:748){quote}
>
> Time t3:
> {{node-2}}(and possibly other nodes {{node-3}}…) sent [prepare request |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/ActiveRepairService.java#L333]
> to {{node-1}}, but {{node-1}}’s AntiEntropyStage thread is busy awaiting for
> lock at {{ActiveRepairService.removeParentRepairSession}}, hence {{node-2}},
> {{node-3}} (and possibly other nodes) will also go in 1 hour wait *with
> lock*. This rolling effect continues and stalls repair in entire ring.
>
> If we totally stop triggering repair then system would recover slowly but
> here are the two major problems with this:
> 1. Externally there is no way to decide whether to trig
[jira] [Created] (CASSANDRA-14804) Running repair on multiple nodes in parallel could halt entire repair
Jaydeepkumar Chovatia created CASSANDRA-14804:
-

    Summary: Running repair on multiple nodes in parallel could halt entire repair
    Key: CASSANDRA-14804
    URL: https://issues.apache.org/jira/browse/CASSANDRA-14804
    Project: Cassandra
    Issue Type: Bug
    Components: Repair
    Reporter: Jaydeepkumar Chovatia
    Fix For: 3.0.18

Possible deadlock if we run repair on multiple nodes at the same time. We have come across a situation in production in which if we repair multiple nodes at the same time then repair hangs forever. Here are the details:

Time t1
{{node-1}} has issued repair command to {{node-2}} but due to some reason {{node-2}} didn't receive request hence {{node-1}} is awaiting at [prepareForRepair |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/ActiveRepairService.java#L333] for 1 hour *with lock*

Time t2
{{node-2}} sent prepare repair request to {{node-1}}, some exception occurred on {{node-1}} and it is trying to cleanup parent session [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L172] but {{node-1}} cannot get lock as 1 hour of time has not yet elapsed (above one)

snippet of jstack on {{node-1}}
{quote}"Thread-888" #262588 daemon prio=5 os_prio=0 waiting on condition
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:332)
- locked <> (a org.apache.cassandra.service.ActiveRepairService)
at org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:214)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$9/864248990.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)

"AntiEntropyStage:1" #1789 daemon prio=5 os_prio=0 waiting for monitor entry []
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:421)
- waiting to lock <> (a org.apache.cassandra.service.ActiveRepairService)
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$9/864248990.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748){quote}

Time t3:
{{node-2}}(and possibly other nodes {{node-3}}…) sent [prepare request |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/ActiveRepairService.java#L333] to {{node-1}}, but {{node-1}}’s AntiEntropyStage thread is busy awaiting for lock at {{ActiveRepairService.removeParentRepairSession}}, hence {{node-2}}, {{node-3}} (and possibly other nodes) will also go in 1 hour wait *with lock*. This rolling effect continues and stalls repair in entire ring.

If we totally stop triggering repair then system would recover slowly but here are the two major problems with this:
1. Externally there is no way to decide whether to trigger new repair or wait for system to recover
2. In this case system recovers eventually but it takes probably {{n}} hours where n = #of repair requests fired, only way to come out of this situation is either to do a rolling restart of entire ring or wait for {{n}} hours before triggering new repair request

Please let me know if my above analysis makes sense or not.
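The blocking pattern described at t1 and t2 can be reproduced in isolation. The following is a minimal sketch and not Cassandra code: the class `RepairLockSketch`, its fields, and `observeCleanupState` are hypothetical, and the one hour wait is shortened to a caller-supplied timeout. It shows that a thread waiting on a `CountDownLatch` inside a `synchronized` method keeps the object monitor, so a second thread calling another `synchronized` method on the same object stays `BLOCKED` until the wait times out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class RepairLockSketch
{
    // Never counted down: stands in for the prepare response that never arrives.
    private final CountDownLatch prepareAcks = new CountDownLatch(1);
    // Signals that the coordinator thread is inside the synchronized method.
    private final CountDownLatch insidePrepare = new CountDownLatch(1);

    synchronized boolean prepareForRepair(long timeoutMillis) throws InterruptedException
    {
        insidePrepare.countDown();
        // await() parks the thread but does not release the monitor.
        return prepareAcks.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    synchronized void removeParentRepairSession()
    {
        // Cleanup would happen here; it cannot start while prepareForRepair waits.
    }

    // Returns the state observed for the cleanup thread while prepare is waiting.
    static Thread.State observeCleanupState(long timeoutMillis)
    {
        RepairLockSketch service = new RepairLockSketch();
        Thread coordinator = new Thread(() -> {
            try
            {
                service.prepareForRepair(timeoutMillis);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
        }, "coordinator");
        Thread cleanup = new Thread(service::removeParentRepairSession, "cleanup");
        try
        {
            coordinator.start();
            service.insidePrepare.await(); // coordinator now owns the monitor
            cleanup.start();
            Thread.State state = cleanup.getState();
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis / 2);
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline)
            {
                Thread.sleep(5);
                state = cleanup.getState();
            }
            coordinator.join();
            cleanup.join();
            return state;
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println("cleanup thread state while prepare waits: " + observeCleanupState(1000));
    }
}
```

In this sketch the cleanup thread only proceeds once the timed wait returns, which mirrors the `TIMED_WAITING` and `BLOCKED` pair in the jstack snippet. Waiting outside the monitor, or using a lock that is released before the wait, would be one way to avoid the pattern; whether that suits the real code is a separate question.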
[jira] [Comment Edited] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637652#comment-16637652 ]

Joseph Lynch edited comment on CASSANDRA-14747 at 10/4/18 12:38 AM:

[~jasobrown] Ok, I think we found the problem! In the new Netty code we explicitly set the {{SO_SNDBUF}} [of the outbound socket|https://github.com/apache/cassandra/blob/47a10649dadbdea6960836a7c0fe6d271a476204/src/java/org/apache/cassandra/net/async/NettyFactory.java#L332] to 64KB. This works great if you have a low latency connection, but for long fat networks this is a serious issue, as you restrict your bandwidth significantly due to a high [bandwidth delay product|https://en.wikipedia.org/wiki/Bandwidth-delay_product]. In the tests we've been running, we are trying to push a semi-reasonable amount of traffic (like 8mbps) to peers that are about 80ms away (us-east-1 to eu-west-1 is usually about [80ms|https://www.cloudping.co/]). With a 64KB window size we just don't have enough bandwidth, even though the actual link is very high bandwidth.
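The arithmetic behind this claim can be checked with a short calculation. The sketch below is illustrative only: the class `WindowLimitSketch` and its method names are made up for the example, and it uses the simple rule that a single TCP connection can have at most one window of unacknowledged data in flight per round trip. With the 64KB window and 80ms round trip quoted above, that bound comes out at roughly 6.55 Mbit/s, below the 8mbps the test needs. Filling a link of about 500 Mbit/s at the same round trip would need a window of about 5 MB.

```java
import java.util.Locale;

final class WindowLimitSketch
{
    // Upper bound on throughput for one connection: one window per round trip.
    static double maxThroughputMbps(long windowBytes, double rttMillis)
    {
        double bitsPerRoundTrip = windowBytes * 8.0;
        double roundTripsPerSecond = 1000.0 / rttMillis;
        return bitsPerRoundTrip * roundTripsPerSecond / 1_000_000.0;
    }

    // Bandwidth-delay product: the window needed to keep a link of this speed full.
    static long requiredWindowBytes(double linkMbps, double rttMillis)
    {
        return Math.round(linkMbps * 1_000_000.0 / 8.0 * rttMillis / 1000.0);
    }

    public static void main(String[] args)
    {
        System.out.println(String.format(Locale.ROOT, "64KB window at 80ms: %.2f Mbit/s",
                                         maxThroughputMbps(64 * 1024, 80.0)));
        System.out.println("window for 500 Mbit/s at 80ms: " + requiredWindowBytes(500.0, 80.0) + " bytes");
    }
}
```

The bound ignores protocol overhead and any adjustment the kernel makes to the requested buffer size, so measured numbers will differ somewhat from it.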
As we can see using {{iperf}} setting a 64KB buffer cripples throughput:
{noformat}
# On the eu-west-1 node X
$ iperf -s -p 8080
Server listening on TCP port 8080
TCP window size: 12.0 MByte (default)
[ 4] local X port 8080 connected with Y port 26964
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.5 sec 506 MBytes 404 Mbits/sec
[ 5] local X port 8080 connected with Y port 27050
[ 5] 0.0-10.5 sec 8.50 MBytes 6.81 Mbits/sec

# On the us-east-1 node Y about 80ms away
$ iperf -N -c X -p 8080
Client connecting to X, TCP port 8080
TCP window size: 12.0 MByte (default)
[ 3] local Y port 26964 connected with X port 8080
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 506 MBytes 421 Mbits/sec

$ iperf -N -w 64K -c X -p 8080
Client connecting to X, TCP port 8080
TCP window size: 128 KByte (WARNING: requested 64.0 KByte)
[ 3] local Y port 27050 connected with X port 8080
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 8.50 MBytes 7.03 Mbits/sec
{noformat}
So instead of Cassandra getting the full link's bandwidth of 500mbps we're only able to get 7mbps. This is lower than the 8mbps we need to push so the us-east-1 -> eu-west-1 queues effectively grow without bound until we start dropping messages. I applied a [patch|https://gist.github.com/jolynch/966e0e52f34eff7a7b8ac8d5a9cb4b5d#file-fix-the-problem-diff] which does not set {{SO_SNDBUF}} unless explicitly asked to and *everything is completely wonderful* now.

Some ways that things are wonderful:

1. The cpu usage is now on par with 3.0.x, and most of that CPU time is spent in compaction (both in garbage creation and actual cpu time):
{noformat}
$ sjk ttop -p $(pgrep -f Cassandra) -n 20 -o CPU
2018-10-03T23:56:40.889+ Process summary
process cpu=321.33%
application cpu=301.46% (user=185.93% sys=115.52%)
other: cpu=19.88%
thread count: 274
GC time=5.27% (young=5.27%, old=0.00%)
heap allocation rate 478mb/s
safe point rate: 0.4 (events/s)
avg. safe point pause: 135.64ms
safe point sync time: 0.08% processing time: 5.38% (wallclock time)
[000135] user=49.03% sys=11.84% alloc= 142mb/s - CompactionExecutor:1
[000136] user=44.60% sys=13.81% alloc= 133mb/s - CompactionExecutor:2
[000198] user= 0.00% sys=41.46% alloc= 4833b/s - NonPeriodicTasks:1
[10] user= 9.56% sys= 0.67% alloc= 57mb/s - spectator-gauge-polling-0
[29] user= 7.45% sys= 2.13% alloc= 5772kb/s - PerDiskMemtableFlushWriter_0:1
[36] user= 0.00% sys= 8.98% alloc= 2598b/s - PERIODIC-COMMIT-LOG-SYNCER
[000115] user= 5.74% sys= 2.22% alloc= 12mb/s - MessagingService-NettyInbound-Thread-3-1
[000118] user= 4.03% sys= 3.75% alloc= 2915kb/s - MessagingService-NettyOutbound-Thread-4-3
[000117] user= 3.12% sys= 2.79% alloc= 2110kb/s - MessagingService-NettyOutbound-Thread-4-2
[000144] user= 4.03% sys= 0.92% alloc= 7205kb/s - MutationStage-1
[000146] user= 4.13% sys= 0.77% alloc= 6837kb/s - Native-Transport-Requests-2
[000147] user= 3.12% sys= 1.49% alloc= 6054kb/s - MutationStage-3
[000150] user= 3.22% sys= 1.21% alloc= 6630kb/s - MutationStage-4
[000116] user= 2.72% sys= 1.61% alloc= 1412kb/s - MessagingService-NettyOutbound-Thread-4-1
[000132] user= 2.21% sys= 2.04% alloc= 11mb/s - MessagingService-NettyInbound-Thread-3-2
[000151] user= 2.92% sys= 1.30% alloc= 5462kb/s - Native-Transport-Requests-5
[000134] user= 2.11% sys= 1.71% alloc= 6212kb/s - MessagingService-NettyInb
[jira] [Comment Edited] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637652#comment-16637652 ]

Joseph Lynch edited comment on CASSANDRA-14747 at 10/4/18 12:31 AM:

[~jasobrown] Ok, I think we found the problem! In the new Netty code we explicitly set the {{SO_SNDBUF}} [of the outbound socket|https://github.com/apache/cassandra/blob/47a10649dadbdea6960836a7c0fe6d271a476204/src/java/org/apache/cassandra/net/async/NettyFactory.java#L332] to 64KB. This works great if you have a low latency connection, but for long fat networks this is a serious issue, as you restrict your bandwidth significantly due to a high [bandwidth delay product|https://en.wikipedia.org/wiki/Bandwidth-delay_product]. In the tests we've been running, we are trying to push a semi-reasonable amount of traffic (like 8mbps) to peers that are about 80ms away (us-east-1 to eu-west-1 is usually about [80ms|https://www.cloudping.co/]). With a 64KB window size we just don't have enough bandwidth, even though the actual link is very high bandwidth.
As we can see using {{iperf}} setting a 64KB buffer cripples throughput:
{noformat}
# On the eu-west-1 node X
$ iperf -s -p 8080
Server listening on TCP port 8080
TCP window size: 12.0 MByte (default)
[ 4] local X port 8080 connected with Y port 26964
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.5 sec 506 MBytes 404 Mbits/sec
[ 5] local X port 8080 connected with Y port 27050
[ 5] 0.0-10.5 sec 8.50 MBytes 6.81 Mbits/sec

# On the us-east-1 node Y about 80ms away
$ iperf -N -c X -p 8080
Client connecting to X, TCP port 8080
TCP window size: 12.0 MByte (default)
[ 3] local Y port 26964 connected with X port 8080
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 506 MBytes 421 Mbits/sec

$ iperf -N -w 64K -c X -p 8080
Client connecting to X, TCP port 8080
TCP window size: 128 KByte (WARNING: requested 64.0 KByte)
[ 3] local Y port 27050 connected with X port 8080
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 8.50 MBytes 7.03 Mbits/sec
{noformat}
So instead of Cassandra getting the full link's bandwidth of 500mbps we're only able to get 7mbps. This is lower than the 8mbps we need to push so the us-east-1 -> eu-west-1 queues effectively grow without bound until we start dropping messages. I applied a [patch|https://gist.github.com/jolynch/966e0e52f34eff7a7b8ac8d5a9cb4b5d#file-fix-the-problem-diff] which does not set {{SO_SNDBUF}} unless explicitly asked to and *everything is completely wonderful* now.

Some ways that things are wonderful:

1. The cpu usage is now on par with 3.0.x, and most of that CPU time is spent in compaction (both in garbage creation and actual cpu time):
{noformat}
2018-10-03T23:56:40.889+ Process summary
process cpu=321.33%
application cpu=301.46% (user=185.93% sys=115.52%)
other: cpu=19.88%
thread count: 274
GC time=5.27% (young=5.27%, old=0.00%)
heap allocation rate 478mb/s
safe point rate: 0.4 (events/s)
avg. safe point pause: 135.64ms
safe point sync time: 0.08% processing time: 5.38% (wallclock time)
[000135] user=49.03% sys=11.84% alloc= 142mb/s - CompactionExecutor:1
[000136] user=44.60% sys=13.81% alloc= 133mb/s - CompactionExecutor:2
[000198] user= 0.00% sys=41.46% alloc= 4833b/s - NonPeriodicTasks:1
[10] user= 9.56% sys= 0.67% alloc= 57mb/s - spectator-gauge-polling-0
[29] user= 7.45% sys= 2.13% alloc= 5772kb/s - PerDiskMemtableFlushWriter_0:1
[36] user= 0.00% sys= 8.98% alloc= 2598b/s - PERIODIC-COMMIT-LOG-SYNCER
[000115] user= 5.74% sys= 2.22% alloc= 12mb/s - MessagingService-NettyInbound-Thread-3-1
[000118] user= 4.03% sys= 3.75% alloc= 2915kb/s - MessagingService-NettyOutbound-Thread-4-3
[000117] user= 3.12% sys= 2.79% alloc= 2110kb/s - MessagingService-NettyOutbound-Thread-4-2
[000144] user= 4.03% sys= 0.92% alloc= 7205kb/s - MutationStage-1
[000146] user= 4.13% sys= 0.77% alloc= 6837kb/s - Native-Transport-Requests-2
[000147] user= 3.12% sys= 1.49% alloc= 6054kb/s - MutationStage-3
[000150] user= 3.22% sys= 1.21% alloc= 6630kb/s - MutationStage-4
[000116] user= 2.72% sys= 1.61% alloc= 1412kb/s - MessagingService-NettyOutbound-Thread-4-1
[000132] user= 2.21% sys= 2.04% alloc= 11mb/s - MessagingService-NettyInbound-Thread-3-2
[000151] user= 2.92% sys= 1.30% alloc= 5462kb/s - Native-Transport-Requests-5
[000134] user= 2.11% sys= 1.71% alloc= 6212kb/s - MessagingService-NettyInbound-Thread-3-4
[000152] user= 3.02% sys= 0.65% al
[jira] [Updated] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-14747: - Attachment: 4.0.15-after-sndbuf-fix.svg > Evaluate 200 node, compression=none, encryption=none, coalescing=off > - > > Key: CASSANDRA-14747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14747 > Project: Cassandra > Issue Type: Sub-task >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, > 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg, > 4.0.15-after-sndbuf-fix.svg, 4.0.7-before-my-changes.svg, > 4.0_errors_showing_heap_pressure.txt, > 4.0_heap_histogram_showing_many_MessageOuts.txt, > i-0ed2acd2dfacab7c1-after-looping-fixes.svg, > trunk_vs_3.0.17_latency_under_load.png, > ttop_NettyOutbound-Thread_spinning.txt, > useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, > useast1e-i-08635fa1631601538_flamegraph_96node.svg, > useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, > useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg > > > Tracks evaluating a 200 node cluster with all internode settings off (no > compression, no encryption, no coalescing). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-14747: - Attachment: trunk_vs_3.0.17_latency_under_load.png > Evaluate 200 node, compression=none, encryption=none, coalescing=off > - > > Key: CASSANDRA-14747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14747 > Project: Cassandra > Issue Type: Sub-task >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, > 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg, > 4.0.7-before-my-changes.svg, 4.0_errors_showing_heap_pressure.txt, > 4.0_heap_histogram_showing_many_MessageOuts.txt, > i-0ed2acd2dfacab7c1-after-looping-fixes.svg, > trunk_vs_3.0.17_latency_under_load.png, > ttop_NettyOutbound-Thread_spinning.txt, > useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, > useast1e-i-08635fa1631601538_flamegraph_96node.svg, > useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, > useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg > > > Tracks evaluating a 200 node cluster with all internode settings off (no > compression, no encryption, no coalescing). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637652#comment-16637652 ] Joseph Lynch commented on CASSANDRA-14747: -- [~jasobrown] Ok, I think we found the problem! In the new Netty code we explicitly set the {{SO_SNDBUF}} [of the outbound socket|https://github.com/apache/cassandra/blob/47a10649dadbdea6960836a7c0fe6d271a476204/src/java/org/apache/cassandra/net/async/NettyFactory.java#L332] to 64KB. This works great if you have a low latency connection, but for long fat networks this is a serious issue as you restrict your bandwidth significantly due to a high [bandwidth delay product|https://en.wikipedia.org/wiki/Bandwidth-delay_product]. In the tests we've been running where we are trying to push a semi reasonable amount of traffic (like 8mbps) to peers that are about 80ms away (us-east-1 to eu-west-1 is usually about [80ms|https://www.cloudping.co/]). With a 64KB window size we just don't have enough bandwidth even though the actual link is very high bandwidth. 
As we can see using {{iperf}} setting a 64KB buffer cripples throughput: {noformat} # On the eu-west-1 node X $ iperf -s -p 8080 Server listening on TCP port 8080 TCP window size: 12.0 MByte (default) [ 4] local X port 8080 connected with Y port 26964 [ ID] Interval Transfer Bandwidth [ 4] 0.0-10.5 sec 506 MBytes 404 Mbits/sec [ 5] local X port 8080 connected with Y port 27050 [ 5] 0.0-10.5 sec 8.50 MBytes 6.81 Mbits/sec # On the us-east-1 node Y about 80ms away $ iperf -N -c X -p 8080 Client connecting to X, TCP port 8080 TCP window size: 12.0 MByte (default) [ 3] local Y port 26964 connected with X port 8080 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.1 sec 506 MBytes 421 Mbits/sec $ iperf -N -w 64K -c X -p 8080 Client connecting to X, TCP port 8080 TCP window size: 128 KByte (WARNING: requested 64.0 KByte) [ 3] local Y port 27050 connected with X port 8080 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.1 sec 8.50 MBytes 7.03 Mbits/sec {noformat} So instead of Cassandra getting the full link's bandwidth of 500mbps we're only able to get 7mbps. This is lower than the 8mbps we need to push so the us-east-1 -> eu-west-1 queues effectively grow without bound until we start dropping messages. I applied a [patch|https://gist.github.com/jolynch/966e0e52f34eff7a7b8ac8d5a9cb4b5d#file-fix-the-problem-diff] which does not set {{SO_SNDBUF}} unless explicitly asked to and *everything is completely wonderful* now. Some ways that things are wonderful: 1. The cpu usage is now on par with 3.0.x, and most of that CPU time is spent in compaction (both in garbage creation and actual cpu time): {noformat} 2018-10-03T23:56:40.889+ Process summary process cpu=321.33% application cpu=301.46% (user=185.93% sys=115.52%) other: cpu=19.88% thread count: 274 GC time=5.27% (young=5.27%, old=0.00%) heap allocation rate 478mb/s safe point rate: 0.4 (events/s) avg. 
safe point pause: 135.64ms safe point sync time: 0.08% processing time: 5.38% (wallclock time) [000135] user=49.03% sys=11.84% alloc= 142mb/s - CompactionExecutor:1 [000136] user=44.60% sys=13.81% alloc= 133mb/s - CompactionExecutor:2 [000198] user= 0.00% sys=41.46% alloc= 4833b/s - NonPeriodicTasks:1 [10] user= 9.56% sys= 0.67% alloc= 57mb/s - spectator-gauge-polling-0 [29] user= 7.45% sys= 2.13% alloc= 5772kb/s - PerDiskMemtableFlushWriter_0:1 [36] user= 0.00% sys= 8.98% alloc= 2598b/s - PERIODIC-COMMIT-LOG-SYNCER [000115] user= 5.74% sys= 2.22% alloc= 12mb/s - MessagingService-NettyInbound-Thread-3-1 [000118] user= 4.03% sys= 3.75% alloc= 2915kb/s - MessagingService-NettyOutbound-Thread-4-3 [000117] user= 3.12% sys= 2.79% alloc= 2110kb/s - MessagingService-NettyOutbound-Thread-4-2 [000144] user= 4.03% sys= 0.92% alloc= 7205kb/s - MutationStage-1 [000146] user= 4.13% sys= 0.77% alloc= 6837kb/s - Native-Transport-Requests-2 [000147] user= 3.12% sys= 1.49% alloc= 6054kb/s - MutationStage-3 [000150] user= 3.22% sys= 1.21% alloc= 6630kb/s - MutationStage-4 [000116] user= 2.72% sys= 1.61% alloc= 1412kb/s - MessagingService-NettyOutbound-Thread-4-1 [000132] user= 2.21% sys= 2.04% alloc= 11mb/s - MessagingService-NettyInbound-Thread-3-2 [000151] user= 2.92% sys= 1.30% alloc= 5462kb/s - Native-Transport-Requests-5 [000134] user= 2.11% sys= 1.71% alloc= 6212kb/s - MessagingService-NettyInbound-Thread-3-4 [000152] user= 3.02% sys= 0.65% alloc= 5357kb/s - MutationStage-6 [000133] user= 1.81
[jira] [Comment Edited] (CASSANDRA-14706) Support "IF EXISTS/IF NOT EXISTS" for all clauses of "ALTER TABLE"
[ https://issues.apache.org/jira/browse/CASSANDRA-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637631#comment-16637631 ] Dmitry Lazurkin edited comment on CASSANDRA-14706 at 10/3/18 11:48 PM: --- RDBMSs have DDL transactions, so this is not so crucial for them. But C* migrations are uncomfortable without this feature. If an up-migration has many statements, you can't restart it after an error in the middle, and you can't start the down-migration either. You have to manually revert every statement that ran before the error. was (Author: dmitry.lazurkin): RDBMS have DDL-transactions. So this is not so crucial for it. But C* migrations aren't so comfortable without this feature. If you have up-migration with many statements then you can't restart up-migration after error in middle and you can't start down-migration. You need manual revert of all statement before error. > Support "IF EXISTS/IF NOT EXISTS" for all clauses of "ALTER TABLE" > -- > > Key: CASSANDRA-14706 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14706 > Project: Cassandra > Issue Type: New Feature >Reporter: Dmitry Lazurkin >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Like so: > {noformat} > ALTER TABLE ALTER TYPE ; > ALTER TABLE [ IF EXISTS ] ADD [ IF NOT EXISTS ] ; > ALTER TABLE [ IF EXISTS ] ADD [ IF NOT EXISTS ] ( > , . ); > ALTER TABLE [ IF EXISTS ] DROP [ IF EXISTS ] ; > ALTER TABLE [ IF EXISTS ] DROP [ IF EXISTS ] ( > ,.); > ALTER TABLE [ IF EXISTS ] RENAME [ IF EXISTS ] TO ; > ALTER TABLE [ IF EXISTS ] WITH = ; > {noformat} > I think a common IF EXISTS/IF NOT EXISTS clause for ADD/DROP/RENAME is better than > a clause for each column.
[jira] [Commented] (CASSANDRA-14706) Support "IF EXISTS/IF NOT EXISTS" for all clauses of "ALTER TABLE"
[ https://issues.apache.org/jira/browse/CASSANDRA-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637631#comment-16637631 ] Dmitry Lazurkin commented on CASSANDRA-14706: - RDBMSs have DDL transactions, so this is not so crucial for them. But C* migrations are uncomfortable without this feature. If an up-migration has many statements, you can't restart it after an error in the middle, and you can't start the down-migration either. You have to manually revert every statement that ran before the error. > Support "IF EXISTS/IF NOT EXISTS" for all clauses of "ALTER TABLE" > -- > > Key: CASSANDRA-14706 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14706 > Project: Cassandra > Issue Type: New Feature >Reporter: Dmitry Lazurkin >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Like so: > {noformat} > ALTER TABLE ALTER TYPE ; > ALTER TABLE [ IF EXISTS ] ADD [ IF NOT EXISTS ] ; > ALTER TABLE [ IF EXISTS ] ADD [ IF NOT EXISTS ] ( > , . ); > ALTER TABLE [ IF EXISTS ] DROP [ IF EXISTS ] ; > ALTER TABLE [ IF EXISTS ] DROP [ IF EXISTS ] ( > ,.); > ALTER TABLE [ IF EXISTS ] RENAME [ IF EXISTS ] TO ; > ALTER TABLE [ IF EXISTS ] WITH = ; > {noformat} > I think a common IF EXISTS/IF NOT EXISTS clause for ADD/DROP/RENAME is better than > a clause for each column.
[jira] [Updated] (CASSANDRA-14803) Rows that cross index block boundaries can cause incomplete reverse reads in some cases.
[ https://issues.apache.org/jira/browse/CASSANDRA-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-14803: Reviewer: Sam Tunnicliffe > Rows that cross index block boundaries can cause incomplete reverse reads in > some cases. > > > Key: CASSANDRA-14803 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14803 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Major > Fix For: 3.0.x, 3.11.x > > > When we're reading 2.1 sstables in reverse, we skip the first row of an index > block if it's split across index boundaries. The entire row will be read at > the end of the next block. In some cases though, the only thing in this index > block is the partial row, so we return an empty iterator. The empty iterator > is then interpreted as the end of the row further down the call stack, so we > return early without reading the rest of the data. This only affects 3.x > during upgrades from 2.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
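The failure mode in the description above can be pictured with a deliberately simplified sketch. This is not the Cassandra reader: the class name, the method names, and the list-of-lists model of index blocks are all hypothetical. Each inner list stands for the rows an index block yields once a row split across the block boundary has been skipped, so a block that held only a partial row yields nothing. Skipping such a block instead of stopping is shown as one plausible remedy and is an assumption, not a description of the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of reverse iteration over index blocks; not Cassandra code.
final class ReverseBlockReadSketch
{
    /** Mirrors the bug: an empty block is mistaken for the end of the data. */
    static List<String> readStoppingOnEmpty(List<List<String>> blocks)
    {
        List<String> out = new ArrayList<>();
        for (int i = blocks.size() - 1; i >= 0; i--)
        {
            List<String> block = blocks.get(i);
            if (block.isEmpty())
                break; // returns early, earlier blocks are never read
            for (int j = block.size() - 1; j >= 0; j--)
                out.add(block.get(j));
        }
        return out;
    }

    /** Assumed remedy: an empty block is skipped and iteration continues. */
    static List<String> readSkippingEmpty(List<List<String>> blocks)
    {
        List<String> out = new ArrayList<>();
        for (int i = blocks.size() - 1; i >= 0; i--)
        {
            List<String> block = blocks.get(i);
            for (int j = block.size() - 1; j >= 0; j--)
                out.add(block.get(j));
        }
        return out;
    }

    public static void main(String[] args)
    {
        // Row "b" starts in block 0 and spills over blocks 1 and 2,
        // so block 1 contains nothing but part of "b" and yields no rows.
        List<List<String>> blocks = Arrays.asList(
            Arrays.asList("a", "b"),
            Collections.<String>emptyList(),
            Arrays.asList("c"));

        System.out.println(readStoppingOnEmpty(blocks)); // prints [c]
        System.out.println(readSkippingEmpty(blocks));   // prints [c, b, a]
    }
}
```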
[jira] [Updated] (CASSANDRA-14803) Rows that cross index block boundaries can cause incomplete reverse reads in some cases.
[ https://issues.apache.org/jira/browse/CASSANDRA-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-14803: Status: Patch Available (was: Open) |[3.0|https://github.com/bdeggleston/cassandra/tree/14803-3.0]|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/cci%2F14803-3.0]| |[3.11|https://github.com/bdeggleston/cassandra/tree/14803-3.11]|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/cci%2F14803-3.11]| The sstable used for the test was generated from [here|https://github.com/bdeggleston/cassandra/tree/14803-2.1] Since this is testing a specific problem upgrading from 2.x-3.x, it didn't seem like LegacySSTableTest was the right place for this > Rows that cross index block boundaries can cause incomplete reverse reads in > some cases. > > > Key: CASSANDRA-14803 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14803 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Major > Fix For: 3.0.x, 3.11.x > > > When we're reading 2.1 sstables in reverse, we skip the first row of an index > block if it's split across index boundaries. The entire row will be read at > the end of the next block. In some cases though, the only thing in this index > block is the partial row, so we return an empty iterator. The empty iterator > is then interpreted as the end of the row further down the call stack, so we > return early without reading the rest of the data. This only affects 3.x > during upgrades from 2.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14798) Improve wording around partitioner selection
[ https://issues.apache.org/jira/browse/CASSANDRA-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637484#comment-16637484 ] Aaron Ploetz commented on CASSANDRA-14798: -- [~djoshi3] I had some trouble with the patch file. It looks like it's picked up a couple of other changes (not mine). I attached the file just to be complete. Would you mind taking a look through it to make sure it's ok? My changes start with this line: {{ @@ -182,19 +182,19}} > Improve wording around partitioner selection > > > Key: CASSANDRA-14798 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14798 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Aaron Ploetz >Assignee: Aaron Ploetz >Priority: Trivial > Fix For: 4.0 > > Attachments: 14798-trunk.txt > > > Given some recent community interactions on Stack Overflow, Nate McCall asked > me to provide some stronger wording on partitioner selection, specifically to > further discourage people from using the other partitioners (namely, the > ByteOrderedPartitioner). > Right now, this is the language that I'm leaning toward: > {{# The partitioner is responsible for distributing groups of rows (by}} > {{# partition key) across nodes in the cluster. The partitioner can NOT be}} > {{# changed without reloading all data. If you are upgrading, you should set > this}} > {{# to the same partitioner that you are currently using.}} > {{#}} > {{# The default partitioner is the Murmur3Partitioner. Older partitioners}} > {{# such as the RandomPartitioner, ByteOrderedPartitioner, and}} > {{# OrderPreservingPartitioner have been included for backward compatibility > only.}} > {{# For new clusters, you should NOT change this value.}} > {{#}} > {{partitioner: org.apache.cassandra.dht.Murmur3Partitioner }} > I'm open to suggested improvements. 
[jira] [Comment Edited] (CASSANDRA-14798) Improve wording around partitioner selection
[ https://issues.apache.org/jira/browse/CASSANDRA-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637484#comment-16637484 ] Aaron Ploetz edited comment on CASSANDRA-14798 at 10/3/18 8:35 PM: --- [~djoshi3] I had some trouble with the patch file. It looks like it's picked up a couple of other changes (not mine). I attached the file just to be complete. Would you mind taking a look through it to make sure it's ok? My changes start with this line: {{}}{{ @@ -182,19 +182,19}} was (Author: aploetz): [~djoshi3] I had some trouble with the patch file. It' looks like it's picked up a couple of other changes (not mine). I attached the file just to be complete. Would you mind taking a look through it to make sure it's ok? My changes start with this line: {{}}{{ @@ -182,19 +182,19}} > Improve wording around partitioner selection > > > Key: CASSANDRA-14798 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14798 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Aaron Ploetz >Assignee: Aaron Ploetz >Priority: Trivial > Fix For: 4.0 > > Attachments: 14798-trunk.txt > > > Given some recent community interactions on Stack Overflow, Nate McCall asked > me provide some stronger wording on partitioner selection. Specifically, in > further discouraging people from using the other partitioners (namely, the > ByteOrderedPartitioner). > Right now, this is the language that I'm leaning toward: > {{# The partitioner is responsible for distributing groups of rows (by}} > {{# partition key) across nodes in the cluster. The partitioner can NOT be}} > {{# changed without reloading all data. If you are upgrading, you should set > this}} > {{# to the same partitioner that you are currently using.}} > {{#}} > {{# The default partitioner is the Murmur3Partitioner. 
Older partitioners}} > {{# such as the RandomPartitioner, ByteOrderedPartitioner, and}} > {{# OrderPreservingPartitioner have been included for backward compatibility > only.}} > {{# For new clusters, you should NOT change this value.}} > {{#}} > {{partitioner: org.apache.cassandra.dht.Murmur3Partitioner }} > I'm open to suggested improvements. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14798) Improve wording around partitioner selection
[ https://issues.apache.org/jira/browse/CASSANDRA-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637484#comment-16637484 ] Aaron Ploetz edited comment on CASSANDRA-14798 at 10/3/18 8:35 PM: --- [~djoshi3] I had some trouble with the patch file. It looks like it's picked up a couple of other changes (not mine). I attached the file just to be complete. Would you mind taking a look through it to make sure it's ok? My changes start with this line: {{ @@ -182,19 +182,19}} was (Author: aploetz): [~djoshi3] I had some trouble with the patch file. It looks like it's picked up a couple of other changes (not mine). I attached the file just to be complete. Would you mind taking a look through it to make sure it's ok? My changes start with this line: {{}}{{ @@ -182,19 +182,19}} > Improve wording around partitioner selection > > > Key: CASSANDRA-14798 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14798 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Aaron Ploetz >Assignee: Aaron Ploetz >Priority: Trivial > Fix For: 4.0 > > Attachments: 14798-trunk.txt > > > Given some recent community interactions on Stack Overflow, Nate McCall asked > me provide some stronger wording on partitioner selection. Specifically, in > further discouraging people from using the other partitioners (namely, the > ByteOrderedPartitioner). > Right now, this is the language that I'm leaning toward: > {{# The partitioner is responsible for distributing groups of rows (by}} > {{# partition key) across nodes in the cluster. The partitioner can NOT be}} > {{# changed without reloading all data. If you are upgrading, you should set > this}} > {{# to the same partitioner that you are currently using.}} > {{#}} > {{# The default partitioner is the Murmur3Partitioner. 
Older partitioners}} > {{# such as the RandomPartitioner, ByteOrderedPartitioner, and}} > {{# OrderPreservingPartitioner have been included for backward compatibility > only.}} > {{# For new clusters, you should NOT change this value.}} > {{#}} > {{partitioner: org.apache.cassandra.dht.Murmur3Partitioner }} > I'm open to suggested improvements. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14798) Improve wording around partitioner selection
[ https://issues.apache.org/jira/browse/CASSANDRA-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Ploetz updated CASSANDRA-14798:
-
Attachment: 14798-trunk.txt

> Improve wording around partitioner selection
>
>
> Key: CASSANDRA-14798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14798
> Project: Cassandra
> Issue Type: Improvement
> Components: Documentation and Website
> Reporter: Aaron Ploetz
> Assignee: Aaron Ploetz
> Priority: Trivial
> Fix For: 4.0
>
> Attachments: 14798-trunk.txt
>
>
> Given some recent community interactions on Stack Overflow, Nate McCall asked
> me to provide some stronger wording on partitioner selection, specifically to
> further discourage people from using the other partitioners (namely, the
> ByteOrderedPartitioner).
> Right now, this is the language that I'm leaning toward:
> {{# The partitioner is responsible for distributing groups of rows (by}}
> {{# partition key) across nodes in the cluster. The partitioner can NOT be}}
> {{# changed without reloading all data. If you are upgrading, you should set this}}
> {{# to the same partitioner that you are currently using.}}
> {{#}}
> {{# The default partitioner is the Murmur3Partitioner. Older partitioners}}
> {{# such as the RandomPartitioner, ByteOrderedPartitioner, and}}
> {{# OrderPreservingPartitioner have been included for backward compatibility only.}}
> {{# For new clusters, you should NOT change this value.}}
> {{#}}
> {{partitioner: org.apache.cassandra.dht.Murmur3Partitioner }}
> I'm open to suggested improvements.
[jira] [Updated] (CASSANDRA-14798) Improve wording around partitioner selection
[ https://issues.apache.org/jira/browse/CASSANDRA-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Ploetz updated CASSANDRA-14798:
-
Attachment: (was: 14798-trunk.txt)

> Improve wording around partitioner selection
>
>
> Key: CASSANDRA-14798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14798
> Project: Cassandra
> Issue Type: Improvement
> Components: Documentation and Website
> Reporter: Aaron Ploetz
> Assignee: Aaron Ploetz
> Priority: Trivial
> Fix For: 4.0
>
>
> Given some recent community interactions on Stack Overflow, Nate McCall asked
> me to provide some stronger wording on partitioner selection, specifically to
> further discourage people from using the other partitioners (namely, the
> ByteOrderedPartitioner).
> Right now, this is the language that I'm leaning toward:
> {{# The partitioner is responsible for distributing groups of rows (by}}
> {{# partition key) across nodes in the cluster. The partitioner can NOT be}}
> {{# changed without reloading all data. If you are upgrading, you should set this}}
> {{# to the same partitioner that you are currently using.}}
> {{#}}
> {{# The default partitioner is the Murmur3Partitioner. Older partitioners}}
> {{# such as the RandomPartitioner, ByteOrderedPartitioner, and}}
> {{# OrderPreservingPartitioner have been included for backward compatibility only.}}
> {{# For new clusters, you should NOT change this value.}}
> {{#}}
> {{partitioner: org.apache.cassandra.dht.Murmur3Partitioner }}
> I'm open to suggested improvements.
[jira] [Commented] (CASSANDRA-14798) Improve wording around partitioner selection
[ https://issues.apache.org/jira/browse/CASSANDRA-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637373#comment-16637373 ]

Dinesh Joshi commented on CASSANDRA-14798:
--

This sounds good. It's concise and crisp.

> Improve wording around partitioner selection
>
>
> Key: CASSANDRA-14798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14798
> Project: Cassandra
> Issue Type: Improvement
> Components: Documentation and Website
> Reporter: Aaron Ploetz
> Assignee: Aaron Ploetz
> Priority: Trivial
> Fix For: 4.0
>
>
> Given some recent community interactions on Stack Overflow, Nate McCall asked
> me to provide some stronger wording on partitioner selection, specifically to
> further discourage people from using the other partitioners (namely, the
> ByteOrderedPartitioner).
> Right now, this is the language that I'm leaning toward:
> {{# The partitioner is responsible for distributing groups of rows (by}}
> {{# partition key) across nodes in the cluster. The partitioner can NOT be}}
> {{# changed without reloading all data. If you are upgrading, you should set this}}
> {{# to the same partitioner that you are currently using.}}
> {{#}}
> {{# The default partitioner is the Murmur3Partitioner. Older partitioners}}
> {{# such as the RandomPartitioner, ByteOrderedPartitioner, and}}
> {{# OrderPreservingPartitioner have been included for backward compatibility only.}}
> {{# For new clusters, you should NOT change this value.}}
> {{#}}
> {{partitioner: org.apache.cassandra.dht.Murmur3Partitioner }}
> I'm open to suggested improvements.
[jira] [Created] (CASSANDRA-14803) Rows that cross index block boundaries can cause incomplete reverse reads in some cases.
Blake Eggleston created CASSANDRA-14803:
---

Summary: Rows that cross index block boundaries can cause incomplete reverse reads in some cases.
Key: CASSANDRA-14803
URL: https://issues.apache.org/jira/browse/CASSANDRA-14803
Project: Cassandra
Issue Type: Bug
Reporter: Blake Eggleston
Assignee: Blake Eggleston
Fix For: 3.0.x, 3.11.x

When we're reading 2.1 sstables in reverse, we skip the first row of an index block if it's split across index boundaries. The entire row will be read at the end of the next block. In some cases though, the only thing in this index block is the partial row, so we return an empty iterator. The empty iterator is then interpreted as the end of the row further down the call stack, so we return early without reading the rest of the data.

This only affects 3.x during upgrades from 2.1.
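The failure mode described in this report can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and method names are hypothetical, index blocks are modelled as plain lists of row names rather than Cassandra's actual sstable iterators, and the "fixed" variant shows one plausible way to avoid the early return, not necessarily what the eventual patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: none of these names exist in Cassandra.
final class ReverseBlockReadSketch
{
    // Each inner list holds the rows an index block yields once a leading
    // partial row has been skipped. An empty list models a block that
    // contained nothing but the partial row.
    static List<List<String>> exampleBlocks()
    {
        return Arrays.asList(
            Arrays.asList("a", "b", "c"),     // block 0: row c starts here and is read whole
            Collections.<String>emptyList(),  // block 1: holds only the middle of row c
            Arrays.asList("d", "e"));         // block 2: tail of c skipped, then d and e
    }

    // Reads blocks from last to first, emitting each block's rows in reverse.
    static List<String> readReversed(List<List<String>> blocks, boolean stopOnEmptyBlock)
    {
        List<String> out = new ArrayList<>();
        for (int i = blocks.size() - 1; i >= 0; i--)
        {
            List<String> rows = blocks.get(i);
            if (rows.isEmpty())
            {
                if (stopOnEmptyBlock)
                    break;    // buggy: an empty block is mistaken for the end of the data
                continue;     // assumed fix: keep going, earlier blocks may still hold rows
            }
            for (int j = rows.size() - 1; j >= 0; j--)
                out.add(rows.get(j));
        }
        return out;
    }

    public static void main(String[] args)
    {
        System.out.println(readReversed(exampleBlocks(), true));   // [e, d]
        System.out.println(readReversed(exampleBlocks(), false));  // [e, d, c, b, a]
    }
}
```

In this model the buggy variant silently drops rows a, b and c, which matches the "return early without reading the rest of the data" symptom.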
[jira] [Assigned] (CASSANDRA-12297) Privacy Violation - Heap Inspection
[ https://issues.apache.org/jira/browse/CASSANDRA-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Brown reassigned CASSANDRA-12297:
---

Assignee: (was: Jason Brown)

> Privacy Violation - Heap Inspection
> ---
>
> Key: CASSANDRA-12297
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12297
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Eduardo Aguinaga
> Priority: Major
>
> Overview:
> In May through June of 2016 a static analysis was performed on version 3.0.5
> of the Cassandra source code. The analysis included
> an automated analysis using HP Fortify v4.21 SCA and a manual analysis
> utilizing SciTools Understand v4. The results of that
> analysis include the issue below.
> Issue:
> In the file PasswordAuthenticator.java on lines 129, 164 and 222 a string
> object is used to store sensitive data. String objects are immutable and
> should not be used to store sensitive data. Sensitive data should be stored
> in char or byte arrays and the contents of those arrays should be cleared
> ASAP. Operations performed on string objects will require that the original
> object be copied and the operation be applied in the new copy of the string
> object. This results in the likelihood that multiple copies of sensitive data
> will be present in the heap until garbage collection takes place.
> The snippet below shows the issue on line 129:
> PasswordAuthenticator.java, lines 123-134:
> {code:java}
> 123 public AuthenticatedUser legacyAuthenticate(Map<String, String> credentials) throws AuthenticationException
> 124 {
> 125     String username = credentials.get(USERNAME_KEY);
> 126     if (username == null)
> 127         throw new AuthenticationException(String.format("Required key '%s' is missing", USERNAME_KEY));
> 128
> 129     String password = credentials.get(PASSWORD_KEY);
> 130     if (password == null)
> 131         throw new AuthenticationException(String.format("Required key '%s' is missing", PASSWORD_KEY));
> 132
> 133     return authenticate(username, password);
> 134 }
> {code}
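The recommendation in this report (hold secrets in a char array and clear it as soon as possible) could look roughly like the sketch below. It is not taken from PasswordAuthenticator: the class, the methods and the hard-coded expected value are hypothetical stand-ins, and a real change would also need the credentials to arrive as something other than a String in the first place.

```java
import java.util.Arrays;

// Hypothetical illustration; not Cassandra's PasswordAuthenticator.
final class CredentialSketch
{
    // Stand-in for a real credential check. The expected secret is held in a
    // char[] so that it can be overwritten once the comparison is done.
    static boolean verify(String username, char[] password)
    {
        char[] expected = { 's', '3', 'c', 'r', 'e', 't' };
        try
        {
            return "cassandra".equals(username) && Arrays.equals(expected, password);
        }
        finally
        {
            Arrays.fill(expected, '\0');
        }
    }

    // Checks the credentials and then wipes the caller's array, so the
    // secret does not linger on the heap until garbage collection.
    static boolean authenticate(String username, char[] password)
    {
        try
        {
            return verify(username, password);
        }
        finally
        {
            Arrays.fill(password, '\0');
        }
    }

    public static void main(String[] args)
    {
        char[] password = { 's', '3', 'c', 'r', 'e', 't' };
        System.out.println(authenticate("cassandra", password)); // true
        System.out.println(password[0] == '\0');                 // true, the array was wiped
    }
}
```

A String cannot be wiped this way because its contents are immutable, which is the core of the finding above.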
[jira] [Assigned] (CASSANDRA-8060) Geography-aware, distributed replication
[ https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Brown reassigned CASSANDRA-8060:
--

Assignee: (was: Jason Brown)

> Geography-aware, distributed replication
>
>
> Key: CASSANDRA-8060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
> Project: Cassandra
> Issue Type: Wish
> Reporter: Donald Smith
> Priority: Major
>
> We have three data centers in the US (CA in California, TX in Texas, and NJ
> in New Jersey), two in Europe (UK and DE), and two in Asia (JP and CH1). We do all
> our writing to CA. That represents a bottleneck, since the coordinator nodes
> in CA are responsible for all the replication to every data center.
> It would be far better if we had the option of setting things up so that CA
> replicated to TX, which replicated to NJ. NJ is closer to UK, so NJ should be
> responsible for replicating to UK, which should replicate to DE, and so on.
> This could be controlled by the topology file.
> The replication could be organized in a tree-like structure instead of a
> daisy-chain.
> It would require architectural changes and would have major ramifications for
> latency but might be appropriate for some scenarios.
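To make the tree-shaped proposal concrete, here is a small sketch of how such a forwarding topology might be described and what it would cost in hops. Everything in it is an assumption for illustration: Cassandra has no such feature, the class and method names are invented, and the Asian branch (CA to JP to CH1) is a guess, since the request only spells out the CA, TX, NJ, UK, DE chain.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the requested feature; not an existing Cassandra API.
final class ReplicationTreeSketch
{
    // Maps each data center to the data centers it would forward writes to.
    static Map<String, List<String>> exampleTree()
    {
        Map<String, List<String>> tree = new LinkedHashMap<>();
        tree.put("CA", Arrays.asList("TX", "JP"));         // JP branch is an assumption
        tree.put("TX", Collections.singletonList("NJ"));
        tree.put("NJ", Collections.singletonList("UK"));
        tree.put("UK", Collections.singletonList("DE"));
        tree.put("JP", Collections.singletonList("CH1"));  // also an assumption
        return tree;
    }

    // Breadth-first walk that records how many forwarding hops each data
    // center is away from the data center that accepted the write.
    static Map<String, Integer> hopsFrom(String root, Map<String, List<String>> tree)
    {
        Map<String, Integer> hops = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        hops.put(root, 0);
        queue.add(root);
        while (!queue.isEmpty())
        {
            String dc = queue.poll();
            for (String child : tree.getOrDefault(dc, Collections.<String>emptyList()))
            {
                hops.put(child, hops.get(dc) + 1);
                queue.add(child);
            }
        }
        return hops;
    }

    public static void main(String[] args)
    {
        Map<String, List<String>> tree = exampleTree();
        System.out.println(tree.get("CA").size()); // 2 direct forwards from CA instead of 6
        System.out.println(hopsFrom("CA", tree));  // {CA=0, TX=1, JP=1, NJ=2, CH1=2, UK=3, DE=4}
    }
}
```

The hop counts make the latency trade-off mentioned in the request visible: CA sends to two data centers rather than six, but DE only receives the write after four forwarding steps.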
[jira] [Assigned] (CASSANDRA-12298) Privacy Violation - Heap Inspection
[ https://issues.apache.org/jira/browse/CASSANDRA-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Brown reassigned CASSANDRA-12298:
---

Assignee: (was: Jason Brown)

> Privacy Violation - Heap Inspection
> ---
>
> Key: CASSANDRA-12298
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12298
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Eduardo Aguinaga
> Priority: Major
>
> Overview:
> In May through June of 2016 a static analysis was performed on version 3.0.5
> of the Cassandra source code. The analysis included
> an automated analysis using HP Fortify v4.21 SCA and a manual analysis
> utilizing SciTools Understand v4. The results of that
> analysis include the issue below.
> Issue:
> In the file RoleOptions.java on line 89 a string object is used to store
> sensitive data. String objects are immutable and should not be used to store
> sensitive data. Sensitive data should be stored in char or byte arrays and
> the contents of those arrays should be cleared ASAP. Operations performed on
> string objects will require that the original object be copied and the
> operation be applied in the new copy of the string object. This results in
> the likelihood that multiple copies of sensitive data will be present in the
> heap until garbage collection takes place.
> The snippet below shows the issue on line 89:
> RoleOptions.java, lines 87-90:
> {code:java}
> 87 public Optional<String> getPassword()
> 88 {
> 89     return Optional.fromNullable((String)options.get(IRoleManager.Option.PASSWORD));
> 90 }
> {code}
[jira] [Commented] (CASSANDRA-14228) Add expiration date overflow notice and recovery instructions to doc
[ https://issues.apache.org/jira/browse/CASSANDRA-14228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637082#comment-16637082 ]

Andrew Baker commented on CASSANDRA-14228:
--

I will take a stab at this; can someone assign it to me? It looks like cql3/CQL.textile needs an update. The recovery information feels like it should be a footnote or a link to a different file. Is there a similar documentation example where we've done that?

> Add expiration date overflow notice and recovery instructions to doc
>
>
> Key: CASSANDRA-14228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14228
> Project: Cassandra
> Issue Type: Task
> Components: Documentation and Website
> Reporter: Paulo Motta
> Priority: Minor
> Labels: lhf
>
> On CASSANDRA-14092 we added a new
> [CASSANDRA-14092.txt|https://github.com/apache/cassandra/blob/trunk/CASSANDRA-14092.txt]
> file with the maximum TTL expiration notice and recovery instructions for
> affected users.
> We should probably also add the contents of this file to the documentation
> with some basic formatting.
cassandra git commit: ninja fix bad merge
Repository: cassandra
Updated Branches:
  refs/heads/trunk daa3619ae -> 47a10649d

ninja fix bad merge

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/47a10649
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/47a10649
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/47a10649

Branch: refs/heads/trunk
Commit: 47a10649dadbdea6960836a7c0fe6d271a476204
Parents: daa3619
Author: Benedict Elliott Smith
Authored: Wed Oct 3 15:46:18 2018 +0100
Committer: Benedict Elliott Smith
Committed: Wed Oct 3 15:46:18 2018 +0100

--
 src/java/org/apache/cassandra/locator/ReplicaLayout.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/47a10649/src/java/org/apache/cassandra/locator/ReplicaLayout.java
--
diff --git a/src/java/org/apache/cassandra/locator/ReplicaLayout.java b/src/java/org/apache/cassandra/locator/ReplicaLayout.java
index 54b82f9..d44fdd7 100644
--- a/src/java/org/apache/cassandra/locator/ReplicaLayout.java
+++ b/src/java/org/apache/cassandra/locator/ReplicaLayout.java
@@ -278,7 +278,7 @@ public abstract class ReplicaLayout>
     @VisibleForTesting
     static EndpointsForToken resolveWriteConflictsInNatural(EndpointsForToken natural, EndpointsForToken pending)
     {
-        EndpointsForToken.Mutable resolved = natural.newMutable(natural.size());
+        EndpointsForToken.Builder resolved = natural.newBuilder(natural.size());
         for (Replica replica : natural)
         {
             // always prefer the full natural replica, if there is a conflict
@@ -297,7 +297,7 @@ public abstract class ReplicaLayout>
             }
             resolved.add(replica);
         }
-        return resolved.asSnapshot();
+        return resolved.build();
     }
 
     /**
[jira] [Commented] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
[ https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637045#comment-16637045 ] Marcus Eriksson commented on CASSANDRA-14610: - [~jay.zhuang] or [~jasobrown] do either of you have time to review? > Flaky dtest: > nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters > --- > > Key: CASSANDRA-14610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14610 > Project: Cassandra > Issue Type: Task > Components: Testing, Tools >Reporter: Jason Brown >Assignee: Marcus Eriksson >Priority: Minor > Labels: dtest > > @jay zhuang observed > nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters > being flaky in Apache Jenkins. I ran locally and got a different flaky > behavior: > {noformat} > out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster') > assert 0 == len(err), err > > assert out_node1_dc1 == out_node1_dc3 > E AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster > Infor...1=3, dc3=1}\n' > E Cluster Information: > E Name: test > E Snitch: org.apache.cassandra.locator.PropertyFileSnitch > E DynamicEndPointSnitch: enabled > E Partitioner: org.apache.cassandra.dht.Murmur3Partitioner > E Schema versions: > E fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, > 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... > E > E ...Full output truncated (26 lines hidden), use '-vv' to show > 09:58:14,357 ccm DEBUG Log-watching thread exiting. > ===Flaky Test Report=== > test_describecluster_more_information_three_datacenters failed and was not > selected for rerun. 
> > assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n' > Cluster Information: > Name: test > Snitch: org.apache.cassandra.locator.PropertyFileSnitch > DynamicEndPointSnitch: enabled > Partitioner: org.apache.cassandra.dht.Murmur3Partitioner > Schema versions: > fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, > 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... > > ...Full output truncated (26 lines hidden), use '-vv' to show > [ /opt/orig/1/opt/dev/cassandra-dtest/nodetool_test.py:373>] > ===End Flaky Test Report=== > {noformat} > As this test is for a patch that was introduced for 4.0, this dtest (should) > only be failing on trunk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
[ https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-14610: Status: Patch Available (was: Open) the test depends on hard coded cassandra version and the order in which keyspaces/datacenters are listed, this should fix those things; https://github.com/krummas/cassandra-dtest/commits/marcuse/14610 https://circleci.com/workflow-run/6c0f0d61-3b5c-49dd-bdeb-22d24dc60b15 it also reduces the number of nodes started to 4 since starting 6 nodes is likely to fail in circle ci > Flaky dtest: > nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters > --- > > Key: CASSANDRA-14610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14610 > Project: Cassandra > Issue Type: Task > Components: Testing, Tools >Reporter: Jason Brown >Assignee: Marcus Eriksson >Priority: Minor > Labels: dtest > > @jay zhuang observed > nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters > being flaky in Apache Jenkins. I ran locally and got a different flaky > behavior: > {noformat} > out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster') > assert 0 == len(err), err > > assert out_node1_dc1 == out_node1_dc3 > E AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster > Infor...1=3, dc3=1}\n' > E Cluster Information: > E Name: test > E Snitch: org.apache.cassandra.locator.PropertyFileSnitch > E DynamicEndPointSnitch: enabled > E Partitioner: org.apache.cassandra.dht.Murmur3Partitioner > E Schema versions: > E fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, > 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... > E > E ...Full output truncated (26 lines hidden), use '-vv' to show > 09:58:14,357 ccm DEBUG Log-watching thread exiting. > ===Flaky Test Report=== > test_describecluster_more_information_three_datacenters failed and was not > selected for rerun. 
> > assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n' > Cluster Information: > Name: test > Snitch: org.apache.cassandra.locator.PropertyFileSnitch > DynamicEndPointSnitch: enabled > Partitioner: org.apache.cassandra.dht.Murmur3Partitioner > Schema versions: > fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, > 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... > > ...Full output truncated (26 lines hidden), use '-vv' to show > [ /opt/orig/1/opt/dev/cassandra-dtest/nodetool_test.py:373>] > ===End Flaky Test Report=== > {noformat} > As this test is for a patch that was introduced for 4.0, this dtest (should) > only be failing on trunk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14759) Transient->Full movements mishandle consistency level upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-14759:
-
Resolution: Fixed
Status: Resolved (was: Patch Available)

I've committed as [daa3619ae63bb8b06d532890e51d288c189c787c|https://github.com/apache/cassandra/commit/daa3619ae63bb8b06d532890e51d288c189c787c].

dtests are now very flaky on CircleCI, but all of the failures in the latest run have shown up in trunk runs for me on CircleCI.

> Transient->Full movements mishandle consistency level upgrade
> -
>
> Key: CASSANDRA-14759
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14759
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Priority: Major
> Labels: Availability, correctness, transient-replication
> Fix For: 4.0
>
>
> While we need to treat a transitioning node as ‘full’ for writes, so that it can
> safely begin serving full data requests once it has finished, we cannot
> maintain it in the ‘pending’ collection, or else we will also increase our
> consistency requirements by a node that doesn’t exist.
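The "node that doesn't exist" remark can be made concrete with a little arithmetic. The sketch below assumes that a write must be acknowledged by the consistency level's usual count plus one extra acknowledgement per pending replica; the class and method names are hypothetical and the numbers are an example, not output from Cassandra.

```java
// Worked example only; these names are not part of Cassandra.
final class PendingBlockForSketch
{
    // Number of replicas that make up a quorum for a given replication factor.
    static int quorum(int replicationFactor)
    {
        return replicationFactor / 2 + 1;
    }

    // Assumed rule: each pending replica adds one required acknowledgement.
    static int blockForWrite(int replicationFactor, int pendingCount)
    {
        return quorum(replicationFactor) + pendingCount;
    }

    public static void main(String[] args)
    {
        int rf = 3; // three distinct endpoints, one of them moving from transient to full

        // The moving endpoint is left in 'pending' even though it is already a
        // natural replica: the write waits for 3 acks from only 3 endpoints.
        System.out.println(blockForWrite(rf, 1)); // 3

        // The conflict is resolved: the endpoint is treated as full among the
        // natural replicas and dropped from 'pending'.
        System.out.println(blockForWrite(rf, 0)); // 2
    }
}
```

Under that assumption, leaving the transitioning endpoint in the pending set would make a QUORUM write require every replica to respond, so a single unavailable node would fail the write.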
cassandra git commit: Transient->Full movements mishandle consistency level upgrade
Repository: cassandra Updated Branches: refs/heads/trunk e645b9172 -> daa3619ae Transient->Full movements mishandle consistency level upgrade patch by Benedict; reviewed by Alex Petrov and Ariel Weisberg for CASSANDRA-14759 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/daa3619a Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/daa3619a Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/daa3619a Branch: refs/heads/trunk Commit: daa3619ae63bb8b06d532890e51d288c189c787c Parents: e645b91 Author: Benedict Elliott Smith Authored: Sun Sep 9 23:53:07 2018 +0100 Committer: Benedict Elliott Smith Committed: Wed Oct 3 14:48:15 2018 +0100 -- CHANGES.txt | 1 + .../org/apache/cassandra/locator/Endpoints.java | 6 -- .../org/apache/cassandra/locator/Replica.java | 2 +- .../apache/cassandra/locator/ReplicaLayout.java | 28 +++- .../cassandra/locator/ReplicaLayoutTest.java| 73 5 files changed, 100 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/daa3619a/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 25c2728..e1fbb90 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * Transient->Full range movements mishandle consistency level upgrade (CASSANDRA-14759) * ReplicaCollection follow-up (CASSANDRA-14726) * Transient node receives full data requests (CASSANDRA-14762) * Enable snapshot artifacts publish (CASSANDRA-12704) http://git-wip-us.apache.org/repos/asf/cassandra/blob/daa3619a/src/java/org/apache/cassandra/locator/Endpoints.java -- diff --git a/src/java/org/apache/cassandra/locator/Endpoints.java b/src/java/org/apache/cassandra/locator/Endpoints.java index ee42e36..a2bad6c 100644 --- a/src/java/org/apache/cassandra/locator/Endpoints.java +++ b/src/java/org/apache/cassandra/locator/Endpoints.java @@ -60,12 +60,6 @@ public abstract class Endpoints> extends AbstractReplicaC return map; } -public boolean 
contains(InetAddressAndPort endpoint, boolean isFull) -{ -Replica replica = byEndpoint().get(endpoint); -return replica != null && replica.isFull() == isFull; -} - @Override public boolean contains(Replica replica) { http://git-wip-us.apache.org/repos/asf/cassandra/blob/daa3619a/src/java/org/apache/cassandra/locator/Replica.java -- diff --git a/src/java/org/apache/cassandra/locator/Replica.java b/src/java/org/apache/cassandra/locator/Replica.java index c884f13..4c5f7c6 100644 --- a/src/java/org/apache/cassandra/locator/Replica.java +++ b/src/java/org/apache/cassandra/locator/Replica.java @@ -110,7 +110,7 @@ public final class Replica implements Comparable return range; } -public boolean isFull() +public final boolean isFull() { return full; } http://git-wip-us.apache.org/repos/asf/cassandra/blob/daa3619a/src/java/org/apache/cassandra/locator/ReplicaLayout.java -- diff --git a/src/java/org/apache/cassandra/locator/ReplicaLayout.java b/src/java/org/apache/cassandra/locator/ReplicaLayout.java index cba4f68..54b82f9 100644 --- a/src/java/org/apache/cassandra/locator/ReplicaLayout.java +++ b/src/java/org/apache/cassandra/locator/ReplicaLayout.java @@ -18,6 +18,7 @@ package org.apache.cassandra.locator; +import com.google.common.annotations.VisibleForTesting; import org.apache.cassandra.config.DatabaseDescriptor; import org.apache.cassandra.db.Keyspace; import org.apache.cassandra.db.PartitionPosition; @@ -274,9 +275,29 @@ public abstract class ReplicaLayout> * See {@link ReplicaLayout#haveWriteConflicts} * @return a 'natural' replica collection, that has had its conflicts with pending repaired */ -private static > E resolveWriteConflictsInNatural(E natural, E pending) +@VisibleForTesting +static EndpointsForToken resolveWriteConflictsInNatural(EndpointsForToken natural, EndpointsForToken pending) { -return natural.filter(r -> !r.isTransient() || !pending.contains(r.endpoint(), true)); +EndpointsForToken.Mutable resolved = natural.newMutable(natural.size()); +for 
(Replica replica : natural) +{ +// always prefer the full natural replica, if there is a conflict +if (replica.isTransient()) +{ +Replica conflict = pending.byEndpoint().get(replica.endpoint()); +if (conflict != null) +{ +// it
[jira] [Updated] (CASSANDRA-14726) ReplicaCollection follow-up
[ https://issues.apache.org/jira/browse/CASSANDRA-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-14726:
-
Resolution: Fixed
Fix Version/s: 4.0
Status: Resolved (was: Patch Available)

Thanks for the review. I've committed as [e645b9172c5d50fc2af407de724e46121edfe109|https://github.com/apache/cassandra/commit/e645b9172c5d50fc2af407de724e46121edfe109].

dtests are now very flaky on CircleCI, but all of the failures in the latest run have shown up in trunk runs for me on CircleCI.

As to tests, it's interesting that you find hard-coded tests to be easier to read. I personally find the opposite. However, I'm happy to file a follow-up JIRA to further expand the testing; I'll follow up with that in the near future.

> ReplicaCollection follow-up
> ---
>
> Key: CASSANDRA-14726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14726
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Priority: Major
> Fix For: 4.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We introduced {{ReplicaCollection}} as part of CASSANDRA-14404, but while it
> improves readability, we could do more to ensure it minimises extra garbage,
> and does not otherwise unnecessarily waste cycles.
cassandra-builds git commit: Updated jenkins slave list and contact details
Repository: cassandra-builds
Updated Branches:
  refs/heads/master 6fd11d6e2 -> 4f49929aa

Updated jenkins slave list and contact details

Project: http://git-wip-us.apache.org/repos/asf/cassandra-builds/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra-builds/commit/4f49929a
Tree: http://git-wip-us.apache.org/repos/asf/cassandra-builds/tree/4f49929a
Diff: http://git-wip-us.apache.org/repos/asf/cassandra-builds/diff/4f49929a

Branch: refs/heads/master
Commit: 4f49929aaa7285b687c272708ac8f3fd898d2a3a
Parents: 6fd11d6
Author: kurt
Authored: Thu Sep 27 10:41:08 2018 +1000
Committer: Michael Shuler
Committed: Wed Oct 3 08:40:01 2018 -0500

--
 ASF-slaves.txt | 12
 1 file changed, 12 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/cassandra-builds/blob/4f49929a/ASF-slaves.txt
--
diff --git a/ASF-slaves.txt b/ASF-slaves.txt
index 8a24422..28aa660 100644
--- a/ASF-slaves.txt
+++ b/ASF-slaves.txt
@@ -38,6 +38,15 @@ cassandra4 - 163.172.83.163 - Ubuntu 16.04 LTS amd64, 16G RAM, donated by Datast
 cassandra5 - 163.172.83.175 - Ubuntu 16.04 LTS amd64, 16G RAM, donated by Datastax
 cassandra6 - 163.172.71.128 - Ubuntu 16.04 LTS amd64, 32G RAM, donated by Datastax
 cassandra7 - 163.172.71.129 - Ubuntu 16.04 LTS amd64, 32G RAM, donated by Datastax
+cassandra8 - 35.160.175.252 - Ubuntu 16.04 LTS amd64, 16G RAM, 4 cores, donated by instaclustr
+cassandra9 - 34.210.158.175 - Ubuntu 16.04 LTS amd64, 16G RAM, 4 cores, donated by instaclustr
+cassandra10 - 35.165.114.131 - Ubuntu 16.04 LTS amd64, 16G RAM, 4 cores, donated by instaclustr
+cassandra11 - 35.164.80.43 - Ubuntu 16.04 LTS amd64, 16G RAM, 4 cores, donated by instaclustr
+cassandra12 - 52.10.125.176 - Ubuntu 16.04 LTS amd64, 16G RAM, 4 cores, donated by instaclustr
+cassandra13 - 52.32.194.237 - Ubuntu 16.04 LTS amd64, 16G RAM, 4 cores, donated by instaclustr
+cassandra14 - 52.38.171.39 - Ubuntu 16.04 LTS amd64, 16G RAM, 4 cores, donated by instaclustr
+cassandra15 - 52.89.160.64 - Ubuntu 16.04 LTS amd64, 16G RAM, 4 cores, donated by instaclustr
+cassandra16 - 54.148.184.162 - Ubuntu 16.04 LTS amd64, 16G RAM, 4 cores, donated by instaclustr
 
 
 
@@ -46,3 +55,6 @@ Contacts for system donators, when console hands may be needed by INFRA:
   Datastax: Michael Shuler
             alternative group list: cassandra...@datastax.com
 
+  Instaclustr: Kurt Greaves
+               alternative group list: ad...@instaclustr.com
+
[1/2] cassandra git commit: ReplicaCollection follow-up
Repository: cassandra Updated Branches: refs/heads/trunk 467068d1e -> e645b9172 http://git-wip-us.apache.org/repos/asf/cassandra/blob/e645b917/src/java/org/apache/cassandra/locator/RangesByEndpoint.java -- diff --git a/src/java/org/apache/cassandra/locator/RangesByEndpoint.java b/src/java/org/apache/cassandra/locator/RangesByEndpoint.java index 698b133..1a71141 100644 --- a/src/java/org/apache/cassandra/locator/RangesByEndpoint.java +++ b/src/java/org/apache/cassandra/locator/RangesByEndpoint.java @@ -19,9 +19,11 @@ package org.apache.cassandra.locator; import com.google.common.base.Preconditions; +import com.google.common.collect.ImmutableMap; import com.google.common.collect.Maps; import java.util.Collections; +import java.util.HashMap; import java.util.Map; public class RangesByEndpoint extends ReplicaMultimap @@ -37,17 +39,19 @@ public class RangesByEndpoint extends ReplicaMultimap +public static class Builder extends ReplicaMultimap.Builder { @Override -protected RangesAtEndpoint.Mutable newMutable(InetAddressAndPort endpoint) +protected RangesAtEndpoint.Builder newBuilder(InetAddressAndPort endpoint) { -return new RangesAtEndpoint.Mutable(endpoint); +return new RangesAtEndpoint.Builder(endpoint); } -public RangesByEndpoint asImmutableView() +public RangesByEndpoint build() { -return new RangesByEndpoint(Collections.unmodifiableMap(Maps.transformValues(map, RangesAtEndpoint.Mutable::asImmutableView))); +return new RangesByEndpoint( +ImmutableMap.copyOf( +Maps.transformValues(this.map, RangesAtEndpoint.Builder::build))); } } http://git-wip-us.apache.org/repos/asf/cassandra/blob/e645b917/src/java/org/apache/cassandra/locator/ReplicaCollection.java -- diff --git a/src/java/org/apache/cassandra/locator/ReplicaCollection.java b/src/java/org/apache/cassandra/locator/ReplicaCollection.java index d1006dc..d870316 100644 --- a/src/java/org/apache/cassandra/locator/ReplicaCollection.java +++ b/src/java/org/apache/cassandra/locator/ReplicaCollection.java @@ -18,8 +18,6 
@@ package org.apache.cassandra.locator; -import org.apache.cassandra.locator.ReplicaCollection.Mutable.Conflict; - import java.util.Comparator; import java.util.Iterator; import java.util.Set; @@ -69,28 +67,39 @@ public interface ReplicaCollection> extends Itera /** * @return a *eagerly constructed* copy of this collection containing the Replica that match the provided predicate. * An effort will be made to either return ourself, or a subList, where possible. - * It is guaranteed that no changes to any upstream Mutable will affect the state of the result. + * It is guaranteed that no changes to any upstream Builder will affect the state of the result. */ public abstract C filter(Predicate predicate); /** * @return a *eagerly constructed* copy of this collection containing the Replica that match the provided predicate. * An effort will be made to either return ourself, or a subList, where possible. - * It is guaranteed that no changes to any upstream Mutable will affect the state of the result. + * It is guaranteed that no changes to any upstream Builder will affect the state of the result. * Only the first maxSize items will be returned. */ public abstract C filter(Predicate predicate, int maxSize); /** + * @return a *lazily constructed* Iterable over this collection, containing the Replica that match the provided predicate. + */ +public abstract Iterable filterLazily(Predicate predicate); + +/** + * @return a *lazily constructed* Iterable over this collection, containing the Replica that match the provided predicate. + * Only the first maxSize matching items will be returned. + */ +public abstract Iterable filterLazily(Predicate predicate, int maxSize); + +/** * @return an *eagerly constructed* copy of this collection containing the Replica at positions [start..end); * An effort will be made to either return ourself, or a subList, where possible. - * It is guaranteed that no changes to any upstream Mutable will affect the state of the result. 
+ * It is guaranteed that no changes to any upstream Builder will affect the state of the result. */ public abstract C subList(int start, int end); /** * @return an *eagerly constructed* copy of this collection containing the Replica re-ordered according to this comparator - * It is guaranteed that no changes to any upstream Mutable will affect the state of the result. + * It is guaranteed that no changes to any upstream Builder will affect the state of the resul
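The javadoc in the diff above distinguishes the *eagerly constructed* `filter` from the new *lazily constructed* `filterLazily`. A minimal sketch of that distinction follows, using only the JDK. The class `LazyFilterSketch`, its helper `toList`, and the use of a plain `List` in place of `ReplicaCollection` are assumptions made for illustration; this is not the code from the commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; not the Cassandra ReplicaCollection code.
final class LazyFilterSketch
{
    // Eager: walks the source immediately and returns an independent copy,
    // so later changes to the source cannot affect the result.
    static <T> List<T> filter(List<T> source, Predicate<? super T> predicate, int maxSize)
    {
        List<T> result = new ArrayList<>();
        for (T item : source)
        {
            if (result.size() >= maxSize)
                break;
            if (predicate.test(item))
                result.add(item);
        }
        return result;
    }

    // Lazy: nothing is evaluated until iterator() is called, and each call
    // re-reads the source, so later changes to the source are visible.
    static <T> Iterable<T> filterLazily(List<T> source, Predicate<? super T> predicate, int maxSize)
    {
        return () -> source.stream().filter(predicate).limit(maxSize).iterator();
    }

    // Helper for the demo: drains an Iterable into a List.
    static <T> List<T> toList(Iterable<T> items)
    {
        List<T> out = new ArrayList<>();
        items.forEach(out::add);
        return out;
    }

    public static void main(String[] args)
    {
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        List<Integer> eager = filter(source, n -> n % 2 == 0, 2);
        Iterable<Integer> lazy = filterLazily(source, n -> n % 2 == 0, 2);

        source.add(0, 0); // mutate the source after both calls

        System.out.println(eager);        // [2, 4]
        System.out.println(toList(lazy)); // [0, 2]
    }
}
```

The eager result is a snapshot, which is what the "no changes to any upstream Builder will affect the state of the result" guarantee describes; the lazy variant trades that guarantee for avoiding the copy.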
[2/2] cassandra git commit: ReplicaCollection follow-up
ReplicaCollection follow-up patch by Benedict; reviewed by Alex Petrov and Ariel Weisberg for CASSANDRA-14726 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e645b917 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e645b917 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e645b917 Branch: refs/heads/trunk Commit: e645b9172c5d50fc2af407de724e46121edfe109 Parents: 467068d Author: Benedict Elliott Smith Authored: Fri Sep 7 19:28:16 2018 +0100 Committer: Benedict Elliott Smith Committed: Wed Oct 3 14:38:22 2018 +0100 -- CHANGES.txt | 1 + .../cassandra/db/DiskBoundaryManager.java | 8 +- .../org/apache/cassandra/db/ReadCommand.java| 15 +- .../db/compaction/CompactionManager.java| 28 +- .../db/streaming/CassandraStreamManager.java| 2 +- .../org/apache/cassandra/dht/RangeStreamer.java | 10 +- .../locator/AbstractReplicaCollection.java | 394 --- .../locator/AbstractReplicationStrategy.java| 10 +- .../org/apache/cassandra/locator/Endpoints.java | 47 +-- .../cassandra/locator/EndpointsByRange.java | 16 +- .../cassandra/locator/EndpointsByReplica.java | 18 +- .../cassandra/locator/EndpointsForRange.java| 93 ++--- .../cassandra/locator/EndpointsForToken.java| 87 ++-- .../locator/NetworkTopologyStrategy.java| 10 +- .../locator/OldNetworkTopologyStrategy.java | 4 +- .../cassandra/locator/PendingRangeMaps.java | 38 +- .../cassandra/locator/RangesAtEndpoint.java | 163 +++- .../cassandra/locator/RangesByEndpoint.java | 14 +- .../cassandra/locator/ReplicaCollection.java| 70 ++-- .../apache/cassandra/locator/ReplicaLayout.java | 1 + .../cassandra/locator/ReplicaMultimap.java | 26 +- .../apache/cassandra/locator/ReplicaPlans.java | 13 +- .../cassandra/locator/SimpleStrategy.java | 12 +- .../apache/cassandra/locator/TokenMetadata.java | 8 +- .../cassandra/service/RangeRelocator.java | 4 +- .../cassandra/service/StorageService.java | 12 +- .../service/reads/AbstractReadExecutor.java | 14 
+- .../db/compaction/AntiCompactionTest.java | 4 +- .../dht/RangeFetchMapCalculatorTest.java| 55 ++- .../locator/ReplicaCollectionTest.java | 207 +++--- .../org/apache/cassandra/service/MoveTest.java | 5 +- .../cassandra/service/MoveTransientTest.java| 32 +- .../cassandra/service/StorageServiceTest.java | 4 +- .../service/reads/DataResolverTest.java | 5 +- 34 files changed, 883 insertions(+), 547 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e645b917/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index ae321e5..25c2728 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * ReplicaCollection follow-up (CASSANDRA-14726) * Transient node receives full data requests (CASSANDRA-14762) * Enable snapshot artifacts publish (CASSANDRA-12704) * Introduce RangesAtEndpoint.unwrap to simplify StreamSession.addTransferRanges (CASSANDRA-14770) http://git-wip-us.apache.org/repos/asf/cassandra/blob/e645b917/src/java/org/apache/cassandra/db/DiskBoundaryManager.java -- diff --git a/src/java/org/apache/cassandra/db/DiskBoundaryManager.java b/src/java/org/apache/cassandra/db/DiskBoundaryManager.java index 0961a42..69aabfd 100644 --- a/src/java/org/apache/cassandra/db/DiskBoundaryManager.java +++ b/src/java/org/apache/cassandra/db/DiskBoundaryManager.java @@ -123,19 +123,19 @@ public class DiskBoundaryManager * * The final entry in the returned list will always be the partitioner maximum tokens upper key bound */ -private static List getDiskBoundaries(RangesAtEndpoint ranges, IPartitioner partitioner, Directories.DataDirectory[] dataDirectories) +private static List getDiskBoundaries(RangesAtEndpoint replicas, IPartitioner partitioner, Directories.DataDirectory[] dataDirectories) { assert partitioner.splitter().isPresent(); Splitter splitter = partitioner.splitter().get(); boolean dontSplitRanges = DatabaseDescriptor.getNumTokens() > 1; -List weightedRanges = new ArrayList<>(ranges.size()); +List weightedRanges = new 
ArrayList<>(replicas.size()); // note that Range.sort unwraps any wraparound ranges, so we need to sort them here -for (Range r : Range.sort(ranges.fullRanges())) +for (Range r : Range.sort(replicas.onlyFull().ranges())) weightedRanges.add(new Splitter.WeightedRange(1.0, r)); -for (Range r : Range.sort(ranges.transientRanges(
[jira] [Updated] (CASSANDRA-14762) Transient node receives full data requests in dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-14762:
---------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Transient node receives full data requests in dtests
> ----------------------------------------------------
>
>                 Key: CASSANDRA-14762
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14762
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Benedict
>            Priority: Major
>             Fix For: 4.0
>
> I saw this running them on my laptop with rapid write protection disabled.
> Attached is a patch for disabling rapid write protection in the transient
> dtests.
> {noformat}
> .Exception in thread Thread-19:
> Traceback (most recent call last):
>   File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
>     self.run()
>   File "/Users/aweisberg/repos/cassandra-dtest/venv/src/ccm/ccmlib/cluster.py", line 180, in run
>     self.scan_and_report()
>   File "/Users/aweisberg/repos/cassandra-dtest/venv/src/ccm/ccmlib/cluster.py", line 173, in scan_and_report
>     on_error_call(errordata)
>   File "/Users/aweisberg/repos/cassandra-dtest/dtest_setup.py", line 137, in _log_error_handler
>     pytest.fail("Error details: \n{message}".format(message=message))
>   File "/Users/aweisberg/repos/cassandra-dtest/venv/lib/python3.6/site-packages/_pytest/outcomes.py", line 96, in fail
>     raise Failed(msg=msg, pytrace=pytrace)
> Failed: Error details:
> Errors seen in logs for: node3
> node3: ERROR [ReadStage-1] 2018-09-18 12:28:48,344 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-1,5,main]
> org.apache.cassandra.exceptions.InvalidRequestException: Attempted to serve transient data request from full node in org.apache.cassandra.db.ReadCommandVerbHandler@3c55e0ff
>     at org.apache.cassandra.db.ReadCommandVerbHandler.validateTransientStatus(ReadCommandVerbHandler.java:104)
>     at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:53)
>     at org.apache.cassandra.net.MessageDeliveryTask.process(MessageDeliveryTask.java:92)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:54)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:110)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
[jira] [Commented] (CASSANDRA-14762) Transient node receives full data requests in dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636947#comment-16636947 ]

Benedict commented on CASSANDRA-14762:
--------------------------------------

I've committed as-is, to [467068d1e9d84e6cca1f9dd5a4eff5f80d027c2e|https://github.com/apache/cassandra/commit/467068d1e9d84e6cca1f9dd5a4eff5f80d027c2e].

dtests are now very flaky on CircleCI, but all of the failures in the latest run have shown up in trunk runs for me on CircleCI.

> Transient node receives full data requests in dtests
> ----------------------------------------------------
>
>                 Key: CASSANDRA-14762
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14762
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Benedict
>            Priority: Major
>             Fix For: 4.0
cassandra git commit: Transient node receives full data requests
Repository: cassandra Updated Branches: refs/heads/trunk 42c92b976 -> 467068d1e Transient node receives full data requests patch by Benedict; reviewed by Ariel Weisberg for CASSANDRA-14762 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/467068d1 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/467068d1 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/467068d1 Branch: refs/heads/trunk Commit: 467068d1e9d84e6cca1f9dd5a4eff5f80d027c2e Parents: 42c92b9 Author: Benedict Elliott Smith Authored: Thu Sep 20 18:56:38 2018 +0100 Committer: Benedict Elliott Smith Committed: Wed Oct 3 14:27:21 2018 +0100 -- CHANGES.txt | 1 + .../service/reads/AbstractReadExecutor.java | 3 ++- .../reads/ShortReadPartitionsProtection.java| 6 + .../reads/repair/AbstractReadRepair.java| 27 +++- .../reads/repair/BlockingReadRepairTest.java| 2 +- .../DiagEventsBlockingReadRepairTest.java | 2 +- .../reads/repair/ReadOnlyReadRepairTest.java| 2 +- 7 files changed, 32 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/467068d1/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index e89c1c5..ae321e5 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * Transient node receives full data requests (CASSANDRA-14762) * Enable snapshot artifacts publish (CASSANDRA-12704) * Introduce RangesAtEndpoint.unwrap to simplify StreamSession.addTransferRanges (CASSANDRA-14770) * LOCAL_QUORUM may speculate to non-local nodes, resulting in Timeout instead of Unavailable (CASSANDRA-14735) http://git-wip-us.apache.org/repos/asf/cassandra/blob/467068d1/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java -- diff --git a/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java b/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java index 8d0f14c..8f7bc26 100644 --- 
a/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java +++ b/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java @@ -274,7 +274,7 @@ public abstract class AbstractReadExecutor speculated = true; ReplicaPlan.ForTokenRead replicaPlan = replicaPlan(); -ReadCommand retryCommand = command; +ReadCommand retryCommand; Replica extraReplica; if (handler.resolver.isDataPresent()) { @@ -290,6 +290,7 @@ public abstract class AbstractReadExecutor else { extraReplica = replicaPlan.firstUncontactedCandidate(Replica::isFull); +retryCommand = command; if (extraReplica == null) { cfs.metric.speculativeInsufficientReplicas.inc(); http://git-wip-us.apache.org/repos/asf/cassandra/blob/467068d1/src/java/org/apache/cassandra/service/reads/ShortReadPartitionsProtection.java -- diff --git a/src/java/org/apache/cassandra/service/reads/ShortReadPartitionsProtection.java b/src/java/org/apache/cassandra/service/reads/ShortReadPartitionsProtection.java index 2e4440f..7b7c4d3 100644 --- a/src/java/org/apache/cassandra/service/reads/ShortReadPartitionsProtection.java +++ b/src/java/org/apache/cassandra/service/reads/ShortReadPartitionsProtection.java @@ -182,9 +182,15 @@ public class ShortReadPartitionsProtection extends Transformation handler = new ReadCallback<>(resolver, cmd, replicaPlan, queryStartNanoTime); if (source.isSelf()) +{ StageManager.getStage(Stage.READ).maybeExecuteImmediately(new StorageProxy.LocalReadRunnable(cmd, handler)); +} else +{ +if (source.isTransient()) +cmd = cmd.copyAsTransientQuery(source); MessagingService.instance().sendRRWithFailure(cmd.createMessage(), source.endpoint(), handler); +} // We don't call handler.get() because we want to preserve tombstones since we're still in the middle of merging node results. 
handler.awaitResults(); http://git-wip-us.apache.org/repos/asf/cassandra/blob/467068d1/src/java/org/apache/cassandra/service/reads/repair/AbstractReadRepair.java -- diff --git a/src/java/org/apache/cassandra/service/reads/repair/AbstractReadRepair.java b/src/java/org/apache/cassandra/service/reads/repair/AbstractReadRepair.java index 1b213ff..b74f8d3 1
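The `AbstractReadExecutor` hunk above drops the initializer from `retryCommand` and assigns it inside the `else` branch instead. The commit message does not state the motivation, so the following is only one plausible reading: with no initializer, Java's definite-assignment rules force every branch to choose a command explicitly. The class name, method, and string values below are hypothetical stand-ins, not the Cassandra types.

```java
// Hypothetical names throughout; this is not the Cassandra code.
final class DefiniteAssignmentSketch
{
    static String chooseRetry(boolean dataPresent, boolean replicaIsTransient)
    {
        // No initializer: javac's definite-assignment analysis requires every
        // path that reaches the return statement to assign this variable.
        final String retryCommand;
        if (dataPresent)
        {
            // Assumed behaviour for illustration: pick a query type per replica kind.
            retryCommand = replicaIsTransient ? "transient-query" : "digest-query";
        }
        else
        {
            // Mirrors the shape of the added "retryCommand = command;" line.
            retryCommand = "full-data-query";
        }
        return retryCommand;
    }

    public static void main(String[] args)
    {
        System.out.println(chooseRetry(true, true));   // transient-query
        System.out.println(chooseRetry(true, false));  // digest-query
        System.out.println(chooseRetry(false, false)); // full-data-query
    }
}
```

Deleting either assignment makes the method fail to compile, which is the safety net a default initializer would silently remove.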
[jira] [Assigned] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
[ https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson reassigned CASSANDRA-14610: --- Assignee: Marcus Eriksson Environment: (was: @jay zhuang observed nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters being flaky in Apache Jenkins. I ran locally and got a different flaky behavior: {noformat} out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster') assert 0 == len(err), err > assert out_node1_dc1 == out_node1_dc3 E AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n' E Cluster Information: E Name: test E Snitch: org.apache.cassandra.locator.PropertyFileSnitch E DynamicEndPointSnitch: enabled E Partitioner: org.apache.cassandra.dht.Murmur3Partitioner E Schema versions: E fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... E E ...Full output truncated (26 lines hidden), use '-vv' to show 09:58:14,357 ccm DEBUG Log-watching thread exiting. ===Flaky Test Report=== test_describecluster_more_information_three_datacenters failed and was not selected for rerun. assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n' Cluster Information: Name: test Snitch: org.apache.cassandra.locator.PropertyFileSnitch DynamicEndPointSnitch: enabled Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... ...Full output truncated (26 lines hidden), use '-vv' to show [] ===End Flaky Test Report=== {noformat} As this test is for a patch that was introduced for 4.0, this dtest (should) only be failing on trunk.) Description: @jay zhuang observed nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters being flaky in Apache Jenkins. 
I ran locally and got a different flaky behavior: {noformat} out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster') assert 0 == len(err), err > assert out_node1_dc1 == out_node1_dc3 E AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n' E Cluster Information: E Name: test E Snitch: org.apache.cassandra.locator.PropertyFileSnitch E DynamicEndPointSnitch: enabled E Partitioner: org.apache.cassandra.dht.Murmur3Partitioner E Schema versions: E fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... E E ...Full output truncated (26 lines hidden), use '-vv' to show 09:58:14,357 ccm DEBUG Log-watching thread exiting. ===Flaky Test Report=== test_describecluster_more_information_three_datacenters failed and was not selected for rerun. assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n' Cluster Information: Name: test Snitch: org.apache.cassandra.locator.PropertyFileSnitch DynamicEndPointSnitch: enabled Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... ...Full output truncated (26 lines hidden), use '-vv' to show [] ===End Flaky Test Report=== {noformat} As this test is for a patch that was introduced for 4.0, this dtest (should) only be failing on trunk. > Flaky dtest: > nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters > --- > > Key: CASSANDRA-14610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14610 > Project: Cassandra > Issue Type: Task > Components: Testing, Tools >Reporter: Jason Brown >Assignee: Marcus Eriksson >Priority: Minor > Labels: dtest > > @jay zhuang observed > nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters > being flaky in Apache Jenkins. 
I ran locally and got a different flaky > behavior: > {noformat} > out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster') > assert 0 == len(err), err > > assert out_node1_dc1 == out_node1_dc3 > E AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster > Infor...1=3, dc3=1}\n' > E Cluster Information: > E Name: test > E Sni
[jira] [Updated] (CASSANDRA-14802) calculatePendingRanges assigns more pending ranges than necessary
[ https://issues.apache.org/jira/browse/CASSANDRA-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-14802:
---------------------------------
    Fix Version/s: 4.x

> calculatePendingRanges assigns more pending ranges than necessary
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-14802
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14802
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination, Distributed Metadata
>            Reporter: Benedict
>            Priority: Major
>             Fix For: 4.x
>
> This might be a good thing, but should probably be configurable, and made
> consistent. Presently, in a number of circumstances where there are multiple
> range movements, {{calculatePendingRanges}} will assign a pending range to a
> node that will not ultimately own it. If done consistently, this might make
> range movements resilient to node failures / aborted range movements, since
> all nodes will be receiving all ranges they might own under any incomplete
> range ownership movements. But done inconsistently it seems only to reduce
> availability in the cluster, by potentially increasing the number of pending
> nodes unnecessarily.
[jira] [Created] (CASSANDRA-14802) calculatePendingRanges assigns more pending ranges than necessary
Benedict created CASSANDRA-14802:
------------------------------------

             Summary: calculatePendingRanges assigns more pending ranges than necessary
                 Key: CASSANDRA-14802
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14802
             Project: Cassandra
          Issue Type: Bug
          Components: Coordination, Distributed Metadata
            Reporter: Benedict

This might be a good thing, but should probably be configurable, and made consistent. Presently, in a number of circumstances where there are multiple range movements, {{calculatePendingRanges}} will assign a pending range to a node that will not ultimately own it. If done consistently, this might make range movements resilient to node failures / aborted range movements, since all nodes will be receiving all ranges they might own under any incomplete range ownership movements. But done inconsistently it seems only to reduce availability in the cluster, by potentially increasing the number of pending nodes unnecessarily.
[jira] [Created] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements
Benedict created CASSANDRA-14801:
------------------------------------

             Summary: calculatePendingRanges no longer safe for multiple adjacent range movements
                 Key: CASSANDRA-14801
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14801
             Project: Cassandra
          Issue Type: Bug
          Components: Coordination, Distributed Metadata
            Reporter: Benedict
             Fix For: 4.0

Correctness depended upon the narrowing to a {{Set}}, which we no longer do - we maintain a collection of all {{Replica}}. Our {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result contain the same endpoint multiple times; and our {{EndpointsForToken}} obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, resulting in cluster-wide failures for writes to the affected token ranges for the duration of the range movement.
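The failure mode described in CASSANDRA-14801 can be sketched with plain JDK collections: narrowing to a `Set` quietly absorbs a repeated endpoint, whereas a collection that treats a repeat as a conflict refuses to be built. Everything below (the class `PendingEndpointsSketch`, both methods, the string endpoints, the exception message) is a hypothetical model for illustration and does not reproduce the `EndpointsForToken` or `RangesAtEndpoint` APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model; names and behaviour are illustrative, not Cassandra's API.
final class PendingEndpointsSketch
{
    // Old-style narrowing: a Set silently collapses repeated endpoints.
    static Set<String> narrowToSet(List<String> pendingEndpoints)
    {
        return new LinkedHashSet<>(pendingEndpoints);
    }

    // Stricter collection: refuses to be built if an endpoint appears twice.
    static List<String> buildStrict(List<String> pendingEndpoints)
    {
        Set<String> seen = new LinkedHashSet<>();
        for (String endpoint : pendingEndpoints)
        {
            if (!seen.add(endpoint))
                throw new IllegalArgumentException("Conflicting replica added for endpoint " + endpoint);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args)
    {
        // Two adjacent range movements both list the same pending endpoint.
        List<String> pending = Arrays.asList("10.0.0.4", "10.0.0.5", "10.0.0.4");

        System.out.println(narrowToSet(pending)); // [10.0.0.4, 10.0.0.5]

        try
        {
            buildStrict(pending);
        }
        catch (IllegalArgumentException e)
        {
            System.out.println("construction failed: " + e.getMessage());
        }
    }
}
```

In this model the strict builder is the safer design only if its callers already deduplicate; when they do not, the failure surfaces on the write path, which matches the impact the ticket describes.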
[jira] [Commented] (CASSANDRA-14800) Avoid referencing DatabaseDescriptor in ProtocolVersion
[ https://issues.apache.org/jira/browse/CASSANDRA-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636759#comment-16636759 ]

Sam Tunnicliffe commented on CASSANDRA-14800:
---------------------------------------------

+1

> Avoid referencing DatabaseDescriptor in ProtocolVersion
> -------------------------------------------------------
>
>                 Key: CASSANDRA-14800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14800
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 4.0
>
> We should not reference {{DatabaseDescriptor}} in {{ProtocolVersion}} as it
> is used outside of Cassandra (for example when handling full query logs)
[jira] [Updated] (CASSANDRA-14800) Avoid referencing DatabaseDescriptor in ProtocolVersion
[ https://issues.apache.org/jira/browse/CASSANDRA-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-14800:
----------------------------------------
    Status: Ready to Commit  (was: Patch Available)

> Avoid referencing DatabaseDescriptor in ProtocolVersion
> -------------------------------------------------------
>
>                 Key: CASSANDRA-14800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14800
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 4.0
>
> We should not reference {{DatabaseDescriptor}} in {{ProtocolVersion}} as it
> is used outside of Cassandra (for example when handling full query logs)
[jira] [Updated] (CASSANDRA-14713) Update docker image used for testing
[ https://issues.apache.org/jira/browse/CASSANDRA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-14713:
----------------------------------------
    Reviewer: Marcus Eriksson

> Update docker image used for testing
> ------------------------------------
>
>                 Key: CASSANDRA-14713
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14713
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Testing
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>            Priority: Major
>         Attachments: Dockerfile
>
> Tests executed on builds.apache.org ({{docker/jenkins/jenkinscommand.sh}})
> and circleCI ({{.circleci/config.yml}}) will currently use the same
> [cassandra-test|https://hub.docker.com/r/kjellman/cassandra-test/] docker
> image ([github|https://github.com/mkjellman/cassandra-test-docker]) by
> [~mkjellman].
> We should manage this image on our own as part of cassandra-builds, to keep
> it updated. There's also a [Apache
> user|https://hub.docker.com/u/apache/?page=1] on docker hub for publishing
> images.
[jira] [Updated] (CASSANDRA-14800) Avoid referencing DatabaseDescriptor in ProtocolVersion
[ https://issues.apache.org/jira/browse/CASSANDRA-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-14800:
----------------------------------------
    Fix Version/s: 4.0

> Avoid referencing DatabaseDescriptor in ProtocolVersion
> -------------------------------------------------------
>
>                 Key: CASSANDRA-14800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14800
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 4.0
>
> We should not reference {{DatabaseDescriptor}} in {{ProtocolVersion}} as it
> is used outside of Cassandra (for example when handling full query logs)
[jira] [Updated] (CASSANDRA-14800) Avoid referencing DatabaseDescriptor in ProtocolVersion
[ https://issues.apache.org/jira/browse/CASSANDRA-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-14800: Assignee: Marcus Eriksson Status: Patch Available (was: Open) https://github.com/krummas/cassandra/commits/marcuse/14800 https://circleci.com/gh/krummas/workflows/cassandra/tree/marcuse%2F14800 > Avoid referencing DatabaseDescriptor in ProtocolVersion > --- > > Key: CASSANDRA-14800 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14800 > Project: Cassandra > Issue Type: Improvement >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Minor > > We should not reference {{DatabaseDescriptor}} in {{ProtocolVersion}} as it > is used outside of Cassandra (for example when handling full query logs)
[jira] [Updated] (CASSANDRA-14713) Update docker image used for testing
[ https://issues.apache.org/jira/browse/CASSANDRA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-14713: --- Status: Patch Available (was: In Progress) > Update docker image used for testing > > > Key: CASSANDRA-14713 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14713 > Project: Cassandra > Issue Type: New Feature > Components: Testing >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Major > Attachments: Dockerfile > > > Tests executed on builds.apache.org ({{docker/jenkins/jenkinscommand.sh}}) > and circleCI ({{.circleci/config.yml}}) will currently use the same > [cassandra-test|https://hub.docker.com/r/kjellman/cassandra-test/] docker > image ([github|https://github.com/mkjellman/cassandra-test-docker]) by > [~mkjellman]. > We should manage this image on our own as part of cassandra-builds, to keep > it updated. There's also an [Apache > user|https://hub.docker.com/u/apache/?page=1] on docker hub for publishing > images.
[jira] [Created] (CASSANDRA-14800) Avoid referencing DatabaseDescriptor in ProtocolVersion
Marcus Eriksson created CASSANDRA-14800: --- Summary: Avoid referencing DatabaseDescriptor in ProtocolVersion Key: CASSANDRA-14800 URL: https://issues.apache.org/jira/browse/CASSANDRA-14800 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson We should not reference {{DatabaseDescriptor}} in {{ProtocolVersion}} as it is used outside of Cassandra (for example when handling full query logs)