Sergio Bossa created CASSANDRA-15700:
----------------------------------------

             Summary: Performance regression on internode messaging
                 Key: CASSANDRA-15700
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15700
             Project: Cassandra
          Issue Type: Bug
          Components: Messaging/Internode
            Reporter: Sergio Bossa
            Assignee: Sergio Bossa
         Attachments: Oss40vsOss311.png, oss40.gc, oss40_nogc.tar.xz, 
oss40_system.log

[~jasonstack] and I have been investigating a performance regression affecting 
4.0 during a 3-node, RF 3 write throughput test with a timeseries-like 
workload, as shown in this plot, where blue is 3.11 and orange is 4.0:

!Oss40vsOss311.png|width=389,height=214!

It's been a bit of a long investigation, but two clues ended up standing out:
1) An abnormal number of expired messages on 4.0 (as shown in the attached 
system log), while 3.11 has almost none.
2) Abnormal GC activity (as shown in the attached gc log).

Turns out the two are related, as the [on expired 
callback|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundConnection.java#L462] 
creates a huge number of strings in the {{id()}} call (a sketch of this 
allocation pattern follows the first stack listing below).

The next question is what causes all those message expirations: we thoroughly 
reviewed the internode messaging code, and the only issue we could find so far 
is related to the "batch pruning" calls 
[here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L81] 
and 
[here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L188]: 
it _seems_ too much time is spent on those, causing the event loop to fall 
behind in processing the rest of the messages, which then end up expired. This 
is supported by the analysis of the collapsed stacks (after fixing the GC 
issue):
{noformat}
(tprint (top-aggregated-calls oss40nogc "EventLoopDelivery:doRun" 5))
org/apache/cassandra/net/OutboundConnection$EventLoopDelivery:doRun 3456
org/apache/cassandra/net/OutboundMessageQueue:access$600 1621
org/apache/cassandra/net/PrunableArrayQueue:prune 1621
org/apache/cassandra/net/OutboundMessageQueue$WithLock:close 1621
org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1620
{noformat}
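
On the GC side, the problem is simply eager string construction on the expiry 
path. The sketch below is not the actual {{OutboundConnection}} code (the field 
names and id format are made up); it only illustrates the pattern: a 
per-connection {{id()}} that formats a fresh {{String}} for every expired 
message, versus caching the formatted id so an expiration storm doesn't turn 
into an allocation storm:
{code:java}
import java.net.InetSocketAddress;

// Hypothetical sketch of the allocation pattern on the expiry path; not the
// real OutboundConnection. Field names and the id format are made up.
final class ExpiryCallbackSketch
{
    private final InetSocketAddress from;
    private final InetSocketAddress to;
    private final String type;

    // Built once and reused, instead of being reformatted per expired message.
    private volatile String cachedId;

    ExpiryCallbackSketch(InetSocketAddress from, InetSocketAddress to, String type)
    {
        this.from = from;
        this.to = to;
        this.type = type;
    }

    // Problematic shape: every call allocates a new StringBuilder and String.
    private String id()
    {
        return from + "->" + to + '-' + type;
    }

    // Cheaper alternative: format once, reuse on the hot expiry path.
    private String cachedId()
    {
        String id = cachedId;
        if (id == null)
            cachedId = id = from + "->" + to + '-' + type;
        return id;
    }

    void onExpired(Object expiredMessage)
    {
        // With thousands of expirations per second, calling id() here is what
        // shows up as GC pressure; cachedId() (or deferring the formatting to
        // the point where a log line is actually emitted) avoids the garbage.
        System.err.println(cachedId() + " dropping message whose timeout expired before reaching the network");
    }
}
{code}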

The stack listing above shows the top 5 sampled calls from 
{{EventLoopDelivery#doRun()}}, which spends half of its time pruning. But only 
a tiny portion of that pruning time is spent actually expiring:
{noformat}
(tprint (top-aggregated-calls oss40nogc 
"OutboundMessageQueue:pruneInternalQueueWithLock" 5))
org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1900
org/apache/cassandra/net/PrunableArrayQueue:prune 1894
org/apache/cassandra/net/OutboundMessageQueue$1Pruner:onPruned 147
org/apache/cassandra/net/OutboundConnection$$Lambda$444/740904487:accept 147
org/apache/cassandra/net/OutboundConnection:onExpired 147
{noformat}

And indeed, the {{PrunableArrayQueue:prune()}} self time is dominant:
{noformat}
(tprint (top-self-calls oss40nogc "PrunableArrayQueue:prune" 5))
org/apache/cassandra/net/PrunableArrayQueue:prune 1718
org/apache/cassandra/net/OutboundConnection:releaseCapacity 27
java/util/concurrent/ConcurrentHashMap:replaceNode 19
java/util/concurrent/ConcurrentLinkedQueue:offer 16
java/util/concurrent/LinkedBlockingQueue:offer 15
{noformat}
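
That self time is essentially the cost of the linear walk itself: each prune 
pass has to visit every queued message and check its expiration, whether or not 
anything actually gets pruned. A simplified stand-in for 
{{PrunableArrayQueue#prune()}} (the real class is an array-backed queue driven 
by a pruner callback, so this is only a sketch of the cost profile) looks 
roughly like this:
{code:java}
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Simplified stand-in for PrunableArrayQueue, just to show the cost profile:
// prune() is O(n) over the whole backlog on every call, even when little or
// nothing actually expires.
final class SimplePrunableQueue<E>
{
    private final ArrayDeque<E> queue = new ArrayDeque<>();

    void offer(E e)
    {
        queue.addLast(e);
    }

    E poll()
    {
        return queue.pollFirst();
    }

    // Visits every element: the self time grows with the backlog size, which
    // is why it dominates the profile when the two call sites above trigger it
    // frequently against a deep backlog.
    int prune(Predicate<E> shouldPrune, Consumer<E> onPruned)
    {
        int pruned = 0;
        for (Iterator<E> it = queue.iterator(); it.hasNext();)
        {
            E e = it.next();
            if (shouldPrune.test(e))
            {
                it.remove();
                onPruned.accept(e);
                pruned++;
            }
        }
        return pruned;
    }
}
{code}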

That said, before proceeding with a PR to fix those issues, I'd like to 
understand: what's the reason to prune so often, rather than just when polling 
messages during delivery? If there's a reason I'm missing, let's talk about how 
to optimize pruning; otherwise, let's get rid of it.
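
To make that alternative concrete, this is roughly what I mean by pruning only 
at poll time (again just a sketch, not a patch): expired messages are dropped 
as they surface at the head of the queue during delivery, so each message is 
examined once, instead of the whole backlog being rescanned on every prune 
pass:
{code:java}
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Sketch of the "prune only when polling" alternative: expiry is checked per
// message as it's polled for delivery, so the cost is O(1) per delivered
// message rather than O(n) per prune pass over the whole backlog.
final class LazyPruningDelivery<E>
{
    private final Queue<E> queue;
    private final Predicate<E> isExpired;
    private final Consumer<E> onExpired;

    LazyPruningDelivery(Queue<E> queue, Predicate<E> isExpired, Consumer<E> onExpired)
    {
        this.queue = queue;
        this.isExpired = isExpired;
        this.onExpired = onExpired;
    }

    // Returns the next deliverable message, dropping any expired ones it
    // encounters on the way; null once the queue is drained.
    E pollForDelivery()
    {
        E next;
        while ((next = queue.poll()) != null)
        {
            if (!isExpired.test(next))
                return next;
            onExpired.accept(next);
        }
        return null;
    }
}
{code}
The obvious trade-off is that expired messages would keep occupying the queue 
(and its capacity accounting) until delivery reaches them, which may well be 
the reason for the eager pruning; if so, that's exactly the part I'd like to 
understand before touching it.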


