Adam Horwich created CASSANDRA-8620:
---------------------------------------

             Summary: Bootstrap session hanging indefinitely
                 Key: CASSANDRA-8620
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8620
             Project: Cassandra
          Issue Type: Bug
         Environment: Debian 7, Oracle JDK 1.7.0_51, AWS + GCE
            Reporter: Adam Horwich


Hi! We have been running a relatively small 2.1.2 cluster across two DCs for a 
few months, with ~100GB load per node and RF=3, and over the last few weeks we 
have been trying to scale up capacity.

We have recently been seeing scenarios in which the Bootstrap or Unbootstrap 
streaming process hangs indefinitely for one or more sessions on the receiver, 
with no stack trace or exception. This does not happen every time, nor does it 
always involve the same sender. When the receiver is in a hung state, the 
following can be found in the thread dump:

The Stream-IN thread for one or more sessions is blocked in the following 
state: 
Thread 24942: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2043 (Compiled frame)
 - java.util.concurrent.ArrayBlockingQueue.take() @bci=20, line=374 (Compiled frame)
 - org.apache.cassandra.streaming.compress.CompressedInputStream.read() @bci=31, line=89 (Compiled frame)
 - java.io.DataInputStream.readUnsignedShort() @bci=4, line=337 (Compiled frame)
 - org.apache.cassandra.utils.BytesReadTracker.readUnsignedShort() @bci=4, line=140 (Compiled frame)
 - org.apache.cassandra.utils.ByteBufferUtil.readShortLength(java.io.DataInput) @bci=1, line=317 (Compiled frame)
 - org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(java.io.DataInput) @bci=2, line=327 (Compiled frame)
 - org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(java.io.DataInput) @bci=5, line=397 (Compiled frame)
 - org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(java.io.DataInput) @bci=2, line=381 (Compiled frame)
 - org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(java.io.DataInput, org.apache.cassandra.db.ColumnSerializer$Flag, int, org.apache.cassandra.io.sstable.Descriptor$Version) @bci=10, line=75 (Compiled frame)
 - org.apache.cassandra.db.AbstractCell$1.computeNext() @bci=25, line=52 (Compiled frame)
 - org.apache.cassandra.db.AbstractCell$1.computeNext() @bci=1, line=46 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=143 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=138 (Compiled frame)
 - org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.config.CFMetaData, java.io.DataInput, org.apache.cassandra.io.sstable.Descriptor$Version) @bci=320, line=283 (Compiled frame)
 - org.apache.cassandra.streaming.StreamReader.writeRow(org.apache.cassandra.io.sstable.SSTableWriter, java.io.DataInput, org.apache.cassandra.db.ColumnFamilyStore) @bci=26, line=157 (Compiled frame)
 - org.apache.cassandra.streaming.compress.CompressedStreamReader.read(java.nio.channels.ReadableByteChannel) @bci=258, line=89 (Compiled frame)
 - org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(java.nio.channels.ReadableByteChannel, int, org.apache.cassandra.streaming.StreamSession) @bci=69, line=48 (Interpreted frame)
 - org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(java.nio.channels.ReadableByteChannel, int, org.apache.cassandra.streaming.StreamSession) @bci=4, line=38 (Interpreted frame)
 - org.apache.cassandra.streaming.messages.StreamMessage.deserialize(java.nio.channels.ReadableByteChannel, int, org.apache.cassandra.streaming.StreamSession) @bci=37, line=55 (Interpreted frame)
 - org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run() @bci=24, line=245 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=744 (Interpreted frame)

Debug logging shows that the receiver is still reading the file it is receiving 
from the sender and has not yet sent an ACK.

The receiver is waiting for more data so it can finish writing its row, and the 
sender is not sending any more. A large amount of data (~5MB) is stuck in both 
the receiver's Recv-Q and the sender's Send-Q.
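
For context, the top of the trace matches the producer/consumer pattern 
sketched below: the Stream-IN thread consumes decompressed chunks from an 
internal queue that a socket-reading producer fills. This is a minimal 
illustrative sketch, not the actual 2.1.2 CompressedInputStream source; the 
names (ChunkedInputStream, supply) are ours, though the dataBuffer queue and 
the unbounded take() mirror what the thread dump shows:

import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ArrayBlockingQueue;

// Simplified sketch: decompressed chunks are handed from a producer
// thread (reading the socket) to the deserializing consumer thread.
public class ChunkedInputStream extends InputStream
{
    private final ArrayBlockingQueue<byte[]> dataBuffer = new ArrayBlockingQueue<>(16);
    private byte[] current;
    private int position;

    // Called by the producer whenever a chunk has been read off the
    // wire and decompressed.
    public void supply(byte[] chunk) throws InterruptedException
    {
        dataBuffer.put(chunk);
    }

    @Override
    public int read() throws IOException
    {
        if (current == null || position == current.length)
        {
            try
            {
                // take() has no timeout: if the producer never supplies
                // another chunk (e.g. the sender believes the transfer
                // is already complete), this parks the Stream-IN thread
                // forever -- exactly the LockSupport.park frames above.
                current = dataBuffer.take();
                position = 0;
            }
            catch (InterruptedException e)
            {
                throw new IOException(e);
            }
        }
        return current[position++] & 0xFF;
    }
}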

We've been trying to diagnose this issue internally, but it's difficult to 
create a reliably reproducible scenario. We have so far found that restarting 
all nodes in the cluster and ensuring that a cleanup has been performed helps 
mitigate the problem (though a cleanup without restart can still result in a 
hung state). However, it's unclear to me why either of these would affect the 
streaming process in the way we've observed. One theory is that the calculated 
section sizes are inaccurate.

We tried setting a timeout on the dataBuffer read (changing the blocking take() 
to a poll() with a timeout), which forced a retry of the file, but the same 
transfer failed again until retries were exhausted.
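
Concretely, the change we tried amounts to swapping the unbounded take() in the 
sketch above for a bounded poll(), along these lines (the 30-second timeout is 
our choice, not anything from Cassandra):

import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Bounded-wait variant of the chunk fetch from the earlier sketch.
public class BoundedChunkFetch
{
    // Wait at most 30s for the next chunk; a null result means the
    // producer has gone quiet, so raise an IOException. That fails the
    // stream session and re-queues the file for retry instead of
    // parking the thread forever. (In our case the retried transfer
    // hit the same hang until retries were exhausted.)
    static byte[] nextChunk(ArrayBlockingQueue<byte[]> dataBuffer) throws IOException
    {
        try
        {
            byte[] chunk = dataBuffer.poll(30, TimeUnit.SECONDS);
            if (chunk == null)
                throw new IOException("timed out waiting for next stream chunk");
            return chunk;
        }
        catch (InterruptedException e)
        {
            throw new IOException(e);
        }
    }
}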

From a heap dump taken in the deadlocked state we confirmed the following:

BytesReadTracker.bytesRead = 100477411
Length of the section being read (it happens to be the last section) = 100477411

In other words, the receiver has already consumed every byte the sender's 
calculated section length accounts for, yet the deserializer is still partway 
through a row and waiting for more.

We also wrote all the data in the buffer out to a text file and observed that 
it ended halfway through a row.

The stream reader checks the number of bytes read after importing each row 
(StreamReader line 97) to detect the end of the section; however, it does not 
expect the data to finish halfway through a row. When it does, the thread 
blocks inside OnDiskAtom.deserializeFromSSTable in one of the various 
deserialization calls.
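
To make the failure mode concrete, the receive loop is shaped roughly like the 
sketch below (simplified signatures of our own devising, not the verbatim 
StreamReader source). The end-of-section check runs only between rows, so 
nothing guards against the stream ending mid-row:

import java.io.DataInput;
import java.io.IOException;

// Sketch of the control flow around StreamReader line 97; 'totalSize'
// stands for the sum of the calculated section lengths.
abstract class SectionReceiverSketch
{
    // Stand-in for StreamReader.writeRow(): deserializes one complete
    // row from the stream. If the section's data ends mid-row, this is
    // where the thread blocks -- down in OnDiskAtom deserialization --
    // waiting on a read() that will never complete.
    abstract void writeRow(DataInput in) throws IOException;

    // Bytes consumed so far, as counted by the BytesReadTracker.
    abstract long bytesRead();

    void receiveSection(DataInput in, long totalSize) throws IOException
    {
        // The termination check is evaluated only *between* rows; a
        // truncated final row means control never returns here.
        while (bytesRead() < totalSize)
            writeRow(in);
    }
}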


