Ok... Thanks for the reply...
I'm going to retry nodetool rebuild with the following changes, as you suggested:

net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10

Hope these changes will be enough on the new node where I'm running
'nodetool rebuild', and hope they are NOT required on all the existing nodes
from which data is going to be streamed. Am I right?
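For reference, the current kernel values can be inspected without root, and the new ones applied with sysctl (a sketch for a Linux host; the sysctl -w lines need root):

```shell
# Read the kernel's current TCP keepalive settings (no root needed).
time_s=$(cat /proc/sys/net/ipv4/tcp_keepalive_time)
probes=$(cat /proc/sys/net/ipv4/tcp_keepalive_probes)
intvl=$(cat /proc/sys/net/ipv4/tcp_keepalive_intvl)
echo "time=${time_s}s probes=${probes} interval=${intvl}s"

# To apply the new values at runtime (as root), then persist them by adding
# the same three lines to /etc/sysctl.conf:
#   sysctl -w net.ipv4.tcp_keepalive_time=60
#   sysctl -w net.ipv4.tcp_keepalive_probes=3
#   sysctl -w net.ipv4.tcp_keepalive_intvl=10
```

With these values an idle streaming connection sends its first probe after 60s and gives up after 3 failed probes 10s apart, i.e. the dead peer is detected in about 90s instead of the default two-plus hours.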

On Sep 28, 2016 1:04 AM, "Paulo Motta" <pauloricard...@gmail.com> wrote:

> Yeah this is likely to be caused by idle connections being shut down, so
> you may need to update your tcp_keepalive* and/or network/firewall settings.
>
> 2016-09-27 15:29 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>
>> Hi paul,
>>
>> Thanks for the reply...
>>
>> I'm getting following streaming exceptions during nodetool rebuild in
>> c*-2.0.17
>>
>> *04:24:49,759 StreamSession.java (line 461) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>> *java.io.IOException: Connection timed out*
>> *    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)*
>> *    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)*
>> *    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)*
>> *    at sun.nio.ch.IOUtil.write(IOUtil.java:65)*
>> *    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)*
>> *    at
>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)*
>> *    at java.lang.Thread.run(Thread.java:745)*
>> *DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>> ConnectionHandler.java (line 104) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on
>> /xxx.xxx.98.168*
>> * INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>> StreamResultFuture.java (line 186) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is
>> complete*
>> *ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>> StreamSession.java (line 461) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>> *java.io.IOException: Broken pipe*
>> *    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)*
>> *    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)*
>> *    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)*
>> *    at sun.nio.ch.IOUtil.write(IOUtil.java:65)*
>> *    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)*
>> *    at
>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)*
>> *    at java.lang.Thread.run(Thread.java:745)*
>> *DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
>> ConnectionHandler.java (line 244) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId:
>> 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys:
>> 4736, transfer size: 2306880, compressed?: true), file:
>> /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)*
>> *ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
>> StreamSession.java (line 461) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>> *java.lang.RuntimeException: Outgoing stream handler has been closed*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)*
>> *    at
>> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)*
>> *    at
>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)*
>> *    at java.lang.Thread.run(Thread.java:745)*
>>
>> On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricard...@gmail.com> wrote:
>>
>>> What type of streaming timeout are you getting? Do you have a stack
>>> trace? What version are you on?
>>>
>>> See more information about tuning tcp_keepalive* here:
>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>>
>>> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>>>
>>>> @Paulo Motta
>>>>
>>>> We are also facing streaming timeout exceptions during 'nodetool
>>>> rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours) as
>>>> suggested in the DataStax article - https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>>> - but we are still getting streaming exceptions.
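For later readers, the setting being discussed is a single line in cassandra.yaml on each node (shown here as a fragment; in the 2.0.x series the default is 0, meaning streaming sockets never time out on their own):

```yaml
# cassandra.yaml fragment: give idle streaming sockets 24 hours
# before the socket is closed and the transfer retried.
streaming_socket_timeout_in_ms: 86400000
```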
>>>>
>>>> And what are the suggested kernel tcp_keepalive settings/values that
>>>> would help streaming succeed?
>>>>
>>>> Thank you
>>>>
>>>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
>>>>
>>>>> What version are you on? This seems like a typical case where there was
>>>>> a problem with streaming (hanging, etc.). Do you have access to the logs?
>>>>> Maybe look for streaming errors? Typically streaming errors are related to
>>>>> timeouts, so you should review your Cassandra
>>>>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>>>
>>>>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>>>>> bootstrap resume. There were also some streaming hanging problems fixed
>>>>> recently, so I'd advise you to upgrade to the latest version of your
>>>>> particular series for a more robust version.
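For anyone on 2.2+ landing here later, resuming is a single nodetool invocation on the joining node; a minimal sketch (guarded so it is a harmless no-op on a host where nodetool is not on the PATH):

```shell
# Resume a previously interrupted bootstrap on the joining node.
# 'nodetool bootstrap resume' exists only in Cassandra 2.2 and later.
if command -v nodetool >/dev/null 2>&1; then
  nodetool bootstrap resume && status="resumed" || status="resume-failed"
else
  status="nodetool-not-installed"
fi
echo "$status"
```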
>>>>>
>>>>> Is there any reason why you didn't use the replace procedure
>>>>> (-Dreplace_address) to replace the node with the same tokens? This would be
>>>>> a bit faster than the remove + bootstrap procedure.
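For reference, the replace procedure amounts to one extra JVM flag on the replacement node (a cassandra-env.sh fragment; <dead_node_ip> is a placeholder for the dead node's address, and the flag should be removed once the node has joined):

```shell
# cassandra-env.sh fragment on the replacement node: take over the dead
# node's exact tokens and stream its data, instead of removenode followed
# by a fresh bootstrap with newly generated tokens.
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>"
```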
>>>>>
>>>>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jer...@mainaud.com>:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> A client of mine has problems when adding a node to the cluster.
>>>>>> After 4 days, the node is still in joining mode, it doesn't have the
>>>>>> same level of load as the others, and there seems to be no streaming from
>>>>>> or to the new node.
>>>>>>
>>>>>> This node has a history.
>>>>>>
>>>>>>    1. At the beginning, it was a seed in the cluster.
>>>>>>    2. Ops detected that client had problems with it.
>>>>>>    3. They tried to reset it but failed. In the process they
>>>>>>    launched several repair and rebuild processes on the node.
>>>>>>    4. Then they asked me to help them.
>>>>>>    5. We stopped the node,
>>>>>>    6. removed it from the list of seeds (more precisely it was
>>>>>>    replaced by another node),
>>>>>>    7. removed it from the cluster (I chose not to use decommission
>>>>>>    since the node's data was compromised),
>>>>>>    8. deleted all files from the data, commitlog and saved_caches
>>>>>>    directories,
>>>>>>    9. after the leaving process ended, it was started as a fresh new
>>>>>>    node and began auto-bootstrap.
>>>>>>
>>>>>>
>>>>>> As I don't have direct access to the cluster I don't have a lot of
>>>>>> information, but I will have more tomorrow (logs and the results of some
>>>>>> commands). And I can ask people for any required information.
>>>>>>
>>>>>> Does anyone have an idea of what could have happened and what I
>>>>>> should investigate first?
>>>>>> What would you do to unblock the situation?
>>>>>>
>>>>>> Context: The cluster consists of two DCs, each with 15 nodes. Average
>>>>>> load is around 3 TB per node. The joining node froze a little after 2 TB.
>>>>>>
>>>>>> Thank you for your help.
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jérôme Mainaud
>>>>>> jer...@mainaud.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Laxmikanth
>>>> 99621 38051
>>>>
>>>>
>>>
>
