Hello Martin. Did you solve your issue?
I would say that this exception could indeed be due to 'streaming_socket_timeout_in_ms'. Making sure it is set to a large enough value, or upgrading to a newer version that implements the streaming keep-alive, are both worth trying. The thing is, if you are in the middle of adding a DC, it might not be the best moment for an upgrade. I am convinced that a keep-alive is the better approach here, so if an upgrade fits your plans, it could definitely help.

Another possible cause would be a network issue of some kind. Examples include a flaky cross-DC connection, or a node going down, either completely or just bouncing because of GC or any other reason. I believe these kinds of events are not yet handled well by the streaming process.

Is the cluster healthy overall? Do you see pending or dropped messages, GC pressure, warnings and errors in the logs, or any other trouble?

Let us know how it goes :).

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-08 14:31 GMT+00:00 Martin Mačura <m.mac...@gmail.com>:

> None of the files is listed more than once in the logs:
>
> java.lang.RuntimeException: Transfer of file
> /fs3/cassandra/data/<redacted>/event_group-3b5782d08e4411e6842917253f111990/mc-116042-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs0/cassandra/data/<redacted>/event_group-3b5782d08e4411e6842917253f111990/mc-111370-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs3/cassandra/data/<redacted>/event_alert-13d700008e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs4/cassandra/data/<redacted>/event_alert-13d700008e3f11e6a6cbe1698349da4d/mc-9133-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs2/cassandra/data/<redacted>/event_alert-13d700008e3f11e6a6cbe1698349da4d/mc-3997-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs1/cassandra/data/<redacted>//event_group-3b5782d08e4411e6842917253f111990/mc-152979-big-Data.db
> already completed or aborted (perhaps session failed?).
>
> On Mon, Jan 8, 2018 at 2:21 AM, kurt greaves <k...@instaclustr.com> wrote:
> > If you're on 3.9 it's likely unrelated as streaming_socket_timeout_in_ms is
> > 48 hours. Appears rebuild is trying to stream the same file twice. Are there
> > other exceptions in the logs related to the file, or can you find out if
> > it's previously been sent by the same session? Search the logs for the file
> > that failed and post back any exceptions.
> >
> > On 29 December 2017 at 10:18, Martin Mačura <m.mac...@gmail.com> wrote:
> >>
> >> Is this something that can be resolved by CASSANDRA-11841 ?
> >>
> >> Thanks,
> >>
> >> Martin
> >>
> >> On Thu, Dec 21, 2017 at 3:02 PM, Martin Mačura <m.mac...@gmail.com> wrote:
> >> > Hi all,
> >> > we are trying to add a new datacenter to the existing cluster, but the
> >> > 'nodetool rebuild' command always fails after a couple of hours.
> >> >
> >> > We're on Cassandra 3.9.
> >> >
> >> > Example 1:
> >> >
> >> > 172.24.16.169 INFO [STREAM-IN-/172.25.16.125:55735] 2017-12-13
> >> > 23:55:38,840 StreamResultFuture.java:174 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
> >> > Receiving 0 files(0.000KiB), sending 9844 files(885.587GiB)
> >> > 172.25.16.125 INFO [STREAM-IN-/172.24.16.169:7000] 2017-12-13
> >> > 23:55:38,858 StreamResultFuture.java:174 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
> >> > Receiving 9844 files(885.587GiB), sending 0 files(0.000KiB)
> >> >
> >> > 172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:55735] 2017-12-14
> >> > 04:28:09,064 StreamSession.java:533 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> >> > session with peer 172.25.16.125
> >> > 172.24.16.169 java.io.IOException: Connection reset by peer
> >> >
> >> > 172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:49412] 2017-12-14
> >> > 07:26:26,832 StreamSession.java:533 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> >> > session with peer 172.25.16.125
> >> > 172.24.16.169 java.lang.RuntimeException: Transfer of file
> >> > <redacted>-13d700008e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db
> >> > already completed or aborted (perhaps session failed?).
> >> > 172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-14
> >> > 07:26:50,004 StreamSession.java:533 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> >> > session with peer 172.24.16.169
> >> > 172.25.16.125 java.io.IOException: Connection reset by peer
> >> >
> >> > Example 2:
> >> >
> >> > 172.24.16.169 INFO [STREAM-IN-/172.25.16.125:35202] 2017-12-18
> >> > 03:24:31,423 StreamResultFuture.java:174 - [Stream
> >> > #95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed.
> >> > Receiving 0 files(0.000KiB), sending 12312 files(895.973GiB)
> >> > 172.25.16.125 INFO [STREAM-IN-/172.24.16.169:7000] 2017-12-18
> >> > 03:24:31,441 StreamResultFuture.java:174 - [Stream
> >> > #95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed.
> >> > Receiving 12312 files(895.973GiB), sending 0 files(0.000KiB)
> >> >
> >> > 172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:35202] 2017-12-18
> >> > 06:39:42,049 StreamSession.java:533 - [Stream
> >> > #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
> >> > session with peer 172.25.16.125
> >> > 172.24.16.169 java.io.IOException: Connection reset by peer
> >> >
> >> > 172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:42744] 2017-12-18
> >> > 09:25:36,188 StreamSession.java:533 - [Stream
> >> > #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
> >> > session with peer 172.25.16.125
> >> > 172.24.16.169 java.lang.RuntimeException: Transfer of file
> >> > <redacted>-3b5782d08e4411e6842917253f111990/mc-152979-big-Data.db
> >> > already completed or aborted (perhaps session failed?).
> >> > 172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-18
> >> > 09:25:59,447 StreamSession.java:533 - [Stream
> >> > #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
> >> > session with peer 172.24.16.169
> >> > 172.25.16.125 java.io.IOException: Connection timed out
> >> >
> >> > Datacenter: PRIMARY
> >> > ===================
> >> > Status=Up/Down
> >> > |/ State=Normal/Leaving/Joining/Moving
> >> > --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> >> > UN  172.24.16.169  918.31 GiB  256     100.0%            bc4a980b-cca6-4ca2-b32f-f8206d48e14c  RAC1
> >> > UN  172.24.16.170  908.76 GiB  256     100.0%            37b2742e-c83a-4341-896f-09d244810e69  RAC1
> >> > UN  172.24.16.171  908.44 GiB  256     100.0%            6dc2b9d8-75dd-48f8-858c-53b1af42e8fb  RAC1
> >> > Datacenter: SECONDARY
> >> > =====================
> >> > Status=Up/Down
> >> > |/ State=Normal/Leaving/Joining/Moving
> >> > --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> >> > UN  172.25.16.125  27.48 GiB   256     100.0%            1e1669eb-cfd2-4718-a073-558946a8c947  RAC2
> >> > UN  172.25.16.124  28.24 GiB   256     100.0%            896d9894-10c8-4269-9476-5ddab3c8abe9  RAC2
> >> >
> >> > Any ideas?
> >> >
> >> > Thanks,
> >> >
> >> > Martin
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: user-h...@cassandra.apache.org
> >>
> >
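kurt suggested searching the logs for each failed file. Separately, the thread asks whether the failures line up with streaming_socket_timeout_in_ms. Both checks can be scripted against the node logs. What follows is a minimal Python sketch, not a Cassandra tool. It assumes log entries in the format quoted in the examples, each on a single line (not wrapped as in the email), optionally prefixed by the host address as in Martin's merged logs. For each stream session, it measures the time from "Prepare completed" to the first "Streaming error occurred". It also counts how often each file appears in a "Transfer of file ... already completed or aborted" message. The names `analyse`, `ENTRY` and `FAILED_FILE` are hypothetical. The 48-hour default is simply the value quoted in kurt's reply; check the `streaming_socket_timeout_in_ms` actually set in your cassandra.yaml.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Header of a streaming log entry, e.g.
#   ERROR [STREAM-IN-/172.25.16.125:55735] 2017-12-14 04:28:09,064 StreamSession.java:533 - [Stream #<uuid>] ...
# The leading host address seen in the thread's merged logs is optional.
ENTRY = re.compile(
    r"^(?:\S+\s+)?(?P<level>TRACE|DEBUG|INFO|WARN|ERROR)\s+\[[^\]]+\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+\S+\s+-\s+"
    r"\[Stream #(?P<session>[0-9a-f-]{36})[^\]]*\]\s+(?P<msg>.*)$"
)
# Exception line naming the file whose transfer failed.
FAILED_FILE = re.compile(r"Transfer of file (?P<path>\S+) already completed or aborted")


def analyse(log_text, timeout=timedelta(hours=48)):
    """Return (per-session time until first error, failed-file counts).

    Hypothetical helper; the 48 h default is the figure quoted in the
    thread, not a value read from cassandra.yaml.
    """
    prepared, first_error = {}, {}
    failed_files = Counter()
    for line in log_text.splitlines():
        entry = ENTRY.match(line)
        if entry:
            ts = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S,%f")
            sid, msg = entry["session"], entry["msg"]
            # Keep the earliest timestamp per session: logs merged from
            # several nodes are not necessarily in chronological order.
            if msg.startswith("Prepare completed"):
                prepared[sid] = min(ts, prepared.get(sid, ts))
            elif msg.startswith("Streaming error occurred"):
                first_error[sid] = min(ts, first_error.get(sid, ts))
            continue
        failure = FAILED_FILE.search(line)
        if failure:
            failed_files[failure["path"]] += 1

    elapsed = {}
    for sid, start in prepared.items():
        if sid in first_error:
            took = first_error[sid] - start
            # True means the session lived long enough to reach the timeout.
            elapsed[sid] = (took, took >= timeout)
    return elapsed, failed_files


# Example 1 from the thread, with each log entry joined onto one line.
SAMPLE = """\
172.24.16.169 INFO [STREAM-IN-/172.25.16.125:55735] 2017-12-13 23:55:38,840 StreamResultFuture.java:174 - [Stream #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed. Receiving 0 files(0.000KiB), sending 9844 files(885.587GiB)
172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:55735] 2017-12-14 04:28:09,064 StreamSession.java:533 - [Stream #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on session with peer 172.25.16.125
172.24.16.169 java.io.IOException: Connection reset by peer
172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:49412] 2017-12-14 07:26:26,832 StreamSession.java:533 - [Stream #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on session with peer 172.25.16.125
172.24.16.169 java.lang.RuntimeException: Transfer of file <redacted>-13d700008e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db already completed or aborted (perhaps session failed?).
"""

elapsed, failed_files = analyse(SAMPLE)
for sid, (took, hit_timeout) in elapsed.items():
    print(sid, took, "timeout reached" if hit_timeout else "well before timeout")
print("files reported more than once:", [p for p, n in failed_files.items() if n > 1])
```

On this Example 1 excerpt, the first streaming error shows up about four and a half hours after "Prepare completed". That is far below the 48 hours quoted by kurt. No file is reported more than once, which matches what Martin saw. Counting is done on the exact path string, so the same SSTable logged under different prefixes would be counted separately.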