Hi,

The error shows that the cqlsh connection to the down node failed, so the first thing to work out is why that connection was attempted at all.

Although you pointed cqlsh at another node ('10.0.0.154'), my guess is that the driver discovered the down node and kept it in its connection pool, so a connection to it was still attempted.
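
To illustrate what the driver does underneath (this is not a cqlsh switch, just a minimal sketch with the Python cassandra-driver that cqlsh is built on; the node list is simply the four live addresses from your nodetool output), you can restrict a client to a whitelist so that 10.0.0.47 is never pooled:

from cassandra.cluster import Cluster
from cassandra.policies import WhiteListRoundRobinPolicy

# Only the live nodes from 'nodetool status' (illustrative list).
live_nodes = ['10.0.0.82', '10.0.0.154', '10.0.0.76', '10.0.0.94']

cluster = Cluster(
    contact_points=['10.0.0.154'],
    # The driver will only ever open connections to these hosts,
    # so the down node never ends up in the connection pool.
    load_balancing_policy=WhiteListRoundRobinPolicy(live_nodes),
)
session = cluster.connect('x')          # 'x' / 'y' stand in for your X.Y
for row in session.execute('SELECT key, column1, value FROM y'):
    pass  # write the row out however you need

By default (no whitelist) the driver discovers every peer in system.peers and tries to pool all of them, which matches the errors you are seeing.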

Ideally, data availability should not suffer from losing one replica out of five: with RF=5, even a QUORUM read needs only 3 replicas, and 4 nodes are up. Note that the stack trace is about the cqlsh/driver connection itself, not about reading the data.

I think once the connection issue is sorted out, COPY should work as usual.
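
If it still struggles with transient timeouts, you can also be more generous with the COPY retry/timeout knobs (these are standard COPY TO options; the values below are only illustrative):

cqlsh 10.0.0.154 -e "COPY X.Y TO 'backup/X.Y' WITH NUMPROCESSES=1 AND MAXATTEMPTS=10 AND PAGETIMEOUT=60"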

Regards,
Anup


On 30 June 2018 at 15:05, Dmitry Simonov <dimmobor...@gmail.com> wrote:

> Hello!
>
> I have a Cassandra cluster with 5 nodes.
> There is a (relatively small) keyspace X with RF=5.
> One node goes down.
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.0.0.82   253.64 MB  256     100.0%            839bef9d-79af-422c-a21f-33bdcf4493c1  rack1
> UN  10.0.0.154  255.92 MB  256     100.0%            ce23f3a7-67d2-47c0-9ece-7a5dd67c4105  rack1
> UN  10.0.0.76   461.26 MB  256     100.0%            c8e18603-0ede-43f0-b713-3ff47ad92323  rack1
> UN  10.0.0.94   575.78 MB  256     100.0%            9a324dbc-5ae1-4788-80e4-d86dcaae5a4c  rack1
> DN  10.0.0.47   ?          256     100.0%            7b628ca2-4e47-457a-ba42-5191f7e5374b  rack1
>
> I try to export some data using COPY TO, but it fails after long retries.
> Why does it fail?
> How can I make a copy?
> There must be 4 copies of each row on other (alive) replicas.
>
> cqlsh 10.0.0.154 -e "COPY X.Y TO 'backup/X.Y' WITH NUMPROCESSES=1"
>
> Using 1 child processes
>
> Starting copy of X.Y with columns [key, column1, value].
> 2018-06-29 19:12:23,661 Failed to create connection pool for new host
> 10.0.0.47:
> Traceback (most recent call last):
>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py",
> line 2476, in run_add_or_renew_pool
>     new_pool = HostConnection(host, distance, self)
>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/pool.py",
> line 332, in __init__
>     self._connection = session.cluster.connection_factory(host.address)
>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py",
> line 1205, in connection_factory
>     return self.connection_class.factory(address, self.connect_timeout,
> *args, **kwargs)
>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py",
> line 332, in factory
>     conn = cls(host, *args, **kwargs)
>   File 
> "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/io/asyncorereactor.py",
> line 344, in __init__
>     self._connect_socket()
>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py",
> line 371, in _connect_socket
>     raise socket.error(sockerr.errno, "Tried connecting to %s. Last error:
> %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
> OSError: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last
> error: timed out
> 2018-06-29 19:12:23,665 Host 10.0.0.47 has been marked down
> 2018-06-29 19:12:29,674 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 2.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:12:36,684 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 4.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:12:45,696 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 8.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:12:58,716 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 16.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:13:19,756 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 32.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:13:56,834 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 64.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:15:05,887 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 128.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:17:18,982 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 256.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:21:40,064 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 512.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> <stdin>:1:(4, 'Interrupted system call')
> IOError:
> IOError:
> IOError:
> IOError:
> IOError:
>
>
> --
> Best Regards,
> Dmitry Simonov
>



-- 

Anup Shirolkar

Consultant

+61 420 602 338


Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.
