Re: Added node - now queries time out

2021-12-09 Thread Joe Obernberger

This worked - I decommissioned the node and re-added it.

If a drive fails on a Cassandra node, what is the process to bring that 
node back up?


-joe

On 12/3/2021 4:31 PM, Bowen Song wrote:
The load on the new server looks clearly wrong. Are you sure this node 
has fully bootstrapped / rebuilt? If not, the large amount of streaming 
activity triggered by read repair may be enough to cause timeouts. 
Please check the new server's log and make sure it did not fail any 
streaming session when it first joined the cluster. If in doubt, 
remove the node and re-add it, and keep an eye on the log.


On 03/12/2021 20:51, Joe Obernberger wrote:
Hi all - just added a node to an 11 node cluster (4.0.1) and it 
synced up OK, but now all queries are timing out.

This time I made sure the clocks are synced!  :)

Kinda desperate to get this to work again.  What can I check or do? Just 
added the .34 node.  One item of concern is the amount of load/data 
on it compared to the others.
I'm running a repair on the new node, but things like select * from 
table, on a table with maybe 100 rows, time out.

Help!

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns  Host ID                               Rack
UN  172.16.100.45   161.81 GiB  250     ?     07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  172.16.100.251  128.6 GiB   200     ?     660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.252  128.44 GiB  200     ?     e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  128.43 GiB  200     ?     49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   128.79 GiB  200     ?     d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   127.47 GiB  200     ?     93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  2.19 GiB    4       ?     a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  127.74 GiB  200     ?     4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   75.89 GiB   120     ?     08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  128.3 GiB   200     ?     b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  172.16.100.34   29.67 GiB   200     ?     84219e6d-74ac-4d23-89d0-0bd734d0c09e  rack1


-joe
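
[Editor's sketch] The concern about the load on the .34 node can be made concrete by dividing each node's load by its token count and comparing against the cluster median. The figures below are copied from the status output above; the function name `underloaded` and the 50% cutoff are illustrative assumptions, not a Cassandra rule.

```python
from statistics import median

# (address, load in GiB, tokens) copied from the nodetool status output above.
NODES = [
    ("172.16.100.45", 161.81, 250),
    ("172.16.100.251", 128.6, 200),
    ("172.16.100.252", 128.44, 200),
    ("172.16.100.249", 128.43, 200),
    ("172.16.100.36", 128.79, 200),
    ("172.16.100.39", 127.47, 200),
    ("172.16.100.253", 2.19, 4),
    ("172.16.100.248", 127.74, 200),
    ("172.16.100.37", 75.89, 120),
    ("172.16.100.250", 128.3, 200),
    ("172.16.100.34", 29.67, 200),
]


def underloaded(nodes, threshold=0.5):
    """Return (address, GiB per token) for nodes below threshold * median.

    The threshold is an arbitrary illustrative value (assumption).
    """
    per_token = {addr: load / tokens for addr, load, tokens in nodes}
    cutoff = threshold * median(per_token.values())
    return [(addr, round(v, 3)) for addr, v in per_token.items() if v < cutoff]


for addr, gib_per_token in underloaded(NODES):
    print(addr, gib_per_token)
```

With these numbers only the newly added 172.16.100.34 falls below the cutoff: it holds roughly 0.15 GiB per token, while the other nodes sit near 0.64 GiB per token.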





Re: Node failed after drive failed

2021-12-09 Thread Joss
unsubscribe

On Mon, 6 Dec 2021 at 14:12, Joe Obernberger wrote:

> Hi All - one node in an 11 node cluster experienced a drive failure on
> the first drive in the list.  I removed that drive from the list so that
> it now reads:
>
> data_file_directories:
>  - /data/2/cassandra/data
>  - /data/3/cassandra/data
>  - /data/4/cassandra/data
>  - /data/5/cassandra/data
>  - /data/6/cassandra/data
>  - /data/8/cassandra/data
>  - /data/9/cassandra/data
>
> But when I try to start the server, I get:
>
> Exception (java.lang.RuntimeException) encountered during startup: A
> node with address /172.16.100.251:7000 already exists, cancelling join.
> Use cassandra.replace_address if you want to replace this node.
> java.lang.RuntimeException: A node with address /172.16.100.251:7000
> already exists, cancelling join. Use cassandra.replace_address if you
> want to replace this node.
>  at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
>  at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>  at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
>  at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
>  at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
> ERROR [main] 2021-12-05 15:49:48,446 CassandraDaemon.java:909 -
> Exception encountered during startup
> java.lang.RuntimeException: A node with address /172.16.100.251:7000
> already exists, cancelling join. Use cassandra.replace_address if you
> want to replace this node.
>  at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
>  at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>  at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
>  at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
>  at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
> INFO  [StorageServiceShutdownHook] 2021-12-05 15:49:48,468
> HintsService.java:220 - Paused hints dispatch
> WARN  [StorageServiceShutdownHook] 2021-12-05 15:49:48,470
> Gossiper.java:1993 - No local state, state is in silent shutdown, or
> node hasn't joined, not announcing shutdown
>
> Do I need to remove and re-add the node?  When a drive fails with
> cassandra, is it common for the node to come down?
>
> Thank you!
>
> -Joe Obernberger
>
>
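
[Editor's sketch] The startup error quoted above names the `cassandra.replace_address` property. As a rough illustration, the snippet below pulls the colliding address out of that message and formats it as a JVM system property (`-D...`). The helper name `replace_option` is hypothetical. Whether replacing the node is the right recovery here, rather than removing and re-adding it, is not settled in this thread.

```python
import re

# Matches the endpoint-collision message quoted in the thread; \s+ tolerates
# the line wrap between the address and "already exists".
COLLISION = re.compile(
    r"A node with address /(?P<ip>[\d.]+):(?P<port>\d+)\s+already exists"
)


def replace_option(log_text):
    """Return the JVM option named by the error message, or None if absent.

    Hypothetical helper: it only formats a string and does not touch Cassandra.
    """
    match = COLLISION.search(log_text)
    if match is None:
        return None
    return "-Dcassandra.replace_address=" + match.group("ip")


log = (
    "java.lang.RuntimeException: A node with address /172.16.100.251:7000\n"
    "already exists, cancelling join. Use cassandra.replace_address if you\n"
    "want to replace this node."
)
print(replace_option(log))
```

The resulting string would be passed to the JVM when starting the node; where that is configured depends on the installation and is not shown in the thread.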