Re: removing a drive - 4.0.1

2022-01-07 Thread Joe Obernberger

Thank you Dmitry.
At this point, the one node where I removed the first drive from the list 
and then rebuilt it is in some odd state.  Locally, nodetool status 
shows it as up (UN), but all the other nodes in the cluster show it as 
down (DN).


Not sure what to do at this juncture.
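
For what it's worth, here is what I'm comparing between the odd node and a
healthy one while I wait for advice (standard nodetool commands, nothing
changed yet; 172.16.100.39 is the node in question):

    # schema and ring view, run on both sides
    nodetool describecluster

    # gossip state for the odd node, as seen from a healthy node
    nodetool gossipinfo | grep -A 10 172.16.100.39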

-Joe

On 1/7/2022 4:38 PM, Dmitry Saprykin wrote:

There is a jira ticket describing your situation
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-14793

I may be wrong, but it seems that the system directories are pinned to the 
first data directory in cassandra.yaml by default. When you removed the 
first item from the list, the system data was regenerated in the new first 
directory in the list, and then merged (?) when the original first dir returned.


On Fri, Jan 7, 2022 at 4:23 PM Joe Obernberger wrote:


Hi - in order to get the node back up and running I did the following:
Deleted all data on the node, added -Dcassandra.replace_address=172.16.100.39
to the cassandra.env.sh file, and
started it up.  It is currently bootstrapping.

In cassandra.yaml, say you have the following:

data_file_directories:
    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
    - /data/8/cassandra

If I change the above to:
#    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
    - /data/8/cassandra

the problem happens.  If I change it to:

    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
#    - /data/8/cassandra

the node starts up OK.  I assume it will recover the missing data
during a repair?

-Joe

On 1/7/2022 4:13 PM, Mano ksio wrote:

Hi, you may have already tried, but this may help.

https://stackoverflow.com/questions/29323709/unable-to-start-cassandra-node-already-exists


Can you elaborate a little on 'If I remove a drive other than the
first one'?  What does that mean?

On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger wrote:

Hi All - I have a 13 node cluster running Cassandra 4.0.1. 
If I stop a
node, edit the cassandra.yaml file, comment out the first
drive in the
list, and restart the node, it fails to start saying that a
node already
exists in the cluster with the IP address.

If I put the drive back into the list, the node still fails
to start
with the same error.  At this point the node is useless and I
think the
only option is to remove all the data, and re-bootstrap it?
-

ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.39:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)

---

If I remove a drive other than the first one, this problem
doesn't
occur.  Any other options?  It appears that if the first
drive in the
list goes bad, or is just removed, that entire node must be
replaced.

-Joe






Re: about memory problem in write heavy system..

2022-01-07 Thread daemeon reiydelle
Maybe SSD's? Take a look at the IO read/write wait times.

FYI, your config changes simply push more activity into memory. Trading IO
for mem footprint ;{)
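
Something along these lines is what I mean (assumes the sysstat tools are
installed; substitute your actual data devices):

    iostat -x 5 3     # watch r_await / w_await and %util on the data disks
    vmstat 5 3        # the 'b' and 'wa' columns show blocked tasks and IO wait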

*Daemeon Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*

Cognitive Bias: (written in 1935) ...
One of the painful things about our time is that those who feel certainty
are stupid, and those with any imagination and understanding are filled
with doubt and indecision. - Bertrand Russell



On Fri, Jan 7, 2022 at 8:27 AM Jeff Jirsa  wrote:

> 3.11.4 is a very old release, with lots of known bugs. It's possible the
> memory is related to that.
>
> If you bounce one of the old nodes, where does the memory end up?
>
>
> On Thu, Jan 6, 2022 at 3:44 PM Eunsu Kim  wrote:
>
>>
>> Looking at the memory usage chart, it seems that the physical memory
>> usage of the existing node has increased since the new node was added with
>> auto_bootstrap=false.
>>
>>
>>
>>
>> On Fri, Jan 7, 2022 at 1:11 AM Eunsu Kim  wrote:
>>
>>> Hi,
>>>
>>> I have a Cassandra cluster (3.11.4) that does heavy write work
>>> (14k~16k writes per second per node).
>>>
>>> The nodes are physical machines in a data center; there are 30 nodes.  Each
>>> node has three data disks mounted.
>>>
>>>
>>> A few days ago, a QueryTimeout problem occurred due to Full GC.
>>> So, referring to this blog(
>>> https://thelastpickle.com/blog/2018/04/11/gc-tuning.html), it seemed to
>>> have been solved by changing the memtable_allocation_type to
>>> offheap_objects.
>>>
>>> But today, I got an alarm saying that some nodes are using more than 90%
>>> of physical memory. (115GiB /125GiB)
>>>
>>> Native memory usage of some nodes is gradually increasing.
>>>
>>>
>>>
>>> All tables use TWCS, and TTL is 2 weeks.
>>>
>>> Below is the applied jvm option.
>>>
>>> -Xms31g
>>> -Xmx31g
>>> -XX:+UseG1GC
>>> -XX:G1RSetUpdatingPauseTimePercent=5
>>> -XX:MaxGCPauseMillis=500
>>> -XX:InitiatingHeapOccupancyPercent=70
>>> -XX:ParallelGCThreads=24
>>> -XX:ConcGCThreads=24
>>> …
>>>
>>>
>>> What additional things can I try?
>>>
>>> I am looking forward to the advice of experts.
>>>
>>> Regards.
>>>
>>
>>


Re: removing a drive - 4.0.1

2022-01-07 Thread Dmitry Saprykin
There is a jira ticket describing your situation
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-14793

I may be wrong, but it seems that the system directories are pinned to the first
data directory in cassandra.yaml by default. When you removed the first item
from the list, the system data was regenerated in the new first directory in the
list, and then merged (?) when the original first dir returned.
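
One way to sanity-check this (just a filesystem look; the paths below assume
the data_file_directories layout from your cassandra.yaml) is to see which of
the data directories currently hold copies of the local system tables:

    ls -d /data/*/cassandra/system/local-*
    ls -d /data/*/cassandra/system/peers* 2>/dev/null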

On Fri, Jan 7, 2022 at 4:23 PM Joe Obernberger wrote:

> Hi - in order to get the node back up and running I did the following:
> Deleted all data on the node, added -Dcassandra.replace_address=172.16.100.39
> to the cassandra.env.sh file, and started it up.  It is currently
> bootstrapping.
>
> In cassandra.yaml, say you have the following:
>
> data_file_directories:
> - /data/1/cassandra
> - /data/2/cassandra
> - /data/3/cassandra
> - /data/4/cassandra
> - /data/5/cassandra
> - /data/6/cassandra
> - /data/7/cassandra
> - /data/8/cassandra
>
> If I change the above to:
> #- /data/1/cassandra
> - /data/2/cassandra
> - /data/3/cassandra
> - /data/4/cassandra
> - /data/5/cassandra
> - /data/6/cassandra
> - /data/7/cassandra
> - /data/8/cassandra
>
> the problem happens.  If I change it to:
>
> - /data/1/cassandra
> - /data/2/cassandra
> - /data/3/cassandra
> - /data/4/cassandra
> - /data/5/cassandra
> - /data/6/cassandra
> - /data/7/cassandra
> #- /data/8/cassandra
>
> the node starts up OK.  I assume it will recover the missing data during a
> repair?
>
> -Joe
> On 1/7/2022 4:13 PM, Mano ksio wrote:
>
> Hi, you may have already tried, but this may help.
> https://stackoverflow.com/questions/29323709/unable-to-start-cassandra-node-already-exists
>
> Can you elaborate a little on 'If I remove a drive other than the first one'?
> What does that mean?
>
> On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger <joseph.obernber...@gmail.com> wrote:
>
>> Hi All - I have a 13 node cluster running Cassandra 4.0.1.  If I stop a
>> node, edit the cassandra.yaml file, comment out the first drive in the
>> list, and restart the node, it fails to start saying that a node already
>> exists in the cluster with the IP address.
>>
>> If I put the drive back into the list, the node still fails to start
>> with the same error.  At this point the node is useless and I think the
>> only option is to remove all the data, and re-bootstrap it?
>> -
>>
>> ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 - Exception encountered during startup
>> java.lang.RuntimeException: A node with address /172.16.100.39:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
>>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
>>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
>>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
>>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
>>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
>>
>> ---
>>
>> If I remove a drive other than the first one, this problem doesn't
>> occur.  Any other options?  It appears that if the first drive in the
>> list goes bad, or is just removed, that entire node must be replaced.
>>
>> -Joe
>>
>>
>
>
>


Re: removing a drive - 4.0.1

2022-01-07 Thread Joe Obernberger

Hi - in order to get the node back up and running I did the following:
Deleted all data on the node, added -Dcassandra.replace_address=172.16.100.39
to the cassandra.env.sh file, and started it up.  It is currently
bootstrapping.
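
For reference, the change was roughly this (our env file is cassandra.env.sh;
the stock file is usually named cassandra-env.sh, so adjust the path for your
install):

    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=172.16.100.39"

and I'm watching the streaming progress on the joining node with:

    nodetool netstats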


In cassandra.yaml, say you have the following:

data_file_directories:
    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
    - /data/8/cassandra

If I change the above to:
#    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
    - /data/8/cassandra

the problem happens.  If I change it to:

    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
#    - /data/8/cassandra

the node starts up OK.  I assume it will recover the missing data during 
a repair?
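
(My working assumption, not something I've verified, is that a primary-range
repair on each node will bring the missing replicas back once it has rejoined,
along the lines of:

    nodetool repair -pr <keyspace>

with <keyspace> as a placeholder for each of our keyspaces.)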


-Joe

On 1/7/2022 4:13 PM, Mano ksio wrote:
Hi, you may have already tried, but this may help. 
https://stackoverflow.com/questions/29323709/unable-to-start-cassandra-node-already-exists 



Can you elaborate a little on 'If I remove a drive other than the first 
one'?  What does that mean?


On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger wrote:


Hi All - I have a 13 node cluster running Cassandra 4.0.1.  If I
stop a
node, edit the cassandra.yaml file, comment out the first drive in
the
list, and restart the node, it fails to start saying that a node
already
exists in the cluster with the IP address.

If I put the drive back into the list, the node still fails to start
with the same error.  At this point the node is useless and I
think the
only option is to remove all the data, and re-bootstrap it?
-

ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.39:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)

---

If I remove a drive other than the first one, this problem doesn't
occur.  Any other options?  It appears that if the first drive
in the
list goes bad, or is just removed, that entire node must be replaced.

-Joe


 

Re: removing a drive - 4.0.1

2022-01-07 Thread Mano ksio
Hi, you may have already tried, but this may help.
https://stackoverflow.com/questions/29323709/unable-to-start-cassandra-node-already-exists

Can you elaborate a little on 'If I remove a drive other than the first one'?
What does that mean?

On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger wrote:

> Hi All - I have a 13 node cluster running Cassandra 4.0.1.  If I stop a
> node, edit the cassandra.yaml file, comment out the first drive in the
> list, and restart the node, it fails to start saying that a node already
> exists in the cluster with the IP address.
>
> If I put the drive back into the list, the node still fails to start
> with the same error.  At this point the node is useless and I think the
> only option is to remove all the data, and re-bootstrap it?
> -
>
> ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 - Exception encountered during startup
> java.lang.RuntimeException: A node with address /172.16.100.39:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
>
> ---
>
> If I remove a drive other than the first one, this problem doesn't
> occur.  Any other options?  It appears that if the first drive in the
> list goes bad, or is just removed, that entire node must be replaced.
>
> -Joe
>
>


removing a drive - 4.0.1

2022-01-07 Thread Joe Obernberger
Hi All - I have a 13 node cluster running Cassandra 4.0.1.  If I stop a 
node, edit the cassandra.yaml file, comment out the first drive in the 
list, and restart the node, it fails to start saying that a node already 
exists in the cluster with the IP address.


If I put the drive back into the list, the node still fails to start 
with the same error.  At this point the node is useless and I think the 
only option is to remove all the data, and re-bootstrap it?

-

ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.39:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)


---

If I remove a drive other than the first one, this problem doesn't 
occur.  Any other options?  It appears that if the first drive in the 
list goes bad, or is just removed, that entire node must be replaced.


-Joe



Re: about memory problem in write heavy system..

2022-01-07 Thread Jeff Jirsa
3.11.4 is a very old release, with lots of known bugs. It's possible the
memory is related to that.

If you bounce one of the old nodes, where does the memory end up?
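
If it helps, a couple of generic ways to see where it sits before and after
the bounce (plain nodetool plus ps; nothing specific to your setup assumed):

    nodetool info | grep -i heap                      # heap vs. off-heap memory as Cassandra reports it
    ps -o rss=,vsz= -p $(pgrep -f CassandraDaemon)    # resident set size of the JVM process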


On Thu, Jan 6, 2022 at 3:44 PM Eunsu Kim  wrote:

>
> Looking at the memory usage chart, it seems that the physical memory usage
> of the existing node has increased since the new node was added with
> auto_bootstrap=false.
>
>
>
>
> On Fri, Jan 7, 2022 at 1:11 AM Eunsu Kim  wrote:
>
>> Hi,
>>
>> I have a Cassandra cluster (3.11.4) that does heavy write work (14k~16k
>> writes per second per node).
>>
>> The nodes are physical machines in a data center; there are 30 nodes.  Each
>> node has three data disks mounted.
>>
>>
>> A few days ago, a QueryTimeout problem occurred due to Full GC.
>> So, referring to this blog(
>> https://thelastpickle.com/blog/2018/04/11/gc-tuning.html), it seemed to
>> have been solved by changing the memtable_allocation_type to
>> offheap_objects.
>>
>> But today, I got an alarm saying that some nodes are using more than 90%
>> of physical memory. (115GiB /125GiB)
>>
>> Native memory usage of some nodes is gradually increasing.
>>
>>
>>
>> All tables use TWCS, and TTL is 2 weeks.
>>
>> Below is the applied jvm option.
>>
>> -Xms31g
>> -Xmx31g
>> -XX:+UseG1GC
>> -XX:G1RSetUpdatingPauseTimePercent=5
>> -XX:MaxGCPauseMillis=500
>> -XX:InitiatingHeapOccupancyPercent=70
>> -XX:ParallelGCThreads=24
>> -XX:ConcGCThreads=24
>> …
>>
>>
>> What additional things can I try?
>>
>> I am looking forward to the advice of experts.
>>
>> Regards.
>>
>
>