RE: [EXTERNAL] Cassandra 3.11.X upgrades

2020-02-13 Thread Durity, Sean R
+1 on nodetool drain. I added that to our upgrade automation and it really helps with post-upgrade start-up time. Sean Durity From: Erick Ramirez Sent: Wednesday, February 12, 2020 10:29 PM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Cassandra 3.11.X upgrades Yes to the steps. The
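As a rough illustration of adding drain to upgrade automation, the sketch below builds the per-node command sequence as a dry run. Only `nodetool drain` comes from the thread; the function name and the stop command passed in are hypothetical placeholders for whatever your service manager uses.

```python
def pre_stop_commands(stop_command):
    """Return the commands to run on one node before upgrading it, in order."""
    if not stop_command:
        raise ValueError("stop_command must be a non-empty argv list")
    return [
        # Flush memtables and stop accepting writes, so the commit log has
        # little or nothing to replay at the next start-up.
        ["nodetool", "drain"],
        # Hypothetical: supplied by the caller (systemd, init script, ...).
        list(stop_command),
    ]
```

The result is only a list of argv lists; an automation tool could print it for review or hand each entry to `subprocess.run`.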

Re: Connection reset by peer

2020-02-13 Thread Erick Ramirez
> Last question: In all your experiences, how high can the latency (simple ping response times) go before it becomes a problem? (Obviously the lower the better, but is there some sort of cut-off/formula where problems can be expected intermittently, like the connection resets?)

RE: [EXTERNAL] Re: Cassandra Encyrption between DC

2020-02-13 Thread Durity, Sean R
I will just add that I usually reserve security changes as the primary exception where app downtime may be necessary with Cassandra. (DSE has some Transitional tools that are useful, though.) Sometimes a short outage is preferred over a longer, more complicated attempt to keep the app up.

Corruption of frozen UDT during upgrade

2020-02-13 Thread Paul Chandler
Hi all, I have looked at the release notes for the upcoming release 3.11.6 and seen the part about corruption of frozen UDT types during upgrade from 3.0. We have a number of clusters using UDTs and have been upgrading to 3.11.4 and haven't noticed any problems. In the ticket ( CASSANDRA-15035

Re: Connection reset by peer

2020-02-13 Thread Reid Pinchback
Since ping is ICMP, not TCP, you probably want to investigate a mix of TCP and CPU stats to see what is behind the slow pings. I’d guess you are getting network impacts beyond what the ping times are hinting at. ICMP isn’t subject to retransmission, so your TCP situation could be far worse
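One way to look at the TCP side rather than ICMP is to time the TCP handshake itself. The sketch below is a minimal illustration, not a replacement for proper TCP and CPU statistics; the function names are hypothetical, and the host and port you probe are your own choice.

```python
import socket
import time


def tcp_connect_latency(host, port, timeout=5.0):
    """Return the seconds taken to complete a TCP handshake with host:port."""
    start = time.perf_counter()
    # create_connection returns only after the three-way handshake completes.
    with socket.create_connection((host, port), timeout=timeout):
        return time.perf_counter() - start


def summarize(samples):
    """Return (min, middle, max) of a non-empty list of latency samples.

    For an even number of samples the upper of the two middle values is used.
    """
    ordered = sorted(samples)
    return ordered[0], ordered[len(ordered) // 2], ordered[-1]
```

Collecting a few dozen samples against a port the remote node actually listens on and comparing them with ping times can hint at whether TCP is faring worse than ICMP suggests.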

Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Hi, I see a corrupt SSTable in one of my keyspace tables on one node. The cluster is 3 nodes with replication 3. Cassandra version is 3.11.2. I am thinking along the following lines to resolve the corrupt SSTable issue. 1. Run nodetool scrub. 2. If step 1 fails, run offline sstablescrub. 3. If step 2 fails,

Re: [EXTERNAL] Re: Cassandra Encyrption between DC

2020-02-13 Thread Jai Bheemsen Rao Dhanwada
thank you On Thu, Feb 13, 2020 at 6:30 AM Durity, Sean R wrote: > I will just add-on that I usually reserve security changes as the primary > exception where app downtime may be necessary with Cassandra. (DSE has some > Transitional tools that are useful, though.) Sometimes a short outage is >

Re: [EXTERNAL] Cassandra 3.11.X upgrades

2020-02-13 Thread Sergio
- Verify that nodetool upgradesstables has completed successfully on all nodes from any previous upgrade - Turn off repairs and any other streaming operations (add/remove nodes) - Nodetool drain on the node that needs to be stopped (seeds first, preferably) - Stop an un-upgraded

New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Hi guys! I don't know how, but this is the first time that I have seen such behavior. I wanted to add a new node to the cluster and it looks to be working fine, but instead of waiting 2-3 hours for data streaming (around 100GB) it immediately went to the UN (UP and NORMAL) state. I saw a bunch of exceptions in

Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Erick Ramirez
You need to stop C* in order to run the offline sstable scrub utility. That's why it's referred to as "offline". :) Do you have any idea on what caused the corruption? It's highly unusual that you're thinking of removing all the files for just one table. Typically if the corruption was a result
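The online/offline distinction can be sketched as a small dry-run helper that picks the next scrub command and reports whether Cassandra has to be stopped first. The function name and return shape are hypothetical; `nodetool scrub` and `sstablescrub` are the tools named in this thread, and the keyspace and table are passed as arguments.

```python
def scrub_plan(keyspace, table, online_scrub_failed):
    """Return (argv, must_stop_cassandra) for the next scrub attempt."""
    if not online_scrub_failed:
        # Online scrub runs through nodetool against the live node.
        return ["nodetool", "scrub", keyspace, table], False
    # The offline utility must only be run while Cassandra is stopped.
    return ["sstablescrub", keyspace, table], True
```

Nothing is executed here; the caller decides how to stop the node and run the returned command.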

Re: Corruption of frozen UDT during upgrade

2020-02-13 Thread Erick Ramirez
Paul, if you do a sstabledump in C* 3.0 (before upgrading) and compare it to the dump output after upgrading to C* 3.11 then you will see that the cell names in the outputs are different. This is the symptom of the broken serialization header which leads to various exceptions during compactions

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Jon Haddad
Seeds don't bootstrap, don't list new nodes as seeds. On Thu, Feb 13, 2020 at 5:23 PM Sergio wrote: > Hi guys! > > I don't know how but this is the first time that I see such behavior. I > wanted to add a new node in the cluster and it looks to be working fine but > instead to wait for 2-3
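A pre-flight check along these lines could catch the mistake before a new node starts. This is a minimal sketch: it assumes the seed list is available as the comma-separated string used for `seeds` in cassandra.yaml, and the function names are hypothetical.

```python
def parse_seeds(seeds_value):
    """Split a comma-separated seeds string into a list of addresses."""
    return [s.strip() for s in seeds_value.split(",") if s.strip()]


def will_skip_bootstrap(seeds_value, node_address):
    """Return True if the node lists itself as a seed and so will not bootstrap."""
    return node_address in parse_seeds(seeds_value)
```

For example, `will_skip_bootstrap("10.0.0.1, 10.0.0.9", "10.0.0.9")` is true, so a new node at 10.0.0.9 with that configuration would join without streaming data.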

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Should I do something to fix it or leave it as is? On Thu, Feb 13, 2020, 5:29 PM Jon Haddad wrote: > Seeds don't bootstrap, don't list new nodes as seeds. > > On Thu, Feb 13, 2020 at 5:23 PM Sergio wrote: > >> Hi guys! >> >> I don't know how but this is the first time that I see such behavior. I

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
> > Should I do something to fix it or leave it as is? It depends on what your intentions are. I would use the "replace" method to build it correctly. At a high level: - remove the IP from its own seeds list - delete the contents of data, commitlog and saved_caches - add the replace flag in

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
> > I want to have more than one seed node in each DC, so unless I don't > restart the node after changing the seed_list in that node it will not > become the seed. That's not really going to hurt you if you have other seeds in other DCs. But if you're willing to take the hit from the restart

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Thank you very much for this helpful information! I opened a new thread for the other question :) Sergio On Thu, Feb 13, 2020 at 7:22 PM Erick Ramirez < erick.rami...@datastax.com> wrote: > I want to have more than one seed node in each DC, so unless I don't >> restart the node

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
> > I did decommission of this node and I did all the steps mentioned except > the -Dcassandra.replace_address and now it is streaming correctly! That works too but I was trying to avoid the rebalance operations (like streaming to restore replica counts) since they can be expensive. So

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Right now, yes, I have one seed per DC. I want to have more than one seed node in each DC, so unless I restart the node after changing the seed_list on that node, it will not become a seed. Do I need to update the seed_list across all the nodes even in separate DCs and perform a rolling

AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Sergio
We have i3.xlarge instances with the data directory on an ephemeral XFS filesystem and *hints*, *commit_log* and *saved_caches* on an EBS volume. When AWS is going to retire an instance due to degraded hardware performance, which is better: Option 1) - Nodetool drain - Stop cassandra

Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Sergio
Thank you for the advice! Best! Sergio On Thu, Feb 13, 2020, 7:44 PM Erick Ramirez wrote: > Option 1 is a cheaper option because the cluster doesn't need to rebalance > (with the loss of a replica) post-decommission then rebalance again when > you add a new node. > > The hints directory on

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
Not a problem. And I've just responded on the new thread. Cheers!

Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Hi Erick, thanks for the reply. The reason for the corruption is unknown to me. I just found the corrupt table when a scheduled repair failed with logs showing *ERROR [ValidationExecutor:16] 2020-01-21 19:13:18,123 CassandraDaemon.java:228 - Exception in thread

Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Jeff Jirsa
Feels that way and most people don’t do it, but definitely required for strict correctness. > On Feb 13, 2020, at 8:57 PM, Erick Ramirez wrote: > >  > Interesting... though it feels a bit extreme unless you're dealing with a > cluster that's constantly dropping mutations. In which case,

Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Erick Ramirez
It will achieve the outcome you are after but I doubt anyone would recommend that approach. It's like using a sledgehammer when an ordinary hammer would suffice. And if you were hitting some bug then you'd run into the same problem anyway. Can you post the full stack trace? It might provide us

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
I did decommission of this node and I did all the steps mentioned except the -Dcassandra.replace_address and now it is streaming correctly! So basically, if I want this new node as seed should I add its IP address after it joined the cluster and after - nodetool drain - restart cassandra? I

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Thank you very much for your response! 2 things: 1) If I don't restart the node after changing the seed list this will never become the seed and I would like to be sure that I don't find my self in a spot where I don't have seed nodes and this means that I can not add a node in the cluster 2)

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
> > 1) If I don't restart the node after changing the seed list this will > never become the seed and I would like to be sure that I don't find my self > in a spot where I don't have seed nodes and this means that I can not add a > node in the cluster Are you saying you only have 1 seed node in

Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Hi Erick, thanks for your quick response. I have attached the full stack trace, which shows an exception during the validation phase of table repair. I would like to know what will be "ordinary hammer" in this case. Do you want to suggest that deleting only corrupt sstable file ( in this case

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
> > I wanted to add a new node in the cluster and it looks to be working fine > but instead to wait for 2-3 hours data streaming like 100GB it immediately > went to the UN (UP and NORMAL) state. > Are you running a repair? I can't see how it's possibly receiving 100GB since it won't bootstrap.

Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Thanks for your fast reply! No repairs are running! https://cassandra.apache.org/doc/latest/faq/index.html#does-single-seed-mean-single-point-of-failure I added the node IP itself and the IP of existing seeds and I started Cassandra. So the right procedure is not to add in the seed list the

Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Erick Ramirez
Option 1 is a cheaper option because the cluster doesn't need to rebalance (with the loss of a replica) post-decommission then rebalance again when you add a new node. The hints directory on EBS is irrelevant because it would only contain mutations to replay to down replicas if the node was a

Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Jeff Jirsa
Option 1 is only strictly safe if you run repair while the down replica is down (otherwise you violate quorum consistency guarantees). Option 2 is probably easier to manage and won't require any special effort to avoid violating consistency. I'd probably go with option 2. On Thu, Feb 13, 2020

Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Erick Ramirez
Interesting... though it feels a bit extreme unless you're dealing with a cluster that's constantly dropping mutations. In which case, you have bigger problems anyway. :)

Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Erick Ramirez
The log shows that the problem occurs when decompressing the SSTable but there's not much actionable info from it. > I would like to know what will be "ordinary hammer" in this case. Do you > want to suggest that deleting only corrupt sstable file ( in this case > mc-1234-big-*.db) would be

Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Jeff Jirsa
Agree this is both strictly possible and more common with LCS. The only thing that's strictly correct to do is treat every corrupt sstable exception as a failed host, and replace it just like you would a failed host. On Thu, Feb 13, 2020 at 10:55 PM manish khandelwal <

Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Thanks, Erick. I would like to explain how data resurrection can take place with single SSTable deletion. Consider this case of a table with Leveled Compaction Strategy: 1. Data A written a long time back. 2. Data A is deleted and a tombstone is created. 3. After GC grace the tombstone is purgeable. 4.

Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Thanks, Jeff, for your response. Do you see any risk in the following approach: 1. Stop the node. 2. Remove all sstable files from */var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33 * directory. 3. Start the node. 4. Run full repair on this particular table. I wanted to go