We are doing a ton of upgrades to get out of 2.1.x. We've done probably
20-30 clusters so far and have not encountered anything like this yet.

After upgrading a node, the restart takes a long time, around 10 minutes.
Almost all of our other nodes took less than 2 minutes to upgrade
(aside from sstable upgrades).

The startup stalls on one particular table. It is the largest table, at
about 300GB, but we have upgraded other clusters with about that much data
without this 8-10 minute delay. We are able to roll back the node, and the
restart as a 2.1.x node is normal with no delays.

Alas, this is a prod cluster, so we are going to try to sstable load the
data into a lower environment and replicate the delay there. If we can, we
will turn on debug logging.

This occurred on the first node we tried to upgrade. It is possible that it
is limited to this node, but we are gun-shy about playing around with
upgrades in prod.

We have an automated upgrade program that flushes, snapshots, shuts down
gossip, drains before the upgrade, and suppresses autostart on upgrade. It
has worked about as flawlessly as one could hope for so far, for both
2.1->2.2 and 2.2->3.11 upgrades.

INFO  [main] 2019-04-16 17:22:17,004 ColumnFamilyStore.java:389 -
Initializing zzzz.access_token
INFO  [main] 2019-04-16 17:22:17,096 ColumnFamilyStore.java:389 -
Initializing zzzz.refresh_token
INFO  [main] 2019-04-16 17:28:52,929 ColumnFamilyStore.java:389 -
Initializing zzzz.userid
INFO  [main] 2019-04-16 17:28:52,930 ColumnFamilyStore.java:389 -
Initializing zzzz.access_token_by_auth

You can see the roughly 6:30 delay in the startup log above, between the
zzzz.refresh_token line (17:22:17) and the zzzz.userid line (17:28:52). All
the other keyspaces/tables initialize in under a second.
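
To find stalls like this across a whole system.log without reading it by
eye, something like the following might help. It is only a rough sketch: it
assumes the log lines look like the excerpt above (timestamp, then
"ColumnFamilyStore.java:NNN - Initializing keyspace.table"), and the names
init_gaps and SAMPLE are made up for illustration. It reports how long each
"Initializing" line waited before the next one appeared.

```python
import re
from datetime import datetime

# Timestamp, source location, then the table being initialized, as in the
# excerpt above. The exact layout is an assumption based on that excerpt.
INIT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \S+ - Initializing (\S+)"
)

def init_gaps(log_text):
    """Return (table, seconds until the next 'Initializing' line) pairs."""
    # Collapse all whitespace so that wrapped log lines still match.
    flat = " ".join(log_text.split())
    events = [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f"), table)
        for ts, table in INIT_RE.findall(flat)
    ]
    # Pair each event with the one after it and measure the time between.
    return [
        (table, (next_ts - ts).total_seconds())
        for (ts, table), (next_ts, _) in zip(events, events[1:])
    ]

# Hypothetical sample: the four lines from the excerpt, wrapped as posted.
SAMPLE = """
INFO  [main] 2019-04-16 17:22:17,004 ColumnFamilyStore.java:389 -
Initializing zzzz.access_token
INFO  [main] 2019-04-16 17:22:17,096 ColumnFamilyStore.java:389 -
Initializing zzzz.refresh_token
INFO  [main] 2019-04-16 17:28:52,929 ColumnFamilyStore.java:389 -
Initializing zzzz.userid
INFO  [main] 2019-04-16 17:28:52,930 ColumnFamilyStore.java:389 -
Initializing zzzz.access_token_by_auth
"""

# Print the slowest initializations first.
for table, seconds in sorted(init_gaps(SAMPLE), key=lambda g: -g[1]):
    print(f"{seconds:10.3f}s  after  {table}")
```

On the excerpt above, the largest gap is the one following
zzzz.refresh_token, at just under 396 seconds. Note that a gap only says the
next "Initializing" line was late; it does not by itself prove which table
caused the stall.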
