Will try that if we get a chance to replicate the node upgrade on that node,
or if we replicate it in a lower environment.

Thanks


On Wed, Apr 17, 2019 at 1:49 PM Jon Haddad <j...@jonhaddad.com> wrote:

> Let me be more specific - run the async java profiler and generate a
> flame graph to determine where CPU time is spent.
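>
> Something like this is what I have in mind (a sketch, assuming async-profiler
> is unpacked under /opt/async-profiler and you know the Cassandra PID; the
> event, duration, and output path are just examples):
>
>     # hypothetical wrapper around profiler.sh; adjust paths and PID for the host
>     import subprocess
>
>     CASSANDRA_PID = "12345"  # e.g. from `pgrep -f CassandraDaemon`
>     PROFILER = "/opt/async-profiler/profiler.sh"
>
>     # sample CPU for 60 seconds and write a flame graph to inspect
>     subprocess.run(
>         [PROFILER, "-e", "cpu", "-d", "60",
>          "-f", "/tmp/cassandra-cpu.svg", CASSANDRA_PID],
>         check=True,
>     )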
>
> On Wed, Apr 17, 2019 at 11:36 AM Jon Haddad <j...@jonhaddad.com> wrote:
> >
> > Run the async java profiler on the node to determine what it's doing:
> > https://github.com/jvm-profiling-tools/async-profiler
> >
> > On Wed, Apr 17, 2019 at 11:31 AM Carl Mueller
> > <carl.muel...@smartthings.com.invalid> wrote:
> > >
> > > No, we just did the package upgrade 2.1.9 --> 2.2.13
> > >
> > > It definitely feels like some indexes are being recalculated or entire
> sstables are being scanned due to suspected corruption.
> > >
> > >
> > > On Wed, Apr 17, 2019 at 12:32 PM Jeff Jirsa <jji...@gmail.com> wrote:
> > >>
> > >> There was a time when changing some of the parameters (especially
> bloom filter FP ratio) would cause the bloom filters to be rebuilt on
> startup if the sstables didn't match what was in the schema, leading to a
> delay like that and similar logs. Any chance you changed the schema on that
> table since the last time you restarted it?
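> > >> If it helps, a quick way to see what the schema currently has for that
> > >> table (a sketch, assuming the Python cassandra-driver and a 2.2-era node
> > >> where table options still live in system.schema_columnfamilies; the
> > >> contact point and table name are placeholders):
> > >>
> > >>     from cassandra.cluster import Cluster
> > >>
> > >>     cluster = Cluster(["127.0.0.1"])   # placeholder contact point
> > >>     session = cluster.connect()
> > >>     row = session.execute(
> > >>         "SELECT bloom_filter_fp_chance FROM system.schema_columnfamilies "
> > >>         "WHERE keyspace_name = %s AND columnfamily_name = %s",
> > >>         ("zzzz", "refresh_token"),     # placeholder keyspace/table
> > >>     ).one()
> > >>     print(row.bloom_filter_fp_chance)
> > >>
> > >>     cluster.shutdown()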
> > >>
> > >>
> > >>
> > >> On Wed, Apr 17, 2019 at 10:30 AM Carl Mueller <
> carl.muel...@smartthings.com.invalid> wrote:
> > >>>
> > >>> Oh, the table in question is SizeTiered, had about 10 sstables
> total, it was JBOD across two data directories.
> > >>>
> > >>> On Wed, Apr 17, 2019 at 12:26 PM Carl Mueller <
> carl.muel...@smartthings.com> wrote:
> > >>>>
> > >>>> We are doing a ton of upgrades to get out of 2.1.x. We've done
> probably 20-30 clusters so far and have not encountered anything like this
> yet.
> > >>>>
> > >>>> After upgrade of a node, the restart takes a long time, around 10
> minutes. Almost all of our other nodes took less than 2 minutes to
> upgrade (aside from sstableupgrades).
> > >>>>
> > >>>> The startup stalls on a particular table; it is the largest table
> at about 300GB, but we have upgraded other clusters with about that much
> data without this 8-10 minute delay. We have the ability to roll back the
> node, and the restart as a 2.1.x node is normal with no delays.
> > >>>>
> > >>>> Alas, this is a prod cluster, so we are going to try to load the
> sstables into a lower environment and attempt to replicate the delay. If we
> can, we will turn on debug logging.
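> > >>>> The replay would look roughly like this (a sketch; the staging host
> > >>>> and the snapshot directory path are placeholders, and the sstables
> > >>>> would be copied there from the prod snapshot first):
> > >>>>
> > >>>>     # hypothetical bulk-load of the snapshotted sstables into staging
> > >>>>     import subprocess
> > >>>>
> > >>>>     STAGING_NODE = "staging-cassandra-1"             # placeholder host
> > >>>>     SSTABLE_DIR = "/data/staging/zzzz/refresh_token" # .../<keyspace>/<table>
> > >>>>
> > >>>>     subprocess.run(
> > >>>>         ["sstableloader", "-d", STAGING_NODE, SSTABLE_DIR],
> > >>>>         check=True,
> > >>>>     )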
> > >>>>
> > >>>> This occurred on the first node we tried to upgrade. It is possible
> it is limited to only this node, but we are gun-shy about playing around with
> upgrades in prod.
> > >>>>
> > >>>> We have an automated upgrading program that flushes, snapshots,
> shuts down gossip, drains before upgrade, suppresses autostart on upgrade,
> and has worked about as flawlessly as one could hope for so far for
> 2.1->2.2 and 2.2->3.11 upgrades.
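> > >>>> Roughly, the pre-upgrade portion is just the standard nodetool
> > >>>> sequence (a sketch; the snapshot tag and error handling here are
> > >>>> illustrative, not the exact tooling):
> > >>>>
> > >>>>     import subprocess
> > >>>>
> > >>>>     def nodetool(*args):
> > >>>>         # run a nodetool verb and fail loudly if it errors
> > >>>>         subprocess.run(["nodetool", *args], check=True)
> > >>>>
> > >>>>     nodetool("flush")
> > >>>>     nodetool("snapshot", "-t", "pre-upgrade")  # illustrative snapshot tag
> > >>>>     nodetool("disablegossip")
> > >>>>     nodetool("drain")
> > >>>>     # package upgrade runs next, with service auto-start suppressed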
> > >>>>
> > >>>> INFO  [main] 2019-04-16 17:22:17,004 ColumnFamilyStore.java:389 -
> Initializing zzzz.access_token
> > >>>> INFO  [main] 2019-04-16 17:22:17,096 ColumnFamilyStore.java:389 -
> Initializing zzzz.refresh_token
> > >>>> INFO  [main] 2019-04-16 17:28:52,929 ColumnFamilyStore.java:389 -
> Initializing zzzz.userid
> > >>>> INFO  [main] 2019-04-16 17:28:52,930 ColumnFamilyStore.java:389 -
> Initializing zzzz.access_token_by_auth
> > >>>>
> > >>>> You can see the roughly six-and-a-half-minute delay in the startup log
> above. All the other keyspaces/tables initialize in under a second.
> > >>>>
> > >>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
