Re: Adding nodes

Bowen Song via user Mon, 11 Jul 2022 03:23:24 -0700

You should look for warning and error level logs in the system.log, notthe debug.log or gc.log, and certainly not only the latest lines.

BTW, you may want to spend some time investigating potential GC issuesbased on the GC logs you provided. I can see 1 full GC in the 3 hourssince the node started. It's not necessarily a problem (if it onlyoccasionally happens during the initial bootstraping process), but itshould justify an investigation if this is the first time you've seen it.


On 11/07/2022 11:09, Marc Hoppins wrote:


Service still running. No errors showing.

The latest info is in debug.log

DEBUG [Streaming-EventLoop-4-3] 2022-07-11 12:00:38,902NettyStreamingMessageSender.java:258 - [Stream#befbc5d0-00e7-11ed-860a-a139feb6a78a channel: 053f2911] Sendingkeep-alive

DEBUG [Stream-Deserializer-/10.1.146.174:7000-053f2911] 2022-07-1112:00:39,790 StreamingInboundHandler.java:179 - [Stream#befbc5d0-00e7-11ed-860a-a139feb6a78a channel: 053f2911] Receivedkeep-alive

DEBUG [ScheduledTasks:1] 2022-07-11 12:00:44,688StorageService.java:2398 - Ignoring application state LOAD from/x.x.x.64:7000 because it is not a member in token metadata

DEBUG [ScheduledTasks:1] 2022-07-11 12:01:44,689StorageService.java:2398 - Ignoring application state LOAD from/x.x.x.64:7000 because it is not a member in token metadata

DEBUG [ScheduledTasks:1] 2022-07-11 12:02:44,690StorageService.java:2398 - Ignoring application state LOAD from/x.x.x.64:7000 because it is not a member in token metadata


And

gc.log.1.current

2022-07-11T12:08:40.562+0200: 11122.837: [GC (Allocation Failure)2022-07-11T12:08:40.562+0200: 11122.838: [ParNew


Desired survivor size 41943040 bytes, new threshold 1 (max 1)

- age   1:      57264 bytes,      57264 total

: 655440K->74K(737280K), 0.0289143 secs] 2575800K->1920436K(8128512K),0.0291355 secs] [Times: user=0.23 sys=0.00, real=0.03 secs]


Heap after GC invocations=6532 (full 1):

par new generation total 737280K, used 74K [0x00000005cae00000,0x00000005fce00000, 0x00000005fce00000)

eden space 655360K, 0% used [0x00000005cae00000, 0x00000005cae00000,0x00000005f2e00000)

from space 81920K, 0% used [0x00000005f2e00000, 0x00000005f2e12848,0x00000005f7e00000)

to space 81920K, 0% used [0x00000005f7e00000, 0x00000005f7e00000,0x00000005fce00000)

concurrent mark-sweep generation total 7391232K, used 1920362K[0x00000005fce00000, 0x00000007c0000000, 0x00000007c0000000)

Metaspace used 53255K, capacity 56387K, committed 56416K, reserved1097728K

class space used 6926K, capacity 7550K, committed 7576K, reserved1048576K

2022-07-11T12:08:40.591+0200: 11122.867: Total time for whichapplication threads were stopped: 0.0309913 seconds, Stopping threadstook: 0.0012599 seconds


{Heap before GC invocations=6532 (full 1):

par new generation total 737280K, used 655434K [0x00000005cae00000,0x00000005fce00000, 0x00000005fce00000)

eden space 655360K, 100% used [0x00000005cae00000, 0x00000005f2e00000,0x00000005f2e00000)

from space 81920K, 0% used [0x00000005f2e00000, 0x00000005f2e12848,0x00000005f7e00000)

to space 81920K, 0% used [0x00000005f7e00000, 0x00000005f7e00000,0x00000005fce00000)

concurrent mark-sweep generation total 7391232K, used 1920362K[0x00000005fce00000, 0x00000007c0000000, 0x00000007c0000000)

Metaspace used 53255K, capacity 56387K, committed 56416K,reserved 1097728K

class space used 6926K, capacity 7550K, committed 7576K, reserved1048576K

2022-07-11T12:08:42.163+0200: 11124.438: [GC (Allocation Failure)2022-07-11T12:08:42.163+0200: 11124.438: [ParNew


Desired survivor size 41943040 bytes, new threshold 1 (max 1)

- age   1:      54984 bytes,      54984 total

: 655434K->80K(737280K), 0.0291754 secs] 2575796K->1920445K(8128512K),0.0293884 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]


*From:*Bowen Song via user <user@cassandra.apache.org>
*Sent:* Monday, July 11, 2022 11:56 AM
*To:* user@cassandra.apache.org
*Subject:* Re: Adding nodes

EXTERNAL

Checking on multiple nodes won't help if the joining node suffers fromany of the issues I described, as it will likely be flipping up anddown frequently, and the existing nodes in the cluster may never reachan agreement before the joining node stays up (or stays down) for awhile. However, it will be a very strange thing if this is apersistent behaviour. If the 'nodetool status' output on each noderemained unchanged for hours and the outputs aren't the same betweennodes, it could be an indicator of something else that had gone wrong.

Does the strange behaviour goes away after the joining node completesthe streaming and fully joins the cluster?


On 11/07/2022 10:46, Marc Hoppins wrote:

    I am beginning to wonder…

    If you recall, I stated that I had checked status on a bunch of
    other nodes from both datacentres and the joining node shows up.
    No errors are occurring anywhere; data is streaming; node is
    joining…but, as I also stated, on the initial node which I only
    used to run the nodetool status, the new node is no longer showing
    up.  Thus the new node has not disappeared from the cluster, only
    from nodetool status on that particular node – which is already in
    the cluster, has been so for several weeks, and is also
    functioning without error.

    *From:*Bowen Song via user <user@cassandra.apache.org>
    <mailto:user@cassandra.apache.org>
    *Sent:* Monday, July 11, 2022 11:40 AM
    *To:* user@cassandra.apache.org
    *Subject:* Re: Adding nodes

    EXTERNAL

    A node in joining state can disappearing from the cluster from
    other nodes' perspective if the joining node stops
    sending/receiving gossip messages to other nodes. This can happen
    when the joining node is severely overloaded, has bad network
    connectivity or stuck in long STW GC pauses. Regardless of the
    reason behind it, the state shown on the joining node will remain
    as joining unless the steaming process has failed.

    The node state is propagated between nodes via gossip, and there
    may be a delay before all existing nodes agree on the fact that
    the joining node is no longer in the cluster. Within that delay,
    different nodes in the cluster may show different results in
    'nodetool status'.

    You should check the logs on the existing nodes and the joining
    node to find out why is it happening, and make appropriate changes
    if needed.

    On 11/07/2022 09:23, Marc Hoppins wrote:

        Further oddities…

        I was sitting here watching our new new node being added
        (nodetool status being run from one of the seed nodes) and all
        was going well.  Then I noticed that our new new node was no
        longer visible.  I checked the service on the new new node and
        it was still running. So I checked status from this node and
        it shows in the status report (still UJ and streaming data),
        but takes a little longer to get the results than it did when
        it was visible from the seed.

        I checked status from a few different nodes in both
        datacentres (including other seeds) and the new new node shows
        up but from the original seed node, it does not appear in the
        nodetool status. Can anyone shed any light on this phenomena?

        *From:*Marc Hoppins <marc.hopp...@eset.com>
        <mailto:marc.hopp...@eset.com>
        *Sent:* Monday, July 11, 2022 10:02 AM
        *To:* user@cassandra.apache.org
        *Cc:* Bowen Song <bo...@bso.ng> <mailto:bo...@bso.ng>
        *Subject:* RE: Adding nodes

        Well then…

        I left this on Friday (still running) and came back to it
        today (Monday) to find the service stopped.  So, I blitzed
        this node from the ring and began anew with a different new node.

        I rather suspect the problem was with trying to use Ansible to
        add these initially - despite the fact that I had a serial
        limit of 1 and a pause of 90s for starting the service on each
        new node (based on the time taken when setting up this
        Cassandra cluster).

        So…moving forward…

        It is recommended to only add one new node at a time from what
        I read.  This leads me to:

        Although I see the new node LOAD is progressing far faster
        than the previous failure, it is still going to take several
        hours to move from UJ to UN, which means I’ll be at this all
        week for the 12 new nodes. If our LOAD per node is around
        400-600GB, is there any practical method to speed up adding
        multiple new nodes which is unlikely to cause problems?  After
        all, in the modern world of big (how big is big?) data, 600G
        per node is far less than the real BIG big-data.

        Marc

        *From:*Jeff Jirsa <jji...@gmail.com>
        *Sent:* Friday, July 8, 2022 5:46 PM
        *To:* cassandra <user@cassandra.apache.org>
        *Cc:* Bowen Song <bo...@bso.ng>
        *Subject:* Re: Adding nodes

        EXTERNAL

        Having a node UJ but not sending/receiving other streams is an
        invalid state (unless 4.0 moved the streaming data out of
        netstats? I'm not 100% sure, but I'm 99% sure it should be
        there).

        It likely stopped the bootstrap process long ago with an error
        (which you may not have seen), and is running without being in
        the ring, but also not trying to join the ring.

        145GB vs 1.1T could be bits vs bytes (that's a factor of 8),
        or it could be that you streamed data and compacted it away.
        Hard to say, but less important - the fact that it's UJ but
        not streaming means there's a different problem.

        If it's me, I do this (not guaranteed to work, your mileage
        may vary, etc):

        1) Look for errors in the logs of ALL hosts. In the joining
        host, look for an exception that stops bootstrap. In the
        others, look for messages about errors streaming, and/or
        exceptions around file access. In all of those hosts, check to
        see if any of them think they're streaming ( nodetool netstats
        again)

        2) Stop the joining host. It's almost certainly not going to
        finish now. Remove data directories, commitlog directory,
        saved caches, hints. Wait 2 minutes. Make sure every other
        host in the cluster sees it disappear from the ring. Then,
        start it fresh and let it bootstrap again. (you could
        alternatively try the resumable bootstrap option, but I never
        use it).

        On Fri, Jul 8, 2022 at 2:56 AM Marc Hoppins
        <marc.hopp...@eset.com> wrote:

            Ifconfig shows RX of 1.1T. This doesn't seem to fit with
            the LOAD of 145GiB (nodetool status), unless I am reading
            that wrong...and the fact that this node still has a
            status of UJ.

            Netstats on this node shows (other than :
            Read Repair Statistics:
            Attempted: 0
            Mismatch (Blocking): 0
            Mismatch (Background): 0
            Pool Name                    Active   Pending   Completed 
             Dropped
            Large messages                  n/a         0           0 
                   0
            Small messages                  n/a        53   569755545 
            15740262
            Gossip messages                 n/a         0      288878 
                   2
            None of this addresses the issue of not being able to add
            more nodes.

            -----Original Message-----
            From: Bowen Song via user <user@cassandra.apache.org>
            Sent: Friday, July 8, 2022 11:47 AM
            To: user@cassandra.apache.org
            Subject: Re: Adding nodes

            EXTERNAL


            I would assume that's 85 GB (i.e. gigabytes) then. Which
            is approximately 79 GiB (i.e. gibibytes). This still
            sounds awfully slow - less than 1MB/s over a full day (24
            hours).

            You said CPU and network aren't the bottleneck. Have you
            checked the disk IO? Also, be mindful with CPU usage. It
            can still be a bottleneck if one thread uses 100% of a CPU
            core while all other cores are idle.

            On 08/07/2022 07:09, Marc Hoppins wrote:
            > Thank you for pointing that out.
            >
            > 85 gigabytes/gibibytes/GIGABYTES/GIBIBYTES/whatever name
            you care to
            > give it
            >
            > CPU and bandwidth are not the problem.
            >
            > Version 4.0.3 but, as I stated, all nodes use the same
            version so the version is not important either.
            >
            > Existing nodes have 350-400+(choose whatever you want to
            call a
            > gigabyte)
            >
            > The problem appears to be that adding new nodes is a
            serial process, which is fine when there is no data and
            each node is added within 2minutes.  It is hardly
            practical in production.
            >
            > -----Original Message-----
            > From: Bowen Song via user <user@cassandra.apache.org>
            > Sent: Thursday, July 7, 2022 8:43 PM
            > To: user@cassandra.apache.org
            > Subject: Re: Adding nodes
            >
            > EXTERNAL
            >
            >
            > 86Gb (that's gigabits, which is 10.75GB, gigabytes) took
            an entire day seems obviously too long. I would check the
            network bandwidth, disk IO and CPU usage and find out what
            is the bottleneck.
            >
            > On 07/07/2022 15:48, Marc Hoppins wrote:
            >> Hi all,
            >>
            >> Cluster of 2 DC and 24 nodes
            >>
            >> DC1 (RF3) = 12 nodes, 16 tokens each
            >> DC2 (RF3) = 12 nodes, 16 tokens each
            >>
            >> Adding 12 more nodes to DC1: I installed Cassandra
            (version is the same across all nodes) but, after the
            first node added, I couldn't seem to add any further nodes.
            >>
            >> I check nodetool status and the newly added node is UJ.
            It remains this way all day and only 86Gb of data is added
            to the node over the entire day (probably not yet
            complete). This seems a little slow and, more than a
            little inconvenient to only be able to add one node at a
            time - or at least one node every 2 minutes. When the
            cluster was created, I timed each node from service start
            to status UJ (having a UUID) and it was around 120
            seconds.  Of course there was no data.
            >>
            >> Is it possible I have some setting not correctly tuned?
            >>
            >> Thanks
            >>
            >> Marc

Re: Adding nodes

Reply via email to