Thanks for replying, Jeff.

Responses below.

On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Answers inline
>
> --
> Jeff Jirsa
>
>
> > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote:
> >
> > Hi folks, hopefully a quick one:
> >
> > We are running a 12-node cluster (2.1.15) in AWS with Ec2Snitch. It's
> > all in one region but spread across 3 availability zones. It was nicely
> > balanced with 4 nodes in each.
> >
> > But with a couple of failures and subsequent provisions to the wrong AZ,
> > we now have a cluster with:
> >
> > 5 nodes in AZ A
> > 5 nodes in AZ B
> > 2 nodes in AZ C
> >
> > Not sure why, but when adding a third node in AZ C it fails to stream
> > after getting all the way to completion, with no apparent error in the
> > logs. I've looked at a couple of bugs referring to scrubbing and
> > possible OOM bugs due to metadata writing at the end of streaming
> > (sorry, I don't have the ticket handy). I'm worried I might not be able
> > to do much with these nodes since their disk usage is high and they are
> > under a lot of load, given how few of them there are for this rack.
>
> You'll definitely have higher load on the AZ C instances with rf=3 in
> this ratio.
>
> Streaming should still work - are you sure it's not busy doing something,
> like building a secondary index or similar? A jstack thread dump would be
> useful, or at least nodetool tpstats.
>
> Only other thing might be a backup.
>

We do incrementals x1hr and snapshots x24h; they are shipped to S3, and
then the links are cleaned up. The error I get on the node I'm trying to
add to rack C is:

ERROR [main] 2017-08-12 23:54:51,546 CassandraDaemon.java:583 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:87) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:944) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.15.jar:2.1.15]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:209) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:185) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:413) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:700) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:661) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:179) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_112]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_112]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112]
WARN  [StorageServiceShutdownHook] 2017-08-12 23:54:51,582 Gossiper.java:1462 - No local state or state is in silent shutdown, not announcing shutdown
INFO  [StorageServiceShutdownHook] 2017-08-12 23:54:51,582 MessagingService.java:734 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.40.17.114] 2017-08-12 23:54:51,583 MessagingService.java:1020 - MessagingService has terminated the accept() thread

And I got this on the same node while it was bootstrapping; I ran 'nodetool
netstats' just before it shut down:

        Receiving 377 files, 161928296443 bytes total. Already received 377 files, 161928296443 bytes total

So all 377 files had been received; going by the trace above, it fails in
the post-receive completion step (StreamReceiveTask$OnCompletionRunnable),
which would fit your theory that it's busy doing something at the end of
streaming.

nodetool tpstats on the host that was streaming the data to this node:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     1         1     4488289014         0                 0
ReadStage                         0         0       24486526         0                 0
RequestResponseStage              0         0     3038847374         0                 0
ReadRepairStage                   0         0        1601576         0                 0
CounterMutationStage              0         0          68403         0                 0
MiscStage                         0         0              0         0                 0
AntiEntropySessions               0         0              0         0                 0
HintedHandoff                     0         0             18         0                 0
GossipStage                       0         0        2786892         0                 0
CacheCleanupExecutor              0         0              0         0                 0
InternalResponseStage             0         0          61115         0                 0
CommitLogArchiver                 0         0              0         0                 0
CompactionExecutor                4        83         304167         0                 0
ValidationExecutor                0         0          78249         0                 0
MigrationStage                    0         0          94201         0                 0
AntiEntropyStage                  0         0         160505         0                 0
PendingRangeCalculator            0         0             30         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0          71270         0                 0
MemtablePostFlush                 0         0         175209         0                 0
MemtableReclaimMemory             0         0          81222         0                 0
Native-Transport-Requests         2         0     1983565628         0           9405444

Message type           Dropped
READ                       218
RANGE_SLICE                 15
_TRACE                       0
MUTATION               2949001
COUNTER_MUTATION             0
BINARY                       0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR               8571

I can get a jstack if needed.
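
For reference, I'd grab it on the streaming source with something like the
following (the pid below is just a placeholder for the actual Cassandra
process id):

    # dump all thread stacks, including lock info, from the Cassandra JVM
    # (replace <cassandra-pid> with the real pid)
    jstack -l <cassandra-pid> > /tmp/cassandra-jstack.txt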

>
> >
> > Rather than troubleshoot this further, what I was thinking about doing
> was:
> > - drop the replication factor on our keyspace to two
>
> Repair before you do this, or you'll lose your consistency guarantees
>

Given the load on the 2 nodes in rack C, I'm hoping a repair will succeed.
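
If it does, the plan would be roughly the following, run on each node in
turn (the keyspace name here is a stand-in for ours):

    # -pr limits each run to the node's primary token ranges, so the
    # cluster-wide repair work isn't duplicated across replicas
    nodetool repair -pr my_keyspace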


> > - hopefully this would reduce load on these two remaining nodes
>
> It should. Rack awareness guarantees one replica per rack if rf == num
> racks, so right now those 2 C machines have 2.5x as much data as the
> others. This will drop that requirement and reduce the load significantly.
>
> > - run repairs/cleanup across the cluster
> > - then shoot these two nodes in the 'c' rack
>
> Why shoot the C instances? Why not drop RF and then add 2 more C
> instances, then increase RF back to 3, run repair, then decommission the
> extra instances in A and B?
>

Fair point. I was considering staying at RF two, but given your points
below, I should reconsider.
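
For the RF changes themselves, it would be something like the following in
cqlsh (keyspace and datacenter names are stand-ins; with Ec2Snitch the
datacenter name is the region shown in nodetool status):

    -- step 1: drop to RF 2 in this datacenter
    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 2};

    -- later, after adding the new C nodes: raise it back to 3 and repair
    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};

plus a nodetool cleanup on the A and B nodes after the drop, to reclaim
space from replicas they no longer own. And with rf=2, quorum is
floor(2/2) + 1 = 2, i.e. all replicas, so I wouldn't want to stay there
long anyway.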


> > - run repairs/cleanup across the cluster
> >
> > Would this work with minimal/no disruption?
>
> The big risk of running rf=2 is that quorum == all: any GC pause or node
> restart will make you lose HA or strong consistency guarantees.
>
> > Should I update their "rack" before hand or after ?
>
> You can't change a node's rack once it's in the cluster; it SHOULD refuse
> to start if you try.
>

Got it.
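
Since Ec2Snitch derives the rack from the availability zone, I'll make
sure the replacement instances launch in the right AZ and verify placement
before bootstrapping (keyspace name is a stand-in):

    # the Rack column should show the intended availability zone
    nodetool status my_keyspace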


> > What else am I not thinking about?
> >
> > My main goal at the moment is to get the cluster back into a clean,
> > consistent state that allows nodes to bootstrap properly.
> >
> > Thanks for your help in advance.
