https://issues.apache.org/jira/browse/CASSANDRA-1676
You have to use at least 0.6.7.

On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory <ran...@gmail.com> wrote:
> > In storage-conf I see this comment [1], from which I understand that the
> > recommended way to bootstrap a new node is to set AutoBootstrap=true and
> > remove itself from the seeds list.
> > Moreover, I did try to set AutoBootstrap=true and have the node in its own
> > seeds list, but it would not bootstrap. I don't recall the exact message, but
> > it was something like "I found myself in the seeds list, therefore I'm not
> > going to bootstrap even though AutoBootstrap is true".
> >
> > [1]
> > <!--
> >  ~ Turn on to make new [non-seed] nodes automatically migrate the right data
> >  ~ to themselves. (If no InitialToken is specified, they will pick one
> >  ~ such that they will get half the range of the most-loaded node.)
> >  ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
> >  ~ so that you can't subsequently accidently bootstrap a node with
> >  ~ data on it. (You can reset this by wiping your data and commitlog
> >  ~ directories.)
> >  ~
> >  ~ Off by default so that new clusters and upgraders from 0.4 don't
> >  ~ bootstrap immediately. You should turn this on when you start adding
> >  ~ new nodes to a cluster that already has data on it. (If you are upgrading
> >  ~ from 0.4, start your cluster with it off once before changing it to true.
> >  ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
> >  ~ I/O before your cluster starts up.)
> > -->
> > <AutoBootstrap>false</AutoBootstrap>
> >
> > On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn <da...@lookin2.com> wrote:
> >>
> >> If "seed list should be the same across the cluster", that means that nodes
> >> *should* have themselves as a seed. If that doesn't work for Ran, then that
> >> is the first problem, no?
> >>
> >> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani <jak...@gmail.com> wrote:
> >>>
> >>> Well, your ring issues don't make sense to me; the seed list should be the
> >>> same across the cluster.
> >>> I'm just thinking of other things to try. Non-bootstrapped nodes should
> >>> join the ring instantly, but reads will fail if you aren't using quorum.
> >>>
> >>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory <ran...@gmail.com> wrote:
> >>>>
> >>>> I haven't tried repair. Should I?
> >>>>
> >>>> On Jan 5, 2011 3:48 PM, "Jake Luciani" <jak...@gmail.com> wrote:
> >>>> > Have you tried not bootstrapping but setting the token and manually
> >>>> > calling repair?
> >>>> >
> >>>> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >
> >>>> >> My conclusion is lame: I tried this on several hosts and saw the same
> >>>> >> behavior. The only way I was able to join new nodes was to first start
> >>>> >> them when they are *not in* their own seeds list and, after they finish
> >>>> >> transferring the data, restart them with themselves *in* their own
> >>>> >> seeds list. After doing that the node would join the ring.
> >>>> >> This is either my misunderstanding or a bug, but the only place I found
> >>>> >> it documented stated that the new node should not be in its own seeds
> >>>> >> list. Version 0.6.6.
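For anyone hitting the same wall, this is a minimal sketch of the checks discussed above for a joining 0.6.x node. The config path, IP addresses and JMX port below are examples only, not taken from this thread, and the repair step is only Jake's suggested alternative, not a confirmed fix.

  # On the joining node, before starting it: AutoBootstrap should be true,
  # and the node's own address should NOT appear as a <Seed>.
  $ grep '<AutoBootstrap>' conf/storage-conf.xml
  <AutoBootstrap>true</AutoBootstrap>
  $ grep '<Seed>' conf/storage-conf.xml
  <Seed>10.0.0.1</Seed>
  <Seed>10.0.0.2</Seed>

  # From an existing node, watch for the newcomer to appear in the ring:
  $ bin/nodetool -h 10.0.0.1 -p 9004 ring

  # Jake's alternative, if bootstrap keeps wedging: set <InitialToken> in
  # storage-conf.xml, leave AutoBootstrap=false, start the node, then run
  # a repair on it once it is up:
  $ bin/nodetool -h 10.0.0.3 -p 9004 repair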
> >>>> >>
> >>>> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn <da...@lookin2.com> wrote:
> >>>> >>
> >>>> >>> My nodes all have themselves in their list of seeds - always did - and
> >>>> >>> everything works. (You may ask why I did this. I don't know, I must have
> >>>> >>> copied it from an example somewhere.)
> >>>> >>>
> >>>> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>
> >>>> >>>> I was able to make the node join the ring, but I'm confused.
> >>>> >>>> What I did is: first, when adding the node, this node was not in its own
> >>>> >>>> seeds list. AFAIK this is how it's supposed to be. So it was able to
> >>>> >>>> transfer all data to itself from other nodes, but then it stayed in the
> >>>> >>>> bootstrapping state.
> >>>> >>>> So what I did (and I don't know why it works) is add this node to the
> >>>> >>>> seeds list in its own storage-conf.xml file, then restart the server, and
> >>>> >>>> then I finally see it in the ring...
> >>>> >>>> If I added the node to its own seeds list when first joining it, it would
> >>>> >>>> not join the ring, but when I did it in two phases it did work.
> >>>> >>>> So it's either my misunderstanding or a bug...
> >>>> >>>>
> >>>> >>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>>
> >>>> >>>>> The new node does not see itself as part of the ring; it sees all others
> >>>> >>>>> but itself, so from that perspective the view is consistent.
> >>>> >>>>> The only problem is that the node never finishes bootstrapping. It stays
> >>>> >>>>> in this state for hours (it's been 20 hours now...)
> >>>> >>>>>
> >>>> >>>>> $ bin/nodetool -p 9004 -h localhost streams
> >>>> >>>>> Mode: Bootstrapping
> >>>> >>>>> Not sending any streams.
> >>>> >>>>> Not receiving any streams.
> >>>> >>>>>
> >>>> >>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall <n...@riptano.com> wrote:
> >>>> >>>>>
> >>>> >>>>>> Does the new node have itself in the list of seeds per chance? This
> >>>> >>>>>> could cause some issues if so.
> >>>> >>>>>>
> >>>> >>>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>>>> > I'm still at a loss. I haven't been able to resolve this. I tried
> >>>> >>>>>> > adding another node at a different location on the ring, but this node
> >>>> >>>>>> > too remains stuck in the bootstrapping state for many hours, without
> >>>> >>>>>> > any of the other nodes being busy with anticompaction or anything
> >>>> >>>>>> > else. I don't know what's keeping it from finishing the bootstrap: no
> >>>> >>>>>> > CPU, no IO, files were already streamed, so what is it waiting for?
> >>>> >>>>>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
> >>>> >>>>>> > be anything addressing a similar issue, so I figured there was no
> >>>> >>>>>> > point in upgrading. But let me know if you think there is.
> >>>> >>>>>> > Or any other advice...
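For anyone else debugging the same symptom, this is roughly how one might confirm the join is wedged rather than merely slow, following the commands used above. Hostnames and the JMX port are examples only.

  # On the joining node: bootstrap mode, but no active streams.
  $ bin/nodetool -p 9004 -h localhost streams
  Mode: Bootstrapping
  Not sending any streams.
  Not receiving any streams.

  # Repeat on each existing node; if none of them is sending streams either,
  # the transfer has finished (or never started) and the join is stuck.
  $ bin/nodetool -p 9004 -h node1.example.com streams

  # And from an existing node, confirm the newcomer still isn't in the ring:
  $ bin/nodetool -p 9004 -h node1.example.com ring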
> >>>> >>>>>> >
> >>>> >>>>>> > On Tuesday, January 4, 2011, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>>>> >> Thanks Jake, but unfortunately the streams directory is empty, so I
> >>>> >>>>>> >> don't think that any of the nodes is anti-compacting data right now or
> >>>> >>>>>> >> had been in the past 5 hours. It seems that all the data was already
> >>>> >>>>>> >> transferred to the joining host, but the joining node, after having
> >>>> >>>>>> >> received the data, would still remain in bootstrapping mode and not
> >>>> >>>>>> >> join the cluster. I'm not sure that *all* data was transferred (perhaps
> >>>> >>>>>> >> other nodes need to transfer more data), but nothing is actually
> >>>> >>>>>> >> happening, so I assume all has been moved.
> >>>> >>>>>> >> Perhaps it's a configuration error on my part. Should I use
> >>>> >>>>>> >> AutoBootstrap=true? Anything else I should look out for in the
> >>>> >>>>>> >> configuration file or something else?
> >>>> >>>>>> >>
> >>>> >>>>>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani <jak...@gmail.com> wrote:
> >>>> >>>>>> >>
> >>>> >>>>>> >> In 0.6, locate the node doing anti-compaction and look in the
> >>>> >>>>>> >> "streams" subdirectory in the keyspace data dir to monitor the
> >>>> >>>>>> >> anti-compaction progress (it puts new SSTables for the bootstrapping
> >>>> >>>>>> >> node in there).
> >>>> >>>>>> >>
> >>>> >>>>>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>>>> >>
> >>>> >>>>>> >> Running nodetool decommission didn't help. Actually the node refused
> >>>> >>>>>> >> to decommission itself (b/c it wasn't part of the ring). So I simply
> >>>> >>>>>> >> stopped the process, deleted all the data directories and started it
> >>>> >>>>>> >> again. It worked in the sense that the node bootstrapped again, but as
> >>>> >>>>>> >> before, after it had finished moving the data nothing happened for a
> >>>> >>>>>> >> long time (I'm still waiting, but nothing seems to be happening).
> >>>> >>>>>> >>
> >>>> >>>>>> >> Any hints on how to analyze a "stuck" bootstrapping node? Thanks.
> >>>> >>>>>> >>
> >>>> >>>>>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>>>> >>
> >>>> >>>>>> >> Thanks Shimi, so indeed anticompaction was run on one of the other
> >>>> >>>>>> >> nodes from the same DC, but to my understanding it has already ended.
> >>>> >>>>>> >> A few hours ago...
> >>>> >>>>>> >>
> >>>> >>>>>> >> I see plenty of log messages such as [1], which ended a couple of
> >>>> >>>>>> >> hours ago, and I've seen the new node streaming and accepting the data
> >>>> >>>>>> >> from the node which performed the anticompaction, and so far it was
> >>>> >>>>>> >> normal, so it seemed that data is at its right place. But now the new
> >>>> >>>>>> >> node seems sort of stuck. None of the other nodes is anticompacting
> >>>> >>>>>> >> right now or has been anticompacting since then.
> >>>> >>>>>> >>
> >>>> >>>>>> >> The new node's CPU is close to zero and its iostats are almost zero,
> >>>> >>>>>> >> so I can't find another bottleneck that would keep it hanging.
> >>>> >>>>>> >> On the IRC someone suggested I maybe retry joining this node,
> >>>> >>>>>> >> e.g. decommission and rejoin it again. I'll try it now...
> >>>> >>>>>> >>
> >>>> >>>>>> >> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721
> >>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
> >>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
> >>>> >>>>>> >>
> >>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683
> >>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
> >>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
> >>>> >>>>>> >>
> >>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132
> >>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
> >>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
> >>>> >>>>>> >>
> >>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486
> >>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
> >>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
> >>>> >>>>>> >>
> >>>> >>>>>> >> On Tue, Jan 4, 2011 at 12:45 PM, shimi <shim...@gmail.com> wrote:
> >>>> >>>>>> >>
> >>>> >>>>>> >> In my experience, most of the time it takes for a node to join the
> >>>> >>>>>> >> cluster is the anticompaction on the other nodes. The streaming part
> >>>> >>>>>> >> is very fast.
> >>>> >>>>>> >> Check the other nodes' logs to see if there is any node doing
> >>>> >>>>>> >> anticompaction. I don't remember how much data I had in the cluster
> >>>> >>>>>> >> when I needed to add/remove nodes. I do remember that it took a few
> >>>> >>>>>> >> hours.
> >>>> >>>>>> >>
> >>>> >>>>>> >> The node will join the ring only when it finishes the bootstrap.
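Following Jake's and Shimi's advice above, this is one way to check whether any source node is still anti-compacting or streaming. It is only a sketch: the log location is an assumption, and the keyspace data directory is taken from the log lines quoted in [1] above.

  # On each existing node: look for recent anticompaction activity in the log.
  $ grep AntiCompacting /var/log/cassandra/system.log | tail

  # Watch the "streams" subdirectory of the keyspace data dir; per Jake, in 0.6
  # the anti-compacted SSTables destined for the bootstrapping node land there.
  $ ls -lh /outbrain/cassandra/data/outbrain_kvdb/streams/

  # Rule out an IO or CPU bottleneck on the source and the joining node.
  $ iostat -x 5 3
  $ top -b -n 1 | head -20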
>
> If non-auto-bootstrap nodes do not join, check to make sure good old
> iptables is not on.
>
> Edward
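Picking up Edward's hint, a quick way to rule out a firewall between the nodes. The addresses are examples; 7000 is the default storage (gossip/streaming) port and 9160 the default Thrift port in 0.6, unless changed in storage-conf.xml.

  # On the joining node and on the seeds: is iptables filtering anything?
  $ sudo iptables -L -n

  # Can the new node reach an existing node's storage port?
  $ nc -zv 10.0.0.1 7000
  # ...and can the existing nodes reach the newcomer back on gossip and Thrift?
  $ nc -zv 10.0.0.3 7000
  $ nc -zv 10.0.0.3 9160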