https://issues.apache.org/jira/browse/CASSANDRA-1676
You have to use at least 0.6.7.

On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory <ran...@gmail.com> wrote:
> > In storage-conf I see this comment [1], from which I understand that the
> > recommended way to bootstrap a new node is to set AutoBootstrap=true and
> > remove itself from the seeds list.
> > Moreover, I did try to set AutoBootstrap=true and have the node in its own
> > seeds list, but it would not bootstrap. I don't recall the exact message, but
> > it was something like "I found myself in the seeds list, therefore I'm not
> > going to bootstrap even though AutoBootstrap is true".
> >
> > [1]
> > <!--
> >  ~ Turn on to make new [non-seed] nodes automatically migrate the right data
> >  ~ to themselves. (If no InitialToken is specified, they will pick one
> >  ~ such that they will get half the range of the most-loaded node.)
> >  ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
> >  ~ so that you can't subsequently accidently bootstrap a node with
> >  ~ data on it. (You can reset this by wiping your data and commitlog
> >  ~ directories.)
> >  ~
> >  ~ Off by default so that new clusters and upgraders from 0.4 don't
> >  ~ bootstrap immediately. You should turn this on when you start adding
> >  ~ new nodes to a cluster that already has data on it. (If you are upgrading
> >  ~ from 0.4, start your cluster with it off once before changing it to true.
> >  ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
> >  ~ I/O before your cluster starts up.)
> > -->
> > <AutoBootstrap>false</AutoBootstrap>
> >
> > On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn <da...@lookin2.com> wrote:
> >>
> >> If "seed list should be the same across the cluster", that means that nodes
> >> *should* have themselves as a seed. If that doesn't work for Ran, then that
> >> is the first problem, no?
> >>
> >> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani <jak...@gmail.com> wrote:
> >>>
> >>> Well, your ring issues don't make sense to me; the seed list should be the
> >>> same across the cluster.
> >>> I'm just thinking of other things to try. Non-bootstrapped nodes should
> >>> join the ring instantly, but reads will fail if you aren't using quorum.
> >>>
> >>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory <ran...@gmail.com> wrote:
> >>>>
> >>>> I haven't tried repair. Should I?
> >>>>
> >>>> On Jan 5, 2011 3:48 PM, "Jake Luciani" <jak...@gmail.com> wrote:
> >>>> > Have you tried not bootstrapping but setting the token and manually
> >>>> > calling repair?
> >>>> >
> >>>> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >
> >>>> >> My conclusion is lame: I tried this on several hosts and saw the same
> >>>> >> behavior. The only way I was able to join new nodes was to first start
> >>>> >> them when they are *not in* their own seeds list and, after they finish
> >>>> >> transferring the data, restart them with themselves *in* their own
> >>>> >> seeds list. After doing that the node would join the ring.
> >>>> >> This is either my misunderstanding or a bug, but the only place I found
> >>>> >> it documented stated that the new node should not be in its own seeds
> >>>> >> list. Version 0.6.6.
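For anyone hitting the same wall, this is a minimal sketch of the checks discussed above for a joining 0.6.x node. The config path, IP addresses and JMX port below are examples only, not taken from this thread, and the repair step is only Jake's suggested alternative, not a confirmed fix.

  # On the joining node, before starting it: AutoBootstrap should be true,
  # and the node's own address should NOT appear as a <Seed>.
  $ grep '<AutoBootstrap>' conf/storage-conf.xml
  <AutoBootstrap>true</AutoBootstrap>
  $ grep '<Seed>' conf/storage-conf.xml
  <Seed>10.0.0.1</Seed>
  <Seed>10.0.0.2</Seed>

  # From an existing node, watch for the newcomer to appear in the ring:
  $ bin/nodetool -h 10.0.0.1 -p 9004 ring

  # Jake's alternative, if bootstrap keeps wedging: set <InitialToken> in
  # storage-conf.xml, leave AutoBootstrap=false, start the node, then run
  # a repair on it once it is up:
  $ bin/nodetool -h 10.0.0.3 -p 9004 repair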
> >>>> >>
> >>>> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn <da...@lookin2.com> wrote:
> >>>> >>
> >>>> >>> My nodes all have themselves in their list of seeds - always did - and
> >>>> >>> everything works. (You may ask why I did this. I don't know, I must have
> >>>> >>> copied it from an example somewhere.)
> >>>> >>>
> >>>> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>
> >>>> >>>> I was able to make the node join the ring, but I'm confused.
> >>>> >>>> What I did is: first, when adding the node, this node was not in its own
> >>>> >>>> seeds list. AFAIK this is how it's supposed to be. So it was able to
> >>>> >>>> transfer all data to itself from other nodes, but then it stayed in the
> >>>> >>>> bootstrapping state.
> >>>> >>>> So what I did (and I don't know why it works) is add this node to the
> >>>> >>>> seeds list in its own storage-conf.xml file, then restart the server, and
> >>>> >>>> then I finally see it in the ring...
> >>>> >>>> If I added the node to its own seeds list when first joining it, it would
> >>>> >>>> not join the ring, but when I did it in two phases it did work.
> >>>> >>>> So it's either my misunderstanding or a bug...
> >>>> >>>>
> >>>> >>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>>
> >>>> >>>>> The new node does not see itself as part of the ring; it sees all others
> >>>> >>>>> but itself, so from that perspective the view is consistent.
> >>>> >>>>> The only problem is that the node never finishes bootstrapping. It stays
> >>>> >>>>> in this state for hours (it's been 20 hours now...)
> >>>> >>>>>
> >>>> >>>>> $ bin/nodetool -p 9004 -h localhost streams
> >>>> >>>>> Mode: Bootstrapping
> >>>> >>>>> Not sending any streams.
> >>>> >>>>> Not receiving any streams.
> >>>> >>>>>
> >>>> >>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall <n...@riptano.com> wrote:
> >>>> >>>>>
> >>>> >>>>>> Does the new node have itself in the list of seeds per chance? This
> >>>> >>>>>> could cause some issues if so.
> >>>> >>>>>>
> >>>> >>>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>>>> > I'm still at a loss. I haven't been able to resolve this. I tried
> >>>> >>>>>> > adding another node at a different location on the ring, but this node
> >>>> >>>>>> > too remains stuck in the bootstrapping state for many hours, without
> >>>> >>>>>> > any of the other nodes being busy with anticompaction or anything
> >>>> >>>>>> > else. I don't know what's keeping it from finishing the bootstrap: no
> >>>> >>>>>> > CPU, no IO, files were already streamed, so what is it waiting for?
> >>>> >>>>>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
> >>>> >>>>>> > be anything addressing a similar issue, so I figured there was no
> >>>> >>>>>> > point in upgrading. But let me know if you think there is.
> >>>> >>>>>> > Or any other advice...
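For anyone else debugging the same symptom, this is roughly how one might confirm the join is wedged rather than merely slow, following the commands used above. Hostnames and the JMX port are examples only.

  # On the joining node: bootstrap mode, but no active streams.
  $ bin/nodetool -p 9004 -h localhost streams
  Mode: Bootstrapping
  Not sending any streams.
  Not receiving any streams.

  # Repeat on each existing node; if none of them is sending streams either,
  # the transfer has finished (or never started) and the join is stuck.
  $ bin/nodetool -p 9004 -h node1.example.com streams

  # And from an existing node, confirm the newcomer still isn't in the ring:
  $ bin/nodetool -p 9004 -h node1.example.com ring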
> >>>> >>>>>> >
> >>>> >>>>>> > On Tuesday, January 4, 2011, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>>>> >> Thanks Jake, but unfortunately the streams directory is empty, so I
> >>>> >>>>>> >> don't think that any of the nodes is anti-compacting data right now or
> >>>> >>>>>> >> had been in the past 5 hours. It seems that all the data was already
> >>>> >>>>>> >> transferred to the joining host, but the joining node, after having
> >>>> >>>>>> >> received the data, would still remain in bootstrapping mode and not
> >>>> >>>>>> >> join the cluster. I'm not sure that *all* data was transferred (perhaps
> >>>> >>>>>> >> other nodes need to transfer more data), but nothing is actually
> >>>> >>>>>> >> happening, so I assume all has been moved.
> >>>> >>>>>> >> Perhaps it's a configuration error on my part. Should I use
> >>>> >>>>>> >> AutoBootstrap=true? Anything else I should look out for in the
> >>>> >>>>>> >> configuration file or something else?
> >>>> >>>>>> >>
> >>>> >>>>>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani <jak...@gmail.com> wrote:
> >>>> >>>>>> >>
> >>>> >>>>>> >> In 0.6, locate the node doing anti-compaction and look in the
> >>>> >>>>>> >> "streams" subdirectory in the keyspace data dir to monitor the
> >>>> >>>>>> >> anti-compaction progress (it puts new SSTables for the bootstrapping
> >>>> >>>>>> >> node in there).
> >>>> >>>>>> >>
> >>>> >>>>>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>>>> >>
> >>>> >>>>>> >> Running nodetool decommission didn't help. Actually the node refused
> >>>> >>>>>> >> to decommission itself (b/c it wasn't part of the ring). So I simply
> >>>> >>>>>> >> stopped the process, deleted all the data directories and started it
> >>>> >>>>>> >> again. It worked in the sense that the node bootstrapped again, but as
> >>>> >>>>>> >> before, after it had finished moving the data nothing happened for a
> >>>> >>>>>> >> long time (I'm still waiting, but nothing seems to be happening).
> >>>> >>>>>> >>
> >>>> >>>>>> >> Any hints on how to analyze a "stuck" bootstrapping node? Thanks.
> >>>> >>>>>> >>
> >>>> >>>>>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory <ran...@gmail.com> wrote:
> >>>> >>>>>> >>
> >>>> >>>>>> >> Thanks Shimi, so indeed anticompaction was run on one of the other
> >>>> >>>>>> >> nodes from the same DC, but to my understanding it has already ended.
> >>>> >>>>>> >> A few hours ago...
> >>>> >>>>>> >>
> >>>> >>>>>> >> I see plenty of log messages such as [1], which ended a couple of
> >>>> >>>>>> >> hours ago, and I've seen the new node streaming and accepting the data
> >>>> >>>>>> >> from the node which performed the anticompaction, and so far it was
> >>>> >>>>>> >> normal, so it seemed that data is at its right place. But now the new
> >>>> >>>>>> >> node seems sort of stuck. None of the other nodes is anticompacting
> >>>> >>>>>> >> right now or has been anticompacting since then.
> >>>> >>>>>> >>
> >>>> >>>>>> >> The new node's CPU is close to zero and its iostats are almost zero,
> >>>> >>>>>> >> so I can't find another bottleneck that would keep it hanging.
> >>>> >>>>>> >> On the IRC someone suggested I maybe retry joining this node,
> >>>> >>>>>> >> e.g. decommission and rejoin it again. I'll try it now...
> >>>> >>>>>> >>
> >>>> >>>>>> >> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721
> >>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
> >>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
> >>>> >>>>>> >>
> >>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683
> >>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
> >>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
> >>>> >>>>>> >>
> >>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132
> >>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
> >>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
> >>>> >>>>>> >>
> >>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486
> >>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
> >>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
> >>>> >>>>>> >>
> >>>> >>>>>> >> On Tue, Jan 4, 2011 at 12:45 PM, shimi <shim...@gmail.com> wrote:
> >>>> >>>>>> >>
> >>>> >>>>>> >> In my experience, most of the time it takes for a node to join the
> >>>> >>>>>> >> cluster is the anticompaction on the other nodes. The streaming part
> >>>> >>>>>> >> is very fast.
> >>>> >>>>>> >> Check the other nodes' logs to see if there is any node doing
> >>>> >>>>>> >> anticompaction. I don't remember how much data I had in the cluster
> >>>> >>>>>> >> when I needed to add/remove nodes. I do remember that it took a few
> >>>> >>>>>> >> hours.
> >>>> >>>>>> >>
> >>>> >>>>>> >> The node will join the ring only when it finishes the bootstrap.
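Following Jake's and Shimi's advice above, this is one way to check whether any source node is still anti-compacting or streaming. It is only a sketch: the log location is an assumption, and the keyspace data directory is taken from the log lines quoted in [1] above.

  # On each existing node: look for recent anticompaction activity in the log.
  $ grep AntiCompacting /var/log/cassandra/system.log | tail

  # Watch the "streams" subdirectory of the keyspace data dir; per Jake, in 0.6
  # the anti-compacted SSTables destined for the bootstrapping node land there.
  $ ls -lh /outbrain/cassandra/data/outbrain_kvdb/streams/

  # Rule out an IO or CPU bottleneck on the source and the joining node.
  $ iostat -x 5 3
  $ top -b -n 1 | head -20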
>
> If non-auto-bootstrap nodes do not join, check to make sure good old
> iptables is not on.
>
> Edward
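Picking up Edward's hint, a quick way to rule out a firewall between the nodes. The addresses are examples; 7000 is the default storage (gossip/streaming) port and 9160 the default Thrift port in 0.6, unless changed in storage-conf.xml.

  # On the joining node and on the seeds: is iptables filtering anything?
  $ sudo iptables -L -n

  # Can the new node reach an existing node's storage port?
  $ nc -zv 10.0.0.1 7000
  # ...and can the existing nodes reach the newcomer back on gossip and Thrift?
  $ nc -zv 10.0.0.3 7000
  $ nc -zv 10.0.0.3 9160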