Re: Cluster Maintenance Mishap

2016-10-21 Thread Branton Davis
Thanks.  Unfortunately, we lost our system logs during all of this
(had normal logs, but not system) due to an unrelated issue :/

Anyhow, as far as I can tell, we're doing okay.

On Thu, Oct 20, 2016 at 11:18 PM, Jeremiah D Jordan <
jeremiah.jor...@gmail.com> wrote:

> The easiest way to figure out what happened is to examine the system log;
> it will tell you exactly what each node did.  But I’m pretty sure your nodes
> got new tokens during that time.
>
> If you want to get back the data inserted during the 2 hours you could use
> sstableloader to send all the data from the 
> /var/data/cassandra_new/cassandra/*
> folders back into the cluster if you still have it.
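A minimal sketch of what that could look like, assuming the usual keyspace/table
layout under the stray directory and a reachable node at 10.0.0.1 (both the
layout and the address are assumptions, not taken from the thread):

# stream each table directory under the stray data dir back into the live cluster
for table_dir in /var/data/cassandra_new/cassandra/data/my_keyspace/*/; do
    sstableloader -d 10.0.0.1 "$table_dir"
done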
>
> -Jeremiah
>
>
>
> On Oct 20, 2016, at 3:58 PM, Branton Davis <branton.da...@spanning.com>
> wrote:
>
> Howdy folks.  I asked some about this in IRC yesterday, but we're looking
> to hopefully confirm a couple of things for our sanity.
>
> Yesterday, I was performing an operation on a 21-node cluster (vnodes,
> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
> across 3 AZs on AWS EC2).  The plan was to swap each node's existing 1TB
> volume (where all cassandra data, including the commitlog, is stored) with
> a 2TB volume.  The plan for each node (one at a time) was basically:
>
>- rsync while the node is live (repeated until there were only minor
>differences from new data)
>- stop cassandra on the node
>- rsync again
>- replace the old volume with the new
>- start cassandra
>
> However, there was a bug in the rsync command.  Instead of copying the
> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to
> /var/data/cassandra_new/cassandra.  So, when cassandra was started after
> the volume swap, there was some behavior that was similar to bootstrapping
> a new node (data started streaming in from other nodes).  But there
> was also some behavior that was similar to a node replacement (nodetool
> status showed the same IP address, but a different host ID).  This
> happened with 3 nodes (one from each AZ).  The nodes had received 1.4GB,
> 1.2GB, and 0.6GB of data (whereas the normal load for a node is around
> 500-600GB).
>
> The cluster was in this state for about 2 hours, at which point cassandra
> was stopped on them.  Later, I moved the data from the original volumes
> back into place (so, should be the original state before the operation) and
> started cassandra back up.
>
> Finally, the questions.  We've accepted the potential loss of new data
> within the two hours, but our primary concern now is what was happening
> with the bootstrapping nodes.  Would they have taken on the token ranges
> of the original nodes or acted like new nodes and got new token ranges?  If
> the latter, is it possible that any data moved from the healthy nodes to
> the "new" nodes or would restarting them with the original data (and
> repairing) put the cluster's token ranges back into a normal state?
>
> Hopefully that was all clear.  Thanks in advance for any info!
>
>
>


Re: Cluster Maintenance Mishap

2016-10-21 Thread Branton Davis
It mostly seems so.  The thing that bugs me is that some things suggested
they weren't joining as normal new nodes.  For example (I forgot to mention
this until I read your comment), the instances showed as UN (up, normal)
instead of UJ (up, joining) while they were apparently bootstrapping.

Thanks for the assurance.  I'm thinking (hoping) that we're good.

On Thu, Oct 20, 2016 at 11:24 PM, kurt Greaves <k...@instaclustr.com> wrote:

>
> On 20 October 2016 at 20:58, Branton Davis <branton.da...@spanning.com>
> wrote:
>
>> Would they have taken on the token ranges of the original nodes or acted
>> like new nodes and got new token ranges?  If the latter, is it possible
>> that any data moved from the healthy nodes to the "new" nodes or
>> would restarting them with the original data (and repairing) put
>> the cluster's token ranges back into a normal state?
>
>
> It sounds like you stopped them before they completed joining, so you
> should have nothing to worry about. If they had finished joining, you would see
> them marked as DN from the other nodes in the cluster. If you did stop them in
> time, they wouldn't have assumed the token ranges and you shouldn't have any issues.
>
> You can just copy the original data back (including system tables) and
> they should assume their own ranges again, and then you can repair to fix
> any missing replicas.
>
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>


Re: Cluster Maintenance Mishap

2016-10-20 Thread Branton Davis
I guess I'm either not understanding how that answers the question
and/or I've just done a terrible job at asking it.  I'll sleep on it and
maybe I'll think of a better way to describe it tomorrow ;)

On Thu, Oct 20, 2016 at 8:45 PM, Yabin Meng <yabinm...@gmail.com> wrote:

> I believe you're using VNodes (because token range change doesn't make
> sense for single-token setup unless you change it explicitly). If you
> bootstrap a new node with VNodes, I think the way that the token ranges are
> assigned to the node is random (I'm not 100% sure here, but should be so
> logically). If so, the ownership of the data that each node is responsible
> for will be changed. The part of the data that doesn't belong to the node
> under the new ownership, however, will still be kept on that node.
> Cassandra won't remove it automatically unless you run "nodetool cleanup".
> So to answer your question, I don't think the data has been moved away.
> More likely, you have extra duplicates here.
>
> Yabin
>
> On Thu, Oct 20, 2016 at 6:41 PM, Branton Davis <branton.da...@spanning.com
> > wrote:
>
>> Thanks for the response, Yabin.  However, if there's an answer to my
>> question here, I'm apparently too dense to see it ;)
>>
>> I understand that, since the system keyspace data was not there, it
>> started bootstrapping.  What's not clear is if they took over the token
>> ranges of the previous nodes or got new token ranges.  I'm mainly
>> concerned about the latter.  We've got the nodes back in place with the
>> original data, but the fear is that some data may have been moved off of
>> other nodes.  I think that this is very unlikely, but I'm just looking for
>> confirmation.
>>
>>
>> On Thursday, October 20, 2016, Yabin Meng <yabinm...@gmail.com> wrote:
>>
>>> Most likely the issue is caused by the fact that when you moved the data,
>>> you moved the system keyspace data away as well. Because the data was copied
>>> into a different location than what C* was expecting, when C* starts it
>>> cannot find the system metadata and therefore tries to start as a fresh new
>>> node. If you keep the system keyspace data in the right place, you should
>>> see all the old info as expected.
>>>
>>> I've seen a few such occurrences from customers. As a best practice, I
>>> would always suggest totally separating the Cassandra application data
>>> directory from the system keyspace directory (e.g., so they don't share a
>>> common parent folder).
>>>
>>> Regards,
>>>
>>> Yabin
>>>
>>> On Thu, Oct 20, 2016 at 4:58 PM, Branton Davis <
>>> branton.da...@spanning.com> wrote:
>>>
>>>> Howdy folks.  I asked some about this in IRC yesterday, but we're
>>>> looking to hopefully confirm a couple of things for our sanity.
>>>>
>>>> Yesterday, I was performing an operation on a 21-node cluster (vnodes,
>>>> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
>>>> across 3 AZs on AWS EC2).  The plan was to swap each node's existing
>>>> 1TB volume (where all cassandra data, including the commitlog, is stored)
>>>> with a 2TB volume.  The plan for each node (one at a time) was
>>>> basically:
>>>>
>>>>- rsync while the node is live (repeated until there were
>>>>only minor differences from new data)
>>>>- stop cassandra on the node
>>>>- rsync again
>>>>- replace the old volume with the new
>>>>- start cassandra
>>>>
>>>> However, there was a bug in the rsync command.  Instead of copying the
>>>> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to
>>>> /var/data/cassandra_new/cassandra.  So, when cassandra was started
>>>> after the volume swap, there was some behavior that was similar to
>>>> bootstrapping a new node (data started streaming in from other nodes).
>>>>  But there was also some behavior that was similar to a node
>>>> replacement (nodetool status showed the same IP address, but a
>>>> different host ID).  This happened with 3 nodes (one from each AZ).  The
>>>> nodes had received 1.4GB, 1.2GB, and 0.6GB of data (whereas the normal load
>>>> for a node is around 500-600GB).
>>>>
>>>> The cluster was in this state for about 2 hours, at which
>>>> point cassandra was stopped on them.  Later, I moved the data from the
>>>> original volumes back into place (so, should be the original state before
>>>> the operation) and started cassandra back up.
>>>>
>>>> Finally, the questions.  We've accepted the potential loss of new data
>>>> within the two hours, but our primary concern now is what was happening
>>>> with the bootstrapping nodes.  Would they have taken on the token
>>>> ranges of the original nodes or acted like new nodes and got new token
>>>> ranges?  If the latter, is it possible that any data moved from the
>>>> healthy nodes to the "new" nodes or would restarting them with the original
>>>> data (and repairing) put the cluster's token ranges back into a normal
>>>> state?
>>>>
>>>> Hopefully that was all clear.  Thanks in advance for any info!
>>>>
>>>
>>>
>


Re: Cluster Maintenance Mishap

2016-10-20 Thread Branton Davis
Thanks for the response, Yabin.  However, if there's an answer to my
question here, I'm apparently too dense to see it ;)

I understand that, since the system keyspace data was not there, it started
bootstrapping.  What's not clear is if they took over the token ranges of
the previous nodes or got new token ranges.  I'm mainly concerned about the
latter.  We've got the nodes back in place with the original data, but the
fear is that some data may have been moved off of other nodes.  I think
that this is very unlikely, but I'm just looking for confirmation.

On Thursday, October 20, 2016, Yabin Meng <yabinm...@gmail.com> wrote:

> Most likely the issue is caused by the fact that when you moved the data,
> you moved the system keyspace data away as well. Because the data was copied
> into a different location than what C* was expecting, when C* starts it
> cannot find the system metadata and therefore tries to start as a fresh new
> node. If you keep the system keyspace data in the right place, you should
> see all the old info as expected.
>
> I've seen a few such occurrences from customers. As a best practice, I
> would always suggest totally separating the Cassandra application data
> directory from the system keyspace directory (e.g., so they don't share a
> common parent folder).
>
> Regards,
>
> Yabin
>
> On Thu, Oct 20, 2016 at 4:58 PM, Branton Davis <branton.da...@spanning.com
> <javascript:_e(%7B%7D,'cvml','branton.da...@spanning.com');>> wrote:
>
>> Howdy folks.  I asked some about this in IRC yesterday, but we're
>> looking to hopefully confirm a couple of things for our sanity.
>>
>> Yesterday, I was performing an operation on a 21-node cluster (vnodes,
>> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
>> across 3 AZs on AWS EC2).  The plan was to swap each node's existing 1TB
>> volume (where all cassandra data, including the commitlog, is stored) with
>> a 2TB volume.  The plan for each node (one at a time) was basically:
>>
>>- rsync while the node is live (repeated until there were only minor
>>differences from new data)
>>- stop cassandra on the node
>>- rsync again
>>- replace the old volume with the new
>>- start cassandra
>>
>> However, there was a bug in the rsync command.  Instead of copying the
>> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to
>> /var/data/cassandra_new/cassandra.  So, when cassandra was started after
>> the volume swap, there was some behavior that was similar to bootstrapping
>> a new node (data started streaming in from other nodes).  But there
>> was also some behavior that was similar to a node replacement (nodetool
>> status showed the same IP address, but a different host ID).  This
>> happened with 3 nodes (one from each AZ).  The nodes had received 1.4GB,
>> 1.2GB, and 0.6GB of data (whereas the normal load for a node is around
>> 500-600GB).
>>
>> The cluster was in this state for about 2 hours, at which point cassandra
>> was stopped on them.  Later, I moved the data from the original volumes
>> back into place (so, should be the original state before the operation) and
>> started cassandra back up.
>>
>> Finally, the questions.  We've accepted the potential loss of new data
>> within the two hours, but our primary concern now is what was happening
>> with the bootstrapping nodes.  Would they have taken on the token ranges
>> of the original nodes or acted like new nodes and got new token ranges?  If
>> the latter, is it possible that any data moved from the healthy nodes to
>> the "new" nodes or would restarting them with the original data (and
>> repairing) put the cluster's token ranges back into a normal state?
>>
>> Hopefully that was all clear.  Thanks in advance for any info!
>>
>
>


Cluster Maintenance Mishap

2016-10-20 Thread Branton Davis
Howdy folks.  I asked some about this in IRC yesterday, but we're looking
to hopefully confirm a couple of things for our sanity.

Yesterday, I was performing an operation on a 21-node cluster (vnodes,
replication factor 3, NetworkTopologyStrategy, and the nodes are balanced
across 3 AZs on AWS EC2).  The plan was to swap each node's existing 1TB
volume (where all cassandra data, including the commitlog, is stored) with
a 2TB volume.  The plan for each node (one at a time) was basically:

   - rsync while the node is live (repeated until there were only minor
   differences from new data)
   - stop cassandra on the node
   - rsync again
   - replace the old volume with the new
   - start cassandra

However, there was a bug in the rsync command.  Instead of copying the
contents of /var/data/cassandra into /var/data/cassandra_new, it copied the
directory itself, leaving the data in /var/data/cassandra_new/cassandra.  So,
when cassandra was started after
the volume swap, there was some behavior that was similar to bootstrapping
a new node (data started streaming in from other nodes).  But there
was also some behavior that was similar to a node replacement (nodetool
status showed the same IP address, but a different host ID).  This happened
with 3 nodes (one from each AZ).  The nodes had received 1.4GB, 1.2GB, and
0.6GB of data (whereas the normal load for a node is around 500-600GB).
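For reference, a hedged sketch of the rsync trailing-slash behavior that would
produce exactly this layout (the actual command used isn't shown in the thread,
so treat the flags and paths as assumptions):

# intended: copy the CONTENTS of the source directory (note the trailing slash)
rsync -a /var/data/cassandra/ /var/data/cassandra_new/

# without the trailing slash, rsync copies the directory itself,
# which lands the data in /var/data/cassandra_new/cassandra/
rsync -a /var/data/cassandra /var/data/cassandra_new/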

The cluster was in this state for about 2 hours, at which point cassandra
was stopped on them.  Later, I moved the data from the original volumes
back into place (so, should be the original state before the operation) and
started cassandra back up.

Finally, the questions.  We've accepted the potential loss of new data
within the two hours, but our primary concern now is what was happening
with the bootstrapping nodes.  Would they have taken on the token ranges of
the original nodes or acted like new nodes and got new token ranges?  If
the latter, is it possible that any data moved from the healthy nodes to
the "new" nodes or would restarting them with the original data (and
repairing) put the cluster's token ranges back into a normal state?

Hopefully that was all clear.  Thanks in advance for any info!


Re: Adding disk capacity to a running node

2016-10-17 Thread Branton Davis
I doubt that's true anymore.  EBS volumes, while previously discouraged,
are the most flexible way to go, and are very reliable.  You can attach,
detach, and snapshot them too.  If you don't need provisioned IOPS, the GP2
SSDs are more cost-effective and allow you to balance IOPS with cost.
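For example, a rough sketch with the AWS CLI (the size, AZ, volume ID, instance
ID, and device name are all placeholders; the device may show up under a
different name on the instance):

# create a 2TB gp2 volume in the node's AZ and attach it to the instance
aws ec2 create-volume --size 2048 --volume-type gp2 --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/xvdf

# then create a filesystem and mount it before moving data over
mkfs.ext4 /dev/xvdf
mount /dev/xvdf /var/data/cassandra_new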

On Mon, Oct 17, 2016 at 1:55 PM, Jonathan Haddad  wrote:

> Vladimir,
>
> *Most* people running Cassandra are doing so using ephemeral disks.
> Instances
> are not arbitrarily moved to different hosts.  Yes, instances can be shut
> down, but that's why you distribute across AZs.
>
> On Mon, Oct 17, 2016 at 11:48 AM Vladimir Yudovin 
> wrote:
>
>> It's extremely unreliable to use ephemeral (local) disks. Even if you
>> don't stop the instance yourself, it can be restarted on a different server in
>> case of a hardware failure or an AWS-initiated update. So all node data
>> will be lost.
>>
>> Best regards, Vladimir Yudovin,
>>
>>
>> *Winguzone - Hosted Cloud Cassandra on
>> Azure and SoftLayer. Launch your cluster in minutes.*
>>
>>
>>  On Mon, 17 Oct 2016 14:45:00 -0400, Seth Edwards wrote:
>>
>> These are i2.2xlarge instances so the disks currently configured as
>> ephemeral dedicated disks.
>>
>> On Mon, Oct 17, 2016 at 11:34 AM, Laing, Michael <
>> michael.la...@nytimes.com> wrote:
>>
>> You could just expand the size of your ebs volume and extend the file
>> system. No data is lost - assuming you are running Linux.
>>
>>
>> On Monday, October 17, 2016, Seth Edwards  wrote:
>>
>> We're running 2.0.16. We're migrating to a new data model but we've had
>> an unexpected increase in write traffic that has caused us some capacity
>> issues when we encounter compactions. Our old data model is on STCS. We'd
>> like to add another ebs volume (we're on aws) to our JBOD config and
>> hopefully avoid any situation where we run out of disk space during a large
>> compaction. It appears that the behavior we are hoping to get is actually
>> undesirable and removed in 3.2. It still might be an option for us until we
>> can finish the migration.
>>
>> I'm not familiar with LVM so it may be a bit risky to try at this point.
>>
>> On Mon, Oct 17, 2016 at 9:42 AM, Yabin Meng  wrote:
>>
>> I assume you're talking about a Cassandra JBOD (just a bunch of disks) setup
>> because you do mention it as adding it to the list of data directories. If
>> this is the case, you may run into issues, depending on your C* version.
>> Check this out: http://www.datastax.com/dev/blog/improving-jbod.
>>
>> Or another approach is to use LVM to manage multiple devices into a
>> single mount point. If you do so, all Cassandra sees is simply increased
>> disk storage space, and there should be no problem.
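A rough sketch of that LVM approach, assuming two example devices and a default
mount point (/dev/xvdf, /dev/xvdg, and the mount path are placeholders):

# pool both devices into one volume group and expose a single logical volume
pvcreate /dev/xvdf /dev/xvdg
vgcreate cassandra_vg /dev/xvdf /dev/xvdg
lvcreate -l 100%FREE -n data cassandra_vg
mkfs.ext4 /dev/cassandra_vg/data
mount /dev/cassandra_vg/data /var/lib/cassandra/data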
>>
>> Hope this helps,
>>
>> Yabin
>>
>> On Mon, Oct 17, 2016 at 11:54 AM, Vladimir Yudovin 
>> wrote:
>>
>>
>> Yes, Cassandra should keep the percentage of disk usage equal across all disks.
>> The compaction process and SSTable flushes will use the new disk to distribute
>> both new and existing data.
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone - Hosted Cloud Cassandra on
>> Azure and SoftLayer. Launch your cluster in minutes.*
>>
>>
>>  On Mon, 17 Oct 2016 11:43:27 -0400, Seth Edwards wrote:
>>
>> We have a few nodes that are running out of disk capacity at the moment
>> and instead of adding more nodes to the cluster, we would like to add
>> another disk to the server and add it to the list of data directories. My
>> question is: will Cassandra use the new disk for compactions on sstables
>> that already exist in the primary directory?
>>
>>
>>
>> Thanks!
>>
>>
>>
>>


Re: unsubscibe

2016-08-13 Thread Branton Davis
This may be a silly question, but has anyone considered making
the mailing list accept unsubscribe requests this way?  Or at least filter
them out and auto-respond with a message explaining how to unsubscribe?  Seems
like it should be pretty simple, would make it easier for folks to leave,
and would mean less noise for the rest of us.

On Sat, Aug 13, 2016 at 7:48 PM, James Carman 
wrote:

> I've never noticed the unsubscribe button. That may be a good idea,
> though, to support such a feature. Maybe we should poke infra!
>
> On Sat, Aug 13, 2016 at 8:46 PM Russell Bradberry 
> wrote:
>
>> While there is an informative email when you sign up, there are a couple
>> issues with it. First, as you said, people don't read instructions. Second,
>> people forget. I would have to search back to when I signed up for the list
>> in order to find how to unsubscribe. Lastly, gmail and many other apps
>> provide a button that says "unsubscribe" which does little more than send
>> an email to the list. The latter is the reason I am suggesting that this be
>> worked in as an additional way to unsubscribe from the list.
>>
>>
>>
>>
>>
>> On Sat, Aug 13, 2016 at 8:39 PM -0400, "James Carman" <
>> ja...@carmanconsulting.com> wrote:
>>
>> I see the confusion on quite a few lists around the organization. It's
>>> not rampant, but it does happen. Perhaps it would be a good idea to improve
>>> the communication somehow. When you first subscribe, you get a pretty
>>> informative email describing all of the things you can do via email, but
>>> who reads instructions?!?! :). I wonder if infra could force a footer on
>>> the emails or something.
>>>
>>> On Sat, Aug 13, 2016 at 8:35 PM Russell Bradberry 
>>> wrote:
>>>
 I think the overall issue here is that there are many apps that provide
 an "unsubscribe" button that automagically sends these emails.

 I think the best course of action would be to bring this up to the
 powers that be to possibly decide on supporting this functionality as a
 feature. This, of course, because this method of unsubscribing from lists
 is pretty much standard now.

 I'm not sure the patronizing responses with Google links help at all.


 _
 From: James Carman 
 Sent: Saturday, August 13, 2016 8:24 PM
 Subject: Re: unsubscibe
 To: 



 Was the Google stuff really necessary? Couldn't you have just nicely
 told them how to unsubscribe?

 On Sat, Aug 13, 2016 at 7:52 PM Alain RODRIGUEZ 
 wrote:

> Hi,
>
> You did not unsubscribe yet.
>
> 'unsubscribe cassandra' in google search:
>
> Result 1: http://www.planetcassandra.org/apache-cassandra-mailing-
> lists/
> Result 2: http://mail-archives.apache.org/mod_mbox/cassandra-user/
> Result 3: http://cassandra.apache.org/
>
> Sending a message to user-unsubscr...@cassandra.apache.org, as
> mentioned everywhere, should work better and spam fewer people.
>
> Alain
>
> 2016-08-13 23:32 GMT+02:00 Lawrence Turcotte <
> lawrence.turco...@gmail.com>:
>
>>
>>
>




Re: Cassandra metrics custom reporter

2016-07-19 Thread Branton Davis
This isn't a direct answer to your question, but jolokia (
https://jolokia.org/) may be a useful alternative.  It runs as an agent
attached to your cassandra process and provides a REST API for JMX.
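A minimal sketch of hooking it in (the agent path, port, and example MBean are
assumptions; check the jolokia docs for the exact options):

# in cassandra-env.sh: attach the jolokia JVM agent
JVM_OPTS="$JVM_OPTS -javaagent:/opt/jolokia/jolokia-jvm-agent.jar=port=8778,host=127.0.0.1"

# then read any JMX metric over HTTP, e.g. read latency from the ClientRequest metrics
curl 'http://127.0.0.1:8778/jolokia/read/org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency'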

On Tue, Jul 19, 2016 at 11:19 AM, Ricardo Sancho 
wrote:

> Is anyone using a custom reporter to plug in to their own monitoring
> systems?
> i.e., one that does not use graphite or something for which a reporter
> already exists.
>
> Any documentation on plugging this in?
> We have a custom reporter; it's been loaded and we see its report method being
> called. But when we iterate over them, the metrics provided as args to the
> report method are empty.
>
> Thanks
>


Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Branton Davis
Jan, thanks!  It makes perfect sense to run the rsync a second time before stopping
cassandra.  I'll add that in when I do the production cluster.

On Fri, Feb 19, 2016 at 12:16 AM, Jan Kesten <j.kes...@enercast.de> wrote:

> Hi Branton,
>
> two cents from me - I didn't look through the script, but for the rsyncs I
> do pretty much the same when moving them. Since the sstables are immutable, I do a
> first sync to the new location while everything is up and running, which
> runs really long. Meanwhile new ones are created, and I sync them again
> online, with far fewer files to copy now. After that I shut down the node, and my
> last rsync has to copy only a few files, which is quite fast, so the
> downtime for that node is within minutes.
>
> Jan
>
>
>
> Sent from my iPhone
>
> Am 18.02.2016 um 22:12 schrieb Branton Davis <branton.da...@spanning.com>:
>
> Alain, thanks for sharing!  I'm confused why you do so many repetitive
> rsyncs.  Just being cautious or is there another reason?  Also, why do you
> have --delete-before when you're copying data to a temp (assumed empty)
> directory?
>
> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
>> I did the process a few weeks ago and ended up writing a runbook and a
>> script. I have anonymised and share it fwiw.
>>
>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>>
>> It is basic bash. I tried to have the shortest down time possible, making
>> this a bit more complex, but it allows you to do a lot in parallel and just
>> do a fast operation sequentially, reducing overall operation time.
>>
>> This worked fine for me, yet I might have made some errors while making
>> it configurable through variables. Be sure to be around if you decide to run
>> this. Also I automated this more by using knife (Chef), I hate to repeat
>> ops, this is something you might want to consider.
>>
>> Hope this is useful,
>>
>> C*heers,
>> -
>> Alain Rodriguez
>> France
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal <anis...@gmail.com>:
>>
>>> Hey Branton,
>>>
>>> Please do let us know if you face any problems  doing this.
>>>
>>> Thanks
>>> anishek
>>>
>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis <
>>> branton.da...@spanning.com> wrote:
>>>
>>>> We're about to do the same thing.  It shouldn't be necessary to shut
>>>> down the entire cluster, right?
>>>>
>>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli <rc...@eventbrite.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal <anis...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> To accomplish this, can I just copy the data from disk1 to disk2 within
>>>>>> the relevant cassandra home location folders, change the cassandra.yaml
>>>>>> configuration, and restart the node? Before starting, I will shut down the
>>>>>> cluster.
>>>>>>
>>>>>
>>>>> Yes.
>>>>>
>>>>> =Rob
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Branton Davis
Here's what I ended up doing on a test cluster.  It seemed to work well.
I'm running a full repair on the production cluster, probably over the
weekend, then I'll have a go at the test cluster again and go for broke.

# sync to temporary directory on original volume
rsync -azvuiP /var/data/cassandra_data2/ /var/data/cassandra/data2/

# check "before" size of data directory
du -sh /var/data/cassandra/data

# compare sizes
du -sh /var/data/cassandra_data2 && du -sh /var/data/cassandra/data2

service cassandra stop

# sync anything that changed before stop/drain completed
rsync -azvuiP /var/data/cassandra_data2/ /var/data/cassandra/data2/

# compare sizes
du -sh /var/data/cassandra_data2 && du -sh /var/data/cassandra/data2

# edit /usr/local/cassandra/conf/cassandra.yaml:
#  - remove /var/data/cassandra_data2 from data_file_directories

# sync files into real data directory
rsync -azvuiP /var/data/cassandra/data2/ /var/data/cassandra/data/

# check "after" size of data directory (should be size of
/var/data/cassandra_data2 plus "before" size)
du -sh /var/data/cassandra/data

# remove temporary directory
rm -Rf /var/data/cassandra/data2

# unmount second volume
umount /dev/xvdf

# In AWS console:
#  - detach sdf volume
#  - delete volume

# remove mount directory
rm -Rf /var/data/cassandra_data2/

# restart cassandra
service cassandra start

# run repair
/usr/local/cassandra/bin/nodetool repair -pr



On Thu, Feb 18, 2016 at 3:12 PM, Branton Davis <branton.da...@spanning.com>
wrote:

> Alain, thanks for sharing!  I'm confused why you do so many repetitive
> rsyncs.  Just being cautious or is there another reason?  Also, why do you
> have --delete-before when you're copying data to a temp (assumed empty)
> directory?
>
> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
>> I did the process a few weeks ago and ended up writing a runbook and a
>> script. I have anonymised and share it fwiw.
>>
>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>>
>> It is basic bash. I tried to have the shortest down time possible, making
>> this a bit more complex, but it allows you to do a lot in parallel and just
>> do a fast operation sequentially, reducing overall operation time.
>>
>> This worked fine for me, yet I might have made some errors while making
>> it configurable through variables. Be sure to be around if you decide to run
>> this. Also I automated this more by using knife (Chef), I hate to repeat
>> ops, this is something you might want to consider.
>>
>> Hope this is useful,
>>
>> C*heers,
>> -
>> Alain Rodriguez
>> France
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal <anis...@gmail.com>:
>>
>>> Hey Branton,
>>>
>>> Please do let us know if you face any problems  doing this.
>>>
>>> Thanks
>>> anishek
>>>
>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis <
>>> branton.da...@spanning.com> wrote:
>>>
>>>> We're about to do the same thing.  It shouldn't be necessary to shut
>>>> down the entire cluster, right?
>>>>
>>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli <rc...@eventbrite.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal <anis...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> To accomplish this, can I just copy the data from disk1 to disk2 within
>>>>>> the relevant cassandra home location folders, change the cassandra.yaml
>>>>>> configuration, and restart the node? Before starting, I will shut down the
>>>>>> cluster.
>>>>>>
>>>>>
>>>>> Yes.
>>>>>
>>>>> =Rob
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Branton Davis
Alain, thanks for sharing!  I'm confused why you do so many repetitive
rsyncs.  Just being cautious or is there another reason?  Also, why do you
have --delete-before when you're copying data to a temp (assumed empty)
directory?

On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> I did the process a few weeks ago and ended up writing a runbook and a
> script. I have anonymised and share it fwiw.
>
> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>
> It is basic bash. I tried to have the shortest down time possible, making
> this a bit more complex, but it allows you to do a lot in parallel and just
> do a fast operation sequentially, reducing overall operation time.
>
> This worked fine for me, yet I might have made some errors while making it
> configurable through variables. Be sure to be around if you decide to run
> this. Also I automated this more by using knife (Chef), I hate to repeat
> ops, this is something you might want to consider.
>
> Hope this is useful,
>
> C*heers,
> -
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal <anis...@gmail.com>:
>
>> Hey Branton,
>>
>> Please do let us know if you face any problems  doing this.
>>
>> Thanks
>> anishek
>>
>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis <
>> branton.da...@spanning.com> wrote:
>>
>>> We're about to do the same thing.  It shouldn't be necessary to shut
>>> down the entire cluster, right?
>>>
>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli <rc...@eventbrite.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal <anis...@gmail.com>
>>>> wrote:
>>>>>
>>>>> To accomplish this, can I just copy the data from disk1 to disk2 within
>>>>> the relevant cassandra home location folders, change the cassandra.yaml
>>>>> configuration, and restart the node? Before starting, I will shut down the
>>>>> cluster.
>>>>>
>>>>
>>>> Yes.
>>>>
>>>> =Rob
>>>>
>>>>
>>>
>>>
>>
>


Re: Sudden disk usage

2016-02-13 Thread Branton Davis
We use SizeTieredCompaction.  The nodes were about 67% full and we were
planning on adding new nodes (doubling the cluster to 6) soon.  I've been
watching the disk space used, and the nodes were taking about 100GB during
compaction, so I thought we were going to be okay for another week.  The
other nodes are still like that.  It's just this one node that's now taking
a lot more and I'm worried about running out of disk space.  I've gone
ahead and added 2 new nodes and was hoping cleanup would buy some space,
but it looks like compaction still has to complete, and is just continuing
to eat up space.

I guess, worst case scenario, I can remove that node and replace it, but
it's just really strange that this is happening with just the one node, and
apparently adding the new nodes hasn't helped in the short term.
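For anyone watching a node in a similar state, a hedged sketch of the checks
involved (the data path is from our setup and may differ):

# pending and active compactions
nodetool compactionstats

# data directory size vs. free space on the volume
du -sh /var/data/cassandra/data
df -h /var/data/cassandra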

On Sat, Feb 13, 2016 at 4:37 PM, Jan Kesten <j.kes...@enercast.de> wrote:

> Hi,
>
> what kind of compaction strategy do you use? What you are seeing is likely
> a compaction - think of 4 sstables of 50GB each; compacting those
> can take up 200GB while rewriting the new sstable. After that, the old ones
> are deleted and the space is freed again.
>
> If using SizeTieredCompaction you can end up with very large sstables, as I
> do (>250GB each). In the worst case you could need twice the space
> - a reason why I set my disk usage monitoring threshold at 45%.
>
> Just my 2 cents.
> Jan
>
> Sent from my iPhone
>
> > Am 13.02.2016 um 08:48 schrieb Branton Davis <branton.da...@spanning.com
> >:
> >
> > One of our clusters had a strange thing happen tonight.  It's a 3 node
> cluster, running 2.1.10.  The primary keyspace has RF 3, vnodes with 256
> tokens.
> >
> > This evening, over the course of about 6 hours, disk usage increased
> from around 700GB to around 900GB on only one node.  I was at a loss as to
> what was happening and, on a whim, decided to run nodetool cleanup on the
> instance.  I had no reason to believe that it was necessary, as no nodes
> were added or tokens moved (not intentionally, anyhow).  But it immediately
> cleared up that extra space.
> >
> > I'm pretty lost as to what would have happened here.  Any ideas where to
> look?
> >
> > Thanks!
> >
>


Sudden disk usage

2016-02-12 Thread Branton Davis
One of our clusters had a strange thing happen tonight.  It's a 3 node
cluster, running 2.1.10.  The primary keyspace has RF 3, vnodes with 256
tokens.

This evening, over the course of about 6 hours, disk usage increased from
around 700GB to around 900GB on only one node.  I was at a loss as to what
was happening and, on a whim, decided to run nodetool cleanup on the
instance.  I had no reason to believe that it was necessary, as no nodes
were added or tokens moved (not intentionally, anyhow).  But it immediately
cleared up that extra space.

I'm pretty lost as to what would have happened here.  Any ideas where to
look?

Thanks!


Re: Any excellent tutorials or automated scripts for cluster setup on EC2?

2016-01-28 Thread Branton Davis
If you use Chef, there's this cookbook:
https://github.com/michaelklishin/cassandra-chef-cookbook

It's not perfect, but you can make a wrapper cookbook pretty easily to
fix/extend it to do anything you need.

On Wed, Jan 27, 2016 at 11:25 PM, Richard L. Burton III 
wrote:

> I'm curious to see if there's automated scripts or tutorials on setting up
> Cassandra on EC2 with security taken care of etc.
>
> Thanks,
> --
> -Richard L. Burton III
> @rburton
>


Re: Too many open files Cassandra 2.1.11.872

2015-11-06 Thread Branton Davis
We recently went down the rabbit hole of trying to understand the output of
lsof.  lsof -n has a lot of duplicates (files opened by multiple threads).
Use 'lsof -p $PID' or 'lsof -u cassandra' instead.
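A quick sketch of the difference (the pgrep pattern is an assumption; adjust it
for how your Cassandra process is named):

# the broad count (per the above, this includes per-thread duplicates)
lsof -n | grep java | wc -l

# narrower per-process and per-user counts
lsof -p "$(pgrep -f CassandraDaemon | head -n 1)" | wc -l
lsof -u cassandra | wc -l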

On Fri, Nov 6, 2015 at 12:49 PM, Bryan Cheng  wrote:

> Is your compaction progressing as expected? If not, this may cause an
> excessive number of tiny db files. Had a node refuse to start recently
> because of this, had to temporarily remove limits on that process.
>
> On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis 
> wrote:
>
>> I'm getting too many open files errors and I'm wondering what the
>> cause may be.
>>
>> lsof -n | grep java  show 1.4M files
>>
>> ~90k are inodes
>> ~70k are pipes
>> ~500k are cassandra services in /usr
>> ~700K are the data files.
>>
>> What might be causing so many files to be open?
>>
>> jas
>>
>
>


Re: "invalid global counter shard detected" warning on 2.1.3 and 2.1.10

2015-10-20 Thread Branton Davis
Howdy Cassandra folks.

Crickets here and it's sort of unsettling that we're alone with this
issue.  Is it appropriate to create a JIRA issue for this or is there maybe
another way to deal with it?

Thanks!

On Sun, Oct 18, 2015 at 1:55 PM, Branton Davis <branton.da...@spanning.com>
wrote:

> Hey all.
>
> We've been seeing this warning on one of our clusters:
>
> 2015-10-18 14:28:52,898 WARN  [ValidationExecutor:14]
> org.apache.cassandra.db.context.CounterContext invalid global counter shard
> detected; (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 67158) and
> (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 21486) differ only in count; will
> pick highest to self-heal on compaction
>
>
> From what I've read and heard in the IRC channel, this warning could be
> related to not running upgradesstables after upgrading from 2.0.x to
> 2.1.x.  I don't think we ran that then, but we've been at 2.1 since last
> November.  Looking back, the warnings start appearing around June, when no
> maintenance had been performed on the cluster.  At that time, we had been
> on 2.1.3 for a couple of months.  We've been on 2.1.10 for the last week
> (the upgrade was when we noticed this warning for the first time).
>
> From a suggestion in IRC, I went ahead and ran upgradesstables on all the
> nodes.  Our weekly repair also ran this morning.  But the warnings still
> show up throughout the day.
>
> So, we have many questions:
>
>- How much should we be freaking out?
>- Why is this recurring?  If I understand what's happening, this is a
>self-healing process.  So, why would it keep happening?  Are we possibly
>using counters incorrectly?
>- What does it even mean that there were multiple shards for the same
>counter?  How does that situation even occur?
>
> We're pretty lost here, so any help would be greatly appreciated.
>
> Thanks!
>


Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-20 Thread Branton Davis
On Mon, Oct 19, 2015 at 5:42 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Mon, Oct 19, 2015 at 9:20 AM, Branton Davis <branton.da...@spanning.com
> > wrote:
>
>> Is that also true if you're standing up multiple nodes from backups that
>> already have data?  Could you not stand up more than one at a time since
>> they already have the data?
>>
>
> An operator probably almost never wants to add multiple
> not-previously-joined nodes to an active cluster via auto_bootstrap:false.
>
> The one case I can imagine is when you are starting a cluster which is not
> receiving any write traffic and does contain snapshots.
>
> =Rob
>

Just to clarify, I was thinking about a scenario/disaster where we lost the
entire cluster and had to rebuild from backups.  I assumed we would start
each node with the backed up data and commit log directories already there
and with auto_bootstrap=false, and I also hoped that we could do all nodes
at once, since they each already had their data.  Is that wrong?  If so,
how would you handle such a situation?
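To make the scenario concrete, a hedged sketch of what that might look like on
each node (the backup path and config location are assumptions about our setup):

# restore the backed-up data and commitlog directories into place
rsync -a /backups/$(hostname)/cassandra/ /var/data/cassandra/

# make sure cassandra.yaml on the node contains:
#   auto_bootstrap: false
# then start cassandra on every node and repair once the cluster is up
service cassandra start
nodetool repair -pr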


Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-20 Thread Branton Davis
On Tue, Oct 20, 2015 at 3:31 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Oct 20, 2015 at 9:13 AM, Branton Davis <branton.da...@spanning.com
> > wrote:
>
>>
>>> Just to clarify, I was thinking about a scenario/disaster where we lost
>> the entire cluster and had to rebuild from backups.  I assumed we would
>> start each node with the backed up data and commit log directories already
>> there and with auto_bootstrap=false, and I also hoped that we could do all
>> nodes at once, since they each already had their data.  Is that wrong?  If
>> so, how would you handle such a situation?
>>
>
> "The one case I can imagine is when you are starting a cluster which is
> not receiving any write traffic and does contain snapshots. "
>
> The case you describe is in that class of cases.
>
> =Rob
>
>
>
>>
>>
>
>
Thanks for confirming!


Re: "invalid global counter shard detected" warning on 2.1.3 and 2.1.10

2015-10-20 Thread Branton Davis
Sebastián, thanks so much for the info!

On Tue, Oct 20, 2015 at 11:34 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Hi Branton,
>
>
>>- How much should we be freaking out?
>>
> The impact of this is possible counter inaccuracy (overcounting or
> undercounting). If you are expecting counters to be exactly accurate, you are
> already in trouble, because they are not. This is because they are not
> idempotent operations in a distributed system
> (you've probably read Aleksey's
> <http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters>
> post by now).
>
>>
>>- Why is this recurring?  If I understand what's happening, this is a
>>self-healing process.  So, why would it keep happening?  Are we possibly
>>using counters incorrectly?
>>
> Even after running sstableupgrade, your counter cells will not be
> upgraded until they have all been incremented. You may still be seeing the
> warning happening on pre-2.1 counter cells which have not been incremented
> yet.
>
>>
>>- What does it even mean that there were multiple shards for the same
>>counter?  How does that situation even occur?
>>
>> We used to maintain "counter shards" at the sstable level in pre 2.1
> counters. This means that on compaction or reads we would essentially add
> the shards together when getting the value or merging the cells. This
> caused a series of problems including the warning you are still seeing.
> TL;DR, we now store the final value of the counter (not the
> increment/shard) at the commitlog level and beyond in post 2.1 counters, so
> this is no longer an issue. Again, read Aleksey's post
> <http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters>
> .
>
> Many users started fresh tables after upgrading to 2.1, updated only the
> new tables, and added application logic to decide which table to read from.
> Something like monthly tables works well if you're doing time series
> counters, and would ensure that you stop seeing the warnings on the
> new/active tables and get the benefits of 2.1 counters quickly.
>
>
>
>
> All the best,
>
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Tue, Oct 20, 2015 at 12:21 PM, Branton Davis <
> branton.da...@spanning.com> wrote:
>
>> Howdy Cassandra folks.
>>
>> Crickets here and it's sort of unsettling that we're alone with this
>> issue.  Is it appropriate to create a JIRA issue for this or is there maybe
>> another way to deal with it?
>>
>> Thanks!
>>
>> On Sun, Oct 18, 2015 at 1:55 PM, Branton Davis <
>> branton.da...@spanning.com> wrote:
>>
>>> Hey all.
>>>
>>> We've been seeing this warning on one of our clusters:
>>>
>>> 2015-10-18 14:28:52,898 WARN  [ValidationExecutor:14]
>>> org.apache.cassandra.db.context.CounterContext invalid global counter shard
>>> detected; (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 67158) and
>>> (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 21486) differ only in count; will
>>> pick highest to self-heal on compaction
>>>
>>>
>>> From what I've read and heard in the IRC channel, this warning could be
>>> related to not running upgradesstables after upgrading from 2.0.x to
>>> 2.1.x.  I don't think we ran that then, but we've been at 2.1 since last
>>> November.  Looking back, the warnings start appearing around June, when no
>>> maintenance had been performed on the cluster.  At that time, we had been
>>> on 2.1.3 for a couple of months.  We've b

Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Branton Davis
Is that also true if you're standing up multiple nodes from backups that
already have data?  Could you not stand up more than one at a time since
they already have the data?

On Mon, Oct 19, 2015 at 10:48 AM, Eric Stevens  wrote:

> It seems to me that as long as cleanup hasn't happened, if you
> *decommission* the newly joined nodes, they'll stream whatever writes
> they took back to the original replicas.  Presumably that should be pretty
> quick as they won't have nearly as much data as the original nodes (as they
> only hold data written while they were online).  Then as long as cleanup
> hasn't happened, your cluster should have returned to a consistent view of
> the data.  You can now bootstrap the new nodes again.
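A hedged sketch of that sequence on each accidentally joined node (the data
paths are the defaults and may differ):

# stream its writes back to the original replicas and leave the ring
nodetool decommission

# clear its data so it can bootstrap cleanly next time
service cassandra stop
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*

# re-enable bootstrap (auto_bootstrap: true, or remove the line), then restart
service cassandra start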
>
> If you have done a cleanup, then the data is probably irreversibly
> corrupted, you will have to figure out how to restore the missing data
> incrementally from backups if they are available.
>
> On Sun, Oct 18, 2015 at 10:37 PM Raj Chudasama 
> wrote:
>
>> In this case, does it make sense to remove the newly added nodes, correct the
>> configuration, and have them rejoin one at a time?
>>
>> Thx
>>
>> Sent from my iPhone
>>
>> On Oct 18, 2015, at 11:19 PM, Jeff Jirsa 
>> wrote:
>>
>> Take a snapshot now, before you get rid of any data (whatever you do,
>> don’t run cleanup).
>>
>> If you identify missing data, you can go back to those snapshots, find
>> the nodes that had the data previously (sstable2json, for example), and
>> either re-stream that data into the cluster with sstableloader or copy it
>> to a new host and `nodetool refresh` it into the new system.
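A rough sketch of those two recovery paths (keyspace, table, host, and paths are
placeholders; sstableloader expects a .../keyspace/table directory layout):

# take a snapshot now, before anything else
nodetool snapshot -t before-fix my_keyspace

# option 1: stream the old sstables back in from another host
sstableloader -d 10.0.0.1 /path/to/restored/my_keyspace/my_table

# option 2: copy the sstables into the live data directory on a node, then
nodetool refresh my_keyspace my_table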
>>
>>
>>
>> From:  on behalf of Kevin Burton
>> Reply-To: "user@cassandra.apache.org"
>> Date: Sunday, October 18, 2015 at 8:10 PM
>> To: "user@cassandra.apache.org"
>> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes
>> at once?
>>
>> ouch.. OK.. I think I really shot myself in the foot here then.  This
>> might be bad.
>>
>> I'm not sure if I would have missing data.  I mean basically the data is
>> on the other nodes.. but the cluster has been running with 10 nodes
>> accidentally bootstrapped with auto_bootstrap=false.
>>
>> So they have new data and seem to be missing values.
>>
>> this is somewhat misleading... Initially if you start it up and run
>> nodetool status, it only returns one node.
>>
>> So I assumed auto_bootstrap=false meant that it just doesn't join the
>> cluster.
>>
>> I'm running a nodetool repair now to hopefully fix this.
>>
>>
>>
>> On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa 
>> wrote:
>>
>>> auto_bootstrap=false tells it to join the cluster without running
>>> bootstrap – the node assumes it has all of the necessary data, and won’t
>>> stream any missing data.
>>>
>>> This generally violates consistency guarantees, but if done on a single
>>> node, is typically correctable with `nodetool repair`.
>>>
>>> If you do it on many nodes at once, it's possible that the new nodes
>>> could represent all 3 replicas of the data but not physically have any
>>> of that data, leading to missing records.
>>>
>>>
>>>
>>> From:  on behalf of Kevin Burton
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Sunday, October 18, 2015 at 3:44 PM
>>> To: "user@cassandra.apache.org"
>>> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes
>>> at once?
>>>
>>> An shit.. I think we're seeing corruption.. missing records :-/
>>>
>>> On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton 
>>> wrote:
>>>
 We just migrated from a 30 node cluster to a 45 node cluster. (so 15
 new nodes)

 By default we have auto_bootstrap = false

 so we just push our config to the cluster, the cassandra daemons
 restart, and they're not cluster members yet; each one only sees itself in the
 cluster.

 Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had
 about 7 members of the cluster and 8 not yet joined.

 We are only doing 1 at a time because apparently bootstrapping more
 than 1 is unsafe.

 I did a rolling restart whereby I went through and restarted all the
 cassandra boxes.

 Somehow the new nodes auto-bootstrapped themselves EVEN though
 auto_bootstrap=false.

 We don't have any errors.  Everything seems functional.  I'm just
 worried about data loss.

 Thoughts?

 Kevin

 --

 We’re hiring if you know of any awesome Java Devops or Linux Operations
 Engineers!

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 


>>>
>>>
>>> --
>>>
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>> Engineers!
>>>
>>> Founder/CEO 

"invalid global counter shard detected" warning on 2.1.3 and 2.1.10

2015-10-18 Thread Branton Davis
Hey all.

We've been seeing this warning on one of our clusters:

2015-10-18 14:28:52,898 WARN  [ValidationExecutor:14]
org.apache.cassandra.db.context.CounterContext invalid global counter shard
detected; (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 67158) and
(4aa69016-4cf8-4585-8f23-e59af050d174, 1, 21486) differ only in count; will
pick highest to self-heal on compaction


From what I've read and heard in the IRC channel, this warning could be
related to not running upgradesstables after upgrading from 2.0.x to
2.1.x.  I don't think we ran that then, but we've been at 2.1 since last
November.  Looking back, the warnings start appearing around June, when no
maintenance had been performed on the cluster.  At that time, we had been
on 2.1.3 for a couple of months.  We've been on 2.1.10 for the last week
(the upgrade was when we noticed this warning for the first time).

From a suggestion in IRC, I went ahead and ran upgradesstables on all the
nodes.  Our weekly repair also ran this morning.  But the warnings still
show up throughout the day.
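For anyone following along, a hedged sketch of the commands involved (the -a
flag, which also rewrites sstables already on the current version, is an
assumption about how you might want to scope it):

# rewrite sstables to the current format on each node
nodetool upgradesstables -a

# weekly repair (primary ranges only)
nodetool repair -pr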

So, we have many questions:

   - How much should we be freaking out?
   - Why is this recurring?  If I understand what's happening, this is a
   self-healing process.  So, why would it keep happening?  Are we possibly
   using counters incorrectly?
   - What does it even mean that there were multiple shards for the same
   counter?  How does that situation even occur?

We're pretty lost here, so any help would be greatly appreciated.

Thanks!