TWCS - clearing accidental out-of-order writes

2019-01-05 Thread Rutvij Bhatt
Hello,

I have a table using TWCS that is currently unable to remove some expired
SSTables because they are blocked by an overlapping non-expired SSTable.
From looking at the content, where the insert timestamp + TTL don't match
the expiration, I think this is an out-of-order write through the normal
write path. I don't believe this is due to read repair, as I have
read_repair_chance set to 0 on this table.
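
For anyone reproducing this, the bundled tools can show the relevant metadata
(a sketch, assuming your build ships them; keyspace, table and file names are
placeholders):

    # list non-expired SSTables that block fully expired ones from being dropped
    sstableexpiredblockers <keyspace> <table>
    # inspect min/max timestamps and deletion-time metadata of a suspect SSTable
    sstablemetadata /var/lib/cassandra/data/<keyspace>/<table>-*/mc-1234-big-Data.db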

The blocking SSTable's data has expired for business purposes and I am fine
with losing it entirely. The forums seem to indicate that it's reasonable
to stop the node, remove the blocking SSTable (rm) along with its
companion files (index, checksum, etc.), and restart the node. I wanted
to ask whether that is a reasonable thing to do, or whether there is a
cleaner alternative.
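
Concretely, something like this is what I had in mind (the data directory and
SSTable generation below are placeholders, not the real names):

    nodetool drain && sudo service cassandra stop
    cd /var/lib/cassandra/data/<keyspace>/<table>-<table_id>/
    # move the blocking SSTable and all of its companion files aside instead of rm,
    # so they can be restored if anything looks wrong after the restart
    mkdir -p /tmp/quarantined_sstables
    mv mc-1234-big-* /tmp/quarantined_sstables/
    sudo service cassandra start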

Thanks!
Rutvij


Re: Any Cassandra Backup and Restore tool like Cassandra Reaper?

2017-12-14 Thread Rutvij Bhatt
There is tablesnap/tablechop/tableslurp -
https://github.com/JeremyGrosser/tablesnap.


On Thu, Dec 14, 2017 at 3:49 PM Roger Brown <
roger.br...@perfectsearchcorp.com> wrote:

> I've found nothing affordable that works with vnodes. If you have money,
> you could use DataStax OpsCenter or Datos.io Recoverx.
>
> I ended up creating a cron job to make snapshots, along with
> incremental_backups: true in cassandra.yaml. I'm also thinking of
> setting up a replication strategy so that one rack contains one replica of
> each keyspace, and then using r1soft to image each of those servers to tape
> for offsite backup.
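>
> A minimal sketch of that cron job (the schedule and tag are placeholders, not
> the exact job):
>
>     # /etc/cron.d/cassandra-snapshot (hypothetical)
>     # nightly snapshot of all keyspaces, tagged with the date
>     0 2 * * * cassandra nodetool snapshot -t nightly_$(date +\%Y\%m\%d)
>     # old snapshots still need pruning separately (nodetool clearsnapshot)
>
> plus incremental_backups: true in cassandra.yaml, as mentioned above.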
>
>
> On Thu, Dec 14, 2017 at 1:30 PM Harika Vangapelli -T (hvangape - AKRAYA
> INC at Cisco)  wrote:
>
>> Any Cassandra Backup and Restore tool like Cassandra Reaper for Repairs?
>>


Re: Incorrect quorum count in driver error logs

2017-06-26 Thread Rutvij Bhatt
Yes.

On Mon, Jun 26, 2017 at 5:45 PM Hannu Kröger  wrote:

> Just to be sure: you have only one datacenter configured in Cassandra?
>
> Hannu
>
> On 27 Jun 2017, at 0.02, Rutvij Bhatt  wrote:
>
> Hi guys,
>
> I observed some odd behaviour with our Cassandra cluster the other day
> while doing a maintenance operation and was wondering if anyone would be
> able to provide some insight.
>
> Initially, I started a node up to join the cluster. That node appeared to
> be having issues joining due to some SSTable corruption it encountered.
> Since it was still in the early stages and I had never seen this failure
> before, I decided to take it out of commission and just try again. However,
> since it was in a bad state, I decided to issue a "nodetool removenode
> " on a peer rather than a "nodetool decommission" on the node
> itself.
>
> The removenode command hung indefinitely - my guess is that this is
> related to https://issues.apache.org/jira/browse/CASSANDRA-6542. We are
> using 2.1.11.
>
> While this was happening, the driver in the application started logging
> error messages about not being able to reach a quorum of 4. This, to me,
> was mysterious as none of my keyspaces have an RF > 3. That quorum count in
> the error implied an RF of 6 or 7.
>
> I eventually forced that node out of the ring with "nodetool removenode
> force". This seemed to mostly fix the issue, though there seems to have
> been enough of a load spike to cause some of the machines' JVMs to
> accumulate a lot of garbage very fast and, while trying to clean it up, spit
> out a ton of "Not marking nodes down due to local pause of ..." messages. Some of
> these nodes seemed unresponsive to their peers, who marked them DOWN (as
> indicated by "nodetool status" and the cassandra log file on those
> machines), further exacerbating the situation on the nodes that were still
> up.
>
> I guess my question is two-fold. First, can anyone provide some insight
> into what may have happened? Second, what do you consider good practices
> when dealing with such issues? Any advice is greatly appreciated!
>
> Thanks,
> Rutvij
>
>


Incorrect quorum count in driver error logs

2017-06-26 Thread Rutvij Bhatt
Hi guys,

I observed some odd behaviour with our Cassandra cluster the other day
while doing a maintenance operation and was wondering if anyone would be
able to provide some insight.

Initially, I started a node up to join the cluster. That node appeared to
be having issues joining due to some SSTable corruption it encountered.
Since it was still in the early stages and I had never seen this failure
before, I decided to take it out of commission and just try again. However,
since it was in a bad state, I decided to issue a "nodetool removenode
" on a peer rather than a "nodetool decommission" on the node
itself.

The removenode command hung indefinitely - my guess is that this is related
to https://issues.apache.org/jira/browse/CASSANDRA-6542. We are using
2.1.11.

While this was happening, the driver in the application started logging
error messages about not being able to reach a quorum of 4. This, to me,
was mysterious as none of my keyspaces have an RF > 3. That quorum count in
the error implied an RF of 6 or 7.
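
(For reference, my understanding of the calculation is quorum = floor(RF/2) + 1:
RF=3 gives 2, while RF=6 and RF=7 both give 4; hence my guess of an effective RF
of 6 or 7.)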

I eventually forced that node out of the ring with "nodetool removenode
force". This seemed to mostly fix the issue, though there seems to have
been enough of a load spike to cause some of the machines' JVMs to
accumulate a lot of garbage very fast and, while trying to clean it up, spit
out a ton of "Not marking nodes down due to local pause of ..." messages. Some of
these nodes seemed unresponsive to their peers, who marked them DOWN (as
indicated by "nodetool status" and the cassandra log file on those
machines), further exacerbating the situation on the nodes that were still
up.

I guess my question is two-fold. First, can anyone provide some insight
into what may have happened? Second, what do you consider good practices
when dealing with such issues? Any advice is greatly appreciated!

Thanks,
Rutvij


Re: Node replacement strategy with AWS EBS

2017-06-14 Thread Rutvij Bhatt
Thanks again for your help! To summarize for anyone who stumbles onto this
in the future, this article covers the procedure well:
https://www.eventbrite.com/engineering/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

It is more or less what Hannu suggested.

I carried out the following steps (condensed commands below):
1. Safely stop the Cassandra instance (nodetool drain + service cassandra
stop).
2. Shut down the EC2 instance.
3. Detach the storage volume from the old instance.
4. Attach it to the new instance.
5. Point the Cassandra configuration on the new instance at this volume and
set auto_bootstrap: false.
6. Start Cassandra on the new instance. Once it has established connections
with its peers, you will notice that it takes over the token ranges on its
own. Doing a select on the system.peers table will show that the old node is
gone.
7. Run nodetool repair if need be.
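
Condensed into commands and config (device names, paths and service commands
are placeholders for whatever your setup uses):

    # on the old node
    nodetool drain && sudo service cassandra stop
    # in AWS: stop the old EC2 instance, detach its data EBS volume,
    # then attach that volume to the new instance at the same mount point

    # on the new node, before the first start:
    #   - point data_file_directories in cassandra.yaml at the attached volume
    #   - set auto_bootstrap: false in cassandra.yaml
    sudo service cassandra start
    # confirm the token ranges moved over and the old node is gone
    nodetool status
    cqlsh -e "SELECT peer FROM system.peers;"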

On Tue, Jun 13, 2017 at 1:01 PM Rutvij Bhatt  wrote:

> Nevermind, I misunderstood the first link. In this case, the replacement
> would just be leaving the listen_address as is (i.e.
> InetAddress.getLocalHost()) and just starting the new instance up as you
> pointed out in your original answer, Hannu.
>
> Thanks.
>
> On Tue, Jun 13, 2017 at 12:35 PM Rutvij Bhatt  wrote:
>
>> Hannu/Nitan,
>>
>> Thanks for your help so far! From what you said in your first response, I
>> can get away with just attaching the EBS volume to Cassandra and starting
>> it with the old node's private IP as my listen_address because it will take
>> over the token assignment from the old node using the data files? With
>> regards to "Cassandra automatically realizes that have just effectively
>> changed IP address.", it says in the first link to change this manually to
>> the desired address - does this not apply in my case if I'm replacing the
>> old node?
>>
>> As for the plan I outlined earlier, is this more for DR scenarios where I
>> have lost a node due to hardware failure and I need to recover the data in
>> a safe manner by requesting a stream from the other replicas?  Am I
>> understanding this right?
>>
>>
>> On Tue, Jun 13, 2017 at 11:59 AM Hannu Kröger  wrote:
>>
>>> Hello,
>>>
>>> So the local information about tokens is stored in the system keyspace.
>>> Also the host id and all that.
>>>
>>> Also documented here:
>>>
>>> https://support.datastax.com/hc/en-us/articles/204289959-Changing-IP-addresses-in-DSE
>>>
>>> If for any reason that causes issues, you can also check this:
>>> https://issues.apache.org/jira/browse/CASSANDRA-8382
>>>
>>> If you copy all Cassandra data, you are on the safe side. A good point in
>>> the links is that if you have IP addresses in topology or other files, then
>>> update those as well.
>>>
>>> Hannu
>>>
>>> On 13 June 2017 at 11:53:13, Nitan Kainth (ni...@bamlabs.com) wrote:
>>>
>>> Hannu,
>>>
>>> "Cassandra automatically realizes that have just effectively changed IP
>>> address” —> are you sure C* will take care of IP change as is? How will it
>>> know which token range to be assigned to this new IP address?
>>>
>>> On Jun 13, 2017, at 10:51 AM, Hannu Kröger  wrote:
>>>
>>> Cassandra automatically realizes that you have just effectively changed IP
>>> address
>>>
>>>
>>>


Re: Node replacement strategy with AWS EBS

2017-06-13 Thread Rutvij Bhatt
Nevermind, I misunderstood the first link. In this case, the replacement
would just be leaving the listen_address as is (i.e.
InetAddress.getLocalHost()) and just starting the new instance up as you
pointed out in your original answer, Hannu.
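
Concretely (my understanding, for anyone reading later), leaving it "as is"
just means not setting listen_address in cassandra.yaml, since a blank value
falls back to InetAddress.getLocalHost():

    # cassandra.yaml on the new instance - leave this unset / commented out
    # listen_address: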

Thanks.

On Tue, Jun 13, 2017 at 12:35 PM Rutvij Bhatt  wrote:

> Hannu/Nitan,
>
> Thanks for your help so far! From what you said in your first response, I
> can get away with just attaching the EBS volume to Cassandra and starting
> it with the old node's private IP as my listen_address because it will take
> over the token assignment from the old node using the data files? With
> regards to "Cassandra automatically realizes that have just effectively
> changed IP address.", it says in the first link to change this manually to
> the desired address - does this not apply in my case if I'm replacing the
> old node?
>
> As for the plan I outlined earlier, is this more for DR scenarios where I
> have lost a node due to hardware failure and I need to recover the data in
> a safe manner by requesting a stream from the other replicas?  Am I
> understanding this right?
>
>
> On Tue, Jun 13, 2017 at 11:59 AM Hannu Kröger  wrote:
>
>> Hello,
>>
>> So the local information about tokens is stored in the system keyspace.
>> Also the host id and all that.
>>
>> Also documented here:
>>
>> https://support.datastax.com/hc/en-us/articles/204289959-Changing-IP-addresses-in-DSE
>>
>> If for any reason that causes issues, you can also check this:
>> https://issues.apache.org/jira/browse/CASSANDRA-8382
>>
>> If you copy all Cassandra data, you are on the safe side. A good point in
>> the links is that if you have IP addresses in topology or other files, then
>> update those as well.
>>
>> Hannu
>>
>> On 13 June 2017 at 11:53:13, Nitan Kainth (ni...@bamlabs.com) wrote:
>>
>> Hannu,
>>
>> "Cassandra automatically realizes that have just effectively changed IP
>> address” —> are you sure C* will take care of IP change as is? How will it
>> know which token range to be assigned to this new IP address?
>>
>> On Jun 13, 2017, at 10:51 AM, Hannu Kröger  wrote:
>>
>> Cassandra automatically realizes that you have just effectively changed IP
>> address
>>
>>
>>


Re: Node replacement strategy with AWS EBS

2017-06-13 Thread Rutvij Bhatt
Hannu/Nitan,

Thanks for your help so far! From what you said in your first response, I
can get away with just attaching the EBS volume to Cassandra and starting
it with the old node's private IP as my listen_address because it will take
over the token assignment from the old node using the data files? With
regards to "Cassandra automatically realizes that have just effectively
changed IP address.", it says in the first link to change this manually to
the desired address - does this not apply in my case if I'm replacing the
old node?

As for the plan I outlined earlier, is this more for DR scenarios where I
have lost a node due to hardware failure and I need to recover the data in
a safe manner by requesting a stream from the other replicas?  Am I
understanding this right?


On Tue, Jun 13, 2017 at 11:59 AM Hannu Kröger  wrote:

> Hello,
>
> So the local information about tokens is stored in the system keyspace.
> Also the host id and all that.
>
> Also documented here:
>
> https://support.datastax.com/hc/en-us/articles/204289959-Changing-IP-addresses-in-DSE
>
> If for any reason that causes issues, you can also check this:
> https://issues.apache.org/jira/browse/CASSANDRA-8382
>
> If you copy all Cassandra data, you are on the safe side. A good point in
> the links is that if you have IP addresses in topology or other files, then
> update those as well.
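>
> A quick way to see that local state (a sketch, from memory rather than the
> docs):
>
>     cqlsh -e "SELECT host_id, tokens FROM system.local;"
>     cqlsh -e "SELECT peer, host_id, tokens FROM system.peers;"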
>
> Hannu
>
> On 13 June 2017 at 11:53:13, Nitan Kainth (ni...@bamlabs.com) wrote:
>
> Hannu,
>
> "Cassandra automatically realizes that have just effectively changed IP
> address” —> are you sure C* will take care of IP change as is? How will it
> know which token range to be assigned to this new IP address?
>
> On Jun 13, 2017, at 10:51 AM, Hannu Kröger  wrote:
>
> Cassandra automatically realizes that you have just effectively changed IP
> address
>
>
>


Re: Node replacement strategy with AWS EBS

2017-06-13 Thread Rutvij Bhatt
Nitan,

Yes, that is what I've done. I snapshotted the volume after step 3 and will
create a new volume from that snapshot and attach it to the new instance.
Out of curiosity, if I am indeed replacing a node completely, is there any
logical difference between snapshot->create->attach vs. detach from old ->
attach to new, besides a margin of safety?

Thanks for your reply!

On Tue, Jun 13, 2017 at 11:37 AM Nitan Kainth  wrote:

> Steps are good, Rutvij. Step 1 is not mandatory.
>
> We snapshotted the EBS volume and then restored it on the new node. How are
> you re-attaching the EBS volume without a snapshot?
>
>
> On Jun 13, 2017, at 10:21 AM, Rutvij Bhatt  wrote:
>
> Hi!
>
> We're running a Cassandra cluster on AWS. I want to replace an old node
> that uses EBS storage with a new one. The steps I'm following are as follows,
> and I want to get a second opinion on whether this is the right thing to do:
>
> 1. Remove old node from gossip.
> 2. Run nodetool drain
> 3. Stop cassandra
> 4. Create a new node and update JVM_OPTS in cassandra-env.sh with
> cassandra.replace_address= as instructed
> here -
> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
> 5. Attach the EBS volume from the old node at the same mount point.
> 6. Start cassandra on the new node.
> 7. Run nodetool repair to catch the replacing node up on whatever it has
> missed.
>
> Thanks!
>
>
>


Node replacement strategy with AWS EBS

2017-06-13 Thread Rutvij Bhatt
Hi!

We're running a Cassandra cluster on AWS. I want to replace an old node
that uses EBS storage with a new one. The steps I'm following are as follows,
and I want to get a second opinion on whether this is the right thing to do
(a sketch of the step 4 change follows the list):

1. Remove old node from gossip.
2. Run nodetool drain
3. Stop cassandra
4. Create a new node and update JVM_OPTS in cassandra-env.sh with
cassandra.replace_address= as instructed
here -
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
5. Attach the EBS volume from the old node at the same mount point.
6. Start cassandra on the new node.
7. Run nodetool repair to catch the replacing node up on whatever it has
missed.
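
For step 4, this is roughly the change I mean (the old node's address is a
placeholder here):

    # cassandra-env.sh on the replacement node; remove again once it has joined
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<old_node_ip>"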

Thanks!