Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Thanks Jeff for your response.

Do you see any risk in the following approach?

1. Stop the node.
2. Remove all sstable files from
*/var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33*
directory.
3. Start the node.
4. Run full repair on this particular table

I wanted to go this way because this table is small (5-6 GB). I would like
to avoid the 2-3 days of streaming that replacing the whole host would involve.
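
Concretely, the commands for those steps would be roughly as follows (a rough
sketch only; the service name and the repair invocation depend on how Cassandra
is installed and managed here):

    sudo systemctl stop cassandra       # step 1: stop the node
    sudo rm -rf /var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33/*   # step 2
    sudo systemctl start cassandra      # step 3: start the node
    nodetool repair --full keyspace tablename   # step 4: full repair of just this table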

Regards
Manish

On Fri, Feb 14, 2020 at 12:28 PM Jeff Jirsa  wrote:

> Agree this is both strictly possible and more common with LCS. The only
> thing that's strictly correct to do is treat every corrupt sstable
> exception as a failed host, and replace it just like you would a failed
> host.
>
>
> On Thu, Feb 13, 2020 at 10:55 PM manish khandelwal <
> manishkhandelwa...@gmail.com> wrote:
>
>> Thanks Erick
>>
>> I would like to explain how data resurrection can take place with a single
>> SSTable deletion.
>>
>> Consider this case of a table with Leveled Compaction Strategy:
>>
>> 1. Data A is written a long time back.
>> 2. Data A is deleted and a tombstone is created.
>> 3. After GC grace the tombstone is purgeable.
>> 4. Now the SSTable containing the purgeable tombstone on one node is
>> corrupted.
>> 5. The node with the corrupt SSTable cannot compact away the data and the
>> purgeable tombstone.
>> 6. On the other two nodes Data A is removed after compaction.
>> 7. Remove the corrupt SSTable from the impacted node.
>> 8. When you run repair, Data A is copied back to all the nodes.
>>
>> This table in question is using Leveled Compaction Strategy.
>>
>> Regards
>> Manish
>>
>> On Fri, Feb 14, 2020 at 12:00 PM Erick Ramirez <
>> erick.rami...@datastax.com> wrote:
>>
>>> The log shows that the problem occurs when decompressing the SSTable
>>> but there's not much actionable info from it.
>>>
>>> I would like to know what the "ordinary hammer" would be in this case. Do
>>> you mean to suggest that deleting only the corrupt SSTable file (in this case
>>> mc-1234-big-*.db) would suffice?
>>>
>>>
>>> Exactly. I mean if it's just a one-off, why go through the trouble of
>>> blowing away all the files? :)
>>>
>>> I am afraid that this may cause data resurrection (I have prior
>>> experience with the same).
>>>
>>>
>>> Whoa! That's a long bow to draw. Sounds like there's more history to it.
>>>
>>> Note that I am not willing to run an entire node rebuild as it will
>>> take a lot of time due to the presence of multiple big tables (I am keeping it
>>> as my last option).
>>>
>>>
>>> I wasn't going to suggest that at all. I didn't like the sledge hammer
>>> approach. I certainly wouldn't recommend bringing in a wrecking ball. 
>>>
>>> Cheers!
>>>
>>


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Jeff Jirsa
Agree this is both strictly possible and more common with LCS. The only
thing that's strictly correct to do is treat every corrupt sstable
exception as a failed host, and replace it just like you would a failed
host.


On Thu, Feb 13, 2020 at 10:55 PM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> Thanks Erick
>
> I would like to explain how data resurrection can take place with a single
> SSTable deletion.
>
> Consider this case of a table with Leveled Compaction Strategy:
>
> 1. Data A is written a long time back.
> 2. Data A is deleted and a tombstone is created.
> 3. After GC grace the tombstone is purgeable.
> 4. Now the SSTable containing the purgeable tombstone on one node is
> corrupted.
> 5. The node with the corrupt SSTable cannot compact away the data and the
> purgeable tombstone.
> 6. On the other two nodes Data A is removed after compaction.
> 7. Remove the corrupt SSTable from the impacted node.
> 8. When you run repair, Data A is copied back to all the nodes.
>
> This table in question is using Leveled Compaction Strategy.
>
> Regards
> Manish
>
> On Fri, Feb 14, 2020 at 12:00 PM Erick Ramirez 
> wrote:
>
>> The log shows that the problem occurs when decompressing the SSTable
>> but there's not much actionable info from it.
>>
>>> I would like to know what the "ordinary hammer" would be in this case. Do you
>>> mean to suggest that deleting only the corrupt SSTable file (in this case
>>> mc-1234-big-*.db) would suffice?
>>
>>
>> Exactly. I mean if it's just a one-off, why go through the trouble of
>> blowing away all the files? :)
>>
>>> I am afraid that this may cause data resurrection (I have prior
>>> experience with the same).
>>
>>
>> Whoa! That's a long bow to draw. Sounds like there's more history to it.
>>
>>> Note that I am not willing to run an entire node rebuild as it will take
>>> a lot of time due to the presence of multiple big tables (I am keeping it as my
>>> last option).
>>
>>
>> I wasn't going to suggest that at all. I didn't like the sledge hammer
>> approach. I certainly wouldn't recommend bringing in a wrecking ball. 
>>
>> Cheers!
>>
>


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Thanks Erick

I would like to explain how data resurrection can take place with a single
SSTable deletion.

Consider this case of a table with Leveled Compaction Strategy:

1. Data A is written a long time back.
2. Data A is deleted and a tombstone is created.
3. After GC grace the tombstone is purgeable.
4. Now the SSTable containing the purgeable tombstone on one node is corrupted.
5. The node with the corrupt SSTable cannot compact away the data and the
purgeable tombstone.
6. On the other two nodes Data A is removed after compaction.
7. Remove the corrupt SSTable from the impacted node.
8. When you run repair, Data A is copied back to all the nodes.

This table in question is using Leveled Compaction Strategy.
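
(As a side note, the GC grace setting and the tombstone state of a given SSTable
can be checked along these lines; the keyspace/table names and the SSTable file
below are just the placeholders used in this thread.)

    # gc_grace_seconds for the table; tombstones older than this are purgeable at compaction time:
    cqlsh -e "SELECT gc_grace_seconds FROM system_schema.tables WHERE keyspace_name = 'keyspace' AND table_name = 'tablename';"

    # estimated droppable tombstones in one SSTable on a node:
    sstablemetadata /var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33/mc-1234-big-Data.db | grep -i "droppable tombstones"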

Regards
Manish

On Fri, Feb 14, 2020 at 12:00 PM Erick Ramirez 
wrote:

> The log shows that the problem occurs when decompressing the SSTable
> but there's not much actionable info from it.
>
>> I would like to know what the "ordinary hammer" would be in this case. Do you
>> mean to suggest that deleting only the corrupt SSTable file (in this case
>> mc-1234-big-*.db) would suffice?
>
>
> Exactly. I mean if it's just a one-off, why go through the trouble of
> blowing away all the files? :)
>
>> I am afraid that this may cause data resurrection (I have prior experience
>> with the same).
>
>
> Whoa! That's a long bow to draw. Sounds like there's more history to it.
>
>> Note that I am not willing to run an entire node rebuild as it will take
>> a lot of time due to the presence of multiple big tables (I am keeping it as my
>> last option).
>
>
> I wasn't going to suggest that at all. I didn't like the sledge hammer
> approach. I certainly wouldn't recommend bringing in a wrecking ball. 
>
> Cheers!
>


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Erick Ramirez
The log shows that the problem occurs when decompressing the SSTable
but there's not much actionable info from it.

> I would like to know what the "ordinary hammer" would be in this case. Do you
> mean to suggest that deleting only the corrupt SSTable file (in this case
> mc-1234-big-*.db) would suffice?


Exactly. I mean if it's just a one-off, why go through the trouble of
blowing away all the files? :)

> I am afraid that this may cause data resurrection (I have prior experience
> with the same).


Whoa! That's a long bow to draw. Sounds like there's more history to it.

> Note that I am not willing to run an entire node rebuild as it will take
> a lot of time due to the presence of multiple big tables (I am keeping it as my
> last option).


I wasn't going to suggest that at all. I didn't like the sledge hammer
approach. I certainly wouldn't recommend bringing in a wrecking ball. 

Cheers!


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Hi Erick

Thanks for your quick response. I have attached the full stack trace, which
shows the exception during the validation phase of the table repair.

I would like to know what the "ordinary hammer" would be in this case. Do you
mean to suggest that deleting only the corrupt SSTable file (in this case
*mc-1234-big-*.db*) would suffice? I am afraid that this may cause data
resurrection (I have prior experience with the same).
Or are you pointing towards running scrub? Kindly explain.

Note that I am not willing to run an entire node rebuild as it will take
a lot of time due to the presence of multiple big tables (I am keeping it as my
last option).

Regards
Manish

On Fri, Feb 14, 2020 at 11:11 AM Erick Ramirez 
wrote:

> It will achieve the outcome you are after but I doubt anyone would
> recommend that approach. It's like using a sledgehammer when an ordinary
> hammer would suffice. And if you were hitting some bug then you'd run into
> the same problem anyway.
>
> Can you post the full stack trace? It might provide us some clues as to
> why you ran into the problem. Cheers!
>


error.log
Description: Binary data


Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Jeff Jirsa
Feels that way and most people don’t do it, but definitely required for strict 
correctness.



> On Feb 13, 2020, at 8:57 PM, Erick Ramirez  wrote:
> 
> 
> Interesting... though it feels a bit extreme unless you're dealing with a 
> cluster that's constantly dropping mutations. In which case, you have bigger 
> problems anyway. :)


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Erick Ramirez
It will achieve the outcome you are after but I doubt anyone would
recommend that approach. It's like using a sledgehammer when an ordinary
hammer would suffice. And if you were hitting some bug then you'd run into
the same problem anyway.

Can you post the full stack trace? It might provide us some clues as to why
you ran into the problem. Cheers!


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Hi Erick

Thanks for the reply.

The reason for the corruption is unknown to me. I just found the corrupt table
when a scheduled repair failed, with the logs showing:

ERROR [ValidationExecutor:16] 2020-01-21 19:13:18,123 CassandraDaemon.java:228 - Exception in thread Thread[ValidationExecutor:16,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33/mc-1234-big-Data.db
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:134) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.2.jar:3.11.2]


Regarding your question about removing all SSTable files of a table (column
family): I want a quick recovery without any inconsistency. Since I have a
3-node cluster with RF=3, my expectation is that repair would stream the data
from the other two nodes. I just wanted to know whether it is correct to do it
this way:

1. Stop the node.
2. Remove all SSTable files from the
*/var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33*
directory.
3. Start the node.
4. Run full repair on this particular table

Regards
Manish







On Fri, Feb 14, 2020 at 4:44 AM Erick Ramirez 
wrote:

> You need to stop C* in order to run the offline sstable scrub utility.
> That's why it's referred to as "offline". :)
>
> Do you have any idea on what caused the corruption? It's highly unusual
> that you're thinking of removing all the files for just one table.
> Typically if the corruption was a result of a faulty disk or hardware
> failure, it wouldn't be isolated to just one table. If you provide a bit
> more background information, we would be able to give you a better
> response. Cheers!
>
> Erick Ramirez  |  Developer Relations
>
> erick.rami...@datastax.com | datastax.com
>
> On Fri, 14 Feb 2020 at 04:39, manish khandelwal <
> manishkhandelwa...@gmail.com> wrote:
>
>> Hi
>>
>> I see a corrupt SSTable in one of my keyspace tables on one node. The cluster
>> is 3 nodes with replication factor 3. The Cassandra version is 3.11.2.
>> I am thinking along the following lines to resolve the corrupt SSTable issue:
>> 1. Run nodetool scrub.
>> 2. If step 1 fails, run the offline sstablescrub.
>> 3. If step 2 fails, stop the node, remove all SSTables from the problematic
>> table, start the node, and run a full repair on the table. I am removing all
>> SSTables of the particular table so as to avoid resurrection of data or any
>> data corruption.
>>
>> I would like to know whether there are any side effects of executing step 3
>> if steps 1 and 2 fail.
>>
>> Regards
>> Manish
>>
>>
>>
>>
>>


Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Erick Ramirez
Interesting... though it feels a bit extreme unless you're dealing with a
cluster that's constantly dropping mutations. In which case, you have
bigger problems anyway. :)


Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Jeff Jirsa
Option 1 is only strictly safe if you run repair while the down replica is
down (otherwise you violate quorum consistency guarantees).

Option 2 is probably easier to manage and won't require any special effort
to avoid violating consistency.

I'd probably go with option 2.


On Thu, Feb 13, 2020 at 7:16 PM Sergio  wrote:

> We have i3xlarge instances with data directory in the XFS filesystem that
> is ephemeral and *hints*, *commit_log* and *saved_caches* in the EBS
> volume.
> Whenever AWS is going to retire the instance due to degraded hardware
> performance is it better:
>
> Option 1)
>- Nodetool drain
>- Stop cassandra
>- Restart the machine from aws-cli to be restored in a different VM
> from the hypervisor
>- Start Cassandra with -Dcassandra.replace_address
>- We lose only the ephemeral but the commit_logs, hints, saved_cache
> will be there
>
>
> OR
>
> Option 2)
>  - Add a new node and wait for the NORMAL status
>  - Decommission the one that is going to be retired
>  - Run cleanup with cstar across the datacenters
>
> ?
>
> Thanks,
>
> Sergio
>


Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Sergio
Thank you for the advices!

Best!

Sergio

On Thu, Feb 13, 2020, 7:44 PM Erick Ramirez 
wrote:

> Option 1 is a cheaper option because the cluster doesn't need to rebalance
> (with the loss of a replica) post-decommission then rebalance again when
> you add a new node.
>
> The hints directory on EBS is irrelevant because it would only contain
> mutations to replay to down replicas if the node was a coordinator. In the
> scenario where the node itself goes down, other nodes will be storing hints
> for this down node. The saved_caches are also useless if you're
> bootstrapping the node into the cluster because the cache entries are only
> valid for the previous data files, not the newly streamed files from the
> bootstrap. Similarly, your commitlog directory will be empty -- that's
> the whole point of running nodetool drain. :)
>
> A little off-topic but *personally* I would co-locate the commitlog on
> the same 950GB NVMe SSD as the data files. You would get a much better
> write performance from the nodes compared to EBS and they shouldn't hurt
> your reads since the NVMe disks have very high IOPS. I think they can
> sustain 400K+ IOPS (don't quote me). I'm sure others will comment if they
> have a different experience. And of course, YMMV. Cheers!
>
>
>
> On Fri, 14 Feb 2020 at 14:16, Sergio  wrote:
>
>> We have i3xlarge instances with data directory in the XFS filesystem that
>> is ephemeral and *hints*, *commit_log* and *saved_caches* in the EBS
>> volume.
>> Whenever AWS is going to retire the instance due to degraded hardware
>> performance is it better:
>>
>> Option 1)
>>- Nodetool drain
>>- Stop cassandra
>>- Restart the machine from aws-cli to be restored in a different VM
>> from the hypervisor
>>- Start Cassandra with -Dcassandra.replace_address
>>- We lose only the ephemeral but the commit_logs, hints, saved_cache
>> will be there
>>
>>
>> OR
>>
>> Option 2)
>>  - Add a new node and wait for the NORMAL status
>>  - Decommission the one that is going to be retired
>>  - Run cleanup with cstar across the datacenters
>>
>> ?
>>
>> Thanks,
>>
>> Sergio
>>
>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
Not a problem. And I've just responded on the new thread. Cheers! 

>


Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Erick Ramirez
Option 1 is a cheaper option because the cluster doesn't need to rebalance
(with the loss of a replica) post-decommission then rebalance again when
you add a new node.

The hints directory on EBS is irrelevant because it would only contain
mutations to replay to down replicas if the node was a coordinator. In the
scenario where the node itself goes down, other nodes will be storing hints
for this down node. The saved_caches are also useless if you're
bootstrapping the node into the cluster because the cache entries are only
valid for the previous data files, not the newly streamed files from the
bootstrap. Similarly, your commitlog directory will be empty -- that's the
whole point of running nodetool drain. :)

A little off-topic but *personally* I would co-locate the commitlog on the
same 950GB NVMe SSD as the data files. You would get a much better write
performance from the nodes compared to EBS and they shouldn't hurt your
reads since the NVMe disks have very high IOPS. I think they can sustain
400K+ IOPS (don't quote me). I'm sure others will comment if they have a
different experience. And of course, YMMV. Cheers!



On Fri, 14 Feb 2020 at 14:16, Sergio  wrote:

> We have i3xlarge instances with data directory in the XFS filesystem that
> is ephemeral and *hints*, *commit_log* and *saved_caches* in the EBS
> volume.
> Whenever AWS is going to retire the instance due to degraded hardware
> performance is it better:
>
> Option 1)
>- Nodetool drain
>- Stop cassandra
>- Restart the machine from aws-cli to be restored in a different VM
> from the hypervisor
>- Start Cassandra with -Dcassandra.replace_address
>- We lose only the ephemeral but the commit_logs, hints, saved_cache
> will be there
>
>
> OR
>
> Option 2)
>  - Add a new node and wait for the NORMAL status
>  - Decommission the one that is going to be retired
>  - Run cleanup with cstar across the datacenters
>
> ?
>
> Thanks,
>
> Sergio
>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Thank you very much for this helpful information!

I opened a new thread for the other question :)

Sergio

On Thu, Feb 13, 2020 at 7:22 PM Erick Ramirez <
erick.rami...@datastax.com> wrote:

>> I want to have more than one seed node in each DC, so unless I
>> restart the node after changing the seed_list on that node, it will not
>> become a seed.
>
>
> That's not really going to hurt you if you have other seeds in other DCs.
> But if you're willing to take the hit from the restart then feel free to do
> so. Just saying that it's not necessary to do it immediately so the option
> is there for you. :)
>
>
> Do I need to update the seed_list across all the nodes even in separate
>> DCs and perform a rolling restart even across DCs or the restart should be
>> happening only on the new node that I want as a seed?
>
>
> You generally want to make the seeds list the same across all nodes in the
> cluster. You want to avoid the situation where lots of nodes are used as
> seeds by various nodes. Limiting the seeds to 2 per DC means that gossip
> convergence will happen much faster. Cheers!
>
>>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
>
> I want to have more than one seed node in each DC, so unless I
> restart the node after changing the seed_list on that node, it will not
> become a seed.


That's not really going to hurt you if you have other seeds in other DCs.
But if you're willing to take the hit from the restart then feel free to do
so. Just saying that it's not necessary to do it immediately so the option
is there for you. :)


Do I need to update the seed_list across all the nodes even in separate DCs
> and perform a rolling restart even across DCs or the restart should be
> happening only on the new node that I want as a seed?


You generally want to make the seeds list the same across all nodes in the
cluster. You want to avoid the situation where lots of nodes are used as
seeds by various nodes. Limiting the seeds to 2 per DC means that gossip
convergence will happen much faster. Cheers!
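
For illustration, with two DCs that would mean every node carries the same four
addresses, something like this (the addresses and the config path are
placeholders, not a recommendation for any particular cluster):

    grep -A4 'seed_provider' /etc/cassandra/cassandra.yaml
    #   - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    #     parameters:
    #         - seeds: "10.0.1.10,10.0.1.11,10.0.2.10,10.0.2.11"   # two seeds per DC, identical everywhere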

>


AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Sergio
We have i3xlarge instances with data directory in the XFS filesystem that
is ephemeral and *hints*, *commit_log* and *saved_caches* in the EBS
volume.
Whenever AWS is going to retire the instance due to degraded hardware
performance is it better:

Option 1)
   - Nodetool drain
   - Stop cassandra
   - Restart the machine from aws-cli to be restored in a different VM from
the hypervisor
   - Start Cassandra with -Dcassandra.replace_address
   - We lose only the ephemeral but the commit_logs, hints, saved_cache
will be there


OR

Option 2)
 - Add a new node and wait for the NORMAL status
 - Decommission the one that is going to be retired
 - Run cleanup with cstar across the datacenters

?

Thanks,

Sergio


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Right now yes I have one seed per DC.

I want to have more than one seed node in each DC, so unless I
restart the node after changing the seed_list on that node, it will not
become a seed.

Do I need to update the seed_list across all the nodes even in separate DCs
and perform a rolling restart even across DCs or the restart should be
happening only on the new node that I want as a seed?

The reason: each datacenter currently has one seed from the DC it belongs to
and one seed from the other DC.

Thanks,

Sergio


On Thu, Feb 13, 2020 at 6:41 PM Erick Ramirez <
erick.rami...@datastax.com> wrote:

> 1) If I don't restart the node after changing the seed list this will
>> never become the seed and I would like to be sure that I don't find my self
>> in a spot where I don't have seed nodes and this means that I can not add a
>> node in the cluster
>
>
> Are you saying you only have 1 seed node in the seeds list of each node?
> We recommend 2 nodes per DC as seeds -- if one node is down, there's still
> another node in the local DC to contact. In the worst case scenario where 2
> nodes in the local DC are down, then nodes can contact seeds in other DCs.
>
> For the second item, could I make a small request? Since it's unrelated to
> this thread, would you mind starting up a new email thread? It just makes
> it easier for other users to follow the threads in the future if they're
> searching for answers to similar questions. Cheers!
>
>>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
>
> 1) If I don't restart the node after changing the seed list this will
> never become the seed and I would like to be sure that I don't find my self
> in a spot where I don't have seed nodes and this means that I can not add a
> node in the cluster


Are you saying you only have 1 seed node in the seeds list of each node? We
recommend 2 nodes per DC as seeds -- if one node is down, there's still
another node in the local DC to contact. In the worst case scenario where 2
nodes in the local DC are down, then nodes can contact seeds in other DCs.

For the second item, could I make a small request? Since it's unrelated to
this thread, would you mind starting up a new email thread? It just makes
it easier for other users to follow the threads in the future if they're
searching for answers to similar questions. Cheers!

>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Thank you very much for your response!

2 things:

1) If I don't restart the node after changing the seed list, it will never
become a seed, and I would like to be sure that I don't find myself in a
spot where I have no seed nodes, which would mean that I cannot add a node
to the cluster.

2) We have i3xlarge instances with data directory in the XFS filesystem
that is ephemeral and hints, commit_log and saved_caches in the EBS volume.
Whenever AWS is going to retire the instance due to degraded hardware
performance is it better:

Option 1)
   - Nodetool drain
   - Stop cassandra
   - Restart the machine from aws to be restored in a different VM from the
hypervisor
   - Start Cassandra with -Dcassandra.replace_address

OR
Option 2)
 - Add a new node and wait for the NORMAL status
 - Decommission the one that is going to be retired
 - Run cleanup with cstar across the datacenters

?

Thanks,

Sergio




On Thu, Feb 13, 2020 at 6:15 PM Erick Ramirez <
erick.rami...@datastax.com> wrote:

> I did decommission of this node and I did all the steps mentioned except
>> the -Dcassandra.replace_address and now it is streaming correctly!
>
>
> That works too but I was trying to avoid the rebalance operations (like
> streaming to restore replica counts) since they can be expensive.
>
> So basically, if I want this new node as seed should I add its IP address
>> after it joined the cluster and after
>> - nodetool drain
>> - restart cassandra?
>
>
> There's no need to restart C* after updating the seeds list. It will just
> take effect the next time you restart.
>
> I deactivated the future repair happening in the cluster while this node
>> is joining.
>> When you add a node is it better to stop the repair process?
>
>
> It's not necessary to do so if you have sufficient capacity in your
> cluster. Topology changes are just a normal part of a C* cluster's
> operation just like repairs. But when you temporarily disable repairs,
> existing nodes have more capacity to bootstrap a new node so there is a
> benefit there. Cheers!
>
>>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
>
> I did decommission of this node and I did all the steps mentioned except
> the -Dcassandra.replace_address and now it is streaming correctly!


That works too but I was trying to avoid the rebalance operations (like
streaming to restore replica counts) since they can be expensive.

So basically, if I want this new node as seed should I add its IP address
> after it joined the cluster and after
> - nodetool drain
> - restart cassandra?


There's no need to restart C* after updating the seeds list. It will just
take effect the next time you restart.

I deactivated the future repair happening in the cluster while this node is
> joining.
> When you add a node is it better to stop the repair process?


It's not necessary to do so if you have sufficient capacity in your
cluster. Topology changes are just a normal part of a C* cluster's
operation just like repairs. But when you temporarily disable repairs,
existing nodes have more capacity to bootstrap a new node so there is a
benefit there. Cheers!

>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
I did decommission of this node and I did all the steps mentioned except
the -Dcassandra.replace_address and now it is streaming correctly!

So basically, if I want this new node as seed should I add its IP address
after it joined the cluster and after
- nodetool drain
- restart cassandra?

I deactivated the future repair happening in the cluster while this node is
joining.

When you add a node is it better to stop the repair process?

Thank you very much Erick!

Best,

Sergio


On Thu, Feb 13, 2020 at 5:52 PM Erick Ramirez <
erick.rami...@datastax.com> wrote:

> Should I do something to fix it or leave as it?
>
>
> It depends on what your intentions are. I would use the "replace" method
> to build it correctly. At a high level:
> - remove the IP from its own seeds list
> - delete the contents of data, commitlog and saved_caches
> - add the replace flag in cassandra-env.sh (
> -Dcassandra.replace_address=its_own_ip)
> - start C*
>
> That should allow the node to "replace itself" in the ring and prevent
> expensive reshuffling/rebalancing of tokens. Cheers!
>
>>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
>
> Should I do something to fix it or leave as it?


It depends on what your intentions are. I would use the "replace" method to
build it correctly. At a high level:
- remove the IP from its own seeds list
- delete the contents of data, commitlog and saved_caches
- add the replace flag in cassandra-env.sh (
-Dcassandra.replace_address=its_own_ip)
- start C*

That should allow the node to "replace itself" in the ring and prevent
expensive reshuffling/rebalancing of tokens. Cheers!
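
A rough sketch of those steps on the node itself (paths assume a default package
install and are illustrative only; "its_own_ip" is the node's own address):

    sudo systemctl stop cassandra
    # 1. remove this node's own IP from the seeds list in cassandra.yaml
    # 2. clear the old state:
    sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
    # 3. add the replace flag to cassandra-env.sh:
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=its_own_ip"' | sudo tee -a /etc/cassandra/cassandra-env.sh
    # 4. start C*, then remove the flag again once the node has finished replacing itself:
    sudo systemctl start cassandra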

>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Thanks for your fast reply!

No repairs are running!

https://cassandra.apache.org/doc/latest/faq/index.html#does-single-seed-mean-single-point-of-failure

I added the node IP itself and the IP of existing seeds and I started
Cassandra.

So the right procedure is to not put the new node itself in its seed list, only
an already existing seed node, and then start Cassandra?

What should I do? I am running nodetool netstats and the streams are
coming in from the other nodes.

Thanks


On Thu, Feb 13, 2020 at 5:39 PM Erick Ramirez <
erick.rami...@datastax.com> wrote:

> I wanted to add a new node in the cluster and it looks to be working fine
>> but instead to wait for 2-3 hours data streaming like 100GB it immediately
>> went to the UN (UP and NORMAL) state.
>>
>
> Are you running a repair? I can't see how it's possibly receiving 100GB
> since it won't bootstrap.
>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Erick Ramirez
>
> I wanted to add a new node in the cluster and it looks to be working fine
> but instead to wait for 2-3 hours data streaming like 100GB it immediately
> went to the UN (UP and NORMAL) state.
>

Are you running a repair? I can't see how it's possibly receiving 100GB
since it won't bootstrap.


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Should I do something to fix it or leave as it?

On Thu, Feb 13, 2020, 5:29 PM Jon Haddad  wrote:

> Seeds don't bootstrap, don't list new nodes as seeds.
>
> On Thu, Feb 13, 2020 at 5:23 PM Sergio  wrote:
>
>> Hi guys!
>>
>> I don't know how but this is the first time that I see such behavior. I
>> wanted to add a new node in the cluster and it looks to be working fine but
>> instead to wait for 2-3 hours data streaming like 100GB it immediately went
>> to the UN (UP and NORMAL) state.
>>
>> I saw a bunch of exception in the logs and WARN
>>  [MessagingService-Incoming-/10.1.17.126] 2020-02-14 01:08:07,812
>> IncomingTcpConnection.java:103 - UnknownColumnFamilyException reading from
>> socket; closing
>> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table
>> for cfId a5af88d0-24f6-11e9-b009-95ed77b72f6e. If a table was just created,
>> this is likely due to the schema not being fully propagated.  Please wait
>> for schema agreement on table creation.
>> at
>> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1525)
>> ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at
>> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:850)
>> ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at
>> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:825)
>> ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at
>> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:415)
>> ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at
>> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:434)
>> ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at
>> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:371)
>> ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
>> ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
>> ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:183)
>> ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
>> ~[apache-cassandra-3.11.5.jar:3.11.5]
>>
>> but in the end, it is working...
>>
>> Suggestion?
>>
>> Thanks,
>>
>> Sergio
>>
>


Re: New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Jon Haddad
Seeds don't bootstrap, don't list new nodes as seeds.

On Thu, Feb 13, 2020 at 5:23 PM Sergio  wrote:

> Hi guys!
>
> I don't know how but this is the first time that I see such behavior. I
> wanted to add a new node in the cluster and it looks to be working fine but
> instead to wait for 2-3 hours data streaming like 100GB it immediately went
> to the UN (UP and NORMAL) state.
>
> I saw a bunch of exception in the logs and WARN
>  [MessagingService-Incoming-/10.1.17.126] 2020-02-14 01:08:07,812
> IncomingTcpConnection.java:103 - UnknownColumnFamilyException reading from
> socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table
> for cfId a5af88d0-24f6-11e9-b009-95ed77b72f6e. If a table was just created,
> this is likely due to the schema not being fully propagated.  Please wait
> for schema agreement on table creation.
> at
> org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1525)
> ~[apache-cassandra-3.11.5.jar:3.11.5]
> at
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:850)
> ~[apache-cassandra-3.11.5.jar:3.11.5]
> at
> org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:825)
> ~[apache-cassandra-3.11.5.jar:3.11.5]
> at
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:415)
> ~[apache-cassandra-3.11.5.jar:3.11.5]
> at
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:434)
> ~[apache-cassandra-3.11.5.jar:3.11.5]
> at
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:371)
> ~[apache-cassandra-3.11.5.jar:3.11.5]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
> ~[apache-cassandra-3.11.5.jar:3.11.5]
> at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
> ~[apache-cassandra-3.11.5.jar:3.11.5]
> at
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:183)
> ~[apache-cassandra-3.11.5.jar:3.11.5]
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
> ~[apache-cassandra-3.11.5.jar:3.11.5]
>
> but in the end, it is working...
>
> Suggestion?
>
> Thanks,
>
> Sergio
>


New seed node in the cluster immediately UN without passing for UJ state

2020-02-13 Thread Sergio
Hi guys!

I don't know how, but this is the first time that I have seen such behavior. I
wanted to add a new node to the cluster and it looks to be working fine, but
instead of waiting 2-3 hours for roughly 100GB of data to stream, it immediately
went to the UN (UP and NORMAL) state.

I saw a bunch of exceptions in the logs, such as:

WARN [MessagingService-Incoming-/10.1.17.126] 2020-02-14 01:08:07,812
IncomingTcpConnection.java:103 - UnknownColumnFamilyException reading from
socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table
for cfId a5af88d0-24f6-11e9-b009-95ed77b72f6e. If a table was just created,
this is likely due to the schema not being fully propagated.  Please wait
for schema agreement on table creation.
at
org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1525)
~[apache-cassandra-3.11.5.jar:3.11.5]
at
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:850)
~[apache-cassandra-3.11.5.jar:3.11.5]
at
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:825)
~[apache-cassandra-3.11.5.jar:3.11.5]
at
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:415)
~[apache-cassandra-3.11.5.jar:3.11.5]
at
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:434)
~[apache-cassandra-3.11.5.jar:3.11.5]
at
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:371)
~[apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
~[apache-cassandra-3.11.5.jar:3.11.5]
at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
~[apache-cassandra-3.11.5.jar:3.11.5]
at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:183)
~[apache-cassandra-3.11.5.jar:3.11.5]
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
~[apache-cassandra-3.11.5.jar:3.11.5]

but in the end, it is working...

Suggestion?

Thanks,

Sergio


Re: Corruption of frozen UDT during upgrade

2020-02-13 Thread Erick Ramirez
Paul, if you do a sstabledump in C* 3.0 (before upgrading) and compare it
to the dump output after upgrading to C* 3.11 then you will see that the
cell names in the outputs are different. This is the symptom of the broken
serialization header which leads to various exceptions during compactions
and reads.
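
If you want to see the symptom directly, a before/after comparison along these
lines will show the changed cell names (the file names below are placeholders):

    # on the 3.0 node, before the upgrade:
    sstabledump mc-1234-big-Data.db > before_upgrade.json
    # after upgrading to 3.11 and running upgradesstables, dump the rewritten SSTable:
    sstabledump md-5678-big-Data.db > after_upgrade.json
    diff before_upgrade.json after_upgrade.json   # compare the cell names of the frozen UDT columns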

CASSANDRA-15035  has
been fixed but is not yet included in a released version of C* (earmarked
for C* 3.11.6, 4.0). The patched version of sstablescrub includes a new
flag "-e" which rewrites the SSTable serialization headers to include the
missing info for the frozen UDTs. See NEWS.txt
 for
more details.

If you want to run a verification test on your SSTables, you can follow
this procedure as a workaround:
- copy the SSTables to another server that's not part of any C* cluster
- download the DSE 5.1 (equivalent to C* 3.11) tarball from
https://downloads.datastax.com/enterprise/dse-5.1.17-bin.tar.gz
- unpack the tarball
- run sstablescrub -e fix-only to just fix the headers without doing a
normal scrub

If the headers are fine, the scrub will be a no-op. Otherwise, it will
report that new metadata files are being written. For more details, see
https://support.datastax.com/hc/en-us/articles/360025955351. Cheers!
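
(Roughly, that workaround looks like the following; the exact path to the tool
inside the tarball may differ.)

    # on a server that is NOT part of any C* cluster, with a copy of the affected SSTables:
    curl -O https://downloads.datastax.com/enterprise/dse-5.1.17-bin.tar.gz
    tar xzf dse-5.1.17-bin.tar.gz
    # rewrite only the serialization headers, without performing a normal scrub:
    dse-5.1.17/resources/cassandra/bin/sstablescrub -e fix-only <keyspace> <table>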

Erick Ramirez  |  Developer Relations

erick.rami...@datastax.com | datastax.com


On Fri, 14 Feb 2020 at 01:43, Paul Chandler  wrote:

> Hi all,
>
> I have looked at the release notes for the upcoming release 3.11.6 and
> seen the part about corruption of frozen UDT types during upgrade from 3.0.
>
> We have a number of cluster using UDT and have been upgrading to 3.11.4
> and haven’t noticed any problems.
>
> In the ticket ( CASSANDRA-15035 ) it does not seem to specify how to
> reproduce this problem, so I tried using the following definition:
>
> CREATE TYPE supplier_holiday_udt (
> holiday_type int,
> holiday_start date,
> holiday_end date
> );
>
> CREATE TABLE supplier (
> supplier_id int PRIMARY KEY,
> calendar frozen<supplier_holiday_udt>
> )
>
> I performed an upgrade from 3.0.15 to 3.11.4, including running a nodetool
> upgradesstables.
>
> There were no errors during the process and I can still read the data in
> supplier table.
>
> Can anyone tell me how I reproduce this problem, or check that the
> clusters we have already upgraded do not have any problems .
>
> Thanks
>
> Paul
>
>


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Erick Ramirez
You need to stop C* in order to run the offline sstable scrub utility.
That's why it's referred to as "offline". :)

Do you have any idea on what caused the corruption? It's highly unusual
that you're thinking of removing all the files for just one table.
Typically if the corruption was a result of a faulty disk or hardware
failure, it wouldn't be isolated to just one table. If you provide a bit
more background information, we would be able to give you a better
response. Cheers!

Erick Ramirez  |  Developer Relations

erick.rami...@datastax.com | datastax.com


On Fri, 14 Feb 2020 at 04:39, manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> Hi
>
> I see a corrupt SSTable in one of my keyspace tables on one node. The cluster
> is 3 nodes with replication factor 3. The Cassandra version is 3.11.2.
> I am thinking along the following lines to resolve the corrupt SSTable issue:
> 1. Run nodetool scrub.
> 2. If step 1 fails, run the offline sstablescrub.
> 3. If step 2 fails, stop the node, remove all SSTables from the problematic
> table, start the node, and run a full repair on the table. I am removing all
> SSTables of the particular table so as to avoid resurrection of data or any
> data corruption.
>
> I would like to know whether there are any side effects of executing step 3
> if steps 1 and 2 fail.
>
> Regards
> Manish
>
>
>
>
>


Re: [EXTERNAL] Cassandra 3.11.X upgrades

2020-02-13 Thread Sergio
   - Verify that nodetool upgradesstables has completed successfully on all
   nodes from any previous upgrade
   - Turn off repairs and any other streaming operations (add/remove nodes)
   - Nodetool drain on the node that needs to be stopped (seeds first,
   preferably)
   - Stop an un-upgraded node (seeds first, preferably)
   - Install new binaries and configs on the down node
   - Restart that node and make sure it comes up clean (it will function
   normally in the cluster – even with mixed versions)
   - nodetool statusbinary to verify if it is up and running
   - Repeat for all nodes
   - Once the binary upgrade has been performed in all the nodes: Run
   upgradesstables on each node (as many at a time as your load will allow).
   Minor upgrades usually don’t require this step (only if the sstable format
   has changed), but it is good to check.
   - NOTE: in most cases applications can keep running and will not notice
   much impact – unless the cluster is overloaded and a single node down
   causes impact.
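
For concreteness, the per-node portion might look roughly like this (package and
service names depend on the install method and are illustrative only):

    nodetool drain                          # flush memtables; no commit logs to replay on restart
    sudo systemctl stop cassandra
    sudo yum install -y cassandra-3.11.5    # or the equivalent apt/tarball step
    sudo systemctl start cassandra
    nodetool statusbinary                   # confirm the native transport is back up
    nodetool status                         # the node should rejoin as UN
    # after all nodes run the new binaries (only needed if the sstable format changed):
    nodetool upgradesstables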



I added 2 points to the list to clarify.

Should we add this to a FAQ in the Cassandra docs or to Awesome Cassandra
(https://cassandra.link/awesome/)?

Thanks,

Sergio


On Wed, Feb 12, 2020 at 10:58 AM Durity, Sean R <
sean_r_dur...@homedepot.com> wrote:

> Check the readme.txt for any upgrade notes, but the basic procedure is to:
>
>- Verify that nodetool upgradesstables has completed successfully on
>all nodes from any previous upgrade
>- Turn off repairs and any other streaming operations (add/remove
>nodes)
>- Stop an un-upgraded node (seeds first, preferably)
>- Install new binaries and configs on the down node
>- Restart that node and make sure it comes up clean (it will function
>normally in the cluster – even with mixed versions)
>- Repeat for all nodes
>- Run upgradesstables on each node (as many at a time as your load
>will allow). Minor upgrades usually don’t require this step (only if the
>sstable format has changed), but it is good to check.
>- NOTE: in most cases applications can keep running and will not
>notice much impact – unless the cluster is overloaded and a single node
>down causes impact.
>
>
>
>
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
>
>
> *From:* Sergio 
> *Sent:* Wednesday, February 12, 2020 11:36 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Cassandra 3.11.X upgrades
>
>
>
> Hi guys!
>
> How do you usually upgrade your cluster for minor version upgrades?
>
> I tried to add a node with 3.11.5 version to a test cluster with 3.11.4
> nodes.
>
> Is there any restriction?
>
> Best,
>
> Sergio
>
>


Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Hi

I see a corrupt SSTable in one of my keyspace tables on one node. The cluster is
3 nodes with replication factor 3. The Cassandra version is 3.11.2.
I am thinking along the following lines to resolve the corrupt SSTable issue:
1. Run nodetool scrub.
2. If step 1 fails, run the offline sstablescrub.
3. If step 2 fails, stop the node, remove all SSTables from the problematic
table, start the node, and run a full repair on the table. I am removing all
SSTables of the particular table so as to avoid resurrection of data or any
data corruption.
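
(For reference, steps 1 and 2 would look roughly like this; the keyspace/table
names are placeholders.)

    # step 1: online scrub via nodetool (Cassandra stays up):
    nodetool scrub <keyspace> <table>

    # step 2: offline scrub -- Cassandra must be stopped first:
    nodetool drain && sudo systemctl stop cassandra
    sstablescrub <keyspace> <table>
    sudo systemctl start cassandra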

I would like to know whether there are any side effects of executing step 3 if
steps 1 and 2 fail.

Regards
Manish


Re: [EXTERNAL] Re: Cassandra Encryption between DC

2020-02-13 Thread Jai Bheemsen Rao Dhanwada
thank you

On Thu, Feb 13, 2020 at 6:30 AM Durity, Sean R 
wrote:

> I will just add-on that I usually reserve security changes as the primary
> exception where app downtime may be necessary with Cassandra. (DSE has some
> Transitional tools that are useful, though.) Sometimes a short outage is
> preferred over a longer, more-complicated attempt to keep the app up. And,
> in many cases, there is no way to guarantee availability when making
> security-related changes (new cipher suites, adding encryption, turning on
> authentication, etc.). It is better to try and have those implemented from
> the beginning, where possible.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Erick Ramirez 
> *Sent:* Wednesday, February 12, 2020 9:02 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Cassandra Encryption between DC
>
>
>
> I've just seen your questions on ASF Slack and didn't immediately make the
> connection that this post in the mailing list is one and the same. I
> understand what you're doing now -- you have an existing DC with no
> encryption and you want to add a new DC with encryption enabled but don't
> want the downtime associated with enabling encryption on the existing DC.
>
>
>
> As driftx, exlt, myself & co pointed out, there isn't a "transitional
> path" of implementing it without downtime in the current (released)
> versions of C*. Cheers!
>
>


Re: Connection reset by peer

2020-02-13 Thread Reid Pinchback
Since ping is ICMP, not TCP, you probably want to investigate a mix of TCP and 
CPU stats to see what is behind the slow pings. I’d guess you are getting 
network impacts beyond what the ping times are hinting at.  ICMP isn’t subject 
to retransmission, so your TCP situation could be far worse than ping latencies 
may suggest.
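
For a first look, something along these lines on the affected hosts would show
whether TCP retransmits or CPU pressure line up with the slow pings (standard
Linux tools; the commands are illustrative, not a prescribed procedure):

    netstat -s | grep -iE 'retrans|reset'          # cumulative TCP retransmit/reset counters
    ss -ti '( sport = :9042 or sport = :7000 )'    # per-connection RTT/retransmits on the C* ports
    sar -n ETCP 1 5                                # TCP retransmission rate over time (needs sysstat)
    mpstat -P ALL 1 5                              # CPU utilisation and steal time per core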

From: "Hanauer, Arnulf, Vodacom South Africa (External)" 

Reply-To: "user@cassandra.apache.org" 
Date: Thursday, February 13, 2020 at 2:06 AM
To: "user@cassandra.apache.org" 
Subject: RE: Connection reset by peer

Message from External Sender

Thanks to both Erick/Shaun for your responses.

Both your explanations are plausible in my scenario. This is what I have done
subsequently, which seems to have improved the situation:



  1.  The cluster was very busy trying to run repairs/sync the new replicas
      (about 350GB) in the new DC (Gossip was temporarily marking down the
      source nodes at different points in time)
      *   Disabled Reaper, stopped all validation/repairs

  2.  I removed the new replicas to stop any potential read_repair across the
      WAN
      *   I will recreate the replicas over the weekend during quiet time & run
      the repair to sync

  3.  The network ping response time was quite high, around 10-15 msec, at the
      times of the errors
      *   This dropped to under 1 ms later in the day, when some jobs were
      rerun successfully

  4.  I will apply some of the recommended TCP_KEEPALIVE settings Shaun pointed
      me to
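
For reference, the keepalive tunables usually involved are the following (the
values are examples only, not a recommendation for this cluster):

    sudo sysctl -w net.ipv4.tcp_keepalive_time=60
    sudo sysctl -w net.ipv4.tcp_keepalive_probes=3
    sudo sysctl -w net.ipv4.tcp_keepalive_intvl=10
    # persist them in /etc/sysctl.d/ so they survive a reboot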



Last question: In all your experiences, how high can the latency (simple ping
response time) go before it becomes a problem? (Obviously the lower the better,
but is there some sort of cut-off/formula beyond which problems like the
connection resets can be expected intermittently?)




Kind regards

Arnulf Hanauer



From: Erick Ramirez 
Sent: Thursday, 13 February 2020 03:10
To: user@cassandra.apache.org
Subject: Re: Connection reset by peer

I generally see these exceptions when the cluster is overloaded. I think what's 
happening is that when the app/driver sends a read request, the coordinator 
takes a long time to respond because the nodes are busy serving other requests. 
The driver gives up (client-side timeout reached) and the socket is closed. 
Meanwhile, the coordinator eventually gets results from replicas and tries to 
send the response back to the app/driver but can't because the connection is no 
longer there. Does this scenario sound plausible for your cluster?


Erick Ramirez  |  Developer Relations

erick.rami...@datastax.com | datastax.com


On Wed, 12 Feb 2020 at 21:13, Hanauer, Arnulf, Vodacom South Africa (External) 
mailto:arnulf.hana...@vcontractor.co.za>> 
wrote:
Hi Cassandra folks,

We are getting a lot of these errors and transactions are timing out and I was 
wondering if this can be caused by Cassandra itself or if this is a genuine 
Linux network issue only. The client job reports Cassandra node down after this 
occurs but I suspect this is 

Corruption of frozen UDT during upgrade

2020-02-13 Thread Paul Chandler
Hi all,

I have looked at the release notes for the upcoming release 3.11.6 and seen 
the part about corruption of frozen UDT types during upgrade from 3.0.

We have a number of cluster using UDT and have been upgrading to 3.11.4 and 
haven’t noticed any problems.

In the ticket ( CASSANDRA-15035 ) it does not seem to specify how to reproduce 
this problem, so I tried using the following definition:

CREATE TYPE supplier_holiday_udt (
holiday_type int,
holiday_start date,
holiday_end date
);

CREATE TABLE supplier (
supplier_id int PRIMARY KEY,
calendar frozen<supplier_holiday_udt>
)

I performed an upgrade from 3.0.15 to 3.11.4, including running a nodetool 
upgradesstables.

There were no errors during the process and I can still read the data in 
supplier table.

Can anyone tell me how to reproduce this problem, or how to check that the
clusters we have already upgraded do not have any problems?

Thanks 

Paul 



RE: [EXTERNAL] Cassandra 3.11.X upgrades

2020-02-13 Thread Durity, Sean R
+1 on nodetool drain. I added that to our upgrade automation and it really 
helps with post-upgrade start-up time.

Sean Durity

From: Erick Ramirez 
Sent: Wednesday, February 12, 2020 10:29 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Cassandra 3.11.X upgrades

Yes to the steps. The only thing I would add is to run a nodetool drain before 
shutting C* down so all mutations are flushed to SSTables and there won't be 
any commit logs to replay on startup.

Also, the usual "backup your cluster and configuration files" boilerplate 
applies. 





RE: [EXTERNAL] Re: Cassandra Encryption between DC

2020-02-13 Thread Durity, Sean R
I will just add-on that I usually reserve security changes as the primary 
exception where app downtime may be necessary with Cassandra. (DSE has some 
Transitional tools that are useful, though.) Sometimes a short outage is 
preferred over a longer, more-complicated attempt to keep the app up. And, in 
many cases, there is no way to guarantee availability when making 
security-related changes (new cipher suites, adding encryption, turning on 
authentication, etc.). It is better to try and have those implemented from the 
beginning, where possible.


Sean Durity

From: Erick Ramirez 
Sent: Wednesday, February 12, 2020 9:02 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra Encryption between DC

I've just seen your questions on ASF Slack and didn't immediately make the 
connection that this post in the mailing list is one and the same. I understand 
what you're doing now -- you have an existing DC with no encryption and you 
want to add a new DC with encryption enabled but don't want the downtime 
associated with enabling encryption on the existing DC.

As driftx, exlt, myself & co pointed out, there isn't a "transitional path" of 
implementing it without downtime in the current (released) versions of C*. 
Cheers!





Re: Connection reset by peer

2020-02-13 Thread Erick Ramirez
>
> Last question: In all your experiences, how high can the latency (simple
> ping response time) go before it becomes a problem? (Obviously the lower
> the better, but is there some sort of cut-off/formula beyond which problems
> like the connection resets can be expected intermittently?)


Unfortunately, there's no magic number because what's "acceptable" to your
app is driven by your business rules aka SLA. What is acceptable to 1 use
case won't necessarily apply to another. You need to design your data
model, infrastructure and cluster capacity to meet the business
requirements. Cheers!

>