Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-28 Thread Steinmaurer, Thomas
Leon,

we had an awful performance/throughput experience with 3.x coming from 2.1. 
3.11 is simply a memory hog if you are using batch statements on the client 
side. If so, you are likely affected by 
https://issues.apache.org/jira/browse/CASSANDRA-16201


Regards,
Thomas

From: Leon Zaruvinsky 
Sent: Wednesday, October 28, 2020 5:21 AM
To: user@cassandra.apache.org 
Subject: Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary 
upgrade



Our JVM options are unchanged between 2.2 and 3.11

For the sake of clarity, do you mean:
(a) you're using the default JVM options in 3.11 and they're different from the 
options you had in 2.2?
(b) you've copied the same JVM options you had in 2.2 to 3.11?

(b), which are the default options from 2.2 (and I believe the default options 
in 3.11 from a brief glance).

Copied here for clarity, though I'm skeptical that GC settings are actually a 
cause here because I would expect them to only impact the upgraded node and not 
the cluster overall.

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled
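
As a quick sanity check, the flags a running node actually started with can be 
read back over JMX and compared against jvm.options / cassandra-env.sh. A 
minimal sketch, assuming unauthenticated JMX on the default port 7199 (adjust 
host, port and credentials to your setup):

import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ShowJvmArgs {
    public static void main(String[] args) throws Exception {
        // Connect to the node's JMX endpoint (same URL format nodetool uses).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            RuntimeMXBean runtime = ManagementFactory.newPlatformMXBeanProxy(
                    mbs, ManagementFactory.RUNTIME_MXBEAN_NAME, RuntimeMXBean.class);
            // Print only heap/GC related flags for easy comparison.
            for (String arg : runtime.getInputArguments()) {
                if (arg.startsWith("-XX") || arg.startsWith("-Xm")) {
                    System.out.println(arg);
                }
            }
        }
    }
}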

The distinction is important because at the moment, you need to go through a 
process of elimination to identify the cause.


Read throughput (rate, bytes read/range scanned, etc.) seems fairly consistent 
before and after the upgrade across all nodes.

What I was trying to get at is whether the upgraded node was getting hit with 
more traffic compared to the other nodes, since that would indicate that the 
longer GCs are just the symptom, not the cause.


I don't see any distinct change, nor do I see an increase in traffic to the 
upgraded node that would result in longer GC pauses.  Frankly I don't see any 
changes or aberrations in client-related metrics at all that correlate to the 
GC pauses, except for the corresponding timeouts.


RE: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-30 Thread Steinmaurer, Thomas
If possible, prefer m5 over m4, because they run on a newer (KVM-based) 
hypervisor. Single-core performance is ~10% better compared to m4, with m5 even 
being slightly cheaper than m4.

Thomas

From: Erick Ramirez 
Sent: Donnerstag, 30. Jänner 2020 03:00
To: user@cassandra.apache.org
Subject: Re: Cassandra going OOM due to tombstones (heapdump screenshots 
provided)

It looks like the number of tables is the problem: with 5,000 - 10,000 tables, 
you are way above the recommendations.
Take a look here: 
https://docs.datastax.com/en/dse-planning/doc/planning/planningAntiPatterns.html#planningAntiPatterns__AntiPatTooManyTables
This suggests that 5-10GB of heap is going to be taken up just with the table 
information (1MB per table).

+1 to Paul Chandler & Hannu Kröger. Although there isn't a hard limit on the 
maximum number of tables, there's a reasonable number that is operationally 
sound and we recommend that 200 total tables per cluster is the sweet spot. We 
know from experience that the clusters suffer as the total number of tables 
approaches 400+ so stick as close to 200 as possible. I had these 
recommendations published in the DataStax Docs a couple of years ago to provide 
clear guidance to users.

1000 keyspaces suggests that you have a multi-tenant setup. Perhaps you can 
distribute the keyspaces across multiple clusters so each cluster has less than 
500 tables. To be clear, the number of keyspaces isn't relevant in this context 
-- it's the total number of tables across all keyspaces that matters.
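
For reference, the total table count (and the resulting rough heap overhead) 
can be derived from the per-table metrics each node registers over JMX. A 
sketch, assuming unauthenticated JMX on port 7199 and a 3.x node where 
per-table metrics live under type=Table (2.x used type=ColumnFamily):

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CountTables {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // One ReadLatency MBean is registered per table, so the match count
            // equals the total number of tables across all keyspaces.
            ObjectName pattern = new ObjectName(
                    "org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=ReadLatency");
            Set<ObjectName> perTable = mbs.queryNames(pattern, null);
            int tables = perTable.size();
            // Rough estimate using the ~1MB/table rule of thumb mentioned above.
            System.out.printf("%d tables -> roughly %d MB of heap just for table metadata%n",
                    tables, tables);
        }
    }
}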

- We observed this problem on a c4.4xlarge (AWS EC2) instance having 30GB RAM 
with 8GB heap
- We observed the same problem on a c4.8xlarge having 60GB RAM with 12GB heap

A little off-topic, but it sounds like you've been evaluating different instance 
types. The c4 instances may not be ideal for your circumstances because you're 
trading less RAM for more powerful CPUs. I generally recommend m4 instances 
because they're a good balance of CPU and RAM for the money. In an m4.4xlarge 
configuration, what you lose in raw CPU power compared to a c4.4xlarge (2.4GHz 
Intel Xeon E5-2676 vs 2.9GHz E5-2666), you gain back in RAM (64GB vs 30GB, i.e. 
34GB more) for nearly identical pricing. I think the m4 type is better value 
compared to c4. YMMV, but run your tests and you might be surprised.

In relation to the heap, I imagine you're using CMS so allocate at least 16GB 
but 20 or 24GB might turn out to be the ideal size for your cluster based on 
your testing. Just make sure you reserve at least 8GB of RAM for the operating 
system.

I hope this helps. Cheers!


Cassandra 3.0.19 and 3.11.5 cannot start on Windows

2020-01-10 Thread Steinmaurer, Thomas
Hello,

https://issues.apache.org/jira/browse/CASSANDRA-15426. According to the ticket, 
changes in https://issues.apache.org/jira/browse/CASSANDRA-15053 are likely the 
root cause.

Will this be fixed in 3.0.20 and 3.11.6?

Thanks,
Thomas


Cassandra 3.0.18 showing ~ 10x higher on-heap allocations for processing batch messages compared to 2.1.18

2019-11-15 Thread Steinmaurer, Thomas
Hello,

it looks like 3.0.18 can't handle the same write ingest as 2.1.18 on the same 
hardware. Basically, the write path, when processing batch messages, shows ~10x 
higher numbers in regard to on-heap allocations.

I've tried to summarize the finding on the following ticket: 
https://issues.apache.org/jira/browse/CASSANDRA-15430

I haven't been able to attach the JFR sessions to the ticket due to size 
limits, so please let me know if they can be uploaded somewhere else.

Would be great, if someone could have a look. Thanks!

Thomas


Cassandra 3.0.20 release ETA?

2019-11-13 Thread Steinmaurer, Thomas
Hello,

sorry, I know, 3.0.19 has been released just recently. Any ETA for 3.0.20?

Reason is that we are having quite some pain with on-heap pressure after moving 
from 2.1.18 to 3.0.18.
https://issues.apache.org/jira/browse/CASSANDRA-15400

Thanks a lot,
Thomas


RE: Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Steinmaurer, Thomas
Reid,

thanks for your thoughts.

I agree with your last comment, and I'm pretty sure/convinced that the 
increasing number of SSTables is causing the issue. I'm not sure whether it is 
compaction, read requests (after the node flipped from UJ to UN), or both, but 
I tend more towards client read requests accessing a high number of SSTables: 
each BigTableReader instance accounts for ~2 MByte of on-heap usage, and there 
are ~5K such object instances on the heap.
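
As a back-of-the-envelope check under the numbers quoted above (~2 MByte per 
BigTableReader, ~5K instances), this alone already accounts for roughly 10 GB 
of heap:

public class ReaderHeapEstimate {
    public static void main(String[] args) {
        long perReaderBytes = 2L * 1024 * 1024; // ~2 MByte per BigTableReader instance
        long readers = 5_000;                   // ~5K instances seen on the heap
        double gib = (perReaderBytes * readers) / (1024.0 * 1024 * 1024);
        System.out.printf("~%.1f GiB held by SSTable readers alone%n", gib); // ~9.8 GiB
    }
}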

The big question for us is why this starts to pop up with Cassandra 3.0, while 
we never saw it with 2.1 in more than 3 years of production usage.

To avoid double work, I will try to continue providing additional information / 
thoughts on the Cassandra ticket.

Regards,
Thomas

From: Reid Pinchback 
Sent: Mittwoch, 06. November 2019 18:28
To: user@cassandra.apache.org
Subject: Re: Cassandra 3.0.18 went OOM several hours after joining a cluster

The other thing that comes to mind is that the increase in pending compactions 
suggests back pressure on compaction activity.  GC is only one possible source 
of that.  Between your throughput setting and how your disk I/O is set up, 
maybe that's throttling you to a point where new compaction work is being added 
faster than compactions are completed.

In fact, the more that I think about it, I wonder about that a lot.

If you can’t keep up with compactions, then operations have to span more and 
more SSTables over time.  You’ll keep holding on to what you read, as you read 
more of them, until eventually…pop.
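
To see whether that is happening, the pending compaction count can be watched 
over time; nodetool compactionstats shows it, and the same gauge is available 
over JMX. A minimal sketch, assuming unauthenticated JMX on port 7199:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PendingCompactions {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName pending = new ObjectName(
                    "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks");
            // Sample every 10 seconds; a steadily climbing value means compaction cannot keep up.
            for (int i = 0; i < 30; i++) {
                Object value = mbs.getAttribute(pending, "Value");
                System.out.println(System.currentTimeMillis() + " pending=" + value);
                Thread.sleep(10_000);
            }
        }
    }
}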


From: Reid Pinchback <rpinchb...@tripadvisor.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, November 6, 2019 at 12:11 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra 3.0.18 went OOM several hours after joining a cluster

My first thought was that you were running into the merkle tree depth problem, 
but the details on the ticket don’t seem to confirm that.

It does look like eden is too small.   C* lives in Java’s GC pain point, a lot 
of medium-lifetime objects.  If you haven’t already done so, you’ll want to 
configure as many things to be off-heap as you can, but I’d definitely look at 
improving the ratio of eden to old gen, and see if you can get the young gen GC 
activity to be more successful at sweeping away the medium-lived objects.

All that really comes to mind is if you’re getting to a point where GC isn’t 
coping.  That can be hard to sometimes spot on metrics with coarse granularity. 
 Per-second metrics might show CPU cores getting pegged.

I’m not sure that GC tuning eliminates this problem, but if it isn’t being 
caused by that, GC tuning may at least improve the visibility of the underlying 
problem.

From: "Steinmaurer, Thomas" 
mailto:thomas.steinmau...@dynatrace.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Date: Wednesday, November 6, 2019 at 11:27 AM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Subject: Cassandra 3.0.18 went OOM several hours after joining a cluster

Hello,

after moving from 2.1.18 to 3.0.18, we are facing OOM situations after several 
hours a node has successfully joined a cluster (via auto-bootstrap).

I have created the following ticket trying to describe the situation, including 
hprof / MAT screens: https://issues.apache.org/jira/browse/CASSANDRA-15400

Would be great if someone could have a look.

Thanks a lot.

Thomas

Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Steinmaurer, Thomas
Hello,

after moving from 2.1.18 to 3.0.18, we are facing OOM situations after several 
hours a node has successfully joined a cluster (via auto-bootstrap).

I have created the following ticket trying to describe the situation, including 
hprof / MAT screens: https://issues.apache.org/jira/browse/CASSANDRA-15400

Would be great if someone could have a look.

Thanks a lot.

Thomas


RE: Cassandra 2.1.18 - Question on stream/bootstrap throughput

2019-10-22 Thread Steinmaurer, Thomas
Reid, Jon, thanks for the feedback and comments. Interesting readings.

https://jira.apache.org/jira/browse/CASSANDRA-9766 basically describes exactly 
what we are experiencing, namely that unthrottling does not change anything at 
all, so I simply take it as Cassandra itself being the limiting factor here.

Thomas

From: Reid Pinchback 
Sent: Dienstag, 22. Oktober 2019 19:31
To: user@cassandra.apache.org
Subject: Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput

Thanks for the reading Jon.  

From: Jon Haddad <j...@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, October 22, 2019 at 12:32 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput

CPU waiting on memory will look like CPU overhead. There's a good post on the 
topic by Brendan Gregg: 
http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html

Regarding GC, I agree with Reid.  You're probably not going to saturate your 
network card no matter what your settings, Cassandra has way too much overhead 
to do that.  It's one of the reasons why the whole zero-copy streaming feature 
was added to Cassandra 4.0: 
http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html

Reid is also correct in pointing out the method by which you're monitoring your 
metrics might be problematic.  With prometheus, the same data can show 
significantly different graphs when using rate vs irate, and only collecting 
once a minute would hide a lot of useful data.

If you keep digging and find you're not using all your CPU during GC pauses, 
you can try using more GC threads by setting -XX:ParallelGCThreads to match the 
number of cores you have, since by default it won't use them all.  You've got 
40 cores in the m4.10xlarge, try setting -XX:ParallelGCThreads to 40.
Jon



On Tue, Oct 22, 2019 at 11:38 AM Reid Pinchback <rpinchb...@tripadvisor.com> wrote:
Thomas, what is your frequency of metric collection?  If it is minute-level 
granularity, that can give a very false impression.  I’ve seen CPU and disk 
throttles that don’t even begin to show visibility until second-level 
granularity around the time of the constraining event.  Even clearer is 100ms.

Also, are you monitoring your GC activity at all?  GC bound up in a lot of 
memory copies is not going to manifest that much CPU, it’s memory bus bandwidth 
you are fighting against then.  It is easy to have a box that looks unused but 
in reality its struggling.  Given that you’ve opened up the floodgates on 
compaction, that would seem quite plausible to be what you are experiencing.
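
One way to get that second-level visibility is to sample the JVM's own GC 
counters directly instead of relying on minute-granularity SFM charts. A 
sketch, assuming unauthenticated JMX on port 7199:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class GcSampler {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            List<GarbageCollectorMXBean> gcs =
                    ManagementFactory.getPlatformMXBeans(mbs, GarbageCollectorMXBean.class);
            long last = -1;
            for (int i = 0; i < 600; i++) {          // ~10 minutes at 1s granularity
                long total = 0;
                for (GarbageCollectorMXBean gc : gcs) {
                    total += gc.getCollectionTime(); // cumulative ms spent in this collector
                }
                if (last >= 0) {
                    System.out.printf("%d ms spent in GC during the last second%n", total - last);
                }
                last = total;
                Thread.sleep(1_000);
            }
        }
    }
}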

From: "Steinmaurer, Thomas" 
mailto:thomas.steinmau...@dynatrace.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, October 22, 2019 at 11:22 AM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Subject: RE: Cassandra 2.1.18 - Question on stream/bootstrap throughput

Hi Alex,

Increased streaming throughput has been set on the existing nodes only, cause 
it is meant to limit outgoing traffic only, right? At least when judging from 
the name, reading the documentation etc.

Increased compaction throughput on all nodes, although my understanding is that 
it would be necessary only on the joining node to catchup with compacting 
received SSTables.

We real

RE: Cassandra 2.1.18 - Question on stream/bootstrap throughput

2019-10-22 Thread Steinmaurer, Thomas
Hi Alex,

Increased streaming throughput has been set on the existing nodes only, cause 
it is meant to limit outgoing traffic only, right? At least when judging from 
the name, reading the documentation etc.

Increased compaction throughput on all nodes, although my understanding is that 
it would be necessary only on the joining node to catchup with compacting 
received SSTables.

We really see no resource (CPU, NW and disk) being somehow maxed out on any 
node, which would explain the limit in the area of the new node receiving data 
at ~ 180-200 Mbit/s.

Thanks again,
Thomas

From: Oleksandr Shulgin 
Sent: Dienstag, 22. Oktober 2019 16:35
To: User 
Subject: Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput

On Tue, Oct 22, 2019 at 12:47 PM Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com> wrote:

using 2.1.8, 3 nodes (m4.10xlarge, EBS SSD-based), vnodes=256, RF=3, we are 
trying to add a 4th node.

The two options mainly affecting throughput that I know of, namely stream 
output and compaction throttling, have been set to very high values (e.g. 
stream output = 800 Mbit/s resp. compaction throughput = 500 Mbyte/s) or even 
set to 0 (unthrottled) in cassandra.yaml + process restart. In both scenarios 
(throttling with high values vs. unthrottled), the 4th node is streaming from 
one node capped at ~180-200 Mbit/s, according to our SFM.

The nodes have plenty of resources available (10Gbit, disk io/iops), also 
confirmed by e.g. iperf in regard to NW throughput and write to / read from 
disk in the area of 200 MByte/s.

Are there any other known throughput / bootstrap limitations which basically 
override the above settings?

Hi Thomas,

Assuming you have 3 Availability Zones and you are adding the new node to one 
of the zones where you already have a node running, it is expected that it only 
streams from that node (its local rack).

Have you increased the streaming throughput on the node it streams from or only 
on the new node?  The limit applies to the source node as well.  You can change 
it online w/o the need to restart using nodetool command.
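
For completeness, the same online change can also be made directly over JMX 
(this is what nodetool setstreamthroughput / setcompactionthroughput call under 
the hood). A sketch, assuming unauthenticated JMX on port 7199 and that your 
version exposes these StorageService operations:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SetThroughput {
    public static void main(String[] args) throws Exception {
        // Run this against the streaming *source* node as well, not only the new node.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Equivalent to `nodetool setstreamthroughput 800` and `nodetool setcompactionthroughput 0`.
            mbs.invoke(ss, "setStreamThroughputMbPerSec",
                    new Object[] { 800 }, new String[] { "int" });
            mbs.invoke(ss, "setCompactionThroughputMbPerSec",
                    new Object[] { 0 }, new String[] { "int" });
        }
    }
}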

Have you checked if the new node is not CPU-bound?  It's unlikely though due to 
big instance type and only one node to stream from, more relevant for scenarios 
when streaming from a lot of nodes.

Cheers,
--
Alex



Cassandra 2.1.18 - Question on stream/bootstrap throughput

2019-10-22 Thread Steinmaurer, Thomas
Hello,

using 2.1.8, 3 nodes (m4.10xlarge, EBS SSD-based), vnodes=256, RF=3, we are 
trying to add a 4th node.

The two options mainly affecting throughput that I know of, namely stream 
output and compaction throttling, have been set to very high values (e.g. 
stream output = 800 Mbit/s resp. compaction throughput = 500 Mbyte/s) or even 
set to 0 (unthrottled) in cassandra.yaml + process restart. In both scenarios 
(throttling with high values vs. unthrottled), the 4th node is streaming from 
one node capped at ~180-200 Mbit/s, according to our SFM.

The nodes have plenty of resources available (10Gbit, disk io/iops), also 
confirmed by e.g. iperf in regard to NW throughput and write to / read from 
disk in the area of 200 MByte/s.

Are there any other known throughput / bootstrap limitations which basically 
override the above settings?

Thanks,
Thomas




RE: Cassandra 2.1.18 - NPE during startup

2019-03-27 Thread Steinmaurer, Thomas
Hello,

any ideas regarding the issue below? It happened again on a different node.

Thanks
Thomas

From: Steinmaurer, Thomas 
Sent: Dienstag, 05. Februar 2019 23:03
To: user@cassandra.apache.org
Subject: Cassandra 2.1.18 - NPE during startup

Hello,

at a particular customer location, we are seeing the following NPE during 
startup with Cassandra 2.1.18.

INFO  [SSTableBatchOpen:2] 2019-02-03 13:32:56,131 SSTableReader.java:475 - 
Opening 
/var/opt/data/cassandra/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-130
 (256 bytes)
ERROR [main] 2019-02-03 13:32:56,552 CassandraDaemon.java:583 - Exception 
encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:672)
 ~[apache-cassandra-2.1.18.jar:2.1.18]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:310) 
[apache-cassandra-2.1.18.jar:2.1.18]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) 
[apache-cassandra-2.1.18.jar:2.1.18]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) 
[apache-cassandra-2.1.18.jar:2.1.18]
Caused by: java.lang.NullPointerException: null
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:664)
 ~[apache-cassandra-2.1.18.jar:2.1.18]
... 3 common frames omitted

I found https://issues.apache.org/jira/browse/CASSANDRA-10501, but this should 
be fixed in 2.1.18.

Is the above log stating that it is caused by a system keyspace related SSTable?

This is a 3-node setup with the 2 other nodes running fine. If this is 
system-table related, and since LocalStrategy is used as the replication 
strategy (to my knowledge), perhaps simply copying over the data for the 
schema_keyspaces table from another node might fix it?

Any help appreciated.

Thanks.
Thomas


Cassandra 2.1.18 - NPE during startup

2019-02-05 Thread Steinmaurer, Thomas
Hello,

at a particular customer location, we are seeing the following NPE during 
startup with Cassandra 2.1.18.

INFO  [SSTableBatchOpen:2] 2019-02-03 13:32:56,131 SSTableReader.java:475 - 
Opening 
/var/opt/data/cassandra/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-130
 (256 bytes)
ERROR [main] 2019-02-03 13:32:56,552 CassandraDaemon.java:583 - Exception 
encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:672)
 ~[apache-cassandra-2.1.18.jar:2.1.18]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:310) 
[apache-cassandra-2.1.18.jar:2.1.18]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) 
[apache-cassandra-2.1.18.jar:2.1.18]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) 
[apache-cassandra-2.1.18.jar:2.1.18]
Caused by: java.lang.NullPointerException: null
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:664)
 ~[apache-cassandra-2.1.18.jar:2.1.18]
... 3 common frames omitted

I found https://issues.apache.org/jira/browse/CASSANDRA-10501, but this should 
be fixed in 2.1.18.

Is the above log stating that it is caused by a system keyspace related SSTable?

This is a 3-node setup with the 2 other nodes running fine. If this is 
system-table related, and since LocalStrategy is used as the replication 
strategy (to my knowledge), perhaps simply copying over the data for the 
schema_keyspaces table from another node might fix it?

Any help appreciated.

Thanks.
Thomas


RE: JMX metric for dropped hints?

2019-01-22 Thread Steinmaurer, Thomas
Hi,

thanks. I have seen that, but I’m not sure if this maps 1:1 to dropped hints in 
the Cassandra log.

Dropped “message” could also mean that something went wrong during a hint 
replay for a recovering node, but not necessarily dropped hints due to a node 
being offline longer than the default 3hr hinted handoff window.

Am I wrong?

Thanks again,
Thomas

From: Hayato Shimizu 
Sent: Dienstag, 22. Jänner 2019 10:16
To: user@cassandra.apache.org
Subject: Re: JMX metric for dropped hints?

Hi Thomas,

The dropped hints count and various rates are available in the following 
location:


org.apache.cassandra.metrics:type=DroppedMessage,scope=HINT,name=Dropped,*


org.apache.cassandra.metrics/DroppedMessage/HINT/Attributes/Count
org.apache.cassandra.metrics/DroppedMessage/HINT/Attributes/MeanRate
org.apache.cassandra.metrics/DroppedMessage/HINT/Attributes/OneMinuteRate
org.apache.cassandra.metrics/DroppedMessage/HINT/Attributes/FiveMinuteRate
org.apache.cassandra.metrics/DroppedMessage/HINT/Attributes/FifteenMinuteRate
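
A minimal sketch for reading those attributes programmatically, assuming 
unauthenticated JMX on the default port 7199 (the MBean and attribute names are 
the ones listed above):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DroppedHints {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName dropped = new ObjectName(
                    "org.apache.cassandra.metrics:type=DroppedMessage,scope=HINT,name=Dropped");
            // Counter plus one of the derived rates exposed by the same MBean.
            System.out.println("dropped HINT count: " + mbs.getAttribute(dropped, "Count"));
            System.out.println("one minute rate:    " + mbs.getAttribute(dropped, "OneMinuteRate"));
        }
    }
}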


Hayato


On Tue, 22 Jan 2019 at 07:45, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com> wrote:
Hello,

is there a JMX metric for monitoring dropped hints as a counter/rate, 
equivalent to what we see in Cassandra log, e.g.:

WARN  [HintedHandoffManager:1] 2018-11-13 13:28:46,991 
HintedHandoffMetrics.java:79 - /XXX has 18180 dropped hints, because node is 
down past configured hint window.
WARN  [HintedHandoffManager:1] 2018-11-13 13:27:29,305 
HintedHandoffMetrics.java:79 - /XXX has 1191 dropped hints, because node is 
down past configured hint window.
WARN  [HintedHandoffManager:1] 2018-11-13 13:23:09,393 
HintedHandoffMetrics.java:79 - /XXX has 135531 dropped hints, because node is 
down past configured hint window.


Thanks,
Thomas



JMX metric for dropped hints?

2019-01-21 Thread Steinmaurer, Thomas
Hello,

is there a JMX metric for monitoring dropped hints as a counter/rate, 
equivalent to what we see in Cassandra log, e.g.:

WARN  [HintedHandoffManager:1] 2018-11-13 13:28:46,991 
HintedHandoffMetrics.java:79 - /XXX has 18180 dropped hints, because node is 
down past configured hint window.
WARN  [HintedHandoffManager:1] 2018-11-13 13:27:29,305 
HintedHandoffMetrics.java:79 - /XXX has 1191 dropped hints, because node is 
down past configured hint window.
WARN  [HintedHandoffManager:1] 2018-11-13 13:23:09,393 
HintedHandoffMetrics.java:79 - /XXX has 135531 dropped hints, because node is 
down past configured hint window.


Thanks,
Thomas



Cassandra 2.1 bootstrap - No streaming progress from one node

2018-11-07 Thread Steinmaurer, Thomas
Hello,

while bootstrapping a new node into an existing cluster, a node acting as a 
source for streaming unfortunately got restarted. Since then, nodetool netstats 
shows no progress for this particular node anymore.

E.g.:

/X.X.X.X
Receiving 94 files, 260.09 GB total. Already received 26 files, 69.33 
GB total

Basically, it is stuck at 69.33 GB for hours. Does Cassandra (2.1 in our case) 
not resume here in case of e.g. connectivity troubles or, as in our case, a 
restart of Cassandra on the node acting as stream source?

Can I force the joining node to re-establish the connection to X.X.X.X, or do I 
need to restart the bootstrap from scratch via a restart of the new node?

Thanks,
Thomas



RE: Cassandra 2.1.21 ETA?

2018-10-01 Thread Steinmaurer, Thomas
Michael,

can you please elaborate on your SocketServer question? Is this for Thrift 
only, or does it also affect the native protocol (CQL)?

Yes, we basically have iptables rules in place disallowing remote access from 
machines outside the cluster.

Thanks again,
Thomas

> -Original Message-
> From: Michael Shuler  On Behalf Of Michael
> Shuler
> Sent: Freitag, 21. September 2018 15:49
> To: user@cassandra.apache.org
> Subject: Re: Cassandra 2.1.21 ETA?
>
> On 9/21/18 3:28 AM, Steinmaurer, Thomas wrote:
> >
> > is there an ETA for 2.1.21 containing the logback update (security
> > vulnerability fix)?
>
> Are you using SocketServer? Is your cluster firewalled?
>
> Feb 2018 2.1->3.11 commits noting this in NEWS.txt:
> https://github.com/apache/cassandra/commit/4bbd28a
>
> Feb 2018 trunk (4.0) commit for the library update:
> https://github.com/apache/cassandra/commit/c0aa79e
>
> --
> Michael
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org



RE: Cassandra 2.1.21 ETA?

2018-09-21 Thread Steinmaurer, Thomas
Michael,

thanks. I see now, this is just a notice in NEWS.txt.

Thomas

> -Original Message-
> From: Michael Shuler  On Behalf Of Michael
> Shuler
> Sent: Freitag, 21. September 2018 15:49
> To: user@cassandra.apache.org
> Subject: Re: Cassandra 2.1.21 ETA?
>
> On 9/21/18 3:28 AM, Steinmaurer, Thomas wrote:
> >
> > is there an ETA for 2.1.21 containing the logback update (security
> > vulnerability fix)?
>
> Are you using SocketServer? Is your cluster firewalled?
>
> Feb 2018 2.1->3.11 commits noting this in NEWS.txt:
> https://github.com/apache/cassandra/commit/4bbd28a
>
> Feb 2018 trunk (4.0) commit for the library update:
> https://github.com/apache/cassandra/commit/c0aa79e
>
> --
> Michael
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org



Cassandra 2.1.21 ETA?

2018-09-21 Thread Steinmaurer, Thomas
Hello,

is there an ETA for 2.1.21 containing the logback update (security 
vulnerability fix)?

Thanks,
Thomas



RE: Major compaction ignoring one SSTable? (was Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?))

2018-09-18 Thread Steinmaurer, Thomas
Alex,

any indications in Cassandra log about insufficient disk space during 
compactions?

Thomas

From: Oleksandr Shulgin 
Sent: Dienstag, 18. September 2018 10:01
To: User 
Subject: Major compaction ignoring one SSTable? (was Re: Fresh SSTable files 
(due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a 
or scrub?))

On Mon, Sep 17, 2018 at 4:29 PM Oleksandr Shulgin 
<oleksandr.shul...@zalando.de> wrote:

Thanks for your reply!  Indeed it could be coming from single-SSTable 
compaction, this I didn't think about.  By any chance looking into 
compaction_history table could be useful to trace it down?

Hello,

Yet another unexpected thing we are seeing is that after a major compaction 
completed on one of the nodes there are two SSTables instead of only one (time 
is UTC):

-rw-r--r-- 1 999 root 99G Sep 18 00:13 mc-583-big-Data.db
-rw-r--r-- 1 999 root 70G Mar  8  2018 mc-74-big-Data.db

The more recent one must be the result of major compaction on this table, but 
why the other one from March was not included?

The logs don't help to understand the reason, and from compaction history on 
this node the following record seems to be the only trace:

@ Row 1
---+--
 id| b6feb180-bad7-11e8-9f42-f1a67c22839a
 bytes_in  | 223804299627
 bytes_out | 105322622473
 columnfamily_name | XXX
 compacted_at  | 2018-09-18 00:13:48+
 keyspace_name | YYY
 rows_merged   | {1: 31321943, 2: 11722759, 3: 382232, 4: 23405, 5: 2250, 
6: 134}

This also doesn't tell us a lot.

This has happened only on one node out of 10 where the same command was used to 
start major compaction on this table.

Any ideas what could be the reason?

For now we have just started major compaction again to ensure these two last 
tables are compacted together, but we would really like to understand the 
reason for this behavior.

Regards,
--
Alex



Apache Thrift library 0.9.2 update due to security vulnerability?

2018-09-14 Thread Steinmaurer, Thomas
Hello,

a Blackduck security scan of our product detected a security vulnerability in 
the Apache Thrift library 0.9.2, which is shipped in Cassandra up to 3.11 
(haven't checked 4.0), also pointed out here:
https://www.cvedetails.com/vulnerability-list/vendor_id-45/product_id-38295/Apache-Thrift.html

Any plans to upgrade the Apache Thrift library?

Thanks,
Thomas




RE: Read timeouts when performing rolling restart

2018-09-12 Thread Steinmaurer, Thomas
Hi,

I remember something about a client using the native protocol being notified 
too early that Cassandra is ready, due to the following issue:
https://issues.apache.org/jira/browse/CASSANDRA-8236

which looks similar, but above was marked as fixed in 2.2.

Thomas

From: Riccardo Ferrari 
Sent: Mittwoch, 12. September 2018 18:25
To: user@cassandra.apache.org
Subject: Re: Read timeouts when performing rolling restart

Hi Alain,

Thank you for chiming in!

I was thinking of performing the 'start_native_transport=false' test as well, 
and indeed the issue is not showing up. Starting a node with the native 
transport disabled and letting it cool down led to no timeout exceptions and no 
dropped messages, simply a crystal clean startup. Agreed, it is a workaround.

# About upgrading:
Yes, I desperately want to upgrade despite it being a long and slow task. Just 
reviewing all the changes from 3.0.6 to 3.0.17 is going to be a huge pain; off 
the top of your head, are there any breaking changes I should absolutely take 
care to review?

# describecluster output: YES they agree on the same schema version

# keyspaces:
system WITH replication = {'class': 'LocalStrategy'}
system_schema WITH replication = {'class': 'LocalStrategy'}
system_auth WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '1'}
system_distributed WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '3'}
system_traces WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '2'}

 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 
'3'}
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 
'3'}

# Snitch
Ec2Snitch

## About Snitch and replication:
- We have the default DC and all nodes are in the same RACK
- We are planning to move to GossipingPropertyFileSnitch, configuring the 
cassandra-rackdc accordingly.
-- This should be a transparent change, correct?

- Once switched to GPFS, we plan to move to 'NetworkTopologyStrategy' with 
'us-' DC and replica counts as before
- Then adding a new DC inside the VPC, but this is another story...

Any concerns here ?

# nodetool status
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.x.x.a  177 GB     256     50.3%             d8bfe4ad-8138-41fe-89a4-ee9a043095b5  rr
UN  10.x.x.b  152.46 GB  256     51.8%             7888c077-346b-4e09-96b0-9f6376b8594f  rr
UN  10.x.x.c  159.59 GB  256     49.0%             329b288e-c5b5-4b55-b75e-fbe9243e75fa  rr
UN  10.x.x.d  162.44 GB  256     49.3%             07038c11-d200-46a0-9f6a-6e2465580fb1  rr
UN  10.x.x.e  174.9 GB   256     50.5%             c35b5d51-2d14-4334-9ffc-726f9dd8a214  rr
UN  10.x.x.f  194.71 GB  256     49.2%             f20f7a87-d5d2-4f38-a963-21e24167b8ac  rr

# gossipinfo
/10.x.x.a
  STATUS:827:NORMAL,-1350078789194251746
  LOAD:289986:1.90078037902E11
  SCHEMA:281088:af4461c3-d269-39bc-9d03-3566031c1e0a
  DC:6:
  RACK:8:rr
  RELEASE_VERSION:4:3.0.6
  SEVERITY:290040:0.5934718251228333
  NET_VERSION:1:10
  HOST_ID:2:d8bfe4ad-8138-41fe-89a4-ee9a043095b5
  RPC_READY:868:true
  TOKENS:826:
/10.x.x.b
  STATUS:16:NORMAL,-1023229528754013265
  LOAD:7113:1.63730480619E11
  SCHEMA:10:af4461c3-d269-39bc-9d03-3566031c1e0a
  DC:6:
  RACK:8:rr
  RELEASE_VERSION:4:3.0.6
  SEVERITY:7274:0.5988024473190308
  NET_VERSION:1:10
  HOST_ID:2:7888c077-346b-4e09-96b0-9f6376b8594f
  TOKENS:15:
/10.x.x.c
  STATUS:732:NORMAL,-111717275923547
  LOAD:245839:1.71409806942E11
  SCHEMA:237168:af4461c3-d269-39bc-9d03-3566031c1e0a
  DC:6:
  RACK:8:rr
  RELEASE_VERSION:4:3.0.6
  SEVERITY:245989:0.0
  NET_VERSION:1:10
  HOST_ID:2:329b288e-c5b5-4b55-b75e-fbe9243e75fa
  RPC_READY:763:true
  TOKENS:731:
/10.x.x.d
  STATUS:14:NORMAL,-1004942496246544417
  LOAD:313125:1.74447964917E11
  SCHEMA:304268:af4461c3-d269-39bc-9d03-3566031c1e0a
  DC:6:
  RACK:8:rr
  RELEASE_VERSION:4:3.0.6
  SEVERITY:313215:0.25641027092933655
  NET_VERSION:1:10
  HOST_ID:2:07038c11-d200-46a0-9f6a-6e2465580fb1
  RPC_READY:56:true
  TOKENS:13:
/10.x.x.e
  STATUS:520:NORMAL,-1058809960483771749
  LOAD:276118:1.87831573032E11
  SCHEMA:267327:af4461c3-d269-39bc-9d03-3566031c1e0a
  DC:6:
  RACK:8:rr
  RELEASE_VERSION:4:3.0.6
  SEVERITY:276217:0.32786884903907776
  NET_VERSION:1:10
  HOST_ID:2:c35b5d51-2d14-4334-9ffc-726f9dd8a214
  RPC_READY:550:true
  TOKENS:519:
/10.x.x.f
  STATUS:1081:NORMAL,-1039671799603495012
  LOAD:239114:2.09082017545E11
  SCHEMA:230229:af4461c3-d269-39bc-9d03-3566031c1e0a
  DC:6:
  RACK:8:rr
  RELEASE_VERSION:4:3.0.6
  SEVERITY:239180:0.5665722489356995
  NET_VERSION:1:10
  HOST_ID:2:f20f7a87-d5d2-4f38-a963-21e24167b8ac
  RPC_READY:1118:true
  TOKENS:1080:

## About load and tokens:
- While load is pretty even, this does not apply to tokens; I guess we have 
some table with uneven distribution. This should not be the case for high-load 
tables, as partition keys are built with some 'id + '
- I was not able to find some 

RE: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Steinmaurer, Thomas
Alex,

a single (largish) SSTable, or any other SSTable of a table which does not get 
any writes (e.g. deletes) anymore, will most likely not be part of an automatic 
minor compaction anymore and thus may stay on disk forever, if I don't miss 
anything crucial here. It might be different if you are writing entirely 
TTL-based, because single-SSTable automatic tombstone compaction may kick in 
there, but I'm not really experienced with that.

We had been suffering a lot storing timeseries data with STCS, needing enough 
disk capacity to keep the cluster working smoothly while automatic minor 
compactions kick out aged timeseries data according to the retention policies 
in our business logic. TWCS is unfortunately not an option for us. So we ran 
major compactions every X weeks to reclaim disk space, which is far from nice 
from an operational perspective. We finally decided to change the STCS 
min_threshold from the default 4 to 2, to let minor compactions kick in more 
frequently. We can live with the additional IO/CPU this is causing, so this is 
our current approach to the disk space and sizing issues we had in the past.

Thomas

From: Oleksandr Shulgin 
Sent: Dienstag, 11. September 2018 09:47
To: User 
Subject: Re: Drop TTLd rows: upgradesstables -a or scrub?

On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com> wrote:
As far as I remember, in newer Cassandra versions, with STCS, nodetool compact 
offers a ‘-s’ command-line option to split the output into files with 50%, 25% 
… in size, thus in this case, not a single largish SSTable anymore. By default, 
without -s, it is a single SSTable though.

Thanks Thomas, I've also spotted the option while testing this approach.  I 
understand that doing major compactions is generally not recommended, but do 
you see any real drawback of having a single SSTable file in case we stopped 
writing new data to the table?

--
Alex



RE: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Steinmaurer, Thomas
As far as I remember, in newer Cassandra versions, with STCS, nodetool compact 
offers a ‘-s’ command-line option to split the output into files with 50%, 25% 
… in size, thus in this case, not a single largish SSTable anymore. By default, 
without -s, it is a single SSTable though.

Thomas

From: Jeff Jirsa 
Sent: Montag, 10. September 2018 19:40
To: cassandra 
Subject: Re: Drop TTLd rows: upgradesstables -a or scrub?

I think it's important to describe exactly what's going on for people who just 
read the list but who don't have context. This blog does a really good job: 
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
 , but briefly:

- When a TTL expires, we treat it as a tombstone, because it may have been 
written ON TOP of another piece of live data, so we need to get that deletion 
marker to all hosts, just like a manual explicit delete
- Tombstones in sstable A may shadow data in sstable B, so doing anything on 
just one sstable MAY NOT remove the tombstone - we can't get rid of the 
tombstone if sstable A overlaps another sstable with the same partition (which 
we identify via bloom filter) that has any data with a lower timestamp (we 
don't check the sstable for a shadowed value, we just look at the minimum live 
timestamp of the table)

"nodetool garbagecollect" looks for sstables that overlap (partition keys) and 
combine them together, which makes tombstones past GCGS purgable and should 
remove them (and data shadowed by them).

If you're on a version without nodetool garbagecollection, you can approximate 
it using user defined compaction ( 
http://thelastpickle.com/blog/2016/10/18/user-defined-compaction.html
 ) - it's a JMX endpoint that let's you tell cassandra to compact one or more 
sstables together based on parameters you choose. This is somewhat like 
upgradesstables or scrub, but you can combine sstables as well. If you choose 
candidates intelligently (notably, oldest sstables first, or sstables you know 
overlap), you can likely manually clean things up pretty quickly. At one point, 
I had a jar that would do single sstable at a time, oldest sstable first, and 
it pretty much worked for this purpose most of the time.
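
A minimal sketch of invoking that JMX endpoint from Java, assuming 
unauthenticated JMX on port 7199 and a version where forceUserDefinedCompaction 
takes a single comma-separated String of -Data.db paths (check the 
CompactionManager MBean of your version first; the paths below are purely 
hypothetical):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class UserDefinedCompaction {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName cm = new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            // Comma-separated SSTable data files to compact together (hypothetical example paths).
            String sstables = "/var/lib/cassandra/data/ks/tbl-uuid/mc-100-big-Data.db,"
                            + "/var/lib/cassandra/data/ks/tbl-uuid/mc-200-big-Data.db";
            mbs.invoke(cm, "forceUserDefinedCompaction",
                    new Object[] { sstables }, new String[] { "java.lang.String" });
        }
    }
}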

If you have room, a "nodetool compact" on stcs will also work, but it'll give 
you one huge sstable, which will be unfortunate long term (probably less of a 
problem if you're no longer writing to this table).


On Mon, Sep 10, 2018 at 10:29 AM Charulata Sharma (charshar) 
<chars...@cisco.com.invalid> wrote:
Scrub takes a very long time and does not remove the tombstones. You should do 
garbage cleaning. It immediately removes the tombstones.

Thanks,
Charu

From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, September 10, 2018 at 6:53 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Drop TTLd rows: upgradesstables -a or scrub?

Hello,

We have some tables with significant amount of TTLd rows that have expired by 
now (and more gc_grace_seconds have passed since the TTL).  We have stopped 
writing more data to these tables quite a while ago, so background compaction 
isn't running.  The compaction strategy is the default SizeTiered one.

Now we would like to get rid of all the droppable tombstones in these tables.  
What would be the approach that puts the least stress on the cluster?

We've considered a few, but the most promising ones seem to be these two: 
`nodetool scrub` or `nodetool upgradesstables -a`.  We are using Cassandra 
version 3.0.

Now, this docs page recommends to use upgradesstables wherever possible: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsScrub.html
What is the reason behind it?

From source code I can see that Scrubber is the class which is going to drop 
the tombstones (and report the total number in the logs): 

RE: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Steinmaurer, Thomas

From: Jeff Jirsa 
Sent: Montag, 10. September 2018 19:40
To: cassandra 
Subject: Re: Drop TTLd rows: upgradesstables -a or scrub?

I think it's important to describe exactly what's going on for people who just 
read the list but who don't have context. This blog does a really good job: 
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
 , but briefly:

- When a TTL expires, we treat it as a tombstone, because it may have been 
written ON TOP of another piece of live data, so we need to get that deletion 
marker to all hosts, just like a manual explicit delete
- Tombstones in sstable A may shadow data in sstable B, so doing anything on 
just one sstable MAY NOT remove the tombstone - we can't get rid of the 
tombstone if sstable A overlaps another sstable with the same partition (which 
we identify via bloom filter) that has any data with a lower timestamp (we 
don't check the sstable for a shadowed value, we just look at the minimum live 
timestamp of the table)

"nodetool garbagecollect" looks for sstables that overlap (partition keys) and 
combine them together, which makes tombstones past GCGS purgable and should 
remove them (and data shadowed by them).

If you're on a version without nodetool garbagecollection, you can approximate 
it using user defined compaction ( 
http://thelastpickle.com/blog/2016/10/18/user-defined-compaction.html
 ) - it's a JMX endpoint that let's you tell cassandra to compact one or more 
sstables together based on parameters you choose. This is somewhat like 
upgradesstables or scrub, but you can combine sstables as well. If you choose 
candidates intelligently (notably, oldest sstables first, or sstables you know 
overlap), you can likely manually clean things up pretty quickly. At one point, 
I had a jar that would do single sstable at a time, oldest sstable first, and 
it pretty much worked for this purpose most of the time.

If you have room, a "nodetool compact" on stcs will also work, but it'll give 
you one huge sstable, which will be unfortunate long term (probably less of a 
problem if you're no longer writing to this table).





On Mon, Sep 10, 2018 at 10:29 AM Charulata Sharma (charshar) 
<chars...@cisco.com.invalid> wrote:
Scrub takes a very long time and does not remove the tombstones. You should do 
garbage cleaning. It immediately removes the tombstones.

Thanks,
Charu

From: Oleksandr Shulgin 
mailto:oleksandr.shul...@zalando.de>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Monday, September 10, 2018 at 6:53 AM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Drop TTLd rows: upgradesstables -a or scrub?

Hello,

We have some tables with significant amount of TTLd rows that have expired by 
now (and more gc_grace_seconds have passed since the TTL).  We have stopped 
writing more data to these tables quite a while ago, so background compaction 
isn't running.  The compaction strategy is the default SizeTiered one.

Now we would like to get rid of all the droppable tombstones in these tables.  
What would be the approach that puts the least stress on the cluster?

We've considered a few, but the most promising ones seem to be these two: 
`nodetool scrub` or `nodetool upgradesstables -a`.  We are using Cassandra 
version 3.0.

Now, this docs page recommends to use upgradesstables wherever possible: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsScrub.html
What is the reason behind it?

From the source code I can see that Scrubber is the class which is going to drop 
the tombstones (and report the total number in the logs): 

Scrub a single SSTable only?

2018-09-11 Thread Steinmaurer, Thomas
Hello,

is there a way to Online scrub a particular SSTable file only and not the 
entire column family?

According to the Cassandra logs we have a corrupted SSTable smallish compared 
to the entire data volume of the column family in question.

To my understanding, both, nodetool scrub and sstablescrub operate on the 
entire column family and can't work on a single SSTable, right?

There is still the way to shutdown Cassandra and remove the file from disk, but 
ideally I want to have that as an online operation.

Perhaps there is something JMX based?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Configuration parameter to reject incremental repair?

2018-09-09 Thread Steinmaurer, Thomas
Kurt,

I have created https://issues.apache.org/jira/browse/CASSANDRA-14709 as a 
request for enhancement. Not sure if/how this gets attention though.

Thanks for your support.

Thomas

From: kurt greaves 
Sent: Montag, 13. August 2018 13:30
To: User 
Subject: Re: Configuration parameter to reject incremental repair?

No flag currently exists. Probably a good idea considering the serious issues 
with incremental repairs since forever, and the change of defaults since 3.0.

On 7 August 2018 at 16:44, Steinmaurer, Thomas 
mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

we are running Cassandra in AWS and On-Premise at customer sites, currently 2.1 
in production with 3.11 in loadtest.

In a migration path from 2.1 to 3.11.x, I’m afraid that at some point in time 
we end up in incremental repairs being enabled / ran a first time 
unintentionally, cause:
a) A lot of online resources / examples do not use the -full command-line option
b) Our internal (support) tickets of course also state nodetool repair command 
without the -full option, as these are for 2.1

Especially for On-Premise customers (with less control than with our AWS 
deployments), this asks a bit for getting out-of-control once we have 3.11 out 
and nodetool repair being run without the -full command-line option.

So, what do you think about a JVM system property, cassandra.yaml … to 
basically let the operator chose if incremental repairs are allowed or not? I 
know, such a flag still can be flipped then (by the customer), but as a first 
safety stage possibly sufficient enough.

Or perhaps something like that is already available (vaguely remember something 
like that for MV).

Thanks a lot,
Thomas
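
For reference, a minimal sketch of the two invocations in question (keyspace/table arguments omitted):

# Cassandra 2.1: full repair is the default
nodetool repair -pr

# Cassandra 2.2+ / 3.11: the same command now runs an incremental repair;
# a full repair needs the explicit flag
nodetool repair -full -pr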

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313



RE: nodetool cleanup - compaction remaining time

2018-09-07 Thread Steinmaurer, Thomas
I have created https://issues.apache.org/jira/browse/CASSANDRA-14701

Please adapt as needed. Thanks!

Thomas

From: Jeff Jirsa 
Sent: Donnerstag, 06. September 2018 07:52
To: cassandra 
Subject: Re: nodetool cleanup - compaction remaining time

Probably worth a JIRA (especially if you can repro in 3.0 or higher, since 2.1 
is critical fixes only)

On Wed, Sep 5, 2018 at 10:46 PM Steinmaurer, Thomas 
mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

is it a known issue / limitation that cleanup compactions aren’t counted in the 
compaction remaining time?

nodetool compactionstats -H

pending tasks: 1
   compaction type   keyspace   table   completed   total     unit    progress
   Cleanup           XXX        YYY     908.16 GB   1.13 TB   bytes   78.63%
Active compaction remaining time :   0h00m00s


This is with 2.1.18.


Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: nodetool cleanup - compaction remaining time

2018-09-06 Thread Steinmaurer, Thomas
Alain,

compaction throughput is set to 32.

Regards,
Thomas

From: Alain RODRIGUEZ 
Sent: Donnerstag, 06. September 2018 11:50
To: user cassandra.apache.org 
Subject: Re: nodetool cleanup - compaction remaining time

Hello Thomas.

Be aware that this behavior happens when the compaction throughput is set to 0 
(unthrottled/unlimited). I believe the estimate uses the speed limit for 
calculation (which is often very much wrong anyway).

I just meant to say, you might want to make sure that it's due to cleanup type 
of compaction indeed and not due to some changes you could have made in the 
compaction throughput threshold.

C*heers,
---
Alain Rodriguez - @arodream - 
al...@thelastpickle.com<mailto:al...@thelastpickle.com>
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

Le jeu. 6 sept. 2018 à 06:51, Jeff Jirsa 
mailto:jji...@gmail.com>> a écrit :
Probably worth a JIRA (especially if you can repro in 3.0 or higher, since 2.1 
is critical fixes only)

On Wed, Sep 5, 2018 at 10:46 PM Steinmaurer, Thomas 
mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

is it a known issue / limitation that cleanup compactions aren’t counted in the 
compaction remaining time?

nodetool compactionstats -H

pending tasks: 1
   compaction type   keyspace   table   completed   total     unit    progress
   Cleanup           XXX        YYY     908.16 GB   1.13 TB   bytes   78.63%
Active compaction remaining time :   0h00m00s


This is with 2.1.18.


Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


nodetool cleanup - compaction remaining time

2018-09-05 Thread Steinmaurer, Thomas
Hello,

is it a known issue / limitation that cleanup compactions aren't counted in the 
compaction remaining time?

nodetool compactionstats -H

pending tasks: 1
   compaction type   keyspace   table   completed   total     unit    progress
   Cleanup           XXX        YYY     908.16 GB   1.13 TB   bytes   78.63%
Active compaction remaining time :   0h00m00s


This is with 2.1.18.


Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Data Corruption due to multiple Cassandra 2.1 processes?

2018-09-05 Thread Steinmaurer, Thomas
Kurt,

I cloned the original ticket. The new one is: 
https://issues.apache.org/jira/browse/CASSANDRA-14691

I can’t change the Assignee or unassign it.

Thanks,
Thomas

From: kurt greaves 
Sent: Dienstag, 14. August 2018 04:53
To: User 
Subject: Re: Data Corruption due to multiple Cassandra 2.1 processes?

New ticket for backporting, referencing the existing.

On Mon., 13 Aug. 2018, 22:50 Steinmaurer, Thomas, 
mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Thanks Kurt.

What is the proper workflow here to get this accepted? Create a new ticket 
dedicated for the backport referencing 11540 or re-open 11540?

Thanks for your help.

Thomas

From: kurt greaves mailto:k...@instaclustr.com>>
Sent: Montag, 13. August 2018 13:24
To: User mailto:user@cassandra.apache.org>>
Subject: Re: Data Corruption due to multiple Cassandra 2.1 processes?

Yeah that's not ideal and could lead to problems. I think corruption is only 
likely if compactions occur, but seems like data loss is a potential not to 
mention all sorts of other possible nasties that could occur running two C*'s 
at once. Seems to me that 11540 should have gone to 2.1 in the first place, but 
it just got missed. Very simple patch so I think a backport should be accepted.

On 7 August 2018 at 15:57, Steinmaurer, Thomas 
mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

with 2.1, in case a second Cassandra process/instance is started on a host (by 
accident), may this result in some sort of corruption, although Cassandra will 
exit at some point in time due to not being able to bind TCP ports already in 
use?

What we have seen in this scenario is something like that:

ERROR [main] 2018-08-05 21:10:24,046 CassandraDaemon.java:120 - Error starting 
local jmx server:
java.rmi.server.ExportException: Port already in use: 7199; nested exception is:
java.net.BindException: Address already in use (Bind failed)
…

But then continuing with stuff like opening system and even user tables:

INFO  [main] 2018-08-05 21:10:24,060 CacheService.java:110 - Initializing key 
cache with capacity of 100 MBs.
INFO  [main] 2018-08-05 21:10:24,067 CacheService.java:132 - Initializing row 
cache with capacity of 0 MBs
INFO  [main] 2018-08-05 21:10:24,073 CacheService.java:149 - Initializing 
counter cache with capacity of 50 MBs
INFO  [main] 2018-08-05 21:10:24,074 CacheService.java:160 - Scheduling counter 
cache save to every 7200 seconds (going to save all keys).
INFO  [main] 2018-08-05 21:10:24,161 ColumnFamilyStore.java:365 - Initializing 
system.sstable_activity
INFO  [SSTableBatchOpen:2] 2018-08-05 21:10:24,692 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-165
 (2023 bytes)
INFO  [SSTableBatchOpen:3] 2018-08-05 21:10:24,692 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-167
 (2336 bytes)
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,692 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-166
 (2686 bytes)
INFO  [main] 2018-08-05 21:10:24,755 ColumnFamilyStore.java:365 - Initializing 
system.hints
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,758 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-377
 (46210621 bytes)
INFO  [main] 2018-08-05 21:10:24,766 ColumnFamilyStore.java:365 - Initializing 
system.compaction_history
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,768 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-129
 (91269 bytes)
…

Replaying commit logs:

…
INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:267 - Replaying 
/var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log
INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:270 - Replaying 
/var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log 
(CL version 4, messaging version 8)
…

Even writing memtables already (below just pasted system tables, but also user 
tables):

…
INFO  [MemtableFlushWriter:4] 2018-08-05 21:11:52,524 Memtable.java:347 - Writing 
Memtable-size_estimates@1941663179(2.655MiB serialized bytes, 325710 ops, 2%/0% of 
on/off-heap limit)
INFO  [MemtableFlushWriter:3] 2018-08-05 21:11:52,552 Memtable.java:347 - Writing 
Memtable-peer_events@1474667699(0.199KiB serialized bytes, 4 ops, 0%/0% of 
on/off-heap limit)
…

Until it comes to a point where it can’t bind ports like the storage port 7000:

ERROR [main] 2018-08-05 21:11:54,350 CassandraDaemon.j

RE: Data Corruption due to multiple Cassandra 2.1 processes?

2018-08-13 Thread Steinmaurer, Thomas
Thanks Kurt.

What is the proper workflow here to get this accepted? Create a new ticket 
dedicated for the backport referencing 11540 or re-open 11540?

Thanks for your help.

Thomas

From: kurt greaves 
Sent: Montag, 13. August 2018 13:24
To: User 
Subject: Re: Data Corruption due to multiple Cassandra 2.1 processes?

Yeah that's not ideal and could lead to problems. I think corruption is only 
likely if compactions occur, but seems like data loss is a potential not to 
mention all sorts of other possible nasties that could occur running two C*'s 
at once. Seems to me that 11540 should have gone to 2.1 in the first place, but 
it just got missed. Very simple patch so I think a backport should be accepted.

On 7 August 2018 at 15:57, Steinmaurer, Thomas 
mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

with 2.1, in case a second Cassandra process/instance is started on a host (by 
accident), may this result in some sort of corruption, although Cassandra will 
exit at some point in time due to not being able to bind TCP ports already in 
use?

What we have seen in this scenario is something like that:

ERROR [main] 2018-08-05 21:10:24,046 CassandraDaemon.java:120 - Error starting 
local jmx server:
java.rmi.server.ExportException: Port already in use: 7199; nested exception is:
java.net.BindException: Address already in use (Bind failed)
…

But then continuing with stuff like opening system and even user tables:

INFO  [main] 2018-08-05 21:10:24,060 CacheService.java:110 - Initializing key 
cache with capacity of 100 MBs.
INFO  [main] 2018-08-05 21:10:24,067 CacheService.java:132 - Initializing row 
cache with capacity of 0 MBs
INFO  [main] 2018-08-05 21:10:24,073 CacheService.java:149 - Initializing 
counter cache with capacity of 50 MBs
INFO  [main] 2018-08-05 21:10:24,074 CacheService.java:160 - Scheduling counter 
cache save to every 7200 seconds (going to save all keys).
INFO  [main] 2018-08-05 21:10:24,161 ColumnFamilyStore.java:365 - Initializing 
system.sstable_activity
INFO  [SSTableBatchOpen:2] 2018-08-05 21:10:24,692 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-165
 (2023 bytes)
INFO  [SSTableBatchOpen:3] 2018-08-05 21:10:24,692 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-167
 (2336 bytes)
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,692 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-166
 (2686 bytes)
INFO  [main] 2018-08-05 21:10:24,755 ColumnFamilyStore.java:365 - Initializing 
system.hints
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,758 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-377
 (46210621 bytes)
INFO  [main] 2018-08-05 21:10:24,766 ColumnFamilyStore.java:365 - Initializing 
system.compaction_history
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,768 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-129
 (91269 bytes)
…

Replaying commit logs:

…
INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:267 - Replaying 
/var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log
INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:270 - Replaying 
/var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log 
(CL version 4, messaging version 8)
…

Even writing memtables already (below just pasted system tables, but also user 
tables):

…
INFO  [MemtableFlushWriter:4] 2018-08-05 21:11:52,524 Memtable.java:347 - Writing 
Memtable-size_estimates@1941663179(2.655MiB serialized bytes, 325710 ops, 2%/0% of 
on/off-heap limit)
INFO  [MemtableFlushWriter:3] 2018-08-05 21:11:52,552 Memtable.java:347 - Writing 
Memtable-peer_events@1474667699(0.199KiB serialized bytes, 4 ops, 0%/0% of 
on/off-heap limit)
…

Until it comes to a point where it can’t bind ports like the storage port 7000:

ERROR [main] 2018-08-05 21:11:54,350 CassandraDaemon.java:395 - Fatal 
configuration error
org.apache.cassandra.exceptions.ConfigurationException: /XXX:7000 is in use by 
another process.  Change listen_address:storage_port in cassandra.yaml to 
values that do not conflict with other services
at 
org.apache.cassandra.net.MessagingService.getServerSocke

Configuration parameter to reject incremental repair?

2018-08-07 Thread Steinmaurer, Thomas
Hello,

we are running Cassandra in AWS and On-Premise at customer sites, currently 2.1 
in production with 3.11 in loadtest.

In a migration path from 2.1 to 3.11.x, I'm afraid that at some point in time 
we end up in incremental repairs being enabled / ran a first time 
unintentionally, cause:
a) A lot of online resources / examples do not use the -full command-line option
b) Our internal (support) tickets of course also state nodetool repair command 
without the -full option, as these are for 2.1

Especially for On-Premise customers (with less control than with our AWS 
deployments), this asks a bit for getting out-of-control once we have 3.11 out 
and nodetool repair being run without the -full command-line option.

So, what do you think about a JVM system property, cassandra.yaml ... to 
basically let the operator chose if incremental repairs are allowed or not? I 
know, such a flag still can be flipped then (by the customer), but as a first 
safety stage possibly sufficient enough.

Or perhaps something like that is already available (vaguely remember something 
like that for MV).

Thanks a lot,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Data Corruption due to multiple Cassandra 2.1 processes?

2018-08-06 Thread Steinmaurer, Thomas
Hello,

with 2.1, in case a second Cassandra process/instance is started on a host (by 
accident), may this result in some sort of corruption, although Cassandra will 
exit at some point in time due to not being able to bind TCP ports already in 
use?

What we have seen in this scenario is something like that:

ERROR [main] 2018-08-05 21:10:24,046 CassandraDaemon.java:120 - Error starting 
local jmx server:
java.rmi.server.ExportException: Port already in use: 7199; nested exception is:
java.net.BindException: Address already in use (Bind failed)
...

But then continuing with stuff like opening system and even user tables:

INFO  [main] 2018-08-05 21:10:24,060 CacheService.java:110 - Initializing key 
cache with capacity of 100 MBs.
INFO  [main] 2018-08-05 21:10:24,067 CacheService.java:132 - Initializing row 
cache with capacity of 0 MBs
INFO  [main] 2018-08-05 21:10:24,073 CacheService.java:149 - Initializing 
counter cache with capacity of 50 MBs
INFO  [main] 2018-08-05 21:10:24,074 CacheService.java:160 - Scheduling counter 
cache save to every 7200 seconds (going to save all keys).
INFO  [main] 2018-08-05 21:10:24,161 ColumnFamilyStore.java:365 - Initializing 
system.sstable_activity
INFO  [SSTableBatchOpen:2] 2018-08-05 21:10:24,692 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-165
 (2023 bytes)
INFO  [SSTableBatchOpen:3] 2018-08-05 21:10:24,692 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-167
 (2336 bytes)
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,692 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-166
 (2686 bytes)
INFO  [main] 2018-08-05 21:10:24,755 ColumnFamilyStore.java:365 - Initializing 
system.hints
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,758 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-377
 (46210621 bytes)
INFO  [main] 2018-08-05 21:10:24,766 ColumnFamilyStore.java:365 - Initializing 
system.compaction_history
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,768 SSTableReader.java:475 - 
Opening 
/var/opt/xxx-managed/cassandra/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-129
 (91269 bytes)
...

Replaying commit logs:

...
INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:267 - Replaying 
/var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log
INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:270 - Replaying 
/var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log 
(CL version 4, messaging version 8)
...

Even writing memtables already (below just pasted system tables, but also user 
tables):

...
INFO  [MemtableFlushWriter:4] 2018-08-05 21:11:52,524 Memtable.java:347 - 
Writing Memtable-size_estimates@1941663179(2.655MiB serialized bytes, 325710 
ops, 2%/0% of on/off-heap limit)
INFO  [MemtableFlushWriter:3] 2018-08-05 21:11:52,552 Memtable.java:347 - 
Writing Memtable-peer_events@1474667699(0.199KiB serialized bytes, 4 ops, 0%/0% 
of on/off-heap limit)
...

Until it comes to a point where it can't bind ports like the storage port 7000:

ERROR [main] 2018-08-05 21:11:54,350 CassandraDaemon.java:395 - Fatal 
configuration error
org.apache.cassandra.exceptions.ConfigurationException: /XXX:7000 is in use by 
another process.  Change listen_address:storage_port in cassandra.yaml to 
values that do not conflict with other services
at 
org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:495)
 ~[apache-cassandra-2.1.18.jar:2.1.18]
...

Until Cassandra stops:

...
INFO  [StorageServiceShutdownHook] 2018-08-05 21:11:54,361 Gossiper.java:1454 - 
Announcing shutdown
...


So, we have around 2 minutes where Cassandra is mangling with existing data, 
although it shouldn't.

Sounds like a potential candidate for data corruption, right? E.g. later on we 
then see things like (still while being in progress to shutdown?):

WARN  [SharedPool-Worker-1] 2018-08-05 21:11:58,181 
AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread 
Thread[SharedPool-Worker-1,5,main]: {}
java.lang.RuntimeException: java.io.FileNotFoundException: 
/var/opt/xxx-managed/cassandra/xxx/xxx-fdc68b70950611e8ad7179f2d5bfa3cf/xxx-xxx-ka-15-Data.db
 (No such file or directory)
at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:52)
 ~[apache-cassandra-2.1.18.jar:2.1.18]
at 
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95)
 ~[apache-cassandra-2.1.18.jar:2.1.18]
at 

RE: Reaper 1.2 released

2018-07-25 Thread Steinmaurer, Thomas
Jon,

eager to try it out. Just FYI, I followed the installation instructions on 
http://cassandra-reaper.io/docs/download/install/ Debian-based.

1) Importing the key results in:

XXX:~$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 
2895100917357435
Executing: /tmp/tmp.tP0KAKG6iT/gpg.1.sh --keyserver
keyserver.ubuntu.com
--recv-keys
2895100917357435
gpg: requesting key 17357435 from hkp server keyserver.ubuntu.com
?: [fd 4]: read error: Connection reset by peer
gpgkeys: HTTP fetch error 7: couldn't connect: eof
gpg: no valid OpenPGP data found.
gpg: Total number processed: 0
gpg: keyserver communications error: keyserver unreachable
gpg: keyserver communications error: public key not found
gpg: keyserver receive failed: public key not found

I had to change the keyserver URL then the import worked:

XXX:~$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 
2895100917357435
Executing: /tmp/tmp.JwPNeUkm6x/gpg.1.sh --keyserver
hkp://keyserver.ubuntu.com:80
--recv-keys
2895100917357435
gpg: requesting key 17357435 from hkp server keyserver.ubuntu.com
gpg: key 17357435: public key "TLP Reaper packages " 
imported
gpg: Total number processed: 1
gpg:   imported: 1  (RSA: 1)


2) Running apt-get update fails with:

XXX:~$ sudo apt-get update
Ign:1 https://dl.bintray.com/thelastpickle/reaper-deb wheezy InRelease
Ign:2 https://dl.bintray.com/thelastpickle/reaper-deb wheezy Release
Ign:3 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main amd64 Packages
Ign:4 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main i386 Packages
Ign:5 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main all Packages
Ign:6 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main 
Translation-en_US
Ign:7 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en
Ign:3 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main amd64 Packages
Ign:4 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main i386 Packages
Ign:5 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main all Packages
Ign:6 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main 
Translation-en_US
Ign:7 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en
Ign:3 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main amd64 Packages
Ign:4 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main i386 Packages
Ign:5 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main all Packages
Ign:6 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main 
Translation-en_US
Ign:7 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en
Ign:3 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main amd64 Packages
Ign:4 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main i386 Packages
Ign:5 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main all Packages
Ign:6 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main 
Translation-en_US
Ign:7 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en
Ign:3 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main amd64 Packages
Ign:4 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main i386 Packages
Ign:5 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main all Packages
Ign:6 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main 
Translation-en_US
Ign:7 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en
Err:3 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main amd64 Packages
  Received HTTP code 403 from proxy after CONNECT
Ign:4 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main i386 Packages
Ign:5 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main all Packages
Ign:6 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main 
Translation-en_US
Ign:7 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en
Ign:8 http://lnz-apt-cacher.dynatrace.vmta/apt xenial InRelease
Hit:9 http://pl.archive.ubuntu.com/ubuntu xenial InRelease
Hit:10 http://lnz-apt-cacher.dynatrace.vmta/apt xenial Release
Hit:11 http://pl.archive.ubuntu.com/ubuntu xenial-backports InRelease
Hit:13 http://pl.archive.ubuntu.com/ubuntu xenial-security InRelease
Hit:14 http://pl.archive.ubuntu.com/ubuntu xenial-updates InRelease
Reading package lists... Done
W: The repository 'https://dl.bintray.com/thelastpickle/reaper-deb wheezy 
Release' does not have a Release file.
N: Data from such a repository can't be authenticated and is therefore 
potentially dangerous to use.
N: See apt-secure(8) manpage for repository creation and user configuration 
details.
E: Failed to fetch 
https://dl.bintray.com/thelastpickle/reaper-deb/dists/wheezy/main/binary-amd64/Packages
  Received HTTP code 403 from proxy after CONNECT
E: Some index files failed to download. They have been ignored, or old ones 
used instead.


Thanks,
Thomas

From: Jonathan Haddad 
Sent: Dienstag, 24. Juli 2018 21:08
To: user 

RE: G1GC CPU Spike

2018-06-13 Thread Steinmaurer, Thomas
Explicitly setting Xmn with G1 basically results in overriding the target 
pause-time goal, thus should be avoided.
http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html

Thomas
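
A minimal sketch of cassandra-env.sh style settings for G1 without a fixed young generation (the pause-time target below is just an example value):

MAX_HEAP_SIZE="8192M"
# do not set HEAP_NEWSIZE / -Xmn with G1; the collector sizes the young
# generation itself to meet the pause-time goal
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
# as suggested above, reserve more free heap to reduce evacuation failures
JVM_OPTS="$JVM_OPTS -XX:G1ReservePercent=25"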


From: rajpal reddy [mailto:rajpalreddy...@gmail.com]
Sent: Mittwoch, 13. Juni 2018 17:27
To: user@cassandra.apache.org
Subject: Re: G1GC CPU Spike

we have these as the Heap settings. Is HEAP_NEWSIZE required only for CMS? Can we 
get rid of it for G1GC so that G1 can be used?
MAX_HEAP_SIZE="8192M"
HEAP_NEWSIZE="800M"

On Jun 13, 2018, at 11:15 AM, Chris Lohfink 
mailto:clohf...@apple.com>> wrote:

That metric is the total number of seconds spent in GC, it will increase over 
time with every young gc which is expected. Whats interesting is the rate of 
growth not the fact that its increasing. If graphing tool has option to graph 
derivative you should use that instead.

Chris


On Jun 13, 2018, at 9:51 AM, rajpal reddy 
mailto:rajpalreddy...@gmail.com>> wrote:

jvm_gc_collection_seconds_count{gc="G1 Young Generation”} and also young 
generation seconds count keep increasing



On Jun 13, 2018, at 9:52 AM, Chris Lohfink 
mailto:clohf...@apple.com>> wrote:

The gc log file is best to share when asking for help with tuning. The top of 
file has all the computed args it ran with and it gives details on what part of 
the GC is taking time. I would guess the CPU spike is from full GCs which with 
that small heap of a heap is probably from evacuation failures. Reserving more 
of the heap to be free (-XX:G1ReservePercent=25) can help, along with 
increasing the amount of heap. 8GB is pretty small for G1, might be better off 
with CMS.

Chris


On Jun 13, 2018, at 8:42 AM, rajpal reddy 
mailto:rajpalreddy...@gmail.com>> wrote:

Hello,

we are using G1GC and noticing garbage collection taking a while and during 
that process we are seeing cpu spiking up to 70-80%. can you please let us 
know if we have to tune any parameters for that. Attaching the cassandra-env 
file with the jvm-options.



The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: compaction_throughput: Difference between 0 (unthrottled) and large value

2018-06-11 Thread Steinmaurer, Thomas
Sorry, should have first looked at the source code. In case of 0, it is set to 
Double.MAX_VALUE.

Thomas
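
The limit can also be inspected/changed at runtime, e.g. (a minimal sketch):

# show the currently effective limit in MB/s
nodetool getcompactionthroughput

# 0 disables throttling entirely (internally Double.MAX_VALUE, see above)
nodetool setcompactionthroughput 0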

From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
Sent: Montag, 11. Juni 2018 08:53
To: user@cassandra.apache.org
Subject: compaction_throughput: Difference between 0 (unthrottled) and large 
value

Hello,

on a 3 node loadtest cluster with very capable machines (32 physical cores, 
512G RAM, 20T storage (26 disk RAID)), I'm trying to max out compaction, thus 
currently testing with:

concurrent_compactors: 16
compaction_throughput_mb_per_sec: 0

With our simulated incoming load + compaction etc., the Linux volume shows ~ 20 
Mbyte/s Read IO + 50 Mbyte/s Write IO in AVG, constantly.


Setting throughput to 0 should mean unthrottled, right? Is this really 
unthrottled from a throughput perspective and then is basically limited by disk 
capabilities only? Or should it be better set to a very high value instead of 
0. Is there any semantical difference here?


Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


compaction_throughput: Difference between 0 (unthrottled) and large value

2018-06-11 Thread Steinmaurer, Thomas
Hello,

on a 3 node loadtest cluster with very capable machines (32 physical cores, 
512G RAM, 20T storage (26 disk RAID)), I'm trying to max out compaction, thus 
currently testing with:

concurrent_compactors: 16
compaction_throughput_mb_per_sec: 0

With our simulated incoming load + compaction etc., the Linux volume shows ~ 20 
Mbyte/s Read IO + 50 Mbyte/s Write IO in AVG, constantly.


Setting throughput to 0 should mean unthrottled, right? Is this really 
unthrottled from a throughput perspective and then is basically limited by disk 
capabilities only? Or should it be better set to a very high value instead of 
0. Is there any semantical difference here?


Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Compaction throughput vs. number of compaction threads?

2018-06-05 Thread Steinmaurer, Thomas
Hello,

most likely obvious and perhaps already answered in the past, but just want to 
be sure ...

E.g. I have set:

concurrent_compactors: 4
compaction_throughput_mb_per_sec: 16

I guess this will lead to ~ 4MB/s per Thread if I have 4 compactions running in 
parallel?

So, in case of upscaling a machine and following the recommendation in 
cassandra.yaml I may set:

concurrent_compactors: 8


If this throughput remains unchanged, does this mean that we have 2 MB/s per 
Thread then, e.g. largish compactions running on a single thread taking twice 
the time then?

Using Cassandra 2.1 and 3.11 in case this matters.


Thanks a lot!
Thomas
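
As a rough worked example (assuming the throttle acts as a single global cap per node, shared by all busy compaction threads):

# compaction_throughput_mb_per_sec applies per node, not per compaction thread:
#   16 MB/s / 4 concurrent_compactors ~= 4 MB/s per compaction thread
#   16 MB/s / 8 concurrent_compactors ~= 2 MB/s per compaction thread
# so doubling concurrent_compactors without raising the cap roughly halves the
# per-thread rate; the cap can be raised at runtime, e.g.
nodetool setcompactionthroughput 32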

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: 3.11.2 memory leak

2018-06-04 Thread Steinmaurer, Thomas
Jeff,

FWIW, when talking about https://issues.apache.org/jira/browse/CASSANDRA-13929, 
there is a patch available since March without getting further attention.

Regards,
Thomas

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Dienstag, 05. Juni 2018 00:51
To: cassandra 
Subject: Re: 3.11.2 memory leak

There have been a few people who have reported it, but nobody (yet) has offered 
a patch to fix it. It would be good to have a reliable way to repro, and/or an 
analysis of a heap dump demonstrating the problem (what's actually retained at 
the time you're OOM'ing).

On Mon, Jun 4, 2018 at 6:52 AM, Abdul Patel 
mailto:abd786...@gmail.com>> wrote:
Hi All,

I recently upgraded my non prod cluster from 3.10 to 3.11.2.
It was working fine for 1.5 weeks, then suddenly nodetool info started reporting 
80% and more memory consumption.
Initially it was configured with 16gb, then I bumped it to 20gb and rebooted all 4 
nodes of the cluster (single DC).
Now after 8 days I again see 80%+ usage, i.e. 16gb and above, which we never saw 
before.
Seems like a memory leak bug?
Does anyone have any idea? Our 3.11.2 release rollout has been halted because of 
this.
If not 3.11.2, what's the next best stable release we have now?

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

2018-05-29 Thread Steinmaurer, Thomas
Kurt,

in our test it also didn’t make a difference with the default number of GC 
Threads (43 on our large machine) and running with Xmx128M or XmX31G (derived 
from $MAX_HEAP_SIZE). For both Xmx, we saw the high CPU caused by nodetool.

Regards,
Thomas

From: kurt greaves [mailto:k...@instaclustr.com]
Sent: Dienstag, 29. Mai 2018 13:06
To: User 
Subject: Re: nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

Thanks Thomas. After a bit more research today I found that the whole 
$MAX_HEAP_SIZE issue isn't really a problem because we don't explicitly set 
-Xms so the minimum heapsize by default will be 256mb, which isn't hugely 
problematic, and it's unlikely more than that would get allocated.

On 29 May 2018 at 09:29, Steinmaurer, Thomas 
mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hi Kurt,

thanks for pointing me to the Xmx issue.

JIRA + patch (for Linux only based on C* 3.11) for the parallel GC thread issue 
is available here: https://issues.apache.org/jira/browse/CASSANDRA-14475

Thanks,
Thomas

From: kurt greaves [mailto:k...@instaclustr.com<mailto:k...@instaclustr.com>]
Sent: Dienstag, 29. Mai 2018 05:54
To: User mailto:user@cassandra.apache.org>>
Subject: Re: nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

1) nodetool is reusing the $MAX_HEAP_SIZE environment variable, thus if we are 
running Cassandra with e.g. Xmx31G, nodetool is started with Xmx31G as well
This was fixed in 3.0.11/3.10 in 
CASSANDRA-12739<https://issues.apache.org/jira/browse/CASSANDRA-12739>. Not 
sure why it didn't make it into 2.1/2.2.
2) As -XX:ParallelGCThreads is not explicitly set upon startup, this basically 
defaults to a value dependent on the number of cores. In our case, with the 
machine above, the number of parallel GC threads for the JVM is set to 43!
3) Test-wise, we have adapted the nodetool startup script in a way to get a 
Java Flight Recording file on JVM exit, thus with each nodetool invocation we 
can inspect a JFR file. Here we may have seen System.gc() calls (without 
visible knowledge where they come from), GC times for the entire JVM life-time 
(e.g. ~1min) showing high cpu. This happened for both Xmx128M (default as it 
seems) and Xmx31G

After explicitly setting -XX:ParallelGCThreads=1 in the nodetool startup 
script, CPU usage spikes by nodetool are entirely gone.

Is this something which has been already adapted/tackled in Cassandra versions 
> 2.1 or worth to be considered as some sort of RFC?
Can you create a JIRA for this (and a patch, if you like)? We should be 
explicitly setting this on nodetool invocations.
​
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313



RE: nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

2018-05-29 Thread Steinmaurer, Thomas
Hi Kurt,

thanks for pointing me to the Xmx issue.

JIRA + patch (for Linux only based on C* 3.11) for the parallel GC thread issue 
is available here: https://issues.apache.org/jira/browse/CASSANDRA-14475

Thanks,
Thomas

From: kurt greaves [mailto:k...@instaclustr.com]
Sent: Dienstag, 29. Mai 2018 05:54
To: User 
Subject: Re: nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

1) nodetool is reusing the $MAX_HEAP_SIZE environment variable, thus if we are 
running Cassandra with e.g. Xmx31G, nodetool is started with Xmx31G as well
This was fixed in 3.0.11/3.10 in 
CASSANDRA-12739. Not 
sure why it didn't make it into 2.1/2.2.
2) As -XX:ParallelGCThreads is not explicitly set upon startup, this basically 
defaults to a value dependent on the number of cores. In our case, with the 
machine above, the number of parallel GC threads for the JVM is set to 43!
3) Test-wise, we have adapted the nodetool startup script in a way to get a 
Java Flight Recording file on JVM exit, thus with each nodetool invocation we 
can inspect a JFR file. Here we may have seen System.gc() calls (without 
visible knowledge where they come from), GC times for the entire JVM life-time 
(e.g. ~1min) showing high cpu. This happened for both Xmx128M (default as it 
seems) and Xmx31G

After explicitly setting -XX:ParallelGCThreads=1 in the nodetool startup 
script, CPU usage spikes by nodetool are entirely gone.

Is this something which has been already adapted/tackled in Cassandra versions 
> 2.1 or worth to be considered as some sort of RFC?
Can you create a JIRA for this (and a patch, if you like)? We should be 
explicitly setting this on nodetool invocations.
​
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

2018-05-28 Thread Steinmaurer, Thomas
Hello,

on a quite capable machine with 32 physical cores (64 vCPUs) we see sporadic 
CPU usage up to 50% caused by nodetool on this box, thus we dug a bit further. 
A few observations:

1) nodetool is reusing the $MAX_HEAP_SIZE environment variable, thus if we are 
running Cassandra with e.g. Xmx31G, nodetool is started with Xmx31G as well
2) As -XX:ParallelGCThreads is not explicitly set upon startup, this basically 
defaults to a value dependent on the number of cores. In our case, with the 
machine above, the number of parallel GC threads for the JVM is set to 43!
3) Test-wise, we have adapted the nodetool startup script in a way to get a 
Java Flight Recording file on JVM exit, thus with each nodetool invocation we 
can inspect a JFR file. Here we may have seen System.gc() calls (without 
visible knowledge where they come from), GC times for the entire JVM life-time 
(e.g. ~1min) showing high cpu. This happened for both Xmx128M (default as it 
seems) and Xmx31G

After explicitly setting -XX:ParallelGCThreads=1 in the nodetool startup 
script, CPU usage spikes by nodetool are entirely gone.
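
A minimal sketch of the change (a simplified excerpt of the bin/nodetool wrapper; option order and surrounding variables vary by version and are placeholders here):

# cap the number of parallel GC threads for the short-lived nodetool JVM
"$JAVA" -XX:ParallelGCThreads=1 -Xmx128m -cp "$CLASSPATH" \
    org.apache.cassandra.tools.NodeTool "$@"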

Is this something which has been already adapted/tackled in Cassandra versions 
> 2.1 or worth to be considered as some sort of RFC?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Anyone try out C* with latest Oracle JDK update?

2018-05-24 Thread Steinmaurer, Thomas
Hi Sam,

in our pre-production stages, we are running Cassandra 3.0 and 3.11 with 8u172 
(previously u102 then u162) without any visible troubles/regressions.

In case of Cassandra 3.11, you need 3.11.2 due to: 
https://issues.apache.org/jira/browse/CASSANDRA-14173. Cassandra 3.0 is not 
affected by this issue.

Hope this helps.

Thomas

-----Original Message-----
From: Sam Sriramadhesikan [mailto:sam.sriramadhesi...@oracle.com]
Sent: Mittwoch, 23. Mai 2018 19:33
To: user@cassandra.apache.org
Subject: Anyone try out C* with latest Oracle JDK update?

Hi,

Any experiences any one have running C* 3.x with the Oracle JDK update 1.8 u172 
/ u171 that came out mid-April?

Thanks,

Sam

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Repair of 5GB data vs. disk throughput does not make sense

2018-04-26 Thread Steinmaurer, Thomas
Hello,

yet another question/issue with repair.

Cassandra 2.1.18, 3 nodes, RF=3, vnode=256, data volume ~ 5G per node only. A 
repair (nodetool repair -par) issued on a single node at this data volume takes 
around 36min with an AVG of ~ 15MByte/s disk throughput (read+write) for the 
entire time-frame, thus processing ~ 32GByte from a disk perspective so ~ 6 
times of the real data volume reported by nodetool status. Does this make any 
sense? This is with 4 compaction threads and compaction throughput = 64. 
Similar results doing this test a few times, where most/all inconsistent data 
should be already sorted out by previous runs.

I know there is e.g. reaper, but the above is a simple use case: a single failed 
node recovering beyond the 3h hinted handoff window. How should 
this finish in a timely manner for > 500G on a recovering node?

I have to admit this is with NFS as storage. I know, NFS might not be the best 
idea, but with the above test at ~ 5GB data volume, we see an IOPS rate at ~ 
700 at a disk latency of ~ 15ms, thus I wouldn't treat it as that bad. This all 
is using/running Cassandra on-premise (at the customer, so not hosted by us), 
so while we can make recommendations storage-wise (of course preferring local 
disks), it may and will happen that NFS is being in use then.

Why we are using -par in combination with NFS is a different story and related 
to this issue: https://issues.apache.org/jira/browse/CASSANDRA-8743. Without 
switching from sequential to parallel repair, we basically kill Cassandra.

Throughput-wise, I also don't think it is related to NFS, cause we see similar 
repair throughput values with AWS EBS (gp2, SSD based) running regular repairs 
on small-sized CFs.

Thanks for any input.
Thomas
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Cassandra 2.1.18 - Concurrent nodetool repair resulting in > 30K SSTables for a single small (GBytes) CF

2018-03-06 Thread Steinmaurer, Thomas
Hi Kurt,

our provisioning layer allows extending a cluster only one-by-one, thus we 
didn’t add multiple nodes at the same time.

What we did have was some sort of overlapping between our daily repair cronjob 
and the newly added node still being in the process of joining. Don’t know if this 
sort of combination might cause troubles.

I did some further testing and ran the following repair call on the same node:

nodetool repair -pr ks cf1 cf2

waiting a few minutes after each execution finished, and every time I see 
“… out of sync …” log messages in the context of the repair, so it looks like 
each repair execution is detecting inconsistencies. Does this make sense?

As said: We are using vnodes (256), RF=3. Additionally, we are writing at CL 
ANY, reading at ONE and repair chance for the 2 CFs in question is default 0.1

Currently testing a few consecutive executions without -pr on the same node.

Thanks,
Thomas
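
To see whether consecutive runs keep detecting differences, the corresponding repair log lines can simply be counted per run, e.g. (a minimal sketch; the log path is a placeholder):

# count how many ranges the last repair reported as inconsistent
grep -c "out of sync" /var/log/cassandra/system.log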

From: kurt greaves [mailto:k...@instaclustr.com]
Sent: Montag, 05. März 2018 01:10
To: User 
Subject: Re: Cassandra 2.1.18 - Concurrent nodetool repair resulting in > 30K 
SSTables for a single small (GBytes) CF

Repairs with vnodes are likely to cause a lot of small SSTables if you have 
inconsistencies (at least 1 per vnode). Did you have any issues when adding 
nodes, or did you add multiple nodes at a time? Anything that could have led 
to a bit of inconsistency could have been the cause.

I'd probably avoid running the repairs across all the nodes simultaneously and 
instead spread them out over a week. That likely made it worse. Also worth 
noting that in versions 3.0+ you won't be able to run nodetool repair in such a 
way because anti-compaction will be triggered which will fail if multiple 
anti-compactions are attempted simultaneously (if you run multiple repairs 
simultaneously).

Have a look at orchestrating your repairs with TLP's fork of 
cassandra-reaper.
​
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Cassandra 2.1.18 - Concurrent nodetool repair resulting in > 30K SSTables for a single small (GBytes) CF

2018-03-01 Thread Steinmaurer, Thomas
Hello,

Production, 9 node cluster with Cassandra 2.1.18, vnodes, default 256 tokens, 
RF=3, compaction throttling = 16, concurrent compactors = 4, running in AWS 
using m4.xlarge at ~ 35% CPU AVG

We have a nightly cronjob starting a "nodetool repair -pr ks cf1 cf2" 
concurrently on all nodes, where data volume for cf1 and cf2 is ~ 1-5GB in 
size, so pretty small.

After extending the cluster from 6 to the current 9 nodes and after "nodetool 
cleanup" finished, the above repair results in > 30K SSTables for these two 
CFs on several nodes, with very, very tiny files < 1 KB, but not on all nodes. 
Obviously, this affects read latency + disk IO + CPU a lot, and it takes 
several hours until the situation relaxes. We have other clusters with the 
same spec which have also been extended from 6 to 9 nodes in the past, where 
we don't see this issue. For now, we have disabled the nightly cron job.

Any input on how to troubleshoot the root cause of this issue?
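
In case it helps with narrowing this down, this is roughly how we watch the 
affected nodes (CF names as above; output format may differ slightly per 
version):

# SSTable count for the affected column families on this node
nodetool cfstats ks.cf1 ks.cf2 | grep -i "SSTable count"
# pending compactions while the node works through the tiny files
nodetool compactionstats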

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: if the heap size exceeds 32GB..

2018-02-12 Thread Steinmaurer, Thomas
Stick with 31G in your case. Another article on compressed Oops: 
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
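
A quick way to double check on a concrete machine, assuming a HotSpot JDK 8 
(the heap sizes are just examples):

# compressed oops should still be in effect at 31G ...
java -Xmx31G -XX:+PrintFlagsFinal -version | grep -i UseCompressedOops
# ... and is typically switched off once you cross the ~32G boundary
java -Xmx34G -XX:+PrintFlagsFinal -version | grep -i UseCompressedOops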

Thomas

From: Eunsu Kim [mailto:eunsu.bil...@gmail.com]
Sent: Dienstag, 13. Februar 2018 08:09
To: user@cassandra.apache.org
Subject: if the heap size exceeds 32GB..

https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html#compressed_oops

According to the article above, if the heap size of the JVM is around 32GB, 
memory is wasted because the JVM can no longer use compressed object pointers. 
(Of course the article is talking about ES)

But if this is a general theory about the JVM, does that apply to Cassandra as 
well?

I am using a 64 GB physical memory server and I am concerned about heap size 
allocation.

Thank you.
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Old tombstones not being cleaned up

2018-02-01 Thread Steinmaurer, Thomas
Right. In this case, cleanup should have done the necessary work here.
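
Just to spell out the sequence I have in mind, as a rough sketch 
(keyspace/table names are placeholders; run it per node):

# drop partitions this node no longer owns after the topology change
nodetool cleanup ks tablename
# once gc_grace_seconds has passed, force a major compaction so droppable
# tombstones can actually be purged
nodetool compact ks tablename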

Thomas

From: Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
Sent: Freitag, 02. Februar 2018 06:59
To: user@cassandra.apache.org
Subject: Re: Old tombstones not being cleaned up

We did start with a 3 node cluster and a RF of 3, then added another 3 nodes 
and again another 3 nodes. So it is a good guess :)
But I have run both repair and cleanup against the table on all nodes, would 
that not have removed any stray partitions?
On Thu, 1 Feb 2018 at 22:31, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> wrote:
Did you start with a 9-node cluster from the beginning, or did you extend / 
scale out your cluster (with vnodes) beyond the replication factor?

If the latter applies, and if you are deleting via explicit deletes and not via 
TTL, then nodes might not see the deletes anymore, as a node might no longer 
own the partition after a topology change (e.g. scale out beyond the keyspace 
RF).

Just a very wild guess.

Thomas

From: Bo Finnerup Madsen 
[mailto:bo.gunder...@gmail.com<mailto:bo.gunder...@gmail.com>]
Sent: Donnerstag, 01. Februar 2018 22:14

To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Old tombstones not being cleaned up

We do not use TTL anywhere...records are inserted and deleted "manually" by our 
software.
On Thu, 1 Feb 2018 at 18:29, Jonathan Haddad 
<j...@jonhaddad.com<mailto:j...@jonhaddad.com>> wrote:
Changing the default TTL doesn’t change the TTL on the existing data, only new 
data. It’s only set if you don’t supply one yourself.

On Wed, Jan 31, 2018 at 11:35 PM Bo Finnerup Madsen 
<bo.gunder...@gmail.com<mailto:bo.gunder...@gmail.com>> wrote:
Hi,

We are running a small 9-node Cassandra v2.1.17 cluster. The cluster generally 
runs fine, but we have one table that is causing OOMs because of an enormous 
amount of tombstones.
Looking at the data in the table (sstable2json), the first of the tombstones 
are almost a year old. The table was initially created with a gc_grace_period 
of 10 days, but I have now lowered it to 1 hour.
I have run a full repair of the table across all nodes. I have forced several 
major compactions of the table by using "nodetool compact", and also tried to 
switch from LeveledCompaction to SizeTierCompaction and back.

What could cause cassandra to keep these tombstones?

sstable2json:
{"key": "foo",
 "cells": 
[["082f-25ef-4324-bb8a-8cf013c823c1:_","082f-25ef-4324-bb8a-8cf013c823c1:!",1507819135148000,"t",1507819135],
   
["10f3-c05d-4ab9-9b8a-e6ebd8f5818a:_","10f3-c05d-4ab9-9b8a-e6ebd8f5818a:!",1503661731697000,"t",1503661731],
   
["1d7a-ce95-4c74-b67e-f8cdffec4f85:_","1d7a-ce95-4c74-b67e-f8cdffec4f85:!",1509542102909000,"t",1509542102],
   
["1dd3-ae22-4f6e-944a-8cfa147cde68:_","1dd3-ae22-4f6e-944a-8cfa147cde68:!",1512418006838000,"t",1512418006],
   
["22cc-d69c-4596-89e5-3e976c0cb9a8:_","22cc-d69c-4596-89e5-3e976c0cb9a8:!",1497377448737001,"t",1497377448],
   
["2777-4b1a-4267-8efc-c43054e63170:_","2777-4b1a-4267-8efc-c43054e63170:!",1491014691515001,"t",1491014691],
   
["61e8-f48b-4484-96f1-f8b6a3ed8f9f:_","61e8-f48b-4484-96f1-f8b6a3ed8f9f:!",1500820300544000,"t",1500820300],
   
["63da-f165-449b-b65d-2b7869368734:_","63da-f165-449b-b65d-2b7869368734:!",1512806634968000,"t",1512806634],
   
["656f-f8b5-472b-93ed-1a893002f027:_","656f-f8b5-472b-93ed-1a893002f027:!",1514554716141000,"t",1514554716],
...
{"key": "bar",
 "metadata": {"deletionInfo": 
{"markedForDeleteAt":1517402198585982,"localDeletionTime":1517402198}},
 "cells": 
[["000af8c2-ffe9-4217-9032-61a1cd21781d:_","000af8c2-ffe9-4217-9032-61a1cd21781d:!",1495094965916000,"t",1495094965],
   
["005b96cb-7eb3-4ec3-bfa2-8573e46892f4:_","005b96cb-7eb3-4ec3-bfa2-8573e46892f4:!",1516360186865000,"t",1516360186],
   
["005ec167-aa61-4868-a3ae-a44b00099eb6:_","005ec167-aa61-4868-a3ae-a44b00099eb6:!",1516671840920002,"t",1516671840],


sstablemetadata:
sstablemetadata 
/data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741-Data.db
SSTable: 
/data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.10
Minimum timestamp: 1488976211688000
Maximum timestamp: 1

RE: Old tombstones not being cleaned up

2018-02-01 Thread Steinmaurer, Thomas
Did you start with a 9-node cluster from the beginning, or did you extend / 
scale out your cluster (with vnodes) beyond the replication factor?

If the latter applies, and if you are deleting via explicit deletes and not via 
TTL, then nodes might not see the deletes anymore, as a node might no longer 
own the partition after a topology change (e.g. scale out beyond the keyspace 
RF).

Just a very wild guess.

Thomas

From: Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
Sent: Donnerstag, 01. Februar 2018 22:14
To: user@cassandra.apache.org
Subject: Re: Old tombstones not being cleaned up

We do not use TTL anywhere...records are inserted and deleted "manually" by our 
software.
On Thu, 1 Feb 2018 at 18:29, Jonathan Haddad 
> wrote:
Changing the default TTL doesn’t change the TTL on the existing data, only new 
data. It’s only set if you don’t supply one yourself.

On Wed, Jan 31, 2018 at 11:35 PM Bo Finnerup Madsen 
> wrote:
Hi,

We are running a small 9-node Cassandra v2.1.17 cluster. The cluster generally 
runs fine, but we have one table that is causing OOMs because of an enormous 
amount of tombstones.
Looking at the data in the table (sstable2json), the first of the tombstones 
are almost a year old. The table was initially created with a gc_grace_period 
of 10 days, but I have now lowered it to 1 hour.
I have run a full repair of the table across all nodes. I have forced several 
major compactions of the table by using "nodetool compact", and also tried to 
switch from LeveledCompaction to SizeTierCompaction and back.

What could cause cassandra to keep these tombstones?

sstable2json:
{"key": "foo",
 "cells": 
[["082f-25ef-4324-bb8a-8cf013c823c1:_","082f-25ef-4324-bb8a-8cf013c823c1:!",1507819135148000,"t",1507819135],
   
["10f3-c05d-4ab9-9b8a-e6ebd8f5818a:_","10f3-c05d-4ab9-9b8a-e6ebd8f5818a:!",1503661731697000,"t",1503661731],
   
["1d7a-ce95-4c74-b67e-f8cdffec4f85:_","1d7a-ce95-4c74-b67e-f8cdffec4f85:!",1509542102909000,"t",1509542102],
   
["1dd3-ae22-4f6e-944a-8cfa147cde68:_","1dd3-ae22-4f6e-944a-8cfa147cde68:!",1512418006838000,"t",1512418006],
   
["22cc-d69c-4596-89e5-3e976c0cb9a8:_","22cc-d69c-4596-89e5-3e976c0cb9a8:!",1497377448737001,"t",1497377448],
   
["2777-4b1a-4267-8efc-c43054e63170:_","2777-4b1a-4267-8efc-c43054e63170:!",1491014691515001,"t",1491014691],
   
["61e8-f48b-4484-96f1-f8b6a3ed8f9f:_","61e8-f48b-4484-96f1-f8b6a3ed8f9f:!",1500820300544000,"t",1500820300],
   
["63da-f165-449b-b65d-2b7869368734:_","63da-f165-449b-b65d-2b7869368734:!",1512806634968000,"t",1512806634],
   
["656f-f8b5-472b-93ed-1a893002f027:_","656f-f8b5-472b-93ed-1a893002f027:!",1514554716141000,"t",1514554716],
...
{"key": "bar",
 "metadata": {"deletionInfo": 
{"markedForDeleteAt":1517402198585982,"localDeletionTime":1517402198}},
 "cells": 
[["000af8c2-ffe9-4217-9032-61a1cd21781d:_","000af8c2-ffe9-4217-9032-61a1cd21781d:!",1495094965916000,"t",1495094965],
   
["005b96cb-7eb3-4ec3-bfa2-8573e46892f4:_","005b96cb-7eb3-4ec3-bfa2-8573e46892f4:!",1516360186865000,"t",1516360186],
   
["005ec167-aa61-4868-a3ae-a44b00099eb6:_","005ec167-aa61-4868-a3ae-a44b00099eb6:!",1516671840920002,"t",1516671840],


sstablemetadata:
sstablemetadata 
/data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741-Data.db
SSTable: 
/data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.10
Minimum timestamp: 1488976211688000
Maximum timestamp: 1517468644066000
SSTable max local deletion time: 2147483647
Compression ratio: 0.5121956624389545
Estimated droppable tombstones: 18.00161766553587
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1517168739626, 
position=22690189)
Estimated tombstone drop times:%n
1488976211: 1
1489906506:  4706
1490174752:  6111
1490449759:  6554
1490735410:  6559
1491016789:  6369
1491347982: 10216
1491680214: 13502
...

desc:
CREATE TABLE xxx.yyy (
ti text,
uuid text,
json_data text,
PRIMARY KEY (ti, uuid)
) WITH CLUSTERING ORDER BY (uuid ASC)
AND bloom_filter_fp_chance = 0.1
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 3600
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

jmx 

RE: Cleanup blocking snapshots - Options?

2018-01-30 Thread Steinmaurer, Thomas
Hi Kurt,

had another try now, and yes, with 2.1.18, this constantly happens. Currently 
running nodetool cleanup on a single node in production with disabled hourly 
snapshots. SSTables with > 100G involved here. Triggering nodetool snapshot 
will result in being blocked. From an operational perspective, a bit annoying 
right now 
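
For the record, the manual sequence we currently use per node, as a sketch (it 
assumes the hourly snapshot is a crontab entry containing 'nodetool snapshot'; 
the keyspace name is a placeholder):

# save the current crontab and temporarily remove the snapshot entry
crontab -l > /tmp/crontab.backup
crontab -l | grep -v 'nodetool snapshot' | crontab -
# run cleanup on this node (can take hours with SSTables > 100G)
nodetool cleanup ks
# restore the original crontab once cleanup has finished
crontab /tmp/crontab.backup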

Have asked on https://issues.apache.org/jira/browse/CASSANDRA-13873 regarding a 
backport to 2.1, but possibly won’t get attention, cause the ticket has been 
resolved for 2.2+ already.

Regards,
Thomas

From: kurt greaves [mailto:k...@instaclustr.com]
Sent: Montag, 15. Jänner 2018 06:18
To: User <user@cassandra.apache.org>
Subject: Re: Cleanup blocking snapshots - Options?

Disabling the snapshots is the best and only real option other than upgrading 
at the moment. Although apparently it was thought that there was only a small 
race condition in 2.1 that triggered this and it wasn't worth fixing. If you 
are triggering it easily maybe it is worth fixing in 2.1 as well. Does this 
happen consistently? Can you provide some more logs on the JIRA or better yet a 
way to reproduce?

On 14 January 2018 at 16:12, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

we are running 2.1.18 with vnodes in production and due to 
(https://issues.apache.org/jira/browse/CASSANDRA-11155) we can’t run cleanup 
e.g. after extending the cluster without blocking our hourly snapshots.

What options do we have to get rid of partitions a node does not own anymore?

• Using a version which has this issue fixed, although upgrading to 
2.2+, due to various issues, is not an option at the moment

• Temporarily disabling the hourly cron job before starting cleanup and 
re-enable after cleanup has finished

• Any other way to re-write SSTables with data a node owns after a 
cluster scale out

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Cassandra 3.11 - nodetool cleanup - Compaction interrupted ...

2018-01-22 Thread Steinmaurer, Thomas
Hello,

when triggering a "nodetool cleanup" with Cassandra 3.11, the nodetool call 
almost returns instantly and I see the following INFO log.

INFO  [CompactionExecutor:54] 2018-01-22 12:59:53,903 
CompactionManager.java:1777 - Compaction interrupted: 
Compaction@fc9b0073-1008-3a07-aeb9-baf6f3cd0b1c(ruxitdb, Ts2Final05Min, 
98624438/107305082)bytes
INFO  [CompactionExecutor:69] 2018-01-22 12:59:54,135 
CompactionManager.java:1777 - Compaction interrupted: 
Compaction@ea0742f8-f3be-365d-a689-26ab346fdfb0(ruxitdb, Ts2Final01Min, 
49238516/72948781)bytes


No error etc. Does the "compaction interrupted" sound ok?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Cassandra 3.11 fails to start with JDK8u162

2018-01-18 Thread Steinmaurer, Thomas
Ben,

at least 3.0.14 starts up fine for me with 8u162.

Regards,
Thomas

From: Ben Wood [mailto:bw...@mesosphere.io]
Sent: Donnerstag, 18. Jänner 2018 23:24
To: user@cassandra.apache.org
Subject: Re: Cassandra 3.11 fails to start with JDK8u162

Am I correct in assuming 10091 didn't go into 3.0?

On Thu, Jan 18, 2018 at 2:32 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Sam,

thanks for the confirmation. Going back to u152 then.

Thomas

From: li...@beobal.com<mailto:li...@beobal.com> 
[mailto:li...@beobal.com<mailto:li...@beobal.com>] On Behalf Of Sam Tunnicliffe
Sent: Donnerstag, 18. Jänner 2018 10:16
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Cassandra 3.11 fails to start with JDK8u162

This isn't (wasn't) a known issue, but the way that CASSANDRA-10091 was 
implemented using internal JDK classes means it was always possible that a 
minor JVM version change could introduce incompatibilities (CASSANDRA-2967 is 
also relevant).
We did already know that we need to revisit the way this works in 4.0 for JDK9 
support (CASSANDRA-9608), so we should identify a more stable solution & apply 
that to both 3.11 and 4.0.
In the meantime, downgrading to 152 is the only real option.

I've opened https://issues.apache.org/jira/browse/CASSANDRA-14173 for this.

Thanks,
Sam


On 18 January 2018 at 08:43, Nicolas Guyomar 
<nicolas.guyo...@gmail.com<mailto:nicolas.guyo...@gmail.com>> wrote:
Thank you Thomas for starting this thread, I'm having exactly the same issue on 
AWS EC2 RHEL-7.4_HVM-20180103-x86_64-2-Hourly2-GP2 (ami-dc13a4a1)  I was 
starting to bang my head on my desk !

So I'll try to downgrade back to 152 then !



On 18 January 2018 at 08:34, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

after switching from JDK8u152 to JDK8u162, Cassandra fails with the following 
stack trace upon startup.

ERROR [main] 2018-01-18 07:33:18,804 CassandraDaemon.java:706 - Exception 
encountered during startup
java.lang.AbstractMethodError: 
org.apache.cassandra.utils.JMXServerUtils$Exporter.exportObject(Ljava/rmi/Remote;ILjava/rmi/server/RMIClientSocketFactory;Ljava/rmi/server/RMIServerSocketFactory;Lsun/misc/ObjectInputFilter;)Ljava/rmi/Remote;
at 
javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:150)
 ~[na:1.8.0_162]
at 
javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:135)
 ~[na:1.8.0_162]
at 
javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:405)
 ~[na:1.8.0_162]
at 
org.apache.cassandra.utils.JMXServerUtils.createJMXServer(JMXServerUtils.java:104)
 ~[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.maybeInitJmx(CassandraDaemon.java:143)
 [apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:188) 
[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) 
[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) 
[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]

Is this a known issue?


Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313



--
Ben Wood
Software Engineer - Data Agility
Mesosphere

RE: Cassandra 3.11 fails to start with JDK8u162

2018-01-18 Thread Steinmaurer, Thomas
Sam,

thanks for the confirmation. Going back to u152 then.
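
As a rough sketch, this is how we pin a node back to u152 before restarting it 
(the JDK install path is specific to our environment):

# point the Cassandra startup environment at the older JDK again
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_152
export PATH="$JAVA_HOME/bin:$PATH"
# verify which JVM will be picked up before restarting the node
java -version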

Thomas

From: li...@beobal.com [mailto:li...@beobal.com] On Behalf Of Sam Tunnicliffe
Sent: Donnerstag, 18. Jänner 2018 10:16
To: user@cassandra.apache.org
Subject: Re: Cassandra 3.11 fails to start with JDK8u162

This isn't (wasn't) a known issue, but the way that CASSANDRA-10091 was 
implemented using internal JDK classes means it was always possible that a 
minor JVM version change could introduce incompatibilities (CASSANDRA-2967 is 
also relevant).
We did already know that we need to revisit the way this works in 4.0 for JDK9 
support (CASSANDRA-9608), so we should identify a more stable solution & apply 
that to both 3.11 and 4.0.
In the meantime, downgrading to 152 is the only real option.

I've opened https://issues.apache.org/jira/browse/CASSANDRA-14173 for this.

Thanks,
Sam


On 18 January 2018 at 08:43, Nicolas Guyomar 
<nicolas.guyo...@gmail.com<mailto:nicolas.guyo...@gmail.com>> wrote:
Thank you Thomas for starting this thread, I'm having exactly the same issue on 
AWS EC2 RHEL-7.4_HVM-20180103-x86_64-2-Hourly2-GP2 (ami-dc13a4a1)  I was 
starting to bang my head on my desk !

So I'll try to downgrade back to 152 then !



On 18 January 2018 at 08:34, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

after switching from JDK8u152 to JDK8u162, Cassandra fails with the following 
stack trace upon startup.

ERROR [main] 2018-01-18 07:33:18,804 CassandraDaemon.java:706 - Exception 
encountered during startup
java.lang.AbstractMethodError: 
org.apache.cassandra.utils.JMXServerUtils$Exporter.exportObject(Ljava/rmi/Remote;ILjava/rmi/server/RMIClientSocketFactory;Ljava/rmi/server/RMIServerSocketFactory;Lsun/misc/ObjectInputFilter;)Ljava/rmi/Remote;
at 
javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:150)
 ~[na:1.8.0_162]
at 
javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:135)
 ~[na:1.8.0_162]
at 
javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:405)
 ~[na:1.8.0_162]
at 
org.apache.cassandra.utils.JMXServerUtils.createJMXServer(JMXServerUtils.java:104)
 ~[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.maybeInitJmx(CassandraDaemon.java:143)
 [apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:188) 
[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) 
[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) 
[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]

Is this a known issue?


Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Cassandra 3.11 fails to start with JDK8u162

2018-01-17 Thread Steinmaurer, Thomas
Hello,

after switching from JDK8u152 to JDK8u162, Cassandra fails with the following 
stack trace upon startup.

ERROR [main] 2018-01-18 07:33:18,804 CassandraDaemon.java:706 - Exception 
encountered during startup
java.lang.AbstractMethodError: 
org.apache.cassandra.utils.JMXServerUtils$Exporter.exportObject(Ljava/rmi/Remote;ILjava/rmi/server/RMIClientSocketFactory;Ljava/rmi/server/RMIServerSocketFactory;Lsun/misc/ObjectInputFilter;)Ljava/rmi/Remote;
at 
javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:150)
 ~[na:1.8.0_162]
at 
javax.management.remote.rmi.RMIJRMPServerImpl.export(RMIJRMPServerImpl.java:135)
 ~[na:1.8.0_162]
at 
javax.management.remote.rmi.RMIConnectorServer.start(RMIConnectorServer.java:405)
 ~[na:1.8.0_162]
at 
org.apache.cassandra.utils.JMXServerUtils.createJMXServer(JMXServerUtils.java:104)
 ~[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.maybeInitJmx(CassandraDaemon.java:143)
 [apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:188) 
[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) 
[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) 
[apache-cassandra-3.11.2-SNAPSHOT.jar:3.11.2-SNAPSHOT]

Is this a known issue?


Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Cleanup blocking snapshots - Options?

2018-01-14 Thread Steinmaurer, Thomas
Hi Kurt,

it was easily triggered with the mentioned combination (cleanup after extending 
the cluster) a few months ago, thus I guess it will be the same when I re-try. 
Due to the issue we simply omitted running cleanup then, but as disk space is 
becoming some sort of bottle-neck again, we need to re-evaluate this situation ☺

Regards,
Thomas

From: kurt greaves [mailto:k...@instaclustr.com]
Sent: Montag, 15. Jänner 2018 06:18
To: User <user@cassandra.apache.org>
Subject: Re: Cleanup blocking snapshots - Options?

Disabling the snapshots is the best and only real option other than upgrading 
at the moment. Although apparently it was thought that there was only a small 
race condition in 2.1 that triggered this and it wasn't worth fixing. If you 
are triggering it easily maybe it is worth fixing in 2.1 as well. Does this 
happen consistently? Can you provide some more logs on the JIRA or better yet a 
way to reproduce?

On 14 January 2018 at 16:12, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

we are running 2.1.18 with vnodes in production and due to 
(https://issues.apache.org/jira/browse/CASSANDRA-11155) we can’t run cleanup 
e.g. after extending the cluster without blocking our hourly snapshots.

What options do we have to get rid of partitions a node does not own anymore?

• Using a version which has this issue fixed, although upgrading to 
2.2+, due to various issues, is not an option at the moment

• Temporarily disabling the hourly cron job before starting cleanup and 
re-enable after cleanup has finished

• Any other way to re-write SSTables with data a node owns after a 
cluster scale out

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-14 Thread Steinmaurer, Thomas
Ben,

regarding OS/VM level patching impact. We see almost zero additional impact 
with 4.9.51-10.52.amzn1.x86_64 vs. 4.9.75-25.55.amzn1.x86_64 
(https://alas.aws.amazon.com/ALAS-2018-939.html) on a m5.2xlarge. m5 instance 
type family is rather new and AWS told us to give them a try compared to m4, as 
they are running a different hypervisor technology etc.

Before the additional hypervisor patch on Jan 12, we saw a relative CPU 
improvement of ~ 29% for m5, at an advertised ECU improvement of 19%. With the 
additional Jan 12 AWS patching at hypervisor level (?), we see m5 vs. m4 almost 
at the same CPU level. So, m4 on XEN looks pretty good again. The situation 
looks the same as before (as you have already mentioned), even when comparing 
the two kernels mentioned above, i.e. even with the additional OS/VM level 
patching. Thus, we currently do not see any further action items needed.

Thomas

From: Ben Slater [mailto:ben.sla...@instaclustr.com]
Sent: Freitag, 12. Jänner 2018 23:37
To: user@cassandra.apache.org
Subject: Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

We’re seeing evidence across our fleet that AWS has rolled something out in the 
last 24 hours that has significantly reduced the performance impacts - back 
pretty close to pre-patch levels. Yet to see if the impacts come back with o/s 
patching on top of the improved hypervisor.

Cheers
Ben



On Thu, 11 Jan 2018 at 05:32 Jon Haddad 
<j...@jonhaddad.com<mailto:j...@jonhaddad.com>> wrote:
For what it’s worth, we (TLP) just posted some results comparing pre and post 
meltdown statistics: 
http://thelastpickle.com/blog/2018/01/10/meltdown-impact-on-latency.html



On Jan 10, 2018, at 1:57 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:

m4.xlarge do have PCID to my knowledge, but possibly we need a rather new 
kernel 4.14. But I fail to see how this could help anyway, cause this looks 
highly Amazon Hypervisor patch related and we do not have the production 
instances patched at OS/VM level (yet).

Thomas

From: Dor Laor [mailto:d...@scylladb.com]
Sent: Dienstag, 09. Jänner 2018 19:30
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Make sure you pick instances with the PCID cpu capability; their TLB flush 
overhead is much smaller

On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Quick follow up.

Others in AWS reporting/seeing something similar, 
e.g.:https://twitter.com/BenBromhead/status/950245250504601600

So, while we have seen an relative CPU increase of ~ 50% since Jan 4, 2018, we 
now also have applied a kernel update at OS/VM level on a single node (loadtest 
and not production though), thus more or less double patched now. Additional 
CPU impact by OS/VM level kernel patching is more or less negligible, so looks 
highly Hypervisor related.

Regards,
Thomas

From: Steinmaurer, Thomas 
[mailto:thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>]
Sent: Freitag, 05. Jänner 2018 12:09
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Hello,

has anybody already some experience/results if a patched Linux kernel regarding 
Meltdown/Spectre is affecting performance of Cassandra negatively?

In production, all nodes running in AWS with m4.xlarge, we see up to a 50% 
relative (e.g. AVG CPU from 40% => 60%) CPU increase since Jan 4, 2018, most 
likely correlating with Amazon finished patching the underlying Hypervisor 
infrastructure …

Anybody else seeing a similar CPU increase?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313

Cleanup blocking snapshots - Options?

2018-01-14 Thread Steinmaurer, Thomas
Hello,

we are running 2.1.18 with vnodes in production and due to 
(https://issues.apache.org/jira/browse/CASSANDRA-11155) we can't run cleanup 
e.g. after extending the cluster without blocking our hourly snapshots.

What options do we have to get rid of partitions a node does not own anymore?

* Using a version which has this issue fixed, although upgrading to 
2.2+, due to various issues, is not an option at the moment

* Temporarily disabling the hourly cron job before starting cleanup and 
re-enable after cleanup has finished

* Any other way to re-write SSTables with data a node owns after a 
cluster scale out

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-13 Thread Steinmaurer, Thomas
Hello Ben,

thanks for the notice. Similar here + others reporting as well: 
https://blog.appoptics.com/visualizing-meltdown-aws/


Regards,
Thomas

From: Ben Slater [mailto:ben.sla...@instaclustr.com]
Sent: Freitag, 12. Jänner 2018 23:37
To: user@cassandra.apache.org
Subject: Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

We’re seeing evidence across our fleet that AWS has rolled something out in the 
last 24 hours that has significantly reduced the performance impacts - back 
pretty close to pre-patch levels. Yet to see if the impacts come back with o/s 
patching on top of the improved hypervisor.

Cheers
Ben



On Thu, 11 Jan 2018 at 05:32 Jon Haddad 
<j...@jonhaddad.com<mailto:j...@jonhaddad.com>> wrote:
For what it’s worth, we (TLP) just posted some results comparing pre and post 
meltdown statistics: 
http://thelastpickle.com/blog/2018/01/10/meltdown-impact-on-latency.html



On Jan 10, 2018, at 1:57 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:

m4.xlarge do have PCID to my knowledge, but possibly we need a rather new 
kernel 4.14. But I fail to see how this could help anyway, cause this looks 
highly Amazon Hypervisor patch related and we do not have the production 
instances patched at OS/VM level (yet).

Thomas

From: Dor Laor [mailto:d...@scylladb.com]
Sent: Dienstag, 09. Jänner 2018 19:30
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Make sure you pick instances with the PCID cpu capability; their TLB flush 
overhead is much smaller

On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Quick follow up.

Others in AWS reporting/seeing something similar, 
e.g.:https://twitter.com/BenBromhead/status/950245250504601600

So, while we have seen an relative CPU increase of ~ 50% since Jan 4, 2018, we 
now also have applied a kernel update at OS/VM level on a single node (loadtest 
and not production though), thus more or less double patched now. Additional 
CPU impact by OS/VM level kernel patching is more or less negligible, so looks 
highly Hypervisor related.

Regards,
Thomas

From: Steinmaurer, Thomas 
[mailto:thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>]
Sent: Freitag, 05. Jänner 2018 12:09
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Hello,

has anybody already some experience/results if a patched Linux kernel regarding 
Meltdown/Spectre is affecting performance of Cassandra negatively?

In production, all nodes running in AWS with m4.xlarge, we see up to a 50% 
relative (e.g. AVG CPU from 40% => 60%) CPU increase since Jan 4, 2018, most 
likely correlating with Amazon finished patching the underlying Hypervisor 
infrastructure …

Anybody else seeing a similar CPU increase?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313



--

Ben Slater
Chief Product Officer

RE: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-10 Thread Steinmaurer, Thomas
m4.xlarge do have PCID to my knowledge, but possibly we need a rather new 
kernel 4.14. But I fail to see how this could help anyway, cause this looks 
highly Amazon Hypervisor patch related and we do not have the production 
instances patched at OS/VM level (yet).
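
For reference, this is the quick check we run on an instance (Linux only; 
output obviously varies per instance type and kernel):

# does the vCPU advertise the pcid flag?
grep -o -w pcid /proc/cpuinfo | sort -u
# kernel in use (newer kernels reportedly make better use of PCID)
uname -r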

Thomas

From: Dor Laor [mailto:d...@scylladb.com]
Sent: Dienstag, 09. Jänner 2018 19:30
To: user@cassandra.apache.org
Subject: Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Make sure you pick instances with the PCID cpu capability; their TLB flush 
overhead is much smaller

On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Quick follow up.

Others in AWS reporting/seeing something similar, e.g.: 
https://twitter.com/BenBromhead/status/950245250504601600

So, while we have seen an relative CPU increase of ~ 50% since Jan 4, 2018, we 
now also have applied a kernel update at OS/VM level on a single node (loadtest 
and not production though), thus more or less double patched now. Additional 
CPU impact by OS/VM level kernel patching is more or less negligible, so looks 
highly Hypervisor related.

Regards,
Thomas

From: Steinmaurer, Thomas 
[mailto:thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>]
Sent: Freitag, 05. Jänner 2018 12:09
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Hello,

has anybody already some experience/results if a patched Linux kernel regarding 
Meltdown/Spectre is affecting performance of Cassandra negatively?

In production, all nodes running in AWS with m4.xlarge, we see up to a 50% 
relative (e.g. AVG CPU from 40% => 60%) CPU increase since Jan 4, 2018, most 
likely correlating with Amazon finished patching the underlying Hypervisor 
infrastructure …

Anybody else seeing a similar CPU increase?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Steinmaurer, Thomas
Quick follow up.

Others in AWS are reporting/seeing something similar, e.g.: 
https://twitter.com/BenBromhead/status/950245250504601600

So, while we have seen a relative CPU increase of ~ 50% since Jan 4, 2018, we 
have now also applied a kernel update at OS/VM level on a single node (loadtest 
and not production though), so it is more or less double patched now. The 
additional CPU impact of the OS/VM level kernel patching is more or less 
negligible, so this looks highly hypervisor related.

Regards,
Thomas

From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
Sent: Freitag, 05. Jänner 2018 12:09
To: user@cassandra.apache.org
Subject: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

Hello,

has anybody already some experience/results if a patched Linux kernel regarding 
Meltdown/Spectre is affecting performance of Cassandra negatively?

In production, all nodes running in AWS with m4.xlarge, we see up to a 50% 
relative (e.g. AVG CPU from 40% => 60%) CPU increase since Jan 4, 2018, most 
likely correlating with Amazon finished patching the underlying Hypervisor 
infrastructure ...

Anybody else seeing a similar CPU increase?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-05 Thread Steinmaurer, Thomas
Hello,

has anybody already some experience/results if a patched Linux kernel regarding 
Meltdown/Spectre is affecting performance of Cassandra negatively?

In production, with all nodes running in AWS on m4.xlarge, we see up to a 50% 
relative CPU increase (e.g. AVG CPU from 40% => 60%) since Jan 4, 2018, most 
likely correlating with Amazon having finished patching the underlying 
hypervisor infrastructure ...

Anybody else seeing a similar CPU increase?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Stable Cassandra 3.x version for production

2017-11-07 Thread Steinmaurer, Thomas
Latest DSE is based on 3.11 (possibly due to CASSANDRA-12269, but just a guess).

For us (only), none of 3.0+/3.11+ qualifies for production to be honest, when 
you are familiar with having 2.1 in production.


· 3.0 needs more hardware resources to handle the same load => 
https://issues.apache.org/jira/browse/CASSANDRA-12269. Improved (close/back to 
2.1 level) in 3.11

· With 3.11.0 we are seeing the following memory leak (at least here, 
but possibly nobody else out there, cause the ticket gets close to zero 
attention *g*): https://issues.apache.org/jira/browse/CASSANDRA-13929

· From an operational perspective, the repair area got much more 
troublesome compared to 2.1 when introducing incremental repairs being the 
default in 2.2+
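
(For completeness, on 2.2+/3.x the 2.1-style full repair has to be requested 
explicitly, roughly like this; the keyspace name is a placeholder:)

# full (non-incremental) repair of this node's primary ranges
nodetool repair -full -pr ks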

We stay on 2.1 for now.

Just my opinion.

Thomas

From: Herbert Fischer [mailto:herbert.fisc...@crossengage.io]
Sent: Dienstag, 07. November 2017 14:35
To: user@cassandra.apache.org
Subject: Re: Stable Cassandra 3.x version for production

I know that people usually prefer to use the 3.0.x branch because that's the 
one that is underneath DSE.

I've never heard of anyone using Cassandra > 3.0.x in production.



On Tue, Nov 7, 2017 at 11:29 AM, shini gupta 
> wrote:


Hi

Which version of Cassandra 3.x is stable and production-ready?

Regards



--
Herbert Fischer | Senior IT Architect
CrossEngage GmbH | Bertha-Benz Straße 5 | 10557 Berlin

E-Mail: herbert.fisc...@crossengage.io
Web: www.crossengage.io

Amtsgericht Berlin-Charlottenburg | HRB 169537 B
Geschäftsführer: Dr. Markus Wübben, Manuel Hinz | USt-IdNr.: DE301504202
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


cassandra.yaml configuration for large machines (scale up vs. scale out)

2017-11-03 Thread Steinmaurer, Thomas
Hello,

I know that Cassandra is built for scale out on commodity hardware, but I 
wonder if anyone can share some experience when running Cassandra on rather 
capable machines.

Let's say we have a 3 node cluster with 128G RAM, 32 physical cores (16 per CPU 
socket), Large Raid with Spinning Disks (so somewhere beyond 2000 IOPS).

What are some recommended cassandra.yaml / JVM settings? E.g. we have been 
using something like the following as a first baseline:

* 31G heap, G1, -XX:MaxGCPauseMillis=2000

* concurrent_compactors: 8

* compaction_throughput_mb_per_sec: 128

* key_cache_size_in_mb: 2048

* concurrent_reads: 256

* concurrent_writes: 256

* native_transport_max_threads: 256

Anything else we should add to our first baseline of settings?
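
For easier reference, the baseline above consolidated as a sketch (values 
exactly as listed above, not a recommendation; Xms=Xmx is an assumption on my 
side, and depending on the Cassandra version the JVM flags go into 
conf/jvm.options or cassandra-env.sh):

# cassandra.yaml (baseline as listed above)
concurrent_compactors: 8
compaction_throughput_mb_per_sec: 128
key_cache_size_in_mb: 2048
concurrent_reads: 256
concurrent_writes: 256
native_transport_max_threads: 256

# JVM settings (31G heap, G1)
-Xms31G
-Xmx31G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=2000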

E.g. although we have a key cache of 2G, nodetool info gives me only 0.451 as 
hit rate:

Key Cache  : entries 2919619, size 1.99 GB, capacity 2 GB, 71493172 
hits, 158411217 requests, 0.451 recent hit rate, 14400 save period in seconds


Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Not serving read requests while running nodetool repair

2017-10-18 Thread Steinmaurer, Thomas
Hi Jeff,

thanks for the info. Hoped that nodetool provides an option for that. We will 
go with the temporary QUORUM approach for now.
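
For reference, the ad-hoc cqlsh side of the temporary QUORUM switch looks 
roughly like this (keyspace/table/key are placeholders; the application 
drivers of course need the equivalent consistency change on their side):

cqlsh> CONSISTENCY QUORUM;
cqlsh> SELECT * FROM ks.cf1 WHERE key = 'some-key';  -- now needs 2 of 3 replicas at RF=3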

Thanks again.

Thomas

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Mittwoch, 18. Oktober 2017 15:46
To: user@cassandra.apache.org
Subject: Re: Not serving read requests while running nodetool repair

You can accomplish this by manually tweaking the values in the dynamic snitch 
mbean so other nodes won’t select it for reads
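
A rough sketch of this via the jmxterm CLI (the jar name/path is an assumption, 
and the exact MBean/attribute semantics should be verified on your version; 
resetting Severity to 0 afterwards is assumed to restore normal behaviour):

# raise this node's severity so the dynamic snitch scores it worse and
# other coordinators prefer the remaining replicas for reads
echo "set -b org.apache.cassandra.db:type=DynamicEndpointSnitch Severity 10.0" | \
  java -jar jmxterm-1.0.2-uber.jar -l localhost:7199 -n
# reset once the repair has finished
echo "set -b org.apache.cassandra.db:type=DynamicEndpointSnitch Severity 0.0" | \
  java -jar jmxterm-1.0.2-uber.jar -l localhost:7199 -n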
--
Jeff Jirsa


On Oct 18, 2017, at 3:24 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hi Nicolas,

newly bootstrapping is not really an option, I think.

I hoped there might be a mode (perhaps there is, but I haven’t found it yet) 
where a node does not actively participate in read requests but still writes 
newly arriving data, during a repair or whatever maintenance task for which 
some sort of write-only mode seems appropriate.

Thanks,
Thomas

From: Nicolas Guyomar [mailto:nicolas.guyo...@gmail.com]
Sent: Mittwoch, 18. Oktober 2017 09:58
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Not serving read requests while running nodetool repair

Hi Thomas,

AFAIK  temporarily reading at LOCAL_QUORUM/QUORUM until nodetool repair is 
finished is the way to go. You can still disable binary/thrift on the node to 
"protect" it from acting as a coordinator, and complete its repair quietly, but 
I'm not sure that would make such a huge difference in recovery time.

If you disable gossip the node will be seen as down, thus disabling repair if 
I'm correct.

If repair is taking too much time, or switching to QUORUM reads is not feasible 
within your application, is it OK for you to shut down the node, wipe its data 
and launch a repair on it at an appropriate time window?



On 18 October 2017 at 08:04, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

due to performance/latency reasons, we are currently reading and writing time 
series data at consistency level  ONE/ANY.

In case of a node being down and recovering after the default hinted handoff 
window of 3 hrs, we may potentially read stale data from the recovering node. 
Of course, from an operational POV, we will trigger a nodetool repair after the 
recovered node has start up, but to my understanding, this still may cause 
reading stale data from this particular node until nodetool repair is finished, 
which may take several hours. Is this correct?

Is there a way (e.g. in newer releases 3.0/3.11) to tell a node not to take 
part in read requests? Something like a counterpart of nodetool drain for writes?

Other options:


-  Disabling binary/thrift: Although the node won’t act as coordinator 
then for client requests, as it won’t be available from a client perspective, 
inter-node, the recovering node is being contacted by other nodes for serving 
requests, right?

-  Disabling gossip: Basically the node is marked as DOWN, so treated 
like it is not available, thus not an option?

-  Temporarily reading at LOCAL_QUORUM/QUORUM until nodetool repair is 
finished?

Did I miss something?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


RE: Not serving read requests while running nodetool repair

2017-10-18 Thread Steinmaurer, Thomas
Hi Nicolas,

newly bootstrapping is not really an option, I think.

I hoped there might be a mode (perhaps there is, but I haven't found it yet) 
where a node does not actively participate in read requests but still writes 
newly arriving data during a repair, or during whatever maintenance task for 
which some sort of write-only mode seems appropriate.

Thanks,
Thomas

From: Nicolas Guyomar [mailto:nicolas.guyo...@gmail.com]
Sent: Mittwoch, 18. Oktober 2017 09:58
To: user@cassandra.apache.org
Subject: Re: Not serving read requests while running nodetool repair

Hi Thomas,

AFAIK  temporarily reading at LOCAL_QUORUM/QUORUM until nodetool repair is 
finished is the way to go. You can still disable binary/thrift on the node to 
"protect" it from acting as a coordinator, and complete its repair quietly, but 
I'm not sure that would make such a huge difference in recovery time.

If you disable gossip the node will be seen as down, thus disabling repair, if 
I'm correct

If repair is taking too much time, or switching to quorum reads is not feasible 
within your application, is it OK for you to shut down the node, wipe its data 
and launch a repair on it in an appropriate time window?



On 18 October 2017 at 08:04, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

due to performance/latency reasons, we are currently reading and writing time 
series data at consistency level  ONE/ANY.

In case of a node being down and recovering after the default hinted handoff 
window of 3 hrs, we may potentially read stale data from the recovering node. 
Of course, from an operational POV, we will trigger a nodetool repair after the 
recovered node has start up, but to my understanding, this still may cause 
reading stale data from this particular node until nodetool repair is finished, 
which may take several hours. Is this correct?

Is there a way (e.g. in newer releases 3.0/3.11) to tell a node not being part 
of read requests? Something like a counterpart of nodetool drain for writes?

Other options:


-  Disabling binary/thrift: Although the node won’t act as coordinator 
then for client requests, as it won’t be available from a client perspective, 
inter-node, the recovering node is being contacted by other nodes for serving 
requests, right?

-  Disabling gossip: Basically the node is marked as DOWN, so treated 
like it is not available, thus not an option?

-  Temporarily reading at LOCAL_QUORUM/QUORUM until nodetool repair is 
finished?

Did I miss something?

Thanks,
Thomas



Not serving read requests while running nodetool repair

2017-10-18 Thread Steinmaurer, Thomas
Hello,

due to performance/latency reasons, we are currently reading and writing time 
series data at consistency level  ONE/ANY.

In case of a node being down and recovering after the default hinted handoff 
window of 3 hrs, we may potentially read stale data from the recovering node. 
Of course, from an operational POV, we will trigger a nodetool repair after the 
recovered node has start up, but to my understanding, this still may cause 
reading stale data from this particular node until nodetool repair is finished, 
which may take several hours. Is this correct?

Is there a way (e.g. in newer releases 3.0/3.11) to tell a node not being part 
of read requests? Something like a counterpart of nodetool drain for writes?

Other options:


-  Disabling binary/thrift: Although the node won't act as coordinator 
then for client requests, as it won't be available from a client perspective, 
inter-node, the recovering node is being contacted by other nodes for serving 
requests, right?

-  Disabling gossip: Basically the node is marked as DOWN, so treated 
like it is not available, thus not an option?

-  Temporarily reading at LOCAL_QUORUM/QUORUM until nodetool repair is 
finished?

Did I miss something?

Thanks,
Thomas



RE: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Steinmaurer, Thomas
Hi,

my previously mentioned G1 bug does not seem to be related to your case

Thomas

From: Gustavo Scudeler [mailto:scudel...@gmail.com]
Sent: Montag, 09. Oktober 2017 15:13
To: user@cassandra.apache.org
Subject: Re: Cassandra and G1 Garbage collector stop the world event (STW)

Hello,

@kurt greaves: Have you tried CMS with that sized heap?

Yes, for testing purposes, I have 3 nodes with CMS and 3 with G1. 
The behavior is basically the same.


Using CMS suggested settings 
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTAtNDk=

Using G1 suggested settings 
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3


@Steinmaurer, Thomas: If this happens very frequently within a short time frame 
then, depending on your allocation rate in MB/s, a combination of the G1 bug and 
a small heap might result in going towards OOM.

We have a really high obj allocation rate:

Avg creation rate:  622.9 mb/sec
Avg promotion rate: 18.39 mb/sec


It could be the cause, where the GC can't keep up with this rate.

I'm starting to think it could be some wrong configuration where Cassandra is 
configured in a way that bursts allocations in a manner that G1 can't keep up 
with.

Any ideas?

Best regards,


2017-10-09 12:44 GMT+01:00 Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>>:
Hi,

although not happening here with Cassandra (due to using CMS), we had some 
weird problem with our server application e.g. hit by the following JVM/G1 bugs:
https://bugs.openjdk.java.net/browse/JDK-8140597
https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less  a duplicate of 
above)
https://bugs.openjdk.java.net/browse/JDK-8048556

Especially the first, JDK-8140597, might be interesting, if you see periodic 
humongous allocations (according to a GC log) resulting in mixed GC phases 
being steadily interrupted due to G1 bug, thus no GC in OLD regions. Humongous 
allocations will happen if a single (?) allocation is > (region size / 2), if I 
remember correctly. Can’t recall the default G1 region size for a 12GB heap, 
but possibly 4MB. So, in case you are allocating something larger than > 2MB, 
you might end up in something called “humongous” allocations, spanning several 
G1 regions. If this happens very frequently within a short time frame then, 
depending on your allocation rate in MB/s, a combination of the G1 bug and a 
small heap might result in going towards OOM.

Possibly worth a further route for investigation.

Regards,
Thomas

From: Gustavo Scudeler [mailto:scudel...@gmail.com<mailto:scudel...@gmail.com>]
Sent: Montag, 09. Oktober 2017 13:12
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Cassandra and G1 Garbage collector stop the world event (STW)


Hi guys,

We have a 6 node Cassandra Cluster under heavy utilization. We have been 
dealing a lot with garbage collector stop the world event, which can take up to 
50 seconds in our nodes, in the meantime Cassandra Node is unresponsive, not 
even accepting new logins.

Extra details:
• Cassandra Version: 3.11
• Heap Size = 12 GB
• We are using G1 Garbage Collector with default settings
• Nodes size: 4 CPUs 28 GB RAM
• All CPU cores are at 100% all the time.
• The G1 GC behavior is the same across all nodes.

The behavior remains basically:
1.  Old Gen starts to fill up.
2.  GC can't clean it properly without a full GC and a STW event.
3.  The full GC starts to take longer, until the node is completely 
unresponsive.
Extra details and GC reports:
https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw

Can someone point me what configurations or events I could check?

Thanks!

Best regards,



RE: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Steinmaurer, Thomas
Hi,

although not happening here with Cassandra (due to using CMS), we had some 
weird problem with our server application e.g. hit by the following JVM/G1 bugs:
https://bugs.openjdk.java.net/browse/JDK-8140597
https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less  a duplicate of 
above)
https://bugs.openjdk.java.net/browse/JDK-8048556

Especially the first, JDK-8140597, might be interesting, if you see periodic 
humongous allocations (according to a GC log) resulting in mixed GC phases 
being steadily interrupted due to G1 bug, thus no GC in OLD regions. Humongous 
allocations will happen if a single (?) allocation is > (region size / 2), if I 
remember correctly. Can’t recall the default G1 region size for a 12GB heap, 
but possibly 4MB. So, in case you are allocating something larger than > 2MB, 
you might end up in something called “humongous” allocations, spanning several 
G1 regions. If this happens very frequently within a short time frame then, 
depending on your allocation rate in MB/s, a combination of the G1 bug and a 
small heap might result in going towards OOM.

Possibly worth a further route for investigation.
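
If you want to check for this, a sketch of JVM options that help to spot (and possibly mitigate) humongous allocations on Java 8 (the region size value is just an example and needs to be validated against your heap and allocation sizes):

# log G1 ergonomics, including humongous allocation requests and their sizes
-XX:+PrintGCDetails
-XX:+PrintAdaptiveSizePolicy
# larger regions raise the humongous threshold (region size / 2)
-XX:G1HeapRegionSize=16m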

Regards,
Thomas

From: Gustavo Scudeler [mailto:scudel...@gmail.com]
Sent: Montag, 09. Oktober 2017 13:12
To: user@cassandra.apache.org
Subject: Cassandra and G1 Garbage collector stop the world event (STW)


Hi guys,

We have a 6 node Cassandra Cluster under heavy utilization. We have been 
dealing a lot with garbage collector stop the world event, which can take up to 
50 seconds in our nodes, in the meantime Cassandra Node is unresponsive, not 
even accepting new logins.

Extra details:
· Cassandra Version: 3.11
· Heap Size = 12 GB
· We are using G1 Garbage Collector with default settings
· Nodes size: 4 CPUs 28 GB RAM
· All CPU cores are at 100% all the time.
· The G1 GC behavior is the same across all nodes.

The behavior remains basically:
1.  Old Gen starts to fill up.
2.  GC can't clean it properly without a full GC and a STW event.
3.  The full GC starts to take longer, until the node is completely 
unresponsive.
Extra details and GC reports:
https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw

Can someone point me what configurations or events I could check?

Thanks!

Best regards,



RE: Got error, removing parent repair session - When doing multiple repair -pr — Cassandra 3.x

2017-10-07 Thread Steinmaurer, Thomas
Marshall,

-pr should not be used with incremental repairs, which is the default since 
2.2. But even when used with full repairs (-full option), this will cause 
troubles when running nodetool repair -pr from several nodes concurrently. So, 
unfortunately, this does not seem to work anymore and makes the operational 
side much more difficult to handle.
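
In practice this means a full repair like the following now has to be run one node at a time instead of being kicked off concurrently across the cluster (keyspace/table names are placeholders):

nodetool repair -full -pr my_keyspace my_table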

We are in the same boat. There has been a lengthy discussion on that a few 
weeks ago and was asked to open the following ticket:
https://issues.apache.org/jira/browse/CASSANDRA-13885

Thomas

-Original Message-
From: marshall.s.k...@virtustream.com [mailto:marshall.s.k...@virtustream.com]
Sent: Samstag, 07. Oktober 2017 19:28
To: user@cassandra.apache.org
Subject: Got error, removing parent repair session - When doing multiple repair 
-pr — Cassandra 3.x



On cassandra 3.11 were getting an error (see error below) when doing multiple 
repairs with -pr option. Performing this type of repair worked fine on 
Cassandra 2.x. My questions are:

Is this a valid operation (nodetool repair -pr .. simultaneously on 
multiple nodes) to do?
If question one is valid, has anyone seen this error before, and if so, what do 
you do to fix it?

The error is:

ERROR [AntiEntropyStage:1] 2017-09-26 16:10:41,135 
RepairMessageVerbHandler.java:168 - Got error, removing parent repair session 
ERROR [AntiEntropyStage:1] 2017-09-26 16:10:41,135 CassandraDaemon.java:228 - 
Exception in thread Thread[AntiEntropyStage:1,5,main] 
java.lang.RuntimeException: Parent repair session with id = 
2689fa80-a2d5-11e7-9a55-b1319b7b5400 has failed.

Thanks, Marshall


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


RE: Node failure

2017-10-06 Thread Steinmaurer, Thomas
QUORUM should succeed with a RF=3 and 2 of 3 nodes available.

Modern client drivers also have ways to “downgrade” the CL of requests, in case 
they fail. E.g. for the Java driver: 
http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html
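
A minimal sketch with the 3.x Java driver (the contact point is a placeholder; whether automatically downgrading consistency is acceptable depends on the application):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;

Cluster cluster = Cluster.builder()
    .addContactPoint("10.0.0.1")                                   // placeholder contact point
    .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)   // retry failed requests at a lower CL
    .build();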


Thomas

From: Mark Furlong [mailto:mfurl...@ancestry.com]
Sent: Freitag, 06. Oktober 2017 19:43
To: user@cassandra.apache.org
Subject: RE: Node failure

Thanks for the detail. I’ll have to remove and then add one back in. It’s my 
consistency levels that may bite me in the interim.

Thanks
Mark
801-705-7115 office

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Friday, October 6, 2017 11:29 AM
To: cassandra >
Subject: Re: Node failure

There's a lot to talk about here, what's your exact question?


- You can either remove it from the cluster or replace it. You typically remove 
it if it'll never be replaced, but in RF=3 with 3 nodes, you probably need to 
replace it. To replace, you'll start a new server with 
-Dcassandra.replace_address=a.b.c.d ( 
http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node
 ) , and it'll stream data from the neighbors and eventually replace the dead 
node in the ring (the dead node will be removed from 'nodetool status', the new 
node will be there instead).
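
A sketch of that replace flow (the address is a placeholder; the replacement node needs empty data directories and must not list itself as a seed):

# on the replacement node, e.g. in cassandra-env.sh, before the first start:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.1.2.3"

# start Cassandra, then watch the bootstrap streaming:
nodetool netstats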

- If you're not going to replace it, things get a bit more complex - you'll do 
some combination of repair, 'nodetool removenode' or 'nodetool assassinate', 
and ALTERing the keyspace to set RF=2. The order matters, and so does the 
consistency level you use for reads/writes (so we can tell you whether or not 
you're likely to lose data in this process), so I'm not giving step-by-steps 
here because it's not very straight forward and there are a lot of caveats.




On Fri, Oct 6, 2017 at 10:20 AM, Mark Furlong 
> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs 
to be removed?

Mark Furlong

Sr. Database Administrator

mfurl...@ancestry.com
M: 801-859-7427
O: 801-705-7115
1300 W Traverse Pkwy
Lehi, UT 84043






RE: Alter table gc_grace_seconds

2017-10-02 Thread Steinmaurer, Thomas
Hello Justin,

yes, but in the real world this is hard to accomplish for high volume column 
families >= 3-digit GB. Even with the default 10 day grace period, completing a 
full repair in time can be a real challenge. ☺

Possibly back again to the discussion that incremental repair has some flaws, 
full repair (-full option) in 3.0+ (2.2+?) isn’t behaving the same way as in 
2.1 anymore, due to troubles when kicking off with –pr on several nodes at once.

Regards,
Thomas

From: Justin Cameron [mailto:jus...@instaclustr.com]
Sent: Montag, 02. Oktober 2017 08:32
To: user@cassandra.apache.org
Subject: Re: Alter table gc_grace_seconds

>> * You should not try on real clusters directly.
>Why not? :)

It's highly recommended that you complete a full repair before the GC grace 
period expires, otherwise it's possible you could experience zombie data (i.e. 
data that was previously deleted coming back to life)

See http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html 
for a good overview of the problem


On Mon, 2 Oct 2017 at 16:51 Gábor Auth 
> wrote:
Hi,
On Mon, Oct 2, 2017 at 5:55 AM Varun Barala 
> wrote:
select gc_grace_seconds from system_schema.tables where keyspace_name = 
'keyspace' and table_name = 'number_item';

cassandra@cqlsh:mat> DESCRIBE TABLE mat.number_item;

CREATE TABLE mat.number_item (
   nodeid uuid,
   type text,
   created timeuuid,
   value float,
   PRIMARY KEY (nodeid, type, created)
) WITH CLUSTERING ORDER BY (type ASC, created ASC)
   AND bloom_filter_fp_chance = 0.01
   AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
   AND cdc = false
   AND comment = ''
   AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
   AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
   AND crc_check_chance = 1.0
   AND dclocal_read_repair_chance = 0.1
   AND default_time_to_live = 0
   AND gc_grace_seconds = 3600
   AND max_index_interval = 2048
   AND memtable_flush_period_in_ms = 0
   AND min_index_interval = 128
   AND read_repair_chance = 0.0
   AND speculative_retry = '99PERCENTILE';

cassandra@cqlsh:mat> select gc_grace_seconds from system_schema.tables where 
keyspace_name = 'mat' and table_name = 'number_item';

gc_grace_seconds
--
3600

(1 rows)

Bye,
Gábor Auth
--
Justin Cameron
Senior Software Engineer


This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and 
Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally privileged 
information.  If you are not the intended recipient, do not copy or disclose 
its content, but please reply to this email immediately and highlight the error 
to the sender and then immediately delete the message.


Cassandra 3.11.1 (snapshot build) - io.netty.util.Recycler$Stack memory leak

2017-10-01 Thread Steinmaurer, Thomas
Hello,

we were facing a memory leak with 3.11.0 
(https://issues.apache.org/jira/browse/CASSANDRA-13754) thus upgraded our 
loadtest environment to a snapshot build of 3.11.1. Having it running for > 48 
hrs now, we still see a steady increase on heap utilization.

Eclipse memory analyzer shows 147 instances of io.netty.util.Recycler$Stack 
with a total retained heap usage of ~ 1,8G, growing over time.

Should this be fixed already by CASSANDRA-13754 or is this something new?

Thanks,
Thomas



RE: space left for compaction

2017-10-01 Thread Steinmaurer, Thomas
Hi,

half of free space does not make sense. Imagine your SSTables need 100G space 
and you have 20G free disk. Compaction won't be able to do its job with 10G.

Keeping half of the total disk free makes more sense and is what you need for a 
major compaction in the worst case.

Thomas

From: Peng Xiao [mailto:2535...@qq.com]
Sent: Samstag, 30. September 2017 10:21
To: user 
Subject: space left for compaction

Dear All,

As for STCS, DataStax suggests keeping half of the free space for compaction. 
This is not strict; could anyone advise how much space we should leave free on 
one node?

Thanks,
Peng Xiao


RE:

2017-09-28 Thread Steinmaurer, Thomas
Dan,

do you see any major GC? We have been hit by the following memory leak in our 
loadtest environment with 3.11.0.
https://issues.apache.org/jira/browse/CASSANDRA-13754

So, depending on the heap size and uptime, you might get into heap troubles.

Thomas

From: Dan Kinder [mailto:dkin...@turnitin.com]
Sent: Donnerstag, 28. September 2017 18:20
To: user@cassandra.apache.org
Subject:


Hi,

I recently upgraded our 16-node cluster from 2.2.6 to 3.11 and see the 
following. The cluster does function, for a while, but then some stages begin 
to back up and the node does not recover and does not drain the tasks, even 
under no load. This happens both to MutationStage and GossipStage.

I do see the following exception happen in the logs:



ERROR [ReadRepairStage:2328] 2017-09-26 23:07:55,440 CassandraDaemon.java:228 - 
Exception in thread Thread[ReadRepairStage:2328,5,main]

org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 1 responses.

at 
org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171)
 ~[apache-cassandra-3.11.0.jar:3.11.0]

at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182)
 ~[apache-cassandra-3.11.0.jar:3.11.0]

at 
org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) 
~[apache-cassandra-3.11.0.jar:3.11.0]

at 
org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89)
 ~[apache-cassandra-3.11.0.jar:3.11.0]

at 
org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50)
 ~[apache-cassandra-3.11.0.jar:3.11.0]

at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
~[apache-cassandra-3.11.0.jar:3.11.0]

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_91]

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[na:1.8.0_91]

at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
 ~[apache-cassandra-3.11.0.jar:3.11.0]

at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]



But it's hard to correlate precisely with things going bad. It is also very 
strange to me since I have both read_repair_chance and 
dclocal_read_repair_chance set to 0.0 for ALL of my tables. So it is confusing 
why ReadRepairStage would err.

Anyone have thoughts on this? It's pretty muddling, and causes nodes to lock 
up. Once it happens Cassandra can't even shut down, I have to kill -9. If I 
can't find a resolution I'm going to need to downgrade and restore to backup...

The only issue I found that looked similar is 
https://issues.apache.org/jira/browse/CASSANDRA-12689 but that appears to be 
fixed by 3.10.



$ nodetool tpstats

Pool Name Active   Pending  Completed   Blocked  All time blocked
ReadStage  0 0 582103 0 0
MiscStage  0 0  0 0 0
CompactionExecutor1111   2868 0 0
MutationStage 32   4593678   55057393 0 0
GossipStage1  2818 371487 0 0
RequestResponseStage   0 04345522 0 0
ReadRepairStage0 0 151473 0 0
CounterMutationStage   0 0  0 0 0
MemtableFlushWriter181 76 0 0
MemtablePostFlush  1   382139 0 0
ValidationExecutor 0 0  0 0 0
ViewMutationStage  0 0  0 0 0
CacheCleanupExecutor   0 0  0 0 0
PerDiskMemtableFlushWriter_10  0 0 69 0 0
PerDiskMemtableFlushWriter_11  0 0 69 0 0
MemtableReclaimMemory  0 0 81 0 0
PendingRangeCalculator 0 0 32 0 0
SecondaryIndexManagement   0 0  0 0 0
HintsDispatcher0 0596 0 0
PerDiskMemtableFlushWriter_1   0 0 69 0 0
Native-Transport-Requests 11 04547746

RE: Re: nodetool cleanup in parallel

2017-09-26 Thread Steinmaurer, Thomas
Side-note: At least with 2.1 (or even later), be aware that you might run into 
the following issue:
https://issues.apache.org/jira/browse/CASSANDRA-11155

We are doing cron-job based hourly snapshots in production and have tried to 
also run cleanup after extending a cluster from 6 to 9 nodes. This resulted in 
snapshot creation getting stuck, so we gave up running cleanup (and reclaiming 
disk space) in favor of still having actual snapshots in place.

We might find a time window where we disable snapshots, but cleanup may take a 
while depending on the data volume, so this could mean disabling snapshots for 
many hours.

Regards,
Thomas


From: Peng Xiao [mailto:2535...@qq.com]
Sent: Mittwoch, 27. September 2017 06:25
To: user 
Subject: Re: nodetool cleanup in parallel

Thanks Kurt.


-- Original Message --
From: "kurt";
Sent: Wednesday, September 27, 2017, 11:57 AM
To: "User";
Subject: Re: nodetool cleanup in parallel

correct. you can run it in parallel across many nodes if you have capacity. 
generally see about a 10% CPU increase from cleanups which isn't a big deal if 
you have the capacity to handle it + the io.

on that note on later versions you can specify -j  to run multiple 
cleanup compactions at the same time on a single node, and also increase 
compaction throughput to speed the process up.
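
A sketch of both knobs (the values are examples only and need to match your hardware):

nodetool cleanup -j 2 my_keyspace     # two cleanup compactions in parallel on this node
nodetool setcompactionthroughput 64   # temporarily raise the compaction throughput cap (MB/s)
nodetool getcompactionthroughput      # verify the current setting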

On 27 Sep. 2017 13:20, "Peng Xiao" <2535...@qq.com> 
wrote:
hi,

nodetool cleanup will only remove those keys which no longer belong to those 
nodes, so theoretically we can run nodetool cleanup in parallel, right? The 
documentation suggests running this one node at a time, but that is too slow.

Thanks,
Peng Xiao



RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-26 Thread Steinmaurer, Thomas
Hi Alex,

we tested with larger new gen sizes up to ¼ of max heap, but m4.xlarge looks 
like it is too weak to deal with a larger new gen. The result was that we then 
got many more GCInspector related log entries, but perhaps we need to re-test.

Right, we are using batches extensively. Unlogged/non-atomic. We are aware of 
avoiding multi partition batches, if possible. For test purposes we built 
something into our application to switch a flag to move from multi partition 
batches to strictly single partition per batch. We have not seen any measurable 
high-level improvement (e.g. decreased CPU, GC suspension …) on the 
Cassandra-side with single partition batches. Naturally, this resulted in many 
more requests executed by our application against the Cassandra cluster, with 
the effect that we saw a significant GC/CPU increase on our server, caused by 
the DataStax driver now executing more requests by a factor of X. So, with no 
visible gain on the Cassandra-side, but a negative impact on our 
application/server, we don't strictly execute single partition batches.
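
For illustration only (keyspace, table and values are made up), a single partition unlogged batch is simply one where every statement shares the same partition key:

BEGIN UNLOGGED BATCH
  INSERT INTO metrics.ts (series_id, ts, value) VALUES ('series-1', '2017-09-26 10:00:00', 1.0);
  INSERT INTO metrics.ts (series_id, ts, value) VALUES ('series-1', '2017-09-26 10:01:00', 2.0);
APPLY BATCH;
-- both rows hit partition 'series-1', so the coordinator forwards a single mutation to one replica set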

As said on the ticket (https://issues.apache.org/jira/browse/CASSANDRA-13900), 
anything except Cassandra binaries have been unchanged in our loadtest 
environment.


Thanks,
Thomas



From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
Sent: Dienstag, 26. September 2017 11:14
To: user@cassandra.apache.org
Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Hi Thomas,

I wouldn't move to G1GC with small heaps (<24GB) but just looking at your 
ticket I think that your new gen is way too small.
I get that it worked better in 2.1 in your case though, which would suggest 
that the memory footprint is different between 2.1 and 3.0. It looks like 
you're using batches extensively.
Hopefully you're aware that multi partition batches are discouraged because 
they indeed create heap pressure and high coordination costs (on top of 
batchlog writes/deletions), leading to more GC pauses.
With a 400MB new gen, you're very likely to have a lot of premature promotions 
(especially with the default max tenuring threshold), which will fill the old 
gen faster than necessary and is likely to trigger major GCs.

I'd suggest you re-run those tests with a 2GB new gen and compare results. Know 
that with Cassandra you can easily go up to 40%-50% of your heap for the new 
gen.
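
With the stock cassandra-env.sh and CMS, that would be a sketch like the following (sizes are examples only):

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"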

Cheers,


On Tue, Sep 26, 2017 at 10:58 AM Matope Ono 
<matope@gmail.com<mailto:matope@gmail.com>> wrote:
Hi. We met a similar situation after upgrading from 2.1.14 to 3.11 in our 
production.

Have you already tried G1GC instead of CMS? Our timeouts were mitigated after 
replacing CMS with G1GC.

Thanks.

2017-09-25 20:01 GMT+09:00 Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>>:
Hello,

I have now some concrete numbers from our 9 node loadtest cluster with constant 
load, same infrastructure after upgrading to 3.0.14 from 2.1.18.

We see doubled GC suspension time + correlating CPU increase. In short, 3.0.14 
is not able to handle the same load.

I have created https://issues.apache.org/jira/browse/CASSANDRA-13900. Feel free 
to request any further additional information on the ticket.

Unfortunately this is a real show-stopper for us upgrading to 3.0.

Thanks for your attention.

Thomas

From: Steinmaurer, Thomas 
[mailto:thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>]
Sent: Freitag, 15. September 2017 13:51
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Hi Jeff,

we are using native (CQL3) via Java DataStax driver (3.1). We also have 
OpsCenter running (to be removed soon) via Thrift, if I remember correctly.

As said, the write request latency for our keyspace hasn’t really changed, so 
perhaps another one (system related, OpsCenter …?) is affected or perhaps the 
JMX metric is reporting something differently now. ☺ So not a real issue for 
now hopefully, just popping up in our monitoring, wondering what this may be.

Regarding compression metadata memory usage drop. Right, storage engine 
re-write could be a reason. Thanks.

Still wondering about the GC/CPU increase.

Thanks!

Thomas



From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Freitag, 15. September 2017 13:14
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Most people find 3.0 slightly slower than 2.1. The only thing that really 
stands out in your email is the huge change in 95% latency - that's atypical. 
Are you using thrift or native (9042)? The decrease in compression metadata 
offheap usage is likely due to the increased storage efficiency of the storage 
engine (see Cassandra-8099).


--
Jeff Jirsa


On Sep 15, 2017, at 2:37 AM, Steinmaurer, Thomas 
<thomas.

RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-26 Thread Steinmaurer, Thomas
Hi,

in our experience CMS is doing much better with smaller heaps.
Regards,
Thomas


From: Matope Ono [mailto:matope@gmail.com]
Sent: Dienstag, 26. September 2017 10:58
To: user@cassandra.apache.org
Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Hi. We met a similar situation after upgrading from 2.1.14 to 3.11 in our 
production.

Have you already tried G1GC instead of CMS? Our timeouts were mitigated after 
replacing CMS with G1GC.

Thanks.

2017-09-25 20:01 GMT+09:00 Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>>:
Hello,

I have now some concrete numbers from our 9 node loadtest cluster with constant 
load, same infrastructure after upgrading to 3.0.14 from 2.1.18.

We see doubled GC suspension time + correlating CPU increase. In short, 3.0.14 
is not able to handle the same load.

I have created https://issues.apache.org/jira/browse/CASSANDRA-13900. Feel free 
to request any further additional information on the ticket.

Unfortunately this is a real show-stopper for us upgrading to 3.0.

Thanks for your attention.

Thomas

From: Steinmaurer, Thomas 
[mailto:thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>]
Sent: Freitag, 15. September 2017 13:51
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Hi Jeff,

we are using native (CQL3) via Java DataStax driver (3.1). We also have 
OpsCenter running (to be removed soon) via Thrift, if I remember correctly.

As said, the write request latency for our keyspace hasn’t really changed, so 
perhaps another one (system related, OpsCenter …?) is affected or perhaps the 
JMX metric is reporting something differently now. ☺ So not a real issue for 
now hopefully, just popping up in our monitoring, wondering what this may be.

Regarding compression metadata memory usage drop. Right, storage engine 
re-write could be a reason. Thanks.

Still wondering about the GC/CPU increase.

Thanks!

Thomas



From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Freitag, 15. September 2017 13:14
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Most people find 3.0 slightly slower than 2.1. The only thing that really 
stands out in your email is the huge change in 95% latency - that's atypical. 
Are you using thrift or native (9042)? The decrease in compression metadata 
offheap usage is likely due to the increased storage efficiency of the storage 
engine (see Cassandra-8099).


--
Jeff Jirsa


On Sep 15, 2017, at 2:37 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

we have a test (regression) environment hosted in AWS, which is used for auto 
deploying our software on a daily basis and attach constant load across all 
deployments. Basically to allow us to detect any regressions in our software on 
a daily basis.

On the Cassandra-side, this is single-node in AWS, m4.xlarge, EBS gp2, 8G heap, 
CMS. The environment has also been upgraded from Cassandra 2.1.18 to 3.0.14 at 
a certain point in time. Without running upgradesstables so far. We have not 
made any additional JVM/GC configuration change when going from 2.1.18 to 
3.0.14 on our own, thus, any self-made configuration changes (e.g. new gen heap 
size) for 2.1.18 are also in place with 3.0.14.

What we see after a time-frame of ~ 7 days (so, e.g. should not be caused by 
some sort of spiky compaction pattern) is an AVG increase in GC/CPU (most 
likely correlating):

• CPU: ~ 12% => ~ 17%

• GC Suspension: ~ 1,7% => 3,29%

In this environment not a big deal, but relatively we have a CPU increase of ~ 
50% (with increased GC most likely contributing). Something we have to deal with 
when going into production (going into larger, multi-node loadtest environments 
first though).

Beside the CPU/GC shift, we also monitor the following noticeable changes 
(don’t know if they somehow correlate with the CPU/GC shift above):

• Increased AVG Write Client Requests Latency (95th Percentile), 
org.apache.cassandra.metrics.ClientRequest.Latency.Write: 6,05ms => 29,2ms, but 
almost constant (no change in) write client request latency for our particular 
keyspace only, org.apache.cassandra.metrics.Keyspace.ruxitdb.WriteLatency

• Compression metadata memory usage drop, 
org.apache.cassandra.metrics.Keyspace.XXX. 
CompressionMetadataOffHeapMemoryUsed: ~218MB => ~105MB => Good or bad? Known?

I know, looks all a bit vague, but perhaps someone else has seen something 
similar when upgrading to 3.0.14 and can share their thoughts/ideas. Especially 
the (relative) CPU/GC increase is something we are curious about.

Thanks a lot.

Thomas

RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-25 Thread Steinmaurer, Thomas
Hello,

I have now some concrete numbers from our 9 node loadtest cluster with constant 
load, same infrastructure after upgrading to 3.0.14 from 2.1.18.

We see doubled GC suspension time + correlating CPU increase. In short, 3.0.14 
is not able to handle the same load.

I have created https://issues.apache.org/jira/browse/CASSANDRA-13900. Feel free 
to request any further additional information on the ticket.

Unfortunately this is a real show-stopper for us upgrading to 3.0.

Thanks for your attention.

Thomas

From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
Sent: Freitag, 15. September 2017 13:51
To: user@cassandra.apache.org
Subject: RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Hi Jeff,

we are using native (CQL3) via Java DataStax driver (3.1). We also have 
OpsCenter running (to be removed soon) via Thrift, if I remember correctly.

As said, the write request latency for our keyspace hasn’t really changed, so 
perhaps another one (system related, OpsCenter …?) is affected or perhaps the 
JMX metric is reporting something differently now. ☺ So not a real issue for 
now hopefully, just popping up in our monitoring, wondering what this may be.

Regarding compression metadata memory usage drop. Right, storage engine 
re-write could be a reason. Thanks.

Still wondering about the GC/CPU increase.

Thanks!

Thomas



From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Freitag, 15. September 2017 13:14
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Most people find 3.0 slightly slower than 2.1. The only thing that really 
stands out in your email is the huge change in 95% latency - that's atypical. 
Are you using thrift or native (9042)? The decrease in compression metadata 
offheap usage is likely due to the increased storage efficiency of the storage 
engine (see Cassandra-8099).


--
Jeff Jirsa


On Sep 15, 2017, at 2:37 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

we have a test (regression) environment hosted in AWS, which is used for auto 
deploying our software on a daily basis and attach constant load across all 
deployments. Basically to allow us to detect any regressions in our software on 
a daily basis.

On the Cassandra-side, this is single-node in AWS, m4.xlarge, EBS gp2, 8G heap, 
CMS. The environment has also been upgraded from Cassandra 2.1.18 to 3.0.14 at 
a certain point in time. Without running upgradesstables so far. We have not 
made any additional JVM/GC configuration change when going from 2.1.18 to 
3.0.14 on our own, thus, any self-made configuration changes (e.g. new gen heap 
size) for 2.1.18 are also in place with 3.0.14.

What we see after a time-frame of ~ 7 days (so, e.g. should not be caused by 
some sort of spiky compaction pattern) is an AVG increase in GC/CPU (most 
likely correlating):

· CPU: ~ 12% => ~ 17%

· GC Suspension: ~ 1,7% => 3,29%

In this environment not a big deal, but relatively we have a CPU increase of ~ 
50% (with increased GC most likely contributing). Something we have to deal with 
when going into production (going into larger, multi-node loadtest environments 
first though).

Beside the CPU/GC shift, we also monitor the following noticeable changes 
(don’t know if they somehow correlate with the CPU/GC shift above):

· Increased AVG Write Client Requests Latency (95th Percentile), 
org.apache.cassandra.metrics.ClientRequest.Latency.Write: 6,05ms => 29,2ms, but 
almost constant (no change in) write client request latency for our particular 
keyspace only, org.apache.cassandra.metrics.Keyspace.ruxitdb.WriteLatency

· Compression metadata memory usage drop, 
org.apache.cassandra.metrics.Keyspace.XXX. 
CompressionMetadataOffHeapMemoryUsed: ~218MB => ~105MB => Good or bad? Known?

I know, looks all a bit vague, but perhaps someone else has seen something 
similar when upgrading to 3.0.14 and can share their thoughts/ideas. Especially 
the (relative) CPU/GC increase is something we are curious about.

Thanks a lot.

Thomas

RE: Massive deletes -> major compaction?

2017-09-22 Thread Steinmaurer, Thomas
Additional to Kurt’s reply. Double disk usage is really the worst case. Most of 
the time you are fine having > largest column family free disk available.

Also take local snapshots into account. Even after a finished major compaction, 
disk space may have not been reclaimed, if snapshot sym links still keep disk 
usage of already compacted SSTable alive.

Regards,
Thomas

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com]
Sent: Freitag, 22. September 2017 13:38
To: user@cassandra.apache.org
Subject: RE: Massive deletes -> major compaction?

Thanks for the pointer. I had never heard of this. While it seems that it could 
help, I think our rules for determining which records to keep are not 
supported. Also, this requires adding a new jar to production. Too risky at 
this point.


Sean Durity

From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
Sent: Thursday, September 21, 2017 2:59 PM
To: user >
Subject: Re: Massive deletes -> major compaction?

Have you considered the fantastic DeletingCompactionStrategy?  
https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy


On Sep 21, 2017, at 11:51 AM, Jeff Jirsa 
> wrote:

The major compaction is most efficient but can temporarily double (nearly) disk 
usage - if you can afford that, go for it.

Alternatively you can do a user-defined compaction on each sstable in reverse 
generational order (oldest first) and as long as the data is minimally 
overlapping it’ll purge tombstones that way as well - takes longer but much 
less disk involved.
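
In 2.0/2.1 the user-defined compaction is only exposed via JMX; a rough sketch using jmxterm (jar name, host/port and SSTable file name are placeholders, and the exact argument format should be double-checked against the CompactionManagerMBean of your version):

echo "run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction mykeyspace-mytable-jb-1234-Data.db" \
  | java -jar jmxterm-1.0-uber.jar -l localhost:7199 -n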


--
Jeff Jirsa


On Sep 21, 2017, at 11:27 AM, Durity, Sean R 
> wrote:
Cassandra version 2.0.17 (yes, it’s old – waiting for new hardware/new OS to 
upgrade)

In a long-running system with billions of rows, TTL was not set. So a one-time 
purge is being planned to reduce disk usage. Records older than a certain date 
will be deleted. The table uses size-tiered compaction. Deletes are probably 
25-40% of the complete data set. To actually recover the disk space, would you 
recommend a major compaction after the gc_grace_seconds time? I expect 
compaction would then need to be scheduled regularly (ick)…

We also plan to re-insert the remaining data with a calculated TTL, which could 
also benefit from compaction.


Sean Durity




RE: network down between DCs

2017-09-21 Thread Steinmaurer, Thomas
Hi,

within the default hint window of 3 hours, the hinted handoff mechanism should 
take care of that, but we have seen that failing from time to time (depending 
on the load) in 2.1 with some sort of tombstone related issues causing failing 
requests on the system hints table. So, watch out for any sign of hinted handoff 
troubles in the Cassandra log.

Hint storage has been re-written in 3.0+ to flat files, thus tombstone related 
troubles in that area should be gone.

Thomas

From: Hannu Kröger [mailto:hkro...@gmail.com]
Sent: Donnerstag, 21. September 2017 10:32
To: Peng Xiao <2535...@qq.com>; user@cassandra.apache.org
Subject: Re: network down between DCs

Hi,

That’s correct.

You need to run repairs only after a node/DC/connection is down for more than 
max_hint_window_in_ms.
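
The window is the max_hint_window_in_ms setting in cassandra.yaml; the shipped default corresponds to:

# 3 hours in milliseconds; hints for a dead endpoint are no longer generated
# once it has been down longer than this
max_hint_window_in_ms: 10800000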

Cheers,
Hannu




On 21 September 2017 at 11:30:44, Peng Xiao 
(2535...@qq.com) wrote:
Hi there,

We have two DCs for a Cassandra cluster. If the network is down for less than 3 
hours (the default hint window), to my understanding it will recover 
automatically, right? Do we need to run repair manually?

Thanks,
Peng Xiao


RE: Row Cache hit issue

2017-09-19 Thread Steinmaurer, Thomas
Hi,

additionally, with saved (key) caches, we had some sort of corruption (I think, 
for whatever reason) once. So, if you see something like the following upon Cassandra 
startup:

INFO [main] 2017-01-04 15:38:58,772 AutoSavingCache.java (line 114) reading 
saved cache /var/opt/xxx/cassandra/saved_caches/ks-cf-KeyCache-b.db
ERROR [main] 2017-01-04 15:38:58,891 CassandraDaemon.java (line 571) Exception 
encountered during startup
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:152)
at 
org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:132)
at 
org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
at 
org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
at 
org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:276)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:435)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:406)
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:322)
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:268)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:364)
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:554)
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:643)

resulting in Cassandra going OOM, with a “reading saved cache” log entry close 
before the OOM, you may have hit some sort of corruption. The workaround is to 
physically delete the saved cache file; Cassandra will then start up just fine.
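
As a sketch of that workaround (the path is whatever saved_caches_directory points to in cassandra.yaml, so it may differ from this example):

sudo service cassandra stop
rm /var/lib/cassandra/saved_caches/*KeyCache*.db
sudo service cassandra start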

Regards,
Thomas


From: Dikang Gu [mailto:dikan...@gmail.com]
Sent: Mittwoch, 20. September 2017 06:06
To: cassandra 
Subject: Re: Row Cache hit issue

Hi Peng,

C* periodically saves caches to disk to solve the cold start problem. If 
row_cache_save_period=0, it means C* does not save the cache to disk. But the 
cache is still working if it's enabled in the table schema; it will just be 
empty after a restart.
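
For completeness, enabling it per table is a schema setting, e.g. a sketch like (table name and row count are placeholders):

ALTER TABLE my_ks.my_table WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};

while row_cache_size_in_mb and row_cache_save_period in cassandra.yaml control the global cache size and whether/how often it is saved to disk.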

--Dikang.

On Tue, Sep 19, 2017 at 8:27 PM, Peng Xiao 
<2535...@qq.com> wrote:
And we are using C* 2.1.18.


-- Original --
From:  "My own mailbox" <2535...@qq.com>;
Date:  Wed, Sep 20, 2017 11:27 AM
To:  "user">;
Subject:  Row Cache hit issue

Dear All,

The default is row_cache_save_period=0, so it looks like the row cache should 
not work in this situation?
But we can still see row cache hits.

Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds

Could anyone please explain this?

Thanks,
Peng Xiao



--
Dikang



RE: Reg:- Install / Configure Cassandra on 2 DCs with 3 nodes

2017-09-19 Thread Steinmaurer, Thomas
Nandan,

you may find the following useful.

Slideshare:
https://www.slideshare.net/DataStax/apache-cassandra-multidatacenter-essentials-julien-anguenot-iland-internet-solutions-c-summit-2016

Youtube:
https://www.youtube.com/watch?v=G6od16YKSsA

From a client perspective, if you are targeting quorum requests, be aware that 
there is LOCAL_QUORUM and QUORUM.
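
Once both DCs are up and the snitch reports them as e.g. DC1 and DC2, an application keyspace spanning both would look roughly like this (keyspace and DC names are placeholders and must match what the snitch reports):

CREATE KEYSPACE my_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};

Clients in each DC then typically read/write at LOCAL_QUORUM so requests do not block on the remote DC.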

Regards,
Thomas

From: @Nandan@ [mailto:nandanpriyadarshi...@gmail.com]
Sent: Dienstag, 19. September 2017 18:58
To: user 
Subject: Reg:- Install / Configure Cassandra on 2 DCs with 3 nodes

Hi Techies,

I need to configure Apache Cassandra for my upcoming project on 2 DCs.
Both DCs should have 3 nodes each.
Details are :-
DC1 nodes --
Node 1 ->10.0.0.1
Node 2 -> 10.0.0.2
Node 3 -> 10.0.0.3
DC2 nodes --
Node 1 -> 10.0.0.4
Node 2 -> 10.0.0.5
Node 3 -> 10.0.0.6

On all nodes, I want to use Ubuntu 16.04.
Please suggest the best way to configure my DCs; I may also extend them further 
in the future.

Best Regards,
Nandan Priyadarshi


RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-19 Thread Steinmaurer, Thomas
Paulo,

as requested: https://issues.apache.org/jira/browse/CASSANDRA-13885

Feel free to adjust any properties of the ticket. Hopefully it gets proper 
attention. Thanks.

Thomas

-Original Message-
From: Paulo Motta [mailto:pauloricard...@gmail.com]
Sent: Dienstag, 19. September 2017 08:56
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

In 4.0 anti-compaction is no longer run after full repairs, so we should 
probably backport this behavior to 3.0, given there are known limitations with 
incremental repair on 3.0 and non-incremental users may want to keep 
running full repairs without the additional cost of anti-compaction.

Would you mind opening a ticket for this?

2017-09-19 1:33 GMT-05:00 Steinmaurer, Thomas
<thomas.steinmau...@dynatrace.com>:
> Hi Kurt,
>
>
>
> thanks for the link!
>
>
>
> Honestly, a pity, that in 3.0, we can’t get the simple, reliable and
> predictable way back to run a full repair for very low data volume CFs
> being kicked off on all nodes in parallel, without all the magic
> behind the scene introduced by incremental repairs, even if not used,
> as anticompaction even with --full has been introduced with 2.2+ ☺
>
>
>
>
>
> Regards,
>
> Thomas
>
>
>
> From: kurt greaves [mailto:k...@instaclustr.com]
> Sent: Dienstag, 19. September 2017 06:24
> To: User <user@cassandra.apache.org>
>
>
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-13153 implies full
> repairs still triggers anti-compaction on non-repaired SSTables (if
> I'm reading that right), so might need to make sure you don't run
> multiple repairs at the same time across your nodes (if your using
> vnodes), otherwise could still end up trying to run anti-compaction on the 
> same SSTable from 2 repairs.
>
>
>
> Anyone else feel free to jump in and correct me if my interpretation
> is wrong.
>
>
>
> On 18 September 2017 at 17:11, Steinmaurer, Thomas
> <thomas.steinmau...@dynatrace.com> wrote:
>
> Jeff,
>
>
>
> what should be the expected outcome when running with 3.0.14:
>
>
>
> nodetool repair --full -pr keyspace cfs
>
>
>
> · Should --full trigger anti-compaction?
>
> · Should this be the same operation as nodetool repair -pr keyspace
> cfs in 2.1?
>
> · Should I be able to  run this on several nodes in parallel as in
> 2.1 without troubles, where incremental repair was not the default?
>
>
>
> Still confused if I’m missing something obvious. Sorry about that. ☺
>
>
>
> Thanks,
>
> Thomas
>
>
>
> From: Jeff Jirsa [mailto:jji...@gmail.com]
> Sent: Montag, 18. September 2017 16:10
>
>
> To: user@cassandra.apache.org
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> Sorry I may be wrong about the cause - didn't see -full
>
>
>
> Mea culpa, its early here and I'm not awake
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Sep 18, 2017, at 7:01 AM, Steinmaurer, Thomas
> <thomas.steinmau...@dynatrace.com> wrote:
>
> Hi Jeff,
>
>
>
> understood. That’s quite a change then coming from 2.1 from an
> operational POV.
>
>
>
> Thanks again.
>
>
>
> Thomas
>
>
>
> From: Jeff Jirsa [mailto:jji...@gmail.com]
> Sent: Montag, 18. September 2017 15:56
> To: user@cassandra.apache.org
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> The command you're running will cause anticompaction and the range
> borders for all instances at the same time
>
>
>
> Since only one repair session can anticompact any given sstable, it's
> almost guaranteed to fail
>
>
>
> Run it on one instance at a time
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Sep 18, 2017, at 1:11 AM, Steinmaurer, Thomas
> <thomas.steinmau...@dynatrace.com> wrote:
>
> Hi Alex,
>
>
>
> I now ran nodetool repair --full -pr keyspace cfs on all nodes in
> parallel and this may pop up now:
>
>
>
> 0.176.38.128 (progress: 1%)
>
> [2017-09-18 07:59:17,145] Some repair failed
>
> [2017-09-18 07:59:17,151] Repair command #3 finished in 0 seconds
>
> error: Repair job has failed with the error message: [2017-09-18
> 07:59:17,145] Some repair failed
>
> -- StackTrace --
>
> java.lang.RuntimeException: Repair job has failed with the error message:
> [2017-09-18 07:59:17,145] Some repair failed
>
> at
> org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115
> )
>
> at
> org.apache.cassandra.utils.progress.

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread Steinmaurer, Thomas
Hi Jeff,

understood. That’s quite a change then coming from 2.1 from an operational POV.

Thanks again.

Thomas

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Montag, 18. September 2017 15:56
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

The command you're running will cause anticompaction and the range borders for 
all instances at the same time

Since only one repair session can anticompact any given sstable, it's almost 
guaranteed to fail

Run it on one instance at a time


--
Jeff Jirsa


On Sep 18, 2017, at 1:11 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hi Alex,

I now ran nodetool repair --full -pr keyspace cfs on all nodes in parallel and 
this may pop up now:

0.176.38.128 (progress: 1%)
[2017-09-18 07:59:17,145] Some repair failed
[2017-09-18 07:59:17,151] Repair command #3 finished in 0 seconds
error: Repair job has failed with the error message: [2017-09-18 07:59:17,145] 
Some repair failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: 
[2017-09-18 07:59:17,145] Some repair failed
at 
org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
at 
org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)

2017-09-18 07:59:17 repair finished


If running the above nodetool call sequentially on all nodes, repair finishes 
without printing a stack trace.

The error message and stack trace aren’t really useful here. Any further 
ideas/experiences?
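
For completeness, the corresponding server-side errors (if any) only seem to
show up in the Cassandra system log of the involved replicas, so something
along these lines is worth checking (log path is installation specific):

# look on every node participating in the repair session, not only the node
# where nodetool was invoked
grep -iE "ERROR.*(repair|validation|anticompact)" /var/log/cassandra/system.log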

Thanks,
Thomas

From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
Sent: Freitag, 15. September 2017 11:30
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

Right, you should indeed add the "--full" flag to perform full repairs, and you 
can then keep the "-pr" flag.

I'd advise to monitor the status of your SSTables as you'll probably end up 
with a pool of SSTables marked as repaired, and another pool marked as 
unrepaired which won't be compacted together (hence the suggestion of running 
subrange repairs).
Use sstablemetadata to check on the "Repaired at" value for each. 0 means 
unrepaired and any other value (a timestamp) means the SSTable has been 
repaired.
I've had behaviors in the past where running "-pr" on the whole cluster would 
still not mark all SSTables as repaired, but I can't say if that behavior has 
changed in latest versions.

Having separate pools of SStables that cannot be compacted means that you might 
have tombstones that don't get evicted due to partitions living in both states 
(repaired/unrepaired).

To sum up the recommendations :
- Run a full repair with both "--full" and "-pr" and check that SSTables are 
properly marked as repaired
- Use a tight repair schedule to avoid keeping partitions for too long in both 
repaired and unrepaired state
- Switch to subrange repair if you want to fully avoid marking SSTables as 
repaired (which you don't need anyway since you're not using incremental 
repairs). If you wish to do this, you'll have to mark back all your sstables to 
unrepaired, using nodetool 
sstablerepairedset<https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSSTableRepairedSet.html>.
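
A concrete sketch of the above (keyspace/table names and data paths are just
placeholders):

# full repair, primary ranges only, for a single table
nodetool repair --full -pr my_ks my_cf
# check whether the resulting SSTables were marked as repaired (0 = unrepaired)
sstablemetadata /var/lib/cassandra/data/my_ks/my_cf-*/*-Data.db | grep "Repaired at"
# revert everything back to unrepaired (run while the node is stopped)
sstablerepairedset --really-set --is-unrepaired /var/lib/cassandra/data/my_ks/my_cf-*/*-Data.db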

Cheers,

On Fri, Sep 15, 2017 at 10:27 AM Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hi Alex,

thanks a lot. Somehow missed that incremental repairs are the default now.

We have been happy with full repair so far, because the data for which we
currently invoke repairs manually is small (~1GB or even smaller).

So I guess with full repairs across all nodes, we can still stick with the
partition range (-pr) option, but with 3.0 we additionally have to provide the
--full option, right?

Thanks again,
Thomas

From: Alexander Dejanovski 
[mailto:a...@thelastpickle.com<mailto:a...@thelastpickle.com>]
Sent: Freitag, 15. September 2017 09:45
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

Hi Thomas,

in 2.1.18, the default repair mode was full repair while since 2.2 it is 
incremental repair.
So running "nodetool repair -pr" since your upgrade to 3.0.14 doesn't trigger 
the 

RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-18 Thread Steinmaurer, Thomas
Hello again,

Dug a bit further, comparing 1hr flight recording sessions for both 2.1 and 
3.0 under the same incoming simulated load from our loadtest environment.
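
(For reference, roughly how such a recording is captured; on Oracle JDK 8 the
JVM has to be started with -XX:+UnlockCommercialFeatures -XX:+FlightRecorder,
and pid/paths below are placeholders:)

# start a one-hour recording on the running Cassandra JVM
jcmd <cassandra-pid> JFR.start name=upgrade-compare duration=3600s filename=/tmp/cassandra-1h.jfr
# afterwards open the file in Java Mission Control and compare allocation (TLAB)
# per call tree between the 2.1 and 3.0 recordings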

We are heavily write-bound rather than read-bound in this environment/scenario and it looks 
like there is a noticeable/measurable difference in 3.0 on what is happening 
underneath org.apache.cassandra.cql3.statements.BatchStatement.execute in both 
JFR/JMC areas, Code and Memory (allocation rate / object churn).

E.g. for org.apache.cassandra.cql3.statements.BatchStatement.execute, while JFR 
reports a total TLAB allocation of 59,35 GB for the 1hr session on 2.1, it is 246,12 GB in 
Cassandra 3.0, so if this is trustworthy, a 4 times higher allocation rate in 
the BatchStatement.execute code path, which would explain the increased GC 
suspension since upgrading.

Is anybody aware of some kind of write-bound benchmarks of the storage engine 
in 3.0 in context of CPU/GC and not disk savings?

Thanks,
Thomas


From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
Sent: Freitag, 15. September 2017 13:51
To: user@cassandra.apache.org
Subject: RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Hi Jeff,

we are using native (CQL3) via Java DataStax driver (3.1). We also have 
OpsCenter running (to be removed soon) via Thrift, if I remember correctly.

As said, the write request latency for our keyspace hasn’t really changed, so 
perhaps another one (system related, OpsCenter …?) is affected or perhaps the 
JMX metric is reporting something differently now. ☺ So not a real issue for 
now hopefully, just popping up in our monitoring, wondering what this may be.

Regarding compression metadata memory usage drop. Right, storage engine 
re-write could be a reason. Thanks.

Still wondering about the GC/CPU increase.

Thanks!

Thomas



From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Freitag, 15. September 2017 13:14
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Most people find 3.0 slightly slower than 2.1. The only thing that really 
stands out in your email is the huge change in 95% latency - that's atypical. 
Are you using thrift or native (9042)?  The decrease in compression metadata 
offheap usage is likely due to the increased storage efficiency of the storage 
engine (see Cassandra-8099).


--
Jeff Jirsa


On Sep 15, 2017, at 2:37 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

we have a test (regression) environment hosted in AWS, which is used for auto 
deploying our software on a daily basis and attach constant load across all 
deployments. Basically to allow us to detect any regressions in our software on 
a daily basis.

On the Cassandra-side, this is single-node in AWS, m4.xlarge, EBS gp2, 8G heap, 
CMS. The environment has also been upgraded from Cassandra 2.1.18 to 3.0.14 at 
a certain point in time. Without running upgradesstables so far. We have not 
made any additional JVM/GC configuration change when going from 2.1.18 to 
3.0.14 on our own, thus, any self-made configuration changes (e.g. new gen heap 
size) for 2.1.18 are also in place with 3.0.14.

What we see after a time-frame of ~ 7 days (so, e.g. should not be caused by 
some sort of spiky compaction pattern) is an AVG increase in GC/CPU (most 
likely correlating):

· CPU: ~ 12% => ~ 17%

· GC Suspension: ~ 1,7% => 3,29%

In this environment not a big deal, but relatively we have a CPU increase of ~ 
50% (with increased GC most likely contributing). Something we have to deal with 
when going into production (going into larger, multi-node loadtest environments 
first though).

Beside the CPU/GC shift, we also monitor the following noticeable changes 
(don’t know if they somehow correlate with the CPU/GC shift above):

· Increased AVG Write Client Requests Latency (95th Percentile), 
org.apache.cassandra.metrics.ClientRequest.Latency.Write: 6,05ms => 29,2ms, but 
almost constant (no change in) write client request latency for our particular 
keyspace only, org.apache.cassandra.metrics.Keyspace.ruxitdb.WriteLatency

· Compression metadata memory usage drop, 
org.apache.cassandra.metrics.Keyspace.XXX. 
CompressionMetadataOffHeapMemoryUsed: ~218MB => ~105MB => Good or bad? Known?

I know, looks all a bit vague, but perhaps someone else has seen something 
similar when upgrading to 3.0.14 and can share their thoughts/ideas. Especially 
the (relative) CPU/GC increase is something we are curious about.

Thanks a lot.

Thomas

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread Steinmaurer, Thomas
Hi Alex,

I now ran nodetool repair --full -pr keyspace cfs on all nodes in parallel and 
this may pop up now:

0.176.38.128 (progress: 1%)
[2017-09-18 07:59:17,145] Some repair failed
[2017-09-18 07:59:17,151] Repair command #3 finished in 0 seconds
error: Repair job has failed with the error message: [2017-09-18 07:59:17,145] 
Some repair failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: 
[2017-09-18 07:59:17,145] Some repair failed
at 
org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
at 
org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)

2017-09-18 07:59:17 repair finished


If running the above nodetool call sequentially on all nodes, repair finishes 
without printing a stack trace.

The error message and stack trace aren’t really useful here. Any further 
ideas/experiences?

Thanks,
Thomas

From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
Sent: Freitag, 15. September 2017 11:30
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

Right, you should indeed add the "--full" flag to perform full repairs, and you 
can then keep the "-pr" flag.

I'd advise to monitor the status of your SSTables as you'll probably end up 
with a pool of SSTables marked as repaired, and another pool marked as 
unrepaired which won't be compacted together (hence the suggestion of running 
subrange repairs).
Use sstablemetadata to check on the "Repaired at" value for each. 0 means 
unrepaired and any other value (a timestamp) means the SSTable has been 
repaired.
I've had behaviors in the past where running "-pr" on the whole cluster would 
still not mark all SSTables as repaired, but I can't say if that behavior has 
changed in latest versions.

Having separate pools of SStables that cannot be compacted means that you might 
have tombstones that don't get evicted due to partitions living in both states 
(repaired/unrepaired).

To sum up the recommendations :
- Run a full repair with both "--full" and "-pr" and check that SSTables are 
properly marked as repaired
- Use a tight repair schedule to avoid keeping partitions for too long in both 
repaired and unrepaired state
- Switch to subrange repair if you want to fully avoid marking SSTables as 
repaired (which you don't need anyway since you're not using incremental 
repairs). If you wish to do this, you'll have to mark back all your sstables to 
unrepaired, using nodetool 
sstablerepairedset<https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSSTableRepairedSet.html>.

Cheers,

On Fri, Sep 15, 2017 at 10:27 AM Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hi Alex,

thanks a lot. Somehow missed that incremental repairs are the default now.

We have been happy with full repair so far, because the data for which we
currently invoke repairs manually is small (~1GB or even smaller).

So I guess with full repairs across all nodes, we can still stick with the
partition range (-pr) option, but with 3.0 we additionally have to provide the
--full option, right?

Thanks again,
Thomas

From: Alexander Dejanovski 
[mailto:a...@thelastpickle.com<mailto:a...@thelastpickle.com>]
Sent: Freitag, 15. September 2017 09:45
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

Hi Thomas,

in 2.1.18, the default repair mode was full repair while since 2.2 it is 
incremental repair.
So running "nodetool repair -pr" since your upgrade to 3.0.14 doesn't trigger 
the same operation.

Incremental repair cannot run on more than one node at a time on a cluster, 
because you risk to have conflicts with sessions trying to anticompact and run 
validation compactions on the same SSTables (which will make the validation 
phase fail, like your logs are showing).
Furthermore, you should never use "-pr" with incremental repair because it is 
useless in that mode, and won't properly perform anticompaction on all nodes.

If you were happy with full repairs in 2.1.18, I'd suggest to stick with those 
in 3.0.14 as well because there are still too many caveats with incremental 
repairs that should hopefully be fixed in 4.0+.
Note that full repair will also trigger anticompaction and mark SSTables

RE: Compaction in cassandra

2017-09-15 Thread Steinmaurer, Thomas
Hi,

usually automatic minor compactions are fine, but you may need much more free 
disk space to reclaim disk space via automatic minor compactions, especially in 
a time series use case with size-tiered compaction strategy (possibly with 
leveled as well, I’m not familiar with this strategy type). We are in the time 
series / STCS combination and currently plan to run a major compaction every X 
weeks. Although not perfect, this is currently our only way to effectively 
get rid of out-dated data on disk without needing a lot of additional storage, 
because it takes a long time until the delete markers (tombstones) written 
according to our retention policy actually get automatically minor compacted 
together with the potentially large SSTables. Mind you, with 
pre 2.2, a major compaction results in a single (large) SSTable again, so the 
whole disk usage troubles start again. With 2.2+ there is an option to end up 
with SSTables of 50%, 25% etc. of the size per column family / table, so this 
might be useful.
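
(For reference, that looks like this; keyspace/table names are placeholders:)

# major compaction; with 2.2+ the -s/--split-output flag avoids ending up with
# one single giant SSTable
nodetool compact -s my_ks my_cf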

If you have a time series use case you may want to look at the new time window 
compaction strategy introduced in 3.0, but it relies on TTL-based time series 
data only. We tested it and it works great, but unfortunately we can’t use it, 
because we may have different TTL/retention policies in a single column family, 
even varying retention configurations per customer over time, so TWCS is not 
really an option for us.

Thomas

From: Akshit Jain [mailto:akshit13...@iiitd.ac.in]
Sent: Donnerstag, 14. September 2017 08:50
To: user@cassandra.apache.org
Subject: Compaction in cassandra

Is it helpful to run nodetool compaction in cassandra?
or automatic compaction is just fine.
Regards



RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-15 Thread Steinmaurer, Thomas
Hi Jeff,

we are using native (CQL3) via Java DataStax driver (3.1). We also have 
OpsCenter running (to be removed soon) via Thrift, if I remember correctly.

As said, the write request latency for our keyspace hasn’t really changed, so 
perhaps another one (system related, OpsCenter …?) is affected or perhaps the 
JMX metric is reporting something differently now. ☺ So not a real issue for 
now hopefully, just popping up in our monitoring, wondering what this may be.

Regarding compression metadata memory usage drop. Right, storage engine 
re-write could be a reason. Thanks.

Still wondering about the GC/CPU increase.

Thanks!

Thomas



From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Freitag, 15. September 2017 13:14
To: user@cassandra.apache.org
Subject: Re: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

Most people find 3.0 slightly slower than 2.1. The only thing that really 
stands out in your email is the huge change in 95% latency - that's atypical. 
Are you using thrift or native (9042)?  The decrease in compression metadata 
offheap usage is likely due to the increased storage efficiency of the storage 
engine (see Cassandra-8099).


--
Jeff Jirsa


On Sep 15, 2017, at 2:37 AM, Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

we have a test (regression) environment hosted in AWS, which is used for auto 
deploying our software on a daily basis and attach constant load across all 
deployments. Basically to allow us to detect any regressions in our software on 
a daily basis.

On the Cassandra-side, this is single-node in AWS, m4.xlarge, EBS gp2, 8G heap, 
CMS. The environment has also been upgraded from Cassandra 2.1.18 to 3.0.14 at 
a certain point in time. Without running upgradesstables so far. We have not 
made any additional JVM/GC configuration change when going from 2.1.18 to 
3.0.14 on our own, thus, any self-made configuration changes (e.g. new gen heap 
size) for 2.1.18 are also in place with 3.0.14.

What we see after a time-frame of ~ 7 days (so, e.g. should not be caused by 
some sort of spiky compaction pattern) is an AVG increase in GC/CPU (most 
likely correlating):

· CPU: ~ 12% => ~ 17%

· GC Suspension: ~ 1,7% => 3,29%

In this environment not a big deal, but relatively we have a CPU increase of ~ 
50% (with increased GC most likely contributing). Something we have to deal with 
when going into production (going into larger, multi-node loadtest environments 
first though).

Beside the CPU/GC shift, we also monitor the following noticeable changes 
(don’t know if they somehow correlate with the CPU/GC shift above):

· Increased AVG Write Client Requests Latency (95th Percentile), 
org.apache.cassandra.metrics.ClientRequest.Latency.Write: 6,05ms => 29,2ms, but 
almost constant (no change in) write client request latency for our particular 
keyspace only, org.apache.cassandra.metrics.Keyspace.ruxitdb.WriteLatency

· Compression metadata memory usage drop, 
org.apache.cassandra.metrics.Keyspace.XXX. 
CompressionMetadataOffHeapMemoryUsed: ~218MB => ~105MB => Good or bad? Known?

I know, looks all a bit vague, but perhaps someone else has seen something 
similar when upgrading to 3.0.14 and can share their thoughts/ideas. Especially 
the (relative) CPU/GC increase is something we are curious about.

Thanks a lot.

Thomas


RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
Alex,

thanks again! We will switch back to the 2.1 behavior for now.

Thomas

From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
Sent: Freitag, 15. September 2017 11:30
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

Right, you should indeed add the "--full" flag to perform full repairs, and you 
can then keep the "-pr" flag.

I'd advise to monitor the status of your SSTables as you'll probably end up 
with a pool of SSTables marked as repaired, and another pool marked as 
unrepaired which won't be compacted together (hence the suggestion of running 
subrange repairs).
Use sstablemetadata to check on the "Repaired at" value for each. 0 means 
unrepaired and any other value (a timestamp) means the SSTable has been 
repaired.
I've had behaviors in the past where running "-pr" on the whole cluster would 
still not mark all SSTables as repaired, but I can't say if that behavior has 
changed in latest versions.

Having separate pools of SStables that cannot be compacted means that you might 
have tombstones that don't get evicted due to partitions living in both states 
(repaired/unrepaired).

To sum up the recommendations :
- Run a full repair with both "--full" and "-pr" and check that SSTables are 
properly marked as repaired
- Use a tight repair schedule to avoid keeping partitions for too long in both 
repaired and unrepaired state
- Switch to subrange repair if you want to fully avoid marking SSTables as 
repaired (which you don't need anyway since you're not using incremental 
repairs). If you wish to do this, you'll have to mark back all your sstables to 
unrepaired, using nodetool 
sstablerepairedset<https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSSTableRepairedSet.html>.

Cheers,

On Fri, Sep 15, 2017 at 10:27 AM Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hi Alex,

thanks a lot. Somehow missed that incremental repairs are the default now.

We have been happy with full repair so far, because the data for which we
currently invoke repairs manually is small (~1GB or even smaller).

So I guess with full repairs across all nodes, we can still stick with the
partition range (-pr) option, but with 3.0 we additionally have to provide the
--full option, right?

Thanks again,
Thomas

From: Alexander Dejanovski 
[mailto:a...@thelastpickle.com<mailto:a...@thelastpickle.com>]
Sent: Freitag, 15. September 2017 09:45
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

Hi Thomas,

in 2.1.18, the default repair mode was full repair while since 2.2 it is 
incremental repair.
So running "nodetool repair -pr" since your upgrade to 3.0.14 doesn't trigger 
the same operation.

Incremental repair cannot run on more than one node at a time on a cluster, 
because you risk to have conflicts with sessions trying to anticompact and run 
validation compactions on the same SSTables (which will make the validation 
phase fail, like your logs are showing).
Furthermore, you should never use "-pr" with incremental repair because it is 
useless in that mode, and won't properly perform anticompaction on all nodes.

If you were happy with full repairs in 2.1.18, I'd suggest to stick with those 
in 3.0.14 as well because there are still too many caveats with incremental 
repairs that should hopefully be fixed in 4.0+.
Note that full repair will also trigger anticompaction and mark SSTables as 
repaired in your release of Cassandra, and only full subrange repairs are the 
only flavor that will skip anticompaction.

You will need some tooling to help with subrange repairs though, and I'd 
recommend to use Reaper which handles automation for you : 
http://cassandra-reaper.io/

If you decide to stick with incremental repairs, first perform a rolling 
restart of your cluster to make sure no repair session still runs, and run 
"nodetool repair" on a single node at a time. Move on to the next node only 
when nodetool or the logs show that repair is over (which will include the 
anticompaction phase).

Cheers,



On Fri, Sep 15, 2017 at 8:42 AM Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

we are currently in the process of upgrading from 2.1.18 to 3.0.14. After 
upgrading a few test environments, we start to see some suspicious log entries 
regarding repair issues.

We have a cron job on all nodes basically executing the following repair call 
on a daily basis:

nodetool repair -pr 

This gets started on all nodes at the same time. While this has worked with 
2.1.18 (at least we haven’t seen anything suspicious in Cassandra log), with 
3.0.14 we get something like the following on all nodes (see below; IP addresses 
and KS/CF faked).

An

GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)

2017-09-15 Thread Steinmaurer, Thomas
Hello,

we have a test (regression) environment hosted in AWS, which is used for auto 
deploying our software on a daily basis and attach constant load across all 
deployments. Basically to allow us to detect any regressions in our software on 
a daily basis.

On the Cassandra-side, this is single-node in AWS, m4.xlarge, EBS gp2, 8G heap, 
CMS. The environment has also been upgraded from Cassandra 2.1.18 to 3.0.14 at 
a certain point in time. Without running upgradesstables so far. We have not 
made any additional JVM/GC configuration change when going from 2.1.18 to 
3.0.14 on our own, thus, any self-made configuration changes (e.g. new gen heap 
size) for 2.1.18 are also in place with 3.0.14.

What we see after a time-frame of ~ 7 days (so, e.g. should not be caused by 
some sort of spiky compaction pattern) is an AVG increase in GC/CPU (most 
likely correlating):

* CPU: ~ 12% => ~ 17%

* GC Suspension: ~ 1,7% => 3,29%

In this environment not a big deal, but relatively we have a CPU increase of ~ 
50% (with increased GC most likely contributing). Something we have to deal with 
when going into production (going into larger, multi-node loadtest environments 
first though).

Beside the CPU/GC shift, we also monitor the following noticeable changes 
(don't know if they somehow correlate with the CPU/GC shift above):

* Increased AVG Write Client Requests Latency (95th Percentile), 
org.apache.cassandra.metrics.ClientRequest.Latency.Write: 6,05ms => 29,2ms, but 
almost constant (no change in) write client request latency for our particular 
keyspace only, org.apache.cassandra.metrics.Keyspace.ruxitdb.WriteLatency

* Compression metadata memory usage drop, 
org.apache.cassandra.metrics.Keyspace.XXX. 
CompressionMetadataOffHeapMemoryUsed: ~218MB => ~105MB => Good or bad? Known?

I know, looks all a bit vague, but perhaps someone else has seen something 
similar when upgrading to 3.0.14 and can share their thoughts/ideas. Especially 
the (relative) CPU/GC increase is something we are curious about.

Thanks a lot.

Thomas


RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
Hi Alex,

thanks a lot. Somehow missed that incremental repairs are the default now.

We have been happy with full repair so far, because the data for which we
currently invoke repairs manually is small (~1GB or even smaller).

So I guess with full repairs across all nodes, we can still stick with the
partition range (-pr) option, but with 3.0 we additionally have to provide the
--full option, right?

Thanks again,
Thomas

From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
Sent: Freitag, 15. September 2017 09:45
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

Hi Thomas,

in 2.1.18, the default repair mode was full repair while since 2.2 it is 
incremental repair.
So running "nodetool repair -pr" since your upgrade to 3.0.14 doesn't trigger 
the same operation.

Incremental repair cannot run on more than one node at a time on a cluster, 
because you risk to have conflicts with sessions trying to anticompact and run 
validation compactions on the same SSTables (which will make the validation 
phase fail, like your logs are showing).
Furthermore, you should never use "-pr" with incremental repair because it is 
useless in that mode, and won't properly perform anticompaction on all nodes.

If you were happy with full repairs in 2.1.18, I'd suggest to stick with those 
in 3.0.14 as well because there are still too many caveats with incremental 
repairs that should hopefully be fixed in 4.0+.
Note that full repair will also trigger anticompaction and mark SSTables as 
repaired in your release of Cassandra, and only full subrange repairs are the 
only flavor that will skip anticompaction.

You will need some tooling to help with subrange repairs though, and I'd 
recommend to use Reaper which handles automation for you : 
http://cassandra-reaper.io/

If you decide to stick with incremental repairs, first perform a rolling 
restart of your cluster to make sure no repair session still runs, and run 
"nodetool repair" on a single node at a time. Move on to the next node only 
when nodetool or the logs show that repair is over (which will include the 
anticompaction phase).

Cheers,



On Fri, Sep 15, 2017 at 8:42 AM Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com<mailto:thomas.steinmau...@dynatrace.com>> 
wrote:
Hello,

we are currently in the process of upgrading from 2.1.18 to 3.0.14. After 
upgrading a few test environments, we start to see some suspicious log entries 
regarding repair issues.

We have a cron job on all nodes basically executing the following repair call 
on a daily basis:

nodetool repair -pr 

This gets started on all nodes at the same time. While this has worked with 
2.1.18 (at least we haven’t seen anything suspicious in Cassandra log), with 
3.0.14 we get something like the following on all nodes (see below; IP addresses 
and KS/CF faked).

Any pointers are appreciated. Thanks.
Thomas


INFO  [Thread-2941] 2017-09-15 03:00:28,036 RepairSession.java:224 - [repair 
#071f81e0-99c2-11e7-91dc-6132f5fe5fb0] new session: will sync /FAKE.33.64, 
/FAKE.35.153, /FAKE.34.171 on range [(8195393703879512303,8196334842725538685], 
(8166975326273137878,8182604850967732931], 
(-7246799942440641887,-7227869626613009045], 
(-8371707510273823988,-8365977215604569699], 
(-141862581573028594,-140310864869418908], 
(3732113975108886193,3743105867152786342], 
(4998127507903069087,5008922734235607550], 
(-5115827291264930140,-5111054924035590372], 
(-2475342271852943287,-2447285553369030332], 
(-8318606053827235336,-8308721754886697230], 
(-5208900659917654871,-5202385837264015269], 
(6618737991399272130,6623100721269775102], 
(-4650650128572424858,-4650260492494258461], 
(1886545362164970333,1886646959491599822], 
(-4511817721998311568,-4507491187192881115], 
(8114903118676615937,8132992506844206601], 
(6224957219376301858,6304379125732293904], 
(-3460547504877234383,-3459262416082517136], 
(-167838948111369123,-141862581573028594], 
(481579232521229473,491242114841289497], 
(4052464144722307684,4059745901618136723], 
(1659668187498418295,1679582585970705122], 
(-1118922763210109192,-1093766915505652874], 
(7504365235878319341,752615210185292], 
(-79866884352549492,-77667207866300333], 
(8151204058820798561,8154760186218662205], 
(-1040398370287131739,-1033770179677543189], 
(3767057277953758442,3783780844370292025], 
(-6491678058233994892,-6487797181789288329], 
(-916868210769480248,-907141794196269524], 
(-9005441616028750657,-9002220258513351832], 
(8183526518331102304,8186908810225025483], 
(-5685737903527826627,-5672136154194382932], 
(4976122621177738811,4987871287137312689], 
(6051670147160447042,6051686987147911650], 
(-1161640137086921883,-1159172734746043158], 
(6895951547735922309,6899152466544114890], 
(-3357667382515377172,-3356304907368646189], 
(-5370953856683870319,-5345971445444542485], 
(3824272999898372667,3829315045986248983], 
(8132992506844206601,8149858096109302285],

Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
Hello,

we are currently in the process of upgrading from 2.1.18 to 3.0.14. After 
upgrading a few test environments, we start to see some suspicious log entries 
regarding repair issues.

We have a cron job on all nodes basically executing the following repair call 
on a daily basis:

nodetool repair -pr 

This gets started on all nodes at the same time. While this has worked with 
2.1.18 (at least we haven't seen anything suspicious in Cassandra log), with 
3.0.14 we get something like the following on all nodes (see below; IP addresses 
and KS/CF faked).

Any pointers are appreciated. Thanks.
Thomas


INFO  [Thread-2941] 2017-09-15 03:00:28,036 RepairSession.java:224 - [repair 
#071f81e0-99c2-11e7-91dc-6132f5fe5fb0] new session: will sync /FAKE.33.64, 
/FAKE.35.153, /FAKE.34.171 on range [(8195393703879512303,8196334842725538685], 
(8166975326273137878,8182604850967732931], 
(-7246799942440641887,-7227869626613009045], 
(-8371707510273823988,-8365977215604569699], 
(-141862581573028594,-140310864869418908], 
(3732113975108886193,3743105867152786342], 
(4998127507903069087,5008922734235607550], 
(-5115827291264930140,-5111054924035590372], 
(-2475342271852943287,-2447285553369030332], 
(-8318606053827235336,-8308721754886697230], 
(-5208900659917654871,-5202385837264015269], 
(6618737991399272130,6623100721269775102], 
(-4650650128572424858,-4650260492494258461], 
(1886545362164970333,1886646959491599822], 
(-4511817721998311568,-4507491187192881115], 
(8114903118676615937,8132992506844206601], 
(6224957219376301858,6304379125732293904], 
(-3460547504877234383,-3459262416082517136], 
(-167838948111369123,-141862581573028594], 
(481579232521229473,491242114841289497], 
(4052464144722307684,4059745901618136723], 
(1659668187498418295,1679582585970705122], 
(-1118922763210109192,-1093766915505652874], 
(7504365235878319341,752615210185292], 
(-79866884352549492,-77667207866300333], 
(8151204058820798561,8154760186218662205], 
(-1040398370287131739,-1033770179677543189], 
(3767057277953758442,3783780844370292025], 
(-6491678058233994892,-6487797181789288329], 
(-916868210769480248,-907141794196269524], 
(-9005441616028750657,-9002220258513351832], 
(8183526518331102304,8186908810225025483], 
(-5685737903527826627,-5672136154194382932], 
(4976122621177738811,4987871287137312689], 
(6051670147160447042,6051686987147911650], 
(-1161640137086921883,-1159172734746043158], 
(6895951547735922309,6899152466544114890], 
(-3357667382515377172,-3356304907368646189], 
(-5370953856683870319,-5345971445444542485], 
(3824272999898372667,3829315045986248983], 
(8132992506844206601,8149858096109302285], 
(3975126143101303723,3980729378827590597], 
(-956691623200349709,-946602525018301692], 
(-82499927325251331,-79866884352549492], 
(3952144214544622998,3955602392726495936], 
(8154760186218662205,8157079055586089583], 
(3840595196718778916,3866458971850198755], 
(-1066905024007783341,-1055954824488508260], 
(-7252356975874511782,-7246799942440641887], 
(-810612946397276081,-792189809286829222], 
(4964519403172053705,4970446606512414858], 
(-5380038118840759647,-5370953856683870319], 
(-3221630728515706463,-3206856875356976885], 
(-1193448110686154165,-1161640137086921883], 
(-3356304907368646189,-3346460884208327912], 
(3466596314109623830,346814432669172], 
(-9050241313548454460,-9005441616028750657], 
(402227699082311580,407458511300218383]] for XXX.[YYY, ZZZ]
INFO  [Repair#1:1] 2017-09-15 03:00:28,419 RepairJob.java:172 - [repair 
#071f81e0-99c2-11e7-91dc-6132f5fe5fb0] Requesting merkle trees for YYY (to 
[/FAKE.35.153, /FAKE.34.171, /FAKE.33.64])
INFO  [Thread-2941] 2017-09-15 03:00:28,434 RepairSession.java:224 - [repair 
#075d2720-99c2-11e7-91dc-6132f5fe5fb0] new session: will sync /FAKE.33.64, 
/FAKE.35.57, /FAKE.34.171 on range 
[(-5410955131843184047,-5390722609201388849], 
(-2429793939970389370,-2402273315769352748], 
(8085575576842594575,8086965740279021106], 
(-8802193901675845653,-8790472027607832351], 
(-3900412470120874591,-3892641480459306647], 
(5455804264750818305,5465037357825542970], 
(4930767198829659527,4939587074207662799], 
(8086965740279021106,8087442741329154201], 
(-8933201045321260661,-8926445549049070674], 
(-4841328524165418854,-4838895482794593338], 
(628107265570603622,682509946926464280], 
(7043245467621414187,7055126022831789025], 
(624871765540463735,627374995781897409], 
(9219228482330263660,9221294940422311559], 
(-2335215188301493066,-2315034243278984017], 
(-6216599212198827632,-6211460136507414133], 
(-3276490559558850323,-3273110814046238767], 
(7204991007334459472,7214826985711309418], 
(1815809811279373566,1846961604192445001], 
(8743912118048160970,8751518028513315549], 
(-9204701745739426439,-9200185935622985719], 
(7926527126882050773,7941554683778488797], 
(-1307707180308444994,-1274682085495751899], 
(8354147540115782875,8358523989614737607], 
(-5418282332713406631,-541509309282099], 
(2436459402559272117,2441988676982099299], 

Questions on time series use case, tombstones, TWCS

2017-08-09 Thread Steinmaurer, Thomas
Hello,

our top contributor from a data volume perspective is time series data. We are 
running with STCS since our initial production deployment in 2014 with several 
clusters with a varying number of nodes, but currently with max. 9 nodes per 
single cluster per different region in AWS with m4.xlarge / EBS gp2 storage. We 
have gone through a range of Cassandra versions, starting with 1.2 and now 
running DSC 2.1.15, soon to be replaced by Apache Cassandra 2.1.18 across all 
deployments.
Lately we switched from Thrift (Astyanax) to Native/CQL (DataStax driver). 
Overall we are pretty happy with stability and the scale out offering.

We store time series data in different resolutions, from 1min up to 1day 
aggregates per "time slot". Each resolution has its own column family / table 
and a periodic worker is executing our business logic regarding time series 
aging from e.g. 1min => 5min => ... resolution + deletion in source resolutions 
according to our retention per resolution policy. So deletions will happen way 
later (e.g. at least > 14d). We don't use TTLs on written time series data (in 
production, see TWCS testing below), so purging is exclusively handled by 
explicit DELETEs in our aging business logic creating tombstones.

Naturally with STCS and late explicit deletions / tombstones, it takes a 
long time to finally reclaim disk space; even worse, we are now running a 
major compaction every X weeks. We are currently also testing with STCS 
min_threshold = 2 etc., but all in all, this does not feel like a long-term 
solution. I guess there is nothing else we are missing from a 
configuration/setting side with STCS? Single SSTable compaction might not kick 
in either, because, checking with sstablemetadata, the estimated droppable 
tombstones value for our time series based SSTables is pretty much 0.0 all the 
time. I guess because we don't write with TTL?
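
For what it's worth, this is roughly how we check, and what we are
experimenting with on the STCS side (names, paths and thresholds are
illustrative):

# per-SSTable estimate of droppable tombstones
sstablemetadata /var/lib/cassandra/data/my_ks/ts_raw-*/*-Data.db | grep "Estimated droppable tombstones"
# make single-SSTable tombstone compactions more aggressive
cqlsh -e "ALTER TABLE my_ks.ts_raw WITH
          compaction = {'class': 'SizeTieredCompactionStrategy',
                        'min_threshold': '2',
                        'unchecked_tombstone_compaction': 'true',
                        'tombstone_threshold': '0.2'};"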

TWCS caught my eye in 2015 I think, and even more at the Cassandra Summit 2016 
+ other Tombstone related talks. Cassandra 3.0 is around 6 months ahead for us, 
thus initial testing was with 2.1.18 patched with TWCS from GitHub.

Looks like TWCS is exactly what we need, thus test-wise, once we start writing 
with TTL we end up with a single SSTable per passed window size and data 
(SSTables) older than TTL + grace get automatically removed from disk. Even 
with out-of-order DELETEs from our business logic, purging of SSTables 
does not seem to get stuck. Not sure if this is expected. Writing with TTL is also a 
bit problematic, in case our retention policy changes in general or for 
particular customers.
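
For reference, what we tested looks roughly like the following (names/values
are illustrative; with the 2.1 backport from GitHub the compaction class name
differs from the TimeWindowCompactionStrategy that ships with newer releases):

# one SSTable per daily window, rows expiring after 14 days
cqlsh -e "ALTER TABLE my_ks.ts_1min WITH
          compaction = {'class': 'TimeWindowCompactionStrategy',
                        'compaction_window_unit': 'DAYS',
                        'compaction_window_size': '1'}
          AND default_time_to_live = 1209600;"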

A few questions, as we need some short-term (with C* 2.1) and long-term (with 
C* 3.0) mitigation:

* With STCS, the estimated droppable tombstones value is always 0.0 (so 
automatic single-SSTable compactions may also never happen): Is this a matter of not 
writing with TTL? If yes, would enabling TTL with STCS improve the disk reclaim 
situation, because then single-SSTable compactions would kick in?

* What is the semantic of "default_time_to_live" at table level? From: 
http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html : "After 
the default_time_to_live TTL value has been exceed, Cassandra tombstones the 
entire table". What does "entire table" mean? Hopefully / I guess I don't end 
up with an empty table every X past TTLs?

* Anything else I'm missing regarding STCS and reclaiming disk space 
earlier in our TS use case?

* I know, changing compaction is a matter of executing ALTER TABLE (or 
temporarily via JMX for a single node), but as we have legacy data being written 
without TTL, I wonder if we may end up with stuck SSTables again

* In case of stuck SSTables with any compaction strategy, what is the 
best way to debug/analyze why it got stuck (overlapping etc.)?

Thanks a lot and sorry for the lengthy email.

Thomas