Hi There -
I have noticed an issue where I consistently see high p999 read latency on
a node for a few hours after replacing it. Before the replacement, the
p999 read latency is ~30ms; afterward it jumps to 1-5s. I am running
C* 3.11.2 in EC2.
I am testing out using EBS snapshots of
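(One note on EBS snapshots: blocks on a volume restored from a snapshot
are lazily loaded from S3 on first read, which commonly explains elevated
read latency until the volume has been warmed. AWS suggests initializing
the volume by reading every block once; a rough sketch, where the device
path is an assumption:)

```
# Read every block once to force-load snapshot data from S3
# (device path /dev/xvdf is a placeholder)
sudo fio --filename=/dev/xvdf --rw=read --bs=128k --iodepth=32 \
    --ioengine=libaio --direct=1 --name=volume-initialize
```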
Hi Guys -
I recently ran into a problem (for the 2nd time) where my nodejs app for
some reason refuses to connect to one node in my C* cluster. I noticed that
in both cases, the node that was not receiving any client connections had
the same private IP as another node in the cluster, but in a dif
Hello -
I have a 48-node C* cluster spread across 4 AWS regions with RF=3. A few
months ago I started noticing disk usage on some nodes increasing
steadily. At first I solved the problem by destroying and rebuilding the
nodes, but the problem keeps returning.
I did some more investigation recent
etion_info" : {
"local_delete_time" : "2019-01-22T17:59:35Z" }
}
]
}
]
}
```
As expected, almost all of the data except this one suspicious partition
has a TTL and is already expired. But if a partition isn't expired and I
see it in t
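(For reference, the JSON fragment above looks like sstabledump output; a
rough way to pull deletion metadata out of a suspicious sstable, with the
data file path as a placeholder:)

```
# Dump an sstable as JSON and look for tombstone / TTL metadata
# (file path is a placeholder)
sstabledump /var/lib/cassandra/data/my_ks/my_cf-*/mc-1234-big-Data.db \
    | grep -B 2 -A 2 'deletion_info'
```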
> thelastpickle.com/blog/2016/12/08/TWCS-part1.html - the sections
> towards the bottom of this post may well explain why the sstable is not
> being deleted.
>
> Thanks
>
> Paul
> www.redshots.com
>
> On 2 May 2019, at 16:08, Mike Torra wrote:
>
> I'm pretty stumped by this
when you did the major compaction.
>
> This would happen on all replicas of the data, hence the reason you see
> this problem on 3 nodes.
>
> Thanks
>
> Paul
> www.redshots.com
>
> On 3 May 2019, at 15:35, Mike Torra wrote:
>
> This does indeed seem to be a problem of o
at
> effectively blocks all other expiring cells from being purged.
>
> --
> Jeff Jirsa
>
>
> On May 3, 2019, at 7:57 PM, Nick Hatfield wrote:
>
> Hi Mike,
>
>
>
> If you will, share your compaction settings. More than likely, your issue
> is from one of
properties, you’ll compact
> away most of the other data in those old sstables (but not the partition
> that’s been manually updated)
>
> Also table level TTLs help catch this type of manual manipulation -
> consider adding it if appropriate.
>
> --
> Jeff Jirsa
>
>
&
Hi All -
I am trying to bootstrap a replacement node in a cluster, but it consistently
fails to bootstrap because of OOM exceptions. For almost a week I've been going
through cycles of bootstrapping, finding errors, then restarting / resuming
bootstrap, and I am struggling to move forward. Some
Date: Wednesday, November 2, 2016 at 1:07 PM
To: "user@cassandra.apache.org"
Subject: Re: failing bootstraps with OOM
On Wed, Nov 2, 2016 at 3:
Hi There -
I recently upgraded from cassandra 3.5 to 3.9 (DDC), and I noticed that the
"new" jvm metrics are reporting with an extra '.' character in them. Here is a
snippet of what I see from one of my nodes:
ubuntu@ip-10-0-2-163:~$ sudo tcpdump -i eth0 -v dst port 2003 -A | grep 'jvm'
tcpdu
Just bumping - has anyone seen this before?
http://stackoverflow.com/questions/41446352/cassandra-3-9-jvm-metrics-have-bad-name
From: Mike Torra <mto...@demandware.com>
Reply-To: "user@cassandra.apache.org"
We currently use redis to store sorted sets that we increment many, many times
more than we read. For example, only about 5% of these sets are ever read. We
are getting to the point where redis is becoming difficult to scale (currently
at >20 nodes).
We've started using cassandra for other things
e to
sort them by score for you, so you will have to load the complete set into
redis for caching and/or do the sorting in your app on demand. This
certainly won't work out well with sets containing millions of entries.
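(One way to model increment-heavy sets in Cassandra is a counter table
keyed by set and member, with the score sorting done client-side at read
time; a minimal sketch, where all names are placeholders:)

```
# Counter table: one row per (set, member). Increments are cheap,
# but reads return members unsorted by score, so sort client-side.
cqlsh -e "
CREATE TABLE IF NOT EXISTS my_ks.sorted_sets (
    set_id text,
    member text,
    score  counter,
    PRIMARY KEY (set_id, member)
);
UPDATE my_ks.sorted_sets SET score = score + 1
WHERE set_id = 'page_views' AND member = 'item_42';"
```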
2017-01-13 23:14 GMT+01:00 Mike Torra <mto...@demandware.com>:
Hi there -
Cluster info:
C* 3.9, replicated across 4 EC2 regions (us-east-1, us-west-2, eu-west-1,
ap-southeast-1), c4.4xlarge
Around the same time every day (~7-8am EST), two DCs (eu-west-1 and
ap-southeast-1) in our cluster start experiencing a high number of timeouts
(Connection.TotalTimeouts
I can't say that I have tried that while the issue is going on, but I have
done such rolling restarts for sure, and the timeouts still occur every
day. What would a rolling restart do to fix the issue?
In fact, as I write this, I am restarting each node one by one in the
eu-west-1 datacenter, and
I'm trying to change compaction strategy one node at a time. I'm using
jmxterm like this:

```
echo 'set -b org.apache.cassandra.db:type=ColumnFamilies,keyspace=my_ks,columnfamily=my_cf CompactionParametersJson \{"class":"TimeWindowCompactionStrategy","compaction_window_unit":"HOURS","compaction_windo
```
Is there a way to tell when/if the local node has successfully updated the
compaction strategy? Looking at the sstable files, it seems like they are
still based on STCS, but I don't know how to be sure.
Appreciate any tips or suggestions!
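(One way to check is to read the attribute back with jmxterm's `get`; the
jar name and JMX port below are assumptions:)

```
# Read the current compaction parameters back from JMX
# (jar path and port 7199 are assumptions)
echo 'get -b org.apache.cassandra.db:type=ColumnFamilies,keyspace=my_ks,columnfamily=my_cf CompactionParametersJson' \
    | java -jar jmxterm-1.0.2-uber.jar -l localhost:7199 -n
```

Note that changing the strategy does not rewrite existing sstables; they
are only reorganized by subsequent compactions, so the files on disk can
look STCS-shaped for a while after the change takes effect.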
On Mon, Mar 13, 2017 at 5:30 PM, Mike Torra wrote:
>
I'm trying to use sstableloader to bulk load some data to my 4 DC cluster,
and I can't quite get it to work. Here is how I'm trying to run it:
sstableloader -d 127.0.0.1 -i {csv list of private ips of nodes in cluster}
myks/mttest
At first this seems to work, with a steady stream of logging like
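(A note on the flags: in sstableloader, -d lists the initial contact
points, while -i lists nodes to ignore when streaming, so passing every
node's IP to -i would prevent streaming to them. A minimal invocation is
usually just the contact points plus the table's data directory, roughly:)

```
# Stream the sstables in myks/mttest into the cluster
# (IPs and path are placeholders)
sstableloader -d 10.0.0.1,10.0.0.2 /path/to/myks/mttest
```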
Hi -
I am running a 29-node cluster spread over 4 DCs in EC2, using C* 3.11.1
on Ubuntu. Occasionally I need to restart nodes in the cluster, but
every time I do, I see errors and application (nodejs) timeouts.
I restart a node like this:
nodetool disablethrift && nodetool disablegossip
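(For context, the commonly cited drain-and-restart sequence looks roughly
like this; the service name is an assumption, and the step ordering is
exactly what the rest of this thread goes on to discuss:)

```
# A commonly cited shutdown order (service name is an assumption)
nodetool disablebinary     # stop native-protocol clients
nodetool disablethrift     # stop thrift clients
nodetool disablegossip     # stop gossiping with the ring
nodetool drain             # flush memtables, stop accepting writes
sudo service cassandra restart
```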
g nodes easier (or rather, we need to make drain do
> the right thing), but in this case, your data model looks like the biggest
> culprit (unless it's an incomplete recreation).
>
> - Jeff
>
>
> On Tue, Feb 6, 2018 at 10:58 AM, Mike Torra wrote:
>
>> Hi -
>
No, I am not
On Wed, Feb 7, 2018 at 11:35 AM, Jeff Jirsa wrote:
> Are you using internode ssl?
>
>
> --
> Jeff Jirsa
>
>
> On Feb 7, 2018, at 8:24 AM, Mike Torra wrote:
>
> Thanks for the feedback guys. That example data model was indeed
> abbreviated - the re
Any other ideas? If I simply stop the node, there is no latency problem,
but once I start the node the problem appears. This happens consistently
for all nodes in the cluster.
On Wed, Feb 7, 2018 at 11:36 AM, Mike Torra wrote:
> No, I am not
>
> On Wed, Feb 7, 2018 at 11:35 AM, Jeff Jir
s that I moved
`nodetool disablegossip` to after `nodetool drain`. This is pretty
anecdotal, but is there any explanation for why this might happen? I'll be
monitoring my cluster closely to see if this change does indeed fix the
problem.
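(For concreteness, the reordered sequence described above would look
roughly like this; the service name is an assumption:)

```
# Revised order: drain before disabling gossip
nodetool disablebinary
nodetool disablethrift
nodetool drain             # flush memtables, stop accepting connections
nodetool disablegossip     # moved to after drain
sudo service cassandra restart
```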
On Mon, Feb 12, 2018 at 9:33 AM, Mike Torra wrote:
>
Then could it be that calling `nodetool drain` after calling `nodetool
disablegossip` is what causes the problem?
On Mon, Feb 12, 2018 at 6:12 PM, kurt greaves wrote:
>
> Actually, it's not really clear to me why disablebinary and thrift are
> necessary prior to drain, because they happen in th