Thank you, Jeff, for the link.
Please do comment on the G1GC settings, and whether they are OK for the cluster.
Also, please comment on reducing the concurrent reads to 32 on all nodes in the
cluster, as this has earlier led to reads getting dropped.
Will adding nodes to the cluster be helpful?
Thanks,
Rajsekhar Mallick
https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
--
Jeff Jirsa
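For reference, the paging pattern described in the linked driver docs can be sketched generically like this. `fetch_page` below is a stand-in for a real driver call (the actual drivers return an opaque paging-state token), so the names and page contents are illustrative only:

```python
def fetch_page(query, page_size, paging_state=None):
    # Stand-in for a driver call that returns (rows, next_paging_state).
    # Real drivers return an opaque token; None means no more pages.
    data = list(range(10))  # pretend this is the full result set
    start = paging_state or 0
    rows = data[start:start + page_size]
    next_state = start + page_size if start + page_size < len(data) else None
    return rows, next_state

def iter_all_rows(query, page_size=4):
    # Stream results page by page so a large result set is never
    # held in memory (or on the coordinator) all at once.
    state = None
    while True:
        rows, state = fetch_page(query, page_size, state)
        yield from rows
        if state is None:
            break

all_rows = list(iter_all_rows("SELECT * FROM tbl"))
```

With the real driver the loop has the same shape: set a fetch size on the statement and let the driver pull pages as you iterate.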
On Feb 5, 2019, at 11:33 PM, Rajsekhar Mallick wrote:
Hello Jeff,
Thanks for the reply.
We do have GC logs enabled.
We do observe GC pauses of up to 2 seconds, but quite often we see this issue
even when the GC log reads clean.
JVM flags related to G1GC:
Xms: 48G
Xmx: 48G
MaxGCPauseMillis: 200
ParallelGCThreads: 32
ConcGCThreads: 10
Ini
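For what it's worth, assuming those names map to the standard HotSpot flags, the equivalent jvm.options entries would look like the sketch below (the `-XX:+UseG1GC` line is inferred from "related to G1GC", and the truncated last setting is omitted):

```
-Xms48G
-Xmx48G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:ParallelGCThreads=32
-XX:ConcGCThreads=10
```

A 48G heap is on the large side for G1; long pauses often trace back to humongous allocations caused by reading large partitions.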
What you're potentially seeing is the GC impact of reading a large
partition - do you have GC logs or StatusLogger output indicating you're
pausing? What are the actual JVM flags you're using?
Given your heap size, the easiest mitigation may be significantly
increasing your key cache size (up to a
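The key cache is sized in cassandra.yaml; an illustrative fragment (the value here is an assumption, not a recommendation):

```yaml
# cassandra.yaml -- illustrative value only; size to your workload
key_cache_size_in_mb: 1024
```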
Hello Team,
Cluster Details:
1. Number of Nodes in cluster : 7
2. Number of CPU cores: 48
3. Swap is enabled on all nodes
4. Memory available on all nodes : 120GB
5. Disk space available : 745GB
6. Cassandra version: 2.1
7. Active tables are using size-tiered compaction strategy
8. Read Throughpu
The other thing to notice is:

system_auth WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '1'}

system_auth has a replication factor of 1, and even if one node is down it
may impact the system because of the replication factor.
On Wed, 12 Sep 2018 at 09:46, Steinmaurer, Thomas wrote:

> Hi,
>
> I remember something that a client using the native protocol gets
> notified too early by Cassandra being ready due to the following issue:
>
> https://issues.apache.org/jira/browse/CASSANDRA-8236
>
> which looks similar, but above was marked as fixed in 2.2.
>
> Thomas
>
> *From:* Riccardo Ferrari
> *Sent:* Mittwoch, 12. September 2018 18:25
> *To:* user@cassandra.apache.org
> *Subject:* Re: Read timeouts when performing rolling restart
Hi Alain,
Thank you for chiming in!
I was thinking to perform the 'start_native_transport=false' test as well
and indeed the issue is not showing up. Starting the/a node with native
transport disabled and letting it cool down led to no timeout exceptions,
no dropped messages, simply a crystal cle
* 'nodetool drain'
* Then 'service cassandra restart'

So far so good. The load increase on the other 5 nodes is negligible. The
node is generally out of service just for the time of the restart (i.e. the
cassandra.yml update).

When the node comes back up and switches on the native transport, I start to
see lots of read timeouts in our various services:

com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout
during read query at cons
Thank you Jeff.
We are at Cassandra 3.0.10
We will look into upgrading or enabling driver logging.
On May 16, 2017, at 11:44 AM, Jeff Jirsa wrote:
On 2017-05-16 08:53 (-0700), Nitan Kainth wrote:
> Hi,
>
> We see read timeouts intermittently. Mostly after they have occurred.
> Timeouts are not consistent and do not occur in the 100s at a moment.
>
> 1. Does a read timeout count as a Dropped Mutation?

No, a dro
Hi,
We see read timeouts intermittently. Mostly after they have occurred. Timeouts
are not consistent and do not occur in the 100s at a moment.
1. Does a read timeout count as a Dropped Mutation?
2. What is the best way to nail down the exact cause of scattered timeouts?
Thank you
onfiguration?

The cell count is the number of triplets: (name, value, timestamp).

Also, I see that you have set sstable_size_in_mb at 50 MB. What is the
rationale behind this? (Yes, I'm curious :-) ). Anyway, your "SSTables per
read"
The cfhistograms were run within a few mins of each other. On the surface, I
don't see anything which indicates too much skewing (assuming skewing ==
keys spread across many SSTables). Please confirm. Related to this, what does
the "cell count" metric indicate? I didn't find a clear explanation in the
documents
Thanks,
Joseph

On Thu, Sep 1, 2016 at 6:30 PM, Ryan Svihla wrote:

> Have you looked at cfhistograms/tablehistograms? Your data may just be
> skewed (the most likely explanation is probably the correct one here).
>
> Regards,
> Ryan Svihla
Have you looked at cfhistograms/tablehistograms? Your data may just be
skewed (the most likely explanation is probably the correct one here).

Regards,
Ryan Svihla

_____
From: Joseph Tech
Sent: Wednesday, August 31, 2016 11:16 PM
Subject: Re: Read timeouts on primary key queries
To:

Patrick,

The desc table is below (only col names changed):

CREATE TABLE db.tbl (
    id1 text,
    id2 text,
    id3 text,
    id4 text,
    f1 text,
    f2 map,
    f3 map,
    created timestamp,
    updated timestamp,
    PRIMARY KEY (id1, id2, id3, id4)
) WITH CLUSTERING ORDER BY (id2 ASC, id
If you are getting a timeout on one table, then a mismatch of RF and node
count doesn't seem as likely.
Time to look at your query. You said it was a 'select * from table where
key=?' type query. I would next use the trace facility in cqlsh to
investigate further. That's a good way to find hard to
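A rough sketch of that workflow in cqlsh (keyspace, table, and key are placeholders):

```
TRACING ON;
SELECT * FROM ks.tbl WHERE key = 'some-key';
-- cqlsh prints the request trace: each internal step with its
-- source node and elapsed microseconds, showing where time is spent
TRACING OFF;
```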
On further analysis, this issue happens only on 1 table in the KS which has
the max reads.
@Atul, I will look at system health, but didn't see anything standing out
in the GC logs (using JDK 1.8_92 with G1GC).
@Patrick, could you please elaborate on the "mismatch on node count + RF"
part?
On Tue, Au
There could be many reasons for this if it is intermittent: CPU usage, I/O
wait status. As reads are I/O intensive, your IOPS requirement should be met
under that load. It could be a heap issue if the CPU is busy with GC only.
Network health could also be the reason. So it is better to look at system
health during the time when it
Hi Patrick,
The nodetool status shows all nodes up and normal now. From OpsCenter
"Event Log" , there are some nodes reported as being down/up etc. during
the timeframe of timeout, but these are Search workload nodes from the
remote (non-local) DC. The RF is 3 and there are 9 nodes per DC.
Thanks
You aren't achieving quorum on your reads, as the error explains. That
means you either have some nodes down or your topology is not matching up.
The fact you are using LOCAL_QUORUM might point to a datacenter mismatch
on node count + RF.
What does your nodetool status look like?
Patrick
On M
Hi,
We recently started getting intermittent timeouts on primary key queries
(select * from table where key=)
The error is : com.datastax.driver.core.exceptions.ReadTimeoutException:
Cassandra timeout during read query at consistency LOCAL_QUORUM (2
responses were required but only 1 replica
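For reference, the arithmetic behind "2 responses were required": a quorum is a strict majority of the replicas for the keyspace. A minimal sketch:

```python
def quorum(replication_factor: int) -> int:
    # Replicas that must respond for QUORUM/LOCAL_QUORUM: a strict majority.
    return replication_factor // 2 + 1

# With RF = 3 (per DC), a LOCAL_QUORUM read needs 2 local replicas; if only
# 1 responds in time, the coordinator raises ReadTimeoutException.
needed = quorum(3)  # 2
```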
Emils,
We believe we've tracked it down to the following issue:
https://issues.apache.org/jira/browse/CASSANDRA-11302, introduced in 2.1.5.
We are running a build of 2.2.5 with that patch and so far have not seen
any more timeouts.
Mike
On Fri, Mar 4, 2016 at 3:14 AM, Emīls Šolmanis
wrote:
>
Mike,
Is that where you've bisected it to having been introduced?
I'll see what I can do, but doubt it, since we've long since upgraded prod
to 2.2.4 (and stage before that) and the tests I'm running were for a new
feature.
On Fri, 4 Mar 2016 03:54 Mike Heffner, wrote:
> Emils,
>
> I realize t
Emils,
I realize this may be a big downgrade, but are you timeouts reproducible
under Cassandra 2.1.4?
Mike
On Thu, Feb 25, 2016 at 10:34 AM, Emīls Šolmanis
wrote:
> Having had a read through the archives, I missed this at first, but this
> seems to be *exactly* like what we're experiencing.
>
We have had similar issues sometimes.
Usually the problem was that the failing queries were reading the same
partition that another still-running query was reading, and that partition
was too big. The fact that it is reading the same partition is why your query
works upon retry. The fact that the partition (or the r
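The works-on-retry behavior described here is commonly papered over with a client-side retry. A minimal generic sketch (plain Python, not the DataStax driver's retry-policy API; the exception type, attempt count, and delays are all illustrative):

```python
import time

class ReadTimeout(Exception):
    """Stand-in for the driver's ReadTimeoutException."""

def with_retries(fn, attempts=3, base_delay=0.01):
    # Retry fn on read timeouts, backing off exponentially between tries.
    for attempt in range(attempts):
        try:
            return fn()
        except ReadTimeout:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky_query():
    # Times out once, then succeeds -- mimicking the behavior in the thread.
    calls["n"] += 1
    if calls["n"] == 1:
        raise ReadTimeout()
    return "row"

result = with_retries(flaky_query)  # succeeds on the second attempt
```

Retries mask the symptom; the underlying fix is still to address the oversized partition.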
Having had a read through the archives, I missed this at first, but this
seems to be *exactly* like what we're experiencing.
http://www.mail-archive.com/user@cassandra.apache.org/msg46064.html
Only difference is we're getting this for reads and using CQL, but the
behaviour is identical.
On Thu,
Hello,
We're having a problem with concurrent requests. It seems that whenever we
try resolving more
than ~ 15 queries at the same time, one or two get a read timeout and then
succeed on a retry.
We're running Cassandra 2.2.4 accessed via the 2.1.9 Datastax driver on AWS.
What we've found while
On Tue, Aug 5, 2014 at 11:53 AM, Clint Kelly wrote:
> Ah FWIW I was able to reproduce the problem by reducing
> "range_request_timeout_in_ms." This is great since I want to increase
> the timeout for batch jobs where we scan a large set of rows, but
> leave the timeout for single-row queries alo
cy.

Thanks for your help!

Best regards,
Clint

On Tue, Aug 5, 2014 at 10:54 AM, Robert Coli wrote:
On Tue, Aug 5, 2014 at 10:01 AM, Clint Kelly wrote:

> Allow me to rephrase a question I asked last week. I am performing some
> queries with ALLOW FILTERING and getting consistent read timeouts like the
> following:

ALLOW FILTERING should be renamed PROBABLY TIMEOUT in order to proper
Hi all,
Allow me to rephrase a question I asked last week. I am performing some
queries with ALLOW FILTERING and getting consistent read timeouts like the
following:
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra
timeout during read query at consistency ONE (1 responses
>
> -Original Message- From: Duncan Sands
> Sent: Saturday, August 2, 2014 7:04 AM
> To: user@cassandra.apache.org
> Subject: Re: Occasional read timeouts seen during row scans
>
>
> Hi Clint, is time correctly synchronized between your nodes?
>
> Ciao, Duncan
On 02/08/14 02:12, Clint Kelly wrote:
BTW a few other details, sorry for omitting these:
* We are using version 2.0.4 of the Java
page size (even reducing it down to a single record) and that didn't seem to
help (in the cases where I was observing the timeout).

Best regards,
Clint

On Fri, Aug 1, 2014 at 5:02 PM, Clint Kelly wrote:
Hi everyone,
I am seeing occasional read timeouts during multi-row queries, but I'm
having difficulty reproducing them or understanding what the problem
is.
First, some background:
Our team wrote a custom MapReduce InputFormat that looks pretty
similar to the DataStax InputFormat except th
Looks like the read timeouts were a result of a bug that will be fixed in
2.0.3.
I found this question on the Datastax Java Driver mailing list:
https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/ao1ohSLpjRM
which led me to:
https://issues.apache.org/jira/browse
It seems that with NTP properly configured, the replication is now working
as expected, but there are still a lot of read timeouts. The
troubleshooting continues...
On Tue, Nov 19, 2013 at 8:53 AM, Steven A Robenalt wrote:
> Thanks Michael, I will try that out.
>
>
> On Tue, Nov 1
We had a similar problem when our nodes could not sync using ntp due to VPC
ACL settings. -ml
On Mon, Nov 18, 2013 at 8:49 PM, Steven A Robenalt wrote:
Hi all,
I am attempting to bring up our new app on a 3-node cluster and am having
problems with frequent read timeouts and slow inter-node replication.
Initially, these errors were mostly occurring in our app server, affecting
0.02%-1.0% of our queries in an otherwise unloaded cluster. No
t (TimedOutException), so I don't think it was a network problem.

The number of read timeouts diminished as the number of upgraded nodes
increased, until it reached stability. The logs were showing the following
messages periodically:

...

Two similar issues were reported, but without satisfactory
Hello,
I have isolated one of our data centers to simulate a rolling restart
upgrade from C* 1.1.10 to 1.2.10. We replayed our production traffic to the
C* nodes during the upgrade and observed an increased number of read
timeouts during the upgrade process.
I executed nodetool drain before
On Mon, 18 Jun 2012 11:57:17 -0700, Gurpreet Singh wrote:
> Thanks for all the information Holger.
>
> Will do the JVM updates; kernel updates will be slow to come by. I see
> that with disk access mode standard, the performance is stable and better
> than in mmap mode, so I will probably stick t
Sorry, I was mistaken; here is the right string:
INFO [main] 2012-06-14 02:03:14,520 CLibrary.java (line 109) JNA
mlockall successful
2012/6/15 ruslan usifov :
2012/6/14 Gurpreet Singh :
yes you must see something like this (from our test server):
INFO [
Upgrade Java (version 1.6.21 has memleaks) to the latest 1.6.32. It's
abnormal that on 80 gigs you have 15 gigs of index.
vfs_cache_pressure is used for inodes and dentries.
Also, to check whether you have memleaks, use the drop_caches sysctl.
2012/6/14 Gurpreet Singh :
JNA is installed, swappiness was 0, vfs_cache_pressure was 100. Two questions
on this:
1. Is there a way to find out if mlockall really worked other than just the
mlockall successful log message?
2. Does cassandra only mlock the jvm heap or also the mmaped memory?
I disabled mmap completely, and th
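On question 1: besides the log line, on Linux the VmLck field in /proc/&lt;pid&gt;/status reports how much memory a process has locked, so a large non-zero value for the Cassandra JVM's pid indicates mlockall took effect. A small sketch:

```python
def locked_memory_kb(pid="self"):
    # Read VmLck (in kB) from /proc/<pid>/status.
    # For Cassandra, pass the JVM's pid instead of "self".
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmLck:"):
                return int(line.split()[1])
    return None

current = locked_memory_kb()  # 0 here unless this process locked memory
```

Note this answers question 1 only; the JVM heap is what Cassandra mlocks, which speaks to question 2 as well (mmapped files are not locked).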
I would check /etc/sysctl.conf and get the values of
/proc/sys/vm/swappiness and /proc/sys/vm/vfs_cache_pressure.
If you don't have JNA enabled (which Cassandra uses to fadvise) and
swappiness is at its default of 60, the Linux kernel will happily swap out
your heap for cache space. Set swappines
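A sketch of checking and persisting those settings (values are illustrative; many setups now prefer vm.swappiness=1 over 0):

```
# check current values
cat /proc/sys/vm/swappiness
cat /proc/sys/vm/vfs_cache_pressure

# apply at runtime
sysctl -w vm.swappiness=1

# persist across reboots (line in /etc/sysctl.conf)
vm.swappiness=1
```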
Hm, it's very strange. What amount of data do you have? Which Linux kernel
version? Java version?
PS: I can suggest switching disk_access_mode to standard in your case.
PS PS: Also upgrade your Linux to the latest, and Java HotSpot to 1.6.32
(from the Oracle site).
2012/6/13 Gurpreet Singh :
> Alright, here it goes again...
>
Alright, here it goes again...
Even with mmap_index_only, once the RES memory hit 15 gigs, the read
latency went berserk. This happens in 12 hours if disk_access_mode is mmap,
about 48 hours if it's mmap_index_only.
Only reads are happening, at 50 reads/second.
Row cache size: 730 MB, row cache hit ratio: 0.75
Aaron, Ruslan,
I changed the disk access mode to mmap_index_only, and it has been stable
ever since, well at least for the past 20 hours. Previously, in about 10-12
hours, as soon as the resident memory was full, the client would start
timing out on all its reads. It looks fine for now; I am going to
2012/6/8 aaron morton :
> Ruslan,
> Why did you suggest changing the disk_access_mode ?
Because this brings problems out of nowhere; in any case, for me mmap
brought a similar problem and I haven't found any solution to resolve it,
other than changing disk_access_mode :-(. For me it will also be interesting
to he
Ruslan,
Why did you suggest changing the disk_access_mode ?
Gurpreet,
I would leave the disk_access_mode with the default until you have a
reason to change it.
> > 8 core, 16 gb ram, 6 data disks raid0, no swap configured
Is swap disabled?
> Gradually,
> > the system cpu bec
Thanks Ruslan.
I will try the mmap_index_only.
Is there any guideline as to when to leave it to auto and when to use
mmap_index_only?
/G
On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov wrote:
> disk_access_mode: mmap??
>
> set to disk_access_mode: mmap_index_only in cassandra yaml
>
> 2012/6/8 Gur
disk_access_mode: mmap??
set to disk_access_mode: mmap_index_only in cassandra yaml
2012/6/8 Gurpreet Singh :
> Hi,
> I am testing cassandra 1.1 on a 1 node cluster.
> 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>
> cassandra 1.1.1
> heap size: 8 gigs
> key cache size in mb: 800 (us
Hi,
I am testing cassandra 1.1 on a 1 node cluster.
8 core, 16 gb ram, 6 data disks raid0, no swap configured
cassandra 1.1.1
heap size: 8 gigs
key cache size in mb: 800 (used only 200mb till now)
memtable_total_space_in_mb : 2048
I am running a read workload.. about 30 reads/second. no writes at
Hi,
I have a simple keyspace:
ReplicaPlacementStrategy: org.apache.cassandra.locator.RackUnawareStrategy
ReplicationFactor: 1
EndPointSnitch: org.apache.cassandra.locator.EndPointSnitch
We're using it as a data historian: many rows of measurements, with
measurement history in the columns keyed by milliseconds since UNIX epoch. The single nod