Re: Help debugging a very slow query

2016-01-13 Thread Jeff Jirsa
Very large partitions create a lot of garbage during reads (see
https://issues.apache.org/jira/browse/CASSANDRA-9754); you will see significant
GC pauses when reading sufficiently large partitions.

I suspect GC, though it's odd that you're unable to see it.
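
For what it's worth, probably the quickest places to check are the GCInspector
lines in system.log and, on 2.2 or later, nodetool gcstats (the log path below
assumes a stock package install, adjust to your layout):

    # GCInspector logs collections whose pause exceeds its threshold
    grep -i gcinspector /var/log/cassandra/system.log | tail -20

    # aggregate GC stats since the last time gcstats was run (2.2+)
    nodetool gcstats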



From: Bryan Cheng
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, January 13, 2016 at 12:40 PM
To: "user@cassandra.apache.org"
Subject: Help debugging a very slow query

Re: Help debugging a very slow query

2016-01-13 Thread Robert Coli
On Wed, Jan 13, 2016 at 12:40 PM, Bryan Cheng  wrote:

> 1) What's up with the megapartition? What's the best way to debug this?
> Our data model is largely write once, we don't do any updates. We do
> DELETE, but the partitions that are giving us issues haven't been removed.
> We had some suspicions on
> https://issues.apache.org/jira/browse/CASSANDRA-10547, but that seems to
> largely be triggered by UPDATE operations.
>

Modern versions of Cassandra log the partition key of large partitions when
they are compacted.

Assuming no data model problems, large partitions are frequently the
partition for the key 'None' or some other application-level error.
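
If you're on a version that has compaction_large_partition_warning_threshold_mb
in cassandra.yaml, something like the following should surface the offending
keys the next time those partitions are compacted (log path assumes a package
install):

    grep -i "large partition" /var/log/cassandra/system.log

nodetool cfstats <keyspace>.<table> should also report "Compacted partition
maximum bytes", which ought to line up with the 4139110981 you're seeing in
cfhistograms.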

=Rob


Help debugging a very slow query

2016-01-13 Thread Bryan Cheng
Hi list,

Would appreciate some insight into some irregular performance we're seeing.

We have a column family that has become problematic recently. We've noticed
that a few queries take an enormous amount of time and seem to clog up read
resources on the machine (pending read tasks pile up, then clear all at once).

I've included the output of cfhistograms for this table[1]. The latencies
sampled do not include one of these problematic partitions, but they show two
things: 1) the vast majority of queries to this table seem to be healthy,
and 2) the maximum partition size is absurd (4139110981 bytes, roughly 3.9 GiB).
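
(For reference, the histogram in [1] is standard nodetool output, gathered
with something along the lines of:

    nodetool cfhistograms <keyspace> <columnfamily>

with our actual keyspace and column family names substituted.)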

This particular cf is not expected to be updated beyond an initial set of
writes, but it can be read many times. The data model includes several hashes
that amount to a few KB at most, a set that can hit ~30-40 entries, and three
lists that reach a hundred or so entries at most. There doesn't appear to be
any material difference in the size or character of the data saved between
"good" and "bad" partitions. Often, the same extremely slow partition queried
with consistency ONE returns cleanly and very quickly against other replicas.
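
(Concretely, the comparison amounts to running the same query against each
replica with tracing enabled, e.g. from cqlsh:

    CONSISTENCY ONE;
    TRACING ON;
    select * from pooltx where hash = 0x5f805c68d66e7d271361e7774a7eeec0591eb5197d4f420126cea83171f0a8ff;

The trace in [2] below comes from one of the slow runs.)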

I've included a trace of one of these slow returns[2], which I find very
strange: The vast majority of operations are very quick, but the final step
is extremely slow. Nothing exceeds 2ms until the final "Read 1 live and 0
tombstone cells" which takes a whopping 69 seconds [!!]. We've checked our
garbage collection in this time period and have not noticed any significant
collections.

As far as I can tell, the trace doesn't raise any red flags, and we're
largely stumped.

We've got two main questions:

1) What's up with the megapartition? What's the best way to debug this? Our
data model is largely write-once; we don't do any updates. We do DELETEs, but
the partitions that are giving us issues haven't been removed. We had some
suspicions about https://issues.apache.org/jira/browse/CASSANDRA-10547, but
that seems to be triggered largely by UPDATE operations.

2) What could cause the read step to take such an absurd amount of time when
only a pair of sstables and the memtable are being examined, and it's just a
single cell being read? We originally suspected memory pressure from huge
sstables, but without a corresponding GC pause this seems unlikely?
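
(Would it be worth pinning down exactly which data files hold this partition,
e.g. something along the lines of:

    nodetool getsstables <keyspace> pooltx <partition key>

and then checking the sizes of the returned sstables, or is there better
tooling for digging into a single partition like this?)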

Any ideas?

Thanks in advance!

--Bryan


[1]
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             1.00             35.00             72.00              1109                14
75%             1.00             50.00            149.00              1331                17
95%             1.00             72.00            924.00              4768                35
98%             2.00            103.00           1597.00              9887                72
99%             2.00            149.00           1597.00             14237               103
Min             0.00             15.00             25.00                43                 0
Max             2.00            258.00           6866.00        4139110981             20501

[2]

[
  {
"Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
"Eventid": "4f524890-ba2f-11e5-8729-e1d125cb9b2d",
"Activity": "Parsing select * from pooltx where hash = 0x5f805c68d66e7d271361e7774a7eeec0591eb5197d4f420126cea83171f0a8ff;",
"Source": "172.31.54.46",
"SourceElapsed": 26
  },
  {
"Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
"Eventid": "4f526fa0-ba2f-11e5-8729-e1d125cb9b2d",
"Activity": "Preparing statement",
"Source": "172.31.54.46",
"SourceElapsed": 79
  },
  {
"Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
"Eventid": "4f52bdc0-ba2f-11e5-8729-e1d125cb9b2d",
"Activity": "Executing single-partition query on pooltx",
"Source": "172.31.54.46",
"SourceElapsed": 1014
  },
  {
"Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
"Eventid": "4f52e4d0-ba2f-11e5-8729-e1d125cb9b2d",
"Activity": "Acquiring sstable references",
"Source": "172.31.54.46",
"SourceElapsed": 1016
  },
  {
"Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
"Eventid": "4f530be0-ba2f-11e5-8729-e1d125cb9b2d",
"Activity": "Merging memtable tombstones",
"Source": "172.31.54.46",
"SourceElapsed": 1029
  },
  {
"Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
"Eventid": "4f5332f0-ba2f-11e5-8729-e1d125cb9b2d",
"Activity": "Bloom filter allows skipping sstable 387133",
"Source": "172.31.54.46",
"SourceElapsed": 1040
  },
  {
"Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
"Eventid": "4f535a00-ba2f-11e5-8729-e1d125cb9b2d",
"Activity": "Key cache hit for sstable 386331",
"Source": "172.31.54.46",
"SourceElapsed": 1046
  },
  {
"Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
"Eventid": "4f538110-ba2f-11e5-8729-e1d125cb9b2d",
"Activity": "Seeking to