Can you try to isolate this to a reproducible test case or script and open
a jira ticket at https://issues.apache.org/jira/browse/CASSANDRA?
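If it helps, the repro could be as simple as a cqlsh script that recreates a
partition shaped like the one you describe below. This is only a sketch; the
replication settings, the 'player-test' id, the row count and the delete
pattern are assumptions on my part, not a verified reproduction:

    -- schema matching the journal table from your mail
    CREATE KEYSPACE IF NOT EXISTS akka
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

    CREATE TABLE IF NOT EXISTS akka.messages (
        persistence_id text,
        partition_nr bigint,
        sequence_nr bigint,
        message blob,
        used boolean static,
        PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
    );

    -- populate ~1900 small rows and delete the older ones; cqlsh has no
    -- loops, so generate these two statements from a shell or driver script,
    -- once per sequence number <n>:
    --   INSERT INTO akka.messages (persistence_id, partition_nr, sequence_nr, message, used)
    --   VALUES ('player-test', 0, <n>, 0x00, true);
    --   DELETE FROM akka.messages
    --   WHERE persistence_id = 'player-test' AND partition_nr = 0 AND sequence_nr = <n>;

    -- then flush (nodetool flush) and run the slow query with tracing:
    TRACING ON;
    SELECT used FROM akka.messages
     WHERE persistence_id = 'player-test' AND partition_nr = 0;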

On Wed, Nov 11, 2015 at 2:54 PM, Brice Figureau <
brice+cassan...@daysofwonder.com> wrote:

> Hi,
>
> Following up on my previous "Read query timing out" thread, I'm now running
> into another timeout issue, on Cassandra 2.1.11.
>
> Still with the same schema from the Akka Persistence Cassandra journal:
> CREATE TABLE akka.messages (
>     persistence_id text,
>     partition_nr bigint,
>     sequence_nr bigint,
>     message blob,
>     used boolean static,
>     PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
> ) WITH CLUSTERING ORDER BY (sequence_nr ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
>     AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 216000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
>
> The following query:
> SELECT used from akka.messages WHERE
>       persistence_id = 'player-SW11f03e20b8802000' AND
>       partition_nr = 0;
>
> times out; when the timeout is increased (or a faster Cassandra cluster is
> used), it produces the following trace:
>
> activity | timestamp | source | source_elapsed
> ---------+-----------+--------+---------------
> Execute CQL3 query | 2015-11-11 19:38:34.424000 | 192.168.169.10 | 0
> READ message received from /192.168.169.10 [MessagingService-Incoming-/192.168.169.10] | 2015-11-11 19:38:31.621000 | 192.168.169.20 | 30
> Executing single-partition query on messages [SharedPool-Worker-1] | 2015-11-11 19:38:31.623000 | 192.168.169.20 | 221
> Acquiring sstable references [SharedPool-Worker-1] | 2015-11-11 19:38:31.624000 | 192.168.169.20 | 237
> Merging memtable tombstones [SharedPool-Worker-1] | 2015-11-11 19:38:31.625000 | 192.168.169.20 | 270
> Key cache hit for sstable 15 [SharedPool-Worker-1] | 2015-11-11 19:38:31.626000 | 192.168.169.20 | 438
> Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-11-11 19:38:31.627000 | 192.168.169.20 | 445
> Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-11-11 19:38:31.628000 | 192.168.169.20 | 876
> Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2015-11-11 19:38:31.628000 | 192.168.169.20 | 884
> Parsing [SharedPool-Worker-1] | 2015-11-11 19:38:34.424000 | 192.168.169.10 | 83
> Preparing statement [SharedPool-Worker-1] | 2015-11-11 19:38:34.424000 | 192.168.169.10 | 273
> reading data from /192.168.169.20 [SharedPool-Worker-1] | 2015-11-11 19:38:34.425000 | 192.168.169.10 | 766
> Sending READ message to /192.168.169.20 [MessagingService-Outgoing-/192.168.169.20] | 2015-11-11 19:38:34.425000 | 192.168.169.10 | 920
> Read 101 live and 0 tombstone cells [SharedPool-Worker-1] | 2015-11-11 19:38:37.837000 | 192.168.169.20 | 215791
> Enqueuing response to /192.168.169.10 [SharedPool-Worker-1] | 2015-11-11 19:38:37.850000 | 192.168.169.20 | 228498
> Sending REQUEST_RESPONSE message to /192.168.169.10 [MessagingService-Outgoing-/192.168.169.10] | 2015-11-11 19:38:37.851000 | 192.168.169.20 | 228572
> REQUEST_RESPONSE message received from /192.168.169.20 [MessagingService-Incoming-/192.168.169.20] | 2015-11-11 19:38:40.754000 | 192.168.169.10 | 330080
> Processing response from /192.168.169.20 [SharedPool-Worker-2] | 2015-11-11 19:38:40.754000 | 192.168.169.10 | 330177
> Request complete | 2015-11-11 19:38:40.813963 | 192.168.169.10 | 389963
>
> This specific partition has about 1900 rows of around 50-100 bytes each,
> which makes it quite large compared to the others, and its `used` static
> column is true.
>
> I know this is a C* anti-pattern, but rows with smaller (older)
> `sequence_nr` values are regularly deleted.
> I don't think this is a problem, since most of the read requests are bounded
> by `sequence_nr` (and are pretty fast), but there are certainly many
> tombstones (even though the trace above doesn't show that).
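>
> For illustration, each delete is an individual row delete of roughly this
> shape (the literal values are placeholders; the real statements come from
> the journal plugin, so treat this as an assumption about its behaviour):
>
> DELETE FROM akka.messages
>  WHERE persistence_id = 'player-SW11f03e20b8802000'
>    AND partition_nr = 0
>    AND sequence_nr = 42;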
>
> What's strange is that the query seems to scan the whole set of rows, even
> though it should return only the static column (which, by definition, has
> only one value independently of the number of rows), so it should be pretty
> fast, shouldn't it?
>
> Note that using `SELECT DISTINCT` doesn't seem to change anything regarding
> speed (I understand that it is the recommended way of doing this kind of
> query).
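>
> Concretely, the `SELECT DISTINCT` variant I tried is along these lines
> (shown here as a sketch of the query shape, not a verbatim copy):
>
> SELECT DISTINCT persistence_id, partition_nr, used
>   FROM akka.messages
>  WHERE persistence_id = 'player-SW11f03e20b8802000'
>    AND partition_nr = 0;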
>
> Can anyone explain how this problem could be solved, or what its root cause
> might be?
>
> Thanks for any answers,
> --
> Brice Figureau
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>
