Trying to find out why a Cassandra read is taking so long, I enabled tracing in cqlsh (roughly as shown below) and limited the number of rows with LIMIT. Strangely, when I query 600 rows I get results in ~50 milliseconds, but querying 610 rows takes nearly 1 second!
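Tracing was turned on with cqlsh's built-in command before running the queries below:

cqlsh> TRACING ON;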
cqlsh> select containerdefinitionid from containerdefinition limit 600;
... lots of output ...
Tracing session: 6b506cd0-83bc-11e3-96e8-e182571757d7
 activity | timestamp | source | source_elapsed
----------+--------------+---------------+----------------
 execute_cql3_query | 15:25:02,878 | 130.4.147.116 | 0
 Parsing statement | 15:25:02,878 | 130.4.147.116 | 39
 Preparing statement | 15:25:02,878 | 130.4.147.116 | 101
 Determining replicas to query | 15:25:02,878 | 130.4.147.116 | 152
 Executing seq scan across 1 sstables for [min(-9223372036854775808), min(-9223372036854775808)] | 15:25:02,879 | 130.4.147.116 | 1021
 Scanned 755 rows and matched 755 | 15:25:02,933 | 130.4.147.116 | 55169
 Request complete | 15:25:02,934 | 130.4.147.116 | 56300
cqlsh> select containerdefinitionid from containerdefinition limit 610;
... just about the same output and trace info, except...
 Scanned 766 rows and matched 766 | 15:25:58,908 | 130.4.147.116 | 739141
There seems to be nothing unusual about the data in those particular rows:
- values are similar to those in the rows before and after
- using the COPY command I can export the whole table and import it on a different cluster, and performance there is fine
- these rows are just the first example; there seem to be other places where query time jumps as well

The whole table is only ~3000 rows but takes ~15 seconds to list all primary keys.
There does seem to be something unusual about the data STORAGE:
- a snapshot copied to another cluster and imported gives the same results with the same limits
- COPYing the data to CSV and loading it into another cluster does not; performance there is great (commands sketched below)
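For concreteness, the CSV round trip was roughly the following; the file path is a placeholder, and the FROM step runs on the other cluster against the same schema:

COPY containerdefinition TO '/tmp/containerdefinition.csv';   -- run on the original cluster: export all rows
COPY containerdefinition FROM '/tmp/containerdefinition.csv'; -- run on the other cluster: load the CSV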
I have tried compaction, repair, rebuilding the indexes, cleanup and refresh (nodetool invocations sketched below). No effect.
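Roughly the commands used; "mykeyspace" and the index name are placeholders for the real names:

nodetool compact mykeyspace containerdefinition
nodetool repair mykeyspace containerdefinition
nodetool rebuild_index mykeyspace containerdefinition <index_name>
nodetool cleanup mykeyspace
nodetool refresh mykeyspace containerdefinition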
I realize I could "fix" this by copying the data out and back in, but I'm trying to figure out what is going on here so I can avoid it happening in production on a table too big to fix with COPY.
The table has 17 columns, 3 secondary indexes, a TEXT primary key, two LIST columns and two TIMESTAMP columns; the rest are TEXT (an abridged sketch of the schema is below). I can reproduce the issue with both SimpleStrategy and DC-aware replication, and with 4 copies of the data on 4 servers, 2 copies on 2 servers, and 1 copy on 2 servers, so it doesn't matter whether the query is served locally or involves multiple servers. This is Cassandra 1.2, queried via cqlsh.
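For reference, an abridged sketch of the schema; containerdefinitionid is the real key name, but every other column and index name here is a made-up placeholder:

CREATE TABLE containerdefinition (
    containerdefinitionid text PRIMARY KEY,
    created timestamp,            -- placeholder name
    modified timestamp,           -- placeholder name
    tags list<text>,              -- placeholder name
    attachments list<text>,       -- placeholder name
    name text,                    -- placeholder name
    status text,                  -- placeholder name
    description text              -- placeholder name; ~9 more TEXT columns omitted
);

-- three secondary indexes; the indexed columns here are placeholders too
CREATE INDEX ON containerdefinition (name);
CREATE INDEX ON containerdefinition (status);
CREATE INDEX ON containerdefinition (description);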
Any ideas? Suggestions?