> * select id from table where token(id) > token(some_value) and
> secondary_index = other_val limit 2 allow filtering;
>
> Filtering absolutely kills the performance. On a table populated with 130.000
> records, single node Cassandra server (on my i7 notebook, 2GB of JVM heap)
> and secondary index built on column with low cardinality of its value set
> this query takes 156 seconds to finish.

Yes, this is why you have to add ALLOW FILTERING. You are asking the nodes to read all the data that matches and filter it in memory; that's a SQL-style operation.
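A quick way to see where that time goes is cqlsh request tracing: the trace lists every partition the coordinator reads and then discards while satisfying the filter. This is just a sketch reusing the placeholder names from your example:

```cql
-- In cqlsh, enable tracing before running the slow query; the trace output
-- shows the rows read per node. Table/column names are the placeholders
-- from the example above, not real schema.
TRACING ON;

SELECT id FROM table
 WHERE token(id) > token(some_value)
   AND secondary_index = other_val
 LIMIT 2
 ALLOW FILTERING;
```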
Your example query is somewhat complex and I doubt it could get decent performance; what does the query plan look like? IMHO you need to do further de-normalisation: you will get the best performance when you select rows by their full or partial primary key.

> By the way, the performance is order of magnitude better if this patch is
> applied:

That looks like it's tuned to your specific need; it would ignore the max results included in the query.

> * select id from table;
>
> As we saw in the trace log, the query - although it queries just row ids -
> scans all columns of all the rows and (probably) compares TTL with current
> time (?) (we saw hundreds of thousands of gettimeofday(2)). This means that
> if the table somehow mixes wide and narrow rows, the performance suffers
> horribly.

Selecting all rows from a table requires a range scan, which reads all rows from all nodes. It should never be used in production. Not sure what you mean by "scans all columns from all rows": a select by column name will use a SliceByNamesReadCommand, which will only read the required columns from each SSTable (it normally short-circuits, though, and reads from fewer). If there is a TTL, the ExpiringColumn.localExpirationTime must be checked; if there is no TTL it will not be checked.

> As Cassandra checks all the columns in selects, performance suffers badly if
> the collection is of any interesting size.

This is not true; could you provide an example where you think this is happening?

> Additionally, we saw various random irreproducible freezes, high CPU
> consumption when nothing happens (even with trace log level set no activity
> was reported) and highly unpredictable performance characteristics after
> nodetool flush and/or major compaction.

What was the HW platform and what was the load? Typically freezes in the server correlate to JVM GC, and GC can also be what is consuming the CPU. If you have wide rows or make large reads you may run into more JVM GC issues.
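Going back to the de-normalisation point above, a minimal sketch of what I mean, using hypothetical table and column names: materialise the low-cardinality attribute into the partition key at write time, so the lookup becomes a primary-key read instead of an index-plus-filtering scan.

```cql
-- Hypothetical sketch: maintain a second table keyed by the attribute you
-- were using a secondary index for. Your client writes to both tables.
CREATE TABLE items_by_attr (
    attr_value text,   -- the former secondary-index column
    id uuid,
    PRIMARY KEY (attr_value, id)
);

-- The slow filtering query then becomes a single-partition read:
SELECT id FROM items_by_attr WHERE attr_value = 'other_val' LIMIT 2;
```

The cost moves to the write path (two inserts instead of one), which is the trade Cassandra is designed for.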
nodetool flush will (as it says) flush all the tables to disk; if you have a lot of tables and/or a lot of secondary indexes this can cause the switch lock to be held, preventing write threads from progressing. Once flush threads stop waiting on the flush queue the lock will be released. See the help for memtable_flush_queue_size in the yaml file.

Major compaction is not recommended in production. If you are seeing it cause performance problems I would guess it is related to JVM GC and/or the disk IO not being able to keep up. When used, it creates a single SSTable for each table which will not be compacted again until (by default) 3 other large SSTables are created or you run major compaction again. For this reason it is not recommended.

> Conclusions:
>
> - do not use collections
> - do not use secondary indexes
> - do not use filtering
> - have your rows as narrow as possible if you need any kind of all row keys
> traversal

These features all have a use, but it looks like you leaned on them heavily while creating a relational model. Especially the filtering: you have to explicitly enable it to prevent the client sending queries that will take a long time. The only time row key traversal is normally used is reading data through Hadoop. You should always strive to read row(s) from a table by the full or partial primary key.

> With these conclusions in mind, CQL seems redundant, plain old thrift may be
> used, joins should be done client side and/or all indexes need to be handled
> manually. Correct?

No. CQL provides a set of functionality not present in the Thrift API. Joins and indexes should generally be handled by denormalising the data during writes. It sounds like your data model was too relational; you need to denormalise and read rows by primary key. Secondary indexes are useful when you have a query pattern that is used infrequently.

Hope that helps.
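For reference, the flush queue setting mentioned above lives in cassandra.yaml; the value shown here is the shipped default, so check your own config:

```yaml
# cassandra.yaml (sketch). Each table and each secondary index has its own
# memtable; when more than this many full memtables are queued waiting to
# be flushed, the switch lock is held and writes block until the queue
# drains.
memtable_flush_queue_size: 4
```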
-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 18/12/2013, at 3:47 am, Ondřej Černoš <[email protected]> wrote:

> Hi all,
>
> we are reimplementing a legacy interface of an inventory-like service
> (currently built on top of mysql) on Cassandra and I thought I would share
> some findings with the list. The interface semantics is given and cannot be
> changed. We chose Cassandra due to its multiple datacenter capabilities and
> no-spof qualities. The dataset is small (6 tables having 150.000 records, a
> bunch of tables with up to thousands of records), but the model is not
> trivial - the mysql model has some 20+ tables, joins are frequent, m:n
> relationships are frequent and the like. The interface is read heavy. We
> thought the size of the dataset should allow the whole dataset to fit into
> memory of each node (3 node cluster in each DC, 3 replicas, local quorum
> operations) and that even though some operations (like secondary index
> lookup) are not superfast, due to the size of the dataset it should perform
> ok. We were wrong.
>
> We use CQL3 exclusively and we use all of its capabilities (collections,
> secondary indexes, filtering), because they make the data model maintainable.
> We denormalised what had to be denormalised in order to avoid client side
> joins. Usual query to the storage means one CQL query on a denormalised
> table. We need to support integer offset/limit paging, filter-by-example kind
> of queries, M:N relationship queries and all the usual suspects of old
> SQL-backed interface.
>
> This is the list of operations that perform really poorly we identified so
> far. Row id is called id in the following:
>
> * select id from table where token(id) > token(some_value) and
> secondary_index = other_val limit 2 allow filtering;
>
> Filtering absolutely kills the performance.
> On a table populated with 130.000
> records, single node Cassandra server (on my i7 notebook, 2GB of JVM heap)
> and secondary index built on column with low cardinality of its value set
> this query takes 156 seconds to finish.
>
> By the way, the performance is order of magnitude better if this patch is
> applied:
>
> diff --git
> a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> index 5ab1df6..13af671 100644
> --- a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> +++ b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> @@ -190,7 +190,8 @@ public class CompositesSearcher extends SecondaryIndexSearcher
>
>     private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
>     // We shouldn't fetch only 1 row as this provides buggy paging in case the first row doesn't satisfy all clauses
> -   private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
> +// private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
> +   private final int rowsPerQuery = 100000;
>
>     public boolean needsFiltering()
>     {
>
> * select id from table;
>
> As we saw in the trace log, the query - although it queries just row ids -
> scans all columns of all the rows and (probably) compares TTL with current
> time (?) (we saw hundreds of thousands of gettimeofday(2)). This means that
> if the table somehow mixes wide and narrow rows, the performance suffers
> horribly.
>
> * CQL collections
>
> See the point above with mixing wide rows and narrow rows. As Cassandra
> checks all the columns in selects, performance suffers badly if the
> collection is of any interesting size.
>
> Additionally, we saw various random irreproducible freezes, high CPU
> consumption when nothing happens (even with trace log level set no activity
> was reported) and highly unpredictable performance characteristics after
> nodetool flush and/or major compaction.
>
> Conclusions:
>
> - do not use collections
> - do not use secondary indexes
> - do not use filtering
> - have your rows as narrow as possible if you need any kind of all row keys
> traversal
>
> With these conclusions in mind, CQL seems redundant, plain old thrift may be
> used, joins should be done client side and/or all indexes need to be handled
> manually. Correct?
>
> Thanks for reading,
>
> ondrej cernos
