> I don't know. How do I find out? The only mention of query plans in Cassandra I found is your article on your site, from 2011, covering version 0.8.

See the help for TRACE in cqlsh.
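Roughly, a session looks like this (a sketch only - the table name and query are placeholders, and the exact trace output varies by version):

    cqlsh> TRACING ON;
    cqlsh> SELECT id FROM mytable WHERE token(id) > token('some-id') LIMIT 10;
    -- cqlsh now prints a trace of the request: every step executed on the
    -- coordinator and on the replicas, with per-step elapsed times in
    -- microseconds. It is the closest thing Cassandra has to a query plan.
    cqlsh> TRACING OFF;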
My general approach is to solve problems with the read path by making changes to the write path. So I would normally say make a new table to store the data you want to read, or change the layout of a table to be more flexible.

Can you provide the table definition and the query you are using?

Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 15/01/2014, at 9:48 am, Ondřej Černoš <[email protected]> wrote:

> Hi,
>
> thanks for the answer and sorry for the delay. Let me answer inline.
>
> On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton <[email protected]> wrote:
>
> > * select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;
> >
> > Filtering absolutely kills the performance. On a table populated with 130,000 records, on a single-node Cassandra server (my i7 notebook, 2GB of JVM heap), with a secondary index built on a column with a low-cardinality value set, this query takes 156 seconds to finish.
>
> Yes, this is why you have to add allow filtering: you are asking the nodes to read all the data that matches and filter it in memory, and that's a SQL-type operation.
>
> Your example query is somewhat complex and I doubt it could get decent performance. What does the query plan look like?
>
> I don't know. How do I find out? The only mention of query plans in Cassandra I found is your article on your site, from 2011, covering version 0.8.
>
> The example query completes in a fraction of the time if I just fetch all the rows matching the token function and do the filtering client side.
>
> IMHO you need to do further de-normalisation; you will get the best performance when you select rows by their full or partial primary key.
>
> I denormalize all I can. The problem is that I need to support paging and filtering at the same time. The API I must support allows filter-by-example and paging - so how should I denormalize? Should I somehow manage pages of primary row keys manually? Or should I keep a manual secondary index and somehow page within the denormalized wide row?
>
> The trouble goes even further; even this doesn't perform well:
>
> select id from table where token(id) > token(some_value) and pk_cluster = 'val' limit N;
>
> where id and pk_cluster form the primary key (CQL3 table). I would expect this to be an ordered row query plus an ordered column slice query, so where is the performance problem?
>
> > By the way, the performance is an order of magnitude better if this patch is applied:
>
> That looks like it's tuned to your specific need; it would ignore the max results included in the query.
>
> It is tuned - it only demonstrates that the heuristic doesn't work well.
>
> > * select id from table;
> >
> > As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTLs with the current time (we saw hundreds of thousands of calls to gettimeofday(2)). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly.
>
> Selecting all rows from a table requires a range scan, which reads all rows from all nodes. It should never be used in production.
>
> The trouble is that I just need to perform it, sometimes. I know what the problem with the query is, but I have only about 150,000 records - the whole dataset can be stored in memory, and the SSTables can be fully mmapped. There is no reason for this query to be slow in this case.
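For reference, the usual way to make the occasional full scan tolerable is to page it by token rather than selecting everything in one request - a sketch, with a placeholder table name:

    -- first page
    SELECT id FROM mytable LIMIT 1000;

    -- each following page restarts after the last row of the previous one,
    -- so no node has to materialise the whole table for a single request
    SELECT id FROM mytable WHERE token(id) > token('<last id of previous page>') LIMIT 1000;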
> Not sure what you mean by "scans all columns from all rows". A select by column name will use a SliceByNamesReadCommand, which will only read the required columns from each SSTable (it normally short-circuits, though, and reads from fewer).
>
> The query should fetch only IDs, yet it checks the TTLs of columns. That is the point. Why does it do it?
>
> If there is a TTL, ExpiringColumn.localExpirationTime must be checked; if there is no TTL, it will not be checked.
>
> It is a standard CQL3 table with an ID, a couple of columns and a CQL3 collection. I didn't do anything with TTL on the table or its columns.
>
> > As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size.
>
> This is not true; could you provide an example of where you think this is happening?
>
> We saw it in the trace log. It happened in the select id from table query. The table had a collection column.
>
> > Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with trace log level set, no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction.
>
> What was the HW platform and what was the load?
>
> My i7/8GB notebook running a single-node cluster, and a virtualised AWS-like environment with nodes of various sizes.
>
> Typically, freezes in the server correlate with JVM GC; the GC can also be what is consuming the CPU. If you have wide rows or make large reads you may run into more JVM GC issues.
>
> nodetool flush will (as it says) flush all the tables to disk. If you have a lot of tables and/or a lot of secondary indexes, this can cause the switch lock to be held, preventing write threads from progressing; once flush threads stop waiting on the flush queue, the lock is released. See the help for memtable_flush_queue_size in the yaml file.
>
> I will check this, thanks.
>
> Major compaction is not recommended in production. If you are seeing it cause performance problems, I would guess they are related to JVM GC and/or disk IO that cannot keep up. When used, it creates a single SSTable for each table, which will not be compacted again until (by default) 3 other large SSTables are created or you run major compaction again. For this reason it is not recommended.
>
> > Conclusions:
> >
> > - do not use collections
> > - do not use secondary indexes
> > - do not use filtering
> > - have your rows as narrow as possible if you need any kind of traversal of all row keys
>
> These features all have a use, but it looks like you leaned on them heavily while creating a relational model. Especially the filtering: the reason you have to enable it explicitly is to prevent clients from sending queries that will take a long time. Normally, the only time row key traversal is used is when reading data through Hadoop. You should always strive to read rows from a table by their full or partial primary key.
>
> > With these conclusions in mind, CQL seems redundant, plain old thrift may be used, joins should be done client side and/or all indexes need to be handled manually. Correct?
>
> No. CQL provides a set of functionality not present in the Thrift API. Joins and indexes should generally be handled by denormalising the data during writes.
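To make the write-path idea concrete, a minimal sketch with invented table names - one table per query pattern, and every write maintains all of them:

    CREATE TABLE items_by_id (
        id    text PRIMARY KEY,
        owner text,
        state text
    );

    CREATE TABLE items_by_owner (
        owner text,
        id    text,
        state text,
        PRIMARY KEY (owner, id)
    );

    -- every write goes to both tables...
    BEGIN BATCH
        INSERT INTO items_by_id (id, owner, state) VALUES ('i1', 'o1', 'new');
        INSERT INTO items_by_owner (owner, id, state) VALUES ('o1', 'i1', 'new');
    APPLY BATCH;

    -- ...so the read that would otherwise need an index or a join
    -- becomes a primary key lookup
    SELECT id, state FROM items_by_owner WHERE owner = 'o1';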
> It sounds like your data model was too relational; you need to denormalise and read rows by primary key. Secondary indexes are useful when you have a query pattern that is used infrequently.
>
> regards,
>
> ondrej cernos
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 18/12/2013, at 3:47 am, Ondřej Černoš <[email protected]> wrote:
>
> > Hi all,
> >
> > we are reimplementing a legacy interface of an inventory-like service (currently built on top of mysql) on Cassandra, and I thought I would share some findings with the list. The interface semantics are fixed and cannot be changed. We chose Cassandra for its multiple-datacenter capabilities and no-SPOF qualities. The dataset is small (6 tables of 150,000 records, plus a bunch of tables with up to thousands of records), but the model is not trivial: the mysql model has some 20+ tables, and joins and m:n relationships are frequent. The interface is read-heavy. We thought the size of the dataset would allow the whole dataset to fit into the memory of each node (3-node cluster in each DC, 3 replicas, local quorum operations), and that even though some operations (like secondary index lookup) are not superfast, given the size of the dataset it should perform OK. We were wrong.
> >
> > We use CQL3 exclusively, and we use all of its capabilities (collections, secondary indexes, filtering) because they make the data model maintainable. We denormalised what had to be denormalised in order to avoid client-side joins. A usual query to the storage means one CQL query on a denormalised table. We need to support integer offset/limit paging, filter-by-example kinds of queries, M:N relationship queries, and all the usual suspects of an old SQL-backed interface.
> >
> > This is the list of operations we have identified so far that perform really poorly. The row id is called id in the following:
> >
> > * select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;
> >
> > Filtering absolutely kills the performance. On a table populated with 130,000 records, on a single-node Cassandra server (my i7 notebook, 2GB of JVM heap), with a secondary index built on a column with a low-cardinality value set, this query takes 156 seconds to finish.
> >
> > By the way, the performance is an order of magnitude better if this patch is applied:
> >
> > diff --git a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > index 5ab1df6..13af671 100644
> > --- a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > +++ b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > @@ -190,7 +190,8 @@ public class CompositesSearcher extends SecondaryIndexSearcher
> >
> >      private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
> >      // We shouldn't fetch only 1 row as this provides buggy paging in case the first row doesn't satisfy all clauses
> > -    private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
> > +//  private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
> > +    private final int rowsPerQuery = 100000;
> >
> >      public boolean needsFiltering()
> >      {
> >
> > * select id from table;
> >
> > As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTLs with the current time (we saw hundreds of thousands of calls to gettimeofday(2)). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly.
> >
> > * CQL collections
> >
> > See the point above about mixing wide and narrow rows. As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size.
> >
> > Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with trace log level set, no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction.
> >
> > Conclusions:
> >
> > - do not use collections
> > - do not use secondary indexes
> > - do not use filtering
> > - have your rows as narrow as possible if you need any kind of traversal of all row keys
> >
> > With these conclusions in mind, CQL seems redundant, plain old thrift may be used, joins should be done client side and/or all indexes need to be handled manually. Correct?
> >
> > Thanks for reading,
> >
> > ondrej cernos
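A footnote on the filter-plus-paging question raised above: the "manual secondary index" variant usually ends up as a table keyed by the filtered attribute, with the row id as a clustering column, so one filtered page is an ordered slice of a single partition. A sketch, with invented names:

    CREATE TABLE ids_by_state (
        state text,
        id    text,
        PRIMARY KEY (state, id)
    );

    -- first page of ids matching the filter value
    SELECT id FROM ids_by_state WHERE state = 'active' LIMIT 100;

    -- next page: restart after the last id of the previous page
    SELECT id FROM ids_by_state WHERE state = 'active' AND id > '<last id seen>' LIMIT 100;

    -- caveat: a very low-cardinality state means very wide partitions,
    -- so the partition key may need an extra sharding component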
