Hi,

by the way, some of the issues are summarised here:
https://issues.apache.org/jira/browse/CASSANDRA-6586 and here:
https://issues.apache.org/jira/browse/CASSANDRA-6587.
regards,
ondrej cernos

On Tue, Jan 14, 2014 at 9:48 PM, Ondřej Černoš <[email protected]> wrote:

> Hi,
>
> thanks for the answer and sorry for the delay. Let me answer inline.
>
> On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton <[email protected]> wrote:
>
>> > * select id from table where token(id) > token(some_value) and
>> >   secondary_index = other_val limit 2 allow filtering;
>> >
>> > Filtering absolutely kills the performance. On a table populated with
>> > 130,000 records on a single-node Cassandra server (my i7 notebook, 2GB
>> > of JVM heap), with a secondary index built on a column whose value set
>> > has low cardinality, this query takes 156 seconds to finish.
>>
>> Yes, this is why you have to add ALLOW FILTERING. You are asking the
>> nodes to read all the data that matches and filter it in memory - that
>> is a SQL-type operation.
>>
>> Your example query is somewhat complex and I doubt it could get decent
>> performance. What does the query plan look like?
>
> I don't know - how do I find out? The only mention of a query plan in
> Cassandra I found is the article on your site, from 2011, covering
> version 0.8.
>
> The example query computes in a fraction of the time if I just fetch all
> rows matching the token function and perform the filtering client side.
>
>> IMHO you need to do further de-normalisation; you will get the best
>> performance when you select rows by their full or partial primary key.
>
> I denormalise as much as I can. The problem is that I need to support
> paging and filtering at the same time. The API I must support allows
> filter-by-example queries and paging - so how should I denormalise?
> Should I somehow manage pages of primary row keys manually? Or should I
> keep a manual secondary index and page through the denormalised wide row
> somehow?
>
> The trouble goes even further; even this doesn't perform well:
>
> select id from table where token(id) > token(some_value)
>   and pk_cluster = 'val' limit N;
>
> where id and pk_cluster form the primary key (a CQL3 table). I would
> expect this to be an ordered row query and an ordered column slice
> query, so where is the performance problem?
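>
> For illustration, the manual secondary index I have in mind would look
> roughly like this (just a sketch - the table and column names are made
> up):
>
> -- base table, one row per entity
> CREATE TABLE items (
>     id uuid PRIMARY KEY,
>     category text,    -- the low-cardinality column we filter on
>     payload text
> );
>
> -- manual index: one wide row per category value, with ids as clustering
> -- columns, so one page of ids for a category is a single slice query
> CREATE TABLE items_by_category (
>     category text,
>     id uuid,
>     PRIMARY KEY (category, id)
> );
>
> -- fetch the next page of ids for one category
> -- (N and last_id_of_previous_page are placeholders filled in client side)
> SELECT id FROM items_by_category
>   WHERE category = 'val' AND id > last_id_of_previous_page LIMIT N;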
>> > By the way, the performance is an order of magnitude better if this
>> > patch is applied:
>>
>> That looks like it is tuned to your specific need; it would ignore the
>> max results included in the query.
>
> It is tuned - it only demonstrates that the heuristic doesn't work well.
>
>> > * select id from table;
>> >
>> > As we saw in the trace log, the query - although it queries just row
>> > ids - scans all columns of all the rows and (probably) compares TTLs
>> > with the current time (we saw hundreds of thousands of calls to
>> > gettimeofday(2)). This means that if a table mixes wide and narrow
>> > rows, the performance suffers horribly.
>>
>> Selecting all rows from a table requires a range scan, which reads all
>> rows from all nodes. It should never be used in production.
>
> The trouble is that I do need to perform it, sometimes. I know what the
> problem with the query is, but I have just 150,000 records - the whole
> dataset can be stored in memory and the SSTables can be fully mmapped.
> There is no reason for this query to be slow in this case.
>
>> Not sure what you mean by "scans all columns from all rows" - a select
>> by column name will use a SliceByNamesReadCommand, which will only read
>> the required columns from each SSTable (it normally short-circuits,
>> though, and reads from fewer).
>
> The query should fetch only IDs, yet it checks the TTLs of columns. That
> is the point. Why does it do that?
>
>> If there is a TTL, ExpiringColumn.localExpirationTime must be checked;
>> if there is no TTL, it will not be checked.
>
> It is a standard CQL3 table with an ID, a couple of columns and a CQL3
> collection. I didn't do anything with TTLs on the table or its columns.
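>
> For what it's worth, the TTL() function can be used to double-check this
> - it returns null for a column with no TTL set (the column name below is
> just an example):
>
> SELECT id, TTL(some_column) FROM table LIMIT 10;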
>> > As Cassandra checks all the columns in selects, performance suffers
>> > badly if the collection is of any interesting size.
>>
>> This is not true - could you provide an example of where you think this
>> is happening?
>
> We saw it in the trace log. It happened in the select id from table
> query; the table had a collection column.
>
>> > Additionally, we saw various random, irreproducible freezes, high CPU
>> > consumption when nothing was happening (even with the trace log level
>> > set, no activity was reported) and highly unpredictable performance
>> > characteristics after nodetool flush and/or major compaction.
>>
>> What was the HW platform and what was the load?
>
> My i7/8GB notebook with a single-node cluster, and a virtualised,
> AWS-like environment with nodes of various sizes.
>
>> Typically, freezes in the server correlate with JVM GC; the JVM GC can
>> also be using the CPU. If you have wide rows or make large reads, you
>> may run into more JVM GC issues.
>>
>> nodetool flush will (as it says) flush all the tables to disk. If you
>> have a lot of tables and/or a lot of secondary indexes, this can cause
>> the switch lock to be held, preventing write threads from progressing.
>> Once the flush threads stop waiting on the flush queue, the lock will
>> be released. See the help for memtable_flush_queue_size in the yaml
>> file.
>
> I will check this, thanks.
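>
> For reference, the setting Aaron mentions lives in cassandra.yaml; the
> value below is the default of that era (1.2/2.0):
>
> # number of full memtables allowed to queue up waiting for a flush
> # writer thread; too small a queue can stall writes during flushes
> memtable_flush_queue_size: 4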
>> Major compaction is not recommended in production. If you are seeing it
>> cause performance problems, I would guess it is related to JVM GC
>> and/or the disk IO not being able to keep up. When used, it creates a
>> single SSTable for each table, which will not be compacted again until
>> (by default) 3 other large SSTables are created, or until you run major
>> compaction again. For this reason it is not recommended.
>>
>> > Conclusions:
>> >
>> > - do not use collections
>> > - do not use secondary indexes
>> > - do not use filtering
>> > - keep your rows as narrow as possible if you need any kind of
>> >   all-row-keys traversal
>>
>> These features all have their uses, but it looks like you leaned on
>> them heavily while recreating a relational model. Especially the
>> filtering: you have to enable it explicitly, precisely to prevent the
>> client from sending queries that will take a long time. The only time
>> row key traversal is normally used is when reading data through Hadoop.
>> You should always strive to read rows from a table by their full or
>> partial primary key.
>>
>> > With these conclusions in mind, CQL seems redundant; plain old thrift
>> > may be used, joins should be done client side and/or all indexes need
>> > to be handled manually. Correct?
>>
>> No. CQL provides a set of functionality not present in the thrift API.
>> Joins and indexes should generally be handled by denormalising the data
>> during writes.
>>
>> It sounds like your data model was too relational; you need to
>> denormalise and read rows by primary key. Secondary indexes are useful
>> when you have a query pattern that is used infrequently.
>
> regards,
>
> ondrej cernos
>
>> Hope that helps.
>>
>> -----------------
>> Aaron Morton
>> New Zealand
>> @aaronmorton
>>
>> Co-Founder & Principal Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On 18/12/2013, at 3:47 am, Ondřej Černoš <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > we are reimplementing a legacy interface of an inventory-like service
>> > (currently built on top of mysql) on Cassandra, and I thought I would
>> > share some findings with the list. The interface semantics is given
>> > and cannot be changed. We chose Cassandra for its multi-datacenter
>> > capabilities and no-SPOF qualities. The dataset is small (6 tables
>> > with 150,000 records and a bunch of tables with up to thousands of
>> > records), but the model is not trivial - the mysql model has some 20+
>> > tables, joins are frequent, m:n relationships are frequent and the
>> > like. The interface is read heavy. We thought the size of the dataset
>> > would allow the whole dataset to fit into the memory of each node (a
>> > 3-node cluster in each DC, 3 replicas, local quorum operations), and
>> > that even though some operations (like secondary index lookup) are
>> > not superfast, due to the size of the dataset it would perform ok. We
>> > were wrong.
>> >
>> > We use CQL3 exclusively, and we use all of its capabilities
>> > (collections, secondary indexes, filtering) because they make the
>> > data model maintainable. We denormalised what had to be denormalised
>> > in order to avoid client-side joins. The usual query to the storage
>> > means one CQL query on a denormalised table. We need to support
>> > integer offset/limit paging, filter-by-example queries, M:N
>> > relationship queries and all the usual suspects of an old SQL-backed
>> > interface.
>> >
>> > This is the list of operations we have identified so far that perform
>> > really poorly. The row id is called id in the following:
>> >
>> > * select id from table where token(id) > token(some_value) and
>> >   secondary_index = other_val limit 2 allow filtering;
>> >
>> > Filtering absolutely kills the performance. On a table populated with
>> > 130,000 records on a single-node Cassandra server (my i7 notebook,
>> > 2GB of JVM heap), with a secondary index built on a column whose
>> > value set has low cardinality, this query takes 156 seconds to
>> > finish.
>> >
>> > By the way, the performance is an order of magnitude better if this
>> > patch is applied:
>> >
>> > diff --git a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
>> > index 5ab1df6..13af671 100644
>> > --- a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
>> > +++ b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
>> > @@ -190,7 +190,8 @@ public class CompositesSearcher extends SecondaryIndexSearcher
>> >
>> >      private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
>> >      // We shouldn't fetch only 1 row as this provides buggy paging in case the first row doesn't satisfy all clauses
>> > -    private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
>> > +    // private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
>> > +    private final int rowsPerQuery = 100000;
>> >
>> >      public boolean needsFiltering()
>> >      {
>> >
>> > * select id from table;
>> >
>> > As we saw in the trace log, the query - although it queries just row
>> > ids - scans all columns of all the rows and (probably) compares TTLs
>> > with the current time (we saw hundreds of thousands of calls to
>> > gettimeofday(2)). This means that if a table mixes wide and narrow
>> > rows, the performance suffers horribly.
>> >
>> > * CQL collections
>> >
>> > See the point above about mixing wide and narrow rows. As Cassandra
>> > checks all the columns in selects, performance suffers badly if the
>> > collection is of any interesting size.
>> >
>> > Additionally, we saw various random, irreproducible freezes, high CPU
>> > consumption when nothing was happening (even with the trace log level
>> > set, no activity was reported) and highly unpredictable performance
>> > characteristics after nodetool flush and/or major compaction.
>> >
>> > Conclusions:
>> >
>> > - do not use collections
>> > - do not use secondary indexes
>> > - do not use filtering
>> > - keep your rows as narrow as possible if you need any kind of
>> >   all-row-keys traversal
>> >
>> > With these conclusions in mind, CQL seems redundant; plain old thrift
>> > may be used, joins should be done client side and/or all indexes need
>> > to be handled manually. Correct?
>> >
>> > Thanks for reading,
>> >
>> > ondrej cernos
