> I don't know. How do I find out? The only mention of a query plan in
> Cassandra I found is the article on your site, from 2011, covering version
> 0.8.

See the help for TRACING in cqlsh.
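
For example, something along these lines (using your filtering query; any
query works the same way):

    cqlsh> TRACING ON;
    cqlsh> SELECT id FROM table
             WHERE token(id) > token(some_value)
               AND secondary_index = other_val
             LIMIT 2 ALLOW FILTERING;

With tracing on, cqlsh prints the step-by-step trace of the request after the
result set; the trace is also kept in the system_traces keyspace. That is the
closest thing Cassandra has to a query plan.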

My general approach is to solve problems with the read path by making changes
to the write path. So I would normally say make a new table to store the data
you want to read, or change the layout of a table to be more flexible.
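
As a sketch of that approach, using your first query as the read to serve
(table and column names are invented here, I have not seen your schema):

    -- A second table keyed by the value the secondary index is
    -- currently hit for.
    CREATE TABLE items_by_other_val (
        other_val text,
        id        uuid,
        PRIMARY KEY (other_val, id)
    );

    -- Populate it on the write path, alongside the main table
    -- (? are bind markers in a prepared statement).
    BEGIN BATCH
        INSERT INTO items (id, other_val) VALUES (?, ?);
        INSERT INTO items_by_other_val (other_val, id) VALUES (?, ?);
    APPLY BATCH;

    -- The read becomes a single-partition slice: no secondary index,
    -- no ALLOW FILTERING.
    SELECT id FROM items_by_other_val WHERE other_val = 'val' LIMIT 2;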

Can you provide the table definition and the query you are using?

Cheers
 
-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 15/01/2014, at 9:48 am, Ondřej Černoš <[email protected]> wrote:

> Hi,
> 
> thanks for the answer and sorry for the delay. Let me answer inline.
> 
> 
> On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton <[email protected]> wrote:
> > * select id from table where token(id) > token(some_value) and 
> > secondary_index = other_val limit 2 allow filtering;
> >
> > Filtering absolutely kills the performance. On a table populated with
> > 130,000 records, a single-node Cassandra server (on my i7 notebook, 2GB of
> > JVM heap) and a secondary index built on a column with a low-cardinality
> > value set, this query takes 156 seconds to finish.
> Yes, this is why you have to add ALLOW FILTERING. You are asking the nodes
> to read all the data that matches and filter it in memory; that's a SQL-type
> operation.
> 
> Your example query is somewhat complex and I doubt it could get decent
> performance; what does the query plan look like?
> 
> I don't know. How do I find out? The only mention of a query plan in
> Cassandra I found is the article on your site, from 2011, covering version
> 0.8.
> 
> The example query completes in a fraction of the time if I just fetch all
> rows matching the token function and do the filtering client side.
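> That is, just this, with the filtering on secondary_index done in the
> application:
> 
>     SELECT id, secondary_index FROM table
>      WHERE token(id) > token(some_value) LIMIT N;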
> 
>  
> IMHO you need to do further de-normalisation; you will get the best
> performance when you select rows by their full or partial primary key.
> 
> I denormalize all the way I can. The problem is I need to support paging and
> filtering at the same time. The API I must support allows filtering by
> example and paging - so how should I denormalize? Should I somehow manage
> pages of primary row keys manually? Or should I maintain a manual secondary
> index and page somehow in the denormalized wide row?
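> Just to make the question concrete, by a manual index I mean something like
> this (names invented), paged by remembering the last id of the previous page:
> 
>     CREATE TABLE ids_by_filter_value (
>         filter_value text,
>         id           uuid,
>         PRIMARY KEY (filter_value, id)
>     );
> 
>     -- page N+1: pass the last id of page N as the cursor
>     SELECT id FROM ids_by_filter_value
>      WHERE filter_value = 'val' AND id > ?
>      LIMIT 100;
> 
> But the legacy API pages by integer offset, not by such a cursor.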
> 
> The trouble goes even further; even this doesn't perform well:
> 
> select id from table where token(id) > token(some_value) and pk_cluster = 
> 'val' limit N;
> 
> where id and pk_cluster form the primary key (a CQL3 table). I guess this
> should be an ordered row query plus an ordered column slice query, so where
> is the problem with performance?
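> For reference, the table is roughly this shape (simplified, the real table
> has more columns; I call it t here):
> 
>     CREATE TABLE t (
>         id         text,
>         pk_cluster text,
>         PRIMARY KEY (id, pk_cluster)
>     );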
> 
>  
> > By the way, the performance is an order of magnitude better if this patch
> > is applied:
> That looks like it's tuned to your specific need; it would ignore the max
> results included in the query.
> 
> It is tuned; it only demonstrates that the heuristic doesn't work well.
>  
> > * select id from table;
> >
> > As we saw in the trace log, the query - although it queries just row ids - 
> > scans all columns of all the rows and (probably) compares TTL with current 
> > time (?) (we saw hundreds of thousands of gettimeofday(2)). This means that 
> > if the table somehow mixes wide and narrow rows, the performance suffers 
> > horribly.
> Selecting all rows from a table requires a range scan, which reads all rows
> from all nodes. It should never be used in production.
> 
> The trouble is I just need to perform it, sometimes. I know what the problem
> with the query is, but I have only some 150,000 records - the dataset can be
> stored entirely in memory and the SSTables can be fully mmapped. There is no
> reason for this query to be slow in this case.
>  
> Not sure what you mean by “scans all columns from all rows”; a select by
> column name will use a SliceByNamesReadCommand, which will only read the
> required columns from each SSTable (it normally short circuits, though, and
> reads from fewer).
> 
> The query should fetch only IDs, yet it checks the TTLs of the columns. That
> is the point. Why does it do that?
>  
> If there is a TTL, ExpiringColumn.localExpirationTime must be checked; if
> there is no TTL it will not be checked.
> 
> It is a standard CQL3 table with an ID, a couple of columns and a CQL3
> collection. I didn't do anything with TTL on the table or its columns.
>  
> > As Cassandra checks all the columns in selects, performance suffers badly 
> > if the collection is of any interesting size.
> This is not true; could you provide an example where you think this is
> happening?
> 
> We saw it in the trace log. It happened in the select ID from table query. 
> The table had a collection column.
>  
> > Additionally, we saw various random irreproducible freezes, high CPU
> > consumption when nothing was happening (even with trace log level set no
> > activity was reported) and highly unpredictable performance characteristics
> > after nodetool flush and/or major compaction.
> What was the HW platform and what was the load?
> 
> My i7/8GB notebook running a single-node cluster, and a virtualised AWS-like
> environment with nodes of various sizes.
>  
> Typically freezes in the server correlate with JVM GC; the JVM GC can also
> be what is using the CPU.
> If you have wide rows or make large reads you may run into more JVM GC issues.
>  
> nodetool flush will (as it says) flush all the tables to disk. If you have a
> lot of tables and/or a lot of secondary indexes this can cause the switch
> lock to be held, preventing write threads from progressing. Once flush
> threads stop waiting on the flush queue the lock will be released. See the
> help for memtable_flush_queue_size in the yaml file.
> 
> I will check this, thanks.
>  
> Major compaction is not recommended in production. If you are seeing it
> cause performance problems I would guess it is related to JVM GC and/or the
> disk IO not being able to keep up. When used, it creates a single SSTable
> for each table, which will not be compacted again until (by default) 3 other
> large SSTables are created or you run major compaction again. For this
> reason it is not recommended.
> 
> > Conclusions:
> >
> > - do not use collections
> > - do not use secondary indexes
> > - do not use filtering
> > - have your rows as narrow as possible if you need any kind of all row keys 
> > traversal
> These features all have a use, but it looks like you leaned on them heavily
> while creating a relational model. Especially the filtering: you have to
> explicitly enable it to prevent the client sending queries that will take a
> long time.
> The only time row key traversal is normally used is when reading data
> through hadoop. You should always strive to read row(s) from a table by the
> full or partial primary key.
>  
> > With these conclusions in mind, CQL seems redundant, plain old thrift may 
> > be used, joins should be done client side and/or all indexes need to be 
> > handled manually. Correct?
> No.
> CQL provides a set of functionality not present in the thrift API.
> Joins and indexes should generally be handled by denormalising the data
> during writes.
> 
> It sounds like your data model was too relational; you need to denormalise
> and read rows by primary key. Secondary indexes are useful when you have a
> query pattern that is used infrequently.
> 
> regards,
> 
> ondrej cernos
>  
> Hope that helps.
> 
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
> 
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> On 18/12/2013, at 3:47 am, Ondřej Černoš <[email protected]> wrote:
> 
> > Hi all,
> >
> > we are reimplementing a legacy interface of an inventory-like service
> > (currently built on top of mysql) on Cassandra and I thought I would share
> > some findings with the list. The interface semantics are given and cannot
> > be changed. We chose Cassandra due to its multiple datacenter capabilities
> > and no-spof qualities. The dataset is small (6 tables holding 150,000
> > records, a bunch of tables with up to thousands of records), but the model
> > is not trivial - the mysql model has some 20+ tables; joins and m:n
> > relationships are frequent. The interface is read heavy. We thought the
> > size of the dataset should allow the whole dataset to fit into the memory
> > of each node (a 3-node cluster in each DC, 3 replicas, local quorum
> > operations) and that even though some operations (like secondary index
> > lookup) are not superfast, due to the size of the dataset it should
> > perform ok. We were wrong.
> >
> > We use CQL3 exclusively and we use all of its capabilities (collections,
> > secondary indexes, filtering), because they make the data model
> > maintainable. We denormalised what had to be denormalised in order to
> > avoid client side joins. A usual query to the storage means one CQL query
> > on a denormalised table. We need to support integer offset/limit paging,
> > filter-by-example queries, M:N relationship queries and all the usual
> > suspects of an old SQL-backed interface.
> >
> > This is the list of operations we have identified so far that perform
> > really poorly. The row id is called id in the following:
> >
> > * select id from table where token(id) > token(some_value) and 
> > secondary_index = other_val limit 2 allow filtering;
> >
> > Filtering absolutely kills the performance. On a table populated with
> > 130,000 records, a single-node Cassandra server (on my i7 notebook, 2GB of
> > JVM heap) and a secondary index built on a column with a low-cardinality
> > value set, this query takes 156 seconds to finish.
> >
> > By the way, the performance is an order of magnitude better if this patch
> > is applied:
> >
> > diff --git a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > index 5ab1df6..13af671 100644
> > --- a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > +++ b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
> > @@ -190,7 +190,8 @@ public class CompositesSearcher extends SecondaryIndexSearcher
> >
> >             private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
> >             // We shouldn't fetch only 1 row as this provides buggy paging in case the first row doesn't satisfy all clauses
> > -            private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
> > +//            private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
> > +            private final int rowsPerQuery = 100000;
> >
> >             public boolean needsFiltering()
> >             {
> >
> > * select id from table;
> >
> > As we saw in the trace log, the query - although it queries just row ids - 
> > scans all columns of all the rows and (probably) compares TTL with current 
> > time (?) (we saw hundreds of thousands of gettimeofday(2)). This means that 
> > if the table somehow mixes wide and narrow rows, the performance suffers 
> > horribly.
> >
> > * CQL collections
> >
> > See the point above about mixing wide and narrow rows. As Cassandra
> > checks all the columns in selects, performance suffers badly if the
> > collection is of any interesting size.
> >
> > Additionally, we saw various random irreproducible freezes, high CPU
> > consumption when nothing was happening (even with trace log level set no
> > activity was reported) and highly unpredictable performance
> > characteristics after nodetool flush and/or major compaction.
> >
> > Conclusions:
> >
> > - do not use collections
> > - do not use secondary indexes
> > - do not use filtering
> > - have your rows as narrow as possible if you need any kind of all row keys 
> > traversal
> >
> > With these conclusions in mind, CQL seems redundant, plain old thrift may 
> > be used, joins should be done client side and/or all indexes need to be 
> > handled manually. Correct?
> >
> > Thanks for reading,
> >
> > ondrej cernos
> 
> 
