Hi,

Thank you for your reply.

As I was bothered by this problem, last night I upgraded the cluster to
version 3.11.1 and everything is working now. As far as I can tell, the
counter table can be read now. I will be doing more testing today with this
version, but it is looking good.

To answer your questions:

- I might not have explained the table definition very well: the table does
  not have 6 partitions, but 6 partition key columns. There are thousands of
  partitions in that table, one for each combination of values of those
  partition key columns. I also made sure that the partitions remained small
  when designing the table.
- I also enabled tracing in cqlsh, but it showed nothing when querying this
  row. It did, however, show output when querying other tables...

Thanks again for your reply!! I am very excited to be part of the Cassandra
user base.

Javier

F Javier Pareja

On Mon, Feb 19, 2018 at 8:08 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hello,
>
>> This table has 6 partition keys, 4 primary keys and 5 counters.
>
> I think the root issue is this ^. There might be some inefficiency or
> issues with counters, but this design makes Cassandra relatively
> inefficient in most cases, whether you use standard columns or counters.
>
> Cassandra data is supposed to be well distributed for maximal efficiency.
> With only 6 partitions, if you have 6+ nodes, there is a 100% chance that
> the load is fairly imbalanced. If you have fewer nodes, it is still
> probably poorly balanced. Also, reads should ideally touch a small number
> of SSTables and run in parallel across many nodes, to split the work and
> make queries efficient, but in this case Cassandra is most probably
> reading huge partitions from one node. When the size of the request is too
> big, it can time out. I am not sure how pagination works with counters,
> but I believe that even if pagination is working, at some point you are
> just reading too much (or too inefficiently) and the timeout is reached.
>
> I imagine it worked well for a while, as counters are very small columns /
> tables compared to any event data, but at some point you might have
> reached a 'physical' limit, because you are pulling *all* the information
> you need from one partition (and probably many SSTables).
>
> Is there really no other way to design this use case?
>
>> When data starts to be inserted, I can query the counters correctly from
>> that particular row but after a few minutes updating the table with
>> thousands of events, I get a read timeout every time
>
> Troubleshooting:
> - Use tracing to understand what takes so long with your queries.
> - Check for warnings / errors in the logs. Cassandra tends to complain
>   when it is unhappy with the configuration. There is a lot of interesting
>   information there, and it has been a while since I last had a failure
>   with no relevant information in the logs.
> - Check SSTables per read and other read performance metrics for this
>   counter table. Some monitoring could make the reason for this timeout
>   obvious. If you use Datadog, for example, I guess that a quick look at
>   the "Read Path" dashboard would help. If you are using any other tool,
>   look at SSTables per read, tombstones scanned (if any), key cache hit
>   rate, and resources (as maybe the fast insert rate, compactions and
>   implicit 'read-before-writes' are making the machine less responsive).
>
> Fix:
> - Improve the design to address the findings you made above ^
> - Improve the compaction strategy or read operations depending on the
>   findings above ^
>
> I am not saying there is no bug in counters in your version, but I would
> say it is too early to state this; given the data model, some other
> reasons could explain this slowness.
>
> If you don't have any monitoring in place, tracing and logs are a nice
> place to start digging. If you want to share those here, we can help
> interpret the outputs you share if needed :).
>
> C*heers,
>
> Alain
>
> 2018-02-17 11:40 GMT+00:00 Javier Pareja <pareja.jav...@gmail.com>:
>
>> Hello everyone,
>>
>> I get a timeout error when reading a particular row from a large counters
>> table.
>>
>> I have a Storm topology that inserts data into a Cassandra counter table.
>> This table has 6 partition keys, 4 primary keys and 5 counters.
>>
>> When data starts to be inserted, I can query the counters correctly from
>> that particular row, but after a few minutes of updating the table with
>> thousands of events, I get a read timeout every time I try to read a
>> particular row from the table (the most frequently updated one). Other
>> rows I can read quick and fine. Also, if I run "select *", the first few
>> hundred rows are returned quick and fine, as expected. The Storm topology
>> is stopped but the error is still there.
>>
>> I am using Cassandra 3.6.
>>
>> More information here:
>> https://stackoverflow.com/q/48833146
>>
>> Are counters in this version broken? I run the query from CQLSH and get
>> the same error every time. I tried running it with trace enabled and get
>> nothing but the error:
>>
>> ReadTimeout: Error from server: code=1200 [Coordinator node timed out
>> waiting for replica nodes' responses] message="Operation timed out -
>> received only 0 responses." info={'received_responses': 0,
>> 'required_responses': 1, 'consistency': 'ONE'}
>>
>> Any ideas?
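
A note on the "6 partitions" versus "6 partition keys" point clarified at the top of this thread: the following is a minimal Python sketch, not Cassandra code, of why six partition key columns do not mean six partitions. The column names and sample values are hypothetical (the real table definition is not shown in this thread); the only assumption carried over is that a partition is identified by the combination of values of all partition key columns.

```python
from itertools import product

# Hypothetical partition key columns; the real schema is not shown in this thread.
PARTITION_KEY_COLUMNS = ("tenant", "site", "device", "metric", "year", "month")


def partition_key(row):
    """Return the composite partition key of a row as a tuple of column values."""
    return tuple(row[col] for col in PARTITION_KEY_COLUMNS)


def count_partitions(rows):
    """Count distinct partitions: one per distinct combination of key values."""
    return len({partition_key(row) for row in rows})


# Made-up sample data: 3 possible values for each of the 6 key columns.
values = {col: [f"{col}-{i}" for i in range(3)] for col in PARTITION_KEY_COLUMNS}
rows = [
    dict(zip(PARTITION_KEY_COLUMNS, combo))
    for combo in product(*(values[col] for col in PARTITION_KEY_COLUMNS))
]

print(len(PARTITION_KEY_COLUMNS))  # 6 partition key columns
print(count_partitions(rows))      # 729 distinct partitions (3 ** 6)
```

With only three values per column the sketch already yields hundreds of partitions, which is consistent with the "thousands of partitions" described in the reply, as opposed to the six assumed in the quoted message.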