Hello,

> This table has 6 partition keys, 4 primary keys and 5 counters.


I think the root issue is this ^. There might be some inefficiency or
bugs with counters, but this data model makes Cassandra relatively
inefficient in most cases, whether you use standard columns or counters.

Cassandra data is supposed to be well distributed for maximal efficiency.
With only 6 partitions, if you have 6+ nodes, the load is certain to be
imbalanced; with fewer nodes it is still probably poorly balanced. Ideally,
a query reads from a small number of SSTables, in parallel across many
nodes, to split the work and make it efficient, but here Cassandra is most
probably reading huge partitions from a single node. When a request is too
big, it can time out. I am not sure how pagination works with counters, but
I believe that even if pagination is working, at some point you are simply
reading too much (or too inefficiently) and the timeout is reached.
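
As a quick check (not a fix), you can lower the page size and raise the
client-side timeout in cqlsh to see whether the read eventually completes.
For example (my_keyspace / my_counter_table are placeholders for your own
names):

    $ cqlsh --request-timeout=120
    cqlsh> PAGING 100;
    cqlsh> SELECT * FROM my_keyspace.my_counter_table WHERE ...;

Note that the server-side read_request_timeout_in_ms still applies to each
page, so the smaller page size is what actually reduces the work per
request.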

I imagine it worked well for a while, as counters are very small columns /
tables compared to any event data, but at some point you might have reached
a 'physical' limit, because you are pulling *all* the information you need
from one partition (and probably many SSTables).

Is there really no other way to design this use case?
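
I don't know your actual schema, so the following is purely an illustrative
sketch (all names are made up), but a usual trick is to add a time bucket
to the partition key so that hot counters are spread across many smaller
partitions:

    CREATE TABLE my_keyspace.event_counters (
        source_id  text,     -- whatever identifies the entity today
        day        text,     -- hypothetical time bucket, e.g. '2018-02-17'
        event_type text,     -- clustering column
        hits       counter,
        PRIMARY KEY ((source_id, day), event_type)
    );

Reads for a period then fan out to several small partitions (one query per
bucket) instead of hammering a single huge one.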

> When data starts to be inserted, I can query the counters correctly from
> that particular row but after a few minutes updating the table with
> thousands of events, I get a read timeout every time
>

Troubleshooting (example commands after this list):
- Use tracing to understand what takes so long in your queries.
- Check for warnings / errors in the logs. Cassandra tends to complain when
it is unhappy with the configuration, and it has been a while since I last
saw a failure with no relevant information in the logs.
- Check SSTables per read and other read performance metrics for this
counter table. Monitoring could make the reason for this timeout obvious.
If you use Datadog for example, I guess that a quick look at the "Read
Path" dashboard would help. With any other tool, look for SSTables per
read, tombstones scanned (if any), key cache hit rate, and resource usage
(maybe the fast insert rate, compactions, and the implicit
'read-before-writes' of counters are making the machine less responsive).
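
For example, from cqlsh and the shell (adjust keyspace, table and log path
to your environment; /var/log/cassandra/system.log is the default for
package installs):

    cqlsh> TRACING ON;
    cqlsh> SELECT * FROM my_keyspace.my_counter_table WHERE ...;

    $ grep -E 'WARN|ERROR' /var/log/cassandra/system.log
    $ nodetool tablehistograms my_keyspace my_counter_table
    $ nodetool tablestats my_keyspace.my_counter_table

tablehistograms shows SSTables per read and latency percentiles,
tablestats shows partition sizes and tombstones per slice.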

Fix:
- Improve the design based on the findings you make above ^
- Improve the compaction strategy or read operations, depending on the
findings above ^
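
For instance, if the findings show too many SSTables per read on a
read-heavy table, switching to LeveledCompactionStrategy is one option
(just a possibility to evaluate against your write rate, not a blanket
recommendation):

    ALTER TABLE my_keyspace.my_counter_table
    WITH compaction = {'class': 'LeveledCompactionStrategy'};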

I am not saying there is no bug in counters in your version, but I would
say it is too early to state this: given the data model, other reasons
could explain this slowness.

If you don't have any monitoring in place, tracing and logs are a nice
place to start digging. If you want to share those here, we can help
interpret the outputs if needed :).

C*heers,

Alain


2018-02-17 11:40 GMT+00:00 Javier Pareja <pareja.jav...@gmail.com>:

> Hello everyone,
>
> I get a timeout error when reading a particular row from a large counters
> table.
>
> I have a storm topology that inserts data into a Cassandra counter table.
> This table has 6 partition keys, 4 primary keys and 5 counters.
>
> When data starts to be inserted, I can query the counters correctly from
> that particular row but after a few minutes updating the table with
> thousands of events, I get a read timeout every time I try to read a
> particular row from the table (the most frequently updated). Other rows I
> can read quickly and fine. Also, if I run "select *", the top few hundred
> rows are returned quickly as expected. The storm topology is stopped but
> the error is still there.
>
> I am using Cassandra 3.6.
>
> More information here:
> https://stackoverflow.com/q/48833146
>
> Are counters in this version broken? I run the query from CQLSH and get
> the same error every time. I tried running it with trace enabled and get
> nothing but the error:
>
> ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting 
> for replica nodes' responses] message="Operation timed out - received only 0 
> responses." info={'received_responses': 0, 'required_responses': 1, 
> 'consistency': 'ONE'}
>
>
> Any ideas?
>
