Re: Cassandra counter readtimeout error

Javier Pareja Mon, 19 Feb 2018 01:32:00 -0800

Hi,

Thank you for your reply.


This problem was bothering me, so last night I upgraded the cluster from
version 3.6 to 3.11.1, and everything is working now. As far as I can tell,
the counter table can be read again. I will be doing more testing today with
this version, but it is looking good.

To answer your questions:
- I might not have explained the table definition very well: the table does
not have 6 partitions, it has 6 partition key columns. There are thousands
of partitions in that table, one per distinct combination of those partition
key values. I also made sure that the partitions remained small when
designing the table.
- I also enabled tracing in cqlsh, but it showed nothing when querying this
row. It did, however, show a trace when querying other tables...
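
To illustrate the first point, here is a minimal sketch of how a composite
partition key of 6 columns yields thousands of partitions that spread over
the nodes. Everything in it is an assumption for illustration: the 6 columns
with 4 values each are hypothetical, MD5 stands in for Cassandra's Murmur3
partitioner, and a simple modulo stands in for token-range ownership.

```python
import hashlib
from collections import Counter
from itertools import product


def node_for(partition_key, num_nodes):
    """Map a composite partition key to a node index.

    Assumption: MD5 replaces Cassandra's Murmur3 partitioner and a modulo
    replaces token-range ownership; only the spreading effect is modelled.
    """
    raw = ":".join(map(str, partition_key)).encode("utf-8")
    token = int.from_bytes(hashlib.md5(raw).digest()[:8], "big")
    return token % num_nodes


def partitions_per_node(keys, num_nodes):
    """Count how many partitions land on each node."""
    counts = Counter(node_for(key, num_nodes) for key in keys)
    return [counts.get(node, 0) for node in range(num_nodes)]


# Hypothetical data: 6 partition key columns with 4 distinct values each.
columns = [range(4)] * 6
keys = list(product(*columns))  # one partition per combination of values

spread = partitions_per_node(keys, num_nodes=6)
print(len(keys), "partitions, per node:", spread)
```

The number of partitions is the number of distinct value combinations, not
the number of key columns, which is why the load can still be well
distributed.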

Thanks again for your reply!! I am very excited to be part of the Cassandra
user base.

Javier



F Javier Pareja

On Mon, Feb 19, 2018 at 8:08 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

>
> Hello,
>
> This table has 6 partition keys, 4 primary keys and 5 counters.
>
>
> I think the root issue is this ^. There might be some inefficiency or
> issues with counters, but this design makes Cassandra relatively
> inefficient in most cases, whether you use standard columns or counters.
>
> Cassandra data is supposed to be well distributed for maximal efficiency.
> With only 6 partitions, if you have 6+ nodes, there is a 100% chance that
> the load is imbalanced. If you have fewer nodes, it's still probably
> poorly balanced. Ideally, reads also touch a small number of SSTables and
> run in parallel across many nodes to split the work and keep queries
> efficient, but in this case Cassandra is most probably reading huge
> partitions from one node. When the size of the request is too big, it can
> time out. I am not sure how pagination works with counters, but I believe
> that even if pagination is working, at some point you are just reading
> too much (or too inefficiently) and the timeout is reached.
>
> I imagine it worked well for a while, as counters are very small columns /
> tables compared to any event data, but at some point you might have reached
> a 'physical' limit, because you are pulling *all* the information you need
> from one partition (and probably many SSTables).
>
> Is there really no other way to design this use case?
>
> When data starts to be inserted, I can query the counters correctly from
>> that particular row but after a few minutes updating the table with
>> thousands of events, I get a read timeout every time
>>
>
> Troubleshooting:
> - Use tracing to understand what takes so long with your queries.
> - Check for warnings / errors in the logs. Cassandra usually complains when
> it is unhappy with the configuration. There is a lot of interesting
> information there, and it has been a while since I last had a failure with
> no relevant information in the logs.
> - Check SSTables per read and other read performance metrics for this
> counter table. Some monitoring could make the reason for this timeout
> obvious. If you use Datadog, for example, I guess that a quick look at the
> "Read Path" Dashboard would help. If you are using any other tool, look for
> SSTables per read, tombstones scanned (if any), key cache hit rate, and
> resources (a fast insert rate, compactions and implicit
> 'read-before-writes' may be making the machine less responsive).
>
> Fix:
> - Improve the design to address the findings you made above ^
> - Improve the compaction strategy or read operations, depending on the
> findings above ^
>
> I am not saying there is no bug in counters in your version, but I would
> say it is too early to state this. Given the data model, some other
> reasons could explain this slowness.
>
> If you don't have any monitoring in place, tracing and logs are a nice
> place to start digging. If you want to share those here, we can help
> interpret the output if needed :).
>
> C*heers,
>
> Alain
>
>
> 2018-02-17 11:40 GMT+00:00 Javier Pareja <pareja.jav...@gmail.com>:
>
>> Hello everyone,
>>
>> I get a timeout error when reading a particular row from a large counters
>> table.
>>
>> I have a storm topology that inserts data into a Cassandra counter table.
>> This table has 6 partition keys, 4 primary keys and 5 counters.
>>
>> When data starts to be inserted, I can query the counters correctly from
>> that particular row but after a few minutes updating the table with
>> thousands of events, I get a readtimeout every time I try to read a
>> particular row from the table (the most frequently updated). Other rows I
>> can read quick and fine. Also if I run "select *", the top few hundreds are
>> returned quick and fine as expected. The storm topology is stopped but the
>> error is still there.
>>
>> I am using Cassandra 3.6.
>>
>> More information here:
>> https://stackoverflow.com/q/48833146
>>
>> Are counters in this version broken? I run the query from CQLSH and get
>> the same error every time. I tried running it with trace enabled and get
>> nothing but the error:
>>
>> ReadTimeout: Error from server: code=1200 [Coordinator node timed out 
>> waiting for replica nodes' responses] message="Operation timed out - 
>> received only 0 responses." info={'received_responses': 0, 
>> 'required_responses': 1, 'consistency': 'ONE'}
>>
>>
>> Any ideas?
>>
>
>
