Re: Cassandra counter readtimeout error

2018-02-20 Thread Carl Mueller
How "hot" are your partition keys in these counters?

I would think, theoretically, that if specific partition keys are getting
thousands of counter increment/mutation updates, then compaction won't have
"compacted" those together into the final value yet, and you'll start
experiencing the same problems people hit with rows carrying thousands of tombstones.

So if you had an event 'birthdaypartyattendance'

and you had 1110 separate updates doing +1s/+2s/+3s to the attendance count
for that event (what a bday party!), then many of those increments may still
be on other nodes and not fully replicated when you go to select the final
attendance value, so the read will have to gather all 1110 cells and
accumulate them into the final value. Once replication has completed and
compaction runs, it should amalgamate those. Writing at QUORUM will help
ensure the counter mutations are written to the proper number of nodes,
with the usual overhead of waiting on multiple replicas.
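
Sketched in CQL (a made-up table purely for illustration, not anyone's
actual schema):

-- hypothetical counter table for the example above
CREATE TABLE events.attendance (
    event_name text PRIMARY KEY,
    attendees  counter
);

-- in cqlsh: write/read at QUORUM so mutations land on a majority of replicas
CONSISTENCY QUORUM;

-- each UPDATE is a separate counter mutation; per the theory above, a read
-- may have to reconcile many such cells until compaction merges them
UPDATE events.attendance SET attendees = attendees + 1
 WHERE event_name = 'birthdaypartyattendance';
UPDATE events.attendance SET attendees = attendees + 3
 WHERE event_name = 'birthdaypartyattendance';

SELECT attendees FROM events.attendance
 WHERE event_name = 'birthdaypartyattendance';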

DISCLAIMER: I don't have working knowledge of the distributed counters
code. I just know they are a really hard problem and don't work great
in 2.x. As mentioned, 3.x seems to be a lot better.


Re: Cassandra counter readtimeout error

2018-02-19 Thread Alain RODRIGUEZ
Hi Javier,

Glad to hear it is solved now. Cassandra 3.11.1 should be a more stable
version and 3.11 a better series.

Excuse my misunderstanding; your table seems to be better designed than
I thought.

Welcome to the Apache Cassandra community!

C*heers ;-)
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com




Re: Cassandra counter readtimeout error

2018-02-19 Thread Javier Pareja
Hi,

Thank you for your reply.

Since this problem was bothering me, I upgraded the cluster to version
3.11.1 last night, and everything is working now. As far as I can tell, the
counter table can be read again. I will do more testing today with this
version, but it is looking good.

To answer your questions:
- I might not have explained the table definition very well: the table does
not have 6 partitions but 6 partition key columns. There are thousands of
partitions in that table, one per combination of those key values, and I
made sure the partitions remained small when designing the table (see the
sketch below).
- I also enabled tracing in cqlsh, but it showed nothing when querying this
row. It did, however, when querying other tables...
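
For illustration, the table has roughly this shape (column names are
placeholders, not the real schema):

CREATE TABLE my_keyspace.event_counters (
    pk1 text, pk2 text, pk3 text, pk4 text, pk5 text, pk6 text, -- 6 partition key columns
    ck1 text, ck2 text, ck3 text, ck4 text,                     -- 4 clustering columns
    c1 counter, c2 counter, c3 counter, c4 counter, c5 counter, -- 5 counters
    PRIMARY KEY ((pk1, pk2, pk3, pk4, pk5, pk6), ck1, ck2, ck3, ck4)
);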

Thanks again for your reply!! I am very excited to be part of the Cassandra
user base.

Javier



F Javier Pareja


Re: Cassandra counter readtimeout error

2018-02-19 Thread Alain RODRIGUEZ
Hello,

> This table has 6 partition keys, 4 primary keys and 5 counters.


I think the root issue is this ^. There might be some inefficiencies or
bugs in counters, but this design makes Cassandra relatively inefficient in
most cases, whether you use standard columns or counters.

Cassandra data is supposed to be well distributed for maximal efficiency.
With only 6 partitions and 6 or more nodes, the load is guaranteed to be
imbalanced; with fewer nodes it is still probably poorly balanced. Ideally,
reads hit a small number of SSTables and run in parallel across many nodes
to split the work and keep queries efficient, but in this case Cassandra is
most probably reading huge partitions from a single node. When a request is
too big, it can time out. I am not sure how pagination works with counters,
but even if it is working, at some point you are simply reading too much
(or too inefficiently) and the timeout is reached.
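
For reference, the knob being hit is most likely the coordinator read
timeout in cassandra.yaml (default values shown below; raising them would
only hide the underlying problem):

# cassandra.yaml -- coordinator read timeouts, default values
read_request_timeout_in_ms: 5000     # single-partition reads
range_request_timeout_in_ms: 10000   # range scans such as "select *"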

I imagine it worked well for a while, as counters are very small columns /
tables compared to any event data, but at some point you might have hit a
'physical' limit, because you are pulling *all* the information you need
from one partition (and probably many SSTables).

Is there really no other way to design this use case?

> When data starts to be inserted, I can query the counters correctly from
> that particular row but after a few minutes updating the table with
> thousands of events, I get a read timeout every time

Troubleshooting:
- Use tracing to understand what takes so long in your queries (example
commands after this list).
- Check for warnings/errors in the logs. Cassandra usually complains when
it is unhappy with its configuration; there is a lot of interesting
information there, and it has been a while since I last saw a failure with
no relevant clue in the logs.
- Check SSTables per read and other read performance metrics for this
counter table. Monitoring could make the reason for this timeout obvious.
If you use Datadog, for example, I guess a quick look at the "Read Path"
dashboard would help. With any other tool, look for SSTables per read,
tombstones scanned (if any), key cache hit rate, and resource usage (a fast
insert rate might mean compactions and the implicit 'read-before-writes'
are making the machine less responsive).
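
For example (keyspace and table names below are placeholders):

-- in cqlsh: trace the slow read to see where the time goes
TRACING ON;
SELECT * FROM my_keyspace.my_counter_table WHERE pk = 'hot-row';

# from a shell: read latency and SSTables-per-read histograms per table
# (on older nodetool versions the commands are 'cfhistograms' / 'cfstats')
nodetool tablehistograms my_keyspace my_counter_table
nodetool tablestats my_keyspace.my_counter_table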

Fix:
- Improve the design based on the findings above ^
- Tune the compaction strategy or read operations depending on the findings
above ^

I am not saying there is no bug in counters or in your version, but I would
say it is too early to state that; given the data model, other reasons
could explain this slowness.

If you don't have any monitoring in place, tracing and logs are a nice
place to start digging. If you want to share those here, we can help
interpret the output if needed :).

C*heers,

Alain




Cassandra counter readtimeout error

2018-02-17 Thread Javier Pareja
Hello everyone,

I get a timeout error when reading a particular row from a large counters
table.

I have a storm topology that inserts data into a Cassandra counter table.
This table has 6 partition keys, 4 primary keys and 5 counters.

When data starts to be inserted, I can query the counters correctly from
that particular row, but after a few minutes of updating the table with
thousands of events, I get a read timeout every time I try to read one
particular row from the table (the most frequently updated). Other rows I
can read quickly and without problems. Also, if I run "select *", the top
few hundred rows are returned quickly, as expected. The Storm topology has
been stopped, but the error is still there.

I am using Cassandra 3.6.

More information here:
https://stackoverflow.com/q/48833146

Are counters in this version broken? I run the query from cqlsh and get the
same error every time. I tried running it with tracing enabled and got
nothing but the error:

ReadTimeout: Error from server: code=1200 [Coordinator node timed out
waiting for replica nodes' responses] message="Operation timed out -
received only 0 responses." info={'received_responses': 0,
'required_responses': 1, 'consistency': 'ONE'}


Any ideas?