concurrent sstable read

2022-10-25 Thread Grzegorz Pietrusza
Hi all

I can't find any information about how Cassandra handles reads that involve
multiple sstables. Are the sstables read concurrently or sequentially? Is read
latency directly related to the number of sstables opened?

Regards
Grzegorz


Re: TWCS recommendation on number of windows

2022-09-28 Thread Grzegorz Pietrusza
Hi Jeff
Thanks a lot for all these details, they are really helpful. My
understanding is that the number of windows is a tradeoff between the
amount of data waiting for expiration and the number of sstables required
to satisfy a read request.

In my case the data model does have a timestamp component. What is your
recommendation for these cases?
* TTL = 21 days, typical read span <= 2 days
* TTL = 1300 days, typical read span 30 to 60 days
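For concreteness, here is a rough sketch of how those two cases could map onto
a compaction_window_unit / compaction_window_size pair, simply dividing the TTL
by the window size so the result lands in the 20-30 window range discussed in
this thread (keyspace and table names below are hypothetical and the numbers
only illustrative):

    -- TTL = 21 days: 1-day windows give about 21 windows
    ALTER TABLE ts.recent_events
     WITH default_time_to_live = 1814400
      AND compaction = {'class': 'TimeWindowCompactionStrategy',
                        'compaction_window_unit': 'DAYS',
                        'compaction_window_size': 1};

    -- TTL = 1300 days: 50-day windows give about 26 windows
    ALTER TABLE ts.long_history
     WITH default_time_to_live = 112320000
      AND compaction = {'class': 'TimeWindowCompactionStrategy',
                        'compaction_window_unit': 'DAYS',
                        'compaction_window_size': 50};

In both cases a typical read span overlaps at most two or three windows (2 days
against 1-day windows, 30-60 days against 50-day windows), which keeps reads on
the cheap side of the tradeoff described above.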



On Wed, Sep 28, 2022 at 16:22 Jeff Jirsa wrote:

> So when I wrote TWCS, I wrote it for a use case that had 24h TTLs and 30
> days of retention. In that application, we had tested 12h windows, 24h
> windows, and 7 day windows, and eventually settled on 24h windows because
> that balanced factors like sstable size, sstables-per-read, and expired
> data waiting to be dropped (about 3%, 1/30th, on any given day). That's
> where that recommendation came from - it was mostly around how much expired
> data will sit around waiting to be dropped. That doesn't change with
> multiple data directories.
>
> If you go with fewer windows, you'll expire larger chunks at a time, which
> means you'll retain larger chunks waiting on expiration.
> If you go with more windows, you'll potentially touch more sstables on
> read.
>
> Realistically, if you can model your data to align with chunks (so each
> read only touches one window), the actual number of sstables shouldn't
> really matter much - the timestamps and bloom filter will avoid touching
> most of them on the read path anyway. If your data model doesn't have a
> timestamp component to it and you're touching lots of sstables on read,
> even 30 sstables is probably going to hurt you, and 210 would be really,
> really bad.
>
>
>
>
>
> On Wed, Sep 28, 2022 at 7:00 AM Grzegorz Pietrusza 
> wrote:
>
>> Hi All!
>>
>> According to TWCS documentation (
>> https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html)
>> the operator should choose a compaction_window_unit and
>> compaction_window_size pair that produces approximately 20-30 windows.
>>
>> I'm curious where this recommendation comes from? Also should the number
>> of windows be changed when more than one data directory is used? In my
>> example there are 7 data directories (partitions) and it seems that all of
>> them store 20-30 windows. Effectively this gives 140-210 sstables in total.
>> Is that an optimal configuration?
>>
>> Running on Cassandra 3.11
>>
>> Regards
>> Grzegorz
>>
>
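As a small sketch of the "model your data to align with chunks" point above,
assuming a hypothetical metrics table: if the partition key carries a daily
time bucket, a read for a given day resolves to a single partition, so sstables
belonging to other windows can be skipped via their bloom filters and min/max
timestamps instead of being touched on every read.

    CREATE TABLE ts.metrics (
        sensor_id text,
        day       date,       -- time bucket aligned with the 1-day window
        ts        timestamp,
        value     double,
        PRIMARY KEY ((sensor_id, day), ts)
    ) WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                         'compaction_window_unit': 'DAYS',
                         'compaction_window_size': 1};

    -- This read only needs the sstables holding data for one window:
    SELECT ts, value
      FROM ts.metrics
     WHERE sensor_id = 'sensor-1' AND day = '2022-09-28';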


TWCS recommendation on number of windows

2022-09-28 Thread Grzegorz Pietrusza
Hi All!

According to TWCS documentation (
https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html)
the operator should choose a compaction_window_unit and compaction_window_size
pair that produces approximately 20-30 windows.

I'm curious where this recommendation comes from? Also should the number of
windows be changed when more than one data directory is used? In my example
there are 7 data directories (partitions) and it seems that all of them
store 20-30 windows. Effectively this gives 140-210 sstables in total. Is
that an optimal configuration?

Running on Cassandra 3.11

Regards
Grzegorz
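One way to double-check which unit/size pair a table is actually running with
(and therefore how many windows its retention divides into) is to read the
compaction options back from the schema tables; a sketch, assuming a
hypothetical table ts.events:

    SELECT compaction
      FROM system_schema.tables
     WHERE keyspace_name = 'ts' AND table_name = 'events';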


estimated number of keys vs ttl

2018-05-23 Thread Grzegorz Pietrusza
Hi

I'm using tablestats to get the estimated number of partition keys. In my
case all writes are done with a TTL of a few days. Is the key count decreased
when the TTL expires?

Regards
Grzegorz
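For reference, a minimal sketch of the kind of TTL'd writes described above,
assuming a hypothetical table ts.events and a 5-day expiry; the question is
whether the estimated partition count reported by tablestats drops after rows
written like this expire:

    -- per-write TTL of 5 days
    INSERT INTO ts.events (id, ts, value)
    VALUES ('event-1', '2018-05-23 10:00:00+0000', 42.0)
    USING TTL 432000;

    -- or the same expiry as a table-level default
    ALTER TABLE ts.events WITH default_time_to_live = 432000;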


Re: Grafana data stored in a Cassandra database

2018-05-09 Thread Grzegorz Pietrusza
https://kairosdb.github.io

2018-05-09 19:43 GMT+02:00 Peter Sanford :

> This project implements the Graphite API on top of Cassandra and can be
> used from Grafana:
>
> https://github.com/pyr/cyanite
>
> On Wed, May 9, 2018 at 10:39 AM dba newsql  wrote:
>
>> Does anyone use Cassandra as data storage for Grafana as a timeseries DB?
>>
>> Thanks,
>> Fay
>>
>


Re: read repair with consistency one

2018-04-25 Thread Grzegorz Pietrusza
Hi Ben

Thanks a lot. From my analysis of the code it looks like you are right.
When global read repair kicks in, all live endpoints are queried for data,
regardless of consistency level. Only EACH_QUORUM is treated differently.

Cheers
Grzegorz

2018-04-22 1:45 GMT+02:00 Ben Slater <ben.sla...@instaclustr.com>:

> I haven't checked the code to make sure this is still the case but last
> time I checked:
> - For any read, if an inconsistency between replicas is detected then this
> inconsistency will be repaired. This obviously wouldn’t apply with CL=ONE
> because you’re not reading multiple replicas to find inconsistencies.
> - If read_repair_chance or dclocal_read_repair_chance are > 0 then extra
> replicas are checked as part of the query for the % of queries specified by
> the chance setting. Again, if inconsistencies are found, they are repaired.
> I expect this mechanism would still apply for CL=ONE.
>
>
> Cheers
> Ben
>
> On Sat, 21 Apr 2018 at 22:20 Grzegorz Pietrusza <gpietru...@gmail.com>
> wrote:
>
>> I haven't asked about "regular" repairs. I just wanted to know how read
>> repair behaves in my configuration (or is it doing anything at all).
>>
>> 2018-04-21 14:04 GMT+02:00 Rahul Singh <rahul.xavier.si...@gmail.com>:
>>
>>> Read repairs are one anti-entropy measure. Continuous repairs are
>>> another. If you do repairs via Reaper or your own method it will resolve
>>> your discrepancies.
>>>
>>> On Apr 21, 2018, 3:16 AM -0400, Grzegorz Pietrusza <gpietru...@gmail.com>,
>>> wrote:
>>>
>>> Hi all
>>>
>>> I'm a bit confused with how read repair works in my case, which is:
>>> - multiple DCs with RF 1 (NetworkTopologyStrategy)
>>> - reads with consistency ONE
>>>
>>>
>>> Article #1 says that read repair in fact runs RF reads for some percent
>>> of the requests. Let's say I have read_repair_chance = 0.1. Does it mean
>>> that 10% of requests will be read in all DCs (digest) and processed in
>>> the background?
>>>
>>> On the other hand, article #2 says that for consistency ONE read repair
>>> is not performed. Does it mean that in my case read repair does not work
>>> at all? Is there any way to enable read repair across DCs and stay with
>>> consistency ONE for reads?
>>>
>>>
>>> #1 https://www.datastax.com/dev/blog/common-mistakes-and-misconceptions
>>> #2 https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
>>>
>>> Regards
>>> Grzegorz
>>>
>>>
>> --
>
>
> Ben Slater
> Chief Product Officer, Instaclustr <https://www.instaclustr.com/>
>
>
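For reference, a minimal sketch of where the chance-based settings Ben
mentions live as table options in Cassandra 3.x, assuming a hypothetical table
app.events (what they actually trigger under CL=ONE is exactly what this
thread is trying to pin down):

    ALTER TABLE app.events
     WITH read_repair_chance = 0.1           -- fraction of reads also checked against replicas in all DCs
      AND dclocal_read_repair_chance = 0.0;  -- fraction checked only against replicas in the local DC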


Re: read repair with consistency one

2018-04-21 Thread Grzegorz Pietrusza
I haven't asked about "regular" repairs. I just wanted to know how read
repair behaves in my configuration (or is it doing anything at all).

2018-04-21 14:04 GMT+02:00 Rahul Singh <rahul.xavier.si...@gmail.com>:

> Read repairs are one anti-entropy measure. Continuous repairs are another.
> If you do repairs via Reaper or your own method it will resolve your
> discrepancies.
>
> On Apr 21, 2018, 3:16 AM -0400, Grzegorz Pietrusza <gpietru...@gmail.com>,
> wrote:
>
> Hi all
>
> I'm a bit confused with how read repair works in my case, which is:
> - multiple DCs with RF 1 (NetworkTopologyStrategy)
> - reads with consistency ONE
>
>
> Article #1 says that read repair in fact runs RF reads for some percent
> of the requests. Let's say I have read_repair_chance = 0.1. Does it mean
> that 10% of requests will be read in all DCs (digest) and processed in
> the background?
>
> On the other hand, article #2 says that for consistency ONE read repair
> is not performed. Does it mean that in my case read repair does not work
> at all? Is there any way to enable read repair across DCs and stay with
> consistency ONE for reads?
>
>
> #1 https://www.datastax.com/dev/blog/common-mistakes-and-misconceptions
> #2 https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
>
> Regards
> Grzegorz
>
>


read repair with consistency one

2018-04-21 Thread Grzegorz Pietrusza
Hi all

I'm a bit confused with how read repair works in my case, which is:
- multiple DCs with RF 1 (NetworkTopologyStrategy)
- reads with consistency ONE


Article #1 says that read repair in fact runs RF reads for some percent
of the requests. Let's say I have read_repair_chance = 0.1. Does it mean
that 10% of requests will be read in all DCs (digest) and processed in
the background?

On the other hand, article #2 says that for consistency ONE read repair
is not performed. Does it mean that in my case read repair does not work
at all? Is there any way to enable read repair across DCs and stay with
consistency ONE for reads?


#1 https://www.datastax.com/dev/blog/common-mistakes-and-misconceptions
#2
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html

Regards
Grzegorz
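A minimal sketch of the setup described above, with hypothetical keyspace and
datacenter names (one replica per DC under NetworkTopologyStrategy):

    CREATE KEYSPACE app
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'dc1': 1, 'dc2': 1, 'dc3': 1};

With a single replica per DC, a CL=ONE read has nothing to compare against
inside its own DC, which is why the question comes down to whether the
chance-based read repair path ever reaches the replicas in the other DCs.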


tablestats and gossip

2018-04-06 Thread Grzegorz Pietrusza
Hi all

Does local write count provided by tablestats include writes from gossip?