Re: TWCS recommendation on number of windows

2022-09-28 Thread Grzegorz Pietrusza
Hi Jeff
Thanks a lot for all these details, they are really helpful. My
understanding is that the number of windows is a tradeoff between the
amount of data waiting for expiration and the number of sstables required
to satisfy a read request.

In my case the data model does have a timestamp component. What is your
recommendation for these cases?
* TTL = 21 days, typical read span <= 2 days
* TTL = 1300 days, typical read span 30 to 60 days
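
For illustration, the two cases above can be plugged into a small calculation. This is only a sketch, not a recommendation from this thread: it assumes a steady write rate, treats a window as droppable only once all of its data has expired, and uses a hypothetical helper name (windows_for_case) and candidate window sizes (1, 7 and 30 days) picked purely as examples.

```python
import math

def windows_for_case(ttl_days, read_span_days, window_days):
    """Rough TWCS numbers for one (TTL, read span, window size) combination."""
    total_windows = math.ceil(ttl_days / window_days)
    # A read span that is not aligned to window boundaries can straddle
    # one more window than its length alone suggests.
    windows_per_read = math.ceil(read_span_days / window_days) + 1
    # Roughly one window's worth of data, relative to the TTL, can sit on
    # disk waiting for the whole window to expire (steady writes assumed).
    waiting_fraction = window_days / ttl_days
    return {
        "window_days": window_days,
        "total_windows": total_windows,
        "windows_per_read": windows_per_read,
        "waiting_fraction": round(waiting_fraction, 3),
    }

# (TTL in days, upper end of the typical read span in days) from the message.
for ttl_days, read_span_days in [(21, 2), (1300, 60)]:
    for window_days in (1, 7, 30):  # example window sizes only
        if window_days >= ttl_days:
            continue  # a window as long as the TTL makes little sense here
        print(ttl_days, read_span_days,
              windows_for_case(ttl_days, read_span_days, window_days))
```

Smaller windows lower waiting_fraction but raise total_windows, which is the tradeoff described in the message.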



On Wed, 28 Sep 2022 at 16:22, Jeff Jirsa wrote:

> So when I wrote TWCS, I wrote it for a use case that had 24h TTLs and 30
> days of retention. In that application, we had tested 12h windows, 24h
> windows, and 7 day windows, and eventually settled on 24h windows because
> that balanced factors like sstable size, sstables-per-read, and expired
> data waiting to be dropped (about 3%, 1/30th, on any given day). That's
> where that recommendation came from - it was mostly around how much expired
> data will sit around waiting to be dropped. That doesn't change with
> multiple data directories.
>
> If you go with fewer windows, you'll expire larger chunks at a time, which
> means you'll retain larger chunks waiting on expiration.
> If you go with more windows, you'll potentially touch more sstables on
> read.
>
> Realistically, if you can model your data to align with chunks (so each
> read only touches one window), the actual number of sstables shouldn't
> really matter much - the timestamps and bloom filter will avoid touching
> most of them on the read path anyway. If your data model doesn't have a
> timestamp component to it and you're touching lots of sstables on read,
> even 30 sstables is probably going to hurt you, and 210 would be really,
> really bad.
>
>
>
>
>
> On Wed, Sep 28, 2022 at 7:00 AM Grzegorz Pietrusza 
> wrote:
>
>> Hi All!
>>
>> According to TWCS documentation (
>> https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html)
>> the operator should choose compaction window parameters to select a
>> compaction_window_unit and compaction_window_size pair that produces
>> approximately 20-30 windows.
>>
>> I'm curious where this recommendation comes from? Also should the number
>> of windows be changed when more than one data directory is used? In my
>> example there are 7 data directories (partitions) and it seems that all of
>> them store 20-30 windows. Effectively this gives 140-210 sstables in total.
>> Is that an optimal configuration?
>>
>> Running on Cassandra 3.11
>>
>> Regards
>> Grzegorz
>>
>


Re: Table with 'compact storage' is not shown in "describe table" output in cqlsh

2022-09-28 Thread manish khandelwal
Hi All

Could this be due to how the "DESC" functionality changed via
https://issues.apache.org/jira/browse/CASSANDRA-14825? Earlier, client
drivers generated the schema output, so in 3.11.x we were able to see the
schema of COMPACT tables, but now in Cassandra 4.0.x we see the warning.

Regards
Manish

On Tue, Sep 27, 2022 at 3:22 PM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> Hi All
>
> As I understand, there was a plan to drop *Compact Storage* support with
> *Cassandra 4*, but a few issues were later identified, which resulted in
> continued support for Compact Storage in Cassandra 4. My cluster with a few
> old "compact storage" tables was able to come up with Cassandra 4.0.5. But
> when I described those tables, it gave me a warning:
>
>
>
> *Warning: Table keyspace_name.table_name omitted because it has constructs
> not compatible with CQL (was created via legacy API). Approximate structure,
> for reference: (this should not be used to reproduce this schema)*
>
> The original schema is not printed. I find no issue querying the table;
> only the schema is not shown. *Is this the expected behaviour for compact
> tables in Cassandra 4?*
>
> DROP COMPACT STORAGE is not stable and is marked experimental. Another
> option is to migrate the tables, but that might require changes on the
> client side as well.
>
> Regards
> Manish
>


Re: Do you know about DBA Stack Exchange?

2022-09-28 Thread Stéphane Alleaume
Thank you very much

Have a nice day

Kind regards
Stéphane

On Wed, 28 Sep 2022, 20:46, Patrick McFadin wrote:

> Hi everyone,
>
> I wanted to make sure you know about a great community resource. DBA Stack
> Exchange is a related site to Stack Overflow but strictly for DB operations
> people. There is a dedicated tag for Cassandra operations:
> https://dba.stackexchange.com/questions/tagged/cassandra
>
> I'm mentioning this because there are only 121 followers of this tag as of
> today. Doesn't look like many of you know about this! If you have a minute,
> take a look at any un-answered questions and follow the tag to show your
> support.
>
> Thanks, and hope to see you at ApacheCon next week.
>
> Patrick
>


Re: Do you know about DBA Stack Exchange?

2022-09-28 Thread Boyong N. Lambert
Thank you Patrick




On Wed, Sep 28, 2022 at 7:46 PM Patrick McFadin  wrote:

> Hi everyone,
>
> I wanted to make sure you know about a great community resource. DBA Stack
> Exchange is a related site to Stack Overflow but strictly for DB operations
> people. There is a dedicated tag for Cassandra operations:
> https://dba.stackexchange.com/questions/tagged/cassandra
>
> I'm mentioning this because there are only 121 followers of this tag as of
> today. Doesn't look like many of you know about this! If you have a minute,
> take a look at any un-answered questions and follow the tag to show your
> support.
>
> Thanks, and hope to see you at ApacheCon next week.
>
> Patrick
>


Do you know about DBA Stack Exchange?

2022-09-28 Thread Patrick McFadin
Hi everyone,

I wanted to make sure you know about a great community resource. DBA Stack
Exchange is a related site to Stack Overflow but strictly for DB operations
people. There is a dedicated tag for Cassandra operations:
https://dba.stackexchange.com/questions/tagged/cassandra

I'm mentioning this because there are only 121 followers of this tag as of
today. Doesn't look like many of you know about this! If you have a minute,
take a look at any un-answered questions and follow the tag to show your
support.

Thanks, and hope to see you at ApacheCon next week.

Patrick


Re: TWCS recommendation on number of windows

2022-09-28 Thread Jeff Jirsa
So when I wrote TWCS, I wrote it for a use case that had 24h TTLs and 30
days of retention. In that application, we had tested 12h windows, 24h
windows, and 7 day windows, and eventually settled on 24h windows because
that balanced factors like sstable size, sstables-per-read, and expired
data waiting to be dropped (about 3%, 1/30th, on any given day). That's
where that recommendation came from - it was mostly around how much expired
data will sit around waiting to be dropped. That doesn't change with
multiple data directories.

If you go with fewer windows, you'll expire larger chunks at a time, which
means you'll retain larger chunks waiting on expiration.
If you go with more windows, you'll potentially touch more sstables on read.

Realistically, if you can model your data to align with chunks (so each
read only touches one window), the actual number of sstables shouldn't
really matter much - the timestamps and bloom filter will avoid touching
most of them on the read path anyway. If your data model doesn't have a
timestamp component to it and you're touching lots of sstables on read,
even 30 sstables is probably going to hurt you, and 210 would be really,
really bad.
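
The arithmetic behind the "about 3%, 1/30th" figure can be sketched as follows. This is a simplification under stated assumptions: a steady write rate, and a window that is dropped as a whole only after its newest data has expired. The function name twcs_tradeoff and its parameters are illustrative and not part of Cassandra.

```python
import math

def twcs_tradeoff(retention_hours, window_hours):
    """Estimate window count and expired-data overhead for a TWCS table."""
    # Number of time windows needed to cover the retention period.
    windows = math.ceil(retention_hours / window_hours)
    # Up to one window's worth of already-expired data can stay on disk
    # until the newest data in that window has expired too.
    waiting_fraction = window_hours / retention_hours
    return {"windows": windows, "waiting_fraction": waiting_fraction}

# 30 days of retention with 24h windows.
print(twcs_tradeoff(retention_hours=30 * 24, window_hours=24))
# 30 days of retention with 7-day windows: fewer windows, larger chunks
# waiting on expiration.
print(twcs_tradeoff(retention_hours=30 * 24, window_hours=7 * 24))
```

With 24h windows the sketch gives 30 windows and a waiting fraction of 1/30; with 7-day windows it gives 5 windows and a noticeably larger fraction.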





On Wed, Sep 28, 2022 at 7:00 AM Grzegorz Pietrusza 
wrote:

> Hi All!
>
> According to TWCS documentation (
> https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html)
> the operator should choose compaction window parameters to select a
> compaction_window_unit and compaction_window_size pair that produces
> approximately 20-30 windows.
>
> I'm curious where this recommendation comes from? Also should the number
> of windows be changed when more than one data directory is used? In my
> example there are 7 data directories (partitions) and it seems that all of
> them store 20-30 windows. Effectively this gives 140-210 sstables in total.
> Is that an optimal configuration?
>
> Running on Cassandra 3.11
>
> Regards
> Grzegorz
>


TWCS recommendation on number of windows

2022-09-28 Thread Grzegorz Pietrusza
Hi All!

According to TWCS documentation (
https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html)
the operator should choose compaction window parameters to select a
compaction_window_unit and compaction_window_size pair that produces
approximately 20-30 windows.

I'm curious where this recommendation comes from? Also should the number of
windows be changed when more than one data directory is used? In my example
there are 7 data directories (partitions) and it seems that all of them
store 20-30 windows. Effectively this gives 140-210 sstables in total. Is
that an optimal configuration?

Running on Cassandra 3.11

Regards
Grzegorz


Re: Questions on the count and multiple index behaviour in cassandra

2022-09-28 Thread Bowen Song via user

It sounds like you are misusing/abusing Cassandra.

I've noticed the following Cassandra anti-patterns in your post:

1. Large or uneven partitions
   All rows in a table in a single partition is definitely an
   anti-pattern unless you only have a very small number of rows.
2. "SELECT COUNT(*) FROM ..." without providing a partition key
   In your case, since all rows are in a single partition, it's
   equivalent to querying without a partition key.
3. Wide table (too many columns)
   91 columns sounds excessive, and may lead to reduced performance and
   heightened JVM GC pressure.

Cassandra is not a SQL database. You should design your table schema 
around the queries, not design your queries around the table schema. You 
may also need to store multiple copies of the same data with different 
keys to satisfy different queries.
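
As one possible illustration of avoiding a single huge partition, rows can be spread over a fixed number of buckets, with the bucket number becoming part of the partition key. This is a common general technique and was not proposed in this thread; the names bucket_for and count_rows, the bucket count of 16, and the "row-N" ids are all hypothetical. The sketch only simulates the key assignment in plain Python and does not talk to Cassandra.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 16  # hypothetical value; pick it so each partition stays small

def bucket_for(row_id, num_buckets=NUM_BUCKETS):
    """Map a row id to a stable bucket number (to be used in the partition key)."""
    return zlib.crc32(row_id.encode("utf-8")) % num_buckets

def count_rows(per_bucket_counts):
    """The total row count is the sum of the small per-bucket counts."""
    return sum(per_bucket_counts.values())

# Simulate 900,000 rows that previously shared one partition key.
per_bucket = Counter(bucket_for(f"row-{i}") for i in range(900_000))
print("largest bucket:", max(per_bucket.values()))
print("total rows:", count_rows(per_bucket))
```

Each bucket can then be counted or read with its own bounded query, and the per-bucket results are combined on the client side.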


On 28/09/2022 12:44, Karthik K wrote:

Hi,

We have two questions about Cassandra 3.11 features:

1) We need to get the row count of a Cassandra table.
We have a 3-node cluster running Apache Cassandra 3.11.

We loaded a table in Cassandra with 9 lakh (900,000) records. We have
around 91 columns in this table. Most of the records have text as datatype.

All these 9 lakh records were part of a single partition key.

When we tried a select count(*) query with that partition key, the
query was timing out.

However, we were able to retrieve counts through multiple calls by
fetching only 1 lakh (100,000) records in each call. The only
disadvantage here is the time taken, which is around 1 minute and 3
seconds.

Is there any other approach to get the row count faster in Cassandra?
Do we need to change the data modelling approach to achieve this?
Suggestions are welcome.


2) How should we model data in Cassandra to support multiple filters?
We may also need the row count for such a multi-filter query.

Thanks & Regards,
Karthikeyan

Re: Questions on the count and multiple index behaviour in cassandra

2022-09-28 Thread Stéphane Alleaume
Hi

1) How large is your partition in MB? It should be less than 100 MB (and
less than that in practice).

2) Could you put Elasticsearch or Solr search in front?
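
A back-of-the-envelope check against the 100 MB figure might look like this. The row and column counts come from the original question; the average cell size of 20 bytes is a made-up placeholder, and the function name estimated_partition_mb is hypothetical. The estimate ignores storage overhead and compression, so it is an order-of-magnitude figure at best.

```python
def estimated_partition_mb(rows, columns, avg_bytes_per_cell):
    """Very rough partition size: rows x columns x average cell size, in MB."""
    return rows * columns * avg_bytes_per_cell / 1_000_000

# 900,000 rows and 91 columns are from the thread; 20 bytes per cell is an
# assumed placeholder, so measure real data before drawing conclusions.
size_mb = estimated_partition_mb(rows=900_000, columns=91, avg_bytes_per_cell=20)
print(f"estimated size: {size_mb:.0f} MB, above 100 MB: {size_mb > 100}")
```

Even with a small assumed cell size, a single partition holding all rows lands far above the guideline.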

Kind regards
Stephane





On Wed, 28 Sep 2022, 13:46, Karthik K wrote:

> Hi,
>
> We have two questions about Cassandra 3.11 features:
>
> 1) We need to get the row count of a Cassandra table.
> We have a 3-node cluster running Apache Cassandra 3.11.
>
> We loaded a table in Cassandra with 9 lakh (900,000) records. We have
> around 91 columns in this table. Most of the records have text as datatype.
> All these 9 lakh records were part of a single partition key.
>
> When we tried a select count(*) query with that partition key, the query
> was timing out.
>
> However, we were able to retrieve counts through multiple calls by
> fetching only 1 lakh (100,000) records in each call. The only disadvantage
> here is the time taken, which is around 1 minute and 3 seconds.
>
> Is there any other approach to get the row count faster in Cassandra? Do
> we need to change the data modelling approach to achieve this? Suggestions
> are welcome.
>
>
> 2) How should we model data in Cassandra to support multiple filters?
> We may also need the row count for such a multi-filter query.
>
> Thanks & Regards,
> Karthikeyan
>


Questions on the count and multiple index behaviour in cassandra

2022-09-28 Thread Karthik K
Hi,

We have two questions about Cassandra 3.11 features:

1) We need to get the row count of a Cassandra table.
We have a 3-node cluster running Apache Cassandra 3.11.

We loaded a table in Cassandra with 9 lakh (900,000) records. We have
around 91 columns in this table. Most of the records have text as datatype.
All these 9 lakh records were part of a single partition key.

When we tried a select count(*) query with that partition key, the query
was timing out.

However, we were able to retrieve counts through multiple calls by fetching
only 1 lakh (100,000) records in each call. The only disadvantage here is
the time taken, which is around 1 minute and 3 seconds.

Is there any other approach to get the row count faster in Cassandra? Do we
need to change the data modelling approach to achieve this? Suggestions are
welcome.


2) How should we model data in Cassandra to support multiple filters?
We may also need the row count for such a multi-filter query.

Thanks & Regards,
Karthikeyan