Re: What does "PER PARTITION LIMIT" means in cql query in cassandra?

2020-05-07 Thread Pekka Enberg
Hi Chuck,

On Thu, May 7, 2020 at 10:14 AM Check Peck  wrote:

> I have a scylla table as shown below:
>

(Please note that this is the Apache Cassandra users mailing list. Of
course, the feature is the same, so let me answer it here.)


>
> cqlsh:sampleks> describe table test;
>
> CREATE TABLE test (
>     client_id int,
>     when timestamp,
>     process_ids list,
>     md text,
>     PRIMARY KEY (client_id, when)
> ) WITH CLUSTERING ORDER BY (when DESC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
>     AND comment = ''
>     AND compaction = {'class': 'TimeWindowCompactionStrategy',
>         'compaction_window_size': '1', 'compaction_window_unit': 'DAYS'}
>     AND compression = {'sstable_compression':
>         'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 172800
>     AND max_index_interval = 1024
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
>
> And I see this is how we are querying it. It's been a long time since I
> worked on Cassandra, so this “PER PARTITION LIMIT” is new to me (it looks
> like it was added recently). Can someone explain what it does, with an
> example, in layman's terms? I couldn't find any good documentation that
> explains it simply.
>
>
> SELECT * FROM test WHERE client_id IN ? PER PARTITION LIMIT 1;
>

The "PER PARTITION LIMIT" option is documented here, although I do agree
it's a rather terse explanation:

https://cassandra.apache.org/doc/latest/cql/dml.html#limiting-results

It limits the number of rows returned from *each partition*. In your schema
the partition key is "client_id", so, for example, if you have the following
data:

cqlsh:ks1> SELECT client_id, when FROM test;

 client_id | when
-----------+-------------------------
         1 | 2020-01-01 22:00:00.00+
         1 | 2019-12-31 22:00:00.00+
         2 | 2020-02-12 22:00:00.00+
         2 | 2020-02-11 22:00:00.00+
         2 | 2020-02-10 22:00:00.00+

(5 rows)

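You can ask the query to limit the number of rows returned for each
"client_id". Because the table is clustered by "when" in descending order,
the rows that are kept are the most recent ones. For example, with a limit
of "1", you'd have: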

cqlsh:ks1> SELECT client_id, when FROM test PER PARTITION LIMIT 1;

 client_id | when
-----------+-------------------------
         1 | 2020-01-01 22:00:00.00+
         2 | 2020-02-12 22:00:00.00+

(2 rows)

Increasing the limit to "2" would yield:

cqlsh:ks1> SELECT client_id, when FROM test PER PARTITION LIMIT 2;

 client_id | when
-----------+-------------------------
         1 | 2020-01-01 22:00:00.00+
         1 | 2019-12-31 22:00:00.00+
         2 | 2020-02-12 22:00:00.00+
         2 | 2020-02-11 22:00:00.00+

(4 rows)
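
If it helps to think about it outside of CQL, the behaviour above can be
modelled in a few lines of Python. This is only an illustrative sketch, not
how Cassandra or Scylla implement it: ROWS and per_partition_limit are
made-up names, the timestamps are shortened to whole seconds, and the rows
are assumed to arrive grouped by "client_id" with the newest "when" first,
as in the output above.

```python
from itertools import groupby, islice

# Sample rows as (client_id, when). Assumption: rows are already grouped by
# partition key and sorted newest-first within each partition.
ROWS = [
    (1, "2020-01-01 22:00:00"),
    (1, "2019-12-31 22:00:00"),
    (2, "2020-02-12 22:00:00"),
    (2, "2020-02-11 22:00:00"),
    (2, "2020-02-10 22:00:00"),
]


def per_partition_limit(rows, n):
    """Keep at most n rows from each run of rows sharing a partition key."""
    result = []
    for _, partition in groupby(rows, key=lambda row: row[0]):
        # islice stops after n rows; groupby then skips the rest of the run.
        result.extend(islice(partition, n))
    return result


if __name__ == "__main__":
    for row in per_partition_limit(ROWS, 1):
        print(row)
```

Calling per_partition_limit(ROWS, 2) keeps two rows for each client, which
matches the four-row result shown above.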

Hope this helps!

Regards,

- Pekka


Re: What does "PER PARTITION LIMIT" means in cql query in cassandra?

2020-05-07 Thread Dor Laor
With your schema, you will get exactly one row for each client_id, even
when that partition holds multiple rows (clustering keys). Because the table
uses CLUSTERING ORDER BY (when DESC), it is the row with the latest 'when'.
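
As a rough mental model of the query in question (WHERE client_id IN ? PER
PARTITION LIMIT 1), here is a small Python sketch. It is not driver code and
does not talk to a cluster: TABLE and latest_per_client are hypothetical
names, the sample timestamps are invented for illustration, and the order in
which a real cluster returns the partitions is not modelled.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the table:
# partition key (client_id) -> clustering keys (when), in no particular order.
TABLE = {
    1: [datetime(2019, 12, 31, 22), datetime(2020, 1, 1, 22)],
    2: [datetime(2020, 2, 10, 22), datetime(2020, 2, 12, 22), datetime(2020, 2, 11, 22)],
}


def latest_per_client(table, client_ids):
    """Model of: SELECT ... WHERE client_id IN ? PER PARTITION LIMIT 1."""
    rows = []
    for client_id in client_ids:
        # Emulate CLUSTERING ORDER BY (when DESC): newest row first.
        whens = sorted(table.get(client_id, []), reverse=True)
        # PER PARTITION LIMIT 1: keep only the first row of the partition.
        rows.extend((client_id, when) for when in whens[:1])
    return rows


if __name__ == "__main__":
    for row in latest_per_client(TABLE, [1, 2, 3]):
        print(row)
```

A client_id with no rows (3 in the usage above) simply contributes nothing
to the result.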





What does "PER PARTITION LIMIT" means in cql query in cassandra?

2020-05-07 Thread Check Peck
I have a scylla table as shown below:


cqlsh:sampleks> describe table test;

CREATE TABLE test (
    client_id int,
    when timestamp,
    process_ids list,
    md text,
    PRIMARY KEY (client_id, when)
) WITH CLUSTERING ORDER BY (when DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'TimeWindowCompactionStrategy',
        'compaction_window_size': '1', 'compaction_window_unit': 'DAYS'}
    AND compression = {'sstable_compression':
        'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 172800
    AND max_index_interval = 1024
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';


And I see this is how we are querying it. It's been a long time since I
worked on Cassandra, so this “PER PARTITION LIMIT” is new to me (it looks
like it was added recently). Can someone explain what it does, with an
example, in layman's terms? I couldn't find any good documentation that
explains it simply.


SELECT * FROM test WHERE client_id IN ? PER PARTITION LIMIT 1;