Re: What does "PER PARTITION LIMIT" mean in a CQL query in Cassandra?
Hi Chuck,

On Thu, May 7, 2020 at 10:14 AM Check Peck wrote:
> I have a Scylla table as shown below:

(Please note that this is the Apache Cassandra users mailing list. Of course, the feature is the same, so let me answer it here.)

> cqlsh:sampleks> describe table test;
>
> CREATE TABLE test (
>     client_id int,
>     when timestamp,
>     process_ids list,
>     md text,
>     PRIMARY KEY (client_id, when)
> ) WITH CLUSTERING ORDER BY (when DESC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
>     AND comment = ''
>     AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 172800
>     AND max_index_interval = 1024
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> And I see this is how we are querying it. It's been a long time since I worked with Cassandra, so this "PER PARTITION LIMIT" is new to me (it looks like it was added recently). Can someone explain what it does, with an example, in layman's terms? I couldn't find any good doc that explains it simply.
>
> SELECT * FROM test WHERE client_id IN ? PER PARTITION LIMIT 1;

The "PER PARTITION LIMIT" option is documented here, although I do agree it's a rather terse explanation:

https://cassandra.apache.org/doc/latest/cql/dml.html#limiting-results

What it does is limit the number of returned rows *per partition*.
So, for example, with your schema, if you have the following data:

cqlsh:ks1> SELECT client_id, when FROM test;

 client_id | when
-----------+-------------------------
         1 | 2020-01-01 22:00:00.00+
         1 | 2019-12-31 22:00:00.00+
         2 | 2020-02-12 22:00:00.00+
         2 | 2020-02-11 22:00:00.00+
         2 | 2020-02-10 22:00:00.00+

(5 rows)

You can ask the query to limit the number of rows returned for each "client_id". For example, with a limit of "1", you'd have:

cqlsh:ks1> SELECT client_id, when FROM test PER PARTITION LIMIT 1;

 client_id | when
-----------+-------------------------
         1 | 2020-01-01 22:00:00.00+
         2 | 2020-02-12 22:00:00.00+

(2 rows)

Increasing the limit to "2" would yield:

cqlsh:ks1> SELECT client_id, when FROM test PER PARTITION LIMIT 2;

 client_id | when
-----------+-------------------------
         1 | 2020-01-01 22:00:00.00+
         1 | 2019-12-31 22:00:00.00+
         2 | 2020-02-12 22:00:00.00+
         2 | 2020-02-11 22:00:00.00+

(4 rows)

Hope this helps!

Regards,

- Pekka
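To make the semantics concrete, here is a small Python model of what the two queries above return. It is only a sketch under stated assumptions: it is not how Cassandra or Scylla actually executes a read, `per_partition_limit` is a hypothetical helper name, and partitions are emitted in first-seen order, whereas a real cluster returns partitions in token order.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows (client_id, when) with the same values as the example data above.
ROWS = [
    (1, datetime(2020, 1, 1, 22)),
    (1, datetime(2019, 12, 31, 22)),
    (2, datetime(2020, 2, 12, 22)),
    (2, datetime(2020, 2, 11, 22)),
    (2, datetime(2020, 2, 10, 22)),
]


def per_partition_limit(rows, n):
    """Hypothetical model of: SELECT client_id, when FROM test PER PARTITION LIMIT n.

    Groups rows by partition key (client_id), orders each partition by its
    clustering key (when DESC, as declared in the table), and keeps at most
    n rows from each partition.
    """
    partitions = defaultdict(list)
    for client_id, when in rows:
        partitions[client_id].append(when)

    result = []
    # Assumption: partitions come out in first-seen order; a real cluster
    # returns them in token order.
    for client_id, whens in partitions.items():
        for when in sorted(whens, reverse=True)[:n]:
            result.append((client_id, when))
    return result


if __name__ == "__main__":
    for n in (1, 2):
        rows = per_partition_limit(ROWS, n)
        print(f"PER PARTITION LIMIT {n}: ({len(rows)} rows)")
        for client_id, when in rows:
            print(f"  {client_id} | {when}")
```

Running it yields 2 rows for n = 1 and 4 rows for n = 2, the same rows as the cqlsh output above: the limit is counted separately inside each client_id, never across the whole result.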
Re: What does "PER PARTITION LIMIT" mean in a CQL query in Cassandra?
In your schema's case, for each client_id you will get a single 'when' row. Just one, even when the partition has multiple rows (clustering keys). Because the table is declared WITH CLUSTERING ORDER BY (when DESC), that single row is the one with the most recent 'when'.

On Thu, May 7, 2020 at 12:14 AM Check Peck wrote:
> SELECT * FROM test WHERE client_id IN ? PER PARTITION LIMIT 1;
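The difference between PER PARTITION LIMIT and a plain LIMIT is easiest to see side by side for the query from the question. The sketch below is again a hypothetical Python model, not driver or server code: `select` and its keyword arguments are illustrative names, the sample rows reuse the values from the earlier reply, and partitions are emitted in first-seen order instead of token order. In this model the per-partition cap is applied first and the overall LIMIT second.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows (client_id, when); the real data would come from the table.
ROWS = [
    (1, datetime(2020, 1, 1, 22)),
    (1, datetime(2019, 12, 31, 22)),
    (2, datetime(2020, 2, 12, 22)),
    (2, datetime(2020, 2, 11, 22)),
    (2, datetime(2020, 2, 10, 22)),
]


def select(rows, client_ids=None, per_partition_limit=None, limit=None):
    """Hypothetical model of
    SELECT client_id, when FROM test [WHERE client_id IN (...)]
        [PER PARTITION LIMIT p] [LIMIT l]
    """
    partitions = defaultdict(list)
    for client_id, when in rows:
        # WHERE client_id IN (...): only the requested partitions are read.
        if client_ids is None or client_id in client_ids:
            partitions[client_id].append(when)

    result = []
    for client_id, whens in partitions.items():
        whens.sort(reverse=True)  # CLUSTERING ORDER BY (when DESC): newest first
        if per_partition_limit is not None:
            whens = whens[:per_partition_limit]  # cap rows in each partition
        result.extend((client_id, when) for when in whens)

    if limit is not None:
        result = result[:limit]  # cap rows in the whole result
    return result


if __name__ == "__main__":
    wanted = [1, 2]  # stands in for the list bound to "IN ?"
    # PER PARTITION LIMIT 1: the newest row of every requested client.
    print(select(ROWS, client_ids=wanted, per_partition_limit=1))
    # LIMIT 1: a single row in total, from whichever partition comes first.
    print(select(ROWS, client_ids=wanted, limit=1))
```

With per_partition_limit=1 the model returns one row for each of clients 1 and 2 (their newest 'when'), while limit=1 returns only client 1's newest row, so the two options are not interchangeable.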
What does "PER PARTITION LIMIT" mean in a CQL query in Cassandra?
I have a Scylla table as shown below:

cqlsh:sampleks> describe table test;

CREATE TABLE test (
    client_id int,
    when timestamp,
    process_ids list,
    md text,
    PRIMARY KEY (client_id, when)
) WITH CLUSTERING ORDER BY (when DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 172800
    AND max_index_interval = 1024
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

And I see this is how we are querying it. It's been a long time since I worked with Cassandra, so this "PER PARTITION LIMIT" is new to me (it looks like it was added recently). Can someone explain what it does, with an example, in layman's terms? I couldn't find any good doc that explains it simply.

SELECT * FROM test WHERE client_id IN ? PER PARTITION LIMIT 1;