[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445137#comment-15445137 ] Benjamin Lerer commented on CASSANDRA-11031: In {{RowFilter}} the patch still use lambdas in {{IsSatisfiedFilter}} that will be called for each partition and row. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436560#comment-15436560 ] Alex Petrov commented on CASSANDRA-11031: - Thank you for the review! I've addressed your comments and wait for the test results now. All changes are done in a separate commit to make it simpler to see what's been done, I'll combine them if you +1. bq. The hasSlice method is now duplicated in PartitionKeySingleRestrictionSet and ClusteringColumnRestrictions. It would be better to put it only into RestrictionSet and expose it into Restrictions. Removed the redundant method, pulled it up. bw. We should have some unit test for MaterializedViews with Slice and unrestricted partition keys. Added the MV tests, similar to the previously existing ones. bq. In StatementRestrictions::processPartitionKeyRestrictions I do not think that the error message: "Partition key parts: %s must be restricted as other parts are" makes sense any more (for non-MVs). We should probably only use the ALLOW FILTERING error message. Now we just have one error message, same everywhere. Tests adjusted accordingly. bq. The part of StatementRestrictions::processPartitionKeyRestrictions dealing with secondary index can probably be combined with the one dealing with the filtering. True, I've separated them for sakes of simplicity, but I hope that new version looks clean as well. Combining them furhter is tricky and will require double-checking of non-restricted PK parts and whether or not restrictions are empty. It was a bit simpler previously since we did't have to take care of slices. bq. In RowFilter we should not use Lambda expression to filter out the rows as they are pretty bad from the performance point of view and result in a lot of garbage. In RowFilter: Iterables.size(rowLevelExpressions) can be replaced by rowLevelExpressions.size() Optimised that part as well, got rid of streams alltogether, we iterate through restrictions just once now. bq. The tests using secondary indices should be put into SecondaryIndexTest Moved the tests. bq. It will be good to add more tests into SelectOrderedPartitionerTest::testFilteringOnPartitionKeyWithToken: with > on one of the partition key or only the second partition key being specified I've added one more clause that seemed missing, although since token restrictions require both parts for the token function, I did not add anything else. bq. In SelectTest::testAllowFilteringOnPartitionKey the message IN restrictions are not supported on indexed columns looks wrong to me as the test do not use any secondary index. For clarity here, I've added a more elaborate message that specifies that filtering is involved. |[trunk|https://github.com/ifesdjeen/cassandra/tree/11031-trunk]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-dtest/]|[testall|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-testall/]| > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432879#comment-15432879 ] ZhaoYang commented on CASSANDRA-11031: -- [~ifesdjeen] sure, I will add these dtest this week > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432368#comment-15432368 ] Benjamin Lerer commented on CASSANDRA-11031: Overall the patch looks good to me. I just have noted the following points: * The {{hasSlice}} method is now duplicated in {{PartitionKeySingleRestrictionSet}} and {{ClusteringColumnRestrictions}}. It would be better to put it only into {{RestrictionSet}} and expose it into {{Restrictions}}. * We should have some unit test for MaterializedViews with Slice and unrestricted partition keys. * In {{StatementRestrictions::processPartitionKeyRestrictions}} I do not think that the error message: {{"Partition key parts: %s must be restricted as other parts are"}} makes sense any more (for non-MVs). We should probably only use the {{ALLOW FILTERING}} error message. * The part of {{StatementRestrictions::processPartitionKeyRestrictions}} dealing with secondary index can probably be combined with the one dealing with the filtering. * In {{RowFilter}} we should not use Lambda expression to filter out the rows as they are pretty bad from the performance point of view and result in a lot of garbage. * In {{RowFilter}}: {{Iterables.size(rowLevelExpressions)}} can be replaced by {{rowLevelExpressions.size()}} * The tests using secondary indices should be put into {{SecondaryIndexTest}} * It will be good to add more tests into {{SelectOrderedPartitionerTest::testFilteringOnPartitionKeyWithToken}}: with {{>}} on one of the partition key or only the second partition key being specified * In {{SelectTest::testAllowFilteringOnPartitionKey}} the message {{IN restrictions are not supported on indexed columns}} looks wrong to me as the test do not use any secondary index. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429723#comment-15429723 ] Alex Petrov commented on CASSANDRA-11031: - [~blerer] I haven't made any code chances since the last submission, just fixed / improved dtest suite. Now, all tests pass. If you could take a look, it'd be great. I haven't squashed commits just in case you'd like to see the history. To roughly describe what I've done: * short-circuited the PK & Static column queries in {{RowFilter}}, made sure that checks for PK and static column expressions are done once only * changed logic in {{processPartitionKeyRestrictions}}, hopefully making all cases more elaborate The rest is just minor adjustments. |[trunk|https://github.com/ifesdjeen/cassandra/tree/11031-trunk]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-dtest/]|[testall|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-testall/]| > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428384#comment-15428384 ] Alex Petrov commented on CASSANDRA-11031: - [~jasonstack] I've fixed most of problems with dtests by now. Unit tests look good. However, ordering was not correct in limit tests,{{PER PARTITION LIMIT}} tests and test with {{ORDER BY}} clause were absent. Could you please add those? Please, use {{assert_all}} with {{ignore_order}} if results return more than one partition as their order is not guaranteed. {{LIMIT}} tests are quite tricky. The easiest would probably be to make sure you know which partition you're going to be getting back first, this way order will be consistent. Otherwise you have to compare with contents of all possible partitions. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428166#comment-15428166 ] Benjamin Lerer commented on CASSANDRA-11031: [~jasonstack] Sorry, I got stuck into some other patches. I will do the review as soon as Alex is happy with his last changes. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428162#comment-15428162 ] Alex Petrov commented on CASSANDRA-11031: - After the test run for latest changes I've noticed that dtests are mostly incorrect since they're expecting particular partition order and sometimes even use limit which can not work since we can not guarantee positioning and order of partitions on slice query without re-ordering them on coordinator (which does not happen in this case). > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427749#comment-15427749 ] Alex Petrov commented on CASSANDRA-11031: - I'm sure Benjamin did not forget. This is just a large change and needs lots of time and uninterrupted attention to get right. I've made several more changes yesterday and found several more problems. Also, dtests were not working properly. I've submitted patch for another round of CI and will take another look at it at Tuesday. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427699#comment-15427699 ] ZhaoYang commented on CASSANDRA-11031: -- Hi, [~blerer] in case you forgot.. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427700#comment-15427700 ] ZhaoYang commented on CASSANDRA-11031: -- Hi, [~blerer] in case you forgot.. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427701#comment-15427701 ] ZhaoYang commented on CASSANDRA-11031: -- thanks. LGTM > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412344#comment-15412344 ] Benjamin Lerer commented on CASSANDRA-11031: I will have a look as soon as I can (probably next week). > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > Attachments: CASSANDRA-11031-3.7.patch > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412304#comment-15412304 ] Alex Petrov commented on CASSANDRA-11031: - [~jasonstack] I've made several changes in a separate commit, you can check them below. [~blerer] could you lend us another pair of eyes on the ticket? I've found a way to short-circuit several common cases for both partition and static restrictions, tests are looking good, but i'm biased now since I've touched the patch... |[trunk|https://github.com/ifesdjeen/cassandra/tree/11031-trunk]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-dtest/]|[testall|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-testall/]| > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > Attachments: CASSANDRA-11031-3.7.patch > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405708#comment-15405708 ] ZhaoYang commented on CASSANDRA-11031: -- [~ifesdjeen] Can you check? branch: https://github.com/apache/cassandra/compare/trunk...jasonstack:cassandra-11031-3.8?expand=1#diff-559d2a170c681fa6e692c8321afbfea3 dtest: https://github.com/riptano/cassandra-dtest/pull/969 Thank you > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > Attachments: CASSANDRA-11031-3.7.patch > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401752#comment-15401752 ] Sylvain Lebresne commented on CASSANDRA-11031: -- bq. but 'Expression' and its serialization will need to be changed, because now 'Expression' only handles one value That's not a problem as we can have {{Expression}} expect a serialized list in the the case of a {{IN}}. That said, we've left {{IN}} support out of {{RowFilter}} so far so I'm happy to let it out in that ticket, letting it's support to another ticket. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > Attachments: CASSANDRA-11031-3.7.patch > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401745#comment-15401745 ] Alex Petrov commented on CASSANDRA-11031: - RE {{IN}} I mostly meant that currently working {{IN}} functionality has to be tested in combination with the added feature. bq. different ordering When filtering is performed, in combination with (for example) clustering ordering, we have to add tests to make sure that ordering is working well in combination with the new feature. I've tried to list the examples of where we should be careful. The more permutations we test, the more we can be sure about the added feature. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > Attachments: CASSANDRA-11031-3.7.patch > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401732#comment-15401732 ] ZhaoYang commented on CASSANDRA-11031: -- [~ifesdjeen] thanks for the help. I think in issue #11310, it also doesn't support IN for arbitrary clustering key. Supporting IN for arbitrary clustering key or partition key in RowFilter is possible, but 'Expression' and its serialization will need to be changed, because now 'Expression' only handles one value. I think it's better not to support IN for arbitrary partition/clustering key due to compatibility. What do you think? > different ordering Can you explain about it? The order of filtered PK will be still based on their token values. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > Attachments: CASSANDRA-11031-3.7.patch > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379062#comment-15379062 ] Alex Petrov commented on CASSANDRA-11031: - I'd say we have to support everything that filtering on regular and clustering columns supports. I've updated the comment above to provide more possible use-cases. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > Attachments: CASSANDRA-11031-3.7.patch > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378862#comment-15378862 ] ZhaoYang commented on CASSANDRA-11031: -- [~ifesdjeen] thanks for reviewing. Is it necessary to support filtering with IN condition? or just EQ/GT/LT.. I will add more unit test. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > Attachments: CASSANDRA-11031-3.7.patch > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key
[ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373089#comment-15373089 ] Alex Petrov commented on CASSANDRA-11031: - Thank you for your patch [~jasonstack]. Sorry it took so long to review it. I'll do my best to keep all the future iterations as short as possible. >From the first glance, there are many cases that are missing. For example, >filtering by anything but {{=}} wouldn't work. {code} createTable("CREATE TABLE %s (a text, b int, v int, PRIMARY KEY ((a, b))) "); execute("INSERT INTO %s (a, b, v) VALUES ('a', 2, 1)"); execute("INSERT INTO %s (a, b, v) VALUES ('a', 3, 1)"); assertRows(execute("SELECT * FROM %s WHERE a = 'a' AND b > 0 ALLOW FILTERING"), row("a", 2, 1), row("a", 3, 1)); {code} Would throw {code} org.apache.cassandra.exceptions.InvalidRequestException: Only EQ and IN relation are supported on the partition key (unless you use the token() function) {code} I would start with the unit test, to be honest. Although dtests are also important. Paging tests might be also good. Could you please check out: * combinations of partition and clustering key filtering * compound partition keys and their filtering * non-EQ relations ({{LT}}, {{GT}} etc) * {{COMPACT STORAGE}} You can take some inspiration for tests from [CASSANDRA-11310]. > MultiTenant : support “ALLOW FILTERING" for Partition Key > - > > Key: CASSANDRA-11031 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11031 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Minor > Fix For: 3.x > > Attachments: CASSANDRA-11031-3.7.patch > > > Currently, Allow Filtering only works for secondary Index column or > clustering columns. And it's slow, because Cassandra will read all data from > SSTABLE from hard-disk to memory to filter. > But we can support allow filtering on Partition Key, as far as I know, > Partition Key is in memory, so we can easily filter them, and then read > required data from SSTable. > This will similar to "Select * from table" which scan through entire cluster. > CREATE TABLE multi_tenant_table ( > tenant_id text, > pk2 text, > c1 text, > c2 text, > v1 text, > v2 text, > PRIMARY KEY ((tenant_id,pk2),c1,c2) > ) ; > Select * from multi_tenant_table where tenant_id = "datastax" allow filtering; -- This message was sent by Atlassian JIRA (v6.3.4#6332)