[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-29 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445137#comment-15445137
 ] 

Benjamin Lerer commented on CASSANDRA-11031:


In {{RowFilter}} the patch still use lambdas in {{IsSatisfiedFilter}} that will 
be called for each partition and row.

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-25 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436560#comment-15436560
 ] 

Alex Petrov commented on CASSANDRA-11031:
-

Thank you for the review! I've addressed your comments and wait for the test 
results now. All changes are done in a separate commit to make it simpler to 
see what's been done, I'll combine them if you +1.

bq. The hasSlice method is now duplicated in PartitionKeySingleRestrictionSet 
and ClusteringColumnRestrictions. It would be better to put it only into 
RestrictionSet and expose it into Restrictions.

Removed the redundant method, pulled it up.

bw. We should have some unit test for MaterializedViews with Slice and 
unrestricted partition keys.

Added the MV tests, similar to the previously existing ones.

bq. In StatementRestrictions::processPartitionKeyRestrictions I do not think 
that the error message: "Partition key parts: %s must be restricted as other 
parts are" makes sense any more (for non-MVs). We should probably only use the 
ALLOW FILTERING error message.

Now we just have one error message, same everywhere. Tests adjusted accordingly.

bq. The part of StatementRestrictions::processPartitionKeyRestrictions dealing 
with secondary index can probably be combined with the one dealing with the 
filtering.

True, I've separated them for sakes of simplicity, but I hope that new version 
looks clean as well. Combining them furhter is tricky and will require 
double-checking of non-restricted PK parts and whether or not restrictions are 
empty. It was a bit simpler previously since we did't have to take care of 
slices.

bq. In RowFilter we should not use Lambda expression to filter out the rows as 
they are pretty bad from the performance point of view and result in a lot of 
garbage. In RowFilter: Iterables.size(rowLevelExpressions) can be replaced by 
rowLevelExpressions.size()

Optimised that part as well, got rid of streams alltogether, we iterate through 
restrictions just once now.

bq. The tests using secondary indices should be put into SecondaryIndexTest

Moved the tests.

bq. It will be good to add more tests into 
SelectOrderedPartitionerTest::testFilteringOnPartitionKeyWithToken: with > on 
one of the partition key or only the second partition key being specified

I've added one more clause that seemed missing, although since token 
restrictions require both parts for the token function, I did not add anything 
else.

bq. In SelectTest::testAllowFilteringOnPartitionKey the message IN restrictions 
are not supported on indexed columns looks wrong to me as the test do not use 
any secondary index.

For clarity here, I've added a more elaborate message that specifies that 
filtering is involved.

|[trunk|https://github.com/ifesdjeen/cassandra/tree/11031-trunk]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-dtest/]|[testall|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-testall/]|

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-23 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432879#comment-15432879
 ] 

ZhaoYang commented on CASSANDRA-11031:
--

[~ifesdjeen] sure, I will add these dtest this week

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-23 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432368#comment-15432368
 ] 

Benjamin Lerer commented on CASSANDRA-11031:


Overall the patch looks good to me. I just have noted the following points:
* The {{hasSlice}} method is now duplicated in 
{{PartitionKeySingleRestrictionSet}} and {{ClusteringColumnRestrictions}}. It 
would be better to put it only into {{RestrictionSet}} and expose it into 
{{Restrictions}}.
* We should have some unit test for MaterializedViews with Slice and 
unrestricted partition keys.
* In {{StatementRestrictions::processPartitionKeyRestrictions}} I do not think 
that the error message: {{"Partition key parts: %s must be restricted as other 
parts are"}} makes sense any more (for non-MVs). We should probably only use 
the {{ALLOW FILTERING}} error message.
* The part of {{StatementRestrictions::processPartitionKeyRestrictions}} 
dealing with secondary index can probably be combined with the one dealing with 
the filtering.
* In {{RowFilter}} we should not use Lambda expression to filter out the rows 
as they are pretty bad from the performance point of view and result in a lot 
of garbage.
* In {{RowFilter}}: {{Iterables.size(rowLevelExpressions)}} can be replaced by 
{{rowLevelExpressions.size()}}
* The tests using secondary indices should be put into {{SecondaryIndexTest}}
* It will be good to add more tests into 
{{SelectOrderedPartitionerTest::testFilteringOnPartitionKeyWithToken}}: with 
{{>}} on one of the partition key or only the second partition key being 
specified
* In {{SelectTest::testAllowFilteringOnPartitionKey}} the message {{IN 
restrictions are not supported on indexed columns}} looks wrong to me as the 
test do not use any secondary index.  

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-21 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429723#comment-15429723
 ] 

Alex Petrov commented on CASSANDRA-11031:
-

[~blerer] I haven't made any code chances since the last submission, just fixed 
/ improved dtest suite. Now, all tests pass. If you could take a look, it'd be 
great. I haven't squashed commits just in case you'd like to see the history. 

To roughly describe what I've done: 
  * short-circuited the PK & Static column queries in {{RowFilter}}, made sure 
that checks for PK and static column expressions are done once only
  * changed logic in {{processPartitionKeyRestrictions}}, hopefully making all 
cases more elaborate

The rest is just minor adjustments.

|[trunk|https://github.com/ifesdjeen/cassandra/tree/11031-trunk]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-dtest/]|[testall|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-testall/]|

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-19 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428384#comment-15428384
 ] 

Alex Petrov commented on CASSANDRA-11031:
-

[~jasonstack] I've fixed most of problems with dtests by now. Unit tests look 
good. However, ordering was not correct in limit tests,{{PER PARTITION LIMIT}} 
tests and test with {{ORDER BY}} clause were absent. Could you please add those?

Please, use {{assert_all}} with {{ignore_order}} if results return more than 
one partition as their order is not guaranteed. {{LIMIT}} tests are quite 
tricky. The easiest would probably be to make sure you know which partition 
you're going to be getting back first, this way order will be consistent. 
Otherwise you have to compare with contents of all possible partitions.

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-19 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428166#comment-15428166
 ] 

Benjamin Lerer commented on CASSANDRA-11031:


[~jasonstack] Sorry, I got stuck into some other patches.
I will do the review as soon as Alex is happy with his last changes.

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-19 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428162#comment-15428162
 ] 

Alex Petrov commented on CASSANDRA-11031:
-

After the test run for latest changes I've noticed that dtests are mostly 
incorrect since they're expecting particular partition order and sometimes even 
use limit which can not work since we can not guarantee positioning and order 
of partitions on slice query without re-ordering them on coordinator (which 
does not happen in this case).

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-19 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427749#comment-15427749
 ] 

Alex Petrov commented on CASSANDRA-11031:
-

I'm sure Benjamin did not forget. This is just a large change and needs lots of 
time and uninterrupted attention to get right. 

I've made several more changes yesterday and found several more problems. Also, 
dtests were not working properly. I've submitted patch for another round of CI 
and will take another look at it at Tuesday.

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-19 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427699#comment-15427699
 ] 

ZhaoYang commented on CASSANDRA-11031:
--

Hi, [~blerer] in case you forgot..

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-19 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427700#comment-15427700
 ] 

ZhaoYang commented on CASSANDRA-11031:
--

Hi, [~blerer] in case you forgot..

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-19 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427701#comment-15427701
 ] 

ZhaoYang commented on CASSANDRA-11031:
--

thanks. LGTM

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-08 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412344#comment-15412344
 ] 

Benjamin Lerer commented on CASSANDRA-11031:


I will have a look as soon as I can (probably next week).

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
> Attachments: CASSANDRA-11031-3.7.patch
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-08 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412304#comment-15412304
 ] 

Alex Petrov commented on CASSANDRA-11031:
-

[~jasonstack] I've made several changes in a separate commit, you can check 
them below. 

[~blerer] could you lend us another pair of eyes on the ticket? I've found a 
way to short-circuit several common cases for both partition and static 
restrictions, tests are looking good, but i'm biased now since I've touched the 
patch... 

|[trunk|https://github.com/ifesdjeen/cassandra/tree/11031-trunk]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-dtest/]|[testall|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11031-trunk-testall/]|

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
> Attachments: CASSANDRA-11031-3.7.patch
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-03 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405708#comment-15405708
 ] 

ZhaoYang commented on CASSANDRA-11031:
--

[~ifesdjeen]  Can you check?

branch: 
https://github.com/apache/cassandra/compare/trunk...jasonstack:cassandra-11031-3.8?expand=1#diff-559d2a170c681fa6e692c8321afbfea3

dtest: https://github.com/riptano/cassandra-dtest/pull/969

Thank you

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
> Attachments: CASSANDRA-11031-3.7.patch
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401752#comment-15401752
 ] 

Sylvain Lebresne commented on CASSANDRA-11031:
--

bq. but 'Expression' and its serialization will need to be changed, because now 
'Expression' only handles one value

That's not a problem as we can have {{Expression}} expect a serialized list in 
the the case of a {{IN}}. That said, we've left {{IN}} support out of 
{{RowFilter}} so far so I'm happy to let it out in that ticket, letting it's 
support to another ticket.

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
> Attachments: CASSANDRA-11031-3.7.patch
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-01 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401745#comment-15401745
 ] 

Alex Petrov commented on CASSANDRA-11031:
-

RE {{IN}} I mostly meant that currently working {{IN}} functionality has to be 
tested in combination with the added feature. 

bq. different ordering

When filtering is performed, in combination with (for example) clustering 
ordering, we have to add tests to make sure that ordering is working well in 
combination with the new feature. 

I've tried to list the examples of where we should be careful. The more 
permutations we test, the more we can be sure about the added feature. 

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
> Attachments: CASSANDRA-11031-3.7.patch
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-08-01 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401732#comment-15401732
 ] 

ZhaoYang commented on CASSANDRA-11031:
--

[~ifesdjeen] thanks for the help. I think in issue #11310, it also doesn't 
support IN for arbitrary clustering key.

Supporting IN for arbitrary clustering key or partition key in RowFilter is 
possible, but 'Expression' and its serialization will need to be changed, 
because now 'Expression' only handles one value.

I think it's better not to support IN for arbitrary partition/clustering key 
due to compatibility. What do you think?


> different ordering
Can you explain about it? The order of filtered PK will be still based on their 
token values.


> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
> Attachments: CASSANDRA-11031-3.7.patch
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-07-15 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379062#comment-15379062
 ] 

Alex Petrov commented on CASSANDRA-11031:
-

I'd say we have to support everything that filtering on regular and clustering 
columns supports. I've updated the comment above to provide more possible 
use-cases.

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
> Attachments: CASSANDRA-11031-3.7.patch
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-07-14 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378862#comment-15378862
 ] 

ZhaoYang commented on CASSANDRA-11031:
--

[~ifesdjeen] thanks for reviewing.

Is it necessary to support filtering with IN condition? or just EQ/GT/LT..

I will add more unit test.

> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
> Attachments: CASSANDRA-11031-3.7.patch
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11031) MultiTenant : support “ALLOW FILTERING" for Partition Key

2016-07-12 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373089#comment-15373089
 ] 

Alex Petrov commented on CASSANDRA-11031:
-

Thank you for your patch [~jasonstack]. Sorry it took so long to review it. 
I'll do my best to keep all the future iterations as short as possible.

>From the first glance, there are many cases that are missing. For example, 
>filtering by anything but {{=}} wouldn't work.

{code}
createTable("CREATE TABLE %s (a text, b int, v int, PRIMARY KEY ((a, b))) ");

execute("INSERT INTO %s (a, b, v) VALUES ('a', 2, 1)");
execute("INSERT INTO %s (a, b, v) VALUES ('a', 3, 1)");
assertRows(execute("SELECT * FROM %s WHERE a = 'a' AND b > 0 ALLOW 
FILTERING"),
   row("a", 2, 1),
   row("a", 3, 1));
{code}

Would throw 

{code}
org.apache.cassandra.exceptions.InvalidRequestException: Only EQ and IN 
relation are supported on the partition key (unless you use the token() 
function)
{code}

I would start with the unit test, to be honest. Although dtests are also 
important. Paging tests might be also good. 

Could you please check out: 
  * combinations of partition and clustering key filtering 
  * compound partition keys and their filtering 
  * non-EQ relations ({{LT}}, {{GT}} etc)
  * {{COMPACT STORAGE}} 

You can take some inspiration for tests from [CASSANDRA-11310].


> MultiTenant : support “ALLOW FILTERING" for Partition Key
> -
>
> Key: CASSANDRA-11031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
> Fix For: 3.x
>
> Attachments: CASSANDRA-11031-3.7.patch
>
>
> Currently, Allow Filtering only works for secondary Index column or 
> clustering columns. And it's slow, because Cassandra will read all data from 
> SSTABLE from hard-disk to memory to filter.
> But we can support allow filtering on Partition Key, as far as I know, 
> Partition Key is in memory, so we can easily filter them, and then read 
> required data from SSTable.
> This will similar to "Select * from table" which scan through entire cluster.
> CREATE TABLE multi_tenant_table (
>   tenant_id text,
>   pk2 text,
>   c1 text,
>   c2 text,
>   v1 text,
>   v2 text,
>   PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)