[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-12-12 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244155#comment-14244155
 ] 

Sylvain Lebresne commented on CASSANDRA-4476:
-

It should be possible to page despite the fact results are not in token order 
(after all, the order is deterministic) but we'd need to look at it more 
closely to assess if the complexity of that would be worth it or not, so I'll 
at least push this to 3.1 for now.  
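The claim that a deterministic order is enough for paging can be illustrated with a small sketch. It is not Cassandra code and does not reflect how Cassandra's paging state is actually implemented; it assumes, purely for illustration, that a page can be resumed from a composite cursor of (indexed value, token) rather than from a token alone. The dataset is the one from Jeremiah Jordan's example elsewhere in this thread, and the names `ROWS`, `index_scan_after` and `page_all` are hypothetical.

```python
# Hypothetical in-memory model; not Cassandra code.
# Each row is (token, indexed), as in the example dataset from this thread.
ROWS = [(1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6)]


def index_scan_after(min_indexed, cursor=None, limit=3):
    """Return up to `limit` hits with indexed > min_indexed, in index order.

    Hits are (indexed, token) tuples, so the natural tuple ordering is the
    index order: indexed value first, then token. `cursor` is the last hit of
    the previous page (an assumption of this sketch).
    """
    hits = sorted((ind, tok) for tok, ind in ROWS if ind > min_indexed)
    if cursor is not None:
        hits = [h for h in hits if h > cursor]
    return hits[:limit]


def page_all(min_indexed, limit=3):
    """Page through the whole result set using the composite cursor."""
    seen, cursor = [], None
    while True:
        page = index_scan_after(min_indexed, cursor, limit)
        if not page:
            return seen
        seen.extend(page)
        cursor = page[-1]


result = page_all(4)
print(len(result))  # 8: every matching row is returned exactly once
```

Because the cursor follows the same total order as the scan, no row is skipped. The open question raised in the comment, whether supporting such a cursor is worth the added complexity, is unaffected by this sketch.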

 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
  Labels: cql
 Fix For: 3.1

 Attachments: 4476-2.patch, 4476-3.patch, 4476-5.patch, 
 cassandra-trunk-4476.patch


 Currently, a query that uses 2ndary indexes must have at least one EQ clause 
 (on an indexed column). Given that indexed CFs are local (and use 
 LocalPartitioner, which orders the rows by the type of the indexed column), we 
 should extend 2ndary indexes to allow querying indexed columns even when no 
 EQ clause is provided.
 As far as I can tell, the main problem to solve for this is to update 
 KeysSearcher.highestSelectivityPredicate(), i.e. how do we estimate the 
 selectivity of non-EQ clauses? I note however that if we can do that estimate 
 reasonably accurately, this might provide better performance even for index 
 queries that have both EQ and non-EQ clauses, because some non-EQ clauses may 
 have a much better selectivity than EQ ones (say you index both the user 
 country and birth date: for SELECT * FROM users WHERE country = 'US' AND 
 birthdate > 'Jan 2009' AND birthdate < 'July 2009', you'd better use the 
 birthdate index first).
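The idea of picking the most selective predicate regardless of its operator can be sketched as follows. This is a minimal illustration, not the logic of KeysSearcher.highestSelectivityPredicate(): the `Predicate` type, the `highest_selectivity` helper and the row estimates are all hypothetical, and the numbers are invented only to mirror the country/birthdate example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    column: str
    operator: str        # e.g. "EQ" or "RANGE"
    estimated_rows: int  # hypothetical estimate of matching rows


def highest_selectivity(predicates):
    """Pick the predicate expected to match the fewest rows.

    The operator is deliberately ignored: a range predicate wins over an
    EQ predicate whenever its estimate is lower.
    """
    return min(predicates, key=lambda p: p.estimated_rows)


# Invented estimates: many users are in the US, few were born in the
# first half of 2009.
candidates = [
    Predicate("country", "EQ", estimated_rows=1_000_000),
    Predicate("birthdate", "RANGE", estimated_rows=20_000),
]
chosen = highest_selectivity(candidates)
print(chosen.column)  # birthdate
```

The hard part described in the ticket, producing a reasonably accurate estimate for a non-EQ clause, is exactly what this sketch takes as given.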



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-12-04 Thread Oded Peer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234764#comment-14234764
 ] 

Oded Peer commented on CASSANDRA-4476:
--

I understand. I created a test that demonstrates the issue.
That's a really good catch on your part.

I can't see a good way to query an index range and return the result in token 
order for paging.
It might be done by fetching the entire table into memory and sorting all the 
rows by token value, but that's just wrong.
Is it OK to close the issue as won't fix?

 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
  Labels: cql
 Fix For: 3.0

 Attachments: 4476-2.patch, 4476-3.patch, 4476-5.patch, 
 cassandra-trunk-4476.patch




[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-12-03 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233123#comment-14233123
 ] 

Jeremiah Jordan commented on CASSANDRA-4476:


bq. I don't understand the problem in your example. The query result seems 
valid to me.

The problem is that (1, 6), (2, 6) are never going to be returned if you keep 
paging through the query in that fashion.

bq. In addition, can you please explain how a query using only secondary 
indexes such as select k from my_table where index1 = 5 and index2 > 10 allow 
filtering retains token order?

What do you mean?  It retains token order (and then clustering order) by the 
results from select k from my_table where index1 = 5 being in order, and 
then filtering out anything that does not match the index2 restriction.

 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
  Labels: cql
 Fix For: 3.0

 Attachments: 4476-2.patch, 4476-3.patch, 4476-5.patch, 
 cassandra-trunk-4476.patch




[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-12-03 Thread Oded Peer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233307#comment-14233307
 ] 

Oded Peer commented on CASSANDRA-4476:
--

I am probably missing something that is obvious to you.
{quote}The problem is that (1, 6), (2, 6) are never going to be returned if you 
keep paging through the query in that fashion.{quote}
Why would you need to return these values if you specified a limit of 3 in your 
query?

When doing a range query over a secondary index, a view of the index data is 
created in {{CollationController.collectAllData()}} and iteration is done over 
the intervalTree of that view. If I understand correctly, the values in the 
interval tree are the index values, not tokens, which ensures the iterator 
results are ordered.
According to your comment, paging is broken in this use case, but the test I am 
running isn't failing. Do you expect the test to fail?


 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
  Labels: cql
 Fix For: 3.0

 Attachments: 4476-2.patch, 4476-3.patch, 4476-5.patch, 
 cassandra-trunk-4476.patch




[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-12-03 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233367#comment-14233367
 ] 

Jeremiah Jordan commented on CASSANDRA-4476:


The issue is that you need to be able to keep saying give me 3 more starting 
from where I left off and eventually get all of the values.

 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
  Labels: cql
 Fix For: 3.0

 Attachments: 4476-2.patch, 4476-3.patch, 4476-5.patch, 
 cassandra-trunk-4476.patch




[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-12-02 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231477#comment-14231477
 ] 

Benjamin Lerer commented on CASSANDRA-4476:
---

{quote}I see it as a trade-off between code complexity and query performance. 
As Sylvain explained in his earlier comment more than one indexed column means 
ALLOW FILTERING, for which all bets are off in terms of performance 
anyway.{quote}

In the query {{Select * from myTable where a > 1 and a < 3}} there is only one 
indexed column, {{a}}, and as such this query does not need filtering and the 
performance should be predictable.

{quote}While it is good to strive and deliver the optimal performance 
altogether I think the use case you are describing is rare.{quote}

It is a common use case. It is used a lot with time series data, for example 
when people want to analyse what happened over a range of dates.

{quote}Jonathan Ellis described “When Not to Use Secondary Indexes” in a blog 
post: "Do not use secondary indexes to query a huge volume of records for a small 
number of results"{quote}

Jonathan's statement is true, but it has nothing to do with the ability to 
perform a range query on an index. It is about choosing the right tool to query 
data based on your data distribution.

{quote} so for the proper use of indexed queries this shouldn't have a 
significant effect but it would make the code more complex.{quote}
Actually, if you think about it you will realize that it can have a big impact.
 

 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
  Labels: cql
 Fix For: 3.0

 Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch




[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-12-02 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231775#comment-14231775
 ] 

Benjamin Lerer commented on CASSANDRA-4476:
---

Here is my review feedback on the latest patch:
* I think that you should use {{isRange}} or {{isSlice}} instead of 
{{isRelationalOrderOperator}} as it is clearer.
* The name of the test class, {{SecondaryIndexNonEqTest}}, is misleading. 
Tests of the {{CONTAINS}} and {{CONTAINS KEY}} operators are also non-EQ tests.
* In {{getRelationalOrderEstimatedSize}} I do not understand why you do not 
return 0 if {{estimatedKeysForRange}} returns 0. Could you explain?
* Instead of doing some dangerous casting in 
{{getRelationalOrderEstimatedSize}}, you should change the type of 
{{bestMeanCount}} from int to long.
* In {{computeNext}} I do not understand why you do not check for stale data 
for range queries. Could you explain?
* I think it would be nicer to also have an iterator for EQ and use 
polymorphism instead of if/else.
* The close method of the {{AbstractScanIterator}} returned by 
{{getSequentialIterator}} should be called from the close method.
* The Unit tests are only covering a subset of the possible queries. Could you 
add more (a  3 and a 4, a  3 and a  4 ...)
* When testing for InvalidRequestException you should use 
{{assertInvalidMessage}}

 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
  Labels: cql
 Fix For: 3.0

 Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch




[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-12-02 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231898#comment-14231898
 ] 

Jeremiah Jordan commented on CASSANDRA-4476:


I think you need to re-visit the issue of the result ordering.  Without the 
full result set being in token order you cannot page through the results from 
the secondary index.  Internal and user driven paging rely on being able to 
start the next page by knowing the token the previous page ended on.  With an 
implementation that does not return the results in token order, you cannot send 
the end token of the previous result as the start token for the next page, 
or you will skip all values for following index rows that have a token before 
that.  For example:

Dataset:
{noformat}
(token(key), indexed)
(1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6)
{noformat}

{noformat}
select token(key),indexed from temp where indexed > 4 limit 3;
3, 5
4, 5
5, 5
{noformat}

Then without proper token order results:

{noformat}
select token(key),indexed from temp where indexed > 4 and token(key) > 5 limit 3;
6, 5
7, 6
8, 6
{noformat}

You just skipped (1, 6) and (2, 6) and can not get them.
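The example above can be reproduced with a small simulation. This is a minimal sketch, not Cassandra code: it assumes the index scan returns hits ordered by indexed value and then by token, and that each following page is requested with only a "token greater than the last one seen" restriction. The names `ROWS`, `index_scan` and `page_by_token` are hypothetical.

```python
# Hypothetical in-memory model of the example; not Cassandra code.
# Each row is (token, indexed).
ROWS = [(1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6)]


def index_scan(min_indexed, after_token=None, limit=3):
    """Return rows with indexed > min_indexed in index order.

    Index order means: by indexed value first, then by token. The optional
    `after_token` mimics a token-based restriction added for the next page.
    """
    hits = sorted(
        (row for row in ROWS if row[1] > min_indexed),
        key=lambda row: (row[1], row[0]),
    )
    if after_token is not None:
        hits = [row for row in hits if row[0] > after_token]
    return hits[:limit]


def page_by_token(min_indexed, limit=3):
    """Page by resuming from the last token of the previous page."""
    seen, last_token = [], None
    while True:
        page = index_scan(min_indexed, last_token, limit)
        if not page:
            return seen
        seen.extend(page)
        last_token = page[-1][0]


seen = page_by_token(4)
missing = sorted(set(ROWS) - set(seen))
print(missing)  # [(1, 6), (2, 6)]
```

The two rows with tokens 1 and 2 sort after token 5 in index order, so a resume point expressed only as a token can never reach them.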


 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
  Labels: cql
 Fix For: 3.0

 Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch




[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-11-28 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228264#comment-14228264
 ] 

Benjamin Lerer commented on CASSANDRA-4476:
---

Here is my feedback:
* Be careful with whitespace and indentation. I had to use 
--ignore-space-change and --ignore-whitespace to be able to apply your patch. I 
had 82 lines with whitespace errors.
* {{SecondaryIndex.supportOperator}} is overloaded. Secondary indices also 
support the {{contains}} and {{contains key}} operators.
You ignored them completely and broke that part of the code, as you will see if 
you run {{ContainsRelationTest}}.
* The goal of {{SecondaryIndexSearcher.highestSelectivityPredicate}} is to 
determine which index will select the smallest number of rows (highest 
selectivity). There is no reason why an equal operator should select fewer rows 
than a slice operator.
* Index expressions should be grouped when multiple slices apply to the same 
column. If a user runs the query Select * from myTable where a > 1 and a < 3, 
you should only scan from 1 to 3 and not from 1 to infinity or from 
-infinity to 3.
* I have some trouble understanding the changes that you made in 
{{CompositesSearcher}}. Could you add some comments to explain your approach?
* You should use meaningful names for the test methods. Naming them {{bug4476}} 
forces the reader to go to JIRA to get some clue about what the method is 
actually testing.
* Using {{pageSize}} as an instance variable in {{CQLTest}} is dangerous as it 
can have unwanted effects on the other test methods (especially as JUnit 
does not guarantee the method execution order since Java 7).
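The point about grouping slices on the same column can be sketched as below. This is an illustration only, not the patch's code or Cassandra's restriction handling: the `merge_slices` helper, the operator names as strings and the tuple layout of the result are assumptions made for the example.

```python
import math


def merge_slices(restrictions):
    """Combine slice restrictions per column into a single range.

    `restrictions` is a list of (column, op, value) with op in
    {"GT", "GTE", "LT", "LTE"}. The result maps each column to
    (lower, lower_inclusive, upper, upper_inclusive), keeping the
    tightest bound on each side.
    """
    ranges = {}
    for column, op, value in restrictions:
        lo, lo_inc, hi, hi_inc = ranges.get(
            column, (-math.inf, False, math.inf, False)
        )
        if op in ("GT", "GTE"):
            inclusive = op == "GTE"
            # A larger value is tighter; at equal values, exclusive is tighter.
            if value > lo or (value == lo and not inclusive):
                lo, lo_inc = value, inclusive
        elif op in ("LT", "LTE"):
            inclusive = op == "LTE"
            # A smaller value is tighter; at equal values, exclusive is tighter.
            if value < hi or (value == hi and not inclusive):
                hi, hi_inc = value, inclusive
        else:
            raise ValueError(f"not a slice operator: {op}")
        ranges[column] = (lo, lo_inc, hi, hi_inc)
    return ranges


merged = merge_slices([("a", "GT", 1), ("a", "LT", 3)])
print(merged)  # {'a': (1, False, 3, False)}
```

With the two restrictions merged, a single scan from 1 to 3 (both exclusive) covers the query, instead of one scan from 1 to infinity and a separate filter on the upper bound.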

 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
  Labels: cql
 Fix For: 3.0

 Attachments: 4476-2.patch, cassandra-trunk-4476.patch




[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-11-25 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224410#comment-14224410
 ] 

Sylvain Lebresne commented on CASSANDRA-4476:
-

I will note that so far for index queries we return results in partitioner 
order and clustering order (within a given partition key), but this won't be 
the case in this issue (each index entry has hits in partitioner order and 
clustering order, but we're sequentially scanning multiple entries and so the 
overall result will not be in order).

I suspect we might just have to accept that, but this at least means that:
* the patch needs to refuse {{ORDER BY}} for such queries, as we can't do it 
without post-query re-ordering, and post-query re-ordering doesn't work with 
paging, so let clients do the ordering client side if they want to.
* we should carefully test the paging of those queries. I'm not sure it's 
broken by this but we should have tests.

 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Oded Peer
Priority: Minor
  Labels: cql
 Fix For: 3.0

 Attachments: cassandra-trunk-4476.patch




[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

2014-10-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176736#comment-14176736
 ] 

Sylvain Lebresne commented on CASSANDRA-4476:
-

Good question. This ticket was only ever meant to deal with {{LT}}, {{LTE}}, 
{{GTE}} and {{GT}}, so let's leave it at that for this ticket (I've made the 
title more precise). Regarding {{IN}}, it could be supported, but for the sake 
of doing one thing at a time, it's probably better to leave it as a follow-up 
to this ticket. For {{NEQ}}, I see no way to do it in even a vaguely efficient 
way (at least with the current indexing scheme) so I don't think there is any 
plan to ever support it (but even if someone has a brilliant idea how to do it, 
it's definitely a separate issue).

 Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
 

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Priority: Minor
  Labels: cql
 Fix For: 2.1.2

