[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2015-01-05 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-7016:
--
Attachment: CASSANDRA-7016-V5-trunk.txt

This patch fixes the problems mentioned by Tyler.

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin Lerer
Priority: Minor
  Labels: cql, docs
 Fix For: 3.0

 Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, 
 CASSANDRA-7016-V4-trunk.txt, CASSANDRA-7016-V5-trunk.txt, CASSANDRA-7016.txt


 select ... where token(k) > x and token(k) <= y and k in (a,b) allow 
 filtering;
 This fails on 2.0.6: can't restrict k by more than one relation.
 In the context of map/reduce (hence the token range) I want to map over only 
 a subset of the keys (hence the 'in').  Pushing the 'in' filter down to cql 
 is substantially cheaper than pulling all rows to the client and then 
 discarding most of them.
 Currently this is possible only if the hadoop integration code is altered to 
 apply the AND on the client side and use cql that contains only the resulting 
 filtered 'in' set.  The problem is not hadoop specific though, so IMO it 
 should really be solved in cql not the hadoop integration code.
 Most restrictions on cql syntax seem to exist to prevent unduly expensive 
 queries. This one seems to be doing the opposite.
 Edit: on further thought and with reference to the code in 
 SelectStatement$RawStatement, it seems to me that  token(k) and k should be 
 considered distinct entities for the purposes of processing restrictions. 
 That is, no restriction on the token should conflict with a restriction on 
 the raw key. That way any monolithic query in terms of k can be decomposed 
 into parallel chunks over the token range for the purposes of map/reduce 
 processing simply by appending a 'and where token(k)...' clause to the 
 existing 'where k ...'.
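
 The decomposition described above could look roughly like the sketch below.
 It is only an illustration under stated assumptions: the helper name
 split_query, the table ks.tbl, and the token boundaries are all hypothetical,
 each range is assumed to be half-open (start, end], and the base query is
 assumed to already contain a WHERE clause. The generated statements would
 only be accepted by a Cassandra version that allows token and key
 restrictions to be combined.

```python
def split_query(base_query, partition_key, token_ranges):
    """Return one CQL string per (start, end] token range.

    base_query is assumed to already end with a WHERE clause that
    restricts the partition key, e.g. "... WHERE k IN (1, 2, 3)".
    """
    queries = []
    for start, end in token_ranges:
        # Append the token restriction to the existing key restriction.
        queries.append(
            f"{base_query} AND token({partition_key}) > {start} "
            f"AND token({partition_key}) <= {end}"
        )
    return queries


if __name__ == "__main__":
    # Hypothetical table and token boundaries, for illustration only.
    base = "SELECT * FROM ks.tbl WHERE k IN (1, 2, 3)"
    for query in split_query(base, "k", [(-100, 0), (0, 100)]):
        print(query)
```

 Each resulting statement can then be handed to a separate map task, so the
 'in' filter is evaluated by the server rather than by the client.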



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-12-19 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-7016:

Reviewer: Tyler Hobbs  (was: Sylvain Lebresne)

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin Lerer
Priority: Minor
  Labels: cql, docs
 Fix For: 3.0

 Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, 
 CASSANDRA-7016-V4-trunk.txt, CASSANDRA-7016.txt




[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-12-17 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-7016:
--
Fix Version/s: (was: 2.1.3)
   3.0
   Labels: cql docs  (was: cql)

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin Lerer
Priority: Minor
  Labels: cql, docs
 Fix For: 3.0

 Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, 
 CASSANDRA-7016.txt




[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-12-17 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-7016:
--
Attachment: CASSANDRA-7016-V4-trunk.txt

The patch introduces a new {{PrimaryKeyRestrictions}} called {{TokenFilter}} 
that allows token and non-token restrictions on the partition key to be merged.
The patch was made against trunk because CASSANDRA-7981 was a prerequisite for 
making it work for all possible queries.
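
Conceptually, merging the two kinds of restriction means keeping only the keys 
from the non-token restriction whose token falls inside the token range. The 
sketch below illustrates that idea only; it is not the code from the patch. 
The function name merge_restrictions, the toy token table, and the half-open 
(lower, upper] range are assumptions made for the example. In Cassandra the 
token would be computed by the cluster's partitioner.

```python
def merge_restrictions(keys, token_of, lower, upper):
    """Keep the keys whose token lies in the half-open range (lower, upper].

    keys     -- keys selected by the non-token restriction (e.g. an IN list)
    token_of -- callable mapping a key to its token (stand-in for a partitioner)
    """
    return [key for key in keys if lower < token_of(key) <= upper]


if __name__ == "__main__":
    # Toy token assignment, purely illustrative.
    toy_tokens = {"a": -5, "b": 3, "c": 12}
    print(merge_restrictions(["a", "b", "c"], toy_tokens.__getitem__, 0, 10))
```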

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin Lerer
Priority: Minor
  Labels: cql, docs
 Fix For: 3.0

 Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, 
 CASSANDRA-7016-V4-trunk.txt, CASSANDRA-7016.txt




[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-11-10 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-7016:
--
Fix Version/s: (was: 2.1.2)
   2.1.3

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin Lerer
Priority: Minor
  Labels: cql
 Fix For: 2.1.3

 Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, 
 CASSANDRA-7016.txt




[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-08-28 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-7016:
--

Attachment: CASSANDRA-7016-V3.txt

The patch fixes the previously mentioned issues and also fixes the fact that 
the previous solution did not work properly with partition keys spanning 
multiple columns.


 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin Lerer
Priority: Minor
  Labels: cql
 Fix For: 2.1.1

 Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, 
 CASSANDRA-7016.txt





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-08-20 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-7016:
--

Attachment: CASSANDRA-7016-V2.txt

I forgot to modify the SelectStatement execute method in the previous patch. 

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin Lerer
Priority: Minor
  Labels: cql
 Fix For: 2.1.1

 Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016.txt




[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-08-08 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-7016:
--

Fix Version/s: (was: 2.0.10)
   2.1.1

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin Lerer
Priority: Minor
  Labels: cql
 Fix For: 2.1.1

 Attachments: CASSANDRA-7016.txt




[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-08-08 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-7016:
--

Attachment: CASSANDRA-7016.txt

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin Lerer
Priority: Minor
  Labels: cql
 Fix For: 2.1.1

 Attachments: CASSANDRA-7016.txt




[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-07-01 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-7016:


Assignee: Benjamin LERER

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Assignee: Benjamin LERER
Priority: Minor
  Labels: cql
 Fix For: 2.0.10




[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-06-13 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-7016:


Fix Version/s: 2.0.9

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Priority: Minor
 Fix For: 2.0.9




[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-04-23 Thread Jonathan Halliday (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Halliday updated CASSANDRA-7016:
-

Description: 
select ... where token(k) > x and token(k) <= y and k in (a,b) allow filtering;

This fails on 2.0.6: can't restrict k by more than one relation.

In the context of map/reduce (hence the token range) I want to map over only a 
subset of the keys (hence the 'in').  Pushing the 'in' filter down to cql is 
substantially cheaper than pulling all rows to the client and then discarding 
most of them.

Currently this is possible only if the hadoop integration code is altered to 
apply the AND on the client side and use cql that contains only the resulting 
filtered 'in' set.  The problem is not hadoop specific though, so IMO it should 
really be solved in cql not the hadoop integration code.

Most restrictions on cql syntax seem to exist to prevent unduly expensive 
queries. This one seems to be doing the opposite.

Edit: on further thought and with reference to the code in 
SelectStatement$RawStatement, it seems to me that  token(k) and k should be 
considered distinct entities for the purposes of processing restrictions. That 
is, no restriction on the token should conflict with a restriction on the raw 
key. That way any monolithic query in terms of k can be decomposed into 
parallel chunks over the token range for the purposes of map/reduce processing 
simply by appending a 'and where token(k)...' clause to the existing 'where k 
...'.

  was:
select ... where token(k) > x and token(k) <= y and k in (a,b) allow filtering;

This fails on 2.0.6: can't restrict k by more than one relation.

In the context of map/reduce (hence the token range) I want to map over only a 
subset of the keys (hence the 'in').  Pushing the 'in' filter down to cql is 
substantially cheaper than pulling all rows to the client and then discarding 
most of them.

Currently this is possible only if the hadoop integration code is altered to 
apply the AND on the client side and use cql that contains only the resulting 
filtered 'in' set.  The problem is not hadoop specific though, so IMO it should 
really be solved in cql not the hadoop integration code.

Most restrictions on cql syntax seem to exist to prevent unduly expensive 
queries. This one seems to be doing the opposite.


 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday



[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

2014-04-23 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-7016:


Priority: Minor  (was: Major)

 can't map/reduce over subset of rows with cql
 -

 Key: CASSANDRA-7016
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Hadoop
Reporter: Jonathan Halliday
Priority: Minor
