[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-7016:
    Attachment: CASSANDRA-7016-V5-trunk.txt

This patch fixes the problems mentioned by Tyler.

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Assignee: Benjamin Lerer
    Priority: Minor
    Labels: cql, docs
    Fix For: 3.0
    Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, CASSANDRA-7016-V4-trunk.txt, CASSANDRA-7016-V5-trunk.txt, CASSANDRA-7016.txt

select ... where token(k) > x and token(k) <= y and k in (a,b) allow filtering;

This fails on 2.0.6: can't restrict k by more than one relation.

In the context of map/reduce (hence the token range) I want to map over only a subset of the keys (hence the 'in'). Pushing the 'in' filter down to cql is substantially cheaper than pulling all rows to the client and then discarding most of them.

Currently this is possible only if the hadoop integration code is altered to apply the AND on the client side and use cql that contains only the resulting filtered 'in' set. The problem is not hadoop specific though, so IMO it should really be solved in cql not the hadoop integration code.

Most restrictions on cql syntax seem to exist to prevent unduly expensive queries. This one seems to be doing the opposite.

Edit: on further thought and with reference to the code in SelectStatement$RawStatement, it seems to me that token(k) and k should be considered distinct entities for the purposes of processing restrictions. That is, no restriction on the token should conflict with a restriction on the raw key. That way any monolithic query in terms of k can be decomposed into parallel chunks over the token range for the purposes of map/reduce processing simply by appending a 'and where token(k)...' clause to the existing 'where k ...'.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
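The decomposition described in the issue (take one query on k and run it as parallel chunks over the token range) can be sketched roughly as follows. This is only an illustration, not code from the patch or from the Hadoop integration: it assumes the Murmur3Partitioner token bounds, and the table name `ks.t`, the key column `k`, and the helper names `token_ranges` and `chunked_queries` are hypothetical.

```python
# Assumption: Murmur3Partitioner, whose tokens are signed 64-bit integers.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def token_ranges(splits):
    """Split the token space into `splits` contiguous (start, end] ranges."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    width = (MAX_TOKEN - MIN_TOKEN) // splits
    # The last bound is pinned to MAX_TOKEN so rounding never drops tokens.
    bounds = [MIN_TOKEN + i * width for i in range(splits)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def chunked_queries(base_query, key_column, splits):
    """Append a token-range restriction to a query that already has a WHERE clause."""
    return [
        f"{base_query} AND token({key_column}) > {start} "
        f"AND token({key_column}) <= {end}"
        for start, end in token_ranges(splits)
    ]


# Hypothetical table and key names, for illustration only.
queries = chunked_queries("SELECT * FROM ks.t WHERE k IN ('a', 'b')", "k", 4)
for q in queries:
    print(q)
```

Each generated statement combines a restriction on k with a restriction on token(k), which is exactly the combination the issue reports as rejected on 2.0.6.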
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-7016:
    Reviewer: Tyler Hobbs (was: Sylvain Lebresne)

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Assignee: Benjamin Lerer
    Priority: Minor
    Labels: cql, docs
    Fix For: 3.0
    Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, CASSANDRA-7016-V4-trunk.txt, CASSANDRA-7016.txt
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-7016:
    Fix Version/s: (was: 2.1.3) 3.0
    Labels: cql docs (was: cql)

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Assignee: Benjamin Lerer
    Priority: Minor
    Labels: cql, docs
    Fix For: 3.0
    Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, CASSANDRA-7016.txt
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-7016:
    Attachment: CASSANDRA-7016-V4-trunk.txt

The patch introduces a new {{PrimaryKeyRestrictions}} called {{TokenFilter}} that allows merging token and non-token restrictions on the partition key. The patch has been made for trunk, as CASSANDRA-7981 was a prerequisite to make the patch work for all the possible queries.

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Assignee: Benjamin Lerer
    Priority: Minor
    Labels: cql, docs
    Fix For: 3.0
    Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, CASSANDRA-7016-V4-trunk.txt, CASSANDRA-7016.txt
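One way to picture what merging a token restriction with a non-token restriction on the partition key could mean is shown below: keep only those keys from the 'in' list whose token falls inside the requested token range. This is a rough sketch and not the {{TokenFilter}} code from the attached patch, which is not reproduced in this thread. The names `stand_in_token` and `merge_restrictions` are hypothetical, and the hash is a stand-in: Cassandra's partitioners compute tokens differently.

```python
import hashlib


def stand_in_token(key):
    """Hypothetical stand-in for the partitioner: map a key to a signed 64-bit int.

    Cassandra's real partitioners (e.g. Murmur3Partitioner) use a different hash.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def merge_restrictions(in_keys, lower, upper, token_fn=stand_in_token):
    """Keep the IN keys whose token lies in (lower, upper], preserving order."""
    return [k for k in in_keys if lower < token_fn(k) <= upper]


# Illustration with a fixed, made-up token assignment so the result is obvious.
fake_tokens = {"a": -5, "b": 0, "c": 7}
kept = merge_restrictions(["a", "b", "c"], -5, 7, token_fn=fake_tokens.__getitem__)
print(kept)  # "a" is dropped because its token equals the exclusive lower bound
```

Filtering the key list this way means a split only ever reads the partitions it owns, rather than fetching the whole token range and discarding rows on the client.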
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-7016:
    Fix Version/s: (was: 2.1.2) 2.1.3

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Assignee: Benjamin Lerer
    Priority: Minor
    Labels: cql
    Fix For: 2.1.3
    Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, CASSANDRA-7016.txt
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-7016:
    Attachment: CASSANDRA-7016-V3.txt

The patch fixes the previously mentioned issues and also fixes the fact that the previous solution was not working properly with partition keys on multiple columns.

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Assignee: Benjamin Lerer
    Priority: Minor
    Labels: cql
    Fix For: 2.1.1
    Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, CASSANDRA-7016.txt
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-7016:
    Attachment: CASSANDRA-7016-V2.txt

I forgot to modify the SelectStatement execute method in the previous patch.

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Assignee: Benjamin Lerer
    Priority: Minor
    Labels: cql
    Fix For: 2.1.1
    Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016.txt
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-7016:
    Fix Version/s: (was: 2.0.10) 2.1.1

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Assignee: Benjamin Lerer
    Priority: Minor
    Labels: cql
    Fix For: 2.1.1
    Attachments: CASSANDRA-7016.txt
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-7016:
    Attachment: CASSANDRA-7016.txt

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Assignee: Benjamin Lerer
    Priority: Minor
    Labels: cql
    Fix For: 2.1.1
    Attachments: CASSANDRA-7016.txt
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-7016:
    Assignee: Benjamin LERER

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Assignee: Benjamin LERER
    Priority: Minor
    Labels: cql
    Fix For: 2.0.10
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-7016:
    Fix Version/s: 2.0.9

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Priority: Minor
    Fix For: 2.0.9
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Halliday updated CASSANDRA-7016:
    Description:

select ... where token(k) > x and token(k) <= y and k in (a,b) allow filtering;

This fails on 2.0.6: can't restrict k by more than one relation.

In the context of map/reduce (hence the token range) I want to map over only a subset of the keys (hence the 'in'). Pushing the 'in' filter down to cql is substantially cheaper than pulling all rows to the client and then discarding most of them.

Currently this is possible only if the hadoop integration code is altered to apply the AND on the client side and use cql that contains only the resulting filtered 'in' set. The problem is not hadoop specific though, so IMO it should really be solved in cql not the hadoop integration code.

Most restrictions on cql syntax seem to exist to prevent unduly expensive queries. This one seems to be doing the opposite.

Edit: on further thought and with reference to the code in SelectStatement$RawStatement, it seems to me that token(k) and k should be considered distinct entities for the purposes of processing restrictions. That is, no restriction on the token should conflict with a restriction on the raw key. That way any monolithic query in terms of k can be decomposed into parallel chunks over the token range for the purposes of map/reduce processing simply by appending a 'and where token(k)...' clause to the existing 'where k ...'.

    was:

select ... where token(k) > x and token(k) <= y and k in (a,b) allow filtering;

This fails on 2.0.6: can't restrict k by more than one relation.

In the context of map/reduce (hence the token range) I want to map over only a subset of the keys (hence the 'in'). Pushing the 'in' filter down to cql is substantially cheaper than pulling all rows to the client and then discarding most of them.

Currently this is possible only if the hadoop integration code is altered to apply the AND on the client side and use cql that contains only the resulting filtered 'in' set. The problem is not hadoop specific though, so IMO it should really be solved in cql not the hadoop integration code.

Most restrictions on cql syntax seem to exist to prevent unduly expensive queries. This one seems to be doing the opposite.

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
[jira] [Updated] (CASSANDRA-7016) can't map/reduce over subset of rows with cql
[ https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-7016:
    Priority: Minor (was: Major)

can't map/reduce over subset of rows with cql
    Key: CASSANDRA-7016
    URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
    Project: Cassandra
    Issue Type: Bug
    Components: Core, Hadoop
    Reporter: Jonathan Halliday
    Priority: Minor