[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094036#comment-13094036 ]

Mck SembWever edited comment on CASSANDRA-1125 at 8/30/11 8:02 PM:
---
Something broke here in production once we went out with 0.8.2. It may have been some poor testing, I'm not entirely sure and a little surprised. CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from {{new Range(token, token)}} to {{new Range(token, token, partitioner)}}, making the presumption that the partitioner for the new Range will be the same as this Range.

was (Author: michaelsembwever):
Something broke here in production once we went out with 0.8.2. It may have been some poor testing, I'm not entirely sure and a little surprised. CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} and StorageService is null as we're not inside the server. A quick fix (tested) is to change Range:148 from {{new Range(token, token)}} to {{new Range(token, token, partitioner)}}, making the presumption that the partitioner for the new Range will be the same as this Range.

Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
-
Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 0.8.2
Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094036#comment-13094036 ]

Mck SembWever edited comment on CASSANDRA-1125 at 8/30/11 8:55 PM:
---
Something broke here in production once we went out with 0.8.2. It may have been some poor testing, I'm not entirely sure and a little surprised. CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from {{new Range(token, token)}} to {{new Range(token, token, partitioner)}}, making the presumption that the partitioner for the new Range will be the same as this Range. This won't work if the Range wraps in any way (which could be just a limitation of the current KeyRange filtering), but otherwise tests ok.

was (Author: michaelsembwever):
Something broke here in production once we went out with 0.8.2. It may have been some poor testing, I'm not entirely sure and a little surprised. CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from {{new Range(token, token)}} to {{new Range(token, token, partitioner)}}, making the presumption that the partitioner for the new Range will be the same as this Range.

Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
-
Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 0.8.2
Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.
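
The quick fix described in the comment above can be sketched roughly as follows. This is a minimal illustration and not Cassandra's actual code: SketchRange, SketchPartitioner and SketchServer are hypothetical stand-ins, tokens are plain long values, and ranges never wrap (the same limitation the comment mentions). It only shows the idea that a range derived inside intersectionWith reuses the partitioner of the range it came from, instead of looking one up through server-side state that does not exist in a Hadoop client JVM.

```java
import java.util.Optional;

/** Hypothetical stand-in for a partitioner; not Cassandra's own interface. */
interface SketchPartitioner {
    String name();
}

/** Hypothetical stand-in for server-side global state that is absent outside the server. */
final class SketchServer {
    /** Stays null in this sketch, mimicking code that runs in a client JVM. */
    static SketchPartitioner partitioner = null;

    static SketchPartitioner getPartitioner() {
        if (partitioner == null)
            throw new IllegalStateException("no server-side partitioner available");
        return partitioner;
    }
}

/** Simplified token range (left, right]; wrapping ranges are deliberately not handled. */
final class SketchRange {
    final long left;
    final long right;
    final SketchPartitioner partitioner;

    /** Two-argument form: relies on global state, so it fails outside the server. */
    SketchRange(long left, long right) {
        this(left, right, SketchServer.getPartitioner());
    }

    /** Three-argument form: the caller supplies the partitioner explicitly. */
    SketchRange(long left, long right, SketchPartitioner partitioner) {
        this.left = left;
        this.right = right;
        this.partitioner = partitioner;
    }

    /** Returns the overlap of two non-wrapping ranges, if there is one. */
    Optional<SketchRange> intersectionWith(SketchRange that) {
        if (!(left < that.right && that.left < right))
            return Optional.empty();
        // Reuse this range's partitioner rather than asking the (absent) server for it.
        return Optional.of(new SketchRange(Math.max(left, that.left),
                                           Math.min(right, that.right),
                                           partitioner));
    }

    boolean intersects(SketchRange that) {
        return intersectionWith(that).isPresent();
    }
}
```

With a partitioner passed in explicitly, e.g. `new SketchRange(0, 10, p).intersects(new SketchRange(5, 20, p))`, the check runs without touching SketchServer, whereas `new SketchRange(0, 10)` fails in this sketch because no server-side partitioner is set.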
[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055503#comment-13055503 ]

Mck SembWever edited comment on CASSANDRA-1125 at 6/27/11 9:31 PM:
---
Can this go into 0.8.1? (And can we split this issue into two: 1) for KeyRange and 2) for IndexClause?)

was (Author: michaelsembwever):
Can this go into 0.8.1?

Filter out ColumnFamily rows that aren't part of the query
--
Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0
Attachments: CASSANDRA-1125.patch

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.
[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053467#comment-13053467 ]

Mck SembWever edited comment on CASSANDRA-1125 at 6/23/11 8:01 AM:
---
For now (without CASSANDRA-1600) I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end rowKey limits in CFIF. Upgrading from KeyRange to IndexClause (once it contains an optional KeyRange field) can easily be done later by replacing ConfigHelper.setInputKeyRange(..) with ConfigHelper.setInputIndexClause(..) and rewriting the two lines of code in CFRR's RowIterator.maybeInit(..)

was (Author: michaelsembwever):
For now (without CASSANDRA-1600) I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end rowKey limits in CFIF. Upgrading from KeyRange to IndexClause (once it contains an optional KeyRange field) can easily be done later by replacing ConfigHelper.setInputKeyRange(..) with ConfigHelper.setInputIndexClause(..) and rewriting the code two lines of code in CFRR's RowIterator.maybeInit(..)

Filter out ColumnFamily rows that aren't part of the query
--
Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.
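
As a rough illustration of the start/end limiting idea in the comment above, the sketch below restricts a list of ring ranges to the part that overlaps a job-wide range. It is an assumption-laden toy, not the CFIF implementation: SplitRange and KeyRangeSplitSketch are hypothetical names, tokens are plain long values, and wrapping ranges are rejected outright.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified token range (left, right] that never wraps around the ring. */
final class SplitRange {
    final long left;
    final long right;

    SplitRange(long left, long right) {
        if (left >= right)
            throw new IllegalArgumentException("wrapping or empty ranges are not handled in this sketch");
        this.left = left;
        this.right = right;
    }

    /** Returns the overlap with {@code that}, or null when the ranges do not overlap. */
    SplitRange intersectionWith(SplitRange that) {
        long l = Math.max(left, that.left);
        long r = Math.min(right, that.right);
        return l < r ? new SplitRange(l, r) : null;
    }

    @Override
    public String toString() {
        return "(" + left + "," + right + "]";
    }
}

final class KeyRangeSplitSketch {
    /** Keeps only the parts of each ring range that fall inside the job's range. */
    static List<SplitRange> restrict(List<SplitRange> ringRanges, SplitRange jobRange) {
        List<SplitRange> splits = new ArrayList<>();
        for (SplitRange ring : ringRanges) {
            SplitRange overlap = ring.intersectionWith(jobRange);
            if (overlap != null)
                splits.add(overlap); // ranges entirely outside the job range are skipped
        }
        return splits;
    }

    public static void main(String[] args) {
        List<SplitRange> ring = Arrays.asList(
                new SplitRange(0, 100), new SplitRange(100, 200), new SplitRange(200, 300));
        // Only the portions of the ring inside (50, 150] would be read by the job.
        System.out.println(restrict(ring, new SplitRange(50, 150)));
    }
}
```

Ranges that lie entirely outside the job range produce no split at all, which is where the saving over reading the whole ColumnFamily would come from.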
[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053467#comment-13053467 ]

Mck SembWever edited comment on CASSANDRA-1125 at 6/23/11 8:00 AM:
---
For now (without CASSANDRA-1600) I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end rowKey limits in CFIF. Upgrading from KeyRange to IndexClause (once it contains an optional KeyRange field) can easily be done later by replacing ConfigHelper.setInputKeyRange(..) with ConfigHelper.setInputIndexClause(..) and rewriting the code two lines of code in CFRR's RowIterator.maybeInit(..)

was (Author: michaelsembwever):
I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end rowKey limits in CFIF. -And I can use an {{IndexClause}} (which also permits a start_key) and then {{get_indexed_slices(..)}} in CFRR's {{RowIterator.maybeInit()}} But both approaches can't be combined. So I guess ConfigHelper could have methods setInputKeyRange(..) and setInputIndexClause(..) which are mutually exclusive to call.- Spoke a little early here about using {{get_indexed_slices}}. I can't see how IndexClause can specify a start/end rowKey - is this possible? (it needs to pass through the batch's range)

Filter out ColumnFamily rows that aren't part of the query
--
Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.
[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053467#comment-13053467 ]

Mck SembWever edited comment on CASSANDRA-1125 at 6/22/11 9:08 PM:
---
I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end rowKey limits in CFIF. -And I can use an {{IndexClause}} (which also permits a start_key) and then {{get_indexed_slices(..)}} in CFRR's {{RowIterator.maybeInit()}} But both approaches can't be combined. So I guess ConfigHelper could have methods setInputKeyRange(..) and setInputIndexClause(..) which are mutually exclusive to call.- Spoke a little early here about using {{get_indexed_slices}}. I can't see how IndexClause can specify a start/end rowKey - is this possible?

was (Author: michaelsembwever):
I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end rowKey limits in CFIF. -And I can use an {{IndexClause}} (which also permits a start_key) and then {{get_indexed_slices(..)}} in CFRR's {{RowIterator.maybeInit()}} But both approaches can't be combined. So I guess ConfigHelper could have methods setInputKeyRange(..) and setInputIndexClause(..) which are mutually exclusive to call.- Spoke a little earlier about using {{get_indexed_slices}}. I can't see how IndexClause can specify a start/end rowKey - is this possible?

Filter out ColumnFamily rows that aren't part of the query
--
Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Priority: Minor
Fix For: 1.0

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.
[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053467#comment-13053467 ]

Mck SembWever edited comment on CASSANDRA-1125 at 6/22/11 9:08 PM:
---
I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end rowKey limits in CFIF. -And I can use an {{IndexClause}} (which also permits a start_key) and then {{get_indexed_slices(..)}} in CFRR's {{RowIterator.maybeInit()}} But both approaches can't be combined. So I guess ConfigHelper could have methods setInputKeyRange(..) and setInputIndexClause(..) which are mutually exclusive to call.- Spoke a little earlier about using {{get_indexed_slices}}. I can't see how IndexClause can specify a start/end rowKey - is this possible?

was (Author: michaelsembwever):
I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end limits in CFIF. And I can use an {{IndexClause}} (which also permits a start_key) and then {{get_indexed_slices(..)}} in CFRR's {{RowIterator.maybeInit()}} But both approaches can't be combined. So I guess ConfigHelper could have methods setInputKeyRange(..) and setInputIndexClause(..) which are mutually exclusive to call.

Filter out ColumnFamily rows that aren't part of the query
--
Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Priority: Minor
Fix For: 1.0

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.
[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053467#comment-13053467 ]

Mck SembWever edited comment on CASSANDRA-1125 at 6/22/11 9:10 PM:
---
I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end rowKey limits in CFIF. -And I can use an {{IndexClause}} (which also permits a start_key) and then {{get_indexed_slices(..)}} in CFRR's {{RowIterator.maybeInit()}} But both approaches can't be combined. So I guess ConfigHelper could have methods setInputKeyRange(..) and setInputIndexClause(..) which are mutually exclusive to call.- Spoke a little early here about using {{get_indexed_slices}}. I can't see how IndexClause can specify a start/end rowKey - is this possible? (it needs to pass through the batch's range)

was (Author: michaelsembwever):
I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end rowKey limits in CFIF. -And I can use an {{IndexClause}} (which also permits a start_key) and then {{get_indexed_slices(..)}} in CFRR's {{RowIterator.maybeInit()}} But both approaches can't be combined. So I guess ConfigHelper could have methods setInputKeyRange(..) and setInputIndexClause(..) which are mutually exclusive to call.- Spoke a little early here about using {{get_indexed_slices}}. I can't see how IndexClause can specify a start/end rowKey - is this possible?

Filter out ColumnFamily rows that aren't part of the query
--
Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Priority: Minor
Fix For: 1.0

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.
[jira] Issue Comment Edited: (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976876#action_12976876 ]

Mck SembWever edited comment on CASSANDRA-1125 at 1/3/11 1:58 PM:
--
Jonathan: do you mean the IndexExpression and IndexClause and Table.open(Keyspace1).getColumnFamilyStore(Indexed1).scan(clause, filter); being used inside of ColumnFamilyRecordReader.maybeInit() ??

was (Author: michaelsembwever):
Jonathan: do you mean the IndexExpression and IndexClause and Table.open(Keyspace1).getColumnFamilyStore(Indexed1).scan(clause, filter); inside of ColumnFamilyRecordReader.maybeInit() ??

Filter out ColumnFamily rows that aren't part of the query
--
Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Jeremy Hanna
Priority: Minor
Fix For: 0.7.1

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976876#action_12976876 ]

Mck SembWever edited comment on CASSANDRA-1125 at 1/3/11 2:01 PM:
--
Jonathan: do you mean the IndexExpression and IndexClause and Table.open(Keyspace1).getColumnFamilyStore(Indexed1).scan(clause, filter); being used, instead of the KeyRange, inside of ColumnFamilyRecordReader.maybeInit() ??

was (Author: michaelsembwever):
Jonathan: do you mean the IndexExpression and IndexClause and Table.open(Keyspace1).getColumnFamilyStore(Indexed1).scan(clause, filter); being used inside of ColumnFamilyRecordReader.maybeInit() ??

Filter out ColumnFamily rows that aren't part of the query
--
Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Jeremy Hanna
Priority: Minor
Fix For: 0.7.1

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.