[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064118#comment-13064118 ]

Hudson commented on CASSANDRA-1125:
-----------------------------------

Integrated in Cassandra-0.8 #214 (See [https://builds.apache.org/job/Cassandra-0.8/214/])
    add KeyRange option to Hadoop inputformat
patch by Mck SembWever; reviewed by jbellis for CASSANDRA-1125

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1145731
Files :
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ConfigHelper.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java

Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
-----------------------------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 0.8.2
Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query.
It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062812#comment-13062812 ]

Mck SembWever commented on CASSANDRA-1125:
------------------------------------------

+1 (tested) on 1125-v3.txt

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0
Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062814#comment-13062814 ]

Mck SembWever commented on CASSANDRA-1125:
------------------------------------------

Created CASSANDRA-2878 for the better solution using an IndexClause

Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
-----------------------------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0
Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059401#comment-13059401 ]

Mck SembWever commented on CASSANDRA-1125:
------------------------------------------

bq. using KeyRange but with tokens (which Thrift also uses for start-exclusive)

this is my preference. i'll make a patch for it.

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0
Attachments: 1125-formatted.txt, CASSANDRA-1125.patch
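For readers unfamiliar with the "start-exclusive" wording in the comment above, here is a minimal Java sketch of what a start-exclusive, end-inclusive token range could look like. The class name StartExclusiveRange and the use of plain long tokens are assumptions made for illustration only; real token types depend on the partitioner, and this is not the code from any attached patch.

```java
/**
 * Hypothetical illustration of a start-exclusive token range (start, end].
 * Tokens are modeled as plain longs purely for this sketch.
 */
final class StartExclusiveRange {

    /** Returns true if token lies in (start, end], treating start >= end as a wrap around the ring. */
    static boolean contains(long start, long end, long token) {
        if (start < end) {
            // Ordinary range: the start token itself is excluded, the end token is included.
            return token > start && token <= end;
        }
        // Wrapping range: everything after start, plus everything up to and including end.
        return token > start || token <= end;
    }

    public static void main(String[] args) {
        System.out.println(contains(10, 20, 10)); // start token is excluded
        System.out.println(contains(10, 20, 20)); // end token is included
        System.out.println(contains(90, 10, 5));  // wrapping range
    }
}
```

With this convention, consecutive ranges such as (10, 20] and (20, 30] cover every token exactly once, which is presumably why a start-exclusive bound is convenient when a job's input is cut into splits.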
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059200#comment-13059200 ]

Christian Hubmann commented on CASSANDRA-1125:
----------------------------------------------

Haha, yeah I follow this issue too... i just think that it's going to be a pain to have to rewrite the pig scripts for the additional transient timebucket ;-)

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0
Attachments: 1125-formatted.txt, CASSANDRA-1125.patch
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059098#comment-13059098 ]

Jeremy Hanna commented on CASSANDRA-1125:
-----------------------------------------

So does this only include key ranges - that's what it sounds like. And indexes are out for now too, it sounds like - e.g. where timebucket = 12345.

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0
Attachments: 1125-formatted.txt, CASSANDRA-1125.patch
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059114#comment-13059114 ]

Jonathan Ellis commented on CASSANDRA-1125:
-------------------------------------------

Like Mck said, that will have to be split into another ticket, since it continues to depend on CASSANDRA-1600.

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0
Attachments: 1125-formatted.txt, CASSANDRA-1125.patch
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058859#comment-13058859 ]

Jonathan Ellis commented on CASSANDRA-1125:
-------------------------------------------

(And I'd be fine with putting this in 0.8.x.)

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0
Attachments: 1125-formatted.txt, CASSANDRA-1125.patch
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053467#comment-13053467 ]

Mck SembWever commented on CASSANDRA-1125:
------------------------------------------

I can use a {{KeyRange}} and {{Range.intersectionWith(..)}} for start/end limits in CFIF.

And i can use a {{IndexClause}} (which also permits a start_key) and then {{get_indexed_slices(..)}} in CFRR's {{RowIterator.maybeInit()}}

But both approaches can't be combined.
So i guess ConfigHelper could have methods setInputKeyRange(..) and setInputIndexClause(..) which are mutually exclusive to call.

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Priority: Minor
Fix For: 1.0
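The two ideas in the comment above (clipping each split to a configured range, and making the key-range and index-clause settings mutually exclusive) can be sketched roughly as follows. This is only an illustration under stated assumptions: TokenSpan and InputFilterConfig are hypothetical stand-ins, tokens are plain longs, wrapping ranges are not handled, and the index clause is just a string. It does not reproduce the real ConfigHelper or Range classes.

```java
import java.util.Optional;

/** Hypothetical non-wrapping token range (start, end]; not Cassandra's Range class. */
final class TokenSpan {
    final long start; // exclusive
    final long end;   // inclusive

    TokenSpan(long start, long end) {
        if (start >= end) {
            throw new IllegalArgumentException("wrapping or empty ranges are not handled in this sketch");
        }
        this.start = start;
        this.end = end;
    }

    /** Intersection of two (start, end] ranges, or empty if they do not overlap. */
    Optional<TokenSpan> intersect(TokenSpan other) {
        long s = Math.max(start, other.start);
        long e = Math.min(end, other.end);
        return s < e ? Optional.of(new TokenSpan(s, e)) : Optional.empty();
    }
}

/** Hypothetical job settings in which the two filters are mutually exclusive. */
final class InputFilterConfig {
    private TokenSpan keyRange;
    private String indexClause; // placeholder for a real index clause object

    void setInputKeyRange(TokenSpan range) {
        if (indexClause != null) {
            throw new IllegalStateException("an index clause is already set");
        }
        keyRange = range;
    }

    void setInputIndexClause(String clause) {
        if (keyRange != null) {
            throw new IllegalStateException("a key range is already set");
        }
        indexClause = clause;
    }

    /** Clips one input split to the configured key range; a split outside the range yields empty. */
    Optional<TokenSpan> restrict(TokenSpan split) {
        return keyRange == null ? Optional.of(split) : split.intersect(keyRange);
    }
}

final class KeyRangeSketch {
    public static void main(String[] args) {
        InputFilterConfig config = new InputFilterConfig();
        config.setInputKeyRange(new TokenSpan(100, 200));
        // A split that partly overlaps the job range is narrowed to the overlap.
        TokenSpan clipped = config.restrict(new TokenSpan(150, 300)).get();
        System.out.println("(" + clipped.start + ", " + clipped.end + "]");
    }
}
```

Failing fast in the setters is one possible way to enforce "mutually exclusive to call"; validating once when the job is submitted would be an equally reasonable choice.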
[jira] Commented: (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976876#action_12976876 ]

Mck SembWever commented on CASSANDRA-1125:
------------------------------------------

Jonathan: do you mean the IndexExpression and IndexClause and
Table.open(Keyspace1).getColumnFamilyStore(Indexed1).scan(clause, filter);
inside of ColumnFamilyRecordReader.maybeInit() ??

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Jeremy Hanna
Priority: Minor
Fix For: 0.7.1

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976920#action_12976920 ]

Jonathan Ellis commented on CASSANDRA-1125:
-------------------------------------------

I think everything I wanted to do here is covered by CASSANDRA-1600, but we weren't able to reach consensus on that for 0.7 so I tabled it.

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Jeremy Hanna
Priority: Minor
Fix For: 0.7.1
[jira] Commented: (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935260#action_12935260 ]

Mck SembWever commented on CASSANDRA-1125:
------------------------------------------

this would be a very nice feature.
it has been brought up
http://thread.gmane.org/gmane.comp.db.cassandra.user/4965
and
http://thread.gmane.org/gmane.comp.db.cassandra.user/6135

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Matthew F. Dennis
Priority: Minor
Fix For: 0.7.1
[jira] Commented: (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872354#action_12872354 ]

Jonathan Ellis commented on CASSANDRA-1125:
-------------------------------------------

another option would be to use the RowPredicate with CASSANDRA-749

Filter out ColumnFamily rows that aren't part of the query
----------------------------------------------------------

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
Fix For: 0.7