[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662049#comment-14662049 ]

nicu marasoiu commented on HBASE-4435:
--------------------------------------

Hi,

Is this still ongoing? I looked on GitHub, and it seems that only a single metric such as sum(column) is implemented, not multiple ones. The general case is of course

    group by (d1,..,dn) sum(c1) hyperlogUniq(c2)

i.e. multiple metrics.

Thank you,
Nicu

Add Group By functionality using Coprocessors
---------------------------------------------

Key: HBASE-4435
URL: https://issues.apache.org/jira/browse/HBASE-4435
Project: HBase
Issue Type: Improvement
Components: Coprocessors
Reporter: Nichole Treadway
Priority: Minor
Labels: by, coprocessors, group, hbase
Attachments: HBASE-4435-v2.patch, HBase-4435.patch

Adds Group By-like functionality to HBase, using the Coprocessor framework.

It provides the ability to group the result set on one or more columns (groupBy families). It computes statistics (max, min, sum, count, sum of squares, number missing) for a second column, called the stats column.

To use, I've provided two implementations.

1. In the first, you specify a single group-by column and a stats field:

   statsMap = gbc.getStats(tableName, scan, groupByFamily, groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);

   The result is a map from the Group By column value (as a String) to a GroupByStatsValues object. The GroupByStatsValues object has max, min, sum etc. of the stats column for that group.

2. The second implementation allows you to specify a list of group-by columns and a stats field. The List of group-by columns is expected to contain lists of {column family, qualifier} pairs.

   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, statsFamily, statsQualifier, statsFieldColumnInterpreter);

The GroupByStatsValues code is adapted from the Solr Stats component.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
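As a rough illustration of the statistics listed in the issue description (max, min, sum, count, sum of squares, number missing), here is a minimal, self-contained Java sketch. It is not the code from the attached patches: GroupStats and GroupBySketch are hypothetical names, the group key is assumed to be a String, and the stats value is assumed to be a Double (null meaning "missing"). It shows one plausible split of the work: each region aggregates its own rows, and the client merges the partial maps.

```java
import java.util.HashMap;
import java.util.Map;

/** Running statistics for one group; a hypothetical stand-in for GroupByStatsValues. */
final class GroupStats {
    long count;                               // rows that carried a stats value
    long missing;                             // rows in the group without a stats value
    double sum;
    double sumOfSquares;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    /** Adds one row's stats value; null is counted as missing. */
    void accumulate(Double value) {
        if (value == null) {
            missing++;
            return;
        }
        count++;
        sum += value;
        sumOfSquares += value * value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    /** Folds another partial result for the same group into this one. */
    void merge(GroupStats other) {
        count += other.count;
        missing += other.missing;
        sum += other.sum;
        sumOfSquares += other.sumOfSquares;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
    }
}

final class GroupBySketch {
    /** Region-side step (assumed): rows are {groupKey, statsValue} pairs. */
    static Map<String, GroupStats> aggregate(Object[][] rows) {
        Map<String, GroupStats> out = new HashMap<>();
        for (Object[] row : rows) {
            out.computeIfAbsent((String) row[0], k -> new GroupStats())
               .accumulate((Double) row[1]);
        }
        return out;
    }

    /** Client-side step (assumed): merge one region's partial map into the total. */
    static void mergeInto(Map<String, GroupStats> total, Map<String, GroupStats> partial) {
        for (Map.Entry<String, GroupStats> e : partial.entrySet()) {
            total.computeIfAbsent(e.getKey(), k -> new GroupStats()).merge(e.getValue());
        }
    }
}
```

All six statistics merge associatively, which is what makes the per-region split possible. Supporting several metrics per group, as asked about in the comment above, would need one such accumulator per (group, column) pair; that is not shown here.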
[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549291#comment-13549291 ]

Aaron Tokhy commented on HBASE-4435:
------------------------------------

In addition to this coprocessor, I have been working on others: one that performs reverse indexing for range queries, and filters that scan only a small portion of a region given a key list. I created a GitHub repository for this work so that I can split it out into individual JIRA tickets.

https://github.com/atokhy/secondary-index-coprocessor

I'll be working with HBase 0.94.1 until I have a complete working implementation, and will eventually rewrite most of it to use Google's protobuf API as another attempt.
[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540856#comment-13540856 ]

Anoop Sam John commented on HBASE-4435:
---------------------------------------

It may be good to add this to the CP examples section. Is someone working on this?
[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478311#comment-13478311 ]

Ted Yu commented on HBASE-4435:
-------------------------------

Thanks for the patch.

Can you provide a trunk patch following the example of:
HBASE-6785 'Convert AggregateProtocol to protobuf defined coprocessor service'

Will provide comments soon. For a patch of this size, review board (https://reviews.apache.org) would help reviewers.
[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478351#comment-13478351 ]

Ted Yu commented on HBASE-4435:
-------------------------------

I didn't find any tests in the patch. It would be difficult for a feature to be accepted without new tests.

Should GroupByStatsValues be named GroupByStats (since stats imply some values)?

{code}
+ * Copyright 2012 The Apache Software Foundation
{code}
The above line is no longer needed in the license header.

BigDecimalColumnInterpreter is covered in HBASE-6669. To make the workload reasonable for this JIRA, you can exclude it from the patch.

{code}
+public class CharacterColumnInterpreter implements ColumnInterpreter<Character, Character> {
{code}
Add annotations for audience and stability to public classes.

In GroupByClient.java, the following import can be removed:
{code}
+import com.sun.istack.logging.Logger;
{code}

{code}
+Map<Text, GroupByStatsValues<T, S>> getStats(
+ final byte[] tableName, final Scan scan,
+ final List<byte [][]> groupByTuples, final byte[][] statsTuple,
{code}
The @param entries for the above method don't match the actual parameters - probably you changed the API in a later iteration.

{code}
+class RowNumCallback implements
{code}
The above class can be made private. I think we should find a better name for it - it does aggregation.

{code}
+long bt = System.currentTimeMillis();
{code}
Please use EnvironmentEdge instead.

{code}
+table.close();
{code}
Please enclose the above in a finally clause.
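The last two review points (read time through an injectable source rather than System.currentTimeMillis(), and close the table in a finally clause) can be sketched together in plain Java. This is only an illustration under stated assumptions: MillisClock stands in for an injectable time source in the spirit of HBase's EnvironmentEdge, FakeTable stands in for a table handle, and TimedCall is a made-up helper. None of these names come from the patch or from HBase.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical injectable time source; tests can supply a fake one. */
interface MillisClock {
    long currentTimeMillis();
}

/** Hypothetical table handle that records whether it was closed. */
final class FakeTable implements AutoCloseable {
    final AtomicBoolean closed = new AtomicBoolean(false);

    /** Pretends to run a scan; throws when asked to, to simulate a failure. */
    long countRows(boolean fail) {
        if (fail) {
            throw new IllegalStateException("scan failed");
        }
        return 42L;
    }

    @Override
    public void close() {
        closed.set(true);
    }
}

final class TimedCall {
    /**
     * Returns the elapsed milliseconds as seen by the injected clock.
     * The table is closed whether or not the call throws.
     */
    static long timedCount(FakeTable table, MillisClock clock, boolean fail) {
        long begin = clock.currentTimeMillis();
        try {
            table.countRows(fail);
            return clock.currentTimeMillis() - begin;
        } finally {
            table.close();
        }
    }
}
```

Because the clock is passed in, a test can advance time deterministically instead of sleeping, and the finally block guarantees the handle is released on the failure path as well.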
[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478408#comment-13478408 ]

Aaron Tokhy commented on HBASE-4435:
------------------------------------

Thanks for the quick review and feedback. I'll update JIRA with a new patch based off of SVN trunk soon, though not right away, since I also have to clean up some of the code.

I may also change a few other things, such as using HashedBytes instead of Text as the group key, so that roll-ups can be performed on types other than UTF-8 strings.
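To make the point about keys concrete: a raw byte[] cannot serve as a HashMap key because arrays use identity equality, so grouping on arbitrary bytes needs a wrapper with content-based equals and hashCode. The sketch below is a minimal illustration of that idea. BytesKey and RollUp are hypothetical names; BytesKey is written in the spirit of a hashed-bytes wrapper and is not HBase's HashedBytes class.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable wrapper giving byte[] content-based equality. */
final class BytesKey {
    private final byte[] bytes;
    private final int hash;

    BytesKey(byte[] bytes) {
        this.bytes = bytes.clone();            // defensive copy keeps the key immutable
        this.hash = Arrays.hashCode(this.bytes);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}

final class RollUp {
    /** Counts rows per group key, where keys are arbitrary bytes (not only UTF-8 text). */
    static Map<BytesKey, Long> countByKey(byte[][] keys) {
        Map<BytesKey, Long> counts = new HashMap<>();
        for (byte[] key : keys) {
            counts.merge(new BytesKey(key), 1L, Long::sum);
        }
        return counts;
    }
}
```

With such a key, a group-by column holding, say, an 8-byte encoded long groups correctly even when the bytes are not valid UTF-8.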
[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255546#comment-13255546 ]

lifeng commented on HBASE-4435:
-------------------------------

When can this patch be put into HBase?
[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1327#comment-1327 ]

Zhihong Yu commented on HBASE-4435:
-----------------------------------

@Nichole:
The attached patch is half a year old. Do you have a newer version?
[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225933#comment-13225933 ]

lifeng commented on HBASE-4435:
-------------------------------

6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel
[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167477#comment-13167477 ]

Jeff Hammerbacher commented on HBASE-4435:
------------------------------------------

How does this approach compare to HBASE-1512?