[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors

2015-08-07 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662049#comment-14662049
 ] 

nicu marasoiu commented on HBASE-4435:
--

Hi,

Is this still ongoing? I looked on github and seemed that only one metric like 
sum(column) is done, not multiple ones. The general case is of course group by 
(d1,..,dn) sum(c1) hyperlogUniq(c2) i.e. multiple metrics.

Thank you,
Nicu

 Add Group By functionality using Coprocessors
 -

 Key: HBASE-4435
 URL: https://issues.apache.org/jira/browse/HBASE-4435
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Reporter: Nichole Treadway
Priority: Minor
  Labels: by, coprocessors, group, hbase
 Attachments: HBASE-4435-v2.patch, HBase-4435.patch


 Adds in a Group By -like functionality to HBase, using the Coprocessor 
 framework. 
 It provides the ability to group the result set on one or more columns 
 (groupBy families). It computes statistics (max, min, sum, count, sum of 
 squares, number missing) for a second column, called the stats column. 
 To use, I've provided two implementations.
 1. In the first, you specify a single group-by column and a stats field:
   statsMap = gbc.getStats(tableName, scan, groupByFamily, 
 groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The result is a map with the Group By column value (as a String) to a 
 GroupByStatsValues object. The GroupByStatsValues object has max,min,sum etc. 
 of the stats column for that group.
 2. The second implementation allows you to specify a list of group-by columns 
 and a stats field. The List of group-by columns is expected to contain lists 
 of {column family, qualifier} pairs. 
   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, 
 statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The GroupByStatsValues code is adapted from the Solr Stats component.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors

2013-01-09 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549291#comment-13549291
 ] 

Aaron Tokhy commented on HBASE-4435:


I was working on other coprocessors in addition to this one in performing 
reverse indexing for range queries as well as filters that could be used to 
only scan a small portion of a region given a key-list.

I created a Github for this work, so that I could split this work out into 
individual JIRA tickets.

https://github.com/atokhy/secondary-index-coprocessor

I'll be working with HBase 0.94.1 until I have a complete working 
implementation, and eventually rewrite most of it to use Google's protobuf API 
as another attempt.

 Add Group By functionality using Coprocessors
 -

 Key: HBASE-4435
 URL: https://issues.apache.org/jira/browse/HBASE-4435
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Reporter: Nichole Treadway
Priority: Minor
  Labels: by, coprocessors, group, hbase
 Attachments: HBase-4435.patch, HBASE-4435-v2.patch


 Adds in a Group By -like functionality to HBase, using the Coprocessor 
 framework. 
 It provides the ability to group the result set on one or more columns 
 (groupBy families). It computes statistics (max, min, sum, count, sum of 
 squares, number missing) for a second column, called the stats column. 
 To use, I've provided two implementations.
 1. In the first, you specify a single group-by column and a stats field:
   statsMap = gbc.getStats(tableName, scan, groupByFamily, 
 groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The result is a map with the Group By column value (as a String) to a 
 GroupByStatsValues object. The GroupByStatsValues object has max,min,sum etc. 
 of the stats column for that group.
 2. The second implementation allows you to specify a list of group-by columns 
 and a stats field. The List of group-by columns is expected to contain lists 
 of {column family, qualifier} pairs. 
   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, 
 statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The GroupByStatsValues code is adapted from the Solr Stats component.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors

2012-12-29 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13540856#comment-13540856
 ] 

Anoop Sam John commented on HBASE-4435:
---

May be good to add to the CP examples section. Someone working with this?

 Add Group By functionality using Coprocessors
 -

 Key: HBASE-4435
 URL: https://issues.apache.org/jira/browse/HBASE-4435
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Reporter: Nichole Treadway
Priority: Minor
  Labels: by, coprocessors, group, hbase
 Attachments: HBase-4435.patch, HBASE-4435-v2.patch


 Adds in a Group By -like functionality to HBase, using the Coprocessor 
 framework. 
 It provides the ability to group the result set on one or more columns 
 (groupBy families). It computes statistics (max, min, sum, count, sum of 
 squares, number missing) for a second column, called the stats column. 
 To use, I've provided two implementations.
 1. In the first, you specify a single group-by column and a stats field:
   statsMap = gbc.getStats(tableName, scan, groupByFamily, 
 groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The result is a map with the Group By column value (as a String) to a 
 GroupByStatsValues object. The GroupByStatsValues object has max,min,sum etc. 
 of the stats column for that group.
 2. The second implementation allows you to specify a list of group-by columns 
 and a stats field. The List of group-by columns is expected to contain lists 
 of {column family, qualifier} pairs. 
   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, 
 statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The GroupByStatsValues code is adapted from the Solr Stats component.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors

2012-10-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13478311#comment-13478311
 ] 

Ted Yu commented on HBASE-4435:
---

Thanks for the patch.
Can you provide trunk patch following the example of:
HBASE-6785 'Convert AggregateProtocol to protobuf defined coprocessor service'

Will provide comments soon.

For patch of this size, review board (https://reviews.apache.org) would help 
reviewers.

 Add Group By functionality using Coprocessors
 -

 Key: HBASE-4435
 URL: https://issues.apache.org/jira/browse/HBASE-4435
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Reporter: Nichole Treadway
Priority: Minor
  Labels: by, coprocessors, group, hbase
 Attachments: HBase-4435.patch, HBASE-4435-v2.patch


 Adds in a Group By -like functionality to HBase, using the Coprocessor 
 framework. 
 It provides the ability to group the result set on one or more columns 
 (groupBy families). It computes statistics (max, min, sum, count, sum of 
 squares, number missing) for a second column, called the stats column. 
 To use, I've provided two implementations.
 1. In the first, you specify a single group-by column and a stats field:
   statsMap = gbc.getStats(tableName, scan, groupByFamily, 
 groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The result is a map with the Group By column value (as a String) to a 
 GroupByStatsValues object. The GroupByStatsValues object has max,min,sum etc. 
 of the stats column for that group.
 2. The second implementation allows you to specify a list of group-by columns 
 and a stats field. The List of group-by columns is expected to contain lists 
 of {column family, qualifier} pairs. 
   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, 
 statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The GroupByStatsValues code is adapted from the Solr Stats component.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors

2012-10-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13478351#comment-13478351
 ] 

Ted Yu commented on HBASE-4435:
---

I didn't find any test in the patch. It would be difficult for a feature to be 
accepted without new tests.
Should GroupByStatsValues be named GroupByStats (since stats imply some values) 
?

{code}
+ * Copyright 2012 The Apache Software Foundation
{code}
The above line is no longer needed in license header.

BigDecimalColumnInterpreter is covered in HBASE-6669. To make the workload 
reasonable for this JIRA, you can exclude it from patch.
{code}
+public class CharacterColumnInterpreter implements 
ColumnInterpreterCharacter, Character {
{code}
Add annotation for audience and stability for public classes.

In GroupByClient.java, the following import can be removed:
{code}
+import com.sun.istack.logging.Logger;
{code}
{code}
+MapText, GroupByStatsValuesT, S getStats(
+  final byte[] tableName, final Scan scan, 
+  final Listbyte [][] groupByTuples, final byte[][] statsTuple, 
{code}
The @param for the above method doesn't match actual parameters - probably you 
changed API in later iteration.
{code}
+class RowNumCallback implements
{code}
The above class can be made private.
I think we should find a better name for the above class - it does aggregation.
{code}
+long bt = System.currentTimeMillis();
{code}
Please use EnvironmentEdge instead.
{code}
+table.close();
{code}
Please enclose the above in finally clause.

 Add Group By functionality using Coprocessors
 -

 Key: HBASE-4435
 URL: https://issues.apache.org/jira/browse/HBASE-4435
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Reporter: Nichole Treadway
Priority: Minor
  Labels: by, coprocessors, group, hbase
 Attachments: HBase-4435.patch, HBASE-4435-v2.patch


 Adds in a Group By -like functionality to HBase, using the Coprocessor 
 framework. 
 It provides the ability to group the result set on one or more columns 
 (groupBy families). It computes statistics (max, min, sum, count, sum of 
 squares, number missing) for a second column, called the stats column. 
 To use, I've provided two implementations.
 1. In the first, you specify a single group-by column and a stats field:
   statsMap = gbc.getStats(tableName, scan, groupByFamily, 
 groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The result is a map with the Group By column value (as a String) to a 
 GroupByStatsValues object. The GroupByStatsValues object has max,min,sum etc. 
 of the stats column for that group.
 2. The second implementation allows you to specify a list of group-by columns 
 and a stats field. The List of group-by columns is expected to contain lists 
 of {column family, qualifier} pairs. 
   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, 
 statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The GroupByStatsValues code is adapted from the Solr Stats component.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors

2012-10-17 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13478408#comment-13478408
 ] 

Aaron Tokhy commented on HBASE-4435:


Thanks for the quick review, I'll soon update JIRA with a new patch, based off 
of SVN trunk, though not at the moment.  Also I'll have to clean up some of the 
code, thanks for the quick feedback!

I may also change a few other things, such as using HashedBytes instead of Text 
to be able to perform roll-ups of types other than UTF-8 strings.

 Add Group By functionality using Coprocessors
 -

 Key: HBASE-4435
 URL: https://issues.apache.org/jira/browse/HBASE-4435
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors
Reporter: Nichole Treadway
Priority: Minor
  Labels: by, coprocessors, group, hbase
 Attachments: HBase-4435.patch, HBASE-4435-v2.patch


 Adds in a Group By -like functionality to HBase, using the Coprocessor 
 framework. 
 It provides the ability to group the result set on one or more columns 
 (groupBy families). It computes statistics (max, min, sum, count, sum of 
 squares, number missing) for a second column, called the stats column. 
 To use, I've provided two implementations.
 1. In the first, you specify a single group-by column and a stats field:
   statsMap = gbc.getStats(tableName, scan, groupByFamily, 
 groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The result is a map with the Group By column value (as a String) to a 
 GroupByStatsValues object. The GroupByStatsValues object has max,min,sum etc. 
 of the stats column for that group.
 2. The second implementation allows you to specify a list of group-by columns 
 and a stats field. The List of group-by columns is expected to contain lists 
 of {column family, qualifier} pairs. 
   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, 
 statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The GroupByStatsValues code is adapted from the Solr Stats component.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors

2012-04-17 Thread lifeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255546#comment-13255546
 ] 

lifeng commented on HBASE-4435:
---

when can this patch be put into hbase?

 Add Group By functionality using Coprocessors
 -

 Key: HBASE-4435
 URL: https://issues.apache.org/jira/browse/HBASE-4435
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Nichole Treadway
Priority: Minor
 Attachments: HBase-4435.patch


 Adds in a Group By -like functionality to HBase, using the Coprocessor 
 framework. 
 It provides the ability to group the result set on one or more columns 
 (groupBy families). It computes statistics (max, min, sum, count, sum of 
 squares, number missing) for a second column, called the stats column. 
 To use, I've provided two implementations.
 1. In the first, you specify a single group-by column and a stats field:
   statsMap = gbc.getStats(tableName, scan, groupByFamily, 
 groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The result is a map with the Group By column value (as a String) to a 
 GroupByStatsValues object. The GroupByStatsValues object has max,min,sum etc. 
 of the stats column for that group.
 2. The second implementation allows you to specify a list of group-by columns 
 and a stats field. The List of group-by columns is expected to contain lists 
 of {column family, qualifier} pairs. 
   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, 
 statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The GroupByStatsValues code is adapted from the Solr Stats component.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors

2012-04-17 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1327#comment-1327
 ] 

Zhihong Yu commented on HBASE-4435:
---

@Nicole:
The attached patch is half year old. Do you have a newer version ?

 Add Group By functionality using Coprocessors
 -

 Key: HBASE-4435
 URL: https://issues.apache.org/jira/browse/HBASE-4435
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Nichole Treadway
Priority: Minor
 Attachments: HBase-4435.patch


 Adds in a Group By -like functionality to HBase, using the Coprocessor 
 framework. 
 It provides the ability to group the result set on one or more columns 
 (groupBy families). It computes statistics (max, min, sum, count, sum of 
 squares, number missing) for a second column, called the stats column. 
 To use, I've provided two implementations.
 1. In the first, you specify a single group-by column and a stats field:
   statsMap = gbc.getStats(tableName, scan, groupByFamily, 
 groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The result is a map with the Group By column value (as a String) to a 
 GroupByStatsValues object. The GroupByStatsValues object has max,min,sum etc. 
 of the stats column for that group.
 2. The second implementation allows you to specify a list of group-by columns 
 and a stats field. The List of group-by columns is expected to contain lists 
 of {column family, qualifier} pairs. 
   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, 
 statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The GroupByStatsValues code is adapted from the Solr Stats component.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors

2012-03-09 Thread lifeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225933#comment-13225933
 ] 

lifeng commented on HBASE-4435:
---

6 millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel

 Add Group By functionality using Coprocessors
 -

 Key: HBASE-4435
 URL: https://issues.apache.org/jira/browse/HBASE-4435
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Nichole Treadway
Priority: Minor
 Attachments: HBase-4435.patch


 Adds in a Group By -like functionality to HBase, using the Coprocessor 
 framework. 
 It provides the ability to group the result set on one or more columns 
 (groupBy families). It computes statistics (max, min, sum, count, sum of 
 squares, number missing) for a second column, called the stats column. 
 To use, I've provided two implementations.
 1. In the first, you specify a single group-by column and a stats field:
   statsMap = gbc.getStats(tableName, scan, groupByFamily, 
 groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The result is a map with the Group By column value (as a String) to a 
 GroupByStatsValues object. The GroupByStatsValues object has max,min,sum etc. 
 of the stats column for that group.
 2. The second implementation allows you to specify a list of group-by columns 
 and a stats field. The List of group-by columns is expected to contain lists 
 of {column family, qualifier} pairs. 
   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, 
 statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The GroupByStatsValues code is adapted from the Solr Stats component.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4435) Add Group By functionality using Coprocessors

2011-12-12 Thread Jeff Hammerbacher (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167477#comment-13167477
 ] 

Jeff Hammerbacher commented on HBASE-4435:
--

How does this approach compare to HBASE-1512?

 Add Group By functionality using Coprocessors
 -

 Key: HBASE-4435
 URL: https://issues.apache.org/jira/browse/HBASE-4435
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Nichole Treadway
Priority: Minor
 Attachments: HBase-4435.patch


 Adds in a Group By -like functionality to HBase, using the Coprocessor 
 framework. 
 It provides the ability to group the result set on one or more columns 
 (groupBy families). It computes statistics (max, min, sum, count, sum of 
 squares, number missing) for a second column, called the stats column. 
 To use, I've provided two implementations.
 1. In the first, you specify a single group-by column and a stats field:
   statsMap = gbc.getStats(tableName, scan, groupByFamily, 
 groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The result is a map with the Group By column value (as a String) to a 
 GroupByStatsValues object. The GroupByStatsValues object has max,min,sum etc. 
 of the stats column for that group.
 2. The second implementation allows you to specify a list of group-by columns 
 and a stats field. The List of group-by columns is expected to contain lists 
 of {column family, qualifier} pairs. 
   statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, 
 statsFamily, statsQualifier, statsFieldColumnInterpreter);
 The GroupByStatsValues code is adapted from the Solr Stats component.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira