[jira] [Updated] (ACCUMULO-4730) Create an Entry length summarizer

2017-11-08 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ACCUMULO-4730:
-------------------------------------
Labels: newbie pull-request-available  (was: newbie)

> Create an Entry length summarizer
> ---------------------------------
>
>                 Key: ACCUMULO-4730
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4730
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>            Assignee: Jared R
>              Labels: newbie, pull-request-available
>             Fix For: 2.0.0
>
>
> It would be very useful to have a built-in
> [Summarizer|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/summary/Summarizer.java]
> that computes summary information about field lengths, specifically key
> length, row length, family length, qualifier length, visibility length, and
> value length.  Whatever stats are computed must be computable
> incrementally.  For example, min, max, count, sum, and a log2 histogram can
> all be computed incrementally.  I think these would be good stats to start
> with.  Count and sum can be used to compute the average.  There is an example
> of computing a log2 histogram in the Summarizer javadoc.
> The Summarizer could be named EntryLengthSummarizer and could produce
> summaries like the following:
> {noformat}
> count=XXX //no need to track this per field, it's the same for all fields
> key.min=XXX
> key.max=XXX
> key.sum=XXX
> key.logHist.8=XXX   //only output exponents with nonzero counts
> key.logHist.9=XXX
> row.min=XXX
> row.max=XXX
> row.sum=XXX
> row.logHist.7=XXX
> row.logHist.8=XXX
> row.logHist.10=XXX
> family.min=XXX
> family.max=XXX
> family.sum=XXX
> family.logHist.6=XXX
> family.logHist.7=XXX
> etc...
> {noformat}
> This new summarizer would be placed in the 
> [summarizers|https://github.com/apache/accumulo/tree/master/core/src/main/java/org/apache/accumulo/core/client/summary/summarizers]
>  package.
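
The incremental statistics described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the eventual Accumulo implementation: the class name LengthStats and all of its methods are hypothetical, it does not use the Summarizer interface, and it buckets lengths with floor(log2(length)), which may differ from the rounding used in the Summarizer javadoc example. Each statistic can be updated one entry at a time, and two partial results can be merged without revisiting the data.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Accumulo): incremental length stats for one field. */
final class LengthStats {
  private final String field; // e.g. "key", "row", "family"
  private long min = Long.MAX_VALUE;
  private long max = 0;
  private long sum = 0;
  // The proposal tracks count once for all fields; it is kept here only so
  // this sketch is self-contained.
  private long count = 0;
  private final long[] logHist = new long[32]; // logHist[i] counts lengths in bucket i

  LengthStats(String field) {
    this.field = field;
  }

  /** Assumption: bucket = floor(log2(length)); lengths 0 and 1 both map to bucket 0. */
  static int bucket(int length) {
    return length <= 1 ? 0 : 31 - Integer.numberOfLeadingZeros(length);
  }

  /** Folds one observed length into the running statistics. */
  void accept(int length) {
    min = Math.min(min, length);
    max = Math.max(max, length);
    sum += length;
    count++;
    logHist[bucket(length)]++;
  }

  /** Combines another partial result into this one; no raw data is needed. */
  void merge(LengthStats other) {
    min = Math.min(min, other.min);
    max = Math.max(max, other.max);
    sum += other.sum;
    count += other.count;
    for (int i = 0; i < logHist.length; i++) {
      logHist[i] += other.logHist[i];
    }
  }

  /** The average is derived from count and sum rather than stored. */
  double average() {
    return count == 0 ? 0.0 : (double) sum / count;
  }

  /** Emits entries such as row.min or row.logHist.8, skipping empty buckets. */
  Map<String, Long> toSummary() {
    Map<String, Long> out = new TreeMap<>();
    if (count == 0) {
      return out; // nothing observed, so there is nothing to report
    }
    out.put(field + ".min", min);
    out.put(field + ".max", max);
    out.put(field + ".sum", sum);
    for (int i = 0; i < logHist.length; i++) {
      if (logHist[i] != 0) {
        out.put(field + ".logHist." + i, logHist[i]);
      }
    }
    return out;
  }
}
```

In this sketch, one LengthStats instance per field (key, row, family, and so on) would be fed the corresponding length of every entry, and instances built over different files would be combined with merge.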



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ACCUMULO-4730) Create an Entry length summarizer

2017-10-24 Thread Keith Turner (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Turner updated ACCUMULO-4730:
-----------------------------------
Labels: newbie  (was: )

> Create an Entry length summarizer
> ---------------------------------
>
>                 Key: ACCUMULO-4730
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4730
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>              Labels: newbie
>             Fix For: 2.0.0


