I am trying to print the histogram with PrintInfo but get the usage
message instead. The --dump option works fine. I'm on Accumulo 1.5.0.
PACKAGE=org.apache.accumulo.core.file.rfile
bin/accumulo $PACKAGE.PrintInfo --histogram /accumulo/tables/53/t-0003371/A0003jbg.rf
Usage: org.apache.accumulo.core.file.rfile.PrintInfo [options] <file> { <file> ... }
  Options:
    -d, --dump
       dump the key/value pairs
       Default: false
    -h, -?, --help, -help
       Default: false
    --historgram
       print a histogram of the key-value sizes
       Default: false
Unknown option: --histogram
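
Going by the usage text above, this build seems to register the flag as
--historgram rather than --histogram. A minimal sketch of the same
invocation with that spelling follows. It assumes the command is run from
the Accumulo install directory as before; the CMD and RFILE variable names
are only for illustration, and whether the option then works is not
confirmed here.

```shell
# Flag spelling copied from the usage output above ("--historgram");
# treat it as an assumption about this 1.5.0 build, not a documented option.
PACKAGE=org.apache.accumulo.core.file.rfile
RFILE=/accumulo/tables/53/t-0003371/A0003jbg.rf

# Build the command once so it can be inspected before running.
CMD="bin/accumulo $PACKAGE.PrintInfo --historgram $RFILE"
echo "$CMD"

# Only run it when the launcher script is actually present.
if [ -x bin/accumulo ]; then
  $CMD
fi
```
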
On Sat, Feb 22, 2014 at 8:47 AM, Mike Drob <[email protected]> wrote:
> There's not a single good way that I am aware of, but there are a couple
> ways that will get you close.
>
> First, you can use the SortedKeyIterator to truncate values and
> potentially save yourself a lot of data transfer.
> Second, each RFile header block will track the columns contained, up to
> 1000 (possibly configurable). Check out PrintInfo[1].
>
> Mike
>
> [1]:
> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java
>
>
> On Sat, Feb 22, 2014 at 11:25 AM, Arshak Navruzyan <[email protected]> wrote:
>
>> I don't know the inner workings of the Rfiles enough but I was wondering
>> if there is a faster way to get a unique list of columns in Accumulo (short
>> of doing a full mapreduce). Is there some way to skip ahead all the
>> volumes and just get to the next column?
>>
>> Thanks
>>
>
>