You can stack a counting Combiner on top of the FirstEntryInRowIterator and batch scan the table. If it's only a test data set with under a billion rows, you can simply count the result set coming out of the FirstEntryInRowIterator on the client. You'll be I/O bound at the client, but it will work.
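To make the client-side counting idea concrete, here is a minimal plain-Java sketch of the logic. It does not use the Accumulo client API. The Entry class and the firstEntryPerRow/countRows methods are hypothetical names used only for illustration. The sketch assumes entries arrive sorted by row, the way a plain scan returns them. Keeping the first entry of each row and then counting what survives yields the number of distinct rows.

```java
import java.util.ArrayList;
import java.util.List;

public class RowCountSketch {
    // Hypothetical stand-in for an Accumulo key/value pair (NOT org.apache.accumulo.core.data.Key).
    static final class Entry {
        final String row, family, qualifier, value;

        Entry(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Models what FirstEntryInRowIterator does server-side: keep only the
    // first entry of each row. Assumes the input is sorted by row.
    static List<Entry> firstEntryPerRow(List<Entry> sorted) {
        List<Entry> out = new ArrayList<>();
        String lastRow = null;
        for (Entry e : sorted) {
            if (!e.row.equals(lastRow)) { // a new row starts here
                out.add(e);
                lastRow = e.row;
            }
        }
        return out;
    }

    // Client-side count: one surviving entry per row, so the size is the row count.
    static long countRows(List<Entry> sorted) {
        return firstEntryPerRow(sorted).size();
    }

    public static void main(String[] args) {
        // Same data as the shell example: three entries in row1, one in row2.
        List<Entry> table = List.of(
                new Entry("row1", "cf", "col1", "value"),
                new Entry("row1", "cf", "col2", "value"),
                new Entry("row1", "cf", "col999", "value"),
                new Entry("row2", "cf", "col1", "value"));
        System.out.println(countRows(table)); // prints 2
    }
}
```

Against a real table, the filtering step runs on the tablet servers once the iterator is configured with setiter, so the client only has to count the entries the scan returns. Stacking a counting Combiner on top would push that final count to the server side as well.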
This does it with the shell, but the output is kinda voluminous:

root@test> createtable foo
root@test foo> insert row1 cf col1 value
root@test foo> insert row1 cf col2 value
root@test foo> insert row1 cf col999 value
root@test foo> insert row2 cf col1 value
root@test foo> scan
row1 cf:col1 [] value
row1 cf:col2 [] value
row1 cf:col999 [] value
row2 cf:col1 [] value
root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
Only allows iteration over the first entry per row
----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
root@test foo> egrep .*
row1 cf:col1 [] value
row2 cf:col1 [] value

On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <[email protected]> wrote:
> Hi guys,
> I'm still a bit of a newbie as I'm more of an admin than a developer, and
> now that formal testing has begun, I have testers asking me how to get a
> total count of records in Accumulo for verification purposes after test
> ingests have been run.
>
> In our case when I say "records" I mean the number of distinct rowkeys, not
> the total number of entries.
>
> Is there any way to do this using just the Accumulo shell, maybe by writing
> an aggregator or other class that can be run from within the Accumulo shell?
>
> Many thanks in advance,
> Terry
>
>
> On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <[email protected]> wrote:
>>
>> Greetings everyone,
>> I want to simply get the total count of rows in a table using the accumulo
>> shell. I'm very new to Accumulo so I apologize if it's a newbie question.
>>
>> I'm prototyping with the accumulo shell, and love how it can ingest
>> records using exefile, so I've used python to generate a lot of test data.
>> For some test cases in this sprint I need to verify the rows loaded match
>> what's expected, hence the reason I need to get the total rows in a table.
>>
>> I'd bet there is some way to use setiter or setscaniter with the -agg
>> option, but I can't figure it out.
>>
>> Any help would be greatly appreciated.
>>
>> Best regards,
>> Terry
