Actually, the egrep was used on purpose: it's the only way to get the shell to use the BatchScanner, which can talk to multiple tservers at once.
-Eric

On Fri, Oct 11, 2013 at 3:10 PM, Josh Elser <[email protected]> wrote:
> You'll need to add the '-np' option on the scan command as well.
>
>
> On 10/11/2013 03:05 PM, Jared Winick wrote:
>>
>> After following the commands Eric lists to set the iterator for that
>> table, instead of running 'egrep' in the shell, you could do this from the
>> Linux command line
>>
>> accumulo shell -u username -p password -e "scan -t foo" | wc -l
>>
>>
>> On Fri, Oct 11, 2013 at 11:42 AM, Eric Newton <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> You can stack a counting Combiner over the FirstEntryInRowIterator and
>> batch scan the table. If it's just a test data set with under a
>> billion rows, you can just count the result set coming out of the
>> FirstEntryInRowIterator. You'll be I/O bound at the client, but it
>> will work.
>>
>> This does it with the shell, but the output is kinda voluminous:
>>
>> root@test> createtable foo
>> root@test foo> insert row1 cf col1 value
>> root@test foo> insert row1 cf col2 value
>> root@test foo> insert row1 cf col999 value
>> root@test foo> insert row2 cf col1 value
>> root@test foo> scan
>> row1 cf:col1 [] value
>> row1 cf:col2 [] value
>> row1 cf:col999 [] value
>> row2 cf:col1 [] value
>> root@test foo> setiter -class
>> org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
>> Only allows iteration over the first entry per row
>> ----------> set FirstEntryInRowIterator parameter scansBeforeSeek,
>> Number of scans to try before seeking [10]: 10
>> root@test foo> egrep .*
>> row1 cf:col1 [] value
>> row2 cf:col1 [] value
>>
>>
>> On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <[email protected]
>> <mailto:[email protected]>> wrote:
>> > Hi guys,
>> > I'm still a bit of a newbie as I'm more of an admin than a
>> developer, and
>> > now that formal testing has begun, I have testers asking me how
>> to get a
>> > total count of records in Accumulo for verification purposes
>> after test
>> > ingests have been run.
>> >
>> > In our case when I say "records" I mean the number of distinct
>> rowkeys, not
>> > the total number of entries.
>> >
>> > Is there any way to do this using just the Accumulo shell, maybe
>> by writing
>> > an aggregator or other class that can be run from within the
>> Accumulo shell?
>> >
>> > Many thanks in advance,
>> > Terry
>> >
>> >
>> > On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <[email protected]
>> <mailto:[email protected]>> wrote:
>> >>
>> >> Greetings everyone,
>> >> I want to simply get the total count of rows in a table using
>> the accumulo
>> >> shell. I'm very new to Accumulo so I apologize if it's a
>> newbie question.
>> >>
>> >> I'm prototyping with the accumulo shell, and love how it can ingest
>> >> records using exefile, so I've used python to generate a lot of
>> test data.
>> >> For some test cases in this sprint I need to verify the rows
>> loaded match
>> >> what's expected, hence the reason I need to get the total rows
>> in a table.
>> >>
>> >> I'd bet there is some way to use setiter or setscaniter with
>> the -agg
>> >> option, but I can't figure it out.
>> >>
>> >> Any help would be greatly appreciated.
>> >>
>> >> Best regards,
>> >> Terry
>> >
>> >
>>
>>
>
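For anyone who would rather count distinct row IDs on the client than rely on 'wc -l', here is a minimal sketch in Python. It is not from the thread: the function name count_distinct_rows is made up for illustration, and it assumes the default shell output format shown above (row, cf:cq, [visibility], value) with row IDs that contain no whitespace. It counts distinct rows whether or not the FirstEntryInRowIterator is set, and it skips prompts and blank lines.

```python
from typing import Iterable


def count_distinct_rows(lines: Iterable[str]) -> int:
    """Count distinct row IDs in Accumulo shell scan output (hypothetical helper)."""
    rows = set()
    for line in lines:
        parts = line.split()
        # A scan entry is assumed to look like: row cf:cq [visibility] value
        # Skip blank lines, prompts, and anything without a cf:cq second field.
        if len(parts) < 3 or ":" not in parts[1]:
            continue
        rows.add(parts[0])
    return len(rows)


# Sample taken from the scan output quoted in the thread.
sample = """root@test foo> scan
row1 cf:col1 [] value
row1 cf:col2 [] value
row1 cf:col999 [] value
row2 cf:col1 [] value
"""

print(count_distinct_rows(sample.splitlines()))  # prints 2
```

To use it against a real table, pass sys.stdin instead of the sample lines and pipe in the output of: accumulo shell -u username -p password -e "scan -t foo -np". Note that this goes through the regular scan command, so it does not get the BatchScanner behaviour that egrep provides, and the set of row IDs is held in client memory.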
