Ya, you'll want to remove the iterator after you do the count. You might be able to use it as a scan-only iterator, but I was just being lazy.
-Eric

On Fri, Oct 11, 2013 at 3:18 PM, Terry P. <[email protected]> wrote:
> Thanks Eric, Jared, and Josh.
>
> From Jared's reply I realize that the setiter command obviously stays in
> effect beyond my shell session. I see it now with the listiter command in
> the shell.
>
> Our app normally does lookups by rowkey. Will the firstEntry iterator
> adversely affect those queries? I assume not, but I want to double check.
>
> Thanks again guys, this is very helpful,
> Terry
>
>
> On Fri, Oct 11, 2013 at 2:15 PM, Eric Newton <[email protected]> wrote:
>>
>> Actually, the egrep was used on purpose: it's the only way to get the
>> shell to use the BatchScanner, which can talk to multiple tservers at
>> once.
>>
>> -Eric
>>
>>
>> On Fri, Oct 11, 2013 at 3:10 PM, Josh Elser <[email protected]> wrote:
>> > You'll need to add the '-np' option on the scan command as well.
>> >
>> >
>> > On 10/11/2013 03:05 PM, Jared Winick wrote:
>> >>
>> >> After following the commands Eric lists to set the iterator for that
>> >> table, instead of running 'egrep' in the shell, you could do this
>> >> from the Linux command line:
>> >>
>> >> accumulo shell -u username -p password -e "scan -t foo" | wc -l
>> >>
>> >>
>> >> On Fri, Oct 11, 2013 at 11:42 AM, Eric Newton <[email protected]> wrote:
>> >>
>> >> You can stack a counting Combiner over the FirstEntryInRowIterator
>> >> and batch scan the table. If it's just a test data set with under a
>> >> billion rows, you can just count the result set coming out of the
>> >> FirstEntryInRowIterator. You'll be I/O bound at the client, but it
>> >> will work.
>> >>
>> >> This does it with the shell, but the output is kinda voluminous:
>> >>
>> >> root@test> createtable foo
>> >> root@test foo> insert row1 cf col1 value
>> >> root@test foo> insert row1 cf col2 value
>> >> root@test foo> insert row1 cf col999 value
>> >> root@test foo> insert row2 cf col1 value
>> >> root@test foo> scan
>> >> row1 cf:col1 [] value
>> >> row1 cf:col2 [] value
>> >> row1 cf:col999 [] value
>> >> row2 cf:col1 [] value
>> >> root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
>> >> Only allows iteration over the first entry per row
>> >> ----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
>> >> root@test foo> egrep .*
>> >> row1 cf:col1 [] value
>> >> row2 cf:col1 [] value
>> >>
>> >>
>> >> On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <[email protected]> wrote:
>> >> > Hi guys,
>> >> > I'm still a bit of a newbie as I'm more of an admin than a
>> >> > developer, and now that formal testing has begun, I have testers
>> >> > asking me how to get a total count of records in Accumulo for
>> >> > verification purposes after test ingests have been run.
>> >> >
>> >> > In our case when I say "records" I mean the number of distinct
>> >> > rowkeys, not the total number of entries.
>> >> >
>> >> > Is there any way to do this using just the Accumulo shell, maybe
>> >> > by writing an aggregator or other class that can be run from
>> >> > within the Accumulo shell?
>> >> >
>> >> > Many thanks in advance,
>> >> > Terry
>> >> >
>> >> >
>> >> > On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <[email protected]> wrote:
>> >> >>
>> >> >> Greetings everyone,
>> >> >> I want to simply get the total count of rows in a table using
>> >> >> the accumulo shell. I'm very new to Accumulo so I apologize if
>> >> >> it's a newbie question.
>> >> >>
>> >> >> I'm prototyping with the accumulo shell, and love how it can
>> >> >> ingest records using exefile, so I've used python to generate a
>> >> >> lot of test data. For some test cases in this sprint I need to
>> >> >> verify the rows loaded match what's expected, hence the reason I
>> >> >> need to get the total rows in a table.
>> >> >>
>> >> >> I'd bet there is some way to use setiter or setscaniter with
>> >> >> the -agg option, but I can't figure it out.
>> >> >>
>> >> >> Any help would be greatly appreciated.
>> >> >>
>> >> >> Best regards,
>> >> >> Terry
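
If you would rather not leave an iterator configured on the table at all, one alternative is to count distinct row IDs on the client from plain scan output. Below is a minimal sketch, not something taken from the thread. It assumes that row IDs contain no whitespace (so the row ID is the first token on each line), that the output arrives sorted by row as an ordinary scan returns it, and that any banner or log lines have already been filtered out. The function name count_distinct_rows is made up for illustration.

```python
def count_distinct_rows(lines):
    """Count distinct row IDs in shell scan output.

    Assumes each line looks like 'row cf:cq [vis] value', that the row ID
    contains no whitespace, and that lines are sorted by row ID.
    """
    count = 0
    previous = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = fields[0]
        if row != previous:
            # First entry seen for a new row ID.
            count += 1
            previous = row
    return count


# Sample data copied from the scan output earlier in the thread.
sample = [
    "row1 cf:col1 [] value",
    "row1 cf:col2 [] value",
    "row1 cf:col999 [] value",
    "row2 cf:col1 [] value",
]
print(count_distinct_rows(sample))  # prints 2
```

To run it against a real table, pass sys.stdin instead of sample and pipe the output of the shell's "scan -t foo -np" into the script. Every entry still has to travel to the client, so on large tables this will likely be slower than the FirstEntryInRowIterator approach, which returns only one entry per row.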
