Thanks Eric, Jared, and Josh. From Jared's reply I realize that the setiter command obviously stays in effect beyond my shell session. I see it now with the listiter command in the shell.

Our app normally does lookups by rowkey. Will the firstEntry iterator adversely affect those queries? I assume not, but I want to double check.

Thanks again guys, this is very helpful,
Terry

On Fri, Oct 11, 2013 at 2:15 PM, Eric Newton <[email protected]> wrote:
> Actually, the egrep was used on purpose: it's the only way to get the
> shell to use the BatchScanner, which can talk to multiple tservers at
> once.
>
> -Eric
>
>
> On Fri, Oct 11, 2013 at 3:10 PM, Josh Elser <[email protected]> wrote:
> > You'll need to add the '-np' option on the scan command as well.
> >
> >
> > On 10/11/2013 03:05 PM, Jared Winick wrote:
> >>
> >> After following the commands Eric lists to set the iterator for that
> >> table, instead of running 'egrep' in the shell, you could do this
> >> from the Linux command line
> >>
> >> accumulo shell -u username -p password -e "scan -t foo" | wc -l
> >>
> >>
> >> On Fri, Oct 11, 2013 at 11:42 AM, Eric Newton <[email protected]> wrote:
> >>
> >> You can stack a counting Combiner over the FirstEntryInRowIterator and
> >> batch scan the table. If it's just a test data set with under a
> >> billion rows, you can just count the result set coming out of the
> >> FirstEntryInRowIterator. You'll be I/O bound at the client, but it
> >> will work.
> >>
> >> This does it with the shell, but the output is kinda voluminous:
> >>
> >> root@test> createtable foo
> >> root@test foo> insert row1 cf col1 value
> >> root@test foo> insert row1 cf col2 value
> >> root@test foo> insert row1 cf col999 value
> >> root@test foo> insert row2 cf col1 value
> >> root@test foo> scan
> >> row1 cf:col1 [] value
> >> row1 cf:col2 [] value
> >> row1 cf:col999 [] value
> >> row2 cf:col1 [] value
> >> root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
> >> Only allows iteration over the first entry per row
> >> ----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
> >> root@test foo> egrep .*
> >> row1 cf:col1 [] value
> >> row2 cf:col1 [] value
> >>
> >>
> >> On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <[email protected]> wrote:
> >> > Hi guys,
> >> > I'm still a bit of a newbie as I'm more of an admin than a developer, and
> >> > now that formal testing has begun, I have testers asking me how to get a
> >> > total count of records in Accumulo for verification purposes after test
> >> > ingests have been run.
> >> >
> >> > In our case when I say "records" I mean the number of distinct rowkeys, not
> >> > the total number of entries.
> >> >
> >> > Is there any way to do this using just the Accumulo shell, maybe by writing
> >> > an aggregator or other class that can be run from within the Accumulo shell?
> >> >
> >> > Many thanks in advance,
> >> > Terry
> >> >
> >> >
> >> > On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <[email protected]> wrote:
> >> >>
> >> >> Greetings everyone,
> >> >> I want to simply get the total count of rows in a table using the accumulo
> >> >> shell. I'm very new to Accumulo so I apologize if it's a newbie question.
> >> >>
> >> >> I'm prototyping with the accumulo shell, and love how it can ingest
> >> >> records using exefile, so I've used python to generate a lot of test data.
> >> >> For some test cases in this sprint I need to verify the rows loaded match
> >> >> what's expected, hence the reason I need to get the total rows in a table.
> >> >>
> >> >> I'd bet there is some way to use setiter or setscaniter with the -agg
> >> >> option, but I can't figure it out.
> >> >>
> >> >> Any help would be greatly appreciated.
> >> >>
> >> >> Best regards,
> >> >> Terry
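
For anyone who wants to sanity-check the count on the client side, here is a rough Python sketch of the same idea: keep only the first entry seen for each row, then count what is left. It is not Accumulo code and does not call any Accumulo API. The function names (parse_scan_line, first_entry_per_row, count_rows) are made up for illustration, and it assumes the shell output looks like the scan lines quoted in this thread (row, family:qualifier, visibility in brackets, value, separated by single spaces), that row keys contain no spaces, and that entries for the same row arrive next to each other, as they do in a plain scan.

```python
from itertools import groupby


def parse_scan_line(line):
    """Split one shell scan line such as 'row1 cf:col1 [] value'.

    Assumption: row, column, and visibility contain no spaces; the value
    is everything after the third space.
    """
    row, column, _visibility, value = line.rstrip("\n").split(" ", 3)
    return row, column, value


def first_entry_per_row(entries):
    """Yield the first entry of each run of consecutive equal row keys.

    This mimics, on already-fetched data, what the thread uses
    FirstEntryInRowIterator for on the server side.
    """
    for _row, group in groupby(entries, key=lambda entry: entry[0]):
        yield next(group)


def count_rows(lines):
    """Count distinct row keys in an iterable of scan output lines."""
    entries = (parse_scan_line(line) for line in lines if line.strip())
    return sum(1 for _ in first_entry_per_row(entries))


# Sample data copied from the scan output quoted in this thread.
SAMPLE = [
    "row1 cf:col1 [] value",
    "row1 cf:col2 [] value",
    "row1 cf:col999 [] value",
    "row2 cf:col1 [] value",
]
```

Calling count_rows(SAMPLE) on the four entries from the example table gives the two distinct rows. With the iterator already set on the table, each row shows up once in the output anyway, so a plain line count does the same job; a helper like this is only useful when the scan returns every entry.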
