Thanks, tsuna! I have multiple scanners operating on the same table, and
rows are being inserted all the time (so lock contention is possible).

Also, this is still in the development phase, and I'm running in
pseudo-distributed mode with plenty of RAM (16 GB).

> The current RPC protocol forces you to make this call.  You can't seek
> back with a scanner.  When you move forward, the only way to go back
> is to close the scanner and open a new one again.

Stack, I want to reset the scanner to its initial position after a full
scan, to avoid closing it and opening a new one. Yes, I have seen that link;
thanks for pointing it out again.

In a nutshell, I want to:
while (1)
{
    scan_all_rows_in_tableT_colFamA
    {
        do_something_on_each_row_to_find_a_tagX;

        // right now I open a new scanner each time for the loop below:
        // getScanner(tableT, colFamB)

        scan_all_rows_in_tableT_which_has_colFamB:tagX
        {
            // do_something
        }

        // close the scanner
    }
}
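A minimal sketch of one alternative, assuming the inner scan only needs the rows whose colFamB contains the qualifier tagX: read colFamB once per outer pass, group row keys by qualifier, and replace the per-row inner scanner with a map lookup (a client-side hash join). The table here is an in-memory TreeMap stand-in, not the HBase client API, and `Row`, `tagFromColFamA`, `groupByColFamBTag` and `SinglePassJoin` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for one row of tableT (NOT the HBase client API).
final class Row {
    final String tagFromColFamA;         // what do_something_on_each_row_to_find_a_tagX yields
    final Set<String> colFamBQualifiers; // qualifiers present in colFamB for this row

    Row(String tagFromColFamA, Set<String> colFamBQualifiers) {
        this.tagFromColFamA = tagFromColFamA;
        this.colFamBQualifiers = colFamBQualifiers;
    }
}

final class SinglePassJoin {
    // One pass over colFamB: qualifier (tag) -> row keys that have colFamB:tag.
    static Map<String, List<String>> groupByColFamBTag(SortedMap<String, Row> table) {
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, Row> e : table.entrySet()) {
            for (String qualifier : e.getValue().colFamBQualifiers) {
                index.computeIfAbsent(qualifier, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return index;
    }

    // Outer loop over the rows: look up matches instead of opening a new scanner.
    // Returns outer row key -> matching row keys so the result can be inspected.
    static Map<String, List<String>> run(SortedMap<String, Row> table) {
        Map<String, List<String>> index = groupByColFamBTag(table);
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Row> e : table.entrySet()) {
            String tagX = e.getValue().tagFromColFamA;
            List<String> matches = index.getOrDefault(tagX, Collections.emptyList());
            result.put(e.getKey(), matches); // do_something on each match would go here
        }
        return result;
    }

    public static void main(String[] args) {
        SortedMap<String, Row> table = new TreeMap<>();
        table.put("r1", new Row("t1", Set.of("t2")));
        table.put("r2", new Row("t2", Set.of("t1")));
        table.put("r3", new Row("t1", Set.of("t1", "t2")));
        System.out.println(run(table));
    }
}
```

Each iteration of the outer while (1) loop would rebuild the index with one scan, so rows inserted in the meantime are picked up on the next pass rather than mid-pass; whether that staleness is acceptable depends on the application.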

Since this logic operates only on a single table (tableT), I want to try it
without MapReduce jobs.
But opening and closing the scanner inside this nested loop makes it take
multiple seconds to complete on just 3000 rows :(
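A back-of-envelope count of what the nested loop costs, assuming one inner scanner is opened per outer row, each inner scan walks all of tableT, and each openScanner call costs about the ~2 ms tsuna reports from StumbleUpon's production cluster. These are assumptions for illustration, not measurements from this setup; `NestedScanCost` and its methods are hypothetical names.

```java
// Back-of-envelope cost of the nested scan loop.
// Assumptions (illustrative, not measured): one inner scanner per outer row,
// every inner scan walks all rows of tableT, ~2 ms per openScanner RPC.
final class NestedScanCost {
    // Inner scanners opened: one per row returned by the outer scan.
    static long innerScannersOpened(long outerRows) {
        return outerRows;
    }

    // Rows walked: the outer scan once, plus a full inner scan per outer row.
    static long rowsWalked(long outerRows, long tableRows) {
        return outerRows + outerRows * tableRows;
    }

    // Time spent only in openScanner calls, in seconds.
    static double openOverheadSeconds(long outerRows, double msPerOpen) {
        return innerScannersOpened(outerRows) * msPerOpen / 1000.0;
    }

    public static void main(String[] args) {
        long n = 3000; // rows in tableT, as stated in the message
        System.out.println(innerScannersOpened(n) + " inner scanners opened");
        System.out.println(rowsWalked(n, n) + " rows walked");
        System.out.println(openOverheadSeconds(n, 2.0) + " s spent opening scanners");
    }
}
```

Under these assumptions, 3000 scanner opens alone come to about 6 seconds, before counting the roughly nine million rows the inner scans walk, which would suggest the cost comes from how many times a scanner is opened rather than from any single open being slow.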

-Thanks,
Dani.

On Mon, Jan 24, 2011 at 11:30 PM, tsuna <[email protected]> wrote:

> On Mon, Jan 24, 2011 at 7:26 PM, Dani Rayan <[email protected]> wrote:
> > ResultScanner refscanner = table.getScanner(Bytes.toBytes("ColA")); // Looks expensive.
>
> > The getScanner operation looks expensive. Am I m(i,e)ssing something?
>
> This shouldn't be expensive.  What happens under the hood is that the
> client makes an "openScanner" RPC call to the RegionServer, to which
> the RS responds with a scanner ID.  The state of the scanner is stored
> in the RS.
>
> The current RPC protocol forces you to make this call.  You can't seek
> back with a scanner.  When you move forward, the only way to go back
> is to close the scanner and open a new one again.
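To make the description above concrete, here is a toy, single-process model of the protocol: the server keeps each scanner's position keyed by an ID, `next` only moves forward, and the only way back to the start is to close and reopen. `ToyRegionServer` and its methods are hypothetical and are not HBase's internal classes; the real client talks to a RegionServer over RPC and can fetch several rows per `next` call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: server-side scanner state keyed by ID, forward-only iteration.
// Hypothetical names; this only mirrors the open/next/close shape of the protocol.
final class ToyRegionServer {
    private final List<String> rows;                              // rows in key order
    private final Map<Long, Integer> positions = new HashMap<>(); // scanner ID -> next row index
    private long nextId = 1;

    ToyRegionServer(List<String> rows) {
        this.rows = rows;
    }

    // "openScanner": the server allocates state and hands back an ID.
    long openScanner() {
        long id = nextId++;
        positions.put(id, 0);
        return id;
    }

    // "next": returns the next row, or null once the scanner is exhausted.
    String next(long scannerId) {
        Integer pos = positions.get(scannerId);
        if (pos == null) throw new IllegalStateException("unknown scanner " + scannerId);
        if (pos >= rows.size()) return null;
        positions.put(scannerId, pos + 1);
        return rows.get(pos);
    }

    // "close": frees the server-side state for this scanner.
    void close(long scannerId) {
        positions.remove(scannerId);
    }

    int openScanners() {
        return positions.size();
    }

    // There is deliberately no seek/rewind call: going back means close + open.
    public static void main(String[] args) {
        ToyRegionServer rs = new ToyRegionServer(List.of("r1", "r2", "r3"));
        long id = rs.openScanner();
        for (String r = rs.next(id); r != null; r = rs.next(id)) System.out.println(r);
        rs.close(id);                        // to start over...
        long again = rs.openScanner();       // ...open a new scanner
        System.out.println(rs.next(again));
    }
}
```

Because the position lives on the server in this model, every scanner left open keeps state there until it is closed.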
>
> Opening a scanner shouldn't take long, we're talking about
> milliseconds (I'm seeing ~2ms in one of our production clusters at
> StumbleUpon).  Are your RegionServers very busy?  Have you seen
> anything that might look like excessive GCing or lock contention?
>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com
>
