I don't think it would complete without an exception in that case. These scanner IDs are generated from the Random instance in HRegionServer. If the same scannerId were drawn twice, wouldn't one get a LeaseStillHeldException in the addScanner method?
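To put a number on the "astronomically unlikely" argument in the quoted reply, here is a back-of-the-envelope sketch. It assumes scanner IDs are independent, uniformly distributed 64-bit values. That is an idealization: the `java.util.Random` Javadoc notes that `nextLong` cannot return every possible `long` because the generator uses a 48-bit seed. The `register` method is a hypothetical alternative that redraws on a duplicate instead of failing. It is not taken from `HRegionServer`, and `ScannerIdSketch` is an illustrative name.

```java
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: the estimate and the registry below are illustrative and are NOT
// HBase's HRegionServer code; ScannerIdSketch and register() are hypothetical names.
class ScannerIdSketch {

    /** Birthday bound: probability that k IDs drawn uniformly from 2^64 values are not all distinct. */
    static double collisionProbability(long k) {
        double space = Math.pow(2, 64);
        double pairs = (double) k * (k - 1) / 2.0;
        // P(collision) ~= 1 - exp(-pairs / space); expm1 keeps precision for tiny values.
        return -Math.expm1(-pairs / space);
    }

    private final ConcurrentMap<Long, Object> open = new ConcurrentHashMap<>();
    private final Random random = new Random();

    /** Hypothetical allocator: on the (very unlikely) duplicate it draws again instead of throwing. */
    long register(Object scanner) {
        while (true) {
            long id = random.nextLong();
            if (open.putIfAbsent(id, scanner) == null) {
                return id; // id was free and is now reserved for this scanner
            }
        }
    }

    void close(long id) {
        open.remove(id);
    }

    public static void main(String[] args) {
        for (long k : new long[] {2, 1_000, 1_000_000}) {
            System.out.printf("k=%d open scanners -> P(collision) ~ %.3e%n", k, collisionProbability(k));
        }
    }
}
```

Even with a million scanners open at once the estimate is on the order of 10^-8, which is consistent with the view that ID collisions are very unlikely in practice.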
Himanshu

On Sun, Oct 9, 2011 at 3:53 PM, lars hofhansl wrote:
> How frequently does this happen?
> I did notice a while ago in the code that scanner IDs are drawn just from a
> Random number generator.
>
> So in theory it would be possible for multiple concurrent scans to draw the
> same scanner ID.
>
> Since these are longs, this is astronomically unlikely, though (picking the
> same number out of 2^64 just does not happen :) ).
>
> ________________________________
> From: Bryan Keller
> Sent: Sunday, October 9, 2011 2:40 PM
> Subject: Re: Using Scans in parallel
>
> This is just scanning (reads). I'll need to do more testing to find a cause;
> hopefully it is something with my test.
>
> On Oct 9, 2011, at 1:13 PM, lars hofhansl wrote:
>
>> Which version of HBase?
>> Are there concurrent inserts? If so, do you see splits in the log files
>> happening while you do the scanning?
>>
>> I am pretty sure this has nothing to do with concurrent scans.
>>
>> From: Bryan Keller
>> To: Bryan Keller
>> Sent: Sunday, October 9, 2011 11:03 AM
>> Subject: Re: Using Scans in parallel
>>
>> On further thought, it seems this might be a serious issue, as two unrelated
>> processes within an application may be scanning the same table at the same
>> time.
>>
>> On Oct 9, 2011, at 10:59 AM, Bryan Keller wrote:
>>
>> > I was not able to get consistent results using multiple scanners in
>> > parallel on a table. I implemented a counter test that used 8 scanners in
>> > parallel on a table with 2m rows with 2k+ columns each, and the results
>> > were not consistent. There were no errors thrown, but the count was off by
>> > as much as 2%. Using a single thread gave the same (correct) result every
>> > run.
>> >
>> > I tried various approaches, such as creating an HTable and opening a
>> > connection per thread, but I was not able to get stable results.
>> > I would do some testing before using parallel scanners as described here.
>> >
>> > On Oct 5, 2011, at 10:11 PM, lars hofhansl wrote:
>> >
>> >> That's part of it; the other part is to get the region demarcations.
>> >> You can also just get the smallest and largest key of the table and pick
>> >> other demarcations for your scans. Then your individual scans will likely
>> >> cover multiple regions and regionservers.
>> >>
>> >> Your threading model depends on your needs. If you're interested in lowest
>> >> latency, you want to keep your regionservers busy for each query.
>> >> What exactly that means depends on your setup. Maybe you split up the
>> >> overall scan so that no more than N scans are active at any regionserver.
>> >>
>> >> If you're more interested in overall predictability, you might not want to
>> >> parallelize each scan too much.
>> >>
>> >> ----- Original Message -----
>> >> From: Sam Seigal
>> >> To: lars hofhansl
>> >> Sent: Wednesday, October 5, 2011 6:18 PM
>> >> Subject: Re: Using Scans in parallel
>> >>
>> >> So the whole point of getting the region locations is to ensure that
>> >> there is one thread per region server?
>> >>
>> >> On Wed, Oct 5, 2011 at 4:42 PM, lars hofhansl wrote:
>> >>> Hi Sam,
>> >>>
>> >>> There were some attempts to build this in. In the end I think the exact
>> >>> patterns are different based on what one is trying to achieve.
>> >>> Currently what you can do is get all the region locations
>> >>> (HTable.getRegionLocations). From the HRegionInfos you can
>> >>> get the regions' start and end keys.
>> >>> Now you can issue parallel scans for as many regions as you want (by
>> >>> creating a Scan object with start and stop row set to the region's
>> >>> start and end key).
>> >>> You probably want to group the regions by regionserver and have one
>> >>> thread per region server, or something.
>> >>>
>> >>> -- Lars
>> >>> ________________________________
>> >>> From: Sam Seigal
>> >>> Sent: Wednesday, October 5, 2011 4:29 PM
>> >>> Subject: Using Scans in parallel
>> >>>
>> >>> Hi,
>> >>>
>> >>> Is there a known way to do Scans in parallel (in different
>> >>> threads even) and then sort/combine the output?
>> >>>
>> >>> For a row key like:
>> >>>
>> >>> prefix-event_type-event_id
>> >>>
>> >>> I want to declare two scan objects (for, say, event_type foo)
>> >>>
>> >>> Scan 1 => 0-foo
>> >>> Scan 2 => 1-foo
>> >>>
>> >>> execute the scans in parallel (maybe even in different threads) and
>> >>> then merge the results?
>> >>>
>> >>> Thank you,
>> >>>
>> >>> Sam
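The approach discussed in this thread (group key ranges by regionserver, run one thread per server, then sort/combine the output, as in Sam's `0-foo` / `1-foo` example) can be sketched without a cluster. This is a minimal illustration under stated assumptions, not HBase client code. An in-memory `TreeMap` stands in for the table. A hypothetical `KeyRange` stands in for the start/stop rows you would derive from `HTable.getRegionLocations` or from prefix boundaries. The server names `rs1`/`rs2` are made up. Each range is scanned with an inclusive start row and an exclusive stop row, and the per-range results are combined with a k-way merge that compares keys with the salt prefix stripped.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch only: a TreeMap stands in for the HBase table; KeyRange, scanPerServer,
// kWayMerge and the server names are hypothetical, not HBase APIs.
class ParallelScanSketch {

    /** Hypothetical scan unit: rows in [startRow, stopRow) on `server`; "" stopRow = end of table. */
    static final class KeyRange {
        final String server, startRow, stopRow;
        KeyRange(String server, String startRow, String stopRow) {
            this.server = server; this.startRow = startRow; this.stopRow = stopRow;
        }
    }

    /** Simulated scan: start row inclusive, stop row exclusive, results in key order. */
    static List<String> scanRange(NavigableMap<String, String> table, KeyRange r) {
        NavigableMap<String, String> slice = r.stopRow.isEmpty()
                ? table.tailMap(r.startRow, true)
                : table.subMap(r.startRow, true, r.stopRow, false);
        return new ArrayList<>(slice.keySet());
    }

    /** Stop row just past every key starting with `prefix` (last char incremented; ASCII assumed). */
    static String stopRowForPrefix(String prefix) {
        int n = prefix.length();
        return prefix.substring(0, n - 1) + (char) (prefix.charAt(n - 1) + 1);
    }

    /** Groups ranges by server, runs one thread per server, returns one sorted list per range. */
    static List<List<String>> scanPerServer(NavigableMap<String, String> table, List<KeyRange> ranges)
            throws InterruptedException, ExecutionException {
        Map<String, List<KeyRange>> byServer = new TreeMap<>();
        for (KeyRange r : ranges) {
            byServer.computeIfAbsent(r.server, s -> new ArrayList<>()).add(r);
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, byServer.size()));
        try {
            List<Future<List<List<String>>>> futures = new ArrayList<>();
            for (List<KeyRange> serverRanges : byServer.values()) {
                // One task per server; it scans that server's ranges one after another.
                futures.add(pool.submit(() -> {
                    List<List<String>> perRange = new ArrayList<>();
                    for (KeyRange r : serverRanges) perRange.add(scanRange(table, r));
                    return perRange;
                }));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<List<String>>> f : futures) results.addAll(f.get());
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Merges individually sorted lists into one list ordered by `cmp` (min-heap k-way merge). */
    static List<String> kWayMerge(List<List<String>> sorted, Comparator<String> cmp) {
        // Heap entries are {listIndex, positionInList}; the smallest current head is on top.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> cmp.compare(sorted.get(a[0]).get(a[1]), sorted.get(b[0]).get(b[1])));
        for (int i = 0; i < sorted.size(); i++) {
            if (!sorted.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> src = sorted.get(top[0]);
            merged.add(src.get(top[1]));
            if (top[1] + 1 < src.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return merged;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical salted keys of the form prefix-event_type-event_id.
        NavigableMap<String, String> table = new TreeMap<>();
        for (int id = 0; id < 10; id++) {
            String eventId = String.format("%04d", id);
            table.put((id % 2) + "-foo-" + eventId, "v" + id);
            table.put((id % 2) + "-bar-" + eventId, "v" + id);
        }
        // Scan 1 => 0-foo, Scan 2 => 1-foo, placed on two made-up servers.
        List<KeyRange> ranges = List.of(
                new KeyRange("rs1", "0-foo-", stopRowForPrefix("0-foo-")),
                new KeyRange("rs2", "1-foo-", stopRowForPrefix("1-foo-")));
        // Order by event_type-event_id, i.e. ignore the salt prefix when comparing.
        Comparator<String> bySuffix = Comparator.comparing((String k) -> k.substring(k.indexOf('-') + 1));
        System.out.println(kWayMerge(scanPerServer(table, ranges), bySuffix));
    }
}
```

Running `main` prints the ten `foo` rows in event-ID order, alternating between the two buckets. With real region boundaries instead of salt prefixes, the same merge works with `Comparator.naturalOrder()`, because each region's rows already come back sorted. Comparing the merged row count against a single-threaded scan over the same ranges is also a cheap form of the consistency check discussed in the thread.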
