That's part of it; the other part is getting the region demarcations. You can also just get the smallest and largest keys of the table and pick your own demarcations for your scans. Your individual scans will then likely cover multiple regions and regionservers.
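The "pick your own demarcations" approach can be sketched in plain Java: read the smallest and largest row keys as unsigned big-endian numbers and cut the span between them into evenly spaced boundaries. This is a minimal sketch, not an HBase API; the class `KeyRangeSplitter` is hypothetical (HBase's `Bytes` utility has a similar `split` helper, but check the signature in your version). Consecutive boundaries form [start, stop) ranges for individual Scans; for the last range, leaving the stop row empty keeps the largest key included. Keys are right-padded with zero bytes, which makes `a` and `a\0` indistinguishable, so the spacing is only approximate for keys of unequal length.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: evenly spaced scan boundaries between two row keys.
class KeyRangeSplitter {
    /** Returns num + 1 boundary keys, from start to end inclusive. */
    static List<byte[]> split(byte[] start, byte[] end, int num) {
        if (num < 1) throw new IllegalArgumentException("num must be >= 1");
        int len = Math.max(start.length, end.length);
        BigInteger lo = toUnsigned(start, len);
        BigInteger hi = toUnsigned(end, len);
        if (lo.compareTo(hi) >= 0) throw new IllegalArgumentException("start must sort before end");
        BigInteger step = hi.subtract(lo).divide(BigInteger.valueOf(num));
        if (step.signum() == 0) throw new IllegalArgumentException("range too narrow for num splits");
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i < num; i++) {
            keys.add(toBytes(lo.add(step.multiply(BigInteger.valueOf(i))), len));
        }
        keys.add(toBytes(hi, len)); // last boundary is exactly the end key (padded)
        return keys;
    }

    // Right-pad with zero bytes to a common length, then read as an unsigned integer.
    static BigInteger toUnsigned(byte[] key, int len) {
        return new BigInteger(1, Arrays.copyOf(key, len));
    }

    // Write v back as exactly len bytes, dropping BigInteger's optional leading sign byte.
    static byte[] toBytes(BigInteger v, int len) {
        byte[] raw = v.toByteArray();
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

For example, splitting the ASCII keys "0" to "2" into two ranges yields the boundaries "0", "1" and "2", i.e. the scans [0, 1) and [1, 2).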
Your threading model depends on your needs. If you're interested in the lowest latency, you want to keep your regionservers busy for each query. What exactly that means depends on your setup; maybe you split up the overall scan so that no more than N scans are active at any regionserver. If you're more interested in overall predictability, you might not want to parallelize each scan too much.

----- Original Message -----
From: Sam Seigal <[email protected]>
To: [email protected]; lars hofhansl <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Wednesday, October 5, 2011 6:18 PM
Subject: Re: Using Scans in parallel

So the whole point of getting the region locations is to ensure that there is one thread per region server?

On Wed, Oct 5, 2011 at 4:42 PM, lars hofhansl <[email protected]> wrote:
> Hi Sam,
>
> There were some attempts to build this in. In the end, I think the exact
> patterns differ based on what one is trying to achieve.
> Currently, what you can do is get all the region locations
> (HTable.getRegionLocations). From the HRegionInfos you can
> get each region's start and end keys.
> Now you can issue parallel scans for as many regions as you want (by creating a
> Scan object with its start and stop rows set to the region's
> start and end keys).
> You probably want to group the regions by regionserver and have one thread
> per regionserver, or something.
>
> -- Lars
> ________________________________
> From: Sam Seigal <[email protected]>
> To: [email protected]
> Sent: Wednesday, October 5, 2011 4:29 PM
> Subject: Using Scans in parallel
>
> Hi,
>
> Is there a known way to do Scans in parallel (even in different
> threads) and then sort/combine the output?
>
> For a row key like:
>
> prefix-event_type-event_id
>
> I want to declare two scan objects (for, say, event_type foo):
>
> Scan 1 => 0-foo
> Scan 2 => 1-foo
>
> execute the scans in parallel (maybe even in different threads), and
> then merge the results?
>
> Thank you,
>
> Sam
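Sam's original question (run one scan per prefix bucket in its own thread, then merge) can be sketched without HBase at all. In this minimal sketch, a `TreeSet` of row keys is a hypothetical stand-in for the table and `scan(start, stop)` a hypothetical stand-in for one Scan over [start, stop). Each bucket's rows come back already sorted by row key, so a k-way merge on a `PriorityQueue` that compares keys with the bucket prefix stripped yields one combined, ordered stream. With the real client, each thread would open its own table handle (HTable instances were not thread-safe) and drain its own ResultScanner; treat that wiring as something to verify against your HBase version.

```java
import java.util.*;
import java.util.concurrent.*;

class ParallelScanMerge {
    // Hypothetical stand-in for a table: row keys kept in sorted order, as HBase does.
    static final NavigableSet<String> TABLE = new TreeSet<>(List.of(
        "0-foo-003", "0-foo-007", "0-bar-001",
        "1-foo-001", "1-foo-005", "1-bar-002"));

    // Hypothetical stand-in for one Scan: rows in [start, stop), in key order.
    static List<String> scan(String start, String stop) {
        return new ArrayList<>(TABLE.subSet(start, true, stop, false));
    }

    // Strip the "N-" bucket prefix so rows from different buckets compare by the rest.
    static String unsalted(String row) {
        return row.substring(row.indexOf('-') + 1);
    }

    /** Runs each {start, stop} range on its own thread, then merges the sorted parts. */
    static List<String> scanAndMerge(List<String[]> ranges) {
        ExecutorService pool = Executors.newFixedThreadPool(ranges.size());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String[] r : ranges) {
                futures.add(pool.submit(() -> scan(r[0], r[1])));
            }
            List<List<String>> parts = new ArrayList<>();
            for (Future<List<String>> f : futures) parts.add(f.get());
            return kWayMerge(parts);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scanning", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // k-way merge: one cursor {partIndex, position} per sorted part, smallest key on top.
    static List<String> kWayMerge(List<List<String>> parts) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> unsalted(parts.get(e[0]).get(e[1]))));
        for (int i = 0; i < parts.size(); i++) {
            if (!parts.get(i).isEmpty()) heap.add(new int[]{i, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            List<String> part = parts.get(e[0]);
            out.add(part.get(e[1]));
            if (e[1] + 1 < part.size()) heap.add(new int[]{e[0], e[1] + 1});
        }
        return out;
    }
}
```

The stop rows here are the bucket prefix with its last character incremented ("0-foo-" to "0-foo."), so each scan covers exactly one bucket and one event type. Scanning {"0-foo-", "0-foo."} and {"1-foo-", "1-foo."} merges to 1-foo-001, 0-foo-003, 1-foo-005, 0-foo-007.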

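Lars's suggestion to group regions by regionserver and keep no more than N scans active per server can be sketched with one shared thread pool plus a `Semaphore` per server. This is a minimal sketch under stated assumptions: `RegionRange` is a hypothetical descriptor (in a real client, the server and key range would come from the region locations), and `scanRegion` is a hypothetical placeholder that only sleeps where a real Scan would be drained. The method reports the peak number of concurrent scans seen per server, so the cap can be checked.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

class PerServerScanLimiter {
    // Hypothetical region descriptor: hosting server plus its [startKey, endKey) range.
    record RegionRange(String server, String startKey, String endKey) {}

    /** Scans every region on a shared pool, with at most maxPerServer scans per server at once.
     *  Returns the peak concurrency observed for each server. */
    static Map<String, Integer> runCapped(List<RegionRange> regions, int maxPerServer, int poolSize) {
        Map<String, Semaphore> permits = new HashMap<>();
        Map<String, AtomicInteger> active = new HashMap<>();
        Map<String, AtomicInteger> peak = new HashMap<>();
        for (RegionRange r : regions) { // group by server: one permit pool per server
            permits.putIfAbsent(r.server(), new Semaphore(maxPerServer));
            active.putIfAbsent(r.server(), new AtomicInteger());
            peak.putIfAbsent(r.server(), new AtomicInteger());
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (RegionRange r : regions) {
                futures.add(pool.submit(() -> {
                    Semaphore s = permits.get(r.server());
                    s.acquireUninterruptibly(); // wait until this server has a free slot
                    try {
                        int now = active.get(r.server()).incrementAndGet();
                        peak.get(r.server()).accumulateAndGet(now, Math::max);
                        scanRegion(r);
                    } finally {
                        active.get(r.server()).decrementAndGet();
                        s.release();
                    }
                }));
            }
            for (Future<?> f : futures) f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scanning", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        Map<String, Integer> result = new TreeMap<>();
        peak.forEach((server, p) -> result.put(server, p.get()));
        return result;
    }

    // Hypothetical placeholder for "open a Scan over [startKey, endKey) and drain it".
    static void scanRegion(RegionRange r) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Setting maxPerServer to 1 gives the "one thread per regionserver" model while still sharing a single pool; raising it trades predictability for lower latency, as described in the reply above.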