Re: Scan a region in parallel

ramkrishna vasudevan Fri, 21 Oct 2016 01:57:01 -0700

Phoenix does handle column-based queries more intelligently, since it is a
SQL engine.


There the parallelism comes from guideposts: evenly spaced row keys stored
in a separate stats table. When you run a query, Phoenix internally spawns
parallel scans between those guideposts, which makes the query faster.
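
If you want to try approach #2 yourself on the client, here is a minimal
sketch in plain Java, not Phoenix's actual code. It cuts one region's
[start row, stop row) into evenly spaced sub-ranges and runs one scan per
sub-range on a thread pool. The names RegionRangeSplitter, boundaries and
scanInParallel are made up for illustration, and scanRange is a hypothetical
callback where a real client would open an HBase Scan bounded by the two
keys. It assumes both keys are non-empty, so the first and last regions of a
table (which have an empty start or stop row) need separate handling.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

// Hypothetical helper, not part of HBase or Phoenix.
final class RegionRangeSplitter {

    // Returns pieces + 1 boundary keys: start, the interior split points, stop.
    // Keys are right-padded with 0x00 to a common width and treated as
    // unsigned big-endian numbers, which preserves lexicographic byte order.
    static List<byte[]> boundaries(byte[] start, byte[] stop, int pieces) {
        int width = Math.max(start.length, stop.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(stop, width));
        if (pieces < 1 || lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("need pieces >= 1 and start < stop");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> keys = new ArrayList<>();
        keys.add(start);
        for (int i = 1; i < pieces; i++) {
            // A very narrow range can yield repeated keys, i.e. empty sub-ranges.
            BigInteger offset = span.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(pieces));
            keys.add(toFixedWidth(lo.add(offset), width));
        }
        keys.add(stop);
        return keys;
    }

    // Converts a non-negative value below 256^width to exactly width bytes.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte
        byte[] out = new byte[width];
        int n = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - n, out, width - n, n);
        return out;
    }

    // Runs scanRange(from, to) for each adjacent pair of keys on a thread pool
    // and concatenates the results in key order.
    static <T> List<T> scanInParallel(List<byte[]> keys, int threads,
            BiFunction<byte[], byte[], List<T>> scanRange) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> futures = new ArrayList<>();
            for (int i = 0; i + 1 < keys.size(); i++) {
                final byte[] from = keys.get(i);
                final byte[] to = keys.get(i + 1);
                Callable<List<T>> task = () -> scanRange.apply(from, to);
                futures.add(pool.submit(task));
            }
            List<T> rows = new ArrayList<>();
            for (Future<List<T>> f : futures) {
                rows.addAll(f.get()); // waits for each sub-scan in order
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }
}
```

Even spacing over the key bytes only balances the work if your row keys are
spread fairly uniformly. With skewed keys some sub-scans will be much larger
than others, which is the problem the guideposts are meant to address, since
they are collected from the data itself.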

Regards
Ram

On Fri, Oct 21, 2016 at 1:26 PM, Anil <[email protected]> wrote:

> Thank you Ram.
>
> "So now  you are spawning those many scan threads equal to the number of
> regions " - YES
>
> There are two ways of scanning a region in parallel:
>
> 1. Scan a region with a start row and stop row in a single scan operation,
> and have HBase take care of the parallelism internally on the server side.
> 2. Split the start row and stop row of a region into a number of start and
> stop row pairs (by some criteria) and spawn a scan query for each pair.
>
> #1 is not supported (as you also said).
>
> I am looking for #2. I checked the Phoenix documentation and code, and it
> seems to me that Phoenix is doing #2, but I could not understand the code
> completely.
>
> The use case is very simple. HBase is not good (at least in terms of
> performance for OLTP) at querying by any column (other than the row key) or
> at sorting by any column of a row. The same goes for Phoenix.
>
> So I am planning to load the HBase/Phoenix table into an in-memory database
> for faster access.
>
> Scanning a big region sequentially will lead to a long load time, so I am
> looking for ways to minimize the load time.
>
> Hope this helps.
>
> Thanks.
>
>
> On 21 October 2016 at 09:30, ramkrishna vasudevan <
> [email protected]> wrote:
>
> > Hi Anil
> >
> > So now  you are spawning those many scan threads equal to the number of
> > regions.
> > bq.Is there any way to scan a region in parallel ?
> > You mean you want to scan in parallel within a region? That is, you want
> > to split a single query into N small scans and then read and aggregate
> > on the client side/server side?
> >
> > Currently you cannot do that. Once you set a start row and stop row, the
> > scan determines which region they belong to and retrieves the data
> > sequentially in that region (applying any filters you set during the
> > course of the scan).
> >
> > Have you tried Apache Phoenix? It's a SQL wrapper over HBase, and there
> > you can do parallel scans for a given SQL query if guideposts have been
> > collected. Such things cannot be an integral part of HBase. But since I
> > am not aware of your use case, I'm afraid we cannot suggest much on this.
> >
> > Regards
> > Ram
> >
> >
> > On Fri, Oct 21, 2016 at 8:40 AM, Anil <[email protected]> wrote:
> >
> > > Any pointers ?
> > >
> > > On 20 October 2016 at 18:15, Anil <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am loading an HBase table into an in-memory DB to support filtering,
> > > > ordering and pagination.
> > > >
> > > > I am scanning each region and inserting the data into the in-memory
> > > > DB. Each region scan is done in a single thread, so the regions are
> > > > scanned in parallel.
> > > >
> > > > Is there any way to scan a region in parallel ? any pointers would be
> > > > helpful.
> > > >
> > > > Thanks
> > > >
> > >
> >
>
