Re: Scan a region in parallel

Anil Fri, 21 Oct 2016 02:06:07 -0700

Thank you, Ram. Now it's clear. I will take a look at it.

Thanks again.


On 21 October 2016 at 14:25, ramkrishna vasudevan <
[email protected]> wrote:

> Phoenix does support intelligent ways of querying by columns, since it
> is a SQL engine.
>
> There the parallelism comes from guideposts - those are evenly spaced
> row keys stored in a separate stats table. So when you run a query,
> Phoenix internally spawns parallel scans using those guideposts, which
> makes querying faster.
>
> Regards
> Ram
>
> On Fri, Oct 21, 2016 at 1:26 PM, Anil <[email protected]> wrote:
>
> > Thank you Ram.
> >
> > "So now  you are spawning those many scan threads equal to the number of
> > regions " - YES
> >
> > There are two ways of scanning a region in parallel:
> >
> > 1. Scan a region from start row to stop row with a single scan
> > operation, and let HBase take care of the parallelism internally on
> > the server side.
> > 2. Split the start row and stop row of a region into a number of
> > start/stop row pairs (by some criteria) and spawn a scan query for
> > each pair.
> >
> > #1 is not supported (as you also said).
> >
> > I am looking for #2. I checked the Phoenix documentation and code. It
> > seems to me that Phoenix is doing #2. I looked into the Phoenix code
> > but could not understand it completely.
> >
> > The use case is very simple. HBase is not good (at least in terms of
> > performance for OLTP) at querying by any column other than the row
> > key, or at sorting by any column of a row. The same goes for Phoenix.
> >
> > So I am planning to load the HBase/Phoenix table into an in-memory
> > database for faster access.
> >
> > Scanning a big region sequentially will lead to a long load time, so
> > I am looking for ways to minimize the load time.
> >
> > Hope this helps.
> >
> > Thanks.
> >
> >
> > On 21 October 2016 at 09:30, ramkrishna vasudevan <
> > [email protected]> wrote:
> >
> > > Hi Anil
> > >
> > > So now  you are spawning those many scan threads equal to the number of
> > > regions.
> > > bq.Is there any way to scan a region in parallel ?
> > > You mean you want to scan in parallel within a region? That is, you
> > > want to split a single query into N small scans and then read and
> > > aggregate on the client side/server side?
> > >
> > > Currently you cannot do that. Once you set a start and stop row, the
> > > scan determines which region it belongs to and retrieves the data
> > > sequentially in that region (applying whatever filters you have set
> > > during the course of the scan).
> > >
> > > Have you tried Apache Phoenix? It's a SQL wrapper over HBase, and
> > > there you can do parallel scans for a given SQL query if guideposts
> > > have been collected. Such things cannot be an integral part of HBase.
> > > But I am afraid that, without knowing your use case, we cannot
> > > suggest much on this.
> > >
> > > Regards
> > > Ram
> > >
> > >
> > > On Fri, Oct 21, 2016 at 8:40 AM, Anil <[email protected]> wrote:
> > >
> > > > Any pointers ?
> > > >
> > > > On 20 October 2016 at 18:15, Anil <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am loading an HBase table into an in-memory DB to support
> > > > > filtering, ordering and pagination.
> > > > >
> > > > > I am scanning each region and inserting the data into the
> > > > > in-memory DB. Each region scan is done in a single thread, so
> > > > > the regions are scanned in parallel with one another.
> > > > >
> > > > > Is there any way to scan a single region in parallel? Any
> > > > > pointers would be helpful.
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>
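
Approach #2 from the thread (splitting one region's start/stop row into several sub-ranges and issuing one scan per sub-range) can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Phoenix or HBase implement it: `split_key_range` and `scan_in_parallel` are hypothetical helpers, `scan_range` stands in for whatever client call performs a real scan between two row keys, and the split points are spaced evenly over the byte range rather than taken from collected guideposts, so the sub-ranges will only be balanced if the row keys are roughly uniformly distributed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_key_range(start: bytes, stop: bytes, parts: int) -> List[Tuple[bytes, bytes]]:
    """Split [start, stop) into up to `parts` contiguous sub-ranges of equal byte width."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if start >= stop:
        raise ValueError("start must sort before stop")
    # Right-pad both keys with zero bytes to a common width so that they can be
    # treated as big-endian integers; one extra byte gives finer split points.
    width = max(len(start), len(stop)) + 1
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(stop.ljust(width, b"\x00"), "big")
    # Never ask for more pieces than there are distinct keys of this width.
    parts = max(1, min(parts, hi - lo))
    bounds = [start]
    for i in range(1, parts):
        mid = lo + (hi - lo) * i // parts
        bounds.append(mid.to_bytes(width, "big"))
    bounds.append(stop)
    return list(zip(bounds[:-1], bounds[1:]))


def scan_in_parallel(
    scan_range: Callable[[bytes, bytes], List[bytes]],
    start: bytes,
    stop: bytes,
    parts: int,
) -> List[bytes]:
    """Run one scan per sub-range in its own thread and concatenate results in key order."""
    ranges = split_key_range(start, stop, parts)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # pool.map yields results in the order of `ranges`, so rows stay sorted.
        chunks = pool.map(lambda r: scan_range(r[0], r[1]), ranges)
        return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    # Stand-in for a real table: a sorted list of row keys scanned in memory.
    rows = [b"a1", b"b7", b"c3", b"d9"]

    def fake_scan(lo: bytes, hi: bytes) -> List[bytes]:
        return [r for r in rows if lo <= r < hi]

    print(split_key_range(b"a", b"e", 4))
    print(scan_in_parallel(fake_scan, b"a", b"e", 4))
```

With skewed keys, the split points would be better taken from sampled or precomputed row keys (which is the role guideposts play in Phoenix, as described above) than from even spacing.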
