Re: Best technique for doing lookup with Secondary Index

fding hbase Fri, 26 Oct 2012 03:14:45 -0700

https://github.com/danix800/hbase-indexed
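
For anyone following the thread, here is a minimal in-memory sketch of the
two-step lookup discussed below (scan index table B by prefix from a
startRow, then do a batch get on table A). It is not the HBase client API and
it is not taken from the repository linked above: a TreeMap stands in for
each table because it keeps row keys sorted, and the class name, method
names, the "|" separator and the zero-padded timestamp format are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model only: TreeMap plays the role of a sorted HBase table.
class SecondaryIndexSketch {
    // Table A: <customer_Id><event_ID> -> event payload (hypothetical String payload).
    final TreeMap<String, String> tableA = new TreeMap<>();
    // Table B (index): <event_timestamp><customer_ID> -> row key of A.
    // Assumption: one event per (timestamp, customer); a repeat would overwrite.
    final TreeMap<String, String> tableB = new TreeMap<>();

    static String rowKeyA(String customerId, String eventId) {
        return customerId + "|" + eventId;
    }

    // Zero-pad the timestamp so that string order matches numeric order.
    static String rowKeyB(long eventTimestamp, String customerId) {
        return String.format("%013d|%s", eventTimestamp, customerId);
    }

    // Write the data row and its index row together, as a prePut hook would.
    void put(String customerId, String eventId, long eventTimestamp, String payload) {
        String keyA = rowKeyA(customerId, eventId);
        tableA.put(keyA, payload);
        tableB.put(rowKeyB(eventTimestamp, customerId), keyA);
    }

    // Step 1: prefix scan on B starting at startRow == prefix.
    // Step 2: batch get on A with the collected row keys.
    List<String> lookupByPrefix(String prefix) {
        List<String> keysOfA = new ArrayList<>();
        for (Map.Entry<String, String> e : tableB.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so the first mismatch ends the scan
            }
            keysOfA.add(e.getValue());
        }
        List<String> results = new ArrayList<>();
        for (String keyA : keysOfA) {
            String payload = tableA.get(keyA);
            if (payload != null) {
                results.add(payload);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch s = new SecondaryIndexSketch();
        s.put("c1", "e1", 1351200000000L, "login");
        s.put("c2", "e7", 1351200000000L, "purchase");
        s.put("c1", "e2", 1351200005000L, "logout");
        System.out.println(s.lookupByPrefix("1351200000000|"));
    }
}
```

Zero-padding the timestamp is what makes the sorted key order agree with time
order in this sketch; with real HBase row keys a fixed-width byte encoding
would typically play the same role.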


On Fri, Oct 26, 2012 at 4:13 PM, Ramkrishna.S.Vasudevan <
[email protected]> wrote:

> > AFAIK, RPC cannot be avoided even if Region A and Region B are on the same
> > RS, since these two regions are from different tables. Am I right?
>
> No... suppose your Region A and Region B of different tables are collocated
> on the same RS. Then from the coprocessor environment variable you can get
> access to the RS.
> From the RS you can get the online regions, and from a region object you can
> call puts or gets. This will not involve any RPC within that RS because we
> only deal with Region objects.
>
> Regards
> Ram
>
> > -----Original Message-----
> > From: anil gupta [mailto:[email protected]]
> > Sent: Friday, October 26, 2012 12:17 PM
> > To: [email protected]
> > Subject: Re: Best technique for doing lookup with Secondary Index
> >
> > >
> > > Now your main question is lookups, right?
> > > There are some more hooks in the scan flow called pre/postScannerOpen
> > > and pre/postScannerNext.
> > > Maybe you can try using them to do a lookup on the secondary table and
> > > then use those values and pass them to the main table's next().
> > >
> >
> > In a secondary index it's hard to avoid at least two RPC calls (one from
> > the client to table B and then one from table B to table A) whether you
> > use a coproc or not. But I believe using a coproc is better than doing
> > RPC calls from the client, since the client might be outside the
> > subnet/network of the cluster. In that case, the RPC will be faster when
> > we use coprocs. In my case the client is certainly not in the same subnet
> > or network zone. I need to provide query results in around 100
> > milliseconds or less, so I need to be really frugal. Let me know your
> > views on this.
> >
> > Have you implemented queries with secondary indexes using coprocs yet?
> > At present I have tried the client-side query and I can get the results
> > of the query in around 100 ms. I am tempted to try out the coproc
> > implementation.
> >
> > > But this may involve more RPC calls as your regions of "A" and "B" may
> > > be in different RSs.
> > >
> > AFAIK, RPC cannot be avoided even if Region A and Region B are on the same
> > RS, since these two regions are from different tables. Am I right?
> >
> >
> > Thanks,
> > Anil Gupta
> >
> > On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan <
> > [email protected]> wrote:
> >
> > > > Is it a good idea to create an HTable instance on "B" and do the put
> > > > in my mapper? I might try this idea.
> > > Yes, you can do this. Maybe in the same mapper you can do a put for
> > > table "B". This is how we have tried loading data into another table
> > > by using the main table "A" puts.
> > >
> > > Now your main question is lookups, right?
> > > There are some more hooks in the scan flow called pre/postScannerOpen
> > > and pre/postScannerNext.
> > > Maybe you can try using them to do a lookup on the secondary table and
> > > then use those values and pass them to the main table's next().
> > > But this may involve more RPC calls as your regions of "A" and "B" may
> > > be in different RSs.
> > >
> > > If something is wrong in my understanding of what you said, kindly
> > > spare me. :)
> > >
> > > Regards
> > > Ram
> > >
> > >
> > > > -----Original Message-----
> > > > From: anil gupta [mailto:[email protected]]
> > > > Sent: Friday, October 26, 2012 3:40 AM
> > > > To: [email protected]
> > > > Subject: Re: Best technique for doing lookup with Secondary Index
> > > >
> > > > Anoop: In prePut hook u call HTable#put()?
> > > > Anil: Yes, I call HTable#put() in prePut. Is there a better way of
> > > > doing it?
> > > >
> > > > Anoop: Why use the network calls from server side here then?
> > > > Anil: I thought this was a cleaner approach since I am using
> > > > BulkLoader. I decided not to run two jobs since I am generating a
> > > > UniqueIdentifier at runtime in the bulkloader.
> > > >
> > > > Anoop: can not handle it from client alone?
> > > > Anil: I cannot handle it from the client since I am using BulkLoader.
> > > > Is it a good idea to create an HTable instance on "B" and do the put
> > > > in my mapper? I might try this idea.
> > > >
> > > > Anoop: You can have a look at Lily project.
> > > > Anil: It's a little late for us to evaluate Lily now, and at present
> > > > we don't need a complex secondary index since our data is immutable.
> > > >
> > > > Ram: what is rowkey B here?
> > > > Anil: Suppose I am storing customer events in table A. I have two
> > > > requirements for data queries:
> > > > 1. Query customer events on the basis of customer_Id and event_ID.
> > > > 2. Query customer events on the basis of event_timestamp and
> > > >    customer_ID.
> > > >
> > > > 70% of querying is done by query#1, so I will create
> > > > <customer_Id><event_ID> as the row key of Table A.
> > > > Now, in order to support fast results for query#2, I need to create a
> > > > secondary index on A. I store that secondary index in B; the rowkey
> > > > of B is <event_timestamp><customer_ID>. Every row stores the
> > > > corresponding rowkey of A.
> > > >
> > > > Ram: How is the startRow determined for every query?
> > > > Anil: It's determined by very simple application logic.
> > > >
> > > > Thanks,
> > > > Anil Gupta
> > > >
> > > > On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
> > > > [email protected]> wrote:
> > > >
> > > > > Just out of curiosity,
> > > > > > The secondary index is stored in table "B" as rowkey B -->
> > > > > > family:<rowkey A>
> > > > > what is rowkey B here?
> > > > > > 1. Scan the secondary table by using prefix filter and startRow.
> > > > > How is the startRow determined for every query?
> > > > >
> > > > > Regards
> > > > > Ram
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Anoop Sam John [mailto:[email protected]]
> > > > > > Sent: Thursday, October 25, 2012 10:15 AM
> > > > > > To: [email protected]
> > > > > > Subject: RE: Best technique for doing lookup with Secondary
> > Index
> > > > > >
> > > > > > >I build the secondary table "B" using a prePut RegionObserver.
> > > > > >
> > > > > > Anil,
> > > > > >        In the prePut hook you call HTable#put()? Why use network
> > > > > > calls from the server side here then? Can't you handle it from the
> > > > > > client alone? You can have a look at the Lily project. Thoughts
> > > > > > after seeing your idea on put and scan..
> > > > > >
> > > > > > -Anoop-
> > > > > > ________________________________________
> > > > > > From: anil gupta [[email protected]]
> > > > > > Sent: Thursday, October 25, 2012 3:10 AM
> > > > > > To: [email protected]
> > > > > > Subject: Best technique for doing lookup with Secondary Index
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I am using HBase 0.92.1. I have created a secondary index on table
> > > > > > "A". Table A stores immutable data. I build the secondary table "B"
> > > > > > using a prePut RegionObserver.
> > > > > >
> > > > > > The secondary index is stored in table "B" as rowkey B -->
> > > > > > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every row
> > > > > > in B will have only one column, and the name of that column is the
> > > > > > rowkey of A, so the value is blank. As per my understanding,
> > > > > > accessing the column qualifier is faster than accessing the value.
> > > > > > Please correct me if I am wrong.
> > > > > >
> > > > > >
> > > > > > HBase querying approach:
> > > > > > 1. Scan the secondary table by using a prefix filter and startRow.
> > > > > > 2. Do a batch get on the primary table by using the
> > > > > >    HTable.get(List<Get>) method.
> > > > > >
> > > > > > The above approach for retrieval works fine, but I was wondering if
> > > > > > there is a better approach. I was planning to try out doing the
> > > > > > retrieval using coprocessors.
> > > > > > Has anyone tried using coprocessors? I would appreciate it if others
> > > > > > can share their experience with secondary indexes for HBase queries.
> > > > > >
> > > > > > --
> > > > > > Thanks & Regards,
> > > > > > Anil Gupta
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Thanks & Regards,
> > > > Anil Gupta
> > >
> > >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
>
>


-- 

Best Regards!

Fei Ding
[email protected]
