I assume you have read http://hbase.apache.org/book.html#schema.casestudies (see section 6.11.3).
What's the size of the data other than A's and B's unique IDs? The answer depends on how much data redundancy you are comfortable with in your design.

Cheers

On Wed, Dec 3, 2014 at 12:31 PM, Marc Sturm <[email protected]> wrote:

> Hi,
>
> I have a many-to-many relationship that I am trying to model in HBase, and
> I want to be sure I am not missing anything, so please let me know or point
> me to the right documentation.
>
> Let's say I have an A-to-B many-to-many relationship. The query takes an A
> unique ID as its parameter and returns all the B unique IDs related to that
> A, with their properties and values.
>
> The first solution I found uses two tables. In the first, the row key is
> A's unique ID and the column qualifiers are the unique IDs of the B's
> related to that A. In the second, the row key is B's unique ID and the
> columns contain B's property values. So the query has two steps: a get on A
> to collect all the related B unique IDs, then a second get on the B table,
> passing the list of B row keys. When I run the second query, the first call
> can have much higher latency, while subsequent calls with the same parameter
> have good, low latency. I believe that's a caching issue...
>
> The second solution uses one table with a composite row key of A's unique
> ID + B's unique ID, so I will have duplicate rows for the same B unique ID.
> But when I scan on just the first part of the row key (A's unique ID), the
> response time is more consistent and the latency is lower.
>
> So, I have three questions:
> 1) Which way is best?
> 2) What is the performance difference between a scan and a get with
>    multiple row keys? (I think the scan is faster because the data is less
>    "distributed".)
> 3) How can we make the get with multiple row keys more consistent?
>
> Thank you for your help,
> Marc
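For illustration, here is a rough sketch of the two layouts Marc describes, written as a toy in-memory model in Java (9+, for `List.of`/`Map.of`). It does not use the HBase client: `java.util.TreeMap` stands in for HBase's lexicographically sorted row keys, and the class name, method names, and the `|` key separator are all hypothetical. In real HBase code the first design would roughly be a `Get` on the A table followed by a batched `Table.get(List<Get>)` on the B table, and the second a `Scan` whose start and stop rows bound the A prefix. The sketch shows only the data layout and access pattern; it says nothing about latency or caching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: TreeMap stands in for HBase's sorted row keys.
// All names here are hypothetical, not HBase APIs.
public class ManyToManySketch {
    // Assumed separator between the A and B parts of the composite key;
    // assumes IDs never contain '|'.
    static final char SEP = '|';

    // Design 1, table A: A id -> related B ids (the column qualifiers).
    static final TreeMap<String, List<String>> aTable = new TreeMap<>();
    // Design 1, table B: B id -> B's property values.
    static final TreeMap<String, Map<String, String>> bTable = new TreeMap<>();
    // Design 2: one table keyed by "aId|bId"; B's properties are copied into each row.
    static final TreeMap<String, Map<String, String>> joinTable = new TreeMap<>();

    static String compositeKey(String aId, String bId) {
        return aId + SEP + bId;
    }

    // Design 1: one lookup on A, then one lookup per related B id
    // (in HBase, roughly a Get followed by a batched multi-get).
    static List<Map<String, String>> twoStepLookup(String aId) {
        List<Map<String, String>> result = new ArrayList<>();
        for (String bId : aTable.getOrDefault(aId, List.of())) {
            Map<String, String> props = bTable.get(bId);
            if (props != null) {
                result.add(props);
            }
        }
        return result;
    }

    // Design 2: a single range read over the contiguous keys starting with "aId|".
    // Including the separator in the prefix keeps a scan for "a1" from also
    // returning rows for "a10"; the stop key is the prefix with its last
    // character incremented ('|' -> '}').
    static List<Map<String, String>> prefixScan(String aId) {
        String start = aId + SEP;
        String stop = aId + (char) (SEP + 1);
        return new ArrayList<>(joinTable.subMap(start, true, stop, false).values());
    }

    public static void main(String[] args) {
        bTable.put("b1", Map.of("color", "red"));
        bTable.put("b2", Map.of("color", "blue"));
        aTable.put("a1", List.of("b1", "b2"));
        aTable.put("a2", List.of("b2"));

        // Denormalize into design 2: b2's properties end up stored under both a1 and a2.
        for (Map.Entry<String, List<String>> e : aTable.entrySet()) {
            for (String bId : e.getValue()) {
                joinTable.put(compositeKey(e.getKey(), bId), bTable.get(bId));
            }
        }

        System.out.println(twoStepLookup("a1")); // [{color=red}, {color=blue}]
        System.out.println(prefixScan("a1"));    // [{color=red}, {color=blue}]
    }
}
```

The trade-off shows up in `main`: the composite-key table answers the query with one contiguous range read, but b2's properties are stored once per related A. How acceptable that duplication is depends on the size of the non-ID data.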
