@Mohit: Here is the jira for prefix compression discussed here: https://issues.apache.org/jira/browse/HBASE-4676
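For anyone new to the idea, here is a minimal sketch of plain prefix (delta) encoding over sorted keys. It is roughly the idea behind the PREFIX data block encoding that came with HBASE-4218, but it is not HBase's on-disk format, and the function names `prefix_encode`/`prefix_decode` and the `customerId|yyyymmdd` rowkeys are made up for illustration. Each key is stored as the number of leading bytes it shares with the previous key plus the remaining suffix.

```python
from typing import List, Tuple


def prefix_encode(sorted_keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Encode each key as (bytes shared with the previous key, remaining suffix)."""
    encoded: List[Tuple[int, bytes]] = []
    prev = b""
    for key in sorted_keys:
        # Count the leading bytes this key has in common with the previous key.
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild keys in order; each one reuses a prefix of the key decoded before it."""
    keys: List[bytes] = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


if __name__ == "__main__":
    # Hypothetical rowkeys of the form customerId|yyyymmdd.
    keys = [b"cust0001|20121227", b"cust0001|20121228", b"cust0002|20121227"]
    encoded = prefix_encode(keys)
    print(encoded)  # [(0, b'cust0001|20121227'), (16, b'8'), (7, b'2|20121227')]
    assert prefix_decode(encoded) == keys
```

Because each key is rebuilt from the one before it, finding a key in the middle of an encoded block means decoding forward from the start of the block. As I understand it, that is part of why seeks inside an HFile block cost something, and the prefix trie encoding in HBASE-4676 is meant to make those seeks cheaper.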
HTH,
Anil Gupta

On Sun, Jan 6, 2013 at 12:40 PM, Adrien Mogenet <[email protected]> wrote:
> Are you talking about data block encoding of K/V?
> https://issues.apache.org/jira/browse/HBASE-4218
>
> On Sun, Jan 6, 2013 at 9:36 PM, Mohit Anchlia <[email protected]> wrote:
> > Does anyone have any links or information on the new prefix encoding feature
> > in HBase that's being referred to in this mail?
> >
> > On Sun, Jan 6, 2013 at 12:30 PM, Adrien Mogenet <[email protected]> wrote:
> > > Nice topic, perhaps one of the most important for 2013 :-)
> > > I still don't get how you're ensuring consistency between the index table
> > > and the main table without an external component (such as BookKeeper/ZooKeeper).
> > > What's the exact write path in your setup when inserting data?
> > > (WAL/RegionObserver, pre/post put/WALEdit...)
> > >
> > > The underlying question is: how are you ensuring that the WALEdits in the index
> > > and main tables are perfectly synced, and how are you able to roll back in
> > > case of an issue in both WALs?
> > >
> > > On Fri, Dec 28, 2012 at 11:55 AM, Shengjie Min <[email protected]> wrote:
> > > > > Yes, as you say, as the number of rows to be returned grows, the latency
> > > > > grows too. Seeks within an HFile block are a somewhat expensive operation
> > > > > right now (not much, but still). The new prefix trie encoding will be a huge
> > > > > bonus here; with it, the seeks will be flying. [Ted also presented this at
> > > > > Hadoop China.] Thanks to Matt... :) I am trying to measure the scan
> > > > > performance with this new encoding, and am trying to backport a simple
> > > > > patch to 0.94 just for testing... Yes, as the number of results to be
> > > > > returned grows, any index will perform worse, as per my study :)
> > > >
> > > > Yes, you are right; I guess it's just a drawback of any index approach.
> > > > Thanks for the explanation.
> > > >
> > > > Shengjie
> > > >
> > > > On 28 December 2012 04:14, Anoop Sam John <[email protected]> wrote:
> > > > > > Do you have a link to that presentation?
> > > > >
> > > > > http://hbtc2012.hadooper.cn/subject/track4TedYu4.pdf
> > > > >
> > > > > -Anoop-
> > > > >
> > > > > ________________________________________
> > > > > From: Mohit Anchlia [[email protected]]
> > > > > Sent: Friday, December 28, 2012 9:12 AM
> > > > > To: [email protected]
> > > > > Subject: Re: HBase - Secondary Index
> > > > >
> > > > > On Thu, Dec 27, 2012 at 7:33 PM, Anoop Sam John <[email protected]> wrote:
> > > > > > Yes, as you say, as the number of rows to be returned grows, the latency
> > > > > > grows too. Seeks within an HFile block are a somewhat expensive operation
> > > > > > right now (not much, but still). The new prefix trie encoding will be a huge
> > > > > > bonus here; with it, the seeks will be flying. [Ted also presented this at
> > > > > > Hadoop China.] Thanks to Matt... :) I am trying to measure the scan
> > > > > > performance with this new encoding, and am trying to backport a simple
> > > > > > patch to 0.94 just for testing... Yes, as the number of results to be
> > > > > > returned grows, any index will perform worse, as per my study :)
> > > > >
> > > > > Do you have a link to that presentation?
> > > > >
> > > > > > > BTW, quick question: in your presentation, is the scale in seconds or
> > > > > > > milliseconds? :)
> > > > > >
> > > > > > It is seconds. Don't focus on the exact values; the percentage increase in
> > > > > > latency is what matters :) Those were not high-end machines.
> > > > > >
> > > > > > -Anoop-
> > > > > > ________________________________________
> > > > > > From: Shengjie Min [[email protected]]
> > > > > > Sent: Thursday, December 27, 2012 9:59 PM
> > > > > > To: [email protected]
> > > > > > Subject: Re: HBase - Secondary Index
> > > > > >
> > > > > > > I didn't follow you completely here. There won't be any get() happening.
> > > > > > > Since we get the exact rowkey in a region from the index table, we can
> > > > > > > seek to the exact position and return that row.
> > > > > >
> > > > > > Sorry, I misused "get()" here; I meant seeking. Yes, if only a small
> > > > > > number of rows is returned, this works perfectly. As you said, you get
> > > > > > the exact rowkey positions per region and simply seek to them. I was
> > > > > > trying to work out the case where the number of result rows increases
> > > > > > massively. Take Anil's case: he wants to run a scan query against the
> > > > > > secondary index (timestamp), "select all rows from timestamp1 to
> > > > > > timestamp2", with no customerId provided. During that time period, he
> > > > > > might have a big chunk of rows from different customerIds. The index
> > > > > > table returns a lot of rowkey positions for different customerIds
> > > > > > (scattered across different regions, I believe), so you end up seeking
> > > > > > to many different positions in different regions to return all the rows
> > > > > > needed. According to page 14 of your presentation (Performance Test
> > > > > > Results (Scan)), without an index the time increases linearly as the
> > > > > > number of result rows increases. With the index, on the other hand, the
> > > > > > time climbs much more quickly than in the case without an index.
> > > > > >
> > > > > > BTW, quick question: in your presentation, is the scale in seconds or
> > > > > > milliseconds? :)
> > > > > >
> > > > > > - Shengjie
> > > > > >
> > > > > > On 27 December 2012 15:54, Anoop John <[email protected]> wrote:
> > > > > > > > how the massive number of get() is going to perform against the main table
> > > > > > >
> > > > > > > I didn't follow you completely here. There won't be any get() happening.
> > > > > > > Since we get the exact rowkey in a region from the index table, we can
> > > > > > > seek to the exact position and return that row.
> > > > > > >
> > > > > > > -Anoop-
> > > > > > >
> > > > > > > On Thu, Dec 27, 2012 at 6:37 PM, Shengjie Min <[email protected]> wrote:
> > > > > > > > how the massive number of get() is going to perform against the main table
> > > > > >
> > > > > > --
> > > > > > All the best,
> > > > > > Shengjie Min
> > > >
> > >
> > > --
> > > Adrien Mogenet
> > > 06.59.16.64.22
> > > http://www.mogenet.me
>
--
Thanks & Regards,
Anil Gupta
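A toy, in-memory model of the seek-versus-scan trade-off discussed in this thread. Every name here (`build_index`, `full_scan`, `index_lookup`, the "region" dicts and the `customerId|...` rowkeys) is hypothetical; none of it is the HBase client API or the secondary-index implementation from the presentation. The index is kept sorted by timestamp; a time-range query scans that slice of the index, groups the matching rowkeys by region, and then does one seek per rowkey, whereas the no-index path touches every row and filters.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy model, not HBase code: each "region" is a list of
# (rowkey, timestamp) pairs sorted by rowkey, and rowkeys start with a
# customerId, so a timestamp range cuts across customers and regions.
Regions = Dict[str, List[Tuple[str, int]]]
IndexEntry = Tuple[int, str, str]  # (timestamp, region name, rowkey)


def build_index(regions: Regions) -> List[IndexEntry]:
    """Build a timestamp-ordered index over every row in every region."""
    return sorted(
        (ts, region, rowkey)
        for region, rows in regions.items()
        for rowkey, ts in rows
    )


def full_scan(regions: Regions, t1: int, t2: int) -> Tuple[List[str], int]:
    """No index: touch every row and keep those with t1 <= timestamp <= t2."""
    touched = 0
    hits = []
    for region in sorted(regions):
        for rowkey, ts in regions[region]:
            touched += 1
            if t1 <= ts <= t2:
                hits.append(rowkey)
    return sorted(hits), touched


def index_lookup(regions: Regions, index: List[IndexEntry],
                 t1: int, t2: int) -> Tuple[List[str], int]:
    """With index: range-scan the index, group rowkeys by region, seek each one."""
    # (t,) sorts before any (t, region, rowkey), so these bound the range.
    lo = bisect.bisect_left(index, (t1,))
    hi = bisect.bisect_left(index, (t2 + 1,))
    by_region: Dict[str, List[str]] = defaultdict(list)
    for _ts, region, rowkey in index[lo:hi]:
        by_region[region].append(rowkey)

    seeks = 0
    hits = []
    for region, rowkeys in by_region.items():
        keys = [rk for rk, _ts in regions[region]]  # already sorted by rowkey
        for rowkey in sorted(rowkeys):
            seeks += 1  # one positioned seek per matching row
            pos = bisect.bisect_left(keys, rowkey)
            if pos < len(keys) and keys[pos] == rowkey:
                hits.append(rowkey)
    return sorted(hits), seeks


if __name__ == "__main__":
    sample: Regions = {
        "r1": [("cust01|a", 100), ("cust01|b", 205), ("cust02|a", 150)],
        "r2": [("cust03|a", 210), ("cust04|a", 300), ("cust04|b", 220)],
    }
    idx = build_index(sample)
    print(full_scan(sample, 200, 250))          # (hits, rows touched) without the index
    print(index_lookup(sample, idx, 200, 250))  # (hits, seeks) with the index
```

With this sample data, the query from 200 to 250 needs 3 seeks against 6 rows touched by the full scan. Widening the range to cover every row makes the index path do 6 seeks (one per row, potentially each in a different region) on top of the index scan itself. This only counts operations rather than measuring time, but it shows why the index's advantage shrinks as the number of result rows grows.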
