Hi Jimson,

Are you talking about hbase.regionserver.blockCacheHitRatio?
http://hbase.apache.org/book/rs_metrics.html

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Fri, Aug 26, 2011 at 12:21 PM, Jimson K. James <[email protected]> wrote:

> Hi Sonal,
>
> Nice references, thank you :)
> What I'm currently after is the data distribution in HBase. Is there any
> HBase hit ratio measuring tool?
> I'm searching for ways to get the hit ratio per region. Is that possible?
>
> Thanks,
>
> -----Original Message-----
> From: Sonal Goyal [mailto:[email protected]]
> Sent: Friday, August 26, 2011 10:38 AM
> To: [email protected]
> Subject: Re: schema help
>
> Hi Jimson,
>
> Here are a few links that talk about the sorted architecture:
>
> http://wiki.apache.org/hadoop/Hbase/DataModel
> http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
>
> I think the original BigTable paper ought to have some details too. I am
> sorry I haven't read it recently enough to quote it with authority.
>
> Best Regards,
> Sonal
> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> Nube Technologies <http://www.nubetech.co>
> <http://in.linkedin.com/in/sonalgoyal>
>
> On Fri, Aug 26, 2011 at 9:04 AM, Jimson K. James <[email protected]> wrote:
>
> > Hi Ian,
> >
> > Can you point me to some reference on the key-sorted architecture in
> > HBase? There seems to be not much documentation out there.
> >
> > -----Original Message-----
> > From: Ian Varley [mailto:[email protected]]
> > Sent: Thursday, August 25, 2011 8:33 PM
> > To: [email protected]
> > Subject: Re: schema help
> >
> > The rows don't need to be inserted in order; they're maintained in
> > key-sorted order on the disk based on the architecture of HBase, which
> > stores data sorted in memory and periodically flushes to immutable files
> > in HDFS (which are later compacted to make read access more efficient).
> > HBase keeps track of which physical files might contain a given key
> > range, and only reads the ones it needs to.
> >
> > To do a query through the Java API, you could create a scanner with a
> > startrow that is the concatenation of your value for fieldA and the
> > start time, and an endrow that has the current time.
> >
> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
> >
> > Ian
> >
> > On Aug 25, 2011, at 9:53 AM, Rita wrote:
> >
> > Thanks for your response.
> >
> > 30 million rows is the best case :-)
> >
> > A couple of questions about using [fieldA][time] as my key:
> > Would I have to insert in order?
> > If not, how would HBase know to stop scanning the entire table?
> > What would a query actually look like if my key was [fieldA time]?
> >
> > As a matter of fact, I can do 100% of my queries. I will leave the 5%
> > out of my project/schema.
> >
> > On Thu, Aug 25, 2011 at 10:13 AM, Ian Varley <[email protected]> wrote:
> >
> > Rita,
> >
> > There's no need to create separate tables here--the table is really just
> > a "namespace" for keys. A better option would probably be having one
> > table with "[fieldA][time]" (the two fields concatenated) as your row
> > key. Then, you can seek directly to the start of your records in
> > constant time, and then scan forward until you get to the end of the
> > data (linear time in the size of data you expect to get back).
> >
> > The downside of this is that for the 5% of your queries that aren't in
> > this form, you may have to do a full table scan. (Alternately, you could
> > also maintain secondary indexes that help you get the data back with
> > less than a full table scan; that would depend on the nature of the
> > queries).
> >
> > In general, a good rule of thumb when designing a schema in HBase is,
> > think first about how you'd ideally like to access the data. Then
> > structure the data to match that access pattern.
> > (This is obviously not ideal if you have lots of different access
> > patterns, but then, that's what relational databases are for. Most
> > commercial relational DBs wouldn't blink at doing analytical queries
> > against 30 million rows.)
> >
> > Ian
> >
> > On Aug 25, 2011, at 9:03 AM, Rita wrote:
> >
> > Hello,
> >
> > I am trying to solve a time-related problem. I can certainly use
> > OpenTSDB for this but was wondering if anyone had a clever way to
> > create this type of schema.
> >
> > I have an inventory table,
> >
> > time (unix epoch), fieldA, fieldB, data
> >
> > There are about 30 million of these entries.
> >
> > 95% of my queries will look like this:
> > show me where fieldA=zCORE from range [1314180693 to now]
> >
> > for fieldA, there is a possibility of 4000 unique items.
> > for fieldB, there is a possibility of 2 unique items (bool).
> >
> > So, I was thinking of creating 4000*2 tables and placing the data like
> > that so I can easily scan.
> >
> > Any thoughts about this? Will HBase freak out if I have 8000 tables?
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
> >
> > ***** Confidentiality Statement/Disclaimer *****
> >
> > This message and any attachments is intended for the sole use of the
> > intended recipient. It may contain confidential information. Any
> > unauthorized use, dissemination or modification is strictly prohibited. If
> > you are not the intended recipient, please notify the sender immediately
> > then delete it from all your systems, and do not copy, use or print.
> > Internet communications are not secure and it is the responsibility of the
> > recipient to make sure that it is virus/malicious code exempt.
> > The company/sender cannot be responsible for any unauthorized alterations
> > or modifications made to the contents.
> > If you require any form of confirmation of the contents, please contact
> > the company/sender. The company/sender is not liable for any errors or
> > omissions in the content of this message.
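
For reference, the "[fieldA][time]" row key and the start/stop-row scan that Ian describes in the quoted thread can be modeled roughly as below. This is only an in-memory sketch, not HBase client code: a java.util.TreeMap stands in for the key-sorted table, and the class name, the fixed fieldA width of 8 and the 10-digit zero-padded timestamp are all assumptions made for illustration. A real table would use byte[] row keys and the Scan class linked in the thread. The stop row is treated as exclusive here, which is how I understand Scan to behave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model only: a TreeMap stands in for a key-sorted HBase table.
final class RowKeyScanSketch {
    // Hypothetical fixed width for fieldA, so all rows of one fieldA value
    // share the same key prefix and therefore sit next to each other.
    static final int FIELD_A_WIDTH = 8;

    // Builds "[fieldA][time]": fieldA padded to a fixed width, followed by the
    // epoch seconds zero-padded to 10 digits so string order matches numeric
    // order (assumes non-negative timestamps of at most 10 digits).
    static String rowKey(String fieldA, long epochSeconds) {
        if (fieldA.length() > FIELD_A_WIDTH) {
            throw new IllegalArgumentException("fieldA longer than " + FIELD_A_WIDTH);
        }
        return String.format(Locale.ROOT, "%-" + FIELD_A_WIDTH + "s%010d",
                fieldA, epochSeconds);
    }

    // Returns the values for one fieldA with startTime <= time < stopTime.
    // Only the matching key range is visited, not the whole table.
    static List<String> scan(NavigableMap<String, String> table,
                             String fieldA, long startTime, long stopTime) {
        String startRow = rowKey(fieldA, startTime); // inclusive
        String stopRow = rowKey(fieldA, stopTime);   // exclusive
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).values());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        // Inserted out of order on purpose; the map keeps the keys sorted.
        table.put(rowKey("zCORE", 1314180700L), "b");
        table.put(rowKey("aEDGE", 1314180800L), "other");
        table.put(rowKey("zCORE", 1314180000L), "too old");
        table.put(rowKey("zCORE", 1314180693L), "a");

        long now = 1314190000L; // stand-in for the current time
        // now + 1 so that a row written exactly at "now" is still included.
        System.out.println(scan(table, "zCORE", 1314180693L, now + 1));
    }
}
```

Running main prints [a, b]: the rows were inserted out of order, yet the range between the start and stop keys comes back sorted by time, and the rows for the other fieldA value and the older timestamp are never returned.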
