Re: schema help

Sonal Goyal Thu, 25 Aug 2011 23:58:49 -0700

Hi Jimson,

Are you talking about hbase.regionserver.blockCacheHitRatio?

http://hbase.apache.org/book/rs_metrics.html

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Fri, Aug 26, 2011 at 12:21 PM, Jimson K. James <[email protected]> wrote:

> Hi Sonal,
>
> Nice references, thank you :)
> What I'm currently after is the data distribution in HBase. Is there any
> tool for measuring the HBase hit ratio?
> I'm looking for a way to get the hit ratio per region. Is that possible?
>
> Thanks,
>
> -----Original Message-----
> From: Sonal Goyal [mailto:[email protected]]
> Sent: Friday, August 26, 2011 10:38 AM
> To: [email protected]
> Subject: Re: schema help
>
> Hi Jimson,
>
> Here are a few links that talk about the sorted architecture:
>
> http://wiki.apache.org/hadoop/Hbase/DataModel
> http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
>
> I think the original BigTable paper should have some details too; sorry, I
> haven't read it recently enough to quote it with authority.
>
> Best Regards,
> Sonal
>
> On Fri, Aug 26, 2011 at 9:04 AM, Jimson K. James <[email protected]> wrote:
>
> > Hi Ian,
> >
> > Can you point me to some references on the key-sorted architecture in
> > HBase? There doesn't seem to be much documentation out there.
> >
> >
> > -----Original Message-----
> > From: Ian Varley [mailto:[email protected]]
> > Sent: Thursday, August 25, 2011 8:33 PM
> > To: [email protected]
> > Subject: Re: schema help
> >
> > The rows don't need to be inserted in order; they're maintained in
> > key-sorted order on disk by HBase's architecture, which stores data
> > sorted in memory and periodically flushes it to immutable files in HDFS
> > (which are later compacted to make read access more efficient). HBase
> > keeps track of which physical files might contain a given key range,
> > and only reads the ones it needs to.
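To make the write and read path described above concrete, here is a toy Java model of it. It is a sketch under stated assumptions, not HBase's MemStore/HFile code: a TreeMap stands in for the sorted in-memory buffer, each flush produces an immutable sorted copy (a "file"), get() skips any file whose first/last key range cannot contain the key, and compact() merges the files. The names ToySortedStore, flush and compact are hypothetical, and deletes and versions are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the idea above, NOT HBase's real MemStore/HFile code:
// writes go into a sorted in-memory buffer that is flushed to immutable sorted
// "files"; reads skip any file whose [firstKey, lastKey] range cannot hold the key.
final class ToySortedStore {
    // Immutable sorted snapshot of the buffer, taken at flush time.
    static final class SortedFile {
        final TreeMap<String, String> data;

        SortedFile(TreeMap<String, String> snapshot) {
            this.data = new TreeMap<>(snapshot); // copy, so later buffer changes don't leak in
        }

        boolean mayContain(String key) {
            return !data.isEmpty()
                    && key.compareTo(data.firstKey()) >= 0
                    && key.compareTo(data.lastKey()) <= 0;
        }
    }

    private final TreeMap<String, String> memory = new TreeMap<>(); // sorted regardless of insert order
    private final List<SortedFile> files = new ArrayList<>();       // oldest first
    private final int flushThreshold;
    int filesRead = 0; // how many files a get() actually had to look inside

    ToySortedStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        memory.put(key, value);
        if (memory.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        if (memory.isEmpty()) return;
        files.add(new SortedFile(memory));
        memory.clear();
    }

    String get(String key) {
        if (memory.containsKey(key)) return memory.get(key);
        // Newest file first, so the most recent write for a key wins.
        for (int i = files.size() - 1; i >= 0; i--) {
            SortedFile f = files.get(i);
            if (!f.mayContain(key)) continue; // key range rules this file out
            filesRead++;
            String v = f.data.get(key);
            if (v != null) return v;
        }
        return null;
    }

    // Merge every file into one, newer values overwriting older ones (a toy "compaction").
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (SortedFile f : files) merged.putAll(f.data);
        files.clear();
        if (!merged.isEmpty()) files.add(new SortedFile(merged));
    }

    int fileCount() {
        return files.size();
    }

    public static void main(String[] args) {
        ToySortedStore store = new ToySortedStore(2);
        store.put("m", "1");
        store.put("a", "2"); // buffer full: flushed as a file covering [a..m]
        store.put("z", "3");
        store.put("x", "4"); // flushed as a file covering [x..z]
        System.out.println(store.get("a") + ", files read: " + store.filesRead); // prints "2, files read: 1"
    }
}
```

Because the buffer is a sorted map, rows can arrive in any order; sorting happens on write, which is why no ordered insert is needed.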
> >
> > To do a query through the Java API, you could create a scanner with a
> > start row that is the concatenation of your fieldA value and the start
> > time, and a stop row that is the concatenation of your fieldA value and
> > the current time.
> >
> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
> >
> > Ian
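A rough Java sketch of the query Ian describes, under assumptions: a TreeMap with an unsigned byte-by-byte comparator (Arrays.compareUnsigned, Java 9+) stands in for the table, and the key layout (fieldA, a 0x00 separator, 8-byte big-endian epoch seconds) is a hypothetical choice, not something stated in this thread. With the real client, the same two byte arrays would be given to a Scan as its start and stop rows; the stop row is exclusive, which is why the sketch builds it from endTime + 1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.TreeMap;

// A TreeMap ordered like HBase row keys (unsigned, byte-by-byte) stands in for
// the table; subMap(start, stop) plays the role of a scan with start/stop rows.
final class RowRangeScan {
    // Hypothetical key layout: fieldA bytes, a 0x00 separator, then the epoch
    // seconds as an 8-byte big-endian long. Assumes fieldA never contains 0x00
    // and times are non-negative, so byte order matches (fieldA, time) order.
    static byte[] rowKey(String fieldA, long epochSeconds) {
        byte[] a = fieldA.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(a.length + 1 + Long.BYTES)
                .put(a)
                .put((byte) 0)
                .putLong(epochSeconds) // ByteBuffer is big-endian by default
                .array();
    }

    static NavigableMap<byte[], String> newTable() {
        return new TreeMap<byte[], String>(Arrays::compareUnsigned);
    }

    // Rows for fieldA with startTime <= time <= endTime. The stop row is
    // exclusive, so it is built from endTime + 1.
    static NavigableMap<byte[], String> scan(NavigableMap<byte[], String> table,
                                             String fieldA, long startTime, long endTime) {
        return table.subMap(rowKey(fieldA, startTime), true, rowKey(fieldA, endTime + 1), false);
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> table = newTable();
        // Insertion order is irrelevant; the map keeps rows key-sorted.
        table.put(rowKey("zCORE", 1314190000L), "later");
        table.put(rowKey("aBC", 1314185000L), "other item");
        table.put(rowKey("zCORE", 1314180693L), "first");
        table.put(rowKey("zCORE", 1314170000L), "too early");
        long now = System.currentTimeMillis() / 1000L;
        System.out.println(scan(table, "zCORE", 1314180693L, now).values()); // prints "[first, later]"
    }
}
```

Only the rows between the two keys are touched, which answers "how would HBase know to stop": the stop row marks where the sorted run for that fieldA and time window ends.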
> >
> > On Aug 25, 2011, at 9:53 AM, Rita wrote:
> >
> > Thanks for your response.
> >
> > 30 million rows is the best case :-)
> >
> > A couple of questions about using [fieldA][time] as my key:
> >  Would I have to insert in order?
> >  If not, how would HBase know not to scan the entire table?
> >  What would a query actually look like if my key were [fieldA][time]?
> >
> > As a matter of fact, I can make this 100% of my queries; I will leave
> > the other 5% out of my project/schema.
> >
> >
> > On Thu, Aug 25, 2011 at 10:13 AM, Ian Varley <[email protected]> wrote:
> > Rita,
> >
> > There's no need to create separate tables here; the table is really just
> > a "namespace" for keys. A better option would probably be one table with
> > "[fieldA][time]" (the two fields concatenated) as your row key. Then you
> > can seek directly to the start of your records in constant time and scan
> > forward until you reach the end of the data (linear time in the size of
> > the data you expect to get back).
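The "seek, then scan forward" pattern can be sketched in Java with a sorted map as a stand-in for the table: ceilingEntry() performs the seek, and the loop stops at the first key past the range. The string key layout (fieldA, '|', epoch seconds zero-padded to 10 digits) is an assumption chosen so that lexicographic order matches time order; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Sketch of "seek to the start, then scan forward" over a sorted map standing in
// for the table. Hypothetical key layout: fieldA + "|" + epoch seconds zero-padded
// to 10 digits, so string order equals time order within one fieldA.
// Assumes fieldA never contains '|' and times fit in 10 digits.
final class SeekThenScan {
    static String rowKey(String fieldA, long epochSeconds) {
        return String.format(Locale.ROOT, "%s|%010d", fieldA, epochSeconds);
    }

    static List<String> query(TreeMap<String, String> table, String fieldA, long from, long to) {
        List<String> out = new ArrayList<>();
        String stop = rowKey(fieldA, to);
        // Seek: jump to the first key >= the start key (O(log n) for a TreeMap).
        Map.Entry<String, String> e = table.ceilingEntry(rowKey(fieldA, from));
        // Scan forward and stop at the first key past the range, so the work done
        // grows with the number of matching rows, not with the table size.
        while (e != null && e.getKey().compareTo(stop) <= 0) {
            out.add(e.getValue());
            e = table.higherEntry(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("zCORE", 1314170000L), "too early");
        table.put(rowKey("zCORE", 1314180693L), "a");
        table.put(rowKey("zCORE", 1314200000L), "b");
        table.put(rowKey("zCOREX", 1314190000L), "different fieldA");
        System.out.println(query(table, "zCORE", 1314180693L, 1314200000L)); // prints "[a, b]"
    }
}
```

Zero-padding the time is what keeps string order and numeric order in step; with unpadded numbers, "999" would sort after "1000".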
> >
> > The downside is that for the 5% of your queries that aren't in this
> > form, you may have to do a full table scan. (Alternatively, you could
> > maintain secondary indexes that let you get the data back with less
> > than a full table scan; that would depend on the nature of the
> > queries.)
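One way the secondary index Ian mentions could look, sketched in Java with two sorted maps standing in for two tables. The layouts and names are hypothetical: the index row key is [fieldB][time][fieldA] and its value is the main table's row key, so a query on fieldB and a time range becomes a range scan of the index plus point lookups into the main table. Whether this pays off depends on the queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Sketch of a secondary index kept next to the main table, using two sorted maps
// as stand-ins for two HBase tables. Key layouts are hypothetical:
//   main table:  fieldA|time          -> data
//   index table: fieldB|time|fieldA   -> main-table row key
// Assumes '|' never appears in fieldA and times fit in 10 digits.
final class SecondaryIndexSketch {
    final TreeMap<String, String> mainTable = new TreeMap<>();
    final TreeMap<String, String> indexByFieldB = new TreeMap<>();

    static String mainKey(String fieldA, long time) {
        return String.format(Locale.ROOT, "%s|%010d", fieldA, time);
    }

    static String indexPrefix(boolean fieldB, long time) {
        return String.format(Locale.ROOT, "%b|%010d|", fieldB, time);
    }

    void put(long time, String fieldA, boolean fieldB, String data) {
        String rowKey = mainKey(fieldA, time);
        mainTable.put(rowKey, data);
        // The index entry is a second, separate write. Against real tables the two
        // writes are not atomic, so the application has to tolerate or repair drift.
        indexByFieldB.put(indexPrefix(fieldB, time) + fieldA, rowKey);
    }

    // "fieldB = x and from <= time <= to": range-scan the index, then fetch each
    // referenced main-table row, instead of scanning the whole main table.
    List<String> queryByFieldB(boolean fieldB, long from, long to) {
        List<String> out = new ArrayList<>();
        String start = indexPrefix(fieldB, from);
        String stop = indexPrefix(fieldB, to + 1); // exclusive upper bound
        for (String rowKey : indexByFieldB.subMap(start, true, stop, false).values()) {
            out.add(mainTable.get(rowKey));
        }
        return out;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch s = new SecondaryIndexSketch();
        s.put(1314180693L, "zCORE", true, "d1");
        s.put(1314180700L, "aBC", false, "d2");
        s.put(1314190000L, "aBC", true, "d3");
        System.out.println(s.queryByFieldB(true, 1314180000L, 1314200000L)); // prints "[d1, d3]"
    }
}
```

The cost is an extra write per row and the bookkeeping to keep both maps in step; in exchange, the less common query shape no longer needs a full scan.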
> >
> > In general, a good rule of thumb when designing a schema in HBase is to
> > think first about how you'd ideally like to access the data, then
> > structure the data to match that access pattern. (This is obviously not
> > ideal if you have lots of different access patterns, but then, that's
> > what relational databases are for. Most commercial relational DBs
> > wouldn't blink at doing analytical queries against 30 million rows.)
> >
> > Ian
> >
> > On Aug 25, 2011, at 9:03 AM, Rita wrote:
> >
> > Hello,
> >
> > I am trying to solve a time-related problem. I can certainly use
> > OpenTSDB for this, but was wondering if anyone had a clever way to
> > create this type of schema.
> >
> > I have an inventory table:
> >
> > time (unix epoch), fieldA, fieldB, data
> >
> >
> > There are about 30 million of these entries.
> >
> > 95% of my queries will look like this:
> > show me where fieldA=zCORE from range [1314180693 to now]
> >
> > fieldA can take up to 4000 unique values.
> > fieldB can take only 2 unique values (it is a boolean).
> >
> > So I was thinking of creating 4000*2 tables and placing the data like
> > that so I can easily scan.
> >
> > Any thoughts about this? Will HBase freak out if I have 8000 tables?
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
> >
> > ***** Confidentiality Statement/Disclaimer *****
> >
> > This message and any attachments is intended for the sole use of the
> > intended recipient. It may contain confidential information. Any
> > unauthorized use, dissemination or modification is strictly
> prohibited. If
> > you are not the intended recipient, please notify the sender
> immediately
> > then delete it from all your systems, and do not copy, use or print.
> > Internet communications are not secure and it is the responsibility of
> the
> > recipient to make sure that it is virus/malicious code exempt.
> > The company/sender cannot be responsible for any unauthorized
> alterations
> > or modifications made to the contents. If you require any form of
> > confirmation of the contents, please contact the company/sender. The
> > company/sender is not liable for any errors or omissions in the
> content of
> > this message.
> >