On Wed, Jun 29, 2011 at 10:08 PM, Florin P <[email protected]> wrote:
> We have almost the same scenario as Aditya, but with some differences.
> 1. Our files are documents in any format (xls, pdf, doc, html, etc.)
> 2. We are expecting to have more than 5 million of these documents.
This is not many docs. Will your document set be steady-state once it hits 5M?

> 3. The size of them varies like this:
>    70% of them have length < 1MB
>    29% of them have length between 1MB and 10MB
>    1% of them have length > 10MB (they can also reach 100MB)

What David says above holds, though Jack in his yfrog presentation today talks of storing all images in HBase up to 5MB in size. Karthick in his presentation at the Hadoop Summit talked about how once cells cross a certain size -- he didn't say what the threshold was, I believe -- only the metadata is stored in HBase and the content goes to their "big stuff" system.

Try it, I'd say. If there are only a few instances of 100MB, HBase might be fine.

> 4. We have to index all these files.
> 5. We have to extract some metadata from just a subset of them, having as
> input a client key.

One time or ongoing?

St.Ack
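For what it's worth, the pattern Karthick described can be sketched roughly like this. This is a minimal illustration, not his actual system: the 10MB threshold, the column names, and the store interfaces are all assumptions made up for the example.

```python
# Sketch of the "metadata in HBase, big content elsewhere" pattern.
# The 10MB threshold and store interfaces are illustrative assumptions only.

INLINE_THRESHOLD = 10 * 1024 * 1024  # assumed cutoff, not a known HBase limit

def store_document(doc_id, content, hbase_put, blob_put):
    """Store small docs inline in the HBase cell; for big ones, keep only a pointer."""
    meta = {"doc:id": doc_id, "doc:size": str(len(content))}
    if len(content) < INLINE_THRESHOLD:
        meta["doc:content"] = content          # small: content lives in the cell
    else:
        blob_ref = blob_put(doc_id, content)   # big: content goes to the blob store
        meta["doc:blob_ref"] = blob_ref        # the cell holds only the reference
    hbase_put(doc_id, meta)
    return meta

# Toy in-memory stand-ins for the two stores:
hbase = {}
blobs = {}

def blob_put(key, value):
    blobs[key] = value
    return key  # the reference stored back in HBase

store_document("d1", b"x" * 100, hbase.__setitem__, blob_put)
store_document("d2", b"x" * (20 * 1024 * 1024), hbase.__setitem__, blob_put)
```

With a mix like yours (99% under 10MB), most writes take the inline branch, and only the rare large document pays the extra round trip to the blob store.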
