Yes, I am thinking of putting 10 to 15 million files on each
regionserver (well, not literally on it, but controlled by the
regionserver).  That's close to 4 TB worth of regions, which works out
to about 4 GB per region if we target 1000 regions per server.  Note
that not all files are 'hot': I only expect about 1% to be super hot
and 5% relatively hot; the rest are cold.  So in terms of keeping
HBase blocks in RAM that should be adequate, and for the rest we can
afford a trip into HDFS.
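As a quick back-of-the-envelope check, the figures above work out as follows (a sketch only; all numbers are taken from the paragraph):

```java
// Back-of-the-envelope check of the sizing figures above.
public class Sizing {
    static final double TOTAL_GB = 4 * 1024;   // ~4 TB of data per regionserver
    static final int REGIONS = 1000;           // target regions per server

    static double perRegionGb() { return TOTAL_GB / REGIONS; }   // size of one region
    static double superHotGb()  { return TOTAL_GB * 0.01; }      // 1% super hot
    static double warmGb()      { return TOTAL_GB * 0.05; }      // 5% relatively hot

    public static void main(String[] args) {
        System.out.printf("%.1f GB/region, %.0f GB super hot, %.0f GB warm%n",
                perRegionGb(), superHotGb(), warmGb());
    }
}
```

So only on the order of 40 GB needs to stay truly hot, which is why serving the cold tail from HDFS looks affordable.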

If servers are running 8 GB of RAM and are shared between
regionservers and datanodes, how much heap should I allocate to each?
6 GB for the RS and 1 GB for the DN?
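For reference, a minimal sketch of that split, assuming the stock hbase-env.sh / hadoop-env.sh heap variables (values are in MB):

```shell
# hbase-env.sh -- regionserver JVM heap: 6 GB of the 8 GB box
export HBASE_HEAPSIZE=6000

# hadoop-env.sh -- Hadoop daemon (datanode) heap: 1 GB
export HADOOP_HEAPSIZE=1000
```

That leaves roughly 1 GB for the OS and page cache, which is tight; worth watching for swapping.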

Also, on the question of whether 8 cores x 16 GB RAM helps a master
server bring up the cluster faster, the answer is definitely yes.  It
took only 90 seconds to load 5000 regions across 13 servers, whereas
the same task on a dual-core, 8 GB RAM machine took nearly 10 minutes.

-Jack



On Tue, Sep 21, 2010 at 11:38 PM, Stack <[email protected]> wrote:
> On Tue, Sep 21, 2010 at 11:11 PM, Jack Levin <[email protected]> wrote:
>> It's definitely binary, and I can even load it in my browser by
>> setting appropriate headers.  So I guess for PUT and GET via Accept:
>> application/octet-stream there is no base64 encoding at all.
>>
>
> OK.  Good.  If it were base64'd, you'd see it.
>
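The octet-stream behavior discussed above can be exercised with curl against the HBase REST gateway (Stargate); the host, table, row, and column below are made up, and 8080 is assumed to be the gateway's port:

```shell
# GET a cell as raw bytes (no base64 wrapping):
curl -H "Accept: application/octet-stream" \
  http://stargate-host:8080/mytable/row1/cf:qual

# PUT raw bytes back into the same cell:
curl -X PUT -H "Content-Type: application/octet-stream" \
  --data-binary @image.jpg \
  http://stargate-host:8080/mytable/row1/cf:qual
```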
>> Btw, out of curiosity, I have the region max file size set to 1GB
>> now, but what if I set it to, say, 10G or 50G?  Is there significant
>> overhead in address seeking via HDFS?
>>
>
> You could do that.  We don't have much experience running regions of
> that size.  You should for sure pre-split your table on creation if
> you go this route (see the HBaseAdmin API [1]).  This method is not
> available in the shell, so you'd have to script it or write a little
> Java to do it.
>
> St.Ack
>
> 1. http://hbase.apache.org/docs/r0.89.20100726/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
>
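As an illustration of the pre-split suggestion above, here is a sketch of the "little Java" involved. The split-key generation is plain Java (it assumes row keys spread uniformly over the byte range); the HBaseAdmin calls, which need a live cluster, are shown in comments, and the table and family names are made up:

```java
import java.util.Arrays;

// Sketch: generate N-1 evenly spaced single-byte split keys so the
// table is created pre-split into N regions.  The cluster-side calls
// would look roughly like:
//
//   HBaseAdmin admin = new HBaseAdmin(conf);
//   HTableDescriptor desc = new HTableDescriptor("mytable");
//   desc.addFamily(new HColumnDescriptor("cf"));
//   admin.createTable(desc, makeSplits(16));   // createTable(desc, byte[][])
public class PreSplit {
    static byte[][] makeSplits(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // i-th boundary, evenly spaced across the 0..255 byte range
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[][] splits = makeSplits(16);
        System.out.println(splits.length + " split points");
        System.out.println("first: " + Arrays.toString(splits[0]));
    }
}
```

Real row keys are rarely uniform bytes, so in practice the split points would come from sampling the actual key distribution.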
