On Mon, Jun 6, 2011 at 7:57 AM, Mat Hofschen <[email protected]> wrote:
> Hello,
>
> the HBase book (http://hbase.apache.org/book.html) suggests increasing
> hbase.hregion.max.filesize to a large value (> 1G).

Yes.  If you want to cut down on the number of regions in a table,
this is a good recommendation to follow.

> Then there are many
> suggestions on the mailing list to keep dfs.block.size set at 64M. What is
> the relationship between the two values? How does HBase prevent lots of
> network traffic if there are up to 18 DFS blocks per region?
>

Going by the above, it doesn't look like you need me to tell you what
the relationship is; you know it already.

HBase does nothing to cut down on the network traffic (Wouldn't the
traffic effectively be the same whatever the block size?).

With bigger block sizes, there'd be fewer socket setups, which would
probably be good, and there'd be less accounting for the NN to do.

Most clusters run the default 64MB block sizes (though where I work,
we are 128MBers).
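
As a rough illustration of the block-count arithmetic, here is a minimal
Python sketch. It assumes a single store file sitting exactly at the size
limit, which is a simplification; the helper name blocks_per_file is made
up for this example and is not an HBase or HDFS API.

```python
MB = 1024 * 1024

def blocks_per_file(file_bytes, block_bytes):
    """Number of HDFS blocks needed to hold a file of file_bytes (ceiling division)."""
    return -(-file_bytes // block_bytes)

# Assumption: one store file of exactly 1G, compared at 64MB and 128MB blocks.
region_file = 1024 * MB
for block_mb in (64, 128):
    count = blocks_per_file(region_file, block_mb * MB)
    print(f"{block_mb}MB blocks: {count} per 1G file")
```

The bytes moved are the same either way; what changes is how many blocks
(and so how many connections and NN entries) the same data is spread over.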

(Andrew, did you report a performance penalty running with bigger blocks?)


> Right now we operate a cluster with 50 nodes, dfs.block.size=64M and
> hbase.hregion.max.filesize=134M. We have one large table that has over 50000
> regions.

So you are running about 1k regions per server?  It'd probably be
better to up your region size, yes (fewer regions for the cluster to
keep account of is usually a 'good' thing).
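
The 1k figure is just the quoted numbers divided out. A minimal Python
sketch, assuming regions are spread evenly across the servers (the
function name regions_per_server is hypothetical, for illustration only):

```python
def regions_per_server(total_regions, servers):
    """Average number of regions each server carries, assuming an even spread."""
    return total_regions / servers

# Figures from the quoted message: over 50000 regions on a 50-node cluster.
print(regions_per_server(50000, 50))
```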

> That seems way too many. So according to the HBase book we ought to
> be able to increase hbase.hregion.max.filesize to 1G and benefit from much
> fewer splits. Our cluster is very write-heavy, which means we currently
> are splitting all the time.
>

Yeah.  Fewer splits is usually also a 'good' thing.  If you are
write-heavy with a good distribution of writes, then yes, you should
go to bigger regions.
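
For a back-of-the-envelope feel of what bigger regions would mean for the
table above, here is a minimal Python sketch. It assumes the same total
data volume and that regions fill to the same fraction of the limit before
and after, so the region count scales inversely with the limit. That is
only an estimate of where the table would settle if it were laid out under
the larger limit; the function name estimate_regions is hypothetical.

```python
def estimate_regions(current_regions, old_max_mb, new_max_mb):
    """Scale the region count inversely with the max region size (rounded up)."""
    return -(-(current_regions * old_max_mb) // new_max_mb)

# Figures from the thread: 50000 regions at 134M, moving to 1G (1024M), 50 nodes.
estimated = estimate_regions(50000, 134, 1024)
print("estimated regions:", estimated)
print("estimated regions per server:", estimated / 50)
```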

St.Ack
