I am not sure if I understood this right, but does changing hfile.block.cache.size also help?
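(For context, hfile.block.cache.size controls the fraction of the region server heap given to the HFile block cache, which serves reads rather than writes. A sketch of how it would appear in hbase-site.xml — the value shown is only illustrative, not a recommendation from the thread:

<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of region server heap used for the read-side block cache;
       the value below is just an example -->
  <value>0.1</value>
</property>
)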
On 5/27/10 3:27 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:

Well, we do have a couple of other configs for high write throughput:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>8</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>60</value>
</property>
<property>
  <name>hbase.regions.percheckin</name>
  <value>100</value>
</property>

The last one is for restarts. When uploading very fast, you are more likely to hit the upper limits (blocking store files and memstore), and that will lower your throughput; those configs relax that.

Also, for speedier uploads we disable writing to the WAL:
http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean)
If the job or any machine fails, you'll have to restart it or figure out what was lost, and you absolutely need to force flushes when the MR job is done.

J-D

On Thu, May 27, 2010 at 2:57 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> Thanks J-D
>
> Currently we are trying to find/optimize our load/write times - although in
> prod we expect a 25/75 (writes/reads) ratio.
> We are using the long table model with only one column - row size is typically
> ~4-5k.
>
> As to your suggestion of not using even 50% of the disk space - I agree and was
> planning to use only ~30-40% (1.5T of 4T) for HDFS,
> and as I reported earlier:
> 4000 regions @ 256M per region (with 3 replications) on 20 nodes == 150G
> per node == 10% utilization.
>
> While using 1GB as maxfilesize, did you have to adjust other params such
> as hbase.hstore.compactionThreshold and hbase.hregion.memstore.flush.size?
> There is an interesting observation by Jonathan Gray documented/reported in
> HBASE-2375 -
> wondering whether that issue gets compounded when using 1G as
> hbase.hregion.max.filesize.
>
> Thx
> Jacob
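(To make the setWriteToWAL / forced-flush part of J-D's reply concrete, here is a minimal sketch against the 0.20.x client API. The table name, column family, row key, and payload are made up for illustration; HBaseAdmin.flush is used as the "force flushes when the MR is done" step.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalUploadSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();

    // "docs", "d", "content", and the row key are hypothetical names, not from the thread.
    HTable table = new HTable(conf, "docs");
    table.setAutoFlush(false);        // batch puts in the client-side write buffer

    Put put = new Put(Bytes.toBytes("row-00000001"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("content"), Bytes.toBytes("payload"));
    put.setWriteToWAL(false);         // skip the WAL: faster, but edits still sitting in
                                      // the memstore are lost if a region server dies
                                      // before they are flushed
    table.put(put);

    table.flushCommits();             // drain the client-side write buffer

    // Once the MR job is done, force a flush so the un-logged edits are
    // persisted from the memstores down to HFiles.
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.flush("docs");
  }
}

In an actual MapReduce upload the puts would go out of the map or reduce tasks rather than a main method, but the setWriteToWAL(false) call per Put and the flush after the job are the relevant pieces.)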