Like I said in my first email, it helps random reads when lots of RAM is available to HBase, but it won't help write throughput.

J-D
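(For what it's worth, that knob goes in hbase-site.xml like the others quoted below; a sketch with an example value — if I recall, the 0.20 default is 0.2, i.e. 20% of the region server heap:)

<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of the region server heap given to the block cache;
       0.4 here is just an example, not a recommendation -->
  <value>0.4</value>
</property>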
On Fri, May 28, 2010 at 10:12 AM, Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com> wrote:
> I am not sure if I understood this right, but does changing
> hfile.block.cache.size also help?
>
>
> On 5/27/10 3:27 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:
>
> Well, we do have a couple of other configs for high write throughput:
>
> <property>
>   <name>hbase.hstore.blockingStoreFiles</name>
>   <value>15</value>
> </property>
> <property>
>   <name>hbase.hregion.memstore.block.multiplier</name>
>   <value>8</value>
> </property>
> <property>
>   <name>hbase.regionserver.handler.count</name>
>   <value>60</value>
> </property>
> <property>
>   <name>hbase.regions.percheckin</name>
>   <value>100</value>
> </property>
>
> The last one is for restarts. When uploading very fast, you are more
> likely to hit the upper limits (blocking store files and memstore), and
> this will lower your throughput. Those configs relax that. Also, for
> speedier uploads, we disable writing to the WAL:
> http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean).
> If the job or any machine fails you'll have to restart it or
> figure out what's missing, and you absolutely need to force flushes when
> the MR job is done.
>
> J-D
>
> On Thu, May 27, 2010 at 2:57 PM, Jacob Isaac <ja...@ebrary.com> wrote:
>> Thanks J-D
>>
>> Currently we are trying to optimize our load/write times, although in
>> prod we expect a 25/75 (writes/reads) ratio.
>> We are using the long table model with only one column; row size is
>> typically ~4-5 KB.
>>
>> As to your suggestion on not using even 50% of disk space: I agree, and was
>> planning to use only ~30-40% (1.5 TB of 4 TB) for HDFS,
>> and as I reported earlier,
>> 4000 regions @ 256 MB per region (with 3 replications) on 20 nodes == 150 GB
>> per node == 10% utilization.
>>
>> While using 1 GB as maxfilesize, did you have to adjust other params such
>> as hbase.hstore.compactionThreshold and hbase.hregion.memstore.flush.size?
>> There is an interesting observation by Jonathan Gray documented/reported
>> in HBASE-2375 -
>> wondering whether that issue gets compounded when using 1 GB as the
>> hbase.hregion.max.filesize.
>>
>> Thx
>> Jacob
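(Putting the WAL and flush advice above into code, here is a minimal sketch against the 0.20 client API. The table name "docs", family "d", row key, and write-buffer size are made-up examples, not anything from this thread:)

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FastUpload {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();

    HTable table = new HTable(conf, "docs");     // hypothetical table name
    table.setAutoFlush(false);                   // buffer puts client-side
    table.setWriteBufferSize(12 * 1024 * 1024);  // example: 12 MB buffer

    Put put = new Put(Bytes.toBytes("row-00001"));
    put.setWriteToWAL(false);  // skip the WAL for speed; anything not yet
                               // flushed is lost if a region server dies
    put.add(Bytes.toBytes("d"), Bytes.toBytes("content"),
            new byte[4500]);   // ~4-5 KB value, like the rows described above
    table.put(put);

    table.flushCommits();      // drain the client-side write buffer

    // Once the whole upload is done, force the memstores to disk: with the
    // WAL off, flushing is the only thing that makes the data durable.
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.flush("docs");
  }
}

(The tradeoff is as stated above: if the job or a machine dies mid-upload, there is no log to replay, so you re-run or reconcile what's missing — which is why the forced flush at the end is not optional.)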