Hi J-D

The run was done on a reformatted HDFS.

Disabling the WAL is not an option for us because this will be our normal mode
of operation and durability is important to us.
'Upload' was a poor choice of words on my part; it is more like
periodic/continuous writes.

hbase.regionserver.maxlogs was set to 256, although
hbase.regionserver.hlog.blocksize was left at the default.
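
For my own sanity, here is the back-of-envelope I'm using for the log-rolling
point (assuming the HLog rolls at roughly the blocksize and that the default
blocksize is the 64 MB HDFS block size - both assumptions on my part):

  // Back-of-envelope only; assumes the HLog rolls at roughly the block
  // size and that up to maxlogs logs are kept before flushes are forced.
  long blockSize = 64L * 1024 * 1024;   // assumed default hlog blocksize (HDFS block size)
  int  maxLogs   = 256;                 // our current setting
  long walBytes  = blockSize * maxLogs; // ~16 GB of WAL retained before forced flushes
  // J-D's suggested 128 MB blocksize / 128 maxlogs keeps roughly the same
  // total but halves how often the log has to roll for the same data written.
  System.out.println("WAL retained: " + walBytes / (1024L * 1024 * 1024) + " GB");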

We did not use compression, and autoflush is at its default (true), so we are
not using the client-side write buffer yet.
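
If we do end up turning autoflush off, my understanding of the buffered path
is roughly the sketch below (0.20-era client API; the table name and buffer
size are placeholders, not our settings):

  import java.util.List;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;

  // Sketch of the buffered-write path (durability unchanged: the server
  // still writes the WAL for each edit).
  public class BufferedWriteSketch {
    public static void write(List<Put> puts) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "mytable");
      table.setAutoFlush(false);                  // stop doing one RPC per put()
      table.setWriteBufferSize(2L * 1024 * 1024); // ~2 MB client-side buffer
      for (Put put : puts) {
        table.put(put);                           // buffered locally until the buffer fills
      }
      table.flushCommits();                       // push whatever is still buffered
      table.close();
    }
  }

With autoflush left on (our current setting), every put() is its own RPC,
which I gather is why the write buffer keeps coming up below.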

Each of the 20 nodes is running a custom server program that reads from and
writes to HBase, with a maximum of 6 write threads and 1 read thread per node.
I also wanted to point out that in the current tests we are writing to two
tables and reading from only one.
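
The per-thread write path looks roughly like the sketch below (simplified, not
our actual code; the family/qualifier names are placeholders, and each thread
opens its own HTable since HTable isn't thread-safe):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  // Rough shape of one writer thread: each thread gets its own HTable
  // because HTable instances are not safe to share between threads.
  public class WriterThread extends Thread {
    private final String tableName;
    public WriterThread(String tableName) { this.tableName = tableName; }
    public void run() {
      try {
        HTable table = new HTable(new HBaseConfiguration(), tableName);
        for (int i = 0; i < 1000; i++) {          // stand-in for the real work loop
          Put put = new Put(Bytes.toBytes("row-" + i));
          put.add(Bytes.toBytes("content"), Bytes.toBytes("doc"),
                  Bytes.toBytes("..."));          // placeholder family/qualifier/value
          table.put(put);
        }
        table.close();
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
  }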

~Jacob

On Fri, May 28, 2010 at 12:42 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:

> If the table was already created, changing hbase.hregion.max.filesize
> and hbase.hregion.memstore.flush.size won't be taken into account;
> those settings only provide the defaults for new tables. You can also
> set them per table in the shell; see the "alter" command.
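
(Noting for myself how I'd do that from the Java client rather than the shell -
a sketch only, with API names as I remember them from the 0.20 client and a
placeholder table name:)

  // Sketch: raise MAX_FILESIZE on an existing table; the table must be
  // disabled while its descriptor is modified.
  HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
  byte[] name = Bytes.toBytes("mytable");
  admin.disableTable(name);
  HTableDescriptor htd = admin.getTableDescriptor(name);
  htd.setMaxFileSize(1073741824L);   // 1 GB regions
  admin.modifyTable(name, htd);      // the memstore flush size attribute can be set the same way
  admin.enableTable(name);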
>
> Also, did you restart HBase? Did you push the configs to all nodes?
> Did you disable writing to the WAL? If not, because durability is
> still important to you but you want to upload as fast as you can, I
> would recommend changing this too:
>
> hbase.regionserver.hlog.blocksize 134217728
>
> hbase.regionserver.maxlogs 128
>
> I forgot you had quite largish values, so that must affect the log
> rolling a _lot_.
>
> Finally, did you LZO the table? In my experience, it can only do
> good: http://wiki.apache.org/hadoop/UsingLzoCompression
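
(For reference, my understanding of how a family would be created with LZO once
the native libraries from that wiki page are installed - a sketch against the
0.20-era API, and the table/family names are placeholders:)

  // Sketch: create a table whose single column family is LZO-compressed.
  HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
  HTableDescriptor htd = new HTableDescriptor("mytable");
  HColumnDescriptor hcd = new HColumnDescriptor("content");
  hcd.setCompressionType(Compression.Algorithm.LZO); // needs the native LZO libs on every node
  htd.addFamily(hcd);
  admin.createTable(htd);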
>
> And finally (for real this time), how are you uploading to HBase? How
> many clients? Are you even using the write buffer?
>
> http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)
>
> J-D
>
> On Fri, May 28, 2010 at 12:28 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> > Did a run yesterday, posted the relevant parameters below.
> > Did not see any difference in throughput or total run time (~9 hrs)
> >
> > I am consistently getting about 5k rows/sec, each row around 4-5 KB,
> > using a 17-node HBase cluster on a 20-node HDFS cluster.
> >
> > How does that compare? Can I juice it more?
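
(Spelling out what that rate means in raw-disk terms - my own back-of-envelope,
ignoring WAL traffic and compaction rewrites:)

  // Back-of-envelope only; ignores WAL traffic and compaction rewrites.
  double rowsPerSec = 5000.0;
  double kbPerRow   = 4.5;                          // rows are ~4-5 KB
  double hbaseMbSec = rowsPerSec * kbPerRow / 1024; // ~22 MB/s into HBase
  double hdfsMbSec  = hbaseMbSec * 3;               // ~66 MB/s with 3x HDFS replication
  System.out.println("per datanode: " + hdfsMbSec / 20 + " MB/s"); // ~3.3 MB/s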
> >
> > ~Jacob
> >
> >
> >  <property>
> >    <name>hbase.regionserver.handler.count</name>
> >    <value>60</value>
> >  </property>
> >
> >  <property>
> >    <name>hbase.hregion.max.filesize</name>
> >    <value>1073741824</value>
> >  </property>
> >
> >  <property>
> >    <name>hbase.hregion.memstore.flush.size</name>
> >    <value>100663296</value>
> >  </property>
> >
> >  <property>
> >    <name>hbase.hstore.blockingStoreFiles</name>
> >    <value>15</value>
> >  </property>
> >
> >  <property>
> >    <name>hbase.hstore.compactionThreshold</name>
> >    <value>4</value>
> >  </property>
> >
> >  <property>
> >    <name>hbase.hregion.memstore.block.multiplier</name>
> >    <value>8</value>
> >  </property>
> >
> >
> >
> > On Fri, May 28, 2010 at 10:15 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >
> >> Like I said in my first email, it helps for random reading when lots
> >> of RAM is available to HBase. But it won't help the write throughput.
> >>
> >> J-D
> >>
> >> On Fri, May 28, 2010 at 10:12 AM, Vidhyashankar Venkataraman
> >> <vidhy...@yahoo-inc.com> wrote:
> >> > I am not sure if I understood this right, but does changing
> >> > hfile.block.cache.size also help?
> >> >
> >> >
> >> > On 5/27/10 3:27 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:
> >> >
> >> > Well we do have a couple of other configs for high write throughput:
> >> >
> >> > <property>
> >> >  <name>hbase.hstore.blockingStoreFiles</name>
> >> >  <value>15</value>
> >> > </property>
> >> > <property>
> >> >  <name>hbase.hregion.memstore.block.multiplier</name>
> >> >  <value>8</value>
> >> > </property>
> >> > <property>
> >> >  <name>hbase.regionserver.handler.count</name>
> >> >  <value>60</value>
> >> > </property>
> >> > <property>
> >> >  <name>hbase.regions.percheckin</name>
> >> >  <value>100</value>
> >> > </property>
> >> >
> >> > The last one is for restarts. When uploading very fast, you are more
> >> > likely to hit the upper limits (blocking store files and memstore),
> >> > and that will lower your throughput; those configs relax them. Also,
> >> > for speedier uploads we disable writing to the WAL:
> >> >
> >> > http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean)
> >> >
> >> > If the job fails or any machine fails, you'll have to restart it or
> >> > figure out what is missing, and you absolutely need to force flushes
> >> > when the MR job is done.
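
(For anyone following along, the per-Put switch mentioned above looks roughly
like this - a sketch against the 0.20 client API with placeholder arguments; as
noted at the top, we are keeping the WAL on:)

  // Sketch only: skip the WAL for a single Put.
  static void putWithoutWal(HTable table, byte[] row, byte[] fam,
                            byte[] qual, byte[] val) throws IOException {
    Put put = new Put(row);
    put.add(fam, qual, val);
    put.setWriteToWAL(false);   // faster, but the edit is lost if the regionserver dies before a flush
    table.put(put);
  }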
> >> >
> >> > J-D
> >> >
> >> > On Thu, May 27, 2010 at 2:57 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> >> >> Thanks J-D
> >> >>
> >> >> Currently we are trying to find/optimize our load/write times,
> >> >> although in prod we expect a 25/75 (writes/reads) ratio.
> >> >> We are using a long-table model with only one column; row size is
> >> >> typically ~4-5 KB.
> >> >>
> >> >> As to your suggestion on not using even 50% of disk space - I agree,
> >> >> and was planning to use only ~30-40% (1.5T of 4T) for HDFS.
> >> >> As I reported earlier:
> >> >> 4000 regions @ 256M per region (with 3x replication) on 20 nodes ==
> >> >> 150G per node == 10% utilization
> >> >>
> >> >> While using 1GB as the max filesize, did you have to adjust other
> >> >> params such as hbase.hstore.compactionThreshold and
> >> >> hbase.hregion.memstore.flush.size?
> >> >> There is an interesting observation by Jonathan Gray
> >> >> documented/reported in HBASE-2375 -
> >> >> wondering whether that issue gets compounded when using 1G as
> >> >> hbase.hregion.max.filesize.
> >> >>
> >> >> Thx
> >> >> Jacob
> >> >>
> >> >>
> >> >
> >> >
> >>
> >
>
