Vidhya - This is using the HBase API.

J-D - I do have timing info for inserts and gets - let me process the
data and I will post the results.

~Jacob
On Fri, May 28, 2010 at 1:16 PM, Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com> wrote:
> Jacob,
>   Just curious: Is your observed upload throughput that of bulk importing
> or of using the HBase API?
> Thanks
> Vidhya
>
> On 5/28/10 1:13 PM, "Jacob Isaac" <ja...@ebrary.com> wrote:
>
> Hi J-D
>
> The run was done on a reformatted HDFS.
>
> Disabling the WAL is not an option for us because this will be our
> normal mode of operation, and durability is important to us.
> 'Upload' was a poor choice of words on my part - it is more like
> periodic/continuous writes.
>
> hbase.regionserver.maxlogs was 256, although
> hbase.regionserver.hlog.blocksize was the default.
>
> Did not use compression. And autoflush is the default (true).
>
> Each of the 20 nodes is running a custom server program that is
> reading and writing to HBase: a max of 6 write threads per node and
> 1 thread reading. Also wanted to point out that in the current tests
> we are writing to two tables and reading from only one.
>
> ~Jacob
>
> On Fri, May 28, 2010 at 12:42 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>
> > If the table was already created, changing hbase.hregion.max.filesize
> > and hbase.hregion.memstore.flush.size won't be considered; those are
> > the default values for new tables only. You can set them in the shell
> > too, see the "alter" command.
> >
> > Also, did you restart HBase? Did you push the configs to all nodes?
> > Did you disable writing to the WAL? If not, because durability is
> > still important to you but you want to upload as fast as you can, I
> > would recommend changing these too:
> >
> > hbase.regionserver.hlog.blocksize  134217728
> >
> > hbase.regionserver.maxlogs  128
> >
> > I forgot you had quite largish values, so that must affect the log
> > rolling a _lot_.
> >
> > Finally, did you LZO the table? From experience, it will only do
> > good: http://wiki.apache.org/hadoop/UsingLzoCompression
> >
> > And finally (for real this time), how are you uploading to HBase? How
> > many clients? Are you even using the write buffer?
> >
> > http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)
> >
> > J-D
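As an aside, the write-buffer pattern J-D is pointing at with that
setAutoFlush link looks roughly like the sketch below, written against
the 0.20-era client API. This is a minimal sketch, not code from this
thread; the table name, column family, row-key scheme, and buffer size
are made-up examples:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedUpload {
        public static void main(String[] args) throws IOException {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, "mytable");   // hypothetical table name

            // Buffer puts client-side instead of sending one RPC per row.
            table.setAutoFlush(false);
            table.setWriteBufferSize(12 * 1024 * 1024);   // 12 MB; the 0.20 default is 2 MB

            for (int i = 0; i < 100000; i++) {
                Put put = new Put(Bytes.toBytes(String.format("row-%09d", i)));
                // ~4-5k values, like the rows described in this thread
                put.add(Bytes.toBytes("content"), Bytes.toBytes("data"), new byte[4500]);
                table.put(put);                           // queued; sent when the buffer fills
            }

            table.flushCommits();                         // send whatever is still buffered
        }
    }

With autoflush left at its default of true (as in Jacob's setup), every
put is its own round trip, which by itself can cap throughput well below
what the cluster can absorb.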
> > On Fri, May 28, 2010 at 12:28 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> > > Did a run yesterday; posted the relevant parameters below.
> > > Did not see any difference in throughput or total run time (~9 hrs).
> > >
> > > I am consistently getting about 5k rows/sec, each row around ~4-5k,
> > > using a 17-node HBase on a 20-node HDFS cluster.
> > >
> > > How does it compare?? Can I juice it more?
> > >
> > > ~Jacob
> > >
> > > <property>
> > >   <name>hbase.regionserver.handler.count</name>
> > >   <value>60</value>
> > > </property>
> > >
> > > <property>
> > >   <name>hbase.hregion.max.filesize</name>
> > >   <value>1073741824</value>
> > > </property>
> > >
> > > <property>
> > >   <name>hbase.hregion.memstore.flush.size</name>
> > >   <value>100663296</value>
> > > </property>
> > >
> > > <property>
> > >   <name>hbase.hstore.blockingStoreFiles</name>
> > >   <value>15</value>
> > > </property>
> > >
> > > <property>
> > >   <name>hbase.hstore.compactionThreshold</name>
> > >   <value>4</value>
> > > </property>
> > >
> > > <property>
> > >   <name>hbase.hregion.memstore.block.multiplier</name>
> > >   <value>8</value>
> > > </property>
> > >
> > > On Fri, May 28, 2010 at 10:15 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> > >
> > >> Like I said in my first email, it helps for random reading when lots
> > >> of RAM is available to HBase. But it won't help the write throughput.
> > >>
> > >> J-D
> > >>
> > >> On Fri, May 28, 2010 at 10:12 AM, Vidhyashankar Venkataraman
> > >> <vidhy...@yahoo-inc.com> wrote:
> > >> > I am not sure if I understood this right, but does changing
> > >> > hfile.block.cache.size also help?
> > >> >
> > >> > On 5/27/10 3:27 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:
> > >> >
> > >> > Well, we do have a couple of other configs for high write throughput:
> > >> >
> > >> > <property>
> > >> >   <name>hbase.hstore.blockingStoreFiles</name>
> > >> >   <value>15</value>
> > >> > </property>
> > >> > <property>
> > >> >   <name>hbase.hregion.memstore.block.multiplier</name>
> > >> >   <value>8</value>
> > >> > </property>
> > >> > <property>
> > >> >   <name>hbase.regionserver.handler.count</name>
> > >> >   <value>60</value>
> > >> > </property>
> > >> > <property>
> > >> >   <name>hbase.regions.percheckin</name>
> > >> >   <value>100</value>
> > >> > </property>
> > >> >
> > >> > The last one is for restarts. Uploading very fast, you will more
> > >> > likely hit all the upper limits (blocking store files and memstore)
> > >> > and this will lower your throughput. Those configs relax that.
> > >> > Also, for speedier uploads we disable writing to the WAL:
> > >> >
> > >> > http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean)
> > >> >
> > >> > If the job fails or any machine fails you'll have to restart it or
> > >> > figure out the holes, and you absolutely need to force flushes when
> > >> > the MR job is done.
> > >> >
> > >> > J-D
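For reference, the WAL-skipping put J-D links to looks roughly like the
sketch below (0.20-era API; the table, family, and row key are made-up
examples, and the flush call at the end is, if memory serves, the
programmatic counterpart of the shell's flush command):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalOffUpload {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, "mytable");   // hypothetical table name

            Put put = new Put(Bytes.toBytes("row-00001"));
            put.setWriteToWAL(false);  // no WAL entry: faster, but the row is lost
                                       // if its region server dies before a flush
            put.add(Bytes.toBytes("content"), Bytes.toBytes("data"), Bytes.toBytes("value"));
            table.put(put);

            // Once the upload is done, force the memstores out to HFiles;
            // rows that skipped the WAL are not durable until then.
            new HBaseAdmin(conf).flush("mytable");        // like `flush 'mytable'` in the shell
        }
    }

This is exactly the durability trade-off Jacob declines above, which is
why it tends to fit one-shot bulk loads that can be rerun on failure
rather than continuous writes.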
> > >> > On Thu, May 27, 2010 at 2:57 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> > >> >> Thanks J-D
> > >> >>
> > >> >> Currently we are trying to find/optimize our load/write times -
> > >> >> although in prod we expect it to be a 25/75 (writes/reads) ratio.
> > >> >> We are using a long-table model with only one column - row size
> > >> >> is typically ~4-5k.
> > >> >>
> > >> >> As to your suggestion on not using even 50% of disk space - I
> > >> >> agree, and was planning to use only ~30-40% (1.5T of 4T) for
> > >> >> HDFS, and as I reported earlier:
> > >> >> 4000 regions @ 256M per region (with 3x replication) on 20 nodes
> > >> >> == 150G per node == 10% utilization.
> > >> >>
> > >> >> While using 1GB as the maxfilesize, did you have to adjust other
> > >> >> params such as hbase.hstore.compactionThreshold and
> > >> >> hbase.hregion.memstore.flush.size?
> > >> >> There is an interesting observation by Jonathan Gray
> > >> >> documented/reported in HBASE-2375 - wondering whether that issue
> > >> >> gets compounded when using 1G as the hbase.hregion.max.filesize.
> > >> >>
> > >> >> Thx
> > >> >> Jacob
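Finally, since J-D asks twice about LZO and Jacob reports running
without compression: compression is a per-column-family setting chosen
at table creation (or via alter). A rough sketch against the 0.20 API,
assuming the native LZO libraries from the UsingLzoCompression wiki
page are installed on every node; the table and family names are
made up:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreateLzoTable {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();

            // LZO-compressed column family; regions fail to open on any
            // node where the native LZO codec is missing.
            HColumnDescriptor family = new HColumnDescriptor(Bytes.toBytes("content"));
            family.setCompressionType(Compression.Algorithm.LZO);

            HTableDescriptor desc = new HTableDescriptor("mytable");
            desc.addFamily(family);
            new HBaseAdmin(conf).createTable(desc);
        }
    }

For values in the 4-5k range, LZO usually shrinks store files enough to
pay for itself on both writes and reads, which is presumably why J-D
says it "will only do good".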