Did a run yesterday with the parameters posted below.
Did not see any difference in throughput or total run time (~9 hrs).

I am consistently getting about 5k rows/sec, each row around 4-5k,
using a 17-node HBase on a 20-node HDFS cluster.
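
For comparison purposes, the figures above can be turned into byte rates. This is only a rough sketch: it assumes "4-5k" means 4,000 to 5,000 bytes per row, that all 17 HBase nodes are region servers taking an even share of the writes, and that the rate held for the full 9 hours. None of those assumptions is confirmed in the thread.

```python
# Back-of-the-envelope write throughput from the reported figures.
ROWS_PER_SEC = 5_000              # reported: ~5k rows/sec
ROW_BYTES_RANGE = (4_000, 5_000)  # assumption: "4-5k" read as bytes per row
REGION_SERVERS = 17               # assumption: every HBase node is a region server
RUN_HOURS = 9                     # reported: ~9 hrs total run time


def throughput(rows_per_sec, row_bytes, servers, hours):
    """Return cluster-wide and per-server rates plus run totals."""
    bytes_per_sec = rows_per_sec * row_bytes
    seconds = hours * 3600
    return {
        "cluster_mb_per_sec": bytes_per_sec / 1e6,
        "per_server_mb_per_sec": bytes_per_sec / servers / 1e6,
        "total_rows": rows_per_sec * seconds,
        "total_gb": bytes_per_sec * seconds / 1e9,  # before HDFS replication
    }


low = throughput(ROWS_PER_SEC, ROW_BYTES_RANGE[0], REGION_SERVERS, RUN_HOURS)
high = throughput(ROWS_PER_SEC, ROW_BYTES_RANGE[1], REGION_SERVERS, RUN_HOURS)

for label, result in (("low", low), ("high", high)):
    print(label, {key: round(value, 2) for key, value in result.items()})
```

Under those assumptions the cluster is absorbing roughly 20-25 MB/s in total, or about 1.2-1.5 MB/s per region server, for around 162 million rows and 648-810 GB of raw data over the run.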

How does it compare?? Can I juice it more?

~Jacob


  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>60</value>
  </property>

  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value>
  </property>

  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>100663296</value>
  </property>

  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>15</value>
  </property>

  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>4</value>
  </property>

  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>8</value>
  </property>
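
To make the byte values above easier to read, here is a small sketch that converts them to MiB and derives the thresholds they imply. It assumes the usual reading of these settings: a memstore is flushed at `flush.size`, and updates to a region are blocked once its memstore reaches `block.multiplier` times `flush.size`. The estimate of flushes per region ignores compression and compaction, so treat it as a rough guide only.

```python
# Convert the posted settings to MiB and derive the thresholds they imply.
MIB = 1024 * 1024

conf = {
    "hbase.hregion.max.filesize": 1073741824,
    "hbase.hregion.memstore.flush.size": 100663296,
    "hbase.hregion.memstore.block.multiplier": 8,
    "hbase.hstore.compactionThreshold": 4,
    "hbase.hstore.blockingStoreFiles": 15,
}

# Size at which a region's memstore is flushed to a store file.
flush_mib = conf["hbase.hregion.memstore.flush.size"] // MIB

# Assumed semantics: updates block at multiplier * flush size.
block_at_mib = flush_mib * conf["hbase.hregion.memstore.block.multiplier"]

# Store file size at which a region becomes eligible to split.
max_region_mib = conf["hbase.hregion.max.filesize"] // MIB

# Rough number of full memstore flushes needed to reach the split size
# (ignores compression and compaction rewriting the files).
flushes_to_split = max_region_mib / flush_mib

print(f"memstore flush size:      {flush_mib} MiB")
print(f"updates blocked at:       {block_at_mib} MiB per region")
print(f"max region file size:     {max_region_mib} MiB")
print(f"flushes to reach split:   ~{flushes_to_split:.1f}")
```

With these values the flush size is 96 MiB, the blocking point is 768 MiB per region, and the maximum region file size is 1 GiB.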



On Fri, May 28, 2010 at 10:15 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:

> Like I said in my first email, it helps for random reading when lots
> of RAM is available to HBase. But it won't help the write throughput.
>
> J-D
>
> On Fri, May 28, 2010 at 10:12 AM, Vidhyashankar Venkataraman
> <vidhy...@yahoo-inc.com> wrote:
> > I am not sure if I understood this right, but does changing
> > hfile.block.cache.size also help?
> >
> >
> > On 5/27/10 3:27 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:
> >
> > Well we do have a couple of other configs for high write throughput:
> >
> > <property>
> >  <name>hbase.hstore.blockingStoreFiles</name>
> >  <value>15</value>
> > </property>
> > <property>
> >  <name>hbase.hregion.memstore.block.multiplier</name>
> >  <value>8</value>
> > </property>
> > <property>
> >  <name>hbase.regionserver.handler.count</name>
> >  <value>60</value>
> > </property>
> > <property>
> >  <name>hbase.regions.percheckin</name>
> >  <value>100</value>
> > </property>
> >
> > The last one is for restarts. Uploading very fast, you will more
> > likely hit all the upper limits (blocking store file and memstore) and
> > this will lower your throughput. Those configs relax that. Also for
> > speedier uploads we disable writing to the WAL:
> > http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean)
> > If the job fails or any machine fails you'll have to restart it or
> > figure the whole, and you absolutely need to force flushes when the MR
> > is done.
> >
> > J-D
> >
> > On Thu, May 27, 2010 at 2:57 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> >> Thanks J-D
> >>
> >> Currently we are trying to find/optimize our load/write times, although in
> >> prod we expect a 25/75 (writes/reads) ratio.
> >> We are using a long table model with only one column; row size is typically
> >> ~4-5k.
> >>
> >> As to your suggestion on not using even 50% of disk space - I agree and was
> >> planning to use only ~30-40% (1.5T of 4T) for HDFS
> >> and as I reported earlier
> >> 4000 regi...@256m per region(with 3 replications) on 20 nodes ==  150G
> >> per/node == 10% utilization
> >>
> >> While using 1GB as maxfilesize, did you have to adjust other params such
> >> as hbase.hstore.compactionThreshold and hbase.hregion.memstore.flush.size?
> >> There is an interesting observation by Jonathan Gray documented/reported in
> >> HBASE-2375 -
> >> wondering whether that issue gets compounded when using 1G as the
> >> hbase.hregion.max.filesize.
> >>
> >> Thx
> >> Jacob
> >>
> >>
> >
> >
>
