Did a run yesterday; posted the relevant parameters below. Did not see any difference in throughput or total run time (~9 hrs).
I am consistently getting about 5k rows/sec, each row ~4-5 KB, using a 17-node HBase cluster on a 20-node HDFS cluster. How does that compare? Can I juice it more?
~Jacob

<property>
  <name>hbase.regionserver.handler.count</name>
  <value>60</value>
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value>
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>100663296</value>
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
</property>
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>4</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>8</value>
</property>

On Fri, May 28, 2010 at 10:15 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Like I said in my first email, it helps for random reading when lots
> of RAM is available to HBase. But it won't help the write throughput.
>
> J-D
>
> On Fri, May 28, 2010 at 10:12 AM, Vidhyashankar Venkataraman
> <vidhy...@yahoo-inc.com> wrote:
> > I am not sure if I understood this right, but does changing
> > hfile.block.cache.size also help?
> >
> > On 5/27/10 3:27 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:
> >
> > Well, we do have a couple of other configs for high write throughput:
> >
> > <property>
> >   <name>hbase.hstore.blockingStoreFiles</name>
> >   <value>15</value>
> > </property>
> > <property>
> >   <name>hbase.hregion.memstore.block.multiplier</name>
> >   <value>8</value>
> > </property>
> > <property>
> >   <name>hbase.regionserver.handler.count</name>
> >   <value>60</value>
> > </property>
> > <property>
> >   <name>hbase.regions.percheckin</name>
> >   <value>100</value>
> > </property>
> >
> > The last one is for restarts. When uploading very fast, you are more
> > likely to hit the upper limits (blocking store files and memstore
> > blocking), and that will lower your throughput. Those configs relax that.
> > Also, for speedier uploads we disable writing to the WAL:
> > http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean)
> > If the job or any machine fails you'll have to restart it or figure out
> > what was lost, and you absolutely need to force flushes when the MR job
> > is done.
> >
> > J-D
> >
> > On Thu, May 27, 2010 at 2:57 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> >> Thanks J-D
> >>
> >> Currently we are trying to find/optimize our load/write times - although
> >> in prod we expect a 25/75 (writes/reads) ratio.
> >> We are using the long table model with only one column - row size is
> >> typically ~4-5 KB.
> >>
> >> As to your suggestion of not using even 50% of disk space - I agree, and
> >> was planning to use only ~30-40% (1.5T of 4T) for HDFS.
> >> And as I reported earlier:
> >> 4000 regions @ 256M per region (with 3x replication) on 20 nodes == 150G
> >> per node == 10% utilization.
> >>
> >> While using 1 GB as hbase.hregion.max.filesize, did you have to adjust
> >> other params such as hbase.hstore.compactionThreshold and
> >> hbase.hregion.memstore.flush.size?
> >> There is an interesting observation by Jonathan Gray documented in
> >> HBASE-2375 - wondering whether that issue gets compounded when using 1G
> >> as the hbase.hregion.max.filesize.
> >>
> >> Thx
> >> Jacob
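
(A quick way to see how the memstore limits discussed in this thread interact: writes to a region block once its memstore grows past hbase.hregion.memstore.flush.size times hbase.hregion.memstore.block.multiplier. A minimal sketch using the values from the config above; the class and method names here are illustrative, not HBase API.)

```java
public class MemstoreLimits {
    // Values taken from the hbase-site.xml fragment in this thread.
    static final long FLUSH_SIZE = 100663296L;       // hbase.hregion.memstore.flush.size (96 MB)
    static final long BLOCK_MULTIPLIER = 8L;         // hbase.hregion.memstore.block.multiplier
    static final long MAX_FILESIZE = 1073741824L;    // hbase.hregion.max.filesize (1 GB)

    // A region's writes block once its memstore exceeds flush.size * multiplier.
    static long blockingMemstoreSize() {
        return FLUSH_SIZE * BLOCK_MULTIPLIER;
    }

    public static void main(String[] args) {
        System.out.println("Memstore blocks writes at: "
                + blockingMemstoreSize() / (1024 * 1024) + " MB per region");
        System.out.println("Regions split at roughly:  "
                + MAX_FILESIZE / (1024 * 1024) + " MB of store files");
    }
}
```

With these settings a single hot region can absorb up to 768 MB of unflushed writes before blocking, which is why raising the multiplier helps sustain fast uploads.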