I am not sure if I understood this right, but does changing hfile.block.cache.size also help?
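(For context, hfile.block.cache.size controls the fraction of the region server heap given to the HFile block cache, which serves reads rather than writes. A sketch of how it would appear in hbase-site.xml — the value shown is only illustrative, not a recommendation from the thread:

<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of region server heap used for the read-side block cache;
       the value below is just an example -->
  <value>0.1</value>
</property>
)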
On 5/27/10 3:27 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:

Well, we do have a couple of other configs for high write throughput:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>8</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>60</value>
</property>
<property>
  <name>hbase.regions.percheckin</name>
  <value>100</value>
</property>

The last one is for restarts. When uploading very fast, you are more likely to hit the upper limits (blocking store files and memstore), and that will lower your throughput; those configs relax that.

Also, for speedier uploads we disable writing to the WAL:
http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean)
If the job or any machine fails, you'll have to restart it or figure out what was lost, and you absolutely need to force flushes when the MR job is done.

J-D

On Thu, May 27, 2010 at 2:57 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> Thanks J-D
>
> Currently we are trying to find/optimize our load/write times - although in
> prod we expect a 25/75 (writes/reads) ratio.
> We are using the long table model with only one column - row size is typically
> ~4-5k.
>
> As to your suggestion of not using even 50% of the disk space - I agree and was
> planning to use only ~30-40% (1.5T of 4T) for HDFS,
> and as I reported earlier:
> 4000 regions @ 256M per region (with 3 replications) on 20 nodes == 150G
> per node == 10% utilization.
>
> While using 1GB as maxfilesize, did you have to adjust other params such
> as hbase.hstore.compactionThreshold and hbase.hregion.memstore.flush.size?
> There is an interesting observation by Jonathan Gray documented/reported in
> HBASE-2375 -
> wondering whether that issue gets compounded when using 1G as
> hbase.hregion.max.filesize.
>
> Thx
> Jacob
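(To make the setWriteToWAL / forced-flush part of J-D's reply concrete, here is a minimal sketch against the 0.20.x client API. The table name, column family, row key, and payload are made up for illustration; HBaseAdmin.flush is used as the "force flushes when the MR is done" step.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalUploadSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();

    // "docs", "d", "content", and the row key are hypothetical names, not from the thread.
    HTable table = new HTable(conf, "docs");
    table.setAutoFlush(false);        // batch puts in the client-side write buffer

    Put put = new Put(Bytes.toBytes("row-00000001"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("content"), Bytes.toBytes("payload"));
    put.setWriteToWAL(false);         // skip the WAL: faster, but edits still sitting in
                                      // the memstore are lost if a region server dies
                                      // before they are flushed
    table.put(put);

    table.flushCommits();             // drain the client-side write buffer

    // Once the MR job is done, force a flush so the un-logged edits are
    // persisted from the memstores down to HFiles.
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.flush("docs");
  }
}

In an actual MapReduce upload the puts would go out of the map or reduce tasks rather than a main method, but the setWriteToWAL(false) call per Put and the flush after the job are the relevant pieces.)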