Here is the summary of the runs.

puts (~4-5k per row)
regionsize    #rows         Total time (ms)
1G            82282053*2    301943742
512M          82287593*2    313119378
256M          82246314*2    433200105
gets (~4-5k per row)
regionsize    #rows       Total time (ms)
1G            82427685     90116726
512M          82421943     94878466
256M          82395487    108160178

Note: for the 256M run, hbase.hregion.memstore.flush.size=64m; for the other two runs, hbase.hregion.memstore.flush.size=96m.

Regarding disabling autoflush - since a large number of writes (~4-5k per row) is happening, we would have hit the hbase.client.write.buffer size every few seconds anyway.

~Jacob

On Fri, May 28, 2010 at 1:36 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> Vidhya - This is using the HBase API.
>
> J-D - I do have timing info for inserts and gets - let me process the data
> and I will post the results.
>
> ~Jacob
>
>
> On Fri, May 28, 2010 at 1:16 PM, Vidhyashankar Venkataraman <
> vidhy...@yahoo-inc.com> wrote:
>
>> Jacob,
>> Just curious: is your observed upload throughput from bulk importing
>> or from using the HBase API?
>> Thanks
>> Vidhya
>>
>> On 5/28/10 1:13 PM, "Jacob Isaac" <ja...@ebrary.com> wrote:
>>
>> Hi J-D
>>
>> The run was done on a reformatted HDFS.
>>
>> Disabling the WAL is not an option for us because this will be our normal mode of
>> operation and durability is important to us.
>> 'Upload' was a poor choice of words on my part - it is more like
>> periodic/continuous writes.
>>
>> hbase.regionserver.maxlogs was 256, although
>> hbase.regionserver.hlog.blocksize was the default.
>>
>> We did not use compression, and autoflush is the default (true).
>>
>> Each of the 20 nodes is running a custom server program that reads from and
>> writes to HBase, with a maximum of 6 write threads and 1 read thread per node.
>> Also, I wanted to point out that in the current tests we are writing to two
>> tables and reading from only one.
>>
>> ~Jacob
>>
>> On Fri, May 28, 2010 at 12:42 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>
>> > If the table was already created, changing hbase.hregion.max.filesize
>> > and hbase.hregion.memstore.flush.size won't be considered; those are
>> > only the defaults for new tables. You can also set them in the shell,
>> > see the "alter" command.
>> >
>> > Also, did you restart HBase? Did you push the configs to all nodes?
>> > Did you disable writing to the WAL? If not, because durability is
>> > still important to you but you want to upload as fast as you can, I
>> > would recommend changing these too:
>> >
>> > hbase.regionserver.hlog.blocksize 134217728
>> >
>> > hbase.regionserver.maxlogs 128
>> >
>> > I forgot you had quite largish values, so that must affect the log
>> > rolling a _lot_.
>> >
>> > Finally, did you LZO the table? From experience, it will only do
>> > good: http://wiki.apache.org/hadoop/UsingLzoCompression
>> >
>> > And finally (for real this time), how are you uploading to HBase? How
>> > many clients? Are you even using the write buffer?
>> >
>> http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)
>> >
>> > J-D
>> >
>> > On Fri, May 28, 2010 at 12:28 PM, Jacob Isaac <ja...@ebrary.com> wrote:
>> > > Did a run yesterday; the relevant parameters are posted below.
>> > > Did not see any difference in throughput or total run time (~9 hrs).
>> > >
>> > > I am consistently getting about 5k rows/sec, each row around ~4-5k,
>> > > using a 17-node HBase on a 20-node HDFS cluster.
>> > >
>> > > How does it compare? Can I juice it more?
>> > >
>> > > ~Jacob
>> > >
>> > >
>> > > <property>
>> > >   <name>hbase.regionserver.handler.count</name>
>> > >   <value>60</value>
>> > > </property>
>> > >
>> > > <property>
>> > >   <name>hbase.hregion.max.filesize</name>
>> > >   <value>1073741824</value>
>> > > </property>
>> > >
>> > > <property>
>> > >   <name>hbase.hregion.memstore.flush.size</name>
>> > >   <value>100663296</value>
>> > > </property>
>> > >
>> > > <property>
>> > >   <name>hbase.hstore.blockingStoreFiles</name>
>> > >   <value>15</value>
>> > > </property>
>> > >
>> > > <property>
>> > >   <name>hbase.hstore.compactionThreshold</name>
>> > >   <value>4</value>
>> > > </property>
>> > >
>> > > <property>
>> > >   <name>hbase.hregion.memstore.block.multiplier</name>
>> > >   <value>8</value>
>> > > </property>
>> > >
>> > >
>> > > On Fri, May 28, 2010 at 10:15 AM, Jean-Daniel Cryans <
>> > jdcry...@apache.org> wrote:
>> > >
>> > >> Like I said in my first email, it helps random reading when lots
>> > >> of RAM is available to HBase, but it won't help the write throughput.
>> > >>
>> > >> J-D
>> > >>
>> > >> On Fri, May 28, 2010 at 10:12 AM, Vidhyashankar Venkataraman
>> > >> <vidhy...@yahoo-inc.com> wrote:
>> > >> > I am not sure if I understood this right, but does changing
>> > >> > hfile.block.cache.size also help?
>> > >> >
>> > >> >
>> > >> > On 5/27/10 3:27 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:
>> > >> >
>> > >> > Well, we do have a couple of other configs for high write throughput:
>> > >> >
>> > >> > <property>
>> > >> >   <name>hbase.hstore.blockingStoreFiles</name>
>> > >> >   <value>15</value>
>> > >> > </property>
>> > >> > <property>
>> > >> >   <name>hbase.hregion.memstore.block.multiplier</name>
>> > >> >   <value>8</value>
>> > >> > </property>
>> > >> > <property>
>> > >> >   <name>hbase.regionserver.handler.count</name>
>> > >> >   <value>60</value>
>> > >> > </property>
>> > >> > <property>
>> > >> >   <name>hbase.regions.percheckin</name>
>> > >> >   <value>100</value>
>> > >> > </property>
>> > >> >
>> > >> > The last one is for restarts. Uploading very fast, you will more
>> > >> > likely hit all the upper limits (blocking store files and memstore), and
>> > >> > that will lower your throughput. Those configs relax the limits. Also, for
>> > >> > speedier uploads we disable writing to the WAL:
>> > >>
>> > >> http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean)
>> > >>
>> > >> > If the job or any machine fails you'll have to restart it or figure out
>> > >> > the whole thing, and you absolutely need to force flushes when the MR
>> > >> > job is done.
>> > >> >
>> > >> > J-D
>> > >> >
>> > >> > On Thu, May 27, 2010 at 2:57 PM, Jacob Isaac <ja...@ebrary.com> wrote:
>> > >> >> Thanks J-D
>> > >> >>
>> > >> >> Currently we are trying to find/optimize our load/write times -
>> > >> >> although in prod we expect a 25/75 (writes/reads) ratio.
>> > >> >> We are using a long-table model with only one column - row size is
>> > >> >> typically ~4-5k.
>> > >> >>
>> > >> >> As to your suggestion of not using even 50% of the disk space - I agree,
>> > >> >> and I was planning to use only ~30-40% (1.5T of 4T) for HDFS,
>> > >> >> and as I reported earlier:
>> > >> >> 4000 regi...@256m per region (with 3 replications) on 20 nodes == 150G
>> > >> >> per node == 10% utilization.
>> > >> >>
>> > >> >> While using 1GB as maxfilesize, did you have to adjust other params
>> > >> >> such as hbase.hstore.compactionThreshold and
>> > >> >> hbase.hregion.memstore.flush.size?
>> > >> >> There is an interesting observation by Jonathan Gray documented/reported
>> > >> >> in HBASE-2375 -
>> > >> >> wondering whether that issue gets compounded when using 1G as the
>> > >> >> hbase.hregion.max.filesize.
>> > >> >>
>> > >> >> Thx
>> > >> >> Jacob
>> > >> >>
>> > >> >>
>> > >> >
>> > >> >
>> > >>
>> > >
>> >
>>
>
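
For reference, below is a minimal sketch of the client-side knobs discussed in this thread - the write buffer (HTable.setAutoFlush / HTable.setWriteBufferSize) and per-Put WAL skipping (Put.setWriteToWAL) - written against the HBase 0.20 client API. The table name, column family, payload, and buffer size here are hypothetical placeholders, not values from the runs above.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteBufferSketch {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Hypothetical table and family names, for illustration only.
    HTable table = new HTable(conf, "docs");

    // Batch puts on the client side instead of one RPC per row.
    // With ~4-5k per row, a 12 MB buffer holds a few thousand rows
    // before it is flushed to the region servers.
    table.setAutoFlush(false);
    table.setWriteBufferSize(12 * 1024 * 1024);

    for (int i = 0; i < 10000; i++) {
      Put put = new Put(Bytes.toBytes(String.format("row-%08d", i)));
      // put.setWriteToWAL(false);  // speeds up uploads, but sacrifices
      //                            // durability - not acceptable here
      put.add(Bytes.toBytes("content"), Bytes.toBytes("body"),
              new byte[4500]);      // stand-in for the ~4-5k payload
      table.put(put);               // buffered until the buffer fills
    }

    table.flushCommits();           // push whatever is still buffered
    table.close();
  }
}

Whether the buffer helps much at this write rate is the open question in the thread: at ~4-5k per row it fills every few seconds regardless, so the per-RPC savings over autoflush may be modest.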