Vidhya - This is using the HBase API.

J-D - I do have timing info for inserts and gets - let me process the
data and I will post the results.

~Jacob
On Fri, May 28, 2010 at 1:16 PM, Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com> wrote:
> Jacob,
>   Just curious: Is your observed upload throughput that of bulk importing
> or of using the HBase API?
> Thanks
> Vidhya
>
> On 5/28/10 1:13 PM, "Jacob Isaac" <ja...@ebrary.com> wrote:
>
> Hi J-D
>
> The run was done on a reformatted HDFS.
>
> Disabling the WAL is not an option for us because this will be our
> normal mode of operation, and durability is important to us.
> 'Upload' was a poor choice of words on my part - it is more like
> periodic/continuous writes.
>
> hbase.regionserver.maxlogs was 256, although
> hbase.regionserver.hlog.blocksize was the default.
>
> Did not use compression. And autoflush is the default (true).
>
> Each of the 20 nodes is running a custom server program that is
> reading and writing to HBase: a max of 6 write threads per node and
> 1 thread reading. Also wanted to point out that in the current tests
> we are writing to two tables and reading from only one.
>
> ~Jacob
>
> On Fri, May 28, 2010 at 12:42 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>
> > If the table was already created, changing hbase.hregion.max.filesize
> > and hbase.hregion.memstore.flush.size won't be considered; those are
> > the default values for new tables only. You can set them in the shell
> > too, see the "alter" command.
> >
> > Also, did you restart HBase? Did you push the configs to all nodes?
> > Did you disable writing to the WAL? If not, because durability is
> > still important to you but you want to upload as fast as you can, I
> > would recommend changing these too:
> >
> > hbase.regionserver.hlog.blocksize  134217728
> >
> > hbase.regionserver.maxlogs  128
> >
> > I forgot you had quite largish values, so that must affect the log
> > rolling a _lot_.
> >
> > Finally, did you LZO the table? From experience, it will only do
> > good: http://wiki.apache.org/hadoop/UsingLzoCompression
> >
> > And finally (for real this time), how are you uploading to HBase? How
> > many clients? Are you even using the write buffer?
> >
> > http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)
> >
> > J-D
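As an aside, the write-buffer pattern J-D is pointing at with that
setAutoFlush link looks roughly like the sketch below, written against
the 0.20-era client API. This is a minimal sketch, not code from this
thread; the table name, column family, row-key scheme, and buffer size
are made-up examples:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedUpload {
        public static void main(String[] args) throws IOException {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, "mytable");   // hypothetical table name

            // Buffer puts client-side instead of sending one RPC per row.
            table.setAutoFlush(false);
            table.setWriteBufferSize(12 * 1024 * 1024);   // 12 MB; the 0.20 default is 2 MB

            for (int i = 0; i < 100000; i++) {
                Put put = new Put(Bytes.toBytes(String.format("row-%09d", i)));
                // ~4-5k values, like the rows described in this thread
                put.add(Bytes.toBytes("content"), Bytes.toBytes("data"), new byte[4500]);
                table.put(put);                           // queued; sent when the buffer fills
            }

            table.flushCommits();                         // send whatever is still buffered
        }
    }

With autoflush left at its default of true (as in Jacob's setup), every
put is its own round trip, which by itself can cap throughput well below
what the cluster can absorb.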
> > On Fri, May 28, 2010 at 12:28 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> > > Did a run yesterday; posted the relevant parameters below.
> > > Did not see any difference in throughput or total run time (~9 hrs).
> > >
> > > I am consistently getting about 5k rows/sec, each row around ~4-5k,
> > > using a 17-node HBase on a 20-node HDFS cluster.
> > >
> > > How does it compare?? Can I juice it more?
> > >
> > > ~Jacob
> > >
> > > <property>
> > >   <name>hbase.regionserver.handler.count</name>
> > >   <value>60</value>
> > > </property>
> > >
> > > <property>
> > >   <name>hbase.hregion.max.filesize</name>
> > >   <value>1073741824</value>
> > > </property>
> > >
> > > <property>
> > >   <name>hbase.hregion.memstore.flush.size</name>
> > >   <value>100663296</value>
> > > </property>
> > >
> > > <property>
> > >   <name>hbase.hstore.blockingStoreFiles</name>
> > >   <value>15</value>
> > > </property>
> > >
> > > <property>
> > >   <name>hbase.hstore.compactionThreshold</name>
> > >   <value>4</value>
> > > </property>
> > >
> > > <property>
> > >   <name>hbase.hregion.memstore.block.multiplier</name>
> > >   <value>8</value>
> > > </property>
> > >
> > > On Fri, May 28, 2010 at 10:15 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> > >
> > >> Like I said in my first email, it helps for random reading when lots
> > >> of RAM is available to HBase. But it won't help the write throughput.
> > >>
> > >> J-D
> > >>
> > >> On Fri, May 28, 2010 at 10:12 AM, Vidhyashankar Venkataraman
> > >> <vidhy...@yahoo-inc.com> wrote:
> > >> > I am not sure if I understood this right, but does changing
> > >> > hfile.block.cache.size also help?
> > >> >
> > >> > On 5/27/10 3:27 PM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote:
> > >> >
> > >> > Well, we do have a couple of other configs for high write throughput:
> > >> >
> > >> > <property>
> > >> >   <name>hbase.hstore.blockingStoreFiles</name>
> > >> >   <value>15</value>
> > >> > </property>
> > >> > <property>
> > >> >   <name>hbase.hregion.memstore.block.multiplier</name>
> > >> >   <value>8</value>
> > >> > </property>
> > >> > <property>
> > >> >   <name>hbase.regionserver.handler.count</name>
> > >> >   <value>60</value>
> > >> > </property>
> > >> > <property>
> > >> >   <name>hbase.regions.percheckin</name>
> > >> >   <value>100</value>
> > >> > </property>
> > >> >
> > >> > The last one is for restarts. Uploading very fast, you will more
> > >> > likely hit all the upper limits (blocking store files and memstore)
> > >> > and this will lower your throughput. Those configs relax that.
> > >> > Also, for speedier uploads we disable writing to the WAL:
> > >> >
> > >> > http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean)
> > >> >
> > >> > If the job fails or any machine fails you'll have to restart it or
> > >> > figure out the holes, and you absolutely need to force flushes when
> > >> > the MR job is done.
> > >> >
> > >> > J-D
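For reference, the WAL-skipping put J-D links to looks roughly like the
sketch below (0.20-era API; the table, family, and row key are made-up
examples, and the flush call at the end is, if memory serves, the
programmatic counterpart of the shell's flush command):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalOffUpload {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, "mytable");   // hypothetical table name

            Put put = new Put(Bytes.toBytes("row-00001"));
            put.setWriteToWAL(false);  // no WAL entry: faster, but the row is lost
                                       // if its region server dies before a flush
            put.add(Bytes.toBytes("content"), Bytes.toBytes("data"), Bytes.toBytes("value"));
            table.put(put);

            // Once the upload is done, force the memstores out to HFiles;
            // rows that skipped the WAL are not durable until then.
            new HBaseAdmin(conf).flush("mytable");        // like `flush 'mytable'` in the shell
        }
    }

This is exactly the durability trade-off Jacob declines above, which is
why it tends to fit one-shot bulk loads that can be rerun on failure
rather than continuous writes.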
> > >> > On Thu, May 27, 2010 at 2:57 PM, Jacob Isaac <ja...@ebrary.com> wrote:
> > >> >> Thanks J-D
> > >> >>
> > >> >> Currently we are trying to find/optimize our load/write times -
> > >> >> although in prod we expect it to be a 25/75 (writes/reads) ratio.
> > >> >> We are using a long-table model with only one column - row size
> > >> >> is typically ~4-5k.
> > >> >>
> > >> >> As to your suggestion on not using even 50% of disk space - I
> > >> >> agree, and was planning to use only ~30-40% (1.5T of 4T) for
> > >> >> HDFS, and as I reported earlier:
> > >> >> 4000 regions @ 256M per region (with 3x replication) on 20 nodes
> > >> >> == 150G per node == 10% utilization.
> > >> >>
> > >> >> While using 1GB as the maxfilesize, did you have to adjust other
> > >> >> params such as hbase.hstore.compactionThreshold and
> > >> >> hbase.hregion.memstore.flush.size?
> > >> >> There is an interesting observation by Jonathan Gray
> > >> >> documented/reported in HBASE-2375 - wondering whether that issue
> > >> >> gets compounded when using 1G as the hbase.hregion.max.filesize.
> > >> >>
> > >> >> Thx
> > >> >> Jacob
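Finally, since J-D asks twice about LZO and Jacob reports running
without compression: compression is a per-column-family setting chosen
at table creation (or via alter). A rough sketch against the 0.20 API,
assuming the native LZO libraries from the UsingLzoCompression wiki
page are installed on every node; the table and family names are
made up:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreateLzoTable {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();

            // LZO-compressed column family; regions fail to open on any
            // node where the native LZO codec is missing.
            HColumnDescriptor family = new HColumnDescriptor(Bytes.toBytes("content"));
            family.setCompressionType(Compression.Algorithm.LZO);

            HTableDescriptor desc = new HTableDescriptor("mytable");
            desc.addFamily(family);
            new HBaseAdmin(conf).createTable(desc);
        }
    }

For values in the 4-5k range, LZO usually shrinks store files enough to
pay for itself on both writes and reads, which is presumably why J-D
says it "will only do good".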