w.r.t. option #1, also consider http://hbase.apache.org/book.html#arch.bulk.load
FYI

On Tue, Dec 15, 2015 at 12:17 PM, Frank Luo <[email protected]> wrote:

> I am in a very similar situation.
>
> I guess you can try one of the following options.
>
> Option one: avoid online inserts by preparing data offline. Do something
> like http://hbase.apache.org/0.94/book/ops_mgt.html#importtsv
>
> Option two: if the first option doesn't work for you, it is better to
> reduce your region size and increase the read/write timeouts. That way you
> allow compactions to happen while you insert data, but because the regions
> are smaller, each compaction/split takes less time. With this option the
> table can stay available 24/7, but overall performance tends to drop
> dramatically once some regions start compacting.
>
> Option three: if you can afford some downtime, e.g. two hours every day,
> you can manage compactions/splits during that window. What I usually do is
> run a major compaction against all tables, then split the large ones so
> that they have enough room to grow with the next day's inserts.
>
> I hope it helps.
>
> From: 林豪 [mailto:[email protected]]
> Sent: Monday, December 14, 2015 11:51 PM
> To: [email protected]
> Subject: Common advices for hosting a huge table
>
> Hi, all:
>
> We have an HBase cluster with several hundred region servers, and each RS
> hosts nearly 300 regions. One of our tables has grown to 16 TB and some of
> its regions exceed 10 GB. Major compaction on these regions is painful: it
> produces a lot of disk I/O and degrades RS performance. For this huge
> table, the auto-split size of IncreasingToUpperBoundRegionSplitPolicy has
> grown to 16 GB or more. My solution is to set the MAX_FILESIZE attribute on
> this table so that ConstantSizeRegionSplitPolicy auto-splitting works
> again.
>
> My question is: what are the common recommendations or configuration
> options for hosting such a huge table? If we decide to limit the region
> size, how should we choose the optimal region size?
> If the region size is too large, major compaction is painful; but if it is
> too small, we end up with many small regions, which will overwhelm the RS.
>
> 林豪
> Cloud Platform R&D Engineer
> QIYI.com, Inc.
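The split-policy behaviour described in the question can be illustrated with a simplified model. This is a sketch under stated assumptions, not HBase's actual code: it assumes the HBase 1.x-style rule in which IncreasingToUpperBoundRegionSplitPolicy splits a region once a store exceeds min(max_filesize, initial_size * R^3), where R is the number of regions of the same table on that region server and initial_size defaults to twice the memstore flush size (2 * 128 MB). Older releases used a different growth term, so check the source of your version. ConstantSizeRegionSplitPolicy, by contrast, compares against max_filesize alone. The function names and the 20 GB cap below are hypothetical.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def increasing_policy_threshold(regions_on_rs: int, max_filesize: int,
                                initial_size: int = 256 * MB) -> int:
    """Simplified model of IncreasingToUpperBoundRegionSplitPolicy.

    Assumption: HBase 1.x-style rule min(max_filesize, initial_size * R**3),
    where R is the number of regions of the same table on this region server.
    """
    if regions_on_rs <= 0:
        # Assumed fallback: with no regions counted, use the upper bound.
        return max_filesize
    return min(max_filesize, initial_size * regions_on_rs ** 3)


def constant_policy_threshold(max_filesize: int) -> int:
    """Model of ConstantSizeRegionSplitPolicy: the threshold is max_filesize."""
    return max_filesize


def should_split(store_size: int, threshold: int) -> bool:
    # In this model, a region splits once one of its stores exceeds the threshold.
    return store_size > threshold


if __name__ == "__main__":
    max_filesize = 20 * GB  # hypothetical MAX_FILESIZE / hbase.hregion.max.filesize
    for r in range(1, 6):
        t = increasing_policy_threshold(r, max_filesize)
        print(f"R={r}: split threshold = {t / GB:.2f} GB")
```

With the hypothetical 20 GB cap, the modelled threshold climbs through 0.25, 2, 6.75 and 16 GB for R = 1..4 and hits the cap at R = 5. So in this model, once a table has several regions per server, it is the cap (MAX_FILESIZE) that bounds region size, which is why lowering it brings splits back.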

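For the question of choosing a region size, one starting point is the memstore-based upper bound on actively written regions per server discussed in the HBase reference guide: roughly (RS heap * memstore fraction) / (memstore flush size * column families). The sketch below turns that bound plus the table size into a lower bound on region size, and adds a hypothetical helper for Frank's option three that picks, after the nightly major compaction, which regions to pre-split given an assumed uniform daily growth rate. All function names, the growth model, and the example numbers (including 300 region servers and a per-RS region budget of 20 for this table) are illustrative assumptions; the actual splits would be issued with the HBase shell `split` command or the Admin API.

```python
import math


def max_regions_per_rs(heap_mb: float, memstore_fraction: float,
                       flush_size_mb: float, column_families: int) -> int:
    """Memstore-based upper bound on actively written regions per region server."""
    return math.floor(heap_mb * memstore_fraction / (flush_size_mb * column_families))


def min_region_size_gb(table_size_gb: float, region_servers: int,
                       regions_per_rs_for_table: int) -> float:
    """Smallest region size that keeps this table within an assumed per-RS region budget."""
    return table_size_gb / (region_servers * regions_per_rs_for_table)


def plan_splits(region_sizes_gb: dict[str, float], target_gb: float,
                daily_growth: float) -> dict[str, int]:
    """Hypothetical option-three helper.

    After a major compaction, return {region: number_of_pieces} for every region
    whose projected size (assuming uniform fractional growth) would exceed
    target_gb before the next maintenance window.
    """
    plan: dict[str, int] = {}
    for region, size in region_sizes_gb.items():
        projected = size * (1 + daily_growth)
        if projected > target_gb:
            # Cut into enough pieces that each is projected to stay under target_gb.
            plan[region] = math.ceil(projected / target_gb)
    return plan


if __name__ == "__main__":
    # Example values: 16 GB heap, 0.4 memstore fraction, 128 MB flush size, 1 family.
    print(max_regions_per_rs(16 * 1024, 0.4, 128, 1))
    # 16 TB table; 300 region servers and 20 regions per RS are hypothetical.
    print(min_region_size_gb(16 * 1024, 300, 20))
    print(plan_splits({"r1": 4.0, "r2": 9.0, "r3": 12.0, "r4": 30.0}, 10.0, 0.25))
```

With these inputs the bound is 51 regions per server, and the planner returns {'r2': 2, 'r3': 2, 'r4': 4}: r1 is projected at 5 GB and left alone, while r4 (projected 37.5 GB) would be cut into four pieces. Note that the per-server bound is shared by every table on the RS, so the budget given to one table is a judgment call.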