Even if you disable minor compaction during bulk load, wouldn't subsequent compaction(s) run into the same problem?
Please take a look at the 3rd paragraph under
http://hbase.apache.org/book.html#compaction .

You can also read
http://hbase.apache.org/book.html#compaction.file.selection.old to see how
different parameters are used for file selection.

By "controlling the number of hfiles" I mean reducing the amount of data for
each bulk load.

If the regions for this table are not evenly distributed, some region
server(s) may receive more data than the other servers.

Cheers

On Sat, Aug 26, 2017 at 7:03 AM, Liu, Ming (Ming) <[email protected]> wrote:

> Thanks Ted,
>
> I don't know how to control the number of hfiles, need to check the
> importtsv tool. But is there anyway we can disable 'minor compaction' now?
> And why 'minor compaction' will increase the disk usage. The system is
> idle, there are no other workload, just after load data, and HBase start to
> do minor compact and we see disk space are smaller and smaller until
> running out.
> We think minor compact is just combining files , say, if we have 10 hfiles
> using 100G disk space, after minor compact, it still should be 100G, if not
> less. It is called compaction, isn't it? So we don't understand why it is
> using so many extra disk space? Anything wrong in our system?
>
> thanks,
> Ming
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Saturday, August 26, 2017 9:54 PM
> To: [email protected]
> Subject: Re: [Help] minor compact is continuously consuming the disk space
> until run out of space?
>
> bq. on each Region Server there are about 800 hfiles
>
> Is it possible to control the number of hfiles during each bulk load ?
>
> For this big table, are the regions evenly spread across the servers ? If
> so, consider increasing the capacity of your cluster.
>
> From the doc for hbase.hstore.compactionThreshold :
>
> Larger values delay compaction, but when compaction does occur, it takes
> longer to complete.
>
> On Sat, Aug 26, 2017 at 6:48 AM, Liu, Ming (Ming) <[email protected]> wrote:
>
> > hi, all,
> >
> > We have a system with 17 nodes, with a big table about 28T in size. We use
> > native hbase bulkloader (importtsv) to load data, and it generated a lot of
> > hfiles, on each Region Server there are about 800 hfiles. We turned off
> > Major Compact, but the Minor compaction is running due to so many hfile.
> > The problem is, after the initial loading, there are about 80% disk space
> > used, when minor compaction is going on, we notice the disk space is
> > reducing rapidly until all disk spaces are used and hbase went down.
> >
> > We try to change the hbase.hstore.compactionThreshold to 2000, but the
> > minor compaction is still triggered.
> >
> > The system is CDH 5.7, HBase is 1.2.
> >
> > Could anyone help to give us some suggestions? We are really stuck. Thanks
> > in advance.
> >
> > Thanks,
> > Ming
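
For reference, the ratio-based file selection described under
http://hbase.apache.org/book.html#compaction.file.selection.old can be
sketched roughly as follows. This is a simplified Python illustration, not
HBase's actual code: the function name is made up, the sizes are arbitrary
units, and it assumes the candidate files are ordered oldest to newest. It
leaves out other checks a real region server applies, such as
hbase.hstore.compaction.max.size and off-peak ratios.

```python
from typing import List


def select_minor_compaction_files(
    sizes: List[int],   # StoreFile sizes, ordered oldest -> newest (assumed)
    ratio: float,       # hbase.hstore.compaction.ratio
    min_files: int,     # hbase.hstore.compaction.min
                        # (formerly hbase.hstore.compactionThreshold)
    max_files: int,     # hbase.hstore.compaction.max
    min_size: int,      # hbase.hstore.compaction.min.size
) -> List[int]:
    """Hypothetical helper: return the sizes of the files one minor
    compaction would pick, or [] if no compaction would run."""
    start = 0
    while start < len(sizes):
        # Newer files that could join this one within the max_files limit.
        newer = sizes[start + 1 : start + max_files]
        # A file qualifies if it is small in absolute terms (min_size)
        # or small relative to the newer files after it (ratio).
        if sizes[start] <= max(min_size, ratio * sum(newer)):
            break
        start += 1  # too large: skip this older file and try the next one
    selected = sizes[start : start + max_files]
    # Too few files left over means nothing is compacted this round.
    return selected if len(selected) >= min_files else []


if __name__ == "__main__":
    params = dict(ratio=1.0, min_files=3, max_files=5, min_size=10)
    print(select_minor_compaction_files([100, 50, 23, 12, 12], **params))
    print(select_minor_compaction_files([100, 25, 12, 12], **params))
    print(select_minor_compaction_files([7, 6, 5, 4, 3, 2, 1], **params))
```

With these parameters the first list yields the three smallest files, the
second yields nothing because fewer than min_files qualify, and the third is
capped at max_files. Note that any file below hbase.hstore.compaction.min.size
qualifies regardless of the ratio, so many small files from a bulk load tend
to be picked up together.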
