>We think minor compaction just combines files: say we have 10 hfiles
>using 100G of disk space; after minor compaction it should still be 100G,
>if not less.

That is partially true. The catch is that while the new compacted file is
being written, the data temporarily exists twice: the older files cannot be
deleted until the compaction process finishes and the system switches over
to the new file. It looks like your storage is simply too tight. You said
that 80% of the space is already occupied when compaction starts; the
remaining space is not enough to accommodate this temporary increase.
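A rough back-of-the-envelope sketch of why this happens. It assumes the compaction output is fully written before the input hfiles are removed, and it ignores HDFS replication and any other overhead. The function name and all numbers are hypothetical, for illustration only:

```python
def compaction_disk_usage(used_gb, input_gb, output_ratio=1.0):
    """Return (peak_gb, final_gb) for compactions in flight at the same time.

    Hypothetical model: the rewritten file(s) are written in full before the
    input hfiles are deleted, so for a while both copies sit on disk.
    output_ratio is output size / input size; 1.0 means nothing was dropped
    (no deletes, expired cells or extra versions removed).
    """
    output_gb = input_gb * output_ratio
    peak_gb = used_gb + output_gb              # old inputs + new output coexist
    final_gb = used_gb - input_gb + output_gb  # inputs removed after the switch
    return peak_gb, final_gb


capacity_gb = 1000  # hypothetical per-server capacity
peak, final = compaction_disk_usage(used_gb=800, input_gb=250)
print(peak, final, peak > capacity_gb)
```

With these made-up numbers (80% of 1000G used, compactions rewriting 250G of input at once), usage would settle back to 800G afterwards, but the peak of 1050G exceeds the disk, which matches the "shrinking until it runs out" symptom.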

-Anoop-

On Sat, Aug 26, 2017 at 7:47 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Even if you disable minor compaction during bulk load, wouldn't subsequent
> compaction(s) run into the same problem?
>
> Please take a look at the 3rd paragraph under
> http://hbase.apache.org/book.html#compaction .
>
> You can also read
> http://hbase.apache.org/book.html#compaction.file.selection.old to see how
> different parameters are used for file selection.
>
> By "controlling the number of hfiles" I mean reducing the amount of data for
> each bulk load.
>
> If the regions for this table are not evenly distributed, some region
> server(s) may receive more data than the other servers.
>
> Cheers
>
> On Sat, Aug 26, 2017 at 7:03 AM, Liu, Ming (Ming) <ming....@esgyn.cn> wrote:
>
>> Thanks Ted,
>>
>> I don't know how to control the number of hfiles; I need to check the
>> importtsv tool. But is there any way we can disable minor compaction now?
>> And why does minor compaction increase disk usage? The system is idle, with
>> no other workload. Right after the data load, HBase started minor
>> compaction, and we watched free disk space shrink until it ran out.
>> We think minor compaction just combines files: say we have 10 hfiles
>> using 100G of disk space; after minor compaction it should still be 100G,
>> if not less. It is called compaction, isn't it? So we don't understand why
>> it uses so much extra disk space. Is anything wrong with our system?
>>
>> thanks,
>> Ming
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>> Sent: Saturday, August 26, 2017 9:54 PM
>> To: user@hbase.apache.org
>> Subject: Re: [Help] minor compact is continuously consuming the disk space
>> until run out of space?
>>
>> bq. on each Region Server there are about 800 hfiles
>>
>> Is it possible to control the number of hfiles during each bulk load?
>>
>> For this big table, are the regions evenly spread across the servers? If
>> so, consider increasing the capacity of your cluster.
>>
>> From the doc for hbase.hstore.compactionThreshold:
>>
>> Larger values delay compaction, but when compaction does occur, it takes
>> longer to complete.
>>
>>
>> On Sat, Aug 26, 2017 at 6:48 AM, Liu, Ming (Ming) <ming....@esgyn.cn>
>> wrote:
>>
>> > Hi all,
>> >
>> > We have a system with 17 nodes and a big table of about 28T. We use
>> > the native HBase bulk loader (importtsv) to load data, and it generated
>> > a lot of hfiles: about 800 on each Region Server. We turned off major
>> > compaction, but minor compaction is running because of the large number
>> > of hfiles. The problem is that after the initial load about 80% of the
>> > disk space was used, and while minor compaction was running we noticed
>> > free disk space shrinking rapidly until all of it was used and HBase
>> > went down.
>> >
>> > We tried changing hbase.hstore.compactionThreshold to 2000, but minor
>> > compaction is still triggered.
>> >
>> > The system is CDH 5.7, HBase is 1.2.
>> >
>> > Could anyone give us some suggestions? We are really stuck. Thanks in
>> > advance.
>> >
>> > Thanks,
>> > Ming
>> >
>>
