In the applyCompactionPolicy method of the ExploringCompactionPolicy class
<http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.html>,
there is the following if (lines 104-107):

         if (size >= comConf.getMinCompactSize()
             && !filesInRatio(potentialMatchFiles, currentRatio)) {
           continue;
         }
Is this correct? If the first condition evaluates to false, the second
one won't be evaluated, and filesInRatio won't be checked.
This could explain the behaviour I'm seeing with the selection of eligible files.
Thank you.
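To make the short-circuit concrete, here is a standalone sketch (not the actual HBase class). The 128 MB constant is the documented default of hbase.hstore.compaction.min.size and the file sizes are taken from the log further down the thread; everything else is illustrative. With a total far below minCompactSize, the && short-circuits, filesInRatio is never consulted, and the selection is kept even though the 10.8M file is far out of ratio:

```java
// Sketch of the short-circuit in the quoted condition (not the HBase class itself).
public class ShortCircuitSketch {
    // Assumed default of hbase.hstore.compaction.min.size (128 MB).
    static final long MIN_COMPACT_SIZE = 128L * 1024 * 1024;

    static int ratioChecks = 0; // counts how often the ratio test actually runs

    // Stand-in for filesInRatio(): every file must be no larger than
    // ratio * (sum of the other selected files).
    static boolean filesInRatio(long[] sizes, double ratio) {
        ratioChecks++;
        long total = 0;
        for (long s : sizes) total += s;
        for (long s : sizes) {
            if (s > (total - s) * ratio) return false; // this file is out of ratio
        }
        return true;
    }

    // Mirrors: if (size >= comConf.getMinCompactSize() && !filesInRatio(...)) continue;
    static boolean selectionSkipped(long[] sizes, double ratio) {
        long size = 0;
        for (long s : sizes) size += s;
        return size >= MIN_COMPACT_SIZE && !filesInRatio(sizes, ratio);
    }

    public static void main(String[] args) {
        long[] sel = {4_700L, 5_100L, 3_800L, 11_324_620L}; // 4.7K, 5.1K, 3.8K, ~10.8M
        // Total (~10.8 M) < 128 M, so && short-circuits: the selection is
        // kept and filesInRatio is never called.
        System.out.println("skipped=" + selectionSkipped(sel, 1.2)
            + ", ratioChecks=" + ratioChecks);
        // Had it run, the ratio test would have rejected this mix,
        // since 10.8M is much larger than 1.2 * (4.7K + 5.1K + 3.8K).
        System.out.println("inRatio=" + filesInRatio(sel, 1.2));
    }
}
```

So whether this is a bug or intentional (always allowing small compactions regardless of ratio), it does mean selections below minCompactSize bypass the ratio check entirely.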

> On 29 May 2015, at 04:26, Ted Yu <[email protected]> wrote:
> 
> That could be the case.
> 
> Though there is no log statement in isBetterSelection(), so we don't
> know for sure whether mightBeStuck was false.
> 
> On Thu, May 28, 2015 at 9:30 AM, Vladimir Rodionov <[email protected]>
> wrote:
> 
>> It's possible that the logic of ExploringCompactionPolicy (the default
>> compaction policy) is broken. I am looking into this code (master):
>> 
>>  private boolean isBetterSelection(List<StoreFile> bestSelection,
>>      long bestSize, List<StoreFile> selection, long size, boolean mightBeStuck) {
>>    if (mightBeStuck && bestSize > 0 && size > 0) {
>>      // Keep the selection that removes most files for least size. That penalizes
>>      // adding large files to compaction, but not small files, so we don't become
>>      // totally inefficient (might want to tweak that in future). Also, given the
>>      // current order of looking at permutations, prefer earlier files and smaller
>>      // selection if the difference is small.
>>      final double REPLACE_IF_BETTER_BY = 1.05;
>>      double thresholdQuality = ((double)bestSelection.size() / bestSize) * REPLACE_IF_BETTER_BY;
>>      return thresholdQuality < ((double)selection.size() / size);
>>    }
>>    // Keep if this gets rid of more files.  Or the same number of files for less io.
>>    return selection.size() > bestSelection.size()
>>      || (selection.size() == bestSelection.size() && size < bestSize);
>>  }
>> 
>> which compares two selections. What I see here is that when mightBeStuck =
>> false, the selection with more files will always be preferred.
>> 
>> Correct me if I am wrong.
>> 
>> -Vlad
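To check that claim, here is a standalone sketch adapted from the snippet quoted above, simplified to plain file counts and total sizes rather than List<StoreFile> (the numbers are the ones from this thread: three small files totalling ~13.6K vs. four files totalling ~10.8M):

```java
// Adapted from the quoted isBetterSelection(); not the actual HBase class.
public class IsBetterSketch {
    static final double REPLACE_IF_BETTER_BY = 1.05;

    static boolean isBetterSelection(int bestCount, long bestSize,
                                     int count, long size, boolean mightBeStuck) {
        if (mightBeStuck && bestSize > 0 && size > 0) {
            // "Quality" = files removed per byte compacted; higher is better.
            double thresholdQuality = ((double) bestCount / bestSize) * REPLACE_IF_BETTER_BY;
            return thresholdQuality < ((double) count / size);
        }
        // Keep if this gets rid of more files, or the same number for less io.
        return count > bestCount || (count == bestCount && size < bestSize);
    }

    public static void main(String[] args) {
        // mightBeStuck = false: the 4-file selection beats the 3-file one even
        // though it drags in a file ~1000x larger than the rest combined.
        System.out.println(isBetterSelection(3, 13_600L, 4, 11_304_175L, false)); // true
        // mightBeStuck = true: the files-per-byte metric kicks in and the
        // bulky 4-file selection now loses.
        System.out.println(isBetterSelection(3, 13_600L, 4, 11_304_175L, true)); // false
    }
}
```

This confirms the reading: with mightBeStuck = false, only file count (then size as a tie-breaker) matters, so the selection with more files always wins regardless of how large they are.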
>> 
>> On Thu, May 28, 2015 at 8:00 AM, Akmal Abbasov <[email protected]>
>> wrote:
>> 
>>> Hi Ted,
>>> Thank you for reply.
>>> Yes, it was promoted to a major compaction because all files were
>>> eligible, but what I don't understand is why all of them were eligible.
>>> AFAIK, the compaction algorithm should select the best match for
>>> compaction, and it should include files with similar sizes.
>>> But as you can see from the logs, the selected files are 4.7K, 5.1K,
>>> 3.8K and 10.8M.
>>> Why is it including the 10.8M file?
>>> Which setting should be tuned to avoid this?
>>> Thank you.
>>> 
>>> Kind regards,
>>> Akmal Abbasov
>>> 
>>>> On 28 May 2015, at 16:54, Ted Yu <[email protected]> wrote:
>>>> 
>>>> bq. Completed major compaction of 4 file(s) in s of
>>>> metrics,V\xA36\x56\x5E\xC5}\xA1\x43\x00\x32\x00T\x1BU\xE0,1417707276446.3547f43afae5ac3f4e8a162d43a892b4.
>>>> 
>>>> The compaction involved all the files of store 's' for the region. Thus
>>>> it was considered a major compaction.
>>>> 
>>>> Cheers
>>>> 
>>>> On Thu, May 28, 2015 at 2:16 AM, Akmal Abbasov <[email protected]> wrote:
>>>> 
>>>>> Hi Ted,
>>>>> Sorry for a late reply.
>>>>> Here is a snippet from the log file:
>>>>> 2015-05-28 00:54:39,754 DEBUG [regionserver60020-smallCompactions-1432714643311] regionserver.CompactSplitThread: CompactSplitThread Status: compaction_queue=(0:27), split_queue=0, merge_queue=0
>>>>> 2015-05-28 00:54:39,754 DEBUG [regionserver60020-smallCompactions-1432714643311] compactions.RatioBasedCompactionPolicy: Selecting compaction from 4 store files, 0 compacting, 4 eligible, 10 blocking
>>>>> 2015-05-28 00:54:39,755 DEBUG [regionserver60020-smallCompactions-1432714643311] compactions.ExploringCompactionPolicy: Exploring compaction algorithm has selected 4 files of size 11304175 starting at candidate #0 after considering 3 permutations with 3 in ratio
>>>>> 2015-05-28 00:54:39,755 DEBUG [regionserver60020-smallCompactions-1432714643311] regionserver.HStore: 3547f43afae5ac3f4e8a162d43a892b4 - s: Initiating major compaction
>>>>> 2015-05-28 00:54:39,755 INFO [regionserver60020-smallCompactions-1432714643311] regionserver.HRegion: Starting compaction on s in region metrics,V\xA36\x56\x5E\xC5}\xA1\x43\x00\x32\x00T\x1BU\xE0,1417707276446.3547f43afae5ac3f4e8a162d43a892b4.
>>>>> 2015-05-28 00:54:39,755 INFO [regionserver60020-smallCompactions-1432714643311] regionserver.HStore: Starting compaction of 4 file(s) in s of metrics,V\xA36\x56\x5E\xC5}\xA1\x43\x00\x32\x00T\x1BU\xE0,1417707276446.3547f43afae5ac3f4e8a162d43a892b4. into tmpdir=hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/.tmp, totalSize=10.8 M
>>>>> 2015-05-28 00:54:39,756 DEBUG [regionserver60020-smallCompactions-1432714643311] compactions.Compactor: Compacting hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/dab3e768593e44a39097451038c5ebd0, keycount=3203, bloomtype=ROW, size=10.8 M, encoding=NONE, seqNum=172299974, earliestPutTs=1407941317178
>>>>> 2015-05-28 00:54:39,756 DEBUG [regionserver60020-smallCompactions-1432714643311] compactions.Compactor: Compacting hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/2d6472ef99a5478689f7ba822bc407a7, keycount=4, bloomtype=ROW, size=4.7 K, encoding=NONE, seqNum=172299976, earliestPutTs=1432761158066
>>>>> 2015-05-28 00:54:39,756 DEBUG [regionserver60020-smallCompactions-1432714643311] compactions.Compactor: Compacting hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/bdbc806d045740e69ab34e3ea2e113c4, keycount=6, bloomtype=ROW, size=5.1 K, encoding=NONE, seqNum=172299977, earliestPutTs=1432764757438
>>>>> 2015-05-28 00:54:39,756 DEBUG [regionserver60020-smallCompactions-1432714643311] compactions.Compactor: Compacting hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/561f93db484b4b9fb6446152c3eef5b8, keycount=2, bloomtype=ROW, size=3.8 K, encoding=NONE, seqNum=172299978, earliestPutTs=1432768358747
>>>>> 2015-05-28 00:54:41,881 DEBUG [regionserver60020-smallCompactions-1432714643311] regionserver.HRegionFileSystem: Committing store file hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/.tmp/144f05a9546f446984a5b8fa173dd13e as hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/144f05a9546f446984a5b8fa173dd13e
>>>>> 2015-05-28 00:54:41,918 DEBUG [regionserver60020-smallCompactions-1432714643311] regionserver.HStore: Removing store files after compaction...
>>>>> 2015-05-28 00:54:41,959 DEBUG [regionserver60020-smallCompactions-1432714643311] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/dab3e768593e44a39097451038c5ebd0, to hdfs://prod1/hbase/archive/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/dab3e768593e44a39097451038c5ebd0
>>>>> 2015-05-28 00:54:42,030 DEBUG [regionserver60020-smallCompactions-1432714643311] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/2d6472ef99a5478689f7ba822bc407a7, to hdfs://prod1/hbase/archive/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/2d6472ef99a5478689f7ba822bc407a7
>>>>> 2015-05-28 00:54:42,051 DEBUG [regionserver60020-smallCompactions-1432714643311] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/bdbc806d045740e69ab34e3ea2e113c4, to hdfs://prod1/hbase/archive/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/bdbc806d045740e69ab34e3ea2e113c4
>>>>> 2015-05-28 00:54:42,071 DEBUG [regionserver60020-smallCompactions-1432714643311] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://prod1/hbase/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/561f93db484b4b9fb6446152c3eef5b8, to hdfs://prod1/hbase/archive/data/default/metrics/3547f43afae5ac3f4e8a162d43a892b4/s/561f93db484b4b9fb6446152c3eef5b8
>>>>> 2015-05-28 00:54:42,072 INFO [regionserver60020-smallCompactions-1432714643311] regionserver.HStore: Completed major compaction of 4 file(s) in s of metrics,V\xA36\x56\x5E\xC5}\xA1\x43\x00\x32\x00T\x1BU\xE0,1417707276446.3547f43afae5ac3f4e8a162d43a892b4. into 144f05a9546f446984a5b8fa173dd13e(size=10.8 M), total size for store is 10.8 M. This selection was in queue for 0sec, and took 2sec to execute.
>>>>> 2015-05-28 00:54:42,072 INFO [regionserver60020-smallCompactions-1432714643311] regionserver.CompactSplitThread: Completed compaction: Request = regionName=metrics,V\xA36\x56\x5E\xC5}\xA1\x43\x00\x32\x00T\x1BU\xE0,1417707276446.3547f43afae5ac3f4e8a162d43a892b4., storeName=s, fileCount=4, fileSize=10.8 M, priority=6, time=1368019430741233; duration=2sec
>>>>> 
>>>>> My question is: why was a major compaction executed instead of a minor
>>>>> compaction?
>>>>> I have these messages all over the log file.
>>>>> Thank you!
>>>>> 
>>>>>> On 12 May 2015, at 23:53, Ted Yu <[email protected]> wrote:
>>>>>> 
>>>>>> Can you pastebin major compaction related log snippets ?
>>>>>> See the following for example of such logs:
>>>>>> 
>>>>>> 2015-05-09 10:57:58,961 INFO [PriorityRpcServer.handler=13,queue=1,port=16020] regionserver.RSRpcServices: Compacting IntegrationTestBigLinkedList,\x91\x11\x11\x11\x11\x11\x11\x08,1431193978741.700b34f5d2a3aa10804eff35906fd6d8.
>>>>>> 2015-05-09 10:57:58,962 DEBUG [PriorityRpcServer.handler=13,queue=1,port=16020] regionserver.HStore: Skipping expired store file removal due to min version being 1
>>>>>> 2015-05-09 10:57:58,962 DEBUG [PriorityRpcServer.handler=13,queue=1,port=16020] compactions.RatioBasedCompactionPolicy: Selecting compaction from 5 store files, 0 compacting, 5 eligible, 10 blocking
>>>>>> 2015-05-09 10:57:58,963 DEBUG [PriorityRpcServer.handler=13,queue=1,port=16020] regionserver.HStore: 700b34f5d2a3aa10804eff35906fd6d8 - meta: Initiating major compaction (all files)
>>>>>> 
>>>>>> 
>>>>>> Cheers
>>>>>> 
>>>>>> On Tue, May 12, 2015 at 2:06 PM, Akmal Abbasov <[email protected]> wrote:
>>>>>> 
>>>>>>> Hi Ted,
>>>>>>> Thank you for reply.
>>>>>>> I am running with the default settings.
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>>> On 12 May 2015, at 22:02, Ted Yu <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Can you show us compaction related parameters you use ?
>>>>>>>> 
>>>>>>>> e.g. hbase.hregion.majorcompaction ,
>>>>>>> hbase.hregion.majorcompaction.jitter ,
>>>>>>>> etc
>>>>>>>> 
>>>>>>>> On Tue, May 12, 2015 at 9:52 AM, Akmal Abbasov <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> I am using HBase 0.98.7.
>>>>>>>>> I am using HBase snapshots to back up data. I create a snapshot of
>>>>>>>>> the tables each hour.
>>>>>>>>> Each snapshot creation causes a flush of the memstore and the
>>>>>>>>> creation of new hfiles.
>>>>>>>>> When the number of hfiles reaches 3, a minor compaction starts for
>>>>>>>>> each CF.
>>>>>>>>> I was expecting the compaction to process only the small hfiles, so
>>>>>>>>> that I wouldn't end up with all the data moved to the archive folder
>>>>>>>>> each time a compaction ends.
>>>>>>>>> But most of the time the minor compaction is promoted to a major one
>>>>>>>>> (more than 100 times in 24 hours with no load).
>>>>>>>>> As far as I know, the only way this can happen is if all hfiles are
>>>>>>>>> eligible for compaction.
>>>>>>>>> But when I checked the archive folder for a CF, I saw a strange
>>>>>>>>> situation:
>>>>>>>>> -rw-r--r--   3 akmal supergroup      1.0 K 2015-05-10 06:04 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/36dc06f4c34242daadc343d857a35734
>>>>>>>>> -rw-r--r--   3 akmal supergroup      1.0 K 2015-05-10 06:04 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/7e8b993f97b84f4594542144f15b0a1e
>>>>>>>>> -rw-r--r--   3 akmal supergroup      1.1 K 2015-05-10 06:04 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/b9afc64792ba4bf99a08f34033cc46ac
>>>>>>>>> -rw-r--r--   3 akmal supergroup    638.4 K 2015-05-10 06:04 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/dff846ae4fc24d418289a95322b35d46
>>>>>>>>> -rw-r--r--   3 akmal supergroup      1.0 K 2015-05-10 08:50 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/228eee22c32e458e8eb7f5d031f64b58
>>>>>>>>> -rw-r--r--   3 akmal supergroup      1.0 K 2015-05-10 08:50 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/529257432308466f971e41db49ecffdf
>>>>>>>>> -rw-r--r--   3 akmal supergroup    638.5 K 2015-05-10 08:50 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/839d3a6fc523435d8b44f63315fd11b8
>>>>>>>>> -rw-r--r--   3 akmal supergroup      1.0 K 2015-05-10 08:50 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/8c245e8661b140439e719f69a535d57f
>>>>>>>>> -rw-r--r--   3 akmal supergroup      1.0 K 2015-05-10 11:37 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/23497a31d3e721fe9b63c58fbe0224d5
>>>>>>>>> -rw-r--r--   3 akmal supergroup    638.7 K 2015-05-10 11:37 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/8c9af0357d164221ad46b336cd660b30
>>>>>>>>> -rw-r--r--   3 akmal supergroup      1.0 K 2015-05-10 11:37 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/8eb55b43c22d434954e2e0bfda656018
>>>>>>>>> -rw-r--r--   3 akmal supergroup      1.0 K 2015-05-10 11:37 /hbase/archive/data/default/table1/0e8e3bf44a2ea5dfaa8a9c58d99b92e6/c/b8b6210d9e6d4ec2344238c6e9c17ddf
>>>>>>>>> 
>>>>>>>>> As I understand it, these files were copied to the archive folder
>>>>>>>>> after compaction.
>>>>>>>>> The part I don't understand is why the file with 638 K was also
>>>>>>>>> selected for compaction.
>>>>>>>>> Any ideas?
>>>>>>>>> Thank you.
>>>>>>>>> 
>>>>>>>>> Kind regards,
>>>>>>>>> Akmal Abbasov
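On the minor-to-major promotion question running through this thread: as Ted noted, a compaction gets reported as major simply when the selection happens to include every store file. A minimal sketch of that rule (simplified; my understanding is that the real check in HStore also considers things like TTL-expired files and reference files, so treat the extra conditions as assumptions):

```java
// Simplified sketch of when a selected compaction is promoted to "major":
// the selection covers every store file of the store.
public class PromotionSketch {
    static boolean isPromotedToMajor(int selectedFiles, int totalStoreFiles) {
        return selectedFiles == totalStoreFiles;
    }

    public static void main(String[] args) {
        // The log in this thread: 4 store files, all 4 selected -> major.
        System.out.println(isPromotedToMajor(4, 4)); // true
        // Had the 10.8M file been excluded, it would have stayed minor.
        System.out.println(isPromotedToMajor(3, 4)); // false
    }
}
```

So the promotions themselves are a symptom: the real question is why the selection step keeps including all files, which is what the short-circuited ratio check discussed at the top of the thread would explain.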
