[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951360#comment-15951360
 ] 

huaxiang sun edited comment on HBASE-17172 at 3/31/17 5:35 PM:
---------------------------------------------------------------

[~jingcheng.du], a late follow-up on this. Grouping delete files by their 
first/last key is meant to keep delete files out of the set of 
files-to-be-compacted as much as possible. If only the start key is used, there 
is one case that I am not sure how to handle (if I am following your 
idea correctly). 

Let's say region 1 starts at key0 and ends at key2, and it has one delete 
file, key0***_del. After that, the region may split into region1-0 and 
region1-1. For region1-1, key0***_del may need to be included in compaction 
because it may contain keys in that region's range. My understanding is that 
if we only use startKey to group files, key0***_del will not be included in 
region1-1's mob compaction. 
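
To make the case concrete, here is a minimal sketch in Python. It is not HBase code: the `DelFile` type, the function names, the split point key1 and the sample last key `key1z` are all assumptions for illustration, and keys are compared as plain byte strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelFile:
    """Hypothetical stand-in for a _del file; not an HBase class."""
    name: str
    start_key: bytes   # start key of the region that wrote the file
    first_key: bytes   # smallest row key actually stored in the file
    last_key: bytes    # largest row key actually stored in the file


def selected_by_start_key(del_file: DelFile, region_start: bytes) -> bool:
    """Grouping by startKey only: the file belongs to exactly one group."""
    return del_file.start_key == region_start


def overlaps_region(del_file: DelFile, region_start: bytes, region_end: bytes) -> bool:
    """Grouping by first/last key: the file is selected when its key range
    intersects [region_start, region_end). An empty end key means unbounded."""
    starts_before_end = region_end == b"" or del_file.first_key < region_end
    return starts_before_end and del_file.last_key >= region_start


# Region 1 covered [key0, key2) and wrote one delete file.
del_file = DelFile("key0***_del", start_key=b"key0",
                   first_key=b"key0", last_key=b"key1z")

# Assumed split: region1-0 = [key0, key1), region1-1 = [key1, key2).
print(selected_by_start_key(del_file, b"key1"))     # start key only, region1-1
print(overlaps_region(del_file, b"key1", b"key2"))  # first/last key, region1-1
```

With these sample values the start-key check leaves key0***_del out of region1-1's group, while the first/last-key check includes it, which is the case described above.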

Maybe as you said
{quote}
Since now we have always retained the delete markers in hfiles, 
{quote}
It is OK not to include the delete file for region1-1. The data for the 
deleted cells will still be kept and will be bulkloaded after mob compaction, 
but since the delete markers are still in the hfiles, those cells will not 
show up.

Is my understanding correct? Thanks [~jingcheng.du]!



> Optimize mob compaction with _del files
> ---------------------------------------
>
>                 Key: HBASE-17172
>                 URL: https://issues.apache.org/jira/browse/HBASE-17172
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17172-master-001.patch, 
> HBASE-17172.master.001.patch, HBASE-17172.master.002.patch, 
> HBASE-17172.master.003.patch
>
>
> Today, when there is a _del file in mobdir, major mob compaction recompacts 
> every mob file. This causes a lot of IO and slows down major mob 
> compaction (it may take months to finish). This needs to be improved. A few 
> ideas are: 
> 1) Do not compact all _del files into one; instead, compact them in 
> groups keyed by startKey. Then use each mob file's firstKey/startKey 
> to see if the _del file needs to be included for this partition.
> 2) Based on the timerange of the _del file, compaction for files after that 
> timerange does not need to include the _del file, as these are newer files.
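
Idea 2 could be sketched roughly as follows. This is only an illustration in Python, not HBase code: `TimeRange`, `del_file_needed` and `select_del_files` are hypothetical names, and the sketch assumes the time ranges hold cell timestamps, so a delete marker can only affect cells whose timestamp is not newer than the marker.

```python
from typing import List, NamedTuple


class TimeRange(NamedTuple):
    """Hypothetical [min_ts, max_ts] range of cell timestamps in a file."""
    min_ts: int
    max_ts: int


def del_file_needed(mob_range: TimeRange, del_range: TimeRange) -> bool:
    """A _del file can be skipped when every cell in the mob file is
    newer than the newest delete marker in the _del file."""
    return mob_range.min_ts <= del_range.max_ts


def select_del_files(mob_range: TimeRange,
                     del_ranges: List[TimeRange]) -> List[TimeRange]:
    """Keep only the _del files that may still apply to this mob file."""
    return [d for d in del_ranges if del_file_needed(mob_range, d)]


old_del = TimeRange(min_ts=100, max_ts=200)
new_del = TimeRange(min_ts=500, max_ts=600)
mob_file = TimeRange(min_ts=300, max_ts=400)

# Only new_del can contain markers that apply to cells in mob_file.
print(select_del_files(mob_file, [old_del, new_del]))
```

A key-range check like the one in idea 1 would be applied alongside this filter; the time-range test alone only rules out _del files that are entirely older than the mob file.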



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
