[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714463#comment-15714463 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

We designed mob to reduce IO amplification. The design tries to guarantee read performance no matter how many mob files there are. So we can reduce the number of files that get compacted (which leaves more files around) by setting such a threshold. We don't need to keep the number of files small to make reads fast. That is why the default threshold is small, and that is why your compaction policy JIRA is so important :)
The threshold is key to reducing IO amplification, so we don't recommend setting it to a very large number. Otherwise, mob is not much different from storing cells directly in HBase.

> Optimize major mob compaction with _del files
> ---------------------------------------------
>
>                 Key: HBASE-17172
>                 URL: https://issues.apache.org/jira/browse/HBASE-17172
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>
> Today, when there is a _del file in the mobdir, a major mob compaction
> recompacts every mob file. This causes a lot of IO and slows down major mob
> compaction (it may take months to finish). This needs to be improved. A few
> ideas are:
> 1) Do not compact all _del files into one; instead, compact them in groups
> keyed by startKey. Then use each mob file's firstKey/startKey to decide
> whether the _del file needs to be included for its partition.
> 2) Based on the timerange of the _del file, compaction for files after that
> timerange does not need to include the _del file, as these are newer files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
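Idea 2 in the quoted issue description (skip the _del file for mob files that are newer than its timerange) can be illustrated with a small standalone sketch. This is not HBase code: `TimeRange`, `delFileMayAffect` and `relevantDelFiles` are hypothetical names, and the sketch assumes each file exposes the minimum and maximum cell timestamp it contains.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: TimeRange and the method names are hypothetical, not HBase APIs. */
final class DelFileTimeRangeSketch {

  /** Inclusive range of cell timestamps found in one file. */
  static final class TimeRange {
    final long minTs;
    final long maxTs;

    TimeRange(long minTs, long maxTs) {
      if (minTs > maxTs) {
        throw new IllegalArgumentException("minTs must not exceed maxTs");
      }
      this.minTs = minTs;
      this.maxTs = maxTs;
    }
  }

  /**
   * A delete marker masks only cells whose timestamp is at or below its own,
   * so a _del file cannot affect a mob file whose oldest cell is newer than
   * the newest marker in the _del file.
   */
  static boolean delFileMayAffect(TimeRange mobFile, TimeRange delFile) {
    return mobFile.minTs <= delFile.maxTs;
  }

  /** Keeps only the _del files that could mask something in the given mob file. */
  static List<TimeRange> relevantDelFiles(TimeRange mobFile, List<TimeRange> delFiles) {
    List<TimeRange> relevant = new ArrayList<>();
    for (TimeRange delFile : delFiles) {
      if (delFileMayAffect(mobFile, delFile)) {
        relevant.add(delFile);
      }
    }
    return relevant;
  }
}
```

With a check like this, a partition whose files are all newer than every _del file would be compacted without reading any _del file at all.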
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714331#comment-15714331 ]

huaxiang sun commented on HBASE-17172:
--------------------------------------

One more question for you, Jingcheng :). When the threshold is so large that every mob file is smaller than it, and there are _del files, the minor mob compaction actually turns into a major mob compaction. What is the reason behind this design? Since the threshold is user-configurable, a user may choose a large value and thereby turn the mob compaction into a major one; if there are _del files, the compaction will take longer than expected.
I am thinking about compacting a single mob file with _del files only in the major_mob_compact case, so the user is aware of what is going to happen. Comments? Thanks.
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714293#comment-15714293 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

bq. Can I create a new jira to address "Meanwhile, we can add more constraints, for example only perform compaction when there are more than 2 mob files and _del files in minor compaction?"?
Sure, thanks!
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714196#comment-15714196 ]

huaxiang sun commented on HBASE-17172:
--------------------------------------

"Is it possible to compact the _del files by regions, and save the start keys and stop keys in memory for each partition to decide if we need to compact?"
That is one of the ideas for optimizing compaction with _del files.

Can I create a new jira to address "Meanwhile, we can add more constraints, for example only perform compaction when there are more than 2 mob files and _del files in minor compaction?"?

Thanks!
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714169#comment-15714169 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

You are right. It is like this now. We are now trying to avoid the unnecessary compaction, right?
Is it possible to compact the _del files by regions, and save the start keys and stop keys in memory for each partition to decide if we need to compact?
Meanwhile, we can add more constraints, for example only perform compaction when there are more than 2 mob files and _del files in minor compaction?
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714142#comment-15714142 ]

huaxiang sun commented on HBASE-17172:
--------------------------------------

Thanks Jingcheng. Regarding "If we skip the compacted files, the threshold is not that useful anymore.": today, if there is only one file in the partition and there are no _del files, the file is skipped. With a _del file, the current logic is to compact the already-compacted file with the _del file.

Let's say there is one mob file, regionA20161101, which was already compacted. On 12/1/2016 there is a _del file, regionB20161201_del, and mob compaction kicks in. regionA20161101 is smaller than the threshold, so it is picked for compaction. Since there is a _del file, regionA20161101 and regionB20161201_del are compacted into regionA20161101_1. After that, regionB20161201_del cannot be deleted, since this is not an allFiles compaction. In the next mob compaction, regionA20161101_1 and regionB20161201_del will be picked up again and compacted into regionA20161101_2. So in this case it causes more unnecessary IO. Could you confirm whether this is the case?
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711517#comment-15711517 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

Thanks [~huaxiang].
If we skip the compacted files, the threshold is not that useful anymore.
I have three options for the solution.
The first is to decrease the threshold and use the compaction policy from HBASE-16981 in the compaction.
The second is to skip the minor compaction if there is only one mob file (or two mob files) and one _del file. But we would still have to suffer the unnecessary compaction in major compaction (although major compaction is not recommended).
The last is to group the _del files by regions, but it is very difficult to align the keys in the _del files with the partitions of the mob files.
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707080#comment-15707080 ]

huaxiang sun commented on HBASE-17172:
--------------------------------------

Hi [~jingcheng.du] and [~anoop.hbase], I did some more code reading and found that _del files can be included in a minor mob compaction when the file size is less than the threshold. If the user sets a high threshold, even already-compacted files can be put on the compaction list again and compacted with the del files. If we want to deal with _del files mainly in major mob compaction, can we skip these already-compacted files in minor compaction? Something like the following in select(), after the files are added to the filesToCompact map. This is to speed up minor compaction with del files.

{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
index 33aecc0..dab05d2 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
@@ -25,6 +25,7 @@ import java.util.Collection;
 import java.util.Collections;
 import java.util.Date;
 import java.util.HashMap;
+import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 import java.util.Map.Entry;
@@ -179,6 +180,23 @@ public class PartitionedMobCompactor extends MobCompactor {
         selectedFileCount++;
       }
     }
+
+    /*
+     * If it is not a major mob compaction with del files, and the file number in Partition is 1,
+     * remove the partition from filesToCompact list to avoid re-compacting files which has been
+     * compacted with del files.
+     */
+    if (!allFiles && (allDelFiles.size() > 0)) {
+      for(Iterator<Map.Entry<CompactionPartitionId, CompactionPartition>> it =
+          filesToCompact.entrySet().iterator(); it.hasNext(); ) {
+        Map.Entry<CompactionPartitionId, CompactionPartition> entry = it.next();
+        if (entry.getValue().getFileNumbers() <= 1) {
+          it.remove();
+          --selectedFileCount;
+        }
+      }
+    }
+
     PartitionedMobCompactionRequest request = new PartitionedMobCompactionRequest(
       filesToCompact.values(), allDelFiles);
     if (candidates.size() == (allDelFiles.size() + selectedFileCount + irrelevantFileCount)) {
{code}
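The selection rule proposed in the patch above can be tried in isolation with plain collections. The following is only a sketch of that rule, not the HBase implementation: partitions are modelled as a hypothetical `Map<String, List<String>>` from partition id to file names, and `pruneSingleFilePartitions` is an invented helper name.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Sketch only: models partitions as partition id -> file names; not HBase code. */
final class SingleFilePartitionPruneSketch {

  /**
   * For a compaction that is not all-files but has _del files, drops every
   * partition holding at most one file, so that a lone, already-compacted
   * file is not rewritten just because a _del file exists.
   *
   * @return the number of files that were removed from the selection
   */
  static int pruneSingleFilePartitions(Map<String, List<String>> filesToCompact,
      boolean allFiles, int delFileCount) {
    int removedFiles = 0;
    if (!allFiles && delFileCount > 0) {
      for (Iterator<Map.Entry<String, List<String>>> it =
          filesToCompact.entrySet().iterator(); it.hasNext(); ) {
        List<String> files = it.next().getValue();
        if (files.size() <= 1) {
          removedFiles += files.size();
          it.remove(); // safe removal while iterating over the entry set
        }
      }
    }
    return removedFiles;
  }
}
```

The caller would subtract the returned count from its selected-file counter, which plays the role of `--selectedFileCount` in the patch.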
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703892#comment-15703892 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

Thanks [~huaxiang] and [~anoopsamjohn].
bq. Means the deleted MOB cells will be there in MOB files unless they are TTL expired?
Right. But according to Huaxiang's comments, this is not doable. So it seems merging files by regions is the only way now.
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702722#comment-15702722 ]

huaxiang sun commented on HBASE-17172:
--------------------------------------

We have use cases where users want a TTL of multiple years or so, and there may be lots of deleted cells depending on the use case. I think we want to give users an option to free up the space taken by these deleted cells.
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702059#comment-15702059 ]

Anoop Sam John commented on HBASE-17172:
----------------------------------------

bq. Or we can retain the delete markers in hbase tables until they are expired?
Means the deleted MOB cells will be there in MOB files unless they are TTL expired?
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15701682#comment-15701682 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

Thanks for the clarification, [~huaxiang].
You are right. The compaction is not necessary if there is only one mob file plus some _del files, and that mob file is not related to those _del files.
I think we can group the _del files by regions. Or we can retain the delete markers in hbase tables until they are expired? [~anoopsamjohn], do you have a preference on this? Thanks!
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15694193#comment-15694193 ]

huaxiang sun commented on HBASE-17172:
--------------------------------------

{code}
Hmm. If the .del is not a performance killer, we don't need this. I reviewed the code, I think the .del files is not the reason of the slow compaction, major compaction itself is.
{code}

I need to provide more background here. Let's say the mob files were major compacted one week ago. There are regionA and regionB; assume regionA20161001*** and regionB20161001*** are the results of that previous major compaction. One del file for regionA was created during the past week. A major compaction kicks in, and both regionA20161001*** and regionB20161001*** will be re-compacted. While compacting regionA20161001*** is needed, re-compacting regionB20161001*** is a waste. Given that there are lots of other regions and many already-compacted files, this unnecessary compaction slows down the major compaction.
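The regionA/regionB example above comes down to a key-range overlap test: a del file only matters to a partition whose row keys it can touch. A minimal sketch of that test follows. It is an illustration under assumptions, not HBase code: `KeyRange` and `overlaps` are hypothetical names, and row keys are modelled as `String` values compared lexicographically, whereas HBase row keys are byte arrays.

```java
/** Sketch only: KeyRange is a hypothetical stand-in, with String keys for brevity. */
final class DelFileKeyRangeSketch {

  /** Inclusive range of row keys covered by a mob file or by a group of del files. */
  static final class KeyRange {
    final String firstKey;
    final String lastKey;

    KeyRange(String firstKey, String lastKey) {
      if (firstKey.compareTo(lastKey) > 0) {
        throw new IllegalArgumentException("firstKey must not sort after lastKey");
      }
      this.firstKey = firstKey;
      this.lastKey = lastKey;
    }
  }

  /** Two inclusive ranges overlap when each one starts no later than the other ends. */
  static boolean overlaps(KeyRange a, KeyRange b) {
    return a.firstKey.compareTo(b.lastKey) <= 0 && b.firstKey.compareTo(a.lastKey) <= 0;
  }
}
```

In the example, a del file covering only regionA keys would overlap the regionA mob file but not the regionB one, so the regionB file could be left alone.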
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693154#comment-15693154 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

bq. So when we have _del files, we will promote the compaction to a major one, and that is the issue you are describing?
As far as I know, it is not. A major compaction happens either when it is triggered by clients or when all of the mob files are smaller than the mergeable threshold.
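The two conditions named above can be written down as a small predicate. This is a sketch of the rule as stated in the comment, not the actual selection code: the class and method names are hypothetical, and file sizes are passed in as a plain array.

```java
/** Sketch only: hypothetical helper expressing the rule described in the comment. */
final class MajorCompactionRuleSketch {

  /**
   * Returns true when the compaction covers all files: either a client asked
   * for it explicitly, or every mob file is smaller than the mergeable
   * threshold and is therefore selected anyway.
   * Note that an empty size array yields true, since no file is excluded.
   */
  static boolean coversAllFiles(boolean requestedByClient, long[] mobFileSizes,
      long mergeableThreshold) {
    if (requestedByClient) {
      return true;
    }
    for (long size : mobFileSizes) {
      if (size >= mergeableThreshold) {
        return false; // this file is too large to be picked up by a minor compaction
      }
    }
    return true;
  }
}
```

This also shows why a very large threshold matters: once it exceeds the size of every mob file, each minor compaction satisfies the second condition.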
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693053#comment-15693053 ]

Anoop Sam John commented on HBASE-17172:
----------------------------------------

So when we have _del files, we will promote the compaction to a major one, and that is the issue you are describing? Anyway, your proposal to change how _del files get compacted and how to use them more effectively makes full sense. Sorry for the question; I have forgotten the implementation details by now.
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692632#comment-15692632 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

Thanks Huaxiang!
bq. users may still choose to disable mob compaction chore and run mob compaction manually at scheduled maintenance.
Right, how about running minor compaction instead? It doesn't make sense to run major mob compaction periodically. Mob is designed to reduce the IO amplification during compaction, and major compaction breaks this.
bq. To keep delete marker in hbase files in mob-enabled cf is one way to avoid .del files, the concern is that it is inconsistent with non-mob cfs (maybe this can be provided as option through config?).
Hmm. If the .del is not a performance killer, we don't need this. I reviewed the code, I think the .del files is not the reason of the slow compaction, major compaction itself is.
bq. With the current major mob compaction, these .del files will be included in compacting of files for other regions which is not necessary.
Right, it is not necessary. Splitting them by regions is a good choice. But is this necessary if the .del files don't hurt the compaction performance badly?
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692581#comment-15692581 ]

huaxiang sun commented on HBASE-17172:
--------------------------------------

Thanks [~jingcheng...@intel.com]! Distributed compaction is definitely helpful even with minor compaction, considering that mob compaction needs to acquire a table lock. The purpose of major compaction is to reduce the number of files. With HBASE-16891, users may still choose to disable mob compaction chore and run mob compaction manually at scheduled maintenance.

To keep delete marker in hbase files in mob-enabled cf is one way to avoid .del files, the concern is that it is inconsistent with non-mob cfs (maybe this can be provided as option through config?).

Another way may be to optimize it as the current jira suggests. For example, a user deletes some rows in one or two regions; after compaction, .del files are created. With the current major mob compaction, these .del files will be included in compacting of files for other regions which is not necessary. The net effect is that all mob files will be re-compacted.

More ideas about how to optimize this are welcome, but I think distributed mob compaction is definitely needed. Thanks.
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692109#comment-15692109 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

Hi [~huaxiang], why do you need major compaction? Do you want to reduce the number of files? If so, can the proposal in HBASE-16981 solve this? I guess major compaction won't be needed once HBASE-16981 is implemented.
[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692088#comment-15692088 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

Thanks [~huaxiang]!
A major compaction compacts all the files even without del files, which is slow. Is the slowness related to the del files? How about increasing the number of threads that perform the compaction to reduce the running time?
Actually, if deletes are rare, we can always keep the delete markers in the hbase files of a mob-enabled cf, even in all-files and major compactions. Then we won't need the .del files in mob anymore.
If the slowness is not related to the .del files, I guess we have to fix the slow compaction by implementing a distributed compaction. I filed HBASE-15381 to implement this; the patch is there, but I haven't rebased it for a long time. Are you interested in taking it?