[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-12-02 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714463#comment-15714463
 ] 

Jingcheng Du commented on HBASE-17172:
--

We designed mob to reduce IO amplification. The design tries to guarantee 
read performance no matter how many mob files there are. So, by setting such a 
threshold, we can reduce how many files get compacted (which leaves more files 
behind). We don't need to keep the number of files small to make reads fast. 
That is why the default threshold is small, and that is why your compaction 
policy JIRA is so important :)
The threshold is key to reducing IO amplification, so we don't recommend 
setting it to a very large number. Otherwise, mob would not differ much from 
storing cells directly in HBase.

> Optimize major mob compaction with _del files
> -
>
> Key: HBASE-17172
> URL: https://issues.apache.org/jira/browse/HBASE-17172
> Project: HBase
> Issue Type: Improvement
> Components: mob
> Affects Versions: 2.0.0
> Reporter: huaxiang sun
> Assignee: huaxiang sun
>
> Today, when there is a _del file in mobdir, major mob compaction recompacts 
> every mob file. This causes a lot of IO and slows down major mob compaction 
> (it may take months to finish). This needs to be improved. A few ideas are: 
> 1) Do not compact all _del files into one; instead, compact them in groups 
> keyed by startKey. Then use the firstKey/startKey of each mob file to see 
> whether the _del file needs to be included for this partition.
> 2) Based on the timerange of the _del file, compaction for files after that 
> timerange does not need to include the _del file, as these are newer files.
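
The two ideas in the description could be combined into a single per-file check. The sketch below is only an illustration under stated assumptions: `DelFileInfo`, `MobFileInfo`, `needsDelFile`, and `relevantDelFiles` are hypothetical names, not HBase classes or methods, and row keys are compared as plain strings rather than byte arrays. A _del file is included only when its key range overlaps the mob file's range and the mob file holds cells no newer than the newest delete marker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DelFileSelection {
  /** Hypothetical summary of a _del file: covered row-key range and newest delete timestamp. */
  static final class DelFileInfo {
    final String name;
    final String startKey;
    final String endKey;
    final long maxTimestamp;

    DelFileInfo(String name, String startKey, String endKey, long maxTimestamp) {
      this.name = name;
      this.startKey = startKey;
      this.endKey = endKey;
      this.maxTimestamp = maxTimestamp;
    }
  }

  /** Hypothetical summary of a mob file: covered row-key range and oldest cell timestamp. */
  static final class MobFileInfo {
    final String name;
    final String firstKey;
    final String lastKey;
    final long minTimestamp;

    MobFileInfo(String name, String firstKey, String lastKey, long minTimestamp) {
      this.name = name;
      this.firstKey = firstKey;
      this.lastKey = lastKey;
      this.minTimestamp = minTimestamp;
    }
  }

  /** Idea 1: key ranges must overlap. Idea 2: files newer than every delete marker are skipped. */
  static boolean needsDelFile(MobFileInfo mob, DelFileInfo del) {
    boolean keysOverlap = mob.firstKey.compareTo(del.endKey) <= 0
        && del.startKey.compareTo(mob.lastKey) <= 0;
    boolean oldEnough = mob.minTimestamp <= del.maxTimestamp;
    return keysOverlap && oldEnough;
  }

  /** Returns the names of the _del files that would have to be read when compacting the mob file. */
  static List<String> relevantDelFiles(MobFileInfo mob, List<DelFileInfo> delFiles) {
    List<String> names = new ArrayList<>();
    for (DelFileInfo del : delFiles) {
      if (needsDelFile(mob, del)) {
        names.add(del.name);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    List<DelFileInfo> dels = Arrays.asList(
        new DelFileInfo("regionA_del", "a", "f", 200L),
        new DelFileInfo("regionB_del", "n", "z", 200L));
    MobFileInfo oldA = new MobFileInfo("regionA-old", "b", "e", 100L);
    MobFileInfo newA = new MobFileInfo("regionA-new", "b", "e", 300L);
    System.out.println(relevantDelFiles(oldA, dels)); // only the overlapping _del file
    System.out.println(relevantDelFiles(newA, dels)); // empty: written after the deletes
  }
}
```

With a check like this, a mob file that matches no _del file would not need to be rewritten at all, which is the saving the description is after.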



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-12-01 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714331#comment-15714331
 ] 

huaxiang sun commented on HBASE-17172:
--

One more question for you, Jingcheng :). When the threshold is so large that 
every mob file is smaller than it, and there are _del files, the minor mob 
compaction actually turns into a major mob compaction. What is the reason 
behind this design? Since the threshold is user configurable, a user may 
choose a large value and turn the mob compaction into a major one; if there 
are _del files, the compaction will take longer than expected. I am thinking 
about compacting a single mob file with _del files only in the 
major_mob_compact case, so the user is aware of what is going to happen. 
Comments? Thanks.



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-12-01 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714293#comment-15714293
 ] 

Jingcheng Du commented on HBASE-17172:
--

bq. Can I create a new jira to address "Meanwhile, we can add more 
constraints, for example only perform compaction when there are more than 2 
mob files and _del files in minor compaction?"?
Sure, thanks!



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-12-01 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714196#comment-15714196
 ] 

huaxiang sun commented on HBASE-17172:
--

Is it possible to compact the _del files by regions, and save the start keys 
and stop keys in memory for each partition to decide if we need to compact?

That is one of the ideas to optimize compaction with _del files. 

Can I create a new jira to address "Meanwhile, we can add more constraints, 
for example only perform compaction when there are more than 2 mob files and 
_del files in minor compaction?"?

Thanks!



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-12-01 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714169#comment-15714169
 ] 

Jingcheng Du commented on HBASE-17172:
--

You are right. It is like this now.
We are now trying to avoid the unnecessary compaction, right? Is it possible 
to compact the _del files by regions, and save the start keys and stop keys in 
memory for each partition to decide if we need to compact? Meanwhile, we can 
add more constraints, for example only perform compaction when there are more 
than 2 mob files and _del files in minor compaction?



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-12-01 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714142#comment-15714142
 ] 

huaxiang sun commented on HBASE-17172:
--

Thanks Jingcheng. Regarding "If we skip the compacted files, the threshold 
is not that useful anymore.": today, if there is only one file in the 
partition and there are no _del files, the file is skipped. With a _del file, 
the current logic is to compact the already-compacted file with the _del file. 
Let's say there is one mob file regionA20161101, which was compacted. On 
12/1/2016, there is a _del file regionB20161201_del and mob compaction kicks 
in. regionA20161101 is smaller than the threshold, so it is picked for 
compaction. Since there is a _del file, regionA20161101 and 
regionB20161201_del are compacted into regionA20161101_1. After that, 
regionB20161201_del cannot be deleted since it is not an allFiles compaction. 
In the next mob compaction, regionA20161101_1 and regionB20161201_del 
will be picked up again and compacted into regionA20161101_2. So in this 
case, it causes more unnecessary IO. Could you confirm whether this is 
the case?



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-12-01 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711517#comment-15711517
 ] 

Jingcheng Du commented on HBASE-17172:
--

Thanks [~huaxiang].
If we skip the compacted files, the threshold is not that useful anymore. I 
have three options for the solution.
One is to decrease the threshold and use the compaction policy in HBASE-16981 
in the compaction.
The second is to skip the minor compaction if there is only one mob 
file (or two mob files) and one _del file. But we would still suffer the 
unnecessary compaction in major compaction (although major compaction is 
not recommended).
The last is to group the _del files by regions, but it is very difficult 
to align the keys in _del files with the partitions in mob files.



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-29 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707080#comment-15707080
 ] 

huaxiang sun commented on HBASE-17172:
--

Hi [~jingcheng.du] and [~anoop.hbase], I just did more code reading and found 
that _del files can be included in minor mob compaction when the file size is 
less than the threshold. Assume the user sets a high threshold value: even 
already-compacted files can be included in the compaction list again and be 
compacted with the _del files. If we want to deal with _del files mainly in 
major mob compaction, can we skip these already-compacted files in the minor 
compaction? Something like the following in select(), after files are added to 
the filesToCompact map. This is to speed up minor compaction with _del files.

{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
index 33aecc0..dab05d2 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
@@ -25,6 +25,7 @@ import java.util.Collection;
 import java.util.Collections;
 import java.util.Date;
 import java.util.HashMap;
+import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 import java.util.Map.Entry;
@@ -179,6 +180,23 @@ public class PartitionedMobCompactor extends MobCompactor {
         selectedFileCount++;
       }
     }
+
+    /*
+     * If it is not a major mob compaction with del files, and the file number in Partition is 1,
+     * remove the partition from filesToCompact list to avoid re-compacting files which has been
+     * compacted with del files.
+     */
+    if (!allFiles && (allDelFiles.size() > 0)) {
+      for(Iterator<Map.Entry<CompactionPartitionId, CompactionPartition>> it =
+          filesToCompact.entrySet().iterator(); it.hasNext(); ) {
+        Map.Entry<CompactionPartitionId, CompactionPartition> entry = it.next();
+        if (entry.getValue().getFileNumbers() <= 1) {
+          it.remove();
+          --selectedFileCount;
+        }
+      }
+    }
+
     PartitionedMobCompactionRequest request = new PartitionedMobCompactionRequest(
       filesToCompact.values(), allDelFiles);
     if (candidates.size() == (allDelFiles.size() + selectedFileCount + irrelevantFileCount)) {
{code}
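
For readers who want to try the idea outside HBase, here is a minimal standalone sketch of the same iterator-based pruning. It is an assumption-laden simplification, not the patch itself: `SkipSingleFilePartitions` and `dropSingleFilePartitions` are hypothetical names, and a `Map<String, Integer>` (partition id to selected file count) stands in for the `filesToCompact` map.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class SkipSingleFilePartitions {
  /**
   * Removes every partition that holds at most one selected file and returns
   * the number of files dropped, so a caller can adjust its selected-file count.
   */
  static int dropSingleFilePartitions(Map<String, Integer> filesPerPartition) {
    int droppedFiles = 0;
    for (Iterator<Map.Entry<String, Integer>> it = filesPerPartition.entrySet().iterator();
        it.hasNext();) {
      Map.Entry<String, Integer> entry = it.next();
      int fileCount = entry.getValue(); // read before removal; the entry is not used afterwards
      if (fileCount <= 1) {
        it.remove(); // Iterator.remove is the safe way to delete while iterating
        droppedFiles += fileCount;
      }
    }
    return droppedFiles;
  }

  public static void main(String[] args) {
    Map<String, Integer> partitions = new HashMap<>();
    partitions.put("regionA20161101", 1); // already compacted into a single file
    partitions.put("regionB20161101", 3); // still worth merging
    int dropped = dropSingleFilePartitions(partitions);
    System.out.println("dropped files: " + dropped);
    System.out.println("remaining partitions: " + partitions.keySet());
  }
}
```

Removing through the iterator avoids the `ConcurrentModificationException` that calling `Map.remove` inside a for-each loop over the entry set would risk.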





[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-28 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703892#comment-15703892
 ] 

Jingcheng Du commented on HBASE-17172:
--

Thanks [~huaxiang] and [~anoopsamjohn].
bq. Does that mean the deleted MOB cells will stay in the MOB files until 
their TTL expires?
Right. But according to Huaxiang's comments, this is not doable.

So it seems merging files by regions is the only way now.



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-28 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702722#comment-15702722
 ] 

huaxiang sun commented on HBASE-17172:
--

We have use cases where the user wants the TTL to be multiple years or so, and 
there may be lots of deleted cells depending on the use case. I think we want 
to give the user an option to free up the space taken by these deleted cells.



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-28 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702059#comment-15702059
 ] 

Anoop Sam John commented on HBASE-17172:


bq. Or we can retain the delete markers in hbase tables until they are expired?
Does that mean the deleted MOB cells will stay in the MOB files until their 
TTL expires?



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-28 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15701682#comment-15701682
 ] 

Jingcheng Du commented on HBASE-17172:
--

Thanks for the clarification [~huaxiang]. You are right. The compaction is not 
necessary if there is only one mob file and some _del files, and that mob file 
is not related to these _del files.
I think we can group the _del files by regions. Or we can retain the delete 
markers in hbase tables until they are expired?
[~anoopsamjohn], do you have a preference on this? Thanks!



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-24 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15694193#comment-15694193
 ] 

huaxiang sun commented on HBASE-17172:
--

{code}
Hmm. If the .del files are not a performance killer, we don't need this. I 
reviewed the code, and I think the .del files are not the reason for the slow 
compaction; major compaction itself is.
{code}
I need to provide more background here. Let's say the mob files were major 
compacted one week ago. There are regionA and regionB; assume 
regionA20161001*** and regionB20161001*** are the results of the previous 
major compaction. One del file for regionA was created in the past week. A 
major compaction kicks in. regionA20161001*** and regionB20161001*** will both 
be re-compacted in this case. While compacting regionA20161001*** is needed, 
re-compacting regionB20161001*** is a waste. Given that there are lots of other 
regions and many already-compacted files, this unnecessary compaction slows 
down the major compaction.



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-24 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693154#comment-15693154
 ] 

Jingcheng Du commented on HBASE-17172:
--

bq. So when we have _del files, we promote the compaction to a major one, and 
that is the issue you are describing?
As far as I know, it is not. A major compaction is triggered either by clients 
or when all of the mob files are smaller than the mergeable threshold.
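
The rule stated above can be written down as a small sketch. This is an illustration of the rule as described in this comment, not the HBase implementation: `CompactionTypeSketch`, `selectMergeable`, and `isAllFiles` are hypothetical names, and mob files are represented only by their sizes in bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CompactionTypeSketch {
  /** Mob files strictly smaller than the mergeable threshold are candidates for merging. */
  static List<Long> selectMergeable(List<Long> mobFileSizes, long mergeableThreshold) {
    List<Long> selected = new ArrayList<>();
    for (long size : mobFileSizes) {
      if (size < mergeableThreshold) {
        selected.add(size);
      }
    }
    return selected;
  }

  /** The compaction covers all files if a client asked for it or every mob file was selected. */
  static boolean isAllFiles(boolean clientRequestedMajor, List<Long> mobFileSizes,
      long mergeableThreshold) {
    return clientRequestedMajor
        || selectMergeable(mobFileSizes, mergeableThreshold).size() == mobFileSizes.size();
  }

  public static void main(String[] args) {
    List<Long> sizes = Arrays.asList(10L, 20L, 500L);
    System.out.println(selectMergeable(sizes, 100L));    // the two small files
    System.out.println(isAllFiles(false, sizes, 100L));  // one file is too large to be selected
    System.out.println(isAllFiles(false, sizes, 1000L)); // a large threshold selects every file
  }
}
```

The last call shows why a very large threshold effectively turns every minor compaction into a major one.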



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-24 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693053#comment-15693053
 ] 

Anoop Sam John commented on HBASE-17172:


So when we have _del files, we promote the compaction to a major one, and that 
is the issue you are describing? Anyway, your proposal to change how _del 
files get compacted and how to use them more effectively makes full sense. 
Sorry for the question; I have forgotten the implementation details.



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-24 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692632#comment-15692632
 ] 

Jingcheng Du commented on HBASE-17172:
--

Thanks Huaxiang!
bq. users may still choose to disable the mob compaction chore and run mob 
compaction manually at scheduled maintenance.
Right, how about running minor compaction instead? It doesn't make sense to 
run major mob compaction periodically. Mob is designed to reduce the IO 
amplification during compaction. Major compaction will break this.
bq. Keeping delete markers in hbase files in a mob-enabled cf is one way to 
avoid .del files; the concern is that it is inconsistent with non-mob cfs 
(maybe this can be provided as an option through config?).
Hmm. If the .del files are not a performance killer, we don't need this. I 
reviewed the code, and I think the .del files are not the reason for the slow 
compaction; major compaction itself is.
bq. With the current major mob compaction, these .del files will be included 
when compacting files for other regions, which is not necessary.
Right, it is not necessary. Splitting them by regions is a good choice. But is 
this necessary if the .del files don't impact compaction performance badly?



[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-24 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692581#comment-15692581
 ] 

huaxiang sun commented on HBASE-17172:
--

Thanks [~jingcheng...@intel.com]! Distributed compaction is definitely helpful 
even with minor compaction, considering that mob compaction needs to acquire a 
table lock. The purpose of major compaction is to reduce the number of files. 
With HBASE-16891, users may still choose to disable the mob compaction chore 
and run mob compaction manually at scheduled maintenance. Keeping delete 
markers in hbase files in a mob-enabled cf is one way to avoid .del files; the 
concern is that it is inconsistent with non-mob cfs (maybe this can be 
provided as an option through config?). Another way may be to optimize it as 
the current jira suggests. For example, a user deletes some rows in one or two 
regions; after compaction, .del files are created. With the current major mob 
compaction, these .del files will be included when compacting files for other 
regions, which is not necessary; the net effect is that all mob files will be 
re-compacted. More ideas about how to optimize it are welcome, but I think 
distributed mob compaction is definitely needed. Thanks.






[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-23 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692109#comment-15692109
 ] 

Jingcheng Du commented on HBASE-17172:
--

Hi [~huaxiang], why do you need major compaction? Do you want to reduce the 
number of files? If so, can the proposal in HBASE-16981 solve this? I guess 
major compaction won't be needed if HBASE-16981 is implemented.






[jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files

2016-11-23 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692088#comment-15692088
 ] 

Jingcheng Du commented on HBASE-17172:
--

Thanks [~huaxiang]!
A major compaction compacts all the files even when there are no del files, 
which is slow. Is the slowness related to the del files? How about increasing 
the number of threads that perform the compaction to reduce the running time?
Actually, if deletes are rare, we can always keep the delete markers in the 
HBase files in the mob-enabled cf, even in all-files and major compaction. Then 
we won't need the .del files in mob anymore.
If the slowness is not related to the .del files, I guess we have to fix the 
slow compaction by implementing a distributed compaction. I filed a JIRA, 
HBASE-15381, to implement this; the patch is there, but I haven't rebased it 
for a long time. Are you interested in taking it?



