[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-10-18 Thread HBase QA (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954336#comment-16954336
 ] 

HBase QA commented on HBASE-22749:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 39s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 39s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  4m  2s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 59s{color} | {color:green} master passed {color} |
| {color:orange}-0{color} | {color:orange} patch {color} | {color:orange}  4m  9s{color} | {color:orange} Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 25s{color} | {color:red} hbase-server: The patch generated 17 new + 308 unchanged - 47 fixed = 325 total (was 355) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 14 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 37s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 15m 40s{color} | {color:green} Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 21s{color} | {color:red} hbase-server generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}334m  5s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}393m 29s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  Possible null pointer dereference of mobRefData in org.apache.hadoop.hbase.master.MobFileCleanerChore.cleanupObsoleteMobFiles(Configuration, TableName)  Dereferenced at MobFileCleanerChore.java:[line 176] |
|  |  org.apache.hadoop.hbase.mob.FileSelection 
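
The "possible null pointer dereference of mobRefData" warning above is the kind of finding that usually goes away once the value is checked before use. The following is a minimal sketch of that guard, not the actual MobFileCleanerChore code: the class name, the map-based lookup, and the "MOB_REFS" key are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in; this is not the patch's MobFileCleanerChore. */
final class MobRefGuardSketch {

  /** Simulates a metadata lookup that returns null when nothing is stored. */
  static String lookupMobRefs(Map<String, String> fileInfo) {
    return fileInfo.get("MOB_REFS"); // hypothetical key
  }

  /** Counts referenced file names, treating a missing value as "no references". */
  static int countMobRefs(Map<String, String> fileInfo) {
    String mobRefData = lookupMobRefs(fileInfo);
    if (mobRefData == null || mobRefData.isEmpty()) {
      return 0; // guard: never dereference a value that may be null
    }
    return mobRefData.split(",").length;
  }

  public static void main(String[] args) {
    Map<String, String> withRefs = new HashMap<>();
    withRefs.put("MOB_REFS", "a,b,c");
    System.out.println(countMobRefs(withRefs));               // 3
    System.out.println(countMobRefs(Collections.emptyMap())); // 0
  }
}
```

Whether a missing value should mean "no references" or should abort the cleanup is a design decision for the patch author; the sketch only shows the null check itself.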

[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-10-17 Thread HBase QA (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954167#comment-16954167
 ] 

HBase QA commented on HBASE-22749:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m  3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m  3s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 37s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  4m 38s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 36s{color} | {color:green} master passed {color} |
| {color:orange}-0{color} | {color:orange} patch {color} | {color:orange}  4m 44s{color} | {color:orange} Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 33s{color} | {color:red} hbase-server: The patch generated 17 new + 308 unchanged - 47 fixed = 325 total (was 355) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 12 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m  3s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 17m 27s{color} | {color:green} Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 36s{color} | {color:red} hbase-server generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}237m  6s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}300m 25s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  org.apache.hadoop.hbase.mob.FileSelection defines compareTo(FileSelection) and uses Object.equals()  At DefaultMobStoreCompactor.java:[lines 615-620] |
|  |  org.apache.hadoop.hbase.mob.Generation defines compareTo(Generation) and uses Object.equals()  At DefaultMobStoreCompactor.java:[lines 793-798] |
|  |  
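
The two "defines compareTo(...) and uses Object.equals()" warnings above flag classes whose ordering and equality can disagree: two instances may compare as 0 yet not be equal. A minimal sketch of the usual remedy follows, assuming a class ordered by size and then by file name. The class name and both fields are hypothetical and are not taken from the patch's FileSelection or Generation classes.

```java
import java.util.Objects;

/** Hypothetical class; the fields are assumptions, not the patch's FileSelection. */
final class FileSelectionSketch implements Comparable<FileSelectionSketch> {
  private final String fileName;
  private final long size;

  FileSelectionSketch(String fileName, long size) {
    this.fileName = Objects.requireNonNull(fileName);
    this.size = size;
  }

  /** Orders by size, then by file name, so that 0 is returned only for equal objects. */
  @Override
  public int compareTo(FileSelectionSketch other) {
    int bySize = Long.compare(size, other.size);
    return bySize != 0 ? bySize : fileName.compareTo(other.fileName);
  }

  /** Uses the same fields as compareTo, keeping equality consistent with ordering. */
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof FileSelectionSketch)) {
      return false;
    }
    FileSelectionSketch other = (FileSelectionSketch) o;
    return size == other.size && fileName.equals(other.fileName);
  }

  /** Must be overridden together with equals. */
  @Override
  public int hashCode() {
    return Objects.hash(fileName, size);
  }
}
```

Keeping the three methods on the same set of fields matters most when instances end up in sorted collections such as TreeSet, which rely on compareTo, alongside hash-based ones, which rely on equals and hashCode.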

[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-10-17 Thread HBase QA (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953982#comment-16953982
 ] 

HBase QA commented on HBASE-22749:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} | {color:red} https://github.com/apache/hbase/pull/623 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| GITHUB PR | https://github.com/apache/hbase/pull/623 |
| JIRA Issue | HBASE-22749 |
| Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-623/3/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.11.0 https://yetus.apache.org |


This message was automatically generated.



> Distributed MOB compactions 
> 
>
> Key: HBASE-22749
> URL: https://issues.apache.org/jira/browse/HBASE-22749
> Project: HBase
> Issue Type: New Feature
> Components: mob
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Attachments: HBASE-22749-branch-2.2-v4.patch, 
> HBASE-22749-master-v1.patch, HBASE-22749-master-v2.patch, 
> HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.1.pdf, HBase-MOB-2.0-v2.2.pdf, 
> HBase-MOB-2.0-v2.pdf
>
>
> There are several drawbacks in the original MOB 1.0 (Moderate Object 
> Storage) implementation, which can limit the adoption of the MOB feature:
> # MOB compactions are executed in the Master as a chore, which limits 
> scalability because all I/O goes through a single HBase Master server.
> # The YARN/MapReduce framework is required to run MOB compactions in a 
> scalable way, but this won’t work in a stand-alone HBase cluster.
> # Two separate compactors, one for MOB and one for regular store files, and 
> their interactions can result in data loss (see HBASE-22075).
> The design goals for MOB 2.0 were to provide a 100% MOB 1.0-compatible 
> implementation that is free of the above drawbacks and can be used as a 
> drop-in replacement in existing MOB deployments. So, these are the design 
> goals of MOB 2.0:
> # Make MOB compactions scalable without relying on the YARN/MapReduce framework.
> # Provide a unified compactor for both MOB and regular store files.
> # Make it more robust, especially w.r.t. data loss.
> # Simplify and reduce the overall MOB code.
> # Provide an implementation that is 100% compatible with MOB 1.0.
> # Require no data migration between MOB 1.0 and MOB 2.0, just a 
> software upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-10-14 Thread HBase QA (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951278#comment-16951278
 ] 

HBase QA commented on HBASE-22749:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} | {color:red} https://github.com/apache/hbase/pull/623 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| GITHUB PR | https://github.com/apache/hbase/pull/623 |
| JIRA Issue | HBASE-22749 |
| Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-623/2/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.11.0 https://yetus.apache.org |




[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-10-12 Thread HBase QA (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950222#comment-16950222
 ] 

HBase QA commented on HBASE-22749:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 11 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m  3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 35s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 34s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  4m  5s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 24s{color} | {color:red} hbase-server: The patch generated 11 new + 310 unchanged - 45 fixed = 321 total (was 355) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 32s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 15m 13s{color} | {color:green} Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 12s{color} | {color:red} hbase-server generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}164m 25s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}219m 37s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  org.apache.hadoop.hbase.mob.FileSelection defines compareTo(FileSelection) and uses Object.equals()  At DefaultMobStoreCompactor.java:[lines 613-618] |
|  |  org.apache.hadoop.hbase.mob.Generation defines compareTo(Generation) and uses Object.equals()  At DefaultMobStoreCompactor.java:[lines 791-796] |
|  |  Unused field:DefaultMobStoreCompactor.java |
| Failed junit tests | hadoop.hbase.client.TestAsyncRegionAdminApi |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 base: 
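
The checkstyle row in the report above ("11 new + 310 unchanged - 45 fixed = 321 total (was 355)") can be read as two small identities: the new total is new plus unchanged, and unchanged is the previous total minus fixed. That reading is an assumption inferred from the numbers in these reports, not something taken from Yetus documentation. A short sketch that checks it, with hypothetical class and method names:

```java
/** Hypothetical helper for reading a "N new + U unchanged - F fixed = T total (was P)" row. */
final class CheckstyleTallySketch {

  /** Net change in violations introduced by the patch (negative means fewer violations). */
  static int netChange(int added, int fixed) {
    return added - fixed;
  }

  /** Checks both identities: added + unchanged == total and previous - fixed == unchanged. */
  static boolean isConsistent(int added, int unchanged, int fixed, int total, int previous) {
    return added + unchanged == total && previous - fixed == unchanged;
  }

  public static void main(String[] args) {
    // Numbers from the 2019-10-12 report: 11 new, 310 unchanged, 45 fixed, 321 total, was 355.
    System.out.println(isConsistent(11, 310, 45, 321, 355)); // true
    System.out.println(netChange(11, 45));                   // -34, matching 321 - 355
  }
}
```

Read this way, the -1 vote reflects the 11 newly introduced violations even though the patch lowers the module total by 34.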

[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-10-12 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950201#comment-16950201
 ] 

Sean Busbey commented on HBASE-22749:
-

Thanks for the update. Please update the pull request as well.



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-13 Thread HBase QA (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929651#comment-16929651
 ] 

HBase QA commented on HBASE-22749:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 9 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 43s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 30s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  4m 56s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 53s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 42s{color} | {color:red} hbase-server: The patch generated 48 new + 239 unchanged - 42 fixed = 287 total (was 281) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 47s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 47s{color} | {color:green} Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 40s{color} | {color:red} hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 51s{color} | {color:red} hbase-server generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}300m 11s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}371m 38s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  Exception is caught when Exception is not thrown in org.apache.hadoop.hbase.master.MobFileCompactionChore.chore()  At MobFileCompactionChore.java:[line 106] |
|  |  org.apache.hadoop.hbase.mob.FileSelection defines compareTo(FileSelection) and uses Object.equals()  At DefaultMobStoreCompactor.java:[lines 614-619] |
|  |  org.apache.hadoop.hbase.mob.Generation defines compareTo(Generation) and uses Object.equals()  At DefaultMobStoreCompactor.java:[lines 789-794] |
|  |  Unused 
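
The "Exception is caught when Exception is not thrown" warning above points at a `catch (Exception e)` around code that only declares narrower checked exceptions, which also swallows unexpected runtime exceptions. A minimal sketch of the narrower form follows, assuming the work can only fail with IOException. The class and method names are hypothetical and are not taken from MobFileCompactionChore.

```java
import java.io.IOException;

/** Hypothetical stand-in; this is not the patch's MobFileCompactionChore. */
final class ChoreCatchSketch {

  /** Hypothetical unit of work whose only declared failure is IOException. */
  static void compactTable(String table) throws IOException {
    if (table.isEmpty()) {
      throw new IOException("empty table name");
    }
  }

  /** Returns true on success; catches only the checked exception that can occur. */
  static boolean runOnce(String table) {
    try {
      compactTable(table);
      return true;
    } catch (IOException e) {
      // Narrow catch: unexpected RuntimeExceptions still propagate to the caller.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(runOnce("t1")); // true
    System.out.println(runOnce(""));   // false
  }
}
```

If a chore really must survive any failure, catching RuntimeException in a separate, explicit clause makes that intent visible rather than hiding it behind a blanket catch.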

[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-13 Thread Vladimir Rodionov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929511#comment-16929511
 ] 

Vladimir Rodionov commented on HBASE-22749:
---

PR has been created.

> Distributed MOB compactions 
> 
>
> Key: HBASE-22749
> URL: https://issues.apache.org/jira/browse/HBASE-22749
> Project: HBase
> Issue Type: New Feature
> Components: mob
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Attachments: HBASE-22749-branch-2.2-v4.patch, 
> HBASE-22749-master-v1.patch, HBase-MOB-2.0-v1.pdf, HBase-MOB-2.0-v2.1.pdf, 
> HBase-MOB-2.0-v2.2.pdf, HBase-MOB-2.0-v2.pdf
>
>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-12 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928897#comment-16928897
 ] 

Sean Busbey commented on HBASE-22749:
-

It looks like the difference is that you're building with Eclipse and our 
lifecycle mapping was missing an entry in hbase-protocol?

If I have that right, could you please file a different jira for correcting 
that and get that bit posted on its own?

> Distributed MOB compactions 
> 
>
> Key: HBASE-22749
> URL: https://issues.apache.org/jira/browse/HBASE-22749
> Project: HBase
> Issue Type: New Feature
> Components: mob
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Attachments: HBASE-22749-branch-2.2-v3.patch, 
> HBASE-22749-branch-2.2-v4.patch, HBase-MOB-2.0-v1.pdf, 
> HBase-MOB-2.0-v2.1.pdf, HBase-MOB-2.0-v2.2.pdf, HBase-MOB-2.0-v2.pdf
>
>


[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-12 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928890#comment-16928890
 ] 

Sean Busbey commented on HBASE-22749:
-

Please make a GitHub PR; the patch is large enough that it'll be hard to 
provide review feedback otherwise.



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-12 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928889#comment-16928889
 ] 

Sean Busbey commented on HBASE-22749:
-

bq. Nevertheless, failed again:

what's the output of your maven version command?



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-12 Thread Vladimir Rodionov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928879#comment-16928879
 ] 

Vladimir Rodionov commented on HBASE-22749:
---

v4 should build on 2.2.



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-12 Thread Vladimir Rodionov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928832#comment-16928832
 ] 

Vladimir Rodionov commented on HBASE-22749:
---

Nevertheless, failed again:
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade 
(aggregate-into-a-jar-with-relocated-third-parties) on project 
hbase-shaded-client: Error creating shaded jar: duplicate entry: 
META-INF/services/org.apache.hadoop.hbase.shaded.com.fasterxml.jackson.core.ObjectCodec
 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-shaded-client
{code}



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-12 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928823#comment-16928823
 ] 

Sean Busbey commented on HBASE-22749:
-

I just got a clean build off of branch-2.2 ref {{ }}.

Since branch-2.2 is going through release candidates ATM, the version numbers 
don't have SNAPSHOT in them, so you have to be careful to keep Maven from using 
an artifact from an old build. I use the following line, but you have to be 
careful because it will forcibly delete any untracked changes in the local 
working directory:

{code}
git clean -xdf && rm -rf ~/.m2/repository/org/apache/hbase/* && mvn -Dtest=NoTests -Dit.test=NOITs package install verify
...
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 10:41 min
[INFO] Finished at: 2019-09-12T13:51:11-05:00
[INFO] 
{code}



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-12 Thread Vladimir Rodionov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928796#comment-16928796
 ] 

Vladimir Rodionov commented on HBASE-22749:
---

It seems that the tip of branch-2.2 is broken; this is not patch-related. I tried to 
build 2.2 w/o the patch and it failed with multiple errors.



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-12 Thread Vladimir Rodionov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928789#comment-16928789
 ] 

Vladimir Rodionov commented on HBASE-22749:
---

Oops, will fix it.



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-12 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928761#comment-16928761
 ] 

Sean Busbey commented on HBASE-22749:
-

The patch {{HBASE-22749-branch-2.2-v3.patch}} fails to build when I apply it on 
top of branch-2.2 ref {{a010bf154ae31c7aa4cb165a34bc7d42a6b70f2f}}.



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-12 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928597#comment-16928597
 ] 

Sean Busbey commented on HBASE-22749:
-

could you open this as a PR?



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-11 Thread Vladimir Rodionov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928139#comment-16928139
 ] 

Vladimir Rodionov commented on HBASE-22749:
---

Uploaded patch for 2.2 branch. Master version will follow shortly. 



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-09-04 Thread Vladimir Rodionov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922943#comment-16922943
 ] 

Vladimir Rodionov commented on HBASE-22749:
---

Updated the design document to v2.2. Added an entirely new MOB compaction 
algorithm section; the new algorithm can now reliably limit overall read/write 
I/O amplification (the major concern so far). The initial patch is almost done; 
I just need to fix the algorithm and run tests. 



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-08-21 Thread Sean Busbey (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912947#comment-16912947
 ] 

Sean Busbey commented on HBASE-22749:
-

{quote}
bq. region sizing - splitting, normalizers, etc
bq. Need to expressly state whether or not this change to per-region accounting 
plans to alter the current assumptions that use of the feature means that the 
MOB data isn’t counted when determining region size for decisions to normalize 
or split.

This part has not been touched - meaning that MOB 2.0 does exactly the same 
as MOB 1.0 does. If MOB is not counted for normalize/split decisions now in 
MOB 1.0, it won't be in 2.0. Should it? Probably, yes. But it is not part of 
scalable compactions.
{quote}

Not counted is exactly what I'd prefer to hear. :) If/when anyone wants to add 
that, I'd strongly recommend making sure we can independently tune the threshold 
for mob vs non-mob for those decisions.



[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-08-21 Thread Vladimir Rodionov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912908#comment-16912908
 ] 

Vladimir Rodionov commented on HBASE-22749:
---

That is a big list, [~busbey]. Below are some answers:
{quote}
region sizing - splitting, normalizers, etc
Need to expressly state whether or not this change to per-region accounting 
plans to alter the current assumptions that use of the feature means that the 
MOB data isn’t counted when determining region size for decisions to normalize 
or split.
{quote}

This part has not been touched - meaning that MOB 2.0 does exactly the same 
as MOB 1.0 does. If MOB is not counted for normalize/split decisions now in 
MOB 1.0, it won't be in 2.0. Should it? Probably, yes. But it is not part of 
scalable compactions.

{quote}
write amplification
{quote}

Good question. Default (non-partial) major compaction has the same or similar 
WA as regular HBase tiered compaction. I would not call this unbounded, but it 
is probably worse than in MOB 1.0. Partial MOB compaction will definitely have 
a bounded WA comparable to what we have in MOB 1.0 (where compaction is done by 
partitions and partitions are date-based).

The idea of partial major MOB compaction is either to keep the total number of 
MOB files in the system under control (say, around 1M), or to not compact MOB 
files that have reached some size threshold (say, 1GB). The latter case is 
easier to explain. If you exclude all MOB files above 1GB from compaction, your 
WA will be bounded by log2(T/S), where log2 is the base-2 logarithm, T is the 
maximum MOB file size (the threshold) and S is the average size of a Memstore 
flush. This is an approximation, of course. How does it compare to MOB 1.0 
partitioned compaction? By varying T we can get any WA we want. Say, if we set 
the limit on the number of MOB files to 10M, we can decrease T to 100MB and it 
will give us a total capacity for MOB data of 1PB. With a 100MB threshold, WA 
can be very low (low single digits). I will update the document and add more 
info on partial major MOB compactions, including the file selection policy.
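
To make the numbers above easier to check, here is a minimal sketch of the two 
calculations: the approximate bound log2(T/S) and the total capacity implied by 
a file-count limit. The function names and the 12.5MB average flush size are 
assumptions chosen for illustration only; they do not come from the design 
document or the patch.

```python
import math

MB = 10**6   # decimal megabyte, so that 10M files x 100MB is exactly 1PB
PB = 10**15  # decimal petabyte


def wa_bound(threshold_bytes: int, avg_flush_bytes: int) -> float:
    """Approximate write-amplification bound log2(T/S).

    T is the size threshold above which MOB files are excluded from
    compaction, S is the average size of a Memstore flush.
    """
    if avg_flush_bytes <= 0 or threshold_bytes < avg_flush_bytes:
        raise ValueError("expected 0 < S <= T")
    return math.log2(threshold_bytes / avg_flush_bytes)


def mob_capacity_bytes(max_files: int, threshold_bytes: int) -> int:
    """Total MOB capacity if every file grows up to the threshold."""
    return max_files * threshold_bytes


# Hypothetical average flush size; not a value from the discussion.
ASSUMED_FLUSH_BYTES = 12_500_000

if __name__ == "__main__":
    print("WA bound:", wa_bound(100 * MB, ASSUMED_FLUSH_BYTES))
    print("Capacity in PB:", mob_capacity_bytes(10_000_000, 100 * MB) / PB)
```

With a smaller average flush the bound grows, but only logarithmically, which 
is why lowering T keeps WA small.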




[jira] [Commented] (HBASE-22749) Distributed MOB compactions

2019-08-16 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909365#comment-16909365
 ] 

Sean Busbey commented on HBASE-22749:
-

h2. region sizing - splitting, normalizers, etc

Need to expressly state whether or not this change to per-region accounting 
plans to alter the current assumptions that use of the feature means that the 
MOB data isn’t counted when determining region size for decisions to normalize 
or split.

h2. write amplification

Current description of the unified compactor’s handling of MOB data doesn’t 
include anything about doing the kind of mob file partitioning that was 
previously done. I think this will behave a lot like the example from Section 
3.1.1 in the MOB Design v5 document you reference, specifically where MOB stuff 
is segregated in a dedicated CF. We still end up getting unbounded write 
amplification.

Consider this use case, which I think is in line with the assumptions laid out 
in both your description and in the MOB Design v5 document:

* Table with 50k regions
* MOB values that are 300KiB
* No updates, no deletes
* periodic flushes set to 6 hours
* periodic major and mob compaction set to weekly
* infrequent writes (slow enough so that only periodic flushes happen, but 
enough that all regions have a mob write)

Under the current MOB implementation with a monthly partition policy, I can 
reason that:

* we’ll be generating 200k hfiles in the mob directory per day due to periodic 
flushes 
* at the first week we’ll have 1.4m new hfiles, which we’ll compact probably 
into a low-double-digit number of hfiles
* at the second week, we’ll have 1.4m new hfiles plus the results of the first 
compaction. We will probably compact this into a low-double-digit number of 
hfiles
* at the third week, same thing again
* on the fourth week, same thing again
* After that fourth week things repeat, but the files generated will be in a 
new partition and so anything from the prior partition won’t be rewritten 
again.

In the steady state:

* we should have a number of hfiles that stays under the limits of HDFS
* for a given MOB value, we should write it to HDFS no more than 5 times 
(flush + between 1 and 4 compactions)

So that means we have a write amplification of ~5x regardless of splits or 
merges from normalization.
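
The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration of the use case above: the function and constant names are invented for the example, and treating a monthly partition as at most four weekly compactions is a simplifying assumption taken from the walk-through, not from the MOB code.

```python
# Sketch of the file-count and write-amplification arithmetic for the
# current MOB implementation with a monthly partition policy.
# All names here are hypothetical; the numbers come from the use case above.

REGIONS = 50_000
FLUSH_INTERVAL_HOURS = 6
COMPACTION_PERIOD_DAYS = 7
COMPACTIONS_PER_PARTITION = 4  # assumption: a month is four weekly compactions


def hfiles_per_day(regions, flush_interval_hours):
    """Each region writes one MOB hfile per periodic flush."""
    flushes_per_day = 24 // flush_interval_hours
    return regions * flushes_per_day


def max_writes_per_value(compactions_per_partition):
    """One write at flush time plus at most one rewrite per compaction
    that runs while the value's partition is still open."""
    return 1 + compactions_per_partition


daily = hfiles_per_day(REGIONS, FLUSH_INTERVAL_HOURS)
weekly = daily * COMPACTION_PERIOD_DAYS
print(daily, weekly, max_writes_per_value(COMPACTIONS_PER_PARTITION))
```

With the inputs above this gives 200,000 new hfiles per day, 1,400,000 per week, and an upper bound of 5 writes per MOB value.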

For the new design I don’t think there’s currently any bound. If I use the 
default compaction strategy:

* We’ll still be generating 200k hfiles per day
* at the first week we’ll have 1.4m new hfiles which we’ll compact to 50k 
hfiles.
* at the end of second week we’ll have 1.4m new hfiles + the existing 50k 
hfiles, and we’ll compact to 50k hfiles
* third week, same thing
* fourth week, same thing
* this will repeat until each of the 50k files hits 10 GiB - 20 GiB depending 
on configs (~35-70k cells)

At the extreme of exactly 1 mob value per region per periodic flush that would 
mean 1-2 thousand weeks. Splits over that time period would mean probably we’d 
keep rewriting indefinitely. So the amount of amplification is essentially 
going to be driven by the periodicity of the mob compaction chore.
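
A rough sketch of that estimate, assuming (as the walk-through does) that every weekly compaction rewrites a region's whole MOB file until it reaches the size limit. The function names are hypothetical, and the 10 GiB and 20 GiB limits are simply the two ends of the range mentioned above.

```python
# Sketch: how long a region keeps rewriting its MOB file under the new
# design. Names are hypothetical; inputs come from the use case above.

KIB = 1024
GIB = 1024 ** 3


def cells_to_fill(file_limit_bytes, value_bytes):
    """Number of MOB values that fit in a file of the given size."""
    return file_limit_bytes // value_bytes


def weeks_to_fill(file_limit_bytes, value_bytes,
                  flushes_per_day=4, values_per_flush=1):
    """Weeks of writes needed before the region's file hits the limit.
    Under the stated assumption this is also roughly how many times the
    earliest value is rewritten by weekly compactions."""
    values_per_week = flushes_per_day * 7 * values_per_flush
    return cells_to_fill(file_limit_bytes, value_bytes) / values_per_week


for limit_gib in (10, 20):
    cells = cells_to_fill(limit_gib * GIB, 300 * KIB)
    weeks = weeks_to_fill(limit_gib * GIB, 300 * KIB)
    print(limit_gib, cells, round(weeks))
```

For 300 KiB values this yields about 35k cells and roughly 1,250 weeks at 10 GiB, and about 70k cells and roughly 2,500 weeks at 20 GiB, which is the same order of magnitude as the figures quoted above.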

With default configs we can still get memstores that have ~1GiB of MOB values 
and still only do periodic flushes, so this can remain a non-trivial amount of 
load on HDFS.

If we enable partial major mob compaction we’d avoid writing the values 
repeatedly, but we’d run up against HDFS limitations in ~10 days.
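
To make the ~10 day figure concrete, here is a minimal sketch. It assumes that flushed files simply accumulate and are never merged, and it uses a file budget of 2 million hfiles. That budget is a hypothetical value back-derived from the ~10 day estimate, not a documented HDFS limit; the real ceiling depends on the NameNode and the rest of the cluster.

```python
# Sketch: days until flush output alone reaches a given hfile budget.
# ASSUMED_FILE_BUDGET is hypothetical and chosen only for illustration.

ASSUMED_FILE_BUDGET = 2_000_000


def days_until_file_budget(file_budget, regions=50_000, flushes_per_day=4):
    """Assumes one new hfile per region per flush and no merging, so the
    file count only grows."""
    new_files_per_day = regions * flushes_per_day
    return file_budget / new_files_per_day


print(days_until_file_budget(ASSUMED_FILE_BUDGET))
```

Changing the budget or the flush interval scales the result linearly, so the exact number matters less than the fact that it is measured in days rather than months.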

h2. MOB compaction request chore and Partial major mob compaction

It’s a bit confusing going through the “Partial major MOB compaction” section 
where it currently is in the write up. As I understand things, you’re 
essentially describing a strategy for the process that has to pick particular 
regions to issue major compaction requests against instead of just requesting 
the whole table be compacted. Since this is an optimization of cluster IO use 
that’s possible _once we have per-region accounting and maintenance of MOB 
data_ I think it’d be clearer if it was in a section _after_ you describe the 
“scalable MOB compactions” stuff.

Instead of starting that section off with the description of the compaction 
request chore, you can explain the accounting changes to store maintenance 
and the resulting changes to cleaning. Then end with the explanation of how 
folks won’t have to schedule maintenance tasks themselves, in a section 
labeled as the description of the “MOB Compaction Request Chore”, and include 
there the description of the region prioritization strategy. Another good 
strategy to mention there is prioritizing the store files we know haven’t been 
converted to include accounting information the cleaner needs.
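
One way such a prioritization could look, purely as a sketch. `RegionMobState`, its fields, and `pick_regions` are hypothetical names and do not correspond to anything in the patch; the only idea taken from the text above is that regions with store files still lacking the accounting information go first.

```python
# Sketch of a region prioritization for a compaction request chore.
# All types and fields are hypothetical illustrations.
from dataclasses import dataclass


@dataclass
class RegionMobState:
    name: str
    unconverted_files: int  # store files without the cleaner's accounting info
    mob_files: int          # MOB files currently referenced by the region


def pick_regions(regions, batch_size):
    """Return the names of the regions to request major compaction for.
    Regions with the most unconverted files come first, then regions
    with the most MOB files; the name breaks ties deterministically."""
    ranked = sorted(
        regions,
        key=lambda r: (-r.unconverted_files, -r.mob_files, r.name),
    )
    return [r.name for r in ranked[:batch_size]]


regions = [
    RegionMobState("A", unconverted_files=0, mob_files=10),
    RegionMobState("B", unconverted_files=3, mob_files=2),
    RegionMobState("C", unconverted_files=0, mob_files=50),
    RegionMobState("D", unconverted_files=1, mob_files=5),
]
print(pick_regions(regions, batch_size=3))
```

Issuing requests in small batches like this, rather than for the whole table, is what would let the chore spread cluster IO over time.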

h2. split tracking for the above

Could we do this with entries in hbase:meta or a journal instead of individual 
files? It’s going to get very messy when there are tables with 
tens-of-thousands or hundreds-of-thousands of regions.

h2.