[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590163#comment-14590163 ]

Lars Hofhansl commented on HBASE-13885:

Thanks [~apurtell]!! Looks like another "improve-flaky-jenkins" push is needed, although in this case it looks like we cannot do much.

> ZK watches leaks during snapshots
> ---------------------------------
>
>                 Key: HBASE-13885
>                 URL: https://issues.apache.org/jira/browse/HBASE-13885
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.98.12
>            Reporter: Abhishek Singh Chouhan
>            Assignee: Lars Hofhansl
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.1, 1.3.0
>
>         Attachments: 13885-0.98-v2.txt, 13885-0.98-v3.txt, 13885-master.txt
>
>
> When taking snapshot of a table a watcher over
> /hbase/online-snapshot/abort/snapshot-name is created which is never cleared
> when the snapshot is successful. If we use snapshots to take backups daily we
> accumulate a lot of watches.
> Steps to reproduce -
> 1) Take snapshot of a table - snapshot 'table_1', 'abc'
> 2) Run the following on zk node or alternatively observe zk watches metric
>  echo "wchc" | nc localhost 2181
> /hbase/online-snapshot/abort/abc can be found.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590140#comment-14590140 ]

Andrew Purtell commented on HBASE-13885:

I checked the trunk, 1.2, 1.1, 1.0, and 0.98 build failures. The trunk failure is not related. The 1.2 build failed because Surefire timed out a test; that could just be a dirty Jenkins run due to system load. The 1.1 build failed because the Surefire executor was killed externally; I think this is some other build's zombie detector misfiring. The 1.0 failure looks spurious and unrelated, as no procedures are involved. The 0.98 failure is unrelated: it is a known dirty test that doesn't run well on Jenkins because the executor environment is underpowered for what the test needs. Another crappy round of runs on ASF Jenkins, another day.
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590054#comment-14590054 ]

Lars Hofhansl commented on HBASE-13885:

I'll check the test failures and see whether they are related.
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589208#comment-14589208 ]

Hudson commented on HBASE-13885:

FAILURE: Integrated in HBase-1.3 #2 (See [https://builds.apache.org/job/HBase-1.3/2/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev 7f6f199cf70cbabc38be6407a15f0461e55ac850)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589143#comment-14589143 ]

Hudson commented on HBASE-13885:

FAILURE: Integrated in HBase-0.98 #1030 (See [https://builds.apache.org/job/HBase-0.98/1030/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev 049e68177124623f589aa09611e867c0f4cb41bd)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588997#comment-14588997 ]

Hudson commented on HBASE-13885:

FAILURE: Integrated in HBase-TRUNK #6578 (See [https://builds.apache.org/job/HBase-TRUNK/6578/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev ce2fd2c58c5358f07defe13c8ae56e1bbfd59590)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588979#comment-14588979 ]

Hudson commented on HBASE-13885:

FAILURE: Integrated in HBase-1.2 #17 (See [https://builds.apache.org/job/HBase-1.2/17/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev cb126dd99e33732aff8b82d9eb5bd211343e7263)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588835#comment-14588835 ]

Hudson commented on HBASE-13885:

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #983 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/983/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev 049e68177124623f589aa09611e867c0f4cb41bd)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588824#comment-14588824 ]

Hudson commented on HBASE-13885:

FAILURE: Integrated in HBase-1.0 #964 (See [https://builds.apache.org/job/HBase-1.0/964/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev a2085155f5d6b9c07f641aabbede9180636f655e)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588721#comment-14588721 ]

Hudson commented on HBASE-13885:

FAILURE: Integrated in HBase-1.1 #545 (See [https://builds.apache.org/job/HBase-1.1/545/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev d7b56f631d1fe2a180746b134696d20e4617df2d)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588658#comment-14588658 ]

Andrew Purtell commented on HBASE-13885:

bq. I do think the committer should commit to all branches and not push that onto the RM.

I agree, but as 0.98 RM I'm giving people an out :-)
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588656#comment-14588656 ]

Lars Hofhansl commented on HBASE-13885:

I did cherry-pick from master all the way down to 0.98. :)
But I almost missed the 1.2.0 branch, since branch-1 now means 1.3.0.
I do think the committer should commit to all branches and not push that onto the RM.
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588624#comment-14588624 ]

Andrew Purtell commented on HBASE-13885:

bq. do we have too many branches to maintain now?

I don't find this to be the case, but it's going to depend on workflow. I start with a master patch, then pick it back (with adjustments) to branch-1, then pick that commit back to 1.2, then to 1.1, then 1.0, 0.98, etc. If we are good with this approach then I think it can work for others. The number of branches will slow us down, though, that's true. Expect to commit only one or two things per day. That can have its benefits, actually.

It's fine if you leave off 0.98, I'll take care of that, but leave the fix version if it's relevant.
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588629#comment-14588629 ]

Andrew Purtell commented on HBASE-13885:

bq. I'd have to create a NodeAndData object

Yeah, ok
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588609#comment-14588609 ]

Lars Hofhansl commented on HBASE-13885:

Committed to 0.98, 1.0.2, 1.1.1, 1.2.0, 1.3.0, and 2.0.0.
(do we have too many branches to maintain now?)
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587404#comment-14587404 ]

Lars Hofhansl commented on HBASE-13885:

Thanks. Committing tonight to all branches (after changing to use ZKUtil.isEmpty).
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587062#comment-14587062 ]

Jesse Yates commented on HBASE-13885:

Seems reasonable to me :)
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587055#comment-14587055 ]

Lars Hofhansl commented on HBASE-13885:

Cool... I'll commit this everywhere. [~jesse_yates], any last comments?
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586449#comment-14586449 ]

Andrew Purtell commented on HBASE-13885:

+1

bq. FYI for 0.98.14. Or if we want, since this is pretty bad we can pull it into the RC (0.98.14 seems fine, though)

0.98.13 is super late and we've been through a couple of RCs already. This issue has been there with snapshots/procedure v1 since the beginning. It's important but we can get to it when 0.98.14 goes out next month more or less on the normal cadence.

HBASE-13901 adds a check in ZKUtil#isEmpty for this type of thing:
{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
index 8620558..114d735 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
@@ -309,7 +309,10 @@ public class ZKProcedureMemberRpcs implements ProcedureMemberRpcs {
     // figure out the data we need to pass
     ForeignException ee;
     try {
-      if (!ProtobufUtil.isPBMagicPrefix(data)) {
+      if (data == null || data.length == 0) {
+        // ignore
+        return;
+      } else if (!ProtobufUtil.isPBMagicPrefix(data)) {
         String msg = "Illegally formatted data in abort node for proc " + opName
             + ". Killing the procedure.";
         LOG.error(msg);
{code}
I'll commit HBASE-13901 right after leaving this comment here. Use ZKUtil#isEmpty instead of if (data == null ...) ? Just a nit.
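The ordering in the patch hunk above (check for empty data first, then check the protobuf prefix) can be sketched in isolation. This is only an illustration under stated assumptions: `AbortDataGuardSketch`, `Action`, `classify`, and the four-byte `MAGIC` value are hypothetical names and values chosen for the example, and the local `isEmpty` and `hasMagicPrefix` helpers merely stand in for `ZKUtil#isEmpty` and `ProtobufUtil.isPBMagicPrefix` rather than reproducing them.

```java
/** Hypothetical sketch of the guard order in the abort handler; not HBase code. */
final class AbortDataGuardSketch {
    /** Assumed marker standing in for the protobuf magic prefix. */
    private static final byte[] MAGIC = {'P', 'B', 'U', 'F'};

    /** What the handler should do with the data found in the abort znode. */
    enum Action { IGNORE, REJECT_MALFORMED, PARSE }

    /** Stand-in for an emptiness check on znode data. */
    static boolean isEmpty(byte[] data) {
        return data == null || data.length == 0;
    }

    /** Stand-in for a magic-prefix check on znode data. */
    static boolean hasMagicPrefix(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Empty data is checked first, so a cleanup-only trigger is never treated as malformed. */
    static Action classify(byte[] data) {
        if (isEmpty(data)) {
            return Action.IGNORE;
        }
        if (!hasMagicPrefix(data)) {
            return Action.REJECT_MALFORMED;
        }
        return Action.PARSE;
    }
}
```

If the prefix check ran first, an empty payload would fall into the "illegally formatted data" branch, which is the case the added guard avoids.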
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584969#comment-14584969 ]

Hadoop QA commented on HBASE-13885:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12739467/13885-master.txt
  against master branch at commit 682b8ab8a542a903e5807053282693e3a96bad2d.
  ATTACHMENT ID: 12739467

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

    {color:green}+1 site{color}. The mvn site goal succeeds with this patch.

    {color:red}-1 core tests{color}. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14399//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14399//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14399//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14399//console

This message is automatically generated.
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584284#comment-14584284 ] Lars Hofhansl commented on HBASE-13885: --- [~apurtell], FYI for 0.98.14. Or, if we want, since this is pretty bad, we can pull it into the RC (0.98.14 seems fine, though). > ZK watches leaks during snapshots > - > > Key: HBASE-13885 > URL: https://issues.apache.org/jira/browse/HBASE-13885 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 0.98.12 >Reporter: Abhishek Singh Chouhan >Priority: Critical > Attachments: 13885-0.98-v2.txt, 13885-0.98-v3.txt > > > When taking snapshot of a table a watcher over > /hbase/online-snapshot/abort/snapshot-name is created which is never cleared > when the snapshot is successful. If we use snapshots to take backups daily we > accumulate a lot of watches. > Steps to reproduce - > 1) Take snapshot of a table - snapshot 'table_1', 'abc' > 2) Run the following on zk node or alternatively observe zk watches metric > echo "wchc" | nc localhost 2181 > /hbase/online-snapshot/abort/abc can be found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584130#comment-14584130 ] Lars Hofhansl commented on HBASE-13885: --- To summarize: for ZKProcedure we set watchers on the Aborted, Acquired, and Completed znodes. When a procedure completes successfully it never triggers the Abort watcher, and if the procedure aborts it never triggers the Acquired watcher. So each snapshot leaves one watcher hanging around per involved region server. A solution is to make sure we always trigger the watches, passing along some data that tells the watchers to ignore the trigger. > ZK watches leaks during snapshots > - > > Key: HBASE-13885 > URL: https://issues.apache.org/jira/browse/HBASE-13885 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 0.98.12 >Reporter: Abhishek Singh Chouhan >Priority: Critical > > When taking snapshot of a table a watcher over > /hbase/online-snapshot/abort/snapshot-name is created which is never cleared > when the snapshot is successful. If we use snapshots to take backups daily we > accumulate a lot of watches. > Steps to reproduce - > 1) Take snapshot of a table - snapshot 'table_1', 'abc' > 2) Run the following on zk node or alternatively observe zk watches metric > echo "wchc" | nc localhost 2181 > /hbase/online-snapshot/abort/abc can be found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
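The leak and the proposed fix summarized in the comment above can be sketched with a toy in-memory model of one-shot watches. This is only an illustration under stated assumptions: `TinyZk`, `run_snapshot`, `IGNORE_MARKER`, and the `completed/` path are hypothetical names invented here, and none of this is the ZooKeeper client API or the code in the attached patches.

```python
from collections import defaultdict

# Hypothetical marker payload (an assumption for this sketch): it tells a
# watcher that the trigger exists only to clear the watch.
IGNORE_MARKER = b"ignore"


class TinyZk:
    """Toy in-memory model of one-shot znode watches (not the ZooKeeper API)."""

    def __init__(self):
        self.data = {}
        self.watches = defaultdict(list)  # path -> pending one-shot callbacks

    def watch(self, path, callback):
        self.watches[path].append(callback)

    def write(self, path, payload):
        """Store the payload and fire (and thereby clear) every watch on path."""
        self.data[path] = payload
        for callback in self.watches.pop(path, []):
            callback(path, payload)

    def pending(self):
        return sum(len(callbacks) for callbacks in self.watches.values())


def run_snapshot(zk, name, servers, cleanup):
    """Simulate one successful snapshot; return the abort events acted upon."""
    abort_path = "/hbase/online-snapshot/abort/" + name
    completed_path = "/hbase/online-snapshot/completed/" + name  # hypothetical
    aborted = []

    def on_abort(path, payload):
        # Ignore triggers that carry the marker instead of failing the procedure.
        if payload != IGNORE_MARKER:
            aborted.append(path)

    def on_completed(path, payload):
        pass

    # Each involved region server sets one watch per znode.
    for _ in range(servers):
        zk.watch(abort_path, on_abort)
        zk.watch(completed_path, on_completed)

    zk.write(completed_path, b"done")  # success fires only the completed watches
    if cleanup:
        zk.write(abort_path, IGNORE_MARKER)  # fire the abort watches harmlessly
    return aborted


leaky = TinyZk()
run_snapshot(leaky, "abc", servers=3, cleanup=False)
print("pending without cleanup:", leaky.pending())

fixed = TinyZk()
run_snapshot(fixed, "abc", servers=3, cleanup=True)
print("pending with cleanup:", fixed.pending())
```

Without the cleanup write, one abort watch per simulated region server stays registered after a successful snapshot; with it, every watch is fired and none of the callbacks treats the trigger as a real abort.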
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583011#comment-14583011 ] Lars Hofhansl commented on HBASE-13885: --- I tried doing a no-op change to the abort/ znode, but then we need a bit of logic to ignore that change instead of failing the procedure. > ZK watches leaks during snapshots > - > > Key: HBASE-13885 > URL: https://issues.apache.org/jira/browse/HBASE-13885 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 0.98.12 >Reporter: Abhishek Singh Chouhan >Priority: Critical > > When taking snapshot of a table a watcher over > /hbase/online-snapshot/abort/snapshot-name is created which is never cleared > when the snapshot is successful. If we use snapshots to take backups daily we > accumulate a lot of watches. > Steps to reproduce - > 1) Take snapshot of a table - snapshot 'table_1', 'abc' > 2) Run the following on zk node or alternatively observe zk watches metric > echo "wchc" | nc localhost 2181 > /hbase/online-snapshot/abort/abc can be found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582973#comment-14582973 ] Lars Hofhansl commented on HBASE-13885: --- So apparently, before ZOOKEEPER-442 (the ZooKeeper change that lets clients remove watches), we must trigger a watch in order to remove it. Otherwise the watches will linger and accumulate. [~jesse_yates], [~mbertozzi], any ideas? > ZK watches leaks during snapshots > - > > Key: HBASE-13885 > URL: https://issues.apache.org/jira/browse/HBASE-13885 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 0.98.12 >Reporter: Abhishek Singh Chouhan >Priority: Critical > > When taking snapshot of a table a watcher over > /hbase/online-snapshot/abort/snapshot-name is created which is never cleared > when the snapshot is successful. If we use snapshots to take backups daily we > accumulate a lot of watches. > Steps to reproduce - > 1) Take snapshot of a table - snapshot 'table_1', 'abc' > 2) Run the following on zk node or alternatively observe zk watches metric > echo "wchc" | nc localhost 2181 > /hbase/online-snapshot/abort/abc can be found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots
[ https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582916#comment-14582916 ] Lars Hofhansl commented on HBASE-13885: --- Making critical as it will render the cluster unusable over time. > ZK watches leaks during snapshots > - > > Key: HBASE-13885 > URL: https://issues.apache.org/jira/browse/HBASE-13885 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 0.98.12 >Reporter: Abhishek Singh Chouhan >Priority: Critical > > When taking snapshot of a table a watcher over > /hbase/online-snapshot/abort/snapshot-name is created which is never cleared > when the snapshot is successful. If we use snapshots to take backups daily we > accumulate a lot of watches. > Steps to reproduce - > 1) Take snapshot of a table - snapshot 'table_1', 'abc' > 2) Run the following on zk node or alternatively observe zk watches metric > echo "wchc" | nc localhost 2181 > /hbase/online-snapshot/abort/abc can be found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
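For the reproduction step quoted above, the accumulated abort watches can be tallied from the text that `echo "wchc" | nc localhost 2181` returns. The sketch below assumes that output lists each session id on its own line followed by indented watched paths; the helper name `count_watches` and the sample text are made up for illustration and are not real cluster output.

```python
from collections import Counter


def count_watches(wchc_output, prefix="/hbase/online-snapshot/abort/"):
    """Count watches per znode path under prefix in wchc-style text."""
    counts = Counter()
    for line in wchc_output.splitlines():
        path = line.strip()
        # Session lines (assumed to look like "0x...") never match the prefix.
        if path.startswith(prefix):
            counts[path] += 1
    return counts


# Made-up sample in the assumed layout: two sessions watching the same znode.
sample = (
    "0x14dd5b2c1f70001\n"
    "\t/hbase/online-snapshot/abort/abc\n"
    "\t/hbase/master\n"
    "0x14dd5b2c1f70002\n"
    "\t/hbase/online-snapshot/abort/abc\n"
)

for path, count in count_watches(sample).items():
    print(path, count)
```

Running the same tally before and after a snapshot would show whether the count for a given snapshot name drops back to zero once the snapshot has finished.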