[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590163#comment-14590163
 ] 

Lars Hofhansl commented on HBASE-13885:
---

Thanks [~apurtell]!!
Looks like another "improve-flaky-jenkins" push is needed, although in this 
case it looks like we cannot do much.

> ZK watches leaks during snapshots
> ---------------------------------
>
>                 Key: HBASE-13885
>                 URL: https://issues.apache.org/jira/browse/HBASE-13885
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.98.12
>            Reporter: Abhishek Singh Chouhan
>            Assignee: Lars Hofhansl
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.1, 1.3.0
>
>         Attachments: 13885-0.98-v2.txt, 13885-0.98-v3.txt, 13885-master.txt
>
>
> When taking a snapshot of a table, a watcher on
> /hbase/online-snapshot/abort/snapshot-name is created which is never cleared
> when the snapshot succeeds. If we use snapshots to take daily backups, we
> accumulate a lot of watches.
> Steps to reproduce:
> 1) Take a snapshot of a table: snapshot 'table_1', 'abc'
> 2) Run the following on a ZK node, or alternatively observe the ZK watches metric:
>      echo "wchc" | nc localhost 2181
> /hbase/online-snapshot/abort/abc can be found in the output.
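
For anyone who wants to script step 2 of the reproduction above, here is a minimal sketch (not part of the patch) that filters captured `wchc` output for watches under the abort znode. The class and method names are hypothetical, the session ids in the sample are invented, and it assumes `wchc` prints each session id on its own line followed by indented watched paths.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of HBase or ZooKeeper.
class WchcWatchCounter {

    // Returns every watched path in the captured wchc output that starts with prefix.
    // Assumption: paths appear on their own (indented) lines under a session-id line.
    static List<String> watchedPathsUnder(String wchcOutput, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String line : wchcOutput.split("\n")) {
            String path = line.trim();
            if (path.startsWith(prefix)) {
                matches.add(path);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed wchc layout; real output comes from
        //   echo "wchc" | nc localhost 2181
        String sample = "0x14e0a1b2c3d0001\n"
            + "\t/hbase/online-snapshot/abort/abc\n"
            + "\t/hbase/master\n"
            + "0x14e0a1b2c3d0002\n"
            + "\t/hbase/online-snapshot/abort/def\n";
        List<String> leaked = watchedPathsUnder(sample, "/hbase/online-snapshot/abort/");
        System.out.println(leaked.size() + " abort watches: " + leaked);
    }
}
```

If the count keeps growing after each successful snapshot, that would be consistent with the leak described in this issue.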



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-17 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590140#comment-14590140
 ] 

Andrew Purtell commented on HBASE-13885:


I checked the trunk, 1.0, 1.1, and 0.98 build failures. The trunk failure is 
not related. The 1.2 build failed because Surefire timed out a test; it could 
just be a dirty Jenkins run due to system load. The 1.1 build failed because 
the Surefire executor was killed externally. I think this is some other 
build's zombie detector misfiring. The 1.0 failure looks spurious and 
unrelated, as no procedures are involved. The 0.98 failure is unrelated: it is 
a known dirty test that doesn't run well on Jenkins because the executor 
environment is underpowered for what the test needs.

Another crappy round of runs on ASF jenkins, another day.





[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590054#comment-14590054
 ] 

Lars Hofhansl commented on HBASE-13885:
---

I'll check the test failures and see whether they are related.



[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589208#comment-14589208
 ] 

Hudson commented on HBASE-13885:


FAILURE: Integrated in HBase-1.3 #2 (See [https://builds.apache.org/job/HBase-1.3/2/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev 7f6f199cf70cbabc38be6407a15f0461e55ac850)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java




[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589143#comment-14589143
 ] 

Hudson commented on HBASE-13885:


FAILURE: Integrated in HBase-0.98 #1030 (See [https://builds.apache.org/job/HBase-0.98/1030/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev 049e68177124623f589aa09611e867c0f4cb41bd)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java




[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588997#comment-14588997
 ] 

Hudson commented on HBASE-13885:


FAILURE: Integrated in HBase-TRUNK #6578 (See [https://builds.apache.org/job/HBase-TRUNK/6578/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev ce2fd2c58c5358f07defe13c8ae56e1bbfd59590)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java




[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588979#comment-14588979
 ] 

Hudson commented on HBASE-13885:


FAILURE: Integrated in HBase-1.2 #17 (See [https://builds.apache.org/job/HBase-1.2/17/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev cb126dd99e33732aff8b82d9eb5bd211343e7263)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java




[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588835#comment-14588835
 ] 

Hudson commented on HBASE-13885:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #983 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/983/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev 049e68177124623f589aa09611e867c0f4cb41bd)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java




[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588824#comment-14588824
 ] 

Hudson commented on HBASE-13885:


FAILURE: Integrated in HBase-1.0 #964 (See [https://builds.apache.org/job/HBase-1.0/964/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev a2085155f5d6b9c07f641aabbede9180636f655e)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java




[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588721#comment-14588721
 ] 

Hudson commented on HBASE-13885:


FAILURE: Integrated in HBase-1.1 #545 (See [https://builds.apache.org/job/HBase-1.1/545/])
HBASE-13885 ZK watches leaks during snapshots. (larsh: rev d7b56f631d1fe2a180746b134696d20e4617df2d)
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureCoordinatorRpcs.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java




[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588658#comment-14588658
 ] 

Andrew Purtell commented on HBASE-13885:


bq.  I do think the committer should commit to all branches and not push that 
onto the RM.
I agree, but as 0.98 RM I'm giving people an out :-) 



[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588656#comment-14588656
 ] 

Lars Hofhansl commented on HBASE-13885:
---

I did cherry-pick from master all the way down to 0.98. :)
But I almost missed the 1.2.0 branch, since branch-1 now means 1.3.0.

I do think the committer should commit to all branches and not push that onto 
the RM.




[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588624#comment-14588624
 ] 

Andrew Purtell commented on HBASE-13885:


bq. do we have too many branches to maintain now?
I don't find this to be the case, but it's going to depend on workflow. I start 
with a master patch, then pick back (with adjustments) to branch-1, then pick 
back that commit to 1.2, then pick back that commit to 1.1, then 1.0, 0.98 etc. 
If we are good with this approach then I think it can work for others. The 
number of branches will slow us down, though, that's true. Expect to only 
commit one or two things per day. That can have its benefits, actually.

It's fine if you leave off 0.98, I'll take care of that, but leave the fix 
version if it's relevant.



[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588629#comment-14588629
 ] 

Andrew Purtell commented on HBASE-13885:


bq. I'd have to create a NodeAndData object
Yeah, ok



[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588609#comment-14588609
 ] 

Lars Hofhansl commented on HBASE-13885:
---

Committed to 0.98, 1.0.2, 1.1.1, 1.2.0, 1.3.0, and 2.0.0.
(do we have too many branches to maintain now?)





[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587404#comment-14587404
 ] 

Lars Hofhansl commented on HBASE-13885:
---

Thanks. Committing tonight to all branches (after changing to use 
ZKUtil.isEmpty).



[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-15 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587062#comment-14587062
 ] 

Jesse Yates commented on HBASE-13885:
-

Seems reasonable to me :)



[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587055#comment-14587055
 ] 

Lars Hofhansl commented on HBASE-13885:
---

Cool... I'll commit this everywhere. [~jesse_yates], any last comments?



[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-15 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586449#comment-14586449
 ] 

Andrew Purtell commented on HBASE-13885:


+1

bq. FYI for 0.98.14. Or if we want, since this is pretty bad we can pull it 
into the RC (0.98.14 seems fine, though)

0.98.13 is super late and we've been through a couple of RCs already. This 
issue has been there with snapshots/procedure v1 since the beginning. It's 
important but we can get to it when 0.98.14 goes out next month more or less on 
the normal cadence. 

HBASE-13901 adds a check in ZKUtil#isEmpty for this type of thing:
{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
index 8620558..114d735 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/ZKProcedureMemberRpcs.java
@@ -309,7 +309,10 @@ public class ZKProcedureMemberRpcs implements ProcedureMemberRpcs {
     // figure out the data we need to pass
     ForeignException ee;
     try {
-      if (!ProtobufUtil.isPBMagicPrefix(data)) {
+      if (data == null || data.length == 0) {
+        // ignore
+        return;
+      } else if (!ProtobufUtil.isPBMagicPrefix(data)) {
         String msg = "Illegally formatted data in abort node for proc " + opName
             + ".  Killing the procedure.";
         LOG.error(msg);
{code}
I'll commit HBASE-13901 right after leaving this comment here. Use 
ZKUtil#isEmpty instead of if (data == null ...) ? Just a nit.
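
To make the nit concrete, here is a rough, self-contained sketch of the guard ordering in the diff above: empty abort-node data is ignored before the magic-prefix check runs. Everything in it is a hypothetical stand-in, not the HBase classes. `isEmpty` only mirrors the null-or-zero-length test from the diff, and the `PBUF` prefix is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the abort-node handling; not the actual HBase code.
class AbortDataGuard {
    enum Action { IGNORE, REJECT, PROCESS }

    // Assumed magic prefix, used here only so the sketch runs on its own.
    static final byte[] MAGIC = "PBUF".getBytes(StandardCharsets.US_ASCII);

    // Same test as the condition added in the diff: null or zero-length data.
    static boolean isEmpty(byte[] data) {
        return data == null || data.length == 0;
    }

    static boolean hasMagicPrefix(byte[] data) {
        if (data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Empty data is ignored first; only non-empty data without the prefix is rejected.
    static Action classify(byte[] data) {
        if (isEmpty(data)) {
            return Action.IGNORE;
        }
        if (!hasMagicPrefix(data)) {
            return Action.REJECT;
        }
        return Action.PROCESS;
    }

    public static void main(String[] args) {
        System.out.println(classify(null));
        System.out.println(classify(new byte[0]));
        System.out.println(classify("junk".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(classify("PBUFpayload".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Pulling the emptiness test into one helper keeps the call site to a single readable condition, which is the point of the suggestion.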



[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584969#comment-14584969
 ] 

Hadoop QA commented on HBASE-13885:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12739467/13885-master.txt
  against master branch at commit 682b8ab8a542a903e5807053282693e3a96bad2d.
  ATTACHMENT ID: 12739467

{color:green}+1 @author{color}.  The patch does not contain any @author tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop versions{color}.  The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

{color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14399//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14399//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14399//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14399//console

This message is automatically generated.






[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584284#comment-14584284
 ] 

Lars Hofhansl commented on HBASE-13885:
---

[~apurtell], FYI for 0.98.14. Or, if we want, since this is pretty bad, we can 
pull it into the RC (0.98.14 seems fine, though).






[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584130#comment-14584130
 ] 

Lars Hofhansl commented on HBASE-13885:
---

To summarize:

For ZKProcedure we set watchers for the Aborted, Acquired, and Completed 
znodes. When a procedure completes successfully it never triggers the Abort 
watcher, and if the procedure aborts it never triggers the Acquired watcher. So 
for each snapshot we'll leave a watcher hanging around per involved region 
server.
A solution is to make sure we trigger the watches, passing some data along that 
indicates to the watchers to ignore the trigger.
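
The leak and the proposed fix can be sketched with a small in-memory model. This is only an illustration under stated assumptions: Store, IGNORE_MARKER, and leakedWatches are hypothetical names, the model is not the ZooKeeper client API and not the actual patch, and it registers one watcher per snapshot rather than one per involved region server. The one property it borrows from the discussion is that a watch goes away only once it has been triggered.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy in-memory model of one-shot watches. This is NOT the ZooKeeper client API.
class WatchLeakSketch {
    // Hypothetical payload meaning "procedure is over, ignore this trigger".
    static final byte[] IGNORE_MARKER = "ignore".getBytes(StandardCharsets.UTF_8);

    static class Store {
        private final Map<String, List<Consumer<byte[]>>> watches = new HashMap<>();

        // Registers a one-shot watch on a path.
        void watch(String path, Consumer<byte[]> watcher) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        }

        // A write fires every watch registered on the path and removes it.
        void write(String path, byte[] data) {
            List<Consumer<byte[]>> fired = watches.remove(path);
            if (fired != null) {
                for (Consumer<byte[]> w : fired) {
                    w.accept(data);
                }
            }
        }

        // Total number of watches still registered.
        int watchCount() {
            int n = 0;
            for (List<Consumer<byte[]>> l : watches.values()) {
                n += l.size();
            }
            return n;
        }
    }

    // Runs `snapshots` successful procedures and returns the watches left behind.
    static int leakedWatches(int snapshots, boolean triggerOnSuccess) {
        Store store = new Store();
        for (int i = 0; i < snapshots; i++) {
            String abortPath = "/hbase/online-snapshot/abort/snap-" + i;
            store.watch(abortPath, data -> {
                if (Arrays.equals(data, IGNORE_MARKER)) {
                    return; // cleanup trigger, not a real abort
                }
                throw new IllegalStateException("procedure aborted");
            });
            // The snapshot succeeds, so no real abort is ever written.
            if (triggerOnSuccess) {
                store.write(abortPath, IGNORE_MARKER);
            }
        }
        return store.watchCount();
    }

    public static void main(String[] args) {
        System.out.println(leakedWatches(30, false)); // prints 30
        System.out.println(leakedWatches(30, true));  // prints 0
    }
}
```

With triggerOnSuccess set to false, every successful snapshot leaves its abort watch behind, which matches the daily-backup accumulation in the report. With it set to true, the marker write fires the watch and the handler ignores it.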







[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583011#comment-14583011
 ] 

Lars Hofhansl commented on HBASE-13885:
---

I tried doing a noop change to the abort/ znode, but that requires a bit 
of logic to ignore the change instead of failing the procedure.






[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582973#comment-14582973
 ] 

Lars Hofhansl commented on HBASE-13885:
---

So apparently before ZOOKEEPER-442 we must trigger a watch in order to remove 
it. Otherwise the watches will linger and accumulate.

[~jesse_yates], [~mbertozzi], any ideas?
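
To see how many of these watches linger, one option is to count abort-znode entries in the output of the wchc command from the issue description. The sketch below assumes only that each watched path appears on its own line, possibly indented under a session id. WchcAbortWatchCount and countAbortWatches are hypothetical names, and the sample dump is hand-written for illustration, not captured from a real server.

```java
import java.util.Map;
import java.util.TreeMap;

// Counts watch entries per abort znode in a wchc-style text dump.
class WchcAbortWatchCount {
    static final String ABORT_PREFIX = "/hbase/online-snapshot/abort/";

    static Map<String, Integer> countAbortWatches(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dump.split("\n")) {
            String path = line.trim();
            // Session-id lines and unrelated paths are skipped.
            if (path.startsWith(ABORT_PREFIX)) {
                counts.merge(path, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hand-written sample: two sessions both watching the abort znode for "abc".
        String sample = "0x1\n"
            + "\t/hbase/online-snapshot/abort/abc\n"
            + "0x2\n"
            + "\t/hbase/online-snapshot/abort/abc\n"
            + "\t/hbase/master\n";
        System.out.println(countAbortWatches(sample));
        // prints {/hbase/online-snapshot/abort/abc=2}
    }
}
```

A count that keeps growing across successful snapshots would be consistent with the leak described in this issue.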







[jira] [Commented] (HBASE-13885) ZK watches leaks during snapshots

2015-06-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582916#comment-14582916
 ] 

Lars Hofhansl commented on HBASE-13885:
---

Marking this critical, as it will render the cluster unusable over time.



