[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2019-02-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18549:
---
Fix Version/s: (was: 1.5.0)

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-18549-.master.001.patch, 
> HBASE-18549-.master.002.patch, HBASE-18549-.master.003.patch, 
> HBASE-18549-.master.004.patch, HBASE-18549.branch-1.001.patch, 
> HBASE-18549.branch-1.001.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18549:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: (was: 1.3.3)
   2.1.1
   2.2.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks for taking this one on [~xucang]

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-18549-.master.001.patch, 
> HBASE-18549-.master.002.patch, HBASE-18549-.master.003.patch, 
> HBASE-18549-.master.004.patch, HBASE-18549.branch-1.001.patch, 
> HBASE-18549.branch-1.001.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-09-04 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18549:

Attachment: HBASE-18549.branch-1.001.patch

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.001.patch, 
> HBASE-18549-.master.002.patch, HBASE-18549-.master.003.patch, 
> HBASE-18549-.master.004.patch, HBASE-18549.branch-1.001.patch, 
> HBASE-18549.branch-1.001.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-08-29 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18549:

Attachment: HBASE-18549.branch-1.001.patch

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.001.patch, 
> HBASE-18549-.master.002.patch, HBASE-18549-.master.003.patch, 
> HBASE-18549-.master.004.patch, HBASE-18549.branch-1.001.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-08-29 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18549:

Attachment: HBASE-18549-.master.004.patch

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.001.patch, 
> HBASE-18549-.master.002.patch, HBASE-18549-.master.003.patch, 
> HBASE-18549-.master.004.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-08-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18549:

Attachment: HBASE-18549-.master.003.patch

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.001.patch, 
> HBASE-18549-.master.002.patch, HBASE-18549-.master.003.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-08-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18549:

Attachment: HBASE-18549-.master.002.patch

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.001.patch, 
> HBASE-18549-.master.002.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-08-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18549:

Attachment: (was: HBASE-18549-.master.001.patch)

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.001.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-08-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18549:

Attachment: HBASE-18549-.master.001.patch
Status: Patch Available  (was: Open)

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.001.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-08-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18549:

Attachment: HBASE-18549-.master.001.patch

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.001.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-08-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18549:

Attachment: (was: HBASE-18549-.master.wip.patch)

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.001.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-08-22 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18549:

Attachment: HBASE-18549-.master.wip.patch

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.wip.patch
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-08-21 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18549:
---
Fix Version/s: (was: 1.4.7)
   1.4.8

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-07-19 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18549:
---
Fix Version/s: (was: 1.4.6)
   1.4.7

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.7
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-04-22 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18549:
---
Fix Version/s: (was: 1.4.4)
   1.4.5

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.5
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-03-19 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18549:
---
Fix Version/s: (was: 1.4.3)
   1.4.4

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-02-28 Thread Francis Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Liu updated HBASE-18549:

Fix Version/s: (was: 1.3.2)
   1.3.3

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.3
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-02-14 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18549:
---
Fix Version/s: (was: 1.4.2)
   1.4.3

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Priority: Critical
> Fix For: 1.3.2, 1.5.0, 1.4.3
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-01-24 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18549:
---
Fix Version/s: (was: 1.4.1)
   1.4.2

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Priority: Critical
> Fix For: 1.3.2, 1.5.0, 1.4.2
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2017-11-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18549:
---
Fix Version/s: 1.5.0
   1.4.1

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Priority: Critical
> Fix For: 1.3.2, 1.4.1, 1.5.0
>
>
> We have come across this situation multiple times where a zookeeper issues 
> can cause NodeFailoverWorker to fail picking up replication queue for a dead 
> region server silently. One example is when the znode size for a particular 
> queue exceed jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)