[jira] [Commented] (HBASE-21356) bulkLoadHFile API should ensure that rs has the source hfile's write permission

2018-10-22 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659643#comment-16659643
 ] 

Umesh Agashe commented on HBASE-21356:
--

+1, lgtm. A few comments left on RB.

> bulkLoadHFile API should ensure that rs has the source hfile's write 
> permission
> ---
>
> Key: HBASE-21356
> URL: https://issues.apache.org/jira/browse/HBASE-21356
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21356.v1.patch
>
>
> If the rs bulk loads an HFile but has no write permission on it, we can read & 
> compact the hfile, but after the compaction finishes, the HFile will be 
> moved to the archive directory, where the HFileCleaner won't have permission 
> to delete it, so the HFile will be kept in HDFS forever. 
> We need to check the file's write permission when running bulkLoadHFile on the 
> server side and reject the request if there is no write permission.
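
The check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in HBASE-21356.v1.patch: the class and method names (BulkLoadPermissionSketch, canWrite, checkBeforeBulkLoad) are hypothetical, and the permission model is a simplified POSIX-style owner/group/other check that ignores ACLs and the superuser.

```java
import java.util.Set;

/** Hypothetical sketch: simplified POSIX-style write check (no ACLs, no superuser). */
public final class BulkLoadPermissionSketch {

  private BulkLoadPermissionSketch() {}

  /**
   * Returns true if the user may write a file with the given owner, group and mode.
   * The mode is the usual octal triple, for example 0644.
   */
  public static boolean canWrite(String user, Set<String> userGroups,
                                 String owner, String group, int mode) {
    final int writeBit = 2; // the 'w' bit inside one rwx triple
    if (user.equals(owner)) {
      return ((mode >> 6) & writeBit) != 0; // owner bits apply
    }
    if (userGroups.contains(group)) {
      return ((mode >> 3) & writeBit) != 0; // group bits apply
    }
    return (mode & writeBit) != 0;          // other bits apply
  }

  /** Rejects the bulk load up front instead of leaving an undeletable file in the archive. */
  public static void checkBeforeBulkLoad(String rsUser, Set<String> rsGroups,
                                         String owner, String group, int mode) {
    if (!canWrite(rsUser, rsGroups, owner, group, mode)) {
      throw new SecurityException(
          "bulk load rejected: " + rsUser + " cannot write the source HFile");
    }
  }
}
```

Rejecting early keeps the failure visible to the client that issued the bulk load, rather than surfacing much later as a file the cleaner cannot delete.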



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21214) [hbck2] setTableState just sets hbase:meta state, not in-memory state

2018-09-21 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624218#comment-16624218
 ] 

Umesh Agashe commented on HBASE-21214:
--

+1 lgtm

> [hbck2] setTableState just sets hbase:meta state, not in-memory state
> -
>
> Key: HBASE-21214
> URL: https://issues.apache.org/jira/browse/HBASE-21214
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: 21214.patch, HBASE-21214.master.001.patch
>
>
> Means that we have to go get another Master to see the table state change 
> because in-memory state is still pegged at the old value.
> TODO: Check the is_enabled/is_disabled shell commands to make sure they are 
> reading from the right place.
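
The stale-state problem described above can be modelled with two maps. This is a toy sketch, not HBase code: TableStateSketch and its methods are hypothetical names, one map stands in for hbase:meta and the other for the Master's in-memory state, and the write-through setter is just one possible fix, assumed for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a persisted table state plus an in-memory copy. */
public final class TableStateSketch {
  public enum State { ENABLED, DISABLED }

  private final Map<String, State> meta = new HashMap<>();  // stands in for hbase:meta
  private final Map<String, State> cache = new HashMap<>(); // stands in for in-memory state

  public void create(String table, State state) {
    meta.put(table, state);
    cache.put(table, state);
  }

  /** The reported behaviour: only the persisted copy changes. */
  public void setStateMetaOnly(String table, State state) {
    meta.put(table, state);
  }

  /** One possible fix (an assumption): write through to both copies. */
  public void setState(String table, State state) {
    meta.put(table, state);
    cache.put(table, state);
  }

  /** Reads are served from memory, so they can lag behind meta. */
  public State getState(String table) {
    return cache.get(table);
  }

  public State readMeta(String table) {
    return meta.get(table);
  }
}
```

After setStateMetaOnly, getState still returns the old value until something reloads the in-memory copy, which matches the symptom in the description.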





[jira] [Commented] (HBASE-20941) Create and implement HbckService in master

2018-09-21 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624213#comment-16624213
 ] 

Umesh Agashe commented on HBASE-20941:
--

bq. It does not update the in-memory state in the Master

[~stack], changes for this JIRA were committed on August 7, and then HBASE-21025 
added a cache for TableState.

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.2
>
> Attachments: hbase-20941.master.001.patch, 
> hbase-20941.master.002.patch, hbase-20941.master.003.patch, 
> hbase-20941.master.004.patch, hbase-20941.master.004.patch, 
> hbase-20941.master.004.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the actions/procedures 
> working on them, manipulating their states in meta sometimes fixes things.





[jira] [Comment Edited] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-18 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619828#comment-16619828
 ] 

Umesh Agashe edited comment on HBASE-21023 at 9/18/18 10:43 PM:


ah! looking... Thanks [~stack]!


was (Author: uagashe):
ah! looking...

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: hbase-21023.master.001.patch, 
> hbase-21023.master.001.patch, hbase-21023.master.002.patch
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Commented] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-18 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619828#comment-16619828
 ] 

Umesh Agashe commented on HBASE-21023:
--

ah! looking...

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: hbase-21023.master.001.patch, 
> hbase-21023.master.001.patch, hbase-21023.master.002.patch
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Commented] (HBASE-21169) Initiate hbck2 tool in hbase-operator-tools repo

2018-09-18 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619789#comment-16619789
 ] 

Umesh Agashe commented on HBASE-21169:
--

{quote}Maybe we need to introduce a new HBaseInterfaceAudience called HBCK to 
indicate that this tool is used by HBCK2?
{quote}
Patch for HBASE-20941 already adds this. It's used as:
{code:java}
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.HBCK)
{code}

> Initiate hbck2 tool in hbase-operator-tools repo
> 
>
> Key: HBASE-21169
> URL: https://issues.apache.org/jira/browse/HBASE-21169
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.1.0
>Reporter: Umesh Agashe
>Assignee: stack
>Priority: Major
> Attachments: hbase-21169.master.001.patch
>
>
> Create hbck2 tool in hbase-operator-tools 
> (https://github.com/apache/hbase-operator-tools.git) repo. This is not 
> intended to be a complete tool but an initial change covering usage, the 
> ability to connect to the server, logging, use of the newly added HbckService, 
> etc. Code changes to address specific use cases can be added later, and the 
> tool will evolve.





[jira] [Commented] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-14 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615446#comment-16615446
 ] 

Umesh Agashe commented on HBASE-21023:
--

Attached patch 002 with changes as per review comments.

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: hbase-21023.master.001.patch, 
> hbase-21023.master.001.patch, hbase-21023.master.002.patch
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Updated] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-14 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-21023:
-
Attachment: hbase-21023.master.002.patch

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: hbase-21023.master.001.patch, 
> hbase-21023.master.001.patch, hbase-21023.master.002.patch
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Updated] (HBASE-21169) Initiate hbck2 tool in hbase-operator-tools repo

2018-09-14 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-21169:
-
Attachment: hbase-21169.master.001.patch

> Initiate hbck2 tool in hbase-operator-tools repo
> 
>
> Key: HBASE-21169
> URL: https://issues.apache.org/jira/browse/HBASE-21169
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.1.0
>Reporter: Umesh Agashe
>Assignee: stack
>Priority: Major
> Attachments: hbase-21169.master.001.patch
>
>
> Create hbck2 tool in hbase-operator-tools 
> (https://github.com/apache/hbase-operator-tools.git) repo. This is not 
> intended to be a complete tool but an initial change covering usage, the 
> ability to connect to the server, logging, use of the newly added HbckService, 
> etc. Code changes to address specific use cases can be added later, and the 
> tool will evolve.





[jira] [Commented] (HBASE-21169) Initiate hbck2 tool in hbase-operator-tools repo

2018-09-14 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615245#comment-16615245
 ] 

Umesh Agashe commented on HBASE-21169:
--

"working" is vague in the comment above. To be clear, I have some code changes. 
I am not working on the doc. Thanks for the doc link, [~stack]!

> Initiate hbck2 tool in hbase-operator-tools repo
> 
>
> Key: HBASE-21169
> URL: https://issues.apache.org/jira/browse/HBASE-21169
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.1.0
>Reporter: Umesh Agashe
>Assignee: stack
>Priority: Major
>
> Create hbck2 tool in hbase-operator-tools 
> (https://github.com/apache/hbase-operator-tools.git) repo. This is not 
> intended to be a complete tool but an initial change covering usage, the 
> ability to connect to the server, logging, use of the newly added HbckService, 
> etc. Code changes to address specific use cases can be added later, and the 
> tool will evolve.





[jira] [Commented] (HBASE-21169) Initiate hbck2 tool in hbase-operator-tools repo

2018-09-14 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615158#comment-16615158
 ] 

Umesh Agashe commented on HBASE-21169:
--

[~stack], I was working on it and thought I had assigned it to myself to indicate 
the same. Let's talk about it.

> Initiate hbck2 tool in hbase-operator-tools repo
> 
>
> Key: HBASE-21169
> URL: https://issues.apache.org/jira/browse/HBASE-21169
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.1.0
>Reporter: Umesh Agashe
>Assignee: stack
>Priority: Major
>
> Create hbck2 tool in hbase-operator-tools 
> (https://github.com/apache/hbase-operator-tools.git) repo. This is not 
> intended to be a complete tool but an initial change covering usage, the 
> ability to connect to the server, logging, use of the newly added HbckService, 
> etc. Code changes to address specific use cases can be added later, and the 
> tool will evolve.





[jira] [Created] (HBASE-21169) Initiate hbck2 tool in hbase-operator-tools repo

2018-09-07 Thread Umesh Agashe (JIRA)
Umesh Agashe created HBASE-21169:


 Summary: Initiate hbck2 tool in hbase-operator-tools repo
 Key: HBASE-21169
 URL: https://issues.apache.org/jira/browse/HBASE-21169
 Project: HBase
  Issue Type: Sub-task
  Components: hbck2
Affects Versions: 2.1.0
Reporter: Umesh Agashe
Assignee: Umesh Agashe


Create hbck2 tool in hbase-operator-tools 
(https://github.com/apache/hbase-operator-tools.git) repo. This is not intended 
to be a complete tool but an initial change covering usage, the ability to 
connect to the server, logging, use of the newly added HbckService, etc. Code 
changes to address specific use cases can be added later, and the tool will 
evolve.





[jira] [Commented] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-06 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606560#comment-16606560
 ] 

Umesh Agashe commented on HBASE-21023:
--

Retry; the errors don't look related.

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: hbase-21023.master.001.patch, 
> hbase-21023.master.001.patch
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Updated] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-06 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-21023:
-
Attachment: hbase-21023.master.001.patch

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: hbase-21023.master.001.patch, 
> hbase-21023.master.001.patch
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Updated] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-06 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-21023:
-
Status: Patch Available  (was: In Progress)

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: hbase-21023.master.001.patch
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Commented] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-06 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606212#comment-16606212
 ] 

Umesh Agashe commented on HBASE-21023:
--

The patch adds an API that clients can use to bypass a procedure to completion. 
See the comments above from [~stack] and [~allan163] regarding the choice of 
client and how it should use the API.

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: hbase-21023.master.001.patch
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Work started] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-06 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-21023 started by Umesh Agashe.

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: hbase-21023.master.001.patch
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Updated] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-06 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-21023:
-
Attachment: hbase-21023.master.001.patch

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: hbase-21023.master.001.patch
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Updated] (HBASE-21023) Add bypassProcedureToCompletion() API to HbckService

2018-09-05 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-21023:
-
Summary: Add bypassProcedureToCompletion() API to HbckService  (was: Add 
completeProcedure/s() API to HbckService)

> Add bypassProcedureToCompletion() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21023) Add completeProcedure/s() API to HbckService

2018-08-30 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597822#comment-16597822
 ] 

Umesh Agashe commented on HBASE-21023:
--

HBASE-21083 adds bypassing procedures to completion. This could be used as an 
alternative to purging procedures. So the subject and description of this Jira 
have been changed to completeProcedure/s(), which will bypass the procedure/s 
and their parents to completion without doing the actual work. This will be 
useful for operators using hbck2 to unstick stuck procedures.

> Add completeProcedure/s() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Updated] (HBASE-21023) Add completeProcedure/s() API to HbckService

2018-08-30 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-21023:
-
Description: completeProcedure/s(): some procedures do not support abort at 
every step. When these procedures get stuck then they can not be aborted or 
make further progress. Corrective action is to bypass these procedures to 
completion.  (was: purgeProcedure/s(): some procedures do not support abort at 
every step. When these procedures get stuck then they can not be aborted or 
make further progress. Corrective action is to purge these procedures from 
ProcWAL. Provide option to purge sub-procedures as well.)

> Add completeProcedure/s() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
>
> completeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to bypass these procedures to completion.





[jira] [Updated] (HBASE-21023) Add completeProcedure/s() API to HbckService

2018-08-30 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-21023:
-
Summary: Add completeProcedure/s() API to HbckService  (was: Add 
purgeProcedure/s() API to HbckService)

> Add completeProcedure/s() API to HbckService
> 
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
>
> purgeProcedure/s(): some procedures do not support abort at every step. When 
> these procedures get stuck then they can not be aborted or make further 
> progress. Corrective action is to purge these procedures from ProcWAL. 
> Provide option to purge sub-procedures as well.





[jira] [Comment Edited] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure

2018-08-30 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597814#comment-16597814
 ] 

Umesh Agashe edited comment on HBASE-21083 at 8/30/18 7:20 PM:
---

[~stack], can this be committed to master as well?


was (Author: uagashe):
@stack, can this be committed to master as well?

> Introduce a mechanism to bypass the execution of a stuck procedure
> --
>
> Key: HBASE-21083
> URL: https://issues.apache.org/jira/browse/HBASE-21083
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.1.1, 2.0.2
>
> Attachments: HBASE-21083.branch-2.0.001.patch, 
> HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.0.003.patch, 
> HBASE-21083.branch-2.0.003.patch, HBASE-21083.branch-2.0.003.patch, 
> HBASE-21083.branch-2.1.001.patch
>
>
> Discussed offline with [~stack] and [~Apache9]. We all agreed that we need to 
> introduce a mechanism to 'force complete' a stuck procedure, so that AMv2 can 
> continue running.
> We still have some undiscovered bugs hiding in our AMv2 and procedureV2 
> system, so we need a way to intervene on stuck procedures before HBCK2 can 
> work. This is crucial for a production-ready system. 
> For now, we have few ways to intervene on running procedures. Aborting 
> them is not a good choice, since some procedures are not abortable, and some 
> procedures may have overridden the abort() method so that it ignores the abort 
> request.
> So here I will introduce a mechanism to bypass the execution of a stuck 
> procedure.
> Basically, I added a field called 'bypass' to the Procedure class. If we set 
> this field to true, all the logic in execute/rollback will be skipped, letting 
> this procedure and its ancestors complete normally and finally release the 
> lock resources.
> Note that bypassing a procedure may leave the cluster in an intermediate state, 
> e.g. a region not assigned, or some HDFS files left behind. 
> Operators need to know the side effects of bypassing and recover the 
> inconsistent state of the cluster themselves, for example by issuing new 
> procedures to assign the regions.
> A patch will be uploaded and a review board request opened. For now, only APIs 
> in ProcedureExecutor are provided. If everything is fine, I will add it to the 
> master service and add a shell command to bypass a procedure. Or maybe we can 
> use dynamically compiled JSPs to execute those APIs, as mentioned in HBASE-20679. 
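
The 'bypass' flag described above can be illustrated with a toy procedure. This is a minimal sketch under stated assumptions and not the HBase Procedure class: BypassSketch, Proc, runOnce and the 'stuck' flag are hypothetical names invented here, and locking, rollback and parent/child handling are left out.

```java
/** Hypothetical sketch of a procedure whose work is skipped when 'bypass' is set. */
public final class BypassSketch {

  private BypassSketch() {}

  public static final class Proc {
    private final boolean stuck; // simulates a procedure that can neither progress nor abort
    private boolean bypass;
    private boolean done;
    private int workDone;

    public Proc(boolean stuck) {
      this.stuck = stuck;
    }

    public void setBypass(boolean bypass) {
      this.bypass = bypass;
    }

    public boolean isDone() {
      return done;
    }

    public int workDone() {
      return workDone;
    }

    /** One executor pass. Returns true once the procedure has completed. */
    public boolean runOnce() {
      if (bypass) {
        done = true;   // skip the execute logic entirely, but still complete
        return true;
      }
      if (stuck) {
        return false;  // no progress without operator intervention
      }
      workDone++;      // the real work of a healthy procedure
      done = true;
      return true;
    }
  }
}
```

Note that a bypassed procedure reports completion with workDone() still at 0, which is why the description warns that operators must repair any leftover inconsistent state themselves.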





[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure

2018-08-30 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597814#comment-16597814
 ] 

Umesh Agashe commented on HBASE-21083:
--

@stack, can this be committed to master as well?

> Introduce a mechanism to bypass the execution of a stuck procedure
> --
>
> Key: HBASE-21083
> URL: https://issues.apache.org/jira/browse/HBASE-21083
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.1.1, 2.0.2
>
> Attachments: HBASE-21083.branch-2.0.001.patch, 
> HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.0.003.patch, 
> HBASE-21083.branch-2.0.003.patch, HBASE-21083.branch-2.0.003.patch, 
> HBASE-21083.branch-2.1.001.patch
>
>





[jira] [Commented] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure

2018-08-28 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595494#comment-16595494
 ] 

Umesh Agashe commented on HBASE-21083:
--

Thanks for addressing the review comments, [~stack]! Thanks [~allan163] for the 
changes!

> Introduce a mechanism to bypass the execution of a stuck procedure
> --
>
> Key: HBASE-21083
> URL: https://issues.apache.org/jira/browse/HBASE-21083
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21083.branch-2.0.001.patch, 
> HBASE-21083.branch-2.0.002.patch, HBASE-21083.branch-2.0.003.patch, 
> HBASE-21083.branch-2.1.001.patch
>
>
> Discussed offline with [~stack] and [~Apache9]. We all agreed that we need to 
> introduce a mechanism to 'force complete' a stuck procedure, so that AMv2 can 
> continue running.
> We still have some undiscovered bugs hiding in our AMv2 and procedureV2 
> system, so we need a way to intervene in stuck procedures before HBCK2 can 
> work. This is crucial for a production-ready system. 
> For now, we have few ways to intervene in running procedures. Aborting 
> them is not a good choice, since some procedures are not abortable, and some 
> procedures may have overridden the abort() method so that it ignores the 
> abort request.
> So, here, I will introduce a mechanism to bypass the execution of a stuck 
> procedure.
> Basically, I added a field called 'bypass' to the Procedure class. If we set 
> this field to true, all the logic in execute/rollback will be skipped, letting 
> this procedure and its ancestors complete normally and finally release their 
> lock resources.
> Note that bypassing a procedure may leave the cluster in an intermediate 
> state, e.g. a region not assigned, or some HDFS files left behind. 
> Operators need to know the side effects of bypassing and recover the 
> inconsistent state of the cluster themselves, for example by issuing new 
> procedures to assign the regions.
> A patch will be uploaded and a review board request will be opened. For now, 
> only APIs in ProcedureExecutor are provided. If everything is fine, I will add 
> it to the master service and add a shell command to bypass a procedure. Or 
> maybe we can use dynamically compiled JSPs to execute those APIs, as mentioned 
> in HBASE-20679. 
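
The 'bypass' flag described in the quoted issue can be pictured roughly as 
follows. This is only a minimal sketch under stated assumptions, not the code 
from the attached patches: SketchProcedure, runOnce() and BypassSketch are 
hypothetical names, and the real Procedure/ProcedureExecutor classes are far 
more involved (rollback, ancestors and persistence are left out here).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a procedure; NOT the HBase Procedure class.
abstract class SketchProcedure {
    private volatile boolean bypass = false;
    private boolean finished = false;
    final List<String> log = new ArrayList<>();

    void setBypass(boolean bypass) { this.bypass = bypass; }
    boolean isFinished() { return finished; }

    // The real work; in a stuck procedure this keeps failing or never progresses.
    protected abstract void execute() throws Exception;

    // One step as a (hypothetical) executor would drive it.
    final void runOnce() {
        try {
            if (bypass) {
                // Skip all execute logic and let the procedure complete.
                log.add("bypassed");
            } else {
                execute();
                log.add("executed");
            }
            finished = true;
        } catch (Exception e) {
            log.add("failed: " + e.getMessage());
        } finally {
            // Lock resources are released whether or not the step did any work.
            log.add("lock released");
        }
    }
}

class BypassSketch {
    // A procedure whose execute() can never succeed.
    static SketchProcedure stuckProcedure() {
        return new SketchProcedure() {
            @Override
            protected void execute() throws Exception {
                throw new IllegalStateException("cannot make progress");
            }
        };
    }

    public static void main(String[] args) {
        SketchProcedure p = stuckProcedure();
        p.runOnce();                       // fails, procedure stays unfinished
        System.out.println(p.isFinished() + " " + p.log);
        p.setBypass(true);
        p.runOnce();                       // execute() is skipped, procedure completes
        System.out.println(p.isFinished() + " " + p.log);
    }
}
```

As the description warns, completing a procedure this way says nothing about 
whether the cluster state it was working on is consistent; that cleanup is 
left to the operator.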





[jira] [Commented] (HBASE-20941) Create and implement HbckService in master

2018-08-27 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594100#comment-16594100
 ] 

Umesh Agashe commented on HBASE-20941:
--

hadoop.hbase.util.TestHBaseFsckReplication failed in the last 2 builds. 
'IOException("Duplicate hbck - Abort")' is thrown when the lock file for hbck 
already exists because another instance of hbck is running. In this case 
I think there is a stale lock file that didn't get cleaned up. This seems to be 
unrelated to the changes in the patch. The test passes locally in my dev 
environment.
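
To illustrate why a stale lock file produces this failure, here is a minimal 
sketch of an exclusive lock file. It is an assumption-laden illustration, not 
the hbck implementation: HbckLockSketch, acquire(), release() and the file 
name "hbck.lock" are hypothetical, and it uses the local filesystem via 
java.nio.file rather than HDFS.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of an exclusive lock file; not the hbck code.
class HbckLockSketch {
    // Atomically creates the lock file and fails if it already exists.
    static void acquire(Path lockFile) throws IOException {
        try {
            Files.createFile(lockFile);
        } catch (FileAlreadyExistsException e) {
            // Another instance holds the lock, or a stale file was left behind.
            throw new IOException("Duplicate hbck - Abort", e);
        }
    }

    // Removes the lock file; if this step is skipped, the file goes stale.
    static void release(Path lockFile) throws IOException {
        Files.deleteIfExists(lockFile);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hbck-sketch");
        Path lock = dir.resolve("hbck.lock"); // hypothetical file name
        acquire(lock);
        try {
            acquire(lock);                    // second attempt is rejected
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
        release(lock);
        acquire(lock);                        // succeeds again after cleanup
        release(lock);
        Files.delete(dir);
    }
}
```

A run that exits without calling release() leaves the file in place, so the 
next run is rejected even though no other instance is active.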

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch, 
> hbase-20941.master.002.patch, hbase-20941.master.003.patch, 
> hbase-20941.master.004.patch, hbase-20941.master.004.patch, 
> hbase-20941.master.004.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Commented] (HBASE-20941) Create and implement HbckService in master

2018-08-24 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592075#comment-16592075
 ] 

Umesh Agashe commented on HBASE-20941:
--

retry

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch, 
> hbase-20941.master.002.patch, hbase-20941.master.003.patch, 
> hbase-20941.master.004.patch, hbase-20941.master.004.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Updated] (HBASE-20941) Create and implement HbckService in master

2018-08-24 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20941:
-
Attachment: hbase-20941.master.004.patch

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch, 
> hbase-20941.master.002.patch, hbase-20941.master.003.patch, 
> hbase-20941.master.004.patch, hbase-20941.master.004.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Updated] (HBASE-20941) Create and implement HbckService in master

2018-08-23 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20941:
-
Attachment: hbase-20941.master.004.patch

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch, 
> hbase-20941.master.002.patch, hbase-20941.master.003.patch, 
> hbase-20941.master.004.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Commented] (HBASE-20941) Create and implement HbckService in master

2018-08-22 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589272#comment-16589272
 ] 

Umesh Agashe commented on HBASE-20941:
--

Thanks for the review, [~stack]! The build passes and all review comments so 
far are addressed. Waiting for more reviews or a ship it.

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch, 
> hbase-20941.master.002.patch, hbase-20941.master.003.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Updated] (HBASE-20941) Create and implement HbckService in master

2018-08-21 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20941:
-
Attachment: hbase-20941.master.003.patch

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch, 
> hbase-20941.master.002.patch, hbase-20941.master.003.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Commented] (HBASE-20941) Create and implement HbckService in master

2018-08-17 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584467#comment-16584467
 ] 

Umesh Agashe commented on HBASE-20941:
--

Uploaded patch 002 with changes per review comments.

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch, 
> hbase-20941.master.002.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Updated] (HBASE-20941) Create and implement HbckService in master

2018-08-17 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20941:
-
Attachment: hbase-20941.master.002.patch

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch, 
> hbase-20941.master.002.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Comment Edited] (HBASE-20874) Sending compaction descriptions from all regionservers to master.

2018-08-17 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584237#comment-16584237
 ] 

Umesh Agashe edited comment on HBASE-20874 at 8/17/18 6:10 PM:
---

{code}
/testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:102:81: C: 
Metrics/LineLength: Line is too long. [100/80] 
/testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:114:81: C: 
Metrics/LineLength: Line is too long. [99/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:36:81: 
C: Metrics/LineLength: Line is too long. [90/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:37:81: 
C: Metrics/LineLength: Line is too long. [83/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:43:81: 
C: Metrics/LineLength: Line is too long. [84/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:44:81: 
C: Metrics/LineLength: Line is too long. [89/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:45:81: 
C: Metrics/LineLength: Line is too long. [97/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:46:81: 
C: Metrics/LineLength: Line is too long. [97/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:47:81: 
C: Metrics/LineLength: Line is too long. [81/80]{code}

The above errors will go away after HBASE-20851 is addressed. The following 
issues are showing up in most files:
{code}
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:35:7: 
C: Metrics/AbcSize: Assignment Branch Condition size for command is too high. 
[45.01/15]
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:35:7: 
C: Metrics/MethodLength: Method has too many lines. [16/10] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:36:26: 
C: Style/WordArray: Use `%w` or `%W` for an array of words.{code}
 


was (Author: uagashe):
/testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:102:81: C: 
Metrics/LineLength: Line is too long. [100/80] 
/testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:114:81: C: 
Metrics/LineLength: Line is too long. [99/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:36:81: 
C: Metrics/LineLength: Line is too long. [90/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:37:81: 
C: Metrics/LineLength: Line is too long. [83/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:43:81: 
C: Metrics/LineLength: Line is too long. [84/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:44:81: 
C: Metrics/LineLength: Line is too long. [89/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:45:81: 
C: Metrics/LineLength: Line is too long. [97/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:46:81: 
C: Metrics/LineLength: Line is too long. [97/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:47:81: 
C: Metrics/LineLength: Line is too long. [81/80]

The above errors will go away after HBASE-20851 is addressed. The following 
issues are showing up in most files:

/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:35:7: 
C: Metrics/AbcSize: Assignment Branch Condition size for command is too high. 
[45.01/15]

/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:35:7: 
C: Metrics/MethodLength: Method has too many lines. [16/10] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:36:26: 
C: Style/WordArray: Use `%w` or `%W` for an array of words.

 

> Sending compaction descriptions from all regionservers to master.
> -
>
> Key: HBASE-20874
> URL: https://issues.apache.org/jira/browse/HBASE-20874
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Mohit Goel
>Assignee: Mohit Goel
>Priority: Minor
> Attachments: HBASE-20874.master.004.patch, 
> HBASE-20874.master.005.patch, HBASE-20874.master.006.patch
>
>
> Need to send the compaction descriptions from the region servers to the 
> Master, to let the Master know the compaction state of the entire cluster. 
> Further, need to change the implementation of client-side APIs like 
> getCompactionState, so that they consult the Master for the result instead 
> of sending individual requests to the region servers.
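
The flow described in the quoted issue (region servers report, the master 
aggregates, clients ask the master) could be sketched as below. This is a 
hypothetical illustration only, not the design in the attached patches: 
CompactionStateSketch, its State enum, report() and the server and region 
names are all assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical master-side view of compaction state; not HBase code.
class CompactionStateSketch {
    enum State { NONE, MINOR, MAJOR, MAJOR_AND_MINOR }

    // Latest report per region server: region name -> compaction state.
    private final Map<String, Map<String, State>> byServer = new ConcurrentHashMap<>();

    // Called when a region server reports; replaces that server's previous report.
    void report(String server, Map<String, State> regionStates) {
        byServer.put(server, new HashMap<>(regionStates));
    }

    // Client-facing lookup answered from the master's view alone,
    // without contacting any region server.
    State getCompactionState(String region) {
        for (Map<String, State> states : byServer.values()) {
            State s = states.get(region);
            if (s != null) {
                return s;
            }
        }
        return State.NONE; // no server reported a compaction for this region
    }

    public static void main(String[] args) {
        CompactionStateSketch master = new CompactionStateSketch();
        Map<String, State> rs1 = new HashMap<>();
        rs1.put("regionA", State.MAJOR);
        master.report("rs1", rs1);
        System.out.println(master.getCompactionState("regionA"));
        System.out.println(master.getCompactionState("regionB"));
    }
}
```

The trade-off in such a design is freshness: the master can only answer as of 
the last report it received from each region server.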





[jira] [Commented] (HBASE-20874) Sending compaction descriptions from all regionservers to master.

2018-08-17 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584237#comment-16584237
 ] 

Umesh Agashe commented on HBASE-20874:
--

/testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:102:81: C: 
Metrics/LineLength: Line is too long. [100/80] 
/testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:114:81: C: 
Metrics/LineLength: Line is too long. [99/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:36:81: 
C: Metrics/LineLength: Line is too long. [90/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:37:81: 
C: Metrics/LineLength: Line is too long. [83/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:43:81: 
C: Metrics/LineLength: Line is too long. [84/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:44:81: 
C: Metrics/LineLength: Line is too long. [89/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:45:81: 
C: Metrics/LineLength: Line is too long. [97/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:46:81: 
C: Metrics/LineLength: Line is too long. [97/80] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:47:81: 
C: Metrics/LineLength: Line is too long. [81/80]

The above errors will go away after HBASE-20851 is addressed. The following 
issues are showing up in most files:

/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:35:7: 
C: Metrics/AbcSize: Assignment Branch Condition size for command is too high. 
[45.01/15]

/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:35:7: 
C: Metrics/MethodLength: Method has too many lines. [16/10] 
/testptch/hbase/hbase-shell/src/main/ruby/shell/commands/compactions.rb:36:26: 
C: Style/WordArray: Use `%w` or `%W` for an array of words.

 

> Sending compaction descriptions from all regionservers to master.
> -
>
> Key: HBASE-20874
> URL: https://issues.apache.org/jira/browse/HBASE-20874
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Mohit Goel
>Assignee: Mohit Goel
>Priority: Minor
> Attachments: HBASE-20874.master.004.patch, 
> HBASE-20874.master.005.patch, HBASE-20874.master.006.patch
>
>
> Need to send the compaction descriptions from the region servers to the 
> Master, to let the Master know the compaction state of the entire cluster. 
> Further, need to change the implementation of client-side APIs like 
> getCompactionState, so that they consult the Master for the result instead 
> of sending individual requests to the region servers.





[jira] [Commented] (HBASE-20941) Create and implement HbckService in master

2018-08-10 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576973#comment-16576973
 ] 

Umesh Agashe commented on HBASE-20941:
--

Thanks for the review, [~busbey]. Working on changes per review comments.

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Commented] (HBASE-20482) Print a link to the ref guide chapter for the shell during startup

2018-08-07 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572467#comment-16572467
 ] 

Umesh Agashe commented on HBASE-20482:
--

+1 lgtm

> Print a link to the ref guide chapter for the shell during startup
> --
>
> Key: HBASE-20482
> URL: https://issues.apache.org/jira/browse/HBASE-20482
> Project: HBase
>  Issue Type: Task
>  Components: documentation, shell
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Attachments: hbase-20482.branch-1.2.001.patch, 
> hbase-20482.branch-2.0.001.patch, hbase-20482.master.001.patch, 
> hbase-20482.master.002.patch, hbase-20482.master.003.patch
>
>






[jira] [Updated] (HBASE-21023) Add purgeProcedure/s() API to HbckService

2018-08-07 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-21023:
-
Description: purgeProcedure/s(): Some procedures do not support abort at 
every step. When these procedures get stuck, they can neither be aborted nor 
make further progress. The corrective action is to purge these procedures 
from ProcWAL. Provide an option to purge sub-procedures as well.

> Add purgeProcedure/s() API to HbckService
> -
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
>
> purgeProcedure/s(): Some procedures do not support abort at every step. When 
> these procedures get stuck, they can neither be aborted nor make further 
> progress. The corrective action is to purge these procedures from ProcWAL. 
> Provide an option to purge sub-procedures as well.





[jira] [Created] (HBASE-21023) Add purgeProcedure/s() API to HbckService

2018-08-07 Thread Umesh Agashe (JIRA)
Umesh Agashe created HBASE-21023:


 Summary: Add purgeProcedure/s() API to HbckService
 Key: HBASE-21023
 URL: https://issues.apache.org/jira/browse/HBASE-21023
 Project: HBase
  Issue Type: Sub-task
  Components: hbck2
Affects Versions: 2.0.1
Reporter: Umesh Agashe
Assignee: Umesh Agashe
 Fix For: 2.2.0








[jira] [Updated] (HBASE-20941) Create and implement HbckService in master

2018-08-07 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20941:
-
Status: Patch Available  (was: In Progress)

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20941) Create and implement HbckService in master

2018-08-07 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572166#comment-16572166
 ] 

Umesh Agashe commented on HBASE-20941:
--

Considering the size of the patch, I am moving the following API out to a 
separate JIRA:
 * purgeProcedure/s(): Some procedures do not support abort at every step. When 
these procedures get stuck, they can neither be aborted nor make further 
progress. The corrective action is to purge these procedures from ProcWAL. 
Provide an option to purge sub-procedures as well.

The patch adds and implements HbckService in master and adds a UT for the 
client.

 

 

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Updated] (HBASE-20941) Create and implement HbckService in master

2018-08-07 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20941:
-
Description: 
Create HbckService in master and implement the following methods:
 # setTableState(): If table states are inconsistent with the 
actions/procedures working on them, manipulating their states in meta 
sometimes fixes things.

  was:
Create HbckService in master and implement the following methods:
 # purgeProcedure/s(): Some procedures do not support abort at every step. When 
these procedures get stuck, they can neither be aborted nor make further 
progress. The corrective action is to purge these procedures from ProcWAL. 
Provide an option to purge sub-procedures as well.
 # setTable/RegionState(): If table/region states are inconsistent with the 
actions/procedures working on them, manipulating their states in meta 
sometimes fixes things.


> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Updated] (HBASE-20941) Create and implement HbckService in master

2018-08-07 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20941:
-
Attachment: hbase-20941.master.001.patch

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch
>
>
> Create HbckService in master and implement the following methods:
>  # purgeProcedure/s(): Some procedures do not support abort at every step. 
> When these procedures get stuck, they can neither be aborted nor make further 
> progress. The corrective action is to purge these procedures from ProcWAL. 
> Provide an option to purge sub-procedures as well.
>  # setTable/RegionState(): If table/region states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Work started] (HBASE-20941) Create and implement HbckService in master

2018-08-07 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-20941 started by Umesh Agashe.

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
>
> Create HbckService in master and implement the following methods:
>  # purgeProcedure/s(): Some procedures do not support abort at every step. 
> When these procedures get stuck, they can neither be aborted nor make further 
> progress. The corrective action is to purge these procedures from ProcWAL. 
> Provide an option to purge sub-procedures as well.
>  # setTable/RegionState(): If table/region states are inconsistent with the 
> actions/procedures working on them, manipulating their states in meta 
> sometimes fixes things.





[jira] [Commented] (HBASE-21018) RS crashed because AsyncFS was unable to update HDFS data encryption key

2018-08-07 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572068#comment-16572068
 ] 

Umesh Agashe commented on HBASE-21018:
--

+1

> RS crashed because AsyncFS was unable to update HDFS data encryption key
> 
>
> Key: HBASE-21018
> URL: https://issues.apache.org/jira/browse/HBASE-21018
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.0.0
> Environment: Hadoop 3.0.0, HBase 2.0.0, 
> HDFS configuration dfs.encrypt.data.transfer = true
>Reporter: Wei-Chiu Chuang
>Priority: Critical
> Attachments: HBASE-21018.master.001.patch
>
>
> We (+[~uagashe]) found that the HBase RegionServer doesn't update the HDFS 
> data encryption key correctly, and in some cases, after retrying 10 times, it 
> aborts.
> {noformat}
> 2018-08-03 17:37:03,233 WARN 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper: create 
> fan-out dfs output 
> /hbase/WALs/rs1.example.com,22101,1533318719239/rs1.example.com%2C22101%2C1533318719239.rs1.example.com%2C22101%2C1533318719239.regiongroup-0.1533343022981
>  failed, retry = 1
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1685436998) doesn't exist. Current key: 1085959374
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper$SaslNegotiateHandler.check(FanOutOneBlockAsyncDFSOutputSaslHelper.java:399)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper$SaslNegotiateHandler.channelRead(FanOutOneBlockAsyncDFSOutputSaslHelper.java:470)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:801)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.epo

[jira] [Commented] (HBASE-21018) RS crashed because AsyncFS was unable to update HDFS data encryption key

2018-08-06 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571064#comment-16571064
 ] 

Umesh Agashe commented on HBASE-21018:
--

Hi [~Apache9], any details you can provide on this would be helpful. Thanks!

> RS crashed because AsyncFS was unable to update HDFS data encryption key
> 
>
> Key: HBASE-21018
> URL: https://issues.apache.org/jira/browse/HBASE-21018
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.0.0
> Environment: Hadoop 3.0.0, HBase 2.0.0, 
> HDFS configuration dfs.encrypt.data.transfer = true
>Reporter: Wei-Chiu Chuang
>Priority: Critical
> Attachments: HBASE-21018.master.001.patch
>
>
> We (+[~uagashe]) found that the HBase RegionServer doesn't update the HDFS 
> data encryption key correctly, and in some cases, after retrying 10 times, it 
> aborts.
> {noformat}
> 2018-08-03 17:37:03,233 WARN 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper: create 
> fan-out dfs output 
> /hbase/WALs/rs1.example.com,22101,1533318719239/rs1.example.com%2C22101%2C1533318719239.rs1.example.com%2C22101%2C1533318719239.regiongroup-0.1533343022981
>  failed, retry = 1
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1685436998) doesn't exist. Current key: 1085959374
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper$SaslNegotiateHandler.check(FanOutOneBlockAsyncDFSOutputSaslHelper.java:399)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper$SaslNegotiateHandler.channelRead(FanOutOneBlockAsyncDFSOutputSaslHelper.java:470)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.hbase.thirdparty.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.j

[jira] [Commented] (HBASE-20815) In TestServerCrashProcedure collect and assert on submitted and failed counts for ServerCrashProcedure

2018-07-25 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556368#comment-16556368
 ] 

Umesh Agashe commented on HBASE-20815:
--

+1, lgtm. Thanks for adding testRecoveryOnRsWithMeta() , [~xucang]!

> In TestServerCrashProcedure collect and assert on submitted and failed counts 
> for ServerCrashProcedure
> --
>
> Key: HBASE-20815
> URL: https://issues.apache.org/jira/browse/HBASE-20815
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Umesh Agashe
>Assignee: Xu Cang
>Priority: Minor
> Attachments: HBASE-20815.master.001.patch, 
> HBASE-20815.master.002.patch, HBASE-20815.master.002.patch
>
>
> We need to collect and possibly assert on number of procedures submitted and 
> failed for ServerCrashProcedures.





[jira] [Updated] (HBASE-20941) Create and implement HbckService in master

2018-07-25 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20941:
-
Summary: Create and implement HbckService in master  (was: Cre)

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
>
> Create HbckService in master and implement following methods:
>  # purgeProcedure/s(): some procedures do not support abort at every step. 
> When these procedures get stuck, they can neither be aborted nor make further 
> progress. Corrective action is to purge these procedures from ProcWAL. 
> Provide option to purge sub-procedures as well.
>  # setTable/RegionState(): If table/region states are inconsistent with the 
> actions/procedures working on them, sometimes manipulating their states in 
> meta fixes things.





[jira] [Created] (HBASE-20941) Cre

2018-07-25 Thread Umesh Agashe (JIRA)
Umesh Agashe created HBASE-20941:


 Summary: Cre
 Key: HBASE-20941
 URL: https://issues.apache.org/jira/browse/HBASE-20941
 Project: HBase
  Issue Type: Sub-task
Reporter: Umesh Agashe
Assignee: Umesh Agashe


Create HbckService in master and implement following methods:
 # purgeProcedure/s(): some procedures do not support abort at every step. When 
these procedures get stuck, they can neither be aborted nor make further 
progress. Corrective action is to purge these procedures from ProcWAL. Provide 
option to purge sub-procedures as well.
 # setTable/RegionState(): If table/region states are inconsistent with the 
actions/procedures working on them, sometimes manipulating their states in meta 
fixes things.
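
As a rough illustration of the purge option described above, here is a minimal sketch that assumes a simplified in-memory store. PurgeSketch and its methods are hypothetical stand-ins and are not the ProcWAL or HbckService API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a procedure store; not the ProcWAL API. */
final class PurgeSketch {
  // procId -> parent procId (null for a root procedure)
  private final Map<Long, Long> parentOf = new HashMap<>();

  void add(long procId, Long parentId) {
    parentOf.put(procId, parentId);
  }

  /** Removes procId and, if requested, all of its descendants. Returns the removed ids. */
  List<Long> purge(long procId, boolean includeSubProcedures) {
    List<Long> removed = new ArrayList<>();
    if (!parentOf.containsKey(procId)) {
      return removed; // unknown procedure: nothing to purge
    }
    Deque<Long> pending = new ArrayDeque<>();
    pending.push(procId);
    while (!pending.isEmpty()) {
      long current = pending.pop();
      parentOf.remove(current);
      removed.add(current);
      if (includeSubProcedures) {
        // Queue every direct child of the procedure that was just removed.
        for (Map.Entry<Long, Long> e : parentOf.entrySet()) {
          if (e.getValue() != null && e.getValue() == current) {
            pending.push(e.getKey());
          }
        }
      }
    }
    return removed;
  }

  int size() {
    return parentOf.size();
  }
}
```

In this sketch, purging a root with includeSubProcedures set to false leaves its children in the store, which is why the option to purge sub-procedures matters.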





[jira] [Commented] (HBASE-20815) In TestServerCrashProcedure collect and assert on submitted and failed counts for ServerCrashProcedure

2018-07-25 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556075#comment-16556075
 ] 

Umesh Agashe commented on HBASE-20815:
--

Unit test for patch 002 failed. Failure doesn't look related to the changes. 
Retrying.

> In TestServerCrashProcedure collect and assert on submitted and failed counts 
> for ServerCrashProcedure
> --
>
> Key: HBASE-20815
> URL: https://issues.apache.org/jira/browse/HBASE-20815
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Umesh Agashe
>Assignee: Xu Cang
>Priority: Minor
> Attachments: HBASE-20815.master.001.patch, 
> HBASE-20815.master.002.patch, HBASE-20815.master.002.patch
>
>
> We need to collect and possibly assert on number of procedures submitted and 
> failed for ServerCrashProcedures.





[jira] [Updated] (HBASE-20815) In TestServerCrashProcedure collect and assert on submitted and failed counts for ServerCrashProcedure

2018-07-25 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20815:
-
Attachment: HBASE-20815.master.002.patch

> In TestServerCrashProcedure collect and assert on submitted and failed counts 
> for ServerCrashProcedure
> --
>
> Key: HBASE-20815
> URL: https://issues.apache.org/jira/browse/HBASE-20815
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Umesh Agashe
>Assignee: Xu Cang
>Priority: Minor
> Attachments: HBASE-20815.master.001.patch, 
> HBASE-20815.master.002.patch, HBASE-20815.master.002.patch
>
>
> We need to collect and possibly assert on number of procedures submitted and 
> failed for ServerCrashProcedures.





[jira] [Commented] (HBASE-20815) In TestServerCrashProcedure collect and assert on submitted and failed counts for ServerCrashProcedure

2018-07-24 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554918#comment-16554918
 ] 

Umesh Agashe commented on HBASE-20815:
--

Thanks for the patch [~xucang]! It looks good. One minor comment: I see 
testCount is incremented before making calls to 
testRecoveryAndDoubleExecution() in all instances. Can testCount be incremented 
in testRecoveryAndDoubleExecution() itself? Thanks!
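
A minimal sketch of that suggestion, assuming a simplified test class. The class name and the two caller methods are hypothetical stand-ins, not the code in the patch.

```java
/** Hypothetical, simplified stand-in for the test class discussed above. */
final class CrashTestCounterSketch {
  private int testCount = 0;

  // Shared helper: the counter is bumped once here instead of before every call site.
  void testRecoveryAndDoubleExecution(boolean carryingMeta) {
    testCount++;
    // ... the real helper would drive a ServerCrashProcedure here ...
  }

  void testRecoveryOnRsWithMeta() {
    testRecoveryAndDoubleExecution(true);
  }

  void testRecoveryOnRsWithoutMeta() {
    testRecoveryAndDoubleExecution(false);
  }

  int getTestCount() {
    return testCount;
  }
}
```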

> In TestServerCrashProcedure collect and assert on submitted and failed counts 
> for ServerCrashProcedure
> --
>
> Key: HBASE-20815
> URL: https://issues.apache.org/jira/browse/HBASE-20815
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Umesh Agashe
>Assignee: Xu Cang
>Priority: Minor
> Attachments: HBASE-20815.master.001.patch
>
>
> We need to collect and possibly assert on number of procedures submitted and 
> failed for ServerCrashProcedures.





[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2018-07-05 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534206#comment-16534206
 ] 

Umesh Agashe commented on HBASE-6028:
-

[~busbey], created HBASE-20851 for rubocop config changes.

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.007.patch, 
> HBASE-6028.master.008.patch, HBASE-6028.master.008.patch, 
> HBASE-6028.master.009.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel 
> in-progress compactions.  This would allow a system to recover when too many 
> regions become eligible for compactions at once





[jira] [Updated] (HBASE-20851) Change rubocop config for max line length of 100

2018-07-05 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20851:
-
Labels: beginner beginners  (was: )

> Change rubocop config for max line length of 100
> 
>
> Key: HBASE-20851
> URL: https://issues.apache.org/jira/browse/HBASE-20851
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Priority: Minor
>  Labels: beginner, beginners
>
> Existing ruby and Java code uses max line length of 100 characters. Change 
> rubocop config with:
> {code:java}
> Metrics/LineLength:
>   Max: 100
> {code}





[jira] [Created] (HBASE-20851) Change rubocop config for max line length of 100

2018-07-05 Thread Umesh Agashe (JIRA)
Umesh Agashe created HBASE-20851:


 Summary: Change rubocop config for max line length of 100
 Key: HBASE-20851
 URL: https://issues.apache.org/jira/browse/HBASE-20851
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 2.0.1
Reporter: Umesh Agashe


Existing ruby and Java code uses max line length of 100 characters. Change 
rubocop config with:
{code:java}
Metrics/LineLength:
  Max: 100
{code}





[jira] [Updated] (HBASE-6028) Implement a cancel for in-progress compactions

2018-07-02 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-6028:

Attachment: HBASE-6028.master.008.patch

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.007.patch, 
> HBASE-6028.master.008.patch, HBASE-6028.master.008.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel 
> in-progress compactions.  This would allow a system to recover when too many 
> regions become eligible for compactions at once





[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2018-07-02 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530763#comment-16530763
 ] 

Umesh Agashe commented on HBASE-6028:
-

Two of the rubocop messages are 'Line too long'. The rest of the ruby code 
uses 100-character lines, but rubocop expects 80, so these messages can be 
ignored. The unit test failure 'TestSyncReplicationStandbyKillMaster' doesn't 
seem to be related to the changes. Retrying the build.

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.007.patch, HBASE-6028.master.008.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel 
> in-progress compactions.  This would allow a system to recover when too many 
> regions become eligible for compactions at once





[jira] [Commented] (HBASE-20814) fix error prone assertion failure ignored warnings

2018-06-28 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526865#comment-16526865
 ] 

Umesh Agashe commented on HBASE-20814:
--

+1, lgtm

> fix error prone assertion failure ignored warnings
> --
>
> Key: HBASE-20814
> URL: https://issues.apache.org/jira/browse/HBASE-20814
> Project: HBase
>  Issue Type: Sub-task
>  Components: build, test
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Attachments: HBASE-20814.master.001.patch
>
>
> when we have assertion failures ignored, that likely means we're missing a 
> test case, let's make sure our tests are actually running and covering what 
> we think they are.





[jira] [Commented] (HBASE-16549) Procedure v2 - Add new AM metrics

2018-06-28 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526833#comment-16526833
 ] 

Umesh Agashe commented on HBASE-16549:
--

Done, HBASE-20815. Thanks [~mdrob]!

> Procedure v2 - Add new AM metrics
> -
>
> Key: HBASE-16549
> URL: https://issues.apache.org/jira/browse/HBASE-16549
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-16549-hbase-14614.v1.patch, 
> HBASE-16549-hbase-14614.v2-v3.patch, HBASE-16549-hbase-14614.v2.patch, 
> HBASE-16549-hbase-14614.v3.patch, HBASE-16549-hbase-14614.v3.patch, 
> HBASE-16549.master.v4.patch, HBASE-16549.master.v4.patch, 
> HBASE-16549.master.v4.patch, HBASE-16549.master.v5.patch
>
>
> With the new AM we can add a bunch of metrics
>  - assign/unassign time
>  - server crash time
>  - grouping related metrics? (how many batch we do, and similar?)





[jira] [Created] (HBASE-20815) In TestServerCrashProcedure collect and assert on submitted and failed counts for ServerCrashProcedure

2018-06-28 Thread Umesh Agashe (JIRA)
Umesh Agashe created HBASE-20815:


 Summary: In TestServerCrashProcedure collect and assert on 
submitted and failed counts for ServerCrashProcedure
 Key: HBASE-20815
 URL: https://issues.apache.org/jira/browse/HBASE-20815
 Project: HBase
  Issue Type: Bug
  Components: amv2
Reporter: Umesh Agashe


We need to collect and possibly assert on number of procedures submitted and 
failed for ServerCrashProcedures.





[jira] [Commented] (HBASE-16549) Procedure v2 - Add new AM metrics

2018-06-28 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526800#comment-16526800
 ] 

Umesh Agashe commented on HBASE-16549:
--

[~mdrob], it was added with the intention of using it later. When it was added, 
asserting on the counts was not possible due to the flakiness of the test.

> Procedure v2 - Add new AM metrics
> -
>
> Key: HBASE-16549
> URL: https://issues.apache.org/jira/browse/HBASE-16549
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-16549-hbase-14614.v1.patch, 
> HBASE-16549-hbase-14614.v2-v3.patch, HBASE-16549-hbase-14614.v2.patch, 
> HBASE-16549-hbase-14614.v3.patch, HBASE-16549-hbase-14614.v3.patch, 
> HBASE-16549.master.v4.patch, HBASE-16549.master.v4.patch, 
> HBASE-16549.master.v4.patch, HBASE-16549.master.v5.patch
>
>
> With the new AM we can add a bunch of metrics
>  - assign/unassign time
>  - server crash time
>  - grouping related metrics? (how many batch we do, and similar?)





[jira] [Resolved] (HBASE-18366) Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta

2018-06-25 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe resolved HBASE-18366.
--
Resolution: Not A Problem

Not flaky anymore. Fixed by other JIRAs.

> Fix flaky test 
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
> -
>
> Key: HBASE-18366
> URL: https://issues.apache.org/jira/browse/HBASE-18366
> Project: HBase
>  Issue Type: Bug
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-18366.fix1.patch, hbase-18366.fix2.patch
>
>
> It worked for a few days after enabling it with HBASE-18278. But started 
> failing after commits:
> 6786b2b
> 68436c9
> 75d2eca
> 50bb045
> df93c13
> It works with one commit before: c5abb6c. Need to see what changed with those 
> commits.
> Currently it fails with TableNotFoundException.





[jira] [Commented] (HBASE-20403) Prefetch sometimes doesn't work with encrypted file system

2018-06-20 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518682#comment-16518682
 ] 

Umesh Agashe commented on HBASE-20403:
--

+1 for the patch! Nice. Thanks [~tlipcon]!

> Prefetch sometimes doesn't work with encrypted file system
> --
>
> Key: HBASE-20403
> URL: https://issues.apache.org/jira/browse/HBASE-20403
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-2
>Reporter: Umesh Agashe
>Assignee: Todd Lipcon
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: hbase-20403.patch
>
>
> Log from long running test has following stack trace a few times:
> {code}
> 2018-04-09 18:33:21,523 WARN 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl: Prefetch 
> path=hdfs://ns1/hbase/data/default/IntegrationTestBigLinkedList_20180409172704/35f1a7ef13b9d327665228abdbcdffae/meta/9089d98b2a6b4847b3fcf6aceb124988,
>  offset=36884200, end=231005989
> java.lang.IllegalArgumentException
>   at java.nio.Buffer.limit(Buffer.java:275)
>   at 
> org.apache.hadoop.hdfs.ByteBufferStrategy.readFromBlock(ReaderStrategy.java:183)
>   at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:705)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:766)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:831)
>   at 
> org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:197)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:762)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1559)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1771)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1594)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1488)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$1.run(HFileReaderImpl.java:278)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Size on disk calculations seem to get messed up due to encryption. Possible 
> fixes can be:
> * check whether the file is encrypted with FileStatus#isEncrypted() and, if 
> so, do not prefetch.
> * document that hbase.rs.prefetchblocksonopen cannot be true if file is 
> encrypted.
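
The first option could be sketched roughly as follows. This is only an illustration under stated assumptions: EncryptionStatus is a hypothetical stand-in for the single FileStatus#isEncrypted() call so that the example stays self-contained, and shouldPrefetch is not an existing HBase method.

```java
/** Hypothetical guard that skips prefetch for encrypted files. */
final class PrefetchGuardSketch {
  /** Stand-in for the one FileStatus method this sketch relies on. */
  interface EncryptionStatus {
    boolean isEncrypted();
  }

  /**
   * Returns true only when prefetch-on-open is enabled and the file is not encrypted.
   */
  static boolean shouldPrefetch(boolean prefetchOnOpen, EncryptionStatus status) {
    return prefetchOnOpen && !status.isEncrypted();
  }
}
```

With a real Hadoop FileStatus instance, the guard could be called as shouldPrefetch(enabled, status::isEncrypted).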





[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-06-14 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512757#comment-16512757
 ] 

Umesh Agashe commented on HBASE-19121:
--

Current usage:
{code:java}
usage: hbase org.apache.hadoop.hbase.util.HBaseFsck2 [OPTIONS] [ACTIONS]
Options:
-l,--timelag  Restrict actions to regions that are not updated 
in last  seconds.
-e,--noExclusive Run even if another instance of hbck is running.
-t,--tables  Restrict actions to specified comma seperated list of 
tables.
-r,--regions  Restrict actions to specified comma seperated list of 
regions.
-s,--regionServers  Restrict actions to specified comma 
seperated list of region servers.
-d,--details Report details.
-v,--verbose Verbose output.
Actions:
FixAssignments Try fixing assignments of regions stuck in transition by 
submitting assign/ unassign procedures.{code}

> HBCK for AMv2 (A.K.A HBCK2)
> ---
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going 
> against AMv2.
> Fix.





[jira] [Updated] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-06-11 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-19121:
-
Status: Patch Available  (was: Open)

> HBCK for AMv2 (A.K.A HBCK2)
> ---
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going 
> against AMv2.
> Fix.





[jira] [Assigned] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-06-11 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe reassigned HBASE-19121:


Assignee: Umesh Agashe

> HBCK for AMv2 (A.K.A HBCK2)
> ---
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going 
> against AMv2.
> Fix.





[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-06-11 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508721#comment-16508721
 ] 

Umesh Agashe commented on HBASE-19121:
--

HBCK2 will evolve. The first version, with basic command-line options and 
parsing, is in the 001 patch. It also has a FixAssignments action.

> HBCK for AMv2 (A.K.A HBCK2)
> ---
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: stack
>Priority: Major
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going 
> against AMv2.
> Fix.





[jira] [Updated] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-06-11 Thread Umesh Agashe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-19121:
-
Attachment: hbase-19121.master.001.patch

> HBCK for AMv2 (A.K.A HBCK2)
> ---
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: stack
>Priority: Major
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going 
> against AMv2.
> Fix.





[jira] [Commented] (HBASE-20679) Add the ability to compile JSP dynamically in Jetty

2018-06-05 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502259#comment-16502259
 ] 

Umesh Agashe commented on HBASE-20679:
--

Thanks for the patch [~allan163]! @stack, we need to discuss moving away from 
the requirement of a running master for hbck2.

> Add the ability to compile JSP dynamically in Jetty
> ---
>
> Key: HBASE-20679
> URL: https://issues.apache.org/jira/browse/HBASE-20679
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20679.002.patch, HBASE-20679.patch
>
>
> As discussed in HBASE-20617, adding the ability to dynamically compile JSP 
> enables us to do some hot fixes. 
>  For example, several days ago, in our testing HBase-2.0 cluster, 
> procedureWals were corrupted for unknown reasons. After restarting the 
> cluster, some procedures (AssignProcedure, for example) were corrupted and 
> couldn't be replayed, so some regions were stuck in RIT forever. 
> We couldn't use HBCK since it doesn't support AssignmentV2 yet. As a matter 
> of fact, the namespace region was not online, so the master was not 
> initialized, and we couldn't even use shell commands like assign/move. But we 
> wrote a JSP and fixed this issue easily. The JSP file looks like this:
> {code:java}
> <%
>   String action = request.getParameter("action");
>   HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
>   List<RegionInfo> offlineRegionsToAssign = new ArrayList<>();
>   List<RegionStates.RegionStateNode> regionRITs =
>       master.getAssignmentManager().getRegionStates().getRegionsInTransition();
>   for (RegionStates.RegionStateNode regionStateNode : regionRITs) {
>     // If regionStateNode doesn't have a procedure attached but the meta state
>     // shows this region is in RIT, the previous procedure may be corrupted,
>     // so we need to create a new assign procedure for it.
>     if (!regionStateNode.isInTransition()) {
>       offlineRegionsToAssign.add(regionStateNode.getRegionInfo());
>       out.println("RIT region:" + regionStateNode);
>     }
>   }
>   // Assign offline regions. Uses round-robin.
>   if ("fix".equals(action) && offlineRegionsToAssign.size() > 0) {
>     master.getMasterProcedureExecutor().submitProcedures(
>         master.getAssignmentManager()
>             .createRoundRobinAssignProcedures(offlineRegionsToAssign));
>   } else {
>     out.println("use ?action=fix to fix RIT regions");
>   }
> %>
> {code}
> The above is only one example of what we can do if we have the ability to 
> compile JSP dynamically. We think it is very useful.





[jira] [Commented] (HBASE-20634) Reopen region while server crash can cause the procedure to be stuck

2018-06-01 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498698#comment-16498698
 ] 

Umesh Agashe commented on HBASE-20634:
--

+1 lgtm

> Reopen region while server crash can cause the procedure to be stuck
> 
>
> Key: HBASE-20634
> URL: https://issues.apache.org/jira/browse/HBASE-20634
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20634-UT.patch, HBASE-20634.branch-2.0.001.patch, 
> HBASE-20634.branch-2.0.002.patch, HBASE-20634.branch-2.0.003.patch, 
> HBASE-20634.branch-2.0.004.patch, HBASE-20634.branch-2.0.005.patch, 
> HBASE-20634.branch-2.0.006.patch, HBASE-20634.branch-2.0.006.patch, 
> HBASE-20634.branch-2.0.007.patch
>
>
> Found this when implementing HBASE-20424, where we will transit the peer sync 
> replication state while there is a server crash.
> The problem is that, in ServerCrashAssign, we do not hold the region lock, so 
> after we call handleRIT to clear the existing assign/unassign procedures 
> related to this rs, and before we schedule the assign procedures, it is 
> possible that we schedule an unassign procedure for a region on the crashed 
> rs. This procedure will not receive the ServerCrashException; instead, in 
> addToRemoteDispatcher, it will find that it cannot dispatch the remote call, 
> and a FailedRemoteDispatchException will be raised. But we do not treat this 
> exception the same as ServerCrashException; instead, we try to expire the rs. 
> Obviously the rs has already been marked as expired, so this is almost a 
> no-op. Then the procedure will be stuck there forever.
> A possible way to fix it is to treat FailedRemoteDispatchException the same 
> as ServerCrashException, as it is created in addToRemoteDispatcher only, and 
> the only reason we cannot dispatch a remote call is that the rs is already 
> dead. The nodeMap is a ConcurrentMap, so I think we could use it as a guard.
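
The fix proposed in the description can be illustrated with a minimal sketch. It is not the actual patch: the nested exception classes, the `Outcome` enum, `onServerCrash`, and the server name `rs-a` are hypothetical stand-ins that only borrow names from the description, and the map merely plays the role of the nodeMap guard.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: these nested types only mimic the names used in the
// description above; they are not the real HBase classes.
final class RemoteDispatchSketch {

    static class ServerCrashException extends Exception {
        ServerCrashException(String message) { super(message); }
    }

    static class FailedRemoteDispatchException extends Exception {
        FailedRemoteDispatchException(String message) { super(message); }
    }

    enum Outcome { DISPATCHED, REASSIGN_ELSEWHERE }

    // Servers currently considered alive; stands in for the nodeMap guard.
    private final ConcurrentMap<String, Boolean> nodeMap = new ConcurrentHashMap<>();

    void serverOnline(String server) { nodeMap.put(server, Boolean.TRUE); }

    void serverCrashed(String server) { nodeMap.remove(server); }

    // Fails when the target server is no longer in the map.
    void addToRemoteDispatcher(String server) throws FailedRemoteDispatchException {
        if (!nodeMap.containsKey(server)) {
            throw new FailedRemoteDispatchException("cannot dispatch to " + server);
        }
    }

    // Assumed recovery path for a crashed server: give up on it and reassign.
    Outcome onServerCrash(ServerCrashException e) {
        return Outcome.REASSIGN_ELSEWHERE;
    }

    Outcome dispatch(String server) {
        try {
            addToRemoteDispatcher(server);
            return Outcome.DISPATCHED;
        } catch (FailedRemoteDispatchException e) {
            // Route the dispatch failure through the same handler as a crash,
            // instead of trying to expire an already-expired server.
            return onServerCrash(new ServerCrashException(e.getMessage()));
        }
    }

    public static void main(String[] args) {
        RemoteDispatchSketch sketch = new RemoteDispatchSketch();
        sketch.serverOnline("rs-a");
        System.out.println(sketch.dispatch("rs-a"));
        sketch.serverCrashed("rs-a");
        System.out.println(sketch.dispatch("rs-a"));
    }
}
```

The point of the sketch is only that both failure modes end in the same recovery path, so a procedure scheduled against a dead server does not wait on an expiry that has already happened.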





[jira] [Commented] (HBASE-20634) Reopen region while server crash can cause the procedure to be stuck

2018-06-01 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498618#comment-16498618
 ] 

Umesh Agashe commented on HBASE-20634:
--

[~stack], I have posted comments on RB for the latest patch. Thanks!

> Reopen region while server crash can cause the procedure to be stuck
> 
>
> Key: HBASE-20634
> URL: https://issues.apache.org/jira/browse/HBASE-20634
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20634-UT.patch, HBASE-20634.branch-2.0.001.patch, 
> HBASE-20634.branch-2.0.002.patch, HBASE-20634.branch-2.0.003.patch, 
> HBASE-20634.branch-2.0.004.patch, HBASE-20634.branch-2.0.005.patch, 
> HBASE-20634.branch-2.0.006.patch, HBASE-20634.branch-2.0.006.patch
>
>
> Found this when implementing HBASE-20424, where we will transit the peer sync 
> replication state while there is a server crash.
> The problem is that, in ServerCrashAssign, we do not hold the region lock, so 
> after we call handleRIT to clear the existing assign/unassign procedures 
> related to this rs, and before we schedule the assign procedures, it is 
> possible that we schedule an unassign procedure for a region on the crashed 
> rs. This procedure will not receive the ServerCrashException; instead, in 
> addToRemoteDispatcher, it will find that it cannot dispatch the remote call, 
> and a FailedRemoteDispatchException will be raised. But we do not treat this 
> exception the same as ServerCrashException; instead, we try to expire the rs. 
> Obviously the rs has already been marked as expired, so this is almost a 
> no-op. Then the procedure will be stuck there forever.
> A possible way to fix it is to treat FailedRemoteDispatchException the same 
> as ServerCrashException, as it is created in addToRemoteDispatcher only, and 
> the only reason we cannot dispatch a remote call is that the rs is already 
> dead. The nodeMap is a ConcurrentMap, so I think we could use it as a guard.





[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-17 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479831#comment-16479831
 ] 

Umesh Agashe commented on HBASE-20552:
--

[~elserj], I don't have a repro. I thought I had one, but it was due to a bug 
that was inadvertently introduced in a recent commit and fixed in an addendum 
(HBASE-20564). So far I have found 2 instances of missing edits around the 
same time. First, in the master proc WAL, where 003 is not able to read pids 
from 468 onwards. And second, in the meta region:

pid=475 on 005 started with:
{code:java}
2018-05-02 05:39:45,811 INFO  [PEWorker-6] assignment.AssignProcedure: Starting 
pid=475, ppid=471, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28; rit=OFFLINE, 
location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502; 
forceNewPlan=false, retain=true
{code}
After this it was updated twice on 005:
{code:java}
2018-05-02 05:39:45,983 INFO  [PEWorker-1] assignment.RegionStateStore: pid=475 
updating hbase:meta row=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPENING
2018-05-02 05:39:46,580 INFO  [PEWorker-1] assignment.RegionStateStore: pid=475 
updating hbase:meta row=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPEN, 
openSeqNum=13401, 
regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474
{code}
But when 003 read and printed meta, it has:
{code:java}
2018-05-02 05:44:08,236 INFO  
[master/ctr-e138-1518143905142-279227-01-03:2] 
assignment.RegionStateStore: Load hbase:meta entry 
region=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPEN, 
lastHost=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, 
regionLocation=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502
{code}
The location server, including its timestamp, matches the one logged when 
pid=475 (ppid=471) started: 
"location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502".
 So the 2 updates from pid=475 to meta are missing.
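
The comparison behind that conclusion can be sketched as follows. This is a hedged illustration, not HBase code: the class and method names are made up, and it assumes only that a server name is written as `host,port,startcode`, as in the log lines above. The two literals are copied from those logs.

```java
// Hypothetical helper; not part of HBase.
final class MetaLocationCheck {

    // Splits "host,port,startcode" into its three parts.
    static String[] parts(String serverName) {
        String[] p = serverName.trim().split(",");
        if (p.length != 3) {
            throw new IllegalArgumentException("expected host,port,startcode: " + serverName);
        }
        return p;
    }

    // True only when host, port and start code all match.
    static boolean sameInstance(String a, String b) {
        String[] x = parts(a);
        String[] y = parts(b);
        return x[0].equals(y[0]) && x[1].equals(y[1]) && x[2].equals(y[2]);
    }

    // Meta looks stale when the location loaded from it is not the last one written.
    static boolean metaIsStale(String lastWritten, String loadedFromMeta) {
        return !sameInstance(lastWritten, loadedFromMeta);
    }

    public static void main(String[] args) {
        // Location logged when pid=475 started, and also what 003 loaded from meta.
        String loaded = "ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502";
        // Location in the last update that pid=475 logged on 005 (regionState=OPEN).
        String lastWritten = "ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474";
        System.out.println("meta stale: " + metaIsStale(lastWritten, loaded));
    }
}
```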

> HBase RegionServer was shutdown due to UnexpectedStateException
> ---
>
> Key: HBASE-20552
> URL: https://issues.apache.org/jira/browse/HBASE-20552
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Romil Choksi
>Assignee: Umesh Agashe
>Priority: Critical
> Attachments: 
> 102143-master-ctr-e138-1518143905142-279227-01-03.hwx.site.log, 
> 102143-master-ctr-e138-1518143905142-279227-01-05.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-02.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-07.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-08.hwx.site.log
>
>
> This was observed during cluster testing (source code sync'ed with hbase-2.0, 
> built May 2nd):
> {code}
> 2018-05-02 05:44:10,089 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
> master.MasterRpcServices: Region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported 
> a fatal error:
> * ABORTING region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138- 
> 1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
> table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-  
> 1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has 
> otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has other

[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-16 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478044#comment-16478044
 ] 

Umesh Agashe commented on HBASE-20552:
--

[~yuzhih...@gmail.com], just want to confirm: did you see this on branch-2.0 
or master?

> HBase RegionServer was shutdown due to UnexpectedStateException
> ---
>
> Key: HBASE-20552
> URL: https://issues.apache.org/jira/browse/HBASE-20552
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Romil Choksi
>Assignee: Umesh Agashe
>Priority: Critical
> Attachments: 
> 102143-master-ctr-e138-1518143905142-279227-01-03.hwx.site.log, 
> 102143-master-ctr-e138-1518143905142-279227-01-05.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-02.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-07.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-08.hwx.site.log
>
>
> This was observed during cluster testing (source code sync'ed with hbase-2.0, 
> built May 2nd):
> {code}
> 2018-05-02 05:44:10,089 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
> master.MasterRpcServices: Region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported 
> a fatal error:
> * ABORTING region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138- 
> 1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
> table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-  
> 1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has 
> otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
>  *
> Cause:
> org.apache.hadoop.hbase.YouAreDeadException: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has 

[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-11 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472885#comment-16472885
 ] 

Umesh Agashe commented on HBASE-20552:
--

I think it's a real problem in the code. Working on a repro and a patch.

> HBase RegionServer was shutdown due to UnexpectedStateException
> ---
>
> Key: HBASE-20552
> URL: https://issues.apache.org/jira/browse/HBASE-20552
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Romil Choksi
>Assignee: Umesh Agashe
>Priority: Critical
> Attachments: 
> 102143-master-ctr-e138-1518143905142-279227-01-03.hwx.site.log, 
> 102143-master-ctr-e138-1518143905142-279227-01-05.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-02.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-07.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-08.hwx.site.log
>
>
> This was observed during cluster testing (source code sync'ed with hbase-2.0, 
> built May 2nd):
> {code}
> 2018-05-02 05:44:10,089 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
> master.MasterRpcServices: Region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported 
> a fatal error:
> * ABORTING region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138- 
> 1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
> table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-  
> 1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has 
> otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
>  *
> Cause:
> org.apache.hadoop.hbase.YouAreDeadException: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 

[jira] [Work started] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-11 Thread Umesh Agashe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-20552 started by Umesh Agashe.

> HBase RegionServer was shutdown due to UnexpectedStateException
> ---
>
> Key: HBASE-20552
> URL: https://issues.apache.org/jira/browse/HBASE-20552
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Romil Choksi
>Assignee: Umesh Agashe
>Priority: Critical
> Attachments: 
> 102143-master-ctr-e138-1518143905142-279227-01-03.hwx.site.log, 
> 102143-master-ctr-e138-1518143905142-279227-01-05.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-02.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-07.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-08.hwx.site.log
>
>
> This was observed during cluster testing (source code sync'ed with hbase-2.0, 
> built May 2nd):
> {code}
> 2018-05-02 05:44:10,089 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
> master.MasterRpcServices: Region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported 
> a fatal error:
> * ABORTING region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138- 
> 1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
> table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-  
> 1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has 
> otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
>  *
> Cause:
> org.apache.hadoop.hbase.YouAreDeadException: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.ja

[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-11 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472482#comment-16472482
 ] 

Umesh Agashe commented on HBASE-20552:
--

Further, M003 starts SCP with pid=507 for R007:
{code:java}
2018-05-02 05:44:08,413 INFO  [PEWorker-6] procedure.ServerCrashProcedure: 
Start pid=507, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
server=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502, 
splitWal=true, meta=false{code}
This starts AssignProcedure with pid=508 for region 
94f6ca283dbb4445b2bcdc321b734d28:
{code:java}
2018-05-02 05:44:08,480 INFO  [PEWorker-6] assignment.AssignProcedure: Starting 
pid=508, ppid=507, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28; rit=OFFLINE, 
location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502; 
forceNewPlan=false, retain=true
2018-05-02 05:44:08,659 INFO  [PEWorker-11] assignment.RegionStateStore: 
pid=508 updating hbase:meta row=94f6ca283dbb4445b2bcdc321b734d28, 
regionState=OPENING, 
regionLocation=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353
2018-05-02 05:44:08,727 INFO  [PEWorker-11] 
assignment.RegionTransitionProcedure: Dispatch pid=508, ppid=507, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure 
table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28; rit=OPENING, 
location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353
...
2018-05-02 05:44:09,213 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
assignment.RegionTransitionProcedure: Received report OPENED seqId=13402, 
pid=508, ppid=507, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure 
table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28; rit=OPENING, 
location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353
2018-05-02 05:44:09,213 DEBUG [PEWorker-12] 
assignment.RegionTransitionProcedure: Finishing pid=508, ppid=507, 
state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure 
table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28; rit=OPENING, 
location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353
2018-05-02 05:44:09,214 INFO [PEWorker-12] assignment.RegionStateStore: pid=508 
updating hbase:meta row=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPEN, 
openSeqNum=13402, 
regionLocation=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353
2018-05-02 05:44:09,258 INFO [PEWorker-12] procedure2.ProcedureExecutor: 
Finished subprocedure(s) of pid=507, state=RUNNABLE:SERVER_CRASH_HANDLE_RIT2; 
ServerCrashProcedure 
server=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502, 
splitWal=true, meta=false; resume parent processing.
2018-05-02 05:44:09,258 INFO [PEWorker-12] procedure2.ProcedureExecutor: 
Finished pid=508, ppid=507, state=SUCCESS; AssignProcedure 
table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28 in 764msec
2018-05-02 05:44:09,273 INFO [PEWorker-14] procedure2.ProcedureExecutor: 
Finished pid=507, state=SUCCESS; ServerCrashProcedure 
server=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502, 
splitWal=true, meta=false in 975msec{code}

The strange thing is that the SCP for R007 is assigning the region back to R007!
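
One detail in the logs above is that the crashed server and the assignment target share a host and port but carry different start codes. A minimal sketch of that check, assuming only the `host,port,startcode` naming seen in the logs (the class and method names are hypothetical, not HBase API):

```java
// Hypothetical helper; not part of HBase.
final class ServerRestartCheck {

    // Returns true when 'candidate' has the same host and port as 'older'
    // but a strictly larger start code, i.e. it looks like a later instance.
    static boolean isLaterInstanceOf(String candidate, String older) {
        String[] c = candidate.trim().split(",");
        String[] o = older.trim().split(",");
        if (c.length != 3 || o.length != 3) {
            throw new IllegalArgumentException("expected host,port,startcode");
        }
        boolean sameHostAndPort = c[0].equals(o[0]) && c[1].equals(o[1]);
        return sameHostAndPort && Long.parseLong(c[2]) > Long.parseLong(o[2]);
    }

    public static void main(String[] args) {
        // Server named in the ServerCrashProcedure (pid=507).
        String crashed = "ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502";
        // regionLocation written by the AssignProcedure (pid=508).
        String target = "ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353";
        System.out.println("later instance of same host: " + isLaterInstanceOf(target, crashed));
    }
}
```

Read this way, the region goes back to the same host, but to an instance with a newer start code than the one the SCP was started for.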

> HBase RegionServer was shutdown due to UnexpectedStateException
> ---
>
> Key: HBASE-20552
> URL: https://issues.apache.org/jira/browse/HBASE-20552
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Romil Choksi
>Assignee: Umesh Agashe
>Priority: Critical
> Attachments: 
> 102143-master-ctr-e138-1518143905142-279227-01-03.hwx.site.log, 
> 102143-master-ctr-e138-1518143905142-279227-01-05.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-02.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-07.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-08.hwx.site.log
>
>
> This was observed during cluster testing (source code sync'ed with hbase-2.0, 
> built May 2nd):
> {code}
> 2018-05-02 05:44:10,089 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
> master.MasterRpcServices: Region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported 
> a fatal error:
> * ABORTING region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138- 
> 1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
> table=test_hbase_ha_load_test_tool_hbase, 
> r

[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-11 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472456#comment-16472456
 ] 

Umesh Agashe commented on HBASE-20552:
--

bq. Log for server 0002 was attached already.

Thanks! And also for 007?

> HBase RegionServer was shutdown due to UnexpectedStateException
> ---
>
> Key: HBASE-20552
> URL: https://issues.apache.org/jira/browse/HBASE-20552
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Romil Choksi
>Assignee: Umesh Agashe
>Priority: Critical
> Attachments: 
> 102143-master-ctr-e138-1518143905142-279227-01-03.hwx.site.log, 
> 102143-master-ctr-e138-1518143905142-279227-01-05.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-02.hwx.site.log
>
>
> This was observed during cluster testing (source code sync'ed with hbase-2.0, 
> built May 2nd):
> {code}
> 2018-05-02 05:44:10,089 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
> master.MasterRpcServices: Region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported 
> a fatal error:
> * ABORTING region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138- 
> 1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
> table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-  
> 1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has 
> otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
>  *
> Cause:
> org.apache.hadoop.hbase.YouAreDeadException: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
>   at sun.reflect.N

[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-11 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472438#comment-16472438
 ] 

Umesh Agashe commented on HBASE-20552:
--

bq. Was there any region on 0008 you're interested in ?

670f6b815d2acac905130e5440d59304
1d954f21d711345a9587d995cecea136
91f73e76bbe7bc8a61b1b1299d34c6ab

> HBase RegionServer was shutdown due to UnexpectedStateException
> ---
>
> Key: HBASE-20552
> URL: https://issues.apache.org/jira/browse/HBASE-20552
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Romil Choksi
>Assignee: Umesh Agashe
>Priority: Critical
> Attachments: 
> 102143-master-ctr-e138-1518143905142-279227-01-03.hwx.site.log, 
> 102143-master-ctr-e138-1518143905142-279227-01-05.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-02.hwx.site.log
>
>
> This was observed during cluster testing (source code sync'ed with hbase-2.0, 
> built May 2nd):
> {code}
> 2018-05-02 05:44:10,089 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
> master.MasterRpcServices: Region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported 
> a fatal error:
> * ABORTING region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138- 
> 1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
> table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-  
> 1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has 
> otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
>  *
> Cause:
> org.apache.hadoop.hbase.YouAreDeadException: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkO

[jira] [Comment Edited] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-11 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472409#comment-16472409
 ] 

Umesh Agashe edited comment on HBASE-20552 at 5/11/18 6:23 PM:
---

Usually the following warnings can be ignored, but these messages followed by 
"Completed pid=" look like trouble. When M003 became active at around 2018-05-02 
05:43:33, there were a few warnings while reading the master proc WAL:
{code:java}
2018-05-02 05:43:33,529 WARN 
[master/ctr-e138-1518143905142-279227-01-03:2] wal.WALProcedureStore: 
Unable to read tracker for 
hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0004.log - 
Invalid Trailer version. got 8 expected 1
2018-05-02 05:43:33,638 DEBUG 
[master/ctr-e138-1518143905142-279227-01-03:2] wal.WALProcedureStore: 
Roll new state log: 5
2018-05-02 05:43:33,655 INFO 
[master/ctr-e138-1518143905142-279227-01-03:2] 
procedure2.ProcedureExecutor: Recovered WALProcedureStore lease in 219msec
2018-05-02 05:43:33,681 INFO 
[master/ctr-e138-1518143905142-279227-01-03:2] 
wal.ProcedureWALFormatReader: Rebuilding tracker for 
hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0004.log
2018-05-02 05:43:33,816 WARN 
[master/ctr-e138-1518143905142-279227-01-03:2] 
wal.ProcedureWALFormatReader: Nothing left to decode. Exiting with missing EOF, 
log=hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0004.log

2018-05-02 05:43:33,875 DEBUG 
[master/ctr-e138-1518143905142-279227-01-03:2] 
procedure2.ProcedureExecutor: Completed pid=467, state=SUCCESS; 
MoveRegionProcedure hri=4c37ee7a4e1210e481debdc2933fc4d2, 
source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, 
destination=ctr-e138-1518143905142-279227-01-03.hwx.site,16020,1525239425826
2018-05-02 05:43:33,876 DEBUG 
[master/ctr-e138-1518143905142-279227-01-03:2] 
procedure2.ProcedureExecutor: Completed pid=465, state=SUCCESS; 
MoveRegionProcedure hri=94f6ca283dbb4445b2bcdc321b734d28, 
source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, 
destination=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502
2018-05-02 05:43:33,876 DEBUG 
[master/ctr-e138-1518143905142-279227-01-03:2] 
procedure2.ProcedureExecutor: Completed pid=462, state=SUCCESS; 
MoveRegionProcedure hri=a8ff96226d546f0ea151823ae73e5a1b, 
source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, 
destination=ctr-e138-1518143905142-279227-01-08.hwx.site,16020,1525238658606{code}
During startup, M003 has no log messages for procedures with IDs 468 to 504, 
even though they ran and completed on M005. This is unusual. 
RecoverMetaProcedure on M003 starts with ID 505, which is correct.
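
A gap like this can be checked mechanically by collecting the pids that appear in "Completed pid=" log lines and listing the ones missing from a range. The following is only a minimal sketch: the class and method names are hypothetical, and the sample lines are shortened copies of the three log entries quoted above, not the full M003 log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: scans log lines for "Completed pid=N".
class PidGapCheck {
    private static final Pattern COMPLETED = Pattern.compile("Completed pid=(\\d+)");

    // Collect every pid that appears in a "Completed pid=" message.
    static TreeSet<Integer> completedPids(List<String> logLines) {
        TreeSet<Integer> pids = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = COMPLETED.matcher(line);
            while (m.find()) {
                pids.add(Integer.parseInt(m.group(1)));
            }
        }
        return pids;
    }

    // List the pids in [from, to] that were never reported as completed.
    static List<Integer> missing(TreeSet<Integer> seen, int from, int to) {
        List<Integer> gaps = new ArrayList<>();
        for (int pid = from; pid <= to; pid++) {
            if (!seen.contains(pid)) {
                gaps.add(pid);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Shortened copies of the log entries quoted in the comment above.
        List<String> lines = Arrays.asList(
            "procedure2.ProcedureExecutor: Completed pid=467, state=SUCCESS; MoveRegionProcedure",
            "procedure2.ProcedureExecutor: Completed pid=465, state=SUCCESS; MoveRegionProcedure",
            "procedure2.ProcedureExecutor: Completed pid=462, state=SUCCESS; MoveRegionProcedure");
        TreeSet<Integer> seen = completedPids(lines);
        System.out.println("completed pids: " + seen);
        System.out.println("missing in 468..504: " + missing(seen, 468, 504).size());
    }
}
```

Run against the real master log, the same two calls would show whether any of the IDs between 468 and 504 were logged as completed on M003.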

Orthogonal to the above observation, we have a meta update issue as well. On 
M005, pid=471 is the SCP for R007, which also hosts meta. Meta is re-assigned 
with pid=472 to R002, which is followed by other region assignments:
{code:java}
pid=478 e75a388bc2011feed75bdc1a0e99a9a9   
regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=474 670f6b815d2acac905130e5440d59304   
regionLocation=ctr-e138-1518143905142-279227-01-08.hwx.site
pid=479 c963eb77dbdc6dbab886dbe4eebba5ad  
regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=481 b5180eee96b616afdf79578309c66a11   
regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=486 8dc6fd2022c2fdf8c065fbd16cadaaca   
regionLocation=ctr-e138-1518143905142-279227-01-03.hwx.site
pid=480 f3db9f9879ed03f488dcb89bea834237   
regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=484 c078deb2474e9c19b85b5fdb9efaa47d   
regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=475 94f6ca283dbb4445b2bcdc321b734d28   
regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=483 1d954f21d711345a9587d995cecea136   
regionLocation=ctr-e138-1518143905142-279227-01-08.hwx.site
pid=476 1595f38ee901be7c67b997fe2fc95951   
regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=482 a6e0d7561c4f19e78f94d37462588281   
regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=485 91f73e76bbe7bc8a61b1b1299d34c6ab   
regionLocation=ctr-e138-1518143905142-279227-01-08.hwx.site
pid=477 a0620fc83de532a37f6a9bb8f99cc6c4   
regionLocation=ctr-e138-1518143905142-279227-01-03.hwx.site{code}
From the logs, all the procedures finished successfully without skipping steps. 
Meta doesn't seem to be updated for 4 of these assignments. When M003 logs all 
regions from meta at startup, the locations for the following 4 regions don't 
match the target locations in the above procedures:
{code:java}
670f6b815d2acac905130e5440d59304   
ctr-e138-1518143905142-279227-01-08.hwx.site 
lastHost=ctr-e138-1518143905142-279227-01-07.hwx.site 
regionLocation=ctr-e138-1518143905142-279227-01-

[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-11 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472409#comment-16472409
 ] 

Umesh Agashe commented on HBASE-20552:
--

Usually the following warnings can be ignored, but these messages followed by 
"Completed pid=" look like trouble. When M003 became active at around 2018-05-02 
05:43:33, there were a few warnings while reading the master proc WAL:
{code:java}
2018-05-02 05:43:33,529 WARN 
[master/ctr-e138-1518143905142-279227-01-03:2] wal.WALProcedureStore: 
Unable to read tracker for 
hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0004.log - 
Invalid Trailer version. got 8 expected 1
2018-05-02 05:43:33,638 DEBUG 
[master/ctr-e138-1518143905142-279227-01-03:2] wal.WALProcedureStore: 
Roll new state log: 5
2018-05-02 05:43:33,655 INFO 
[master/ctr-e138-1518143905142-279227-01-03:2] 
procedure2.ProcedureExecutor: Recovered WALProcedureStore lease in 219msec
2018-05-02 05:43:33,681 INFO 
[master/ctr-e138-1518143905142-279227-01-03:2] 
wal.ProcedureWALFormatReader: Rebuilding tracker for 
hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0004.log
2018-05-02 05:43:33,816 WARN 
[master/ctr-e138-1518143905142-279227-01-03:2] 
wal.ProcedureWALFormatReader: Nothing left to decode. Exiting with missing EOF, 
log=hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0004.log

2018-05-02 05:43:33,875 DEBUG 
[master/ctr-e138-1518143905142-279227-01-03:2] 
procedure2.ProcedureExecutor: Completed pid=467, state=SUCCESS; 
MoveRegionProcedure hri=4c37ee7a4e1210e481debdc2933fc4d2, 
source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, 
destination=ctr-e138-1518143905142-279227-01-03.hwx.site,16020,1525239425826
2018-05-02 05:43:33,876 DEBUG 
[master/ctr-e138-1518143905142-279227-01-03:2] 
procedure2.ProcedureExecutor: Completed pid=465, state=SUCCESS; 
MoveRegionProcedure hri=94f6ca283dbb4445b2bcdc321b734d28, 
source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, 
destination=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502
2018-05-02 05:43:33,876 DEBUG 
[master/ctr-e138-1518143905142-279227-01-03:2] 
procedure2.ProcedureExecutor: Completed pid=462, state=SUCCESS; 
MoveRegionProcedure hri=a8ff96226d546f0ea151823ae73e5a1b, 
source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, 
destination=ctr-e138-1518143905142-279227-01-08.hwx.site,16020,1525238658606{code}
During startup, M003 has no log messages for procedures with IDs 468 to 504, 
even though they ran and completed on M005. This is unusual. 
RecoverMetaProcedure on M003 starts with ID 505, which is correct.

Orthogonal to the above observation, we have a meta update issue as well. On 
M005, pid=471 is the SCP for R007, which also hosts meta. Meta is re-assigned 
with pid=472 to R002, which is followed by other region assignments:
{code:java}
pid=478 e75a388bc2011feed75bdc1a0e99a9a9   
regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=474 670f6b815d2acac905130e5440d59304   
regionLocation=ctr-e138-1518143905142-279227-01-08.hwx.site
pid=479 c963eb77dbdc6dbab886dbe4eebba5ad  
regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=481 b5180eee96b616afdf79578309c66a11   
regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=486 8dc6fd2022c2fdf8c065fbd16cadaaca   
regionLocation=ctr-e138-1518143905142-279227-01-03.hwx.site
pid=480 f3db9f9879ed03f488dcb89bea834237   
regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=484 c078deb2474e9c19b85b5fdb9efaa47d   
regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=475 94f6ca283dbb4445b2bcdc321b734d28   
regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=483 1d954f21d711345a9587d995cecea136   
regionLocation=ctr-e138-1518143905142-279227-01-08.hwx.site
pid=476 1595f38ee901be7c67b997fe2fc95951   
regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=482 a6e0d7561c4f19e78f94d37462588281   
regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=485 91f73e76bbe7bc8a61b1b1299d34c6ab   
regionLocation=ctr-e138-1518143905142-279227-01-08.hwx.site
pid=477 a0620fc83de532a37f6a9bb8f99cc6c4   
regionLocation=ctr-e138-1518143905142-279227-01-03.hwx.site{code}
From the logs, all the procedures finished successfully without skipping steps. 
Meta doesn't seem to be updated for 4 of these assignments. When M003 logs all 
regions from meta at startup, the locations for the following 4 regions don't 
match the target locations in the above procedures:
{code:java}
670f6b815d2acac905130e5440d59304   
ctr-e138-1518143905142-279227-01-08.hwx.site 
lastHost=ctr-e138-1518143905142-279227-01-07.hwx.site 
regionLocation=ctr-e138-1518143905142-279227-01-07.hwx.site
94f6ca283dbb4445b2bcdc321b734

[jira] [Commented] (HBASE-20544) downstream HBaseTestingUtility fails with invalid port

2018-05-11 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472384#comment-16472384
 ] 

Umesh Agashe commented on HBASE-20544:
--

+1 for addendum

> downstream HBaseTestingUtility fails with invalid port
> --
>
> Key: HBASE-20544
> URL: https://issues.apache.org/jira/browse/HBASE-20544
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Blocker
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20544.0.patch, HBASE-20544.1.patch, 
> HBASE-20544.2.patch, HBASE-20544.addendum.0.patch
>
>
> Attempting to update hbase-downstreamer to use our 2.0.0 release fails with 
> an invalid port in the event that {{hbase.localcluster.assign.random.ports}} 
> isn't set (or is set to false, specifically):
> {code}
> 2018-05-08 06:10:06,508 ERROR [main] regionserver.HRegionServer 
> (HRegionServer.java:<init>(631)) - Failed construction RegionServer
> java.lang.IllegalArgumentException: port out of range:-1
>   at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
>   at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1217)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1184)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:723)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:561)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.<init>(MiniHBaseCluster.java:147)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createRegionServerThread(JVMClusterUtil.java:86)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addRegionServer(LocalHBaseCluster.java:184)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster$1.run(LocalHBaseCluster.java:198)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster$1.run(LocalHBaseCluster.java:195)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:313)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addRegionServer(LocalHBaseCluster.java:194)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:261)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:121)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1042)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:988)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:859)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:853)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:782)
>   at 
> org.hbase.downstreamer.TestHBaseMiniCluster.testSpinUpMiniHBaseCluster(TestHBaseMiniCluster.java:16)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.Pa

[jira] [Assigned] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-09 Thread Umesh Agashe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe reassigned HBASE-20552:


Assignee: Umesh Agashe

> HBase RegionServer was shutdown due to UnexpectedStateException
> ---
>
> Key: HBASE-20552
> URL: https://issues.apache.org/jira/browse/HBASE-20552
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Romil Choksi
>Assignee: Umesh Agashe
>Priority: Critical
> Attachments: 
> 102143-master-ctr-e138-1518143905142-279227-01-03.hwx.site.log, 
> 102143-master-ctr-e138-1518143905142-279227-01-05.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-02.hwx.site.log
>
>
> This was observed during cluster testing (source code sync'ed with hbase-2.0, 
> built May 2nd):
> {code}
> 2018-05-02 05:44:10,089 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
> master.MasterRpcServices: Region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported 
> a fatal error:
> * ABORTING region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138- 
> 1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
> table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-  
> 1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has 
> otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
>  *
> Cause:
> org.apache.hadoop.hbase.YouAreDeadException: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorA

[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-09 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469544#comment-16469544
 ] 

Umesh Agashe commented on HBASE-20552:
--

Thanks for attaching the logs. I need to go through them to see if it's similar 
to what we have seen so far...

> HBase RegionServer was shutdown due to UnexpectedStateException
> ---
>
> Key: HBASE-20552
> URL: https://issues.apache.org/jira/browse/HBASE-20552
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Romil Choksi
>Priority: Critical
> Attachments: 
> 102143-master-ctr-e138-1518143905142-279227-01-03.hwx.site.log, 
> 102143-master-ctr-e138-1518143905142-279227-01-05.hwx.site.log, 
> 102143-regionserver-ctr-e138-1518143905142-279227-01-02.hwx.site.log
>
>
> This was observed during cluster testing (source code sync'ed with hbase-2.0, 
> built May 2nd):
> {code}
> 2018-05-02 05:44:10,089 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
> master.MasterRpcServices: Region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported 
> a fatal error:
> * ABORTING region server 
> ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138- 
> 1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
> table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-  
> 1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has 
> otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
>  *
> Cause:
> org.apache.hadoop.hbase.YouAreDeadException: 
> org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
> rit=OPEN, 
> location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353,
>  table=test_hbase_ha_load_test_tool_hbase, 
> region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on 
> server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 
> but state  has otherwise.
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
>   at sun.reflect.N

[jira] [Commented] (HBASE-20544) downstream HBaseTestingUtility fails with invalid port

2018-05-09 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469511#comment-16469511
 ] 

Umesh Agashe commented on HBASE-20544:
--

IMO, the defaults ever being set to -1 is possible but not probable. If the 
defaults are ever set to -1, shouldn't the condition be something like:
{code:java}
int port = conf.getInt(HConstants.MASTER_INFO_PORT, 
HConstants.DEFAULT_MASTER_INFOPORT);
if (port != -1 && port == HConstants.DEFAULT_MASTER_INFOPORT) {
{code}
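
The difference between the two forms of the check can be seen by evaluating both for a few port values. This is only an illustrative sketch: the class and method names are hypothetical, and the default of 16010 is an assumption standing in for HConstants.DEFAULT_MASTER_INFOPORT, passed as a parameter so that a default of -1 can be tried as well.

```java
// Hypothetical stand-alone comparison; it does not call any HBase code.
class InfoPortGuard {
    // Unguarded form: true whenever the configured port equals the default.
    static boolean simpleCheck(int port, int defaultPort) {
        return port == defaultPort;
    }

    // Guarded form from the comment: additionally requires port != -1.
    static boolean guardedCheck(int port, int defaultPort) {
        return port != -1 && port == defaultPort;
    }

    public static void main(String[] args) {
        int[] defaults = {16010, -1};   // 16010 is an assumed default value
        int[] ports = {16010, -1, 12345};
        for (int d : defaults) {
            for (int p : ports) {
                System.out.printf("default=%d port=%d simple=%b guarded=%b%n",
                    d, p, simpleCheck(p, d), guardedCheck(p, d));
            }
        }
    }
}
```

The two checks only disagree when the default itself is -1 and the port is left at that default, which is the "possible but not probable" case the extra guard covers.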

Feel free to ignore the nit. I've already added my +1 to the changes.

> downstream HBaseTestingUtility fails with invalid port
> --
>
> Key: HBASE-20544
> URL: https://issues.apache.org/jira/browse/HBASE-20544
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Blocker
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20544.0.patch, HBASE-20544.1.patch, 
> HBASE-20544.2.patch
>
>
> Attempting to update hbase-downstreamer to use our 2.0.0 release fails with 
> an invalid port in the event that {{hbase.localcluster.assign.random.ports}} 
> isn't set (or is set to false, specifically):
> {code}
> 2018-05-08 06:10:06,508 ERROR [main] regionserver.HRegionServer 
> (HRegionServer.java:<init>(631)) - Failed construction RegionServer
> java.lang.IllegalArgumentException: port out of range:-1
>   at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
>   at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1217)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1184)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:723)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:561)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.<init>(MiniHBaseCluster.java:147)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createRegionServerThread(JVMClusterUtil.java:86)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addRegionServer(LocalHBaseCluster.java:184)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster$1.run(LocalHBaseCluster.java:198)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster$1.run(LocalHBaseCluster.java:195)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:313)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addRegionServer(LocalHBaseCluster.java:194)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:261)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:121)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1042)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:988)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:859)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:853)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:782)
>   at 
> org.hbase.downstreamer.TestHBaseMiniCluster.testSpinUpMiniHBaseCluster(TestHBaseMiniCluster.java:16)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.Parent

[jira] [Commented] (HBASE-20544) downstream HBaseTestingUtility fails with invalid port

2018-05-09 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469395#comment-16469395
 ] 

Umesh Agashe commented on HBASE-20544:
--

If it's explicitly set to -1, then

(-1 == HConstants.DEFAULT_MASTER_INFOPORT) will be false, which is the same as 
(-1 != -1) being false.

> downstream HBaseTestingUtility fails with invalid port
> --
>
> Key: HBASE-20544
> URL: https://issues.apache.org/jira/browse/HBASE-20544
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Blocker
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20544.0.patch, HBASE-20544.1.patch, 
> HBASE-20544.2.patch
>
>
> Attempting to update hbase-downstreamer to use our 2.0.0 release fails with 
> an invalid port in the event that {{hbase.localcluster.assign.random.ports}} 
> isn't set (or is set to false, specifically):
> {code}
> 2018-05-08 06:10:06,508 ERROR [main] regionserver.HRegionServer 
> (HRegionServer.java:<init>(631)) - Failed construction RegionServer
> java.lang.IllegalArgumentException: port out of range:-1
>   at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
>   at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1217)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1184)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:723)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:561)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.<init>(MiniHBaseCluster.java:147)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createRegionServerThread(JVMClusterUtil.java:86)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addRegionServer(LocalHBaseCluster.java:184)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster$1.run(LocalHBaseCluster.java:198)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster$1.run(LocalHBaseCluster.java:195)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:313)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addRegionServer(LocalHBaseCluster.java:194)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:261)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:121)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1042)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:988)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:859)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:853)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:782)
>   at 
> org.hbase.downstreamer.TestHBaseMiniCluster.testSpinUpMiniHBaseCluster(TestHBaseMiniCluster.java:16)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org

[jira] [Commented] (HBASE-20544) downstream HBaseTestingUtility fails with invalid port

2018-05-09 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469372#comment-16469372
 ] 

Umesh Agashe commented on HBASE-20544:
--

+1 for the latest patch. nit:
{code:java}
if (conf.getInt(HConstants.REGIONSERVER_INFO_PORT, 0) != -1 &&
    conf.getInt(HConstants.REGIONSERVER_INFO_PORT,
        HConstants.DEFAULT_REGIONSERVER_INFOPORT)
        == HConstants.DEFAULT_REGIONSERVER_INFOPORT) {
{code}
can effectively be simplified to:
{code:java}
if (conf.getInt(HConstants.REGIONSERVER_INFO_PORT,
        HConstants.DEFAULT_REGIONSERVER_INFOPORT)
        == HConstants.DEFAULT_REGIONSERVER_INFOPORT) {
{code}
and the same applies to:
{code:java}
if (conf.getInt(HConstants.MASTER_INFO_PORT, 0) != -1 &&
    conf.getInt(HConstants.MASTER_INFO_PORT, HConstants.DEFAULT_MASTER_INFOPORT)
        == HConstants.DEFAULT_MASTER_INFOPORT) {
{code}
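
A minimal sketch of why the extra `!= -1` guard looks redundant. It is not taken from the patch: a plain `Map` stands in for the HBase `Configuration`, the class and helper names are hypothetical, and the key string and the default of 16030 are assumed values for `HConstants.REGIONSERVER_INFO_PORT` and `HConstants.DEFAULT_REGIONSERVER_INFOPORT`. The two forms should agree whenever the default itself is not -1.

```java
import java.util.HashMap;
import java.util.Map;

public class InfoPortCheck {
    // Assumed stand-ins for the HConstants values; not read from HBase itself.
    static final String KEY = "hbase.regionserver.info.port";
    static final int DEFAULT_INFOPORT = 16030;

    // Hypothetical replacement for Configuration.getInt(name, defaultValue).
    static int getInt(Map<String, Integer> conf, String key, int defaultValue) {
        Integer value = conf.get(key);
        return value == null ? defaultValue : value;
    }

    // The condition as written in the patch under review.
    static boolean original(Map<String, Integer> conf) {
        return getInt(conf, KEY, 0) != -1
            && getInt(conf, KEY, DEFAULT_INFOPORT) == DEFAULT_INFOPORT;
    }

    // The simplified condition suggested in the nit.
    static boolean simplified(Map<String, Integer> conf) {
        return getInt(conf, KEY, DEFAULT_INFOPORT) == DEFAULT_INFOPORT;
    }

    public static void main(String[] args) {
        // null means "not set"; the other values are explicit settings.
        Integer[] cases = {null, -1, 0, DEFAULT_INFOPORT, 8080};
        for (Integer value : cases) {
            Map<String, Integer> conf = new HashMap<>();
            if (value != null) {
                conf.put(KEY, value);
            }
            System.out.println(value + ": original=" + original(conf)
                + ", simplified=" + simplified(conf));
        }
    }
}
```

When the value is -1, the second comparison is already false because -1 differs from the default, so the first clause never changes the outcome.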

> downstream HBaseTestingUtility fails with invalid port
> --
>
> Key: HBASE-20544
> URL: https://issues.apache.org/jira/browse/HBASE-20544
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Blocker
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20544.0.patch, HBASE-20544.1.patch, 
> HBASE-20544.2.patch
>
>
> Attempting to update hbase-downstreamer to use our 2.0.0 release fails with 
> an invalid port in the event that {{hbase.localcluster.assign.random.ports}} 
> isn't set (or is set to false, specifically):
> {code}
> 2018-05-08 06:10:06,508 ERROR [main] regionserver.HRegionServer 
> (HRegionServer.java:<init>(631)) - Failed construction RegionServer
> java.lang.IllegalArgumentException: port out of range:-1
>   at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
>   at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1217)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1184)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:723)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:561)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.<init>(MiniHBaseCluster.java:147)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createRegionServerThread(JVMClusterUtil.java:86)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addRegionServer(LocalHBaseCluster.java:184)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster$1.run(LocalHBaseCluster.java:198)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster$1.run(LocalHBaseCluster.java:195)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:313)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addRegionServer(LocalHBaseCluster.java:194)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:261)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:121)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1042)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:988)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:859)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:853)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:782)
>   at 
> org.hbase.downstreamer.TestHBaseMiniCluster.testSpinUpMiniHBaseCluster(TestHBaseMiniCluster.java:16)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runner

[jira] [Commented] (HBASE-20224) Web UI is broken in standalone mode

2018-05-08 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467916#comment-16467916
 ] 

Umesh Agashe commented on HBASE-20224:
--

[~busbey], can you reconcile the patch for HBASE-20544 with the patch 004 here? 
Specifically around files:

hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java

and

hbase-server/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
hbase-server/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java

> Web UI is broken in standalone mode
> ---
>
> Key: HBASE-20224
> URL: https://issues.apache.org/jira/browse/HBASE-20224
> Project: HBase
>  Issue Type: Bug
>  Components: UI, Usability
>Affects Versions: 2.0.0-beta-2
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Blocker
> Fix For: 2.0.0, 2.0.1
>
> Attachments: 
> 0001-HBASE-20224-Web-UI-is-broken-in-standalone-mode-ADDE.ADDENDUM.patch, 
> 20224-addendum.3.txt, 20224.addendum.4, 20224.addendum.5, 20224.addendum.6, 
> HBASE-20224.master.004.patch, hbase-20224.master.001.patch, 
> hbase-20224.master.002.patch, hbase-20224.master.003.patch, 
> hbase-20224.master.addendum.patch
>
>
> Web UI doesn't show up in standalone mode on default port. This can be seen 
> on master and branch-2.





[jira] [Commented] (HBASE-20224) Web UI is broken in standalone mode

2018-05-08 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467891#comment-16467891
 ] 

Umesh Agashe commented on HBASE-20224:
--

I don't see patch 004 committed. Can this be re-opened for tracking?

> Web UI is broken in standalone mode
> ---
>
> Key: HBASE-20224
> URL: https://issues.apache.org/jira/browse/HBASE-20224
> Project: HBase
>  Issue Type: Bug
>  Components: UI, Usability
>Affects Versions: 2.0.0-beta-2
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: 
> 0001-HBASE-20224-Web-UI-is-broken-in-standalone-mode-ADDE.ADDENDUM.patch, 
> 20224-addendum.3.txt, 20224.addendum.4, 20224.addendum.5, 20224.addendum.6, 
> HBASE-20224.master.004.patch, hbase-20224.master.001.patch, 
> hbase-20224.master.002.patch, hbase-20224.master.003.patch, 
> hbase-20224.master.addendum.patch
>
>
> Web UI doesn't show up in standalone mode on default port. This can be seen 
> on master and branch-2.





[jira] [Commented] (HBASE-20514) On Master restart if table is stuck in DISABLING state, CLOSED regions should not be considered stuck in-transition

2018-05-01 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460225#comment-16460225
 ] 

Umesh Agashe commented on HBASE-20514:
--

The patch adds DISABLING to the table-state check, so a region is now ignored 
when its table is in either DISABLED or DISABLING state.
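
A rough sketch of the check described above, not the actual AssignmentManager code. The enums, class name, and method name are all hypothetical and only illustrate treating DISABLING like DISABLED for CLOSED regions.

```java
public class ClosedRegionCheck {
    // Simplified, hypothetical stand-ins for the table and region states.
    enum TableState { ENABLED, ENABLING, DISABLING, DISABLED }
    enum RegionState { OPEN, OPENING, CLOSING, CLOSED }

    // Returns true when a region loaded from meta can be ignored instead of
    // being treated as stuck in transition.
    static boolean shouldIgnore(TableState table, RegionState region) {
        boolean tableGoingOffline =
            table == TableState.DISABLED || table == TableState.DISABLING;
        return region == RegionState.CLOSED && tableGoingOffline;
    }

    public static void main(String[] args) {
        for (TableState table : TableState.values()) {
            System.out.println(table + " + CLOSED -> ignore="
                + shouldIgnore(table, RegionState.CLOSED));
        }
    }
}
```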

> On Master restart if table is stuck in DISABLING state, CLOSED regions should 
> not be considered stuck in-transition
> ---
>
> Key: HBASE-20514
> URL: https://issues.apache.org/jira/browse/HBASE-20514
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.1
>
> Attachments: hbase-20514.master.001.patch
>
>
> When master is restarted, in AssignmentManager#loadMeta(), if table is in 
> DISABLED state nothing is done for regions in CLOSED state. But if table is 
> stuck in DISABLING state then CLOSED regions are considered as stuck 
> in-transition. CLOSED regions of DISABLING/ DISABLED table can be handled the 
> same way.





[jira] [Updated] (HBASE-20514) On Master restart if table is stuck in DISABLING state, CLOSED regions should not be considered stuck in-transition

2018-05-01 Thread Umesh Agashe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20514:
-
Attachment: hbase-20514.master.001.patch

> On Master restart if table is stuck in DISABLING state, CLOSED regions should 
> not be considered stuck in-transition
> ---
>
> Key: HBASE-20514
> URL: https://issues.apache.org/jira/browse/HBASE-20514
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.1
>
> Attachments: hbase-20514.master.001.patch
>
>
> When master is restarted, in AssignmentManager#loadMeta(), if table is in 
> DISABLED state nothing is done for regions in CLOSED state. But if table is 
> stuck in DISABLING state then CLOSED regions are considered as stuck 
> in-transition. CLOSED regions of DISABLING/ DISABLED table can be handled the 
> same way.





[jira] [Updated] (HBASE-20514) On Master restart if table is stuck in DISABLING state, CLOSED regions should not be considered stuck in-transition

2018-05-01 Thread Umesh Agashe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20514:
-
Status: Patch Available  (was: In Progress)

> On Master restart if table is stuck in DISABLING state, CLOSED regions should 
> not be considered stuck in-transition
> ---
>
> Key: HBASE-20514
> URL: https://issues.apache.org/jira/browse/HBASE-20514
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.1
>
> Attachments: hbase-20514.master.001.patch
>
>
> When master is restarted, in AssignmentManager#loadMeta(), if table is in 
> DISABLED state nothing is done for regions in CLOSED state. But if table is 
> stuck in DISABLING state then CLOSED regions are considered as stuck 
> in-transition. CLOSED regions of DISABLING/ DISABLED table can be handled the 
> same way.





[jira] [Work started] (HBASE-20514) On Master restart if table is stuck in DISABLING state, CLOSED regions should not be considered stuck in-transition

2018-05-01 Thread Umesh Agashe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-20514 started by Umesh Agashe.

> On Master restart if table is stuck in DISABLING state, CLOSED regions should 
> not be considered stuck in-transition
> ---
>
> Key: HBASE-20514
> URL: https://issues.apache.org/jira/browse/HBASE-20514
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.1
>
>
> When master is restarted, in AssignmentManager#loadMeta(), if table is in 
> DISABLED state nothing is done for regions in CLOSED state. But if table is 
> stuck in DISABLING state then CLOSED regions are considered as stuck 
> in-transition. CLOSED regions of DISABLING/ DISABLED table can be handled the 
> same way.





[jira] [Created] (HBASE-20514) On Master restart if table is stuck in DISABLING state, CLOSED regions should not be considered stuck in-transition

2018-05-01 Thread Umesh Agashe (JIRA)
Umesh Agashe created HBASE-20514:


 Summary: On Master restart if table is stuck in DISABLING state, 
CLOSED regions should not be considered stuck in-transition
 Key: HBASE-20514
 URL: https://issues.apache.org/jira/browse/HBASE-20514
 Project: HBase
  Issue Type: Bug
  Components: amv2
Affects Versions: 2.0.0
Reporter: Umesh Agashe
Assignee: Umesh Agashe
 Fix For: 2.0.1


When the master is restarted, AssignmentManager#loadMeta() does nothing for 
regions in CLOSED state if the table is in DISABLED state. But if the table is 
stuck in DISABLING state, CLOSED regions are considered stuck in-transition. 
CLOSED regions of a DISABLING or DISABLED table can be handled the same way.





[jira] [Commented] (HBASE-20492) UnassignProcedure is stuck in retry loop on region stuck in OPENING state

2018-04-30 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459351#comment-16459351
 ] 

Umesh Agashe commented on HBASE-20492:
--

+1, after fixing new checkstyle errors.

> UnassignProcedure is stuck in retry loop on region stuck in OPENING state
> -
>
> Key: HBASE-20492
> URL: https://issues.apache.org/jira/browse/HBASE-20492
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0
>Reporter: Umesh Agashe
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.1
>
> Attachments: HBASE-20492.branch-2.0.001.patch, 
> HBASE-20492.branch-2.0.002.patch, HBASE-20492.branch-2.0.003.patch
>
>
> UnassignProcedure gets stuck in a retry loop for a region stuck in OPENING 
> state. From logs:
> {code:java}
> 2018-04-25 15:59:53,825 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: 
> Retryable error trying to transition: pid=142564, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180331004141, 
> region=bd2fb2c7d39236c9b9085f350358df7c, 
> server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
> location=vb1122.halxg.cloudera.com,22101,1522626198450
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but 
> current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738)
> 2018-04-25 15:59:53,892 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: 
> Retryable error trying to transition: pid=142564, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180331004141, 
> region=bd2fb2c7d39236c9b9085f350358df7c, 
> server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
> location=vb1122.halxg.cloudera.com,22101,1522626198450
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but 
> current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738){code}
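
The log above can be pictured with a small sketch, assuming a simplified state set. None of this is HBase code: the class and method names are hypothetical, and `IllegalStateException` is swapped in for `UnexpectedStateException`. It only illustrates that a retry cannot succeed while the region stays in OPENING, because OPENING is not among the states allowed to move to CLOSING.

```java
import java.util.EnumSet;
import java.util.Set;

public class TransitionSketch {
    // Simplified, hypothetical region states.
    enum State { OPENING, OPEN, CLOSING, CLOSED, SPLITTING, SPLIT, MERGING }

    // States from which a move to CLOSING is accepted, as listed in the log.
    static final Set<State> EXPECTED_BEFORE_CLOSING = EnumSet.of(
        State.SPLITTING, State.SPLIT, State.MERGING, State.OPEN, State.CLOSING);

    // Rejects the transition when the current state is not an expected one.
    static State markClosing(State current) {
        if (!EXPECTED_BEFORE_CLOSING.contains(current)) {
            throw new IllegalStateException("Expected " + EXPECTED_BEFORE_CLOSING
                + " so could move to CLOSING but current state=" + current);
        }
        return State.CLOSING;
    }

    // Retries the transition up to maxAttempts times and returns the number
    // of failed attempts. Nothing in the loop changes the current state.
    static int retry(State current, int maxAttempts) {
        int failures = 0;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                markClosing(current);
                break;
            } catch (IllegalStateException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("OPENING failures: " + retry(State.OPENING, 5));
        System.out.println("OPEN failures: " + retry(State.OPEN, 5));
    }
}
```

In this sketch the attempt cap is the only thing that ends the loop for an OPENING region; the log above shows the same error recurring for pid=142564.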





[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-04-27 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456901#comment-16456901
 ] 

Umesh Agashe commented on HBASE-19121:
--

For region states:
{code:java}
scan 'hbase:meta', { ROWPREFIXFILTER => 't1,', COLUMNS => 'info:state'}{code}

> HBCK for AMv2 (A.K.A HBCK2)
> ---
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: stack
>Priority: Major
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going 
> against AMv2.
> Fix.





[jira] [Commented] (HBASE-20492) UnassignProcedure is stuck in retry loop on region stuck in OPENING state

2018-04-25 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453246#comment-16453246
 ] 

Umesh Agashe commented on HBASE-20492:
--

Cannot abort the hung procedure, and restarting the master doesn't help.

> UnassignProcedure is stuck in retry loop on region stuck in OPENING state
> -
>
> Key: HBASE-20492
> URL: https://issues.apache.org/jira/browse/HBASE-20492
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Umesh Agashe
>Priority: Major
> Fix For: 2.0.1
>
>
> UnassignProcedure gets stuck in a retry loop for a region stuck in OPENING 
> state. From logs:
> {code:java}
> 2018-04-25 15:59:53,825 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: 
> Retryable error trying to transition: pid=142564, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180331004141, 
> region=bd2fb2c7d39236c9b9085f350358df7c, 
> server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
> location=vb1122.halxg.cloudera.com,22101,1522626198450
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but 
> current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738)
> 2018-04-25 15:59:53,892 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: 
> Retryable error trying to transition: pid=142564, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180331004141, 
> region=bd2fb2c7d39236c9b9085f350358df7c, 
> server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
> location=vb1122.halxg.cloudera.com,22101,1522626198450
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but 
> current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738){code}





[jira] [Updated] (HBASE-20492) UnassignProcedure is stuck in retry loop on region stuck in OPENING state

2018-04-25 Thread Umesh Agashe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20492:
-
Fix Version/s: 2.0.1

> UnassignProcedure is stuck in retry loop on region stuck in OPENING state
> -
>
> Key: HBASE-20492
> URL: https://issues.apache.org/jira/browse/HBASE-20492
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Umesh Agashe
>Priority: Major
> Fix For: 2.0.1
>
>
> UnassignProcedure gets stuck in a retry loop for a region stuck in OPENING 
> state. From logs:
> {code:java}
> 2018-04-25 15:59:53,825 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: 
> Retryable error trying to transition: pid=142564, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180331004141, 
> region=bd2fb2c7d39236c9b9085f350358df7c, 
> server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
> location=vb1122.halxg.cloudera.com,22101,1522626198450
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but 
> current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738)
> 2018-04-25 15:59:53,892 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: 
> Retryable error trying to transition: pid=142564, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180331004141, 
> region=bd2fb2c7d39236c9b9085f350358df7c, 
> server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
> location=vb1122.halxg.cloudera.com,22101,1522626198450
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but 
> current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20492) UnassignProcedure is stuck in retry loop on region stuck in OPENING state

2018-04-25 Thread Umesh Agashe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Agashe updated HBASE-20492:
-
Summary: UnassignProcedure is stuck in retry loop on region stuck in 
OPENING state  (was: UnassignProcedure is stuck in retry loop on region with 
state OPENING)

> UnassignProcedure is stuck in retry loop on region stuck in OPENING state
> -
>
> Key: HBASE-20492
> URL: https://issues.apache.org/jira/browse/HBASE-20492
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Umesh Agashe
> Priority: Major
> Fix For: 2.0.1
>
>
> UnassignProcedure gets stuck in a retry loop for a region stuck in OPENING 
> state. From logs:
> {code:java}
> 2018-04-25 15:59:53,825 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: 
> Retryable error trying to transition: pid=142564, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180331004141, 
> region=bd2fb2c7d39236c9b9085f350358df7c, 
> server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
> location=vb1122.halxg.cloudera.com,22101,1522626198450
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but 
> current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738)
> 2018-04-25 15:59:53,892 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: 
> Retryable error trying to transition: pid=142564, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180331004141, 
> region=bd2fb2c7d39236c9b9085f350358df7c, 
> server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
> location=vb1122.halxg.cloudera.com,22101,1522626198450
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but 
> current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738){code}





[jira] [Created] (HBASE-20492) UnassignProcedure is stuck in retry loop on region with state OPENING

2018-04-25 Thread Umesh Agashe (JIRA)
Umesh Agashe created HBASE-20492:


 Summary: UnassignProcedure is stuck in retry loop on region with 
state OPENING
 Key: HBASE-20492
 URL: https://issues.apache.org/jira/browse/HBASE-20492
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Umesh Agashe


UnassignProcedure gets stuck in a retry loop for a region stuck in OPENING 
state. From logs:
{code:java}
2018-04-25 15:59:53,825 WARN 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Retryable 
error trying to transition: pid=142564, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
table=IntegrationTestBigLinkedList_20180331004141, 
region=bd2fb2c7d39236c9b9085f350358df7c, 
server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
location=vb1122.halxg.cloudera.com,22101,1522626198450
org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
[SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but current 
state=OPENING
at 
org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
at 
org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
at 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
at 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738)
2018-04-25 15:59:53,892 WARN 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Retryable 
error trying to transition: pid=142564, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
table=IntegrationTestBigLinkedList_20180331004141, 
region=bd2fb2c7d39236c9b9085f350358df7c, 
server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
location=vb1122.halxg.cloudera.com,22101,1522626198450
org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
[SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but current 
state=OPENING
at 
org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
at 
org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
at 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
at 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738){code}
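
The exception text above suggests why the retries cannot succeed: the move to CLOSING is only accepted from a fixed set of states, and OPENING is not one of them. The following is a minimal, self-contained sketch of that pattern, not HBase's actual code. The class `RegionCloseSketch`, the unchecked `UnexpectedState` exception, and the methods `markAsClosing` and `failedAttempts` are hypothetical names used only for illustration; the expected-state set is copied from the log message.

```java
import java.util.EnumSet;
import java.util.Set;

class RegionCloseSketch {
    // Hypothetical subset of region states, named after those in the log above.
    enum State { OPENING, OPEN, CLOSING, CLOSED, SPLITTING, SPLIT, MERGING }

    // States from which a move to CLOSING is accepted, per the exception text.
    static final Set<State> EXPECTED = EnumSet.of(
        State.SPLITTING, State.SPLIT, State.MERGING, State.OPEN, State.CLOSING);

    // Unchecked stand-in (an assumption) for the exception seen in the trace.
    static class UnexpectedState extends RuntimeException {
        UnexpectedState(String msg) { super(msg); }
    }

    // Returns CLOSING if the current state allows the move, otherwise throws.
    static State markAsClosing(State current) {
        if (!EXPECTED.contains(current)) {
            throw new UnexpectedState("Expected " + EXPECTED
                + " so could move to CLOSING but current state=" + current);
        }
        return State.CLOSING;
    }

    // Retries up to maxAttempts and returns the number of failed attempts.
    // The state is never changed between attempts, as with the stuck region.
    static int failedAttempts(State current, int maxAttempts) {
        int failures = 0;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                markAsClosing(current);
                break; // transition accepted, stop retrying
            } catch (UnexpectedState e) {
                failures++; // same input, so the next pass fails the same way
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(failedAttempts(State.OPENING, 5)); // every attempt fails
        System.out.println(failedAttempts(State.OPEN, 5));    // first attempt succeeds
    }
}
```

In this sketch the attempt cap is what keeps the loop finite; without a cap, or without something else moving the region out of OPENING, each retry would log the same warning again.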





[jira] [Commented] (HBASE-20492) UnassignProcedure is stuck in retry loop on region with state OPENING

2018-04-25 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453216#comment-16453216
 ] 

Umesh Agashe commented on HBASE-20492:
--

The logs fill up with the messages above.

> UnassignProcedure is stuck in retry loop on region with state OPENING
> -
>
> Key: HBASE-20492
> URL: https://issues.apache.org/jira/browse/HBASE-20492
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Umesh Agashe
> Priority: Major
>
> UnassignProcedure gets stuck in a retry loop for a region stuck in OPENING 
> state. From logs:
> {code:java}
> 2018-04-25 15:59:53,825 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: 
> Retryable error trying to transition: pid=142564, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180331004141, 
> region=bd2fb2c7d39236c9b9085f350358df7c, 
> server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
> location=vb1122.halxg.cloudera.com,22101,1522626198450
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but 
> current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738)
> 2018-04-25 15:59:53,892 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: 
> Retryable error trying to transition: pid=142564, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180331004141, 
> region=bd2fb2c7d39236c9b9085f350358df7c, 
> server=vb1122.halxg.cloudera.com,22101,1522626198450; rit=OPENING, 
> location=vb1122.halxg.cloudera.com,22101,1522626198450
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected 
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but 
> current state=OPENING
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:158)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1514)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738){code}




