[jira] [Resolved] (HBASE-25928) TestHBaseConfiguration#testDeprecatedConfigurations is broken with Hadoop 3.3

2021-05-28 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25928.
---
Fix Version/s: 2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2 and master. Thanks for finding the issue [~weichiu] and 
thanks for the fix [~DeanZ]

> TestHBaseConfiguration#testDeprecatedConfigurations is broken with Hadoop 3.3
> -
>
> Key: HBASE-25928
> URL: https://issues.apache.org/jira/browse/HBASE-25928
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.5.0
>Reporter: Wei-Chiu Chuang
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> The test TestHBaseConfiguration#testDeprecatedConfigurations was added 
> recently by HBASE-25861 to address the usage of Hadoop Configuration 
> addDeprecations API.
> However, the API's behavior was changed to fix a bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25914) Provide slow/large logs on RegionServer UI

2021-05-28 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353386#comment-17353386
 ] 

Michael Stack commented on HBASE-25914:
---

Thanks [~vjasani]. My concern w/ 'Named Queue Log' is that it is indeed generic, 
so generic that the operator will be confounded by what it is they are looking at; 
I'd suggest the tab needs explanatory text and a better name (HBase has many 
queues. 'Named'?  'Log' is usually a file on disk).

In the past, folks have been worried about showing full RPC payload in logs... 
privacy/security concerns. The full display of rpc payload should probably be 
an opt-in switch.

As said before, this looks like a very nice feature.

> Provide slow/large logs on RegionServer UI
> --
>
> Key: HBASE-25914
> URL: https://issues.apache.org/jira/browse/HBASE-25914
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver, UI
>Affects Versions: 3.0.0-alpha-1, 2.5.0
>Reporter: Zhuoyue Huang
>Assignee: Zhuoyue Huang
>Priority: Major
> Attachments: callDetails.png, largeLog.png, slowLog.png
>
>
> Pull slow/large logs from the in-memory queues on the RegionServer, then display 
> detailed info in the RegionServer status UI





[jira] [Commented] (HBASE-25914) Provide slow/large logs on RegionServer UI

2021-05-28 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353378#comment-17353378
 ] 

Michael Stack commented on HBASE-25914:
---

Thanks [~vjasani]. Now I remember.

Would suggest a version of your explanation be added to the head of the 'Named 
Queue Log' tab to explain what it is.

Is 'Named Queue Log' a good name for this page?  'Extraordinary RPC' is a 
mouthful.

Will the full RPC show in the UI?

Thanks.

> Provide slow/large logs on RegionServer UI
> --
>
> Key: HBASE-25914
> URL: https://issues.apache.org/jira/browse/HBASE-25914
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver, UI
>Affects Versions: 3.0.0-alpha-1, 2.5.0
>Reporter: Zhuoyue Huang
>Assignee: Zhuoyue Huang
>Priority: Major
> Attachments: callDetails.png, largeLog.png, slowLog.png
>
>
> Pull slow/large logs from the in-memory queues on the RegionServer, then display 
> detailed info in the RegionServer status UI





[jira] [Commented] (HBASE-25914) Provide slow/large logs on RegionServer UI

2021-05-27 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352958#comment-17352958
 ] 

Michael Stack commented on HBASE-25914:
---

Please say more on this feature [~GeorryHuang]. It looks good.  I'd expect some 
text on the page explaining what is being displayed, what is a slow log and what 
is a large log. Is 'Named Queue Log' a good name for this tab? Thanks.

> Provide slow/large logs on RegionServer UI
> --
>
> Key: HBASE-25914
> URL: https://issues.apache.org/jira/browse/HBASE-25914
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver, UI
>Affects Versions: 3.0.0-alpha-1, 2.5.0
>Reporter: Zhuoyue Huang
>Assignee: Zhuoyue Huang
>Priority: Major
> Attachments: callDetails.png, largeLog.png, slowLog.png
>
>
> Pull slow/large logs from the in-memory queues on the RegionServer, then display 
> detailed info in the RegionServer status UI





[jira] [Commented] (HBASE-25758) Move MetaTableAccessor out of hbase-balancer module

2021-05-27 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352942#comment-17352942
 ] 

Michael Stack commented on HBASE-25758:
---

bq. Finally we have done the moving back of MetaTableAccessor.

Hot dog!

> Move MetaTableAccessor out of hbase-balancer module
> ---
>
> Key: HBASE-25758
> URL: https://issues.apache.org/jira/browse/HBASE-25758
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, meta
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> It should not be there.
> The only reason we have to put it there is the favored node balancer. The favored 
> node balancer does not work well so maybe we could just purge it.
>  update 
> With HBASE-25926 in place we could just move it directly.





[jira] [Updated] (HBASE-25908) Exclude jakarta.activation-api

2021-05-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25908:
--
Fix Version/s: 2.4.4

> Exclude jakarta.activation-api
> --
>
> Key: HBASE-25908
> URL: https://issues.apache.org/jira/browse/HBASE-25908
> Project: HBase
>  Issue Type: Improvement
>  Components: hadoop3, shading
>Affects Versions: 2.3.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4
>
>
> Hadoop 3.3.1 replaced its dependency on javax.activation 1.2.0 with 
> jakarta.activation 1.2.1.
> They are essentially the same thing (they even have the same classpath name), 
> but Eclipse took over JavaEE development and therefore changed group/artifact 
> id. 
> (https://stackoverflow.com/questions/46493613/what-is-the-replacement-for-javax-activation-package-in-java-9)
> See HADOOP-17049 for more details. Hadoop 3.3.0 updated jackson-databind to 
> 2.10 which shades jakarta.activation, causing classpath conflict.
> The solution to this issue will be similar to HBASE-22268





[jira] [Commented] (HBASE-25908) Exclude jakarta.activation-api

2021-05-27 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352919#comment-17352919
 ] 

Michael Stack commented on HBASE-25908:
---

nvm... just simple excludes. Harmless. Backported to 2.4.

> Exclude jakarta.activation-api
> --
>
> Key: HBASE-25908
> URL: https://issues.apache.org/jira/browse/HBASE-25908
> Project: HBase
>  Issue Type: Improvement
>  Components: hadoop3, shading
>Affects Versions: 2.3.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> Hadoop 3.3.1 replaced its dependency on javax.activation 1.2.0 with 
> jakarta.activation 1.2.1.
> They are essentially the same thing (they even have the same classpath name), 
> but Eclipse took over JavaEE development and therefore changed group/artifact 
> id. 
> (https://stackoverflow.com/questions/46493613/what-is-the-replacement-for-javax-activation-package-in-java-9)
> See HADOOP-17049 for more details. Hadoop 3.3.0 updated jackson-databind to 
> 2.10 which shades jakarta.activation, causing classpath conflict.
> The solution to this issue will be similar to HBASE-22268





[jira] [Resolved] (HBASE-25908) Exclude jakarta.activation-api

2021-05-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25908.
---
Fix Version/s: 2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Tags: hadoop-3.3.1
   Resolution: Fixed

Merged to master and branch-2. Should I backport to branch-2.4 [~apurtell]? 
Do you want to run on hadoop 3.3.1? Otherwise, is hbase-2.5 the version to run on hadoop-3.3.1?

Thanks for the fix and the nice background [~weichiu].

> Exclude jakarta.activation-api
> --
>
> Key: HBASE-25908
> URL: https://issues.apache.org/jira/browse/HBASE-25908
> Project: HBase
>  Issue Type: Improvement
>  Components: hadoop3, shading
>Affects Versions: 2.3.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> Hadoop 3.3.1 replaced its dependency on javax.activation 1.2.0 with 
> jakarta.activation 1.2.1.
> They are essentially the same thing (they even have the same classpath name), 
> but Eclipse took over JavaEE development and therefore changed group/artifact 
> id. 
> (https://stackoverflow.com/questions/46493613/what-is-the-replacement-for-javax-activation-package-in-java-9)
> See HADOOP-17049 for more details. Hadoop 3.3.0 updated jackson-databind to 
> 2.10 which shades jakarta.activation, causing classpath conflict.
> The solution to this issue will be similar to HBASE-22268





[jira] [Commented] (HBASE-25902) 1.x to 2.3.x upgrade does not work; you must install an hbase2 that is earlier than hbase-2.3.0 first

2021-05-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17348624#comment-17348624
 ] 

Michael Stack commented on HBASE-25902:
---

bq. our current stable release line is 2.3. that means folks upgrading from our 
hbase 1 stable releases to the current hbase 2 stable release won't work, right?

Good point.

> 1.x to 2.3.x upgrade does not work; you must install an hbase2 that is 
> earlier than hbase-2.3.0 first
> -
>
> Key: HBASE-25902
> URL: https://issues.apache.org/jira/browse/HBASE-25902
> Project: HBase
>  Issue Type: Bug
>  Components: meta, Operability
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Michael Stack
>Priority: Critical
>
> Making note of this issue in case others run into it. At my place of employ, 
> we tried to upgrade a cluster that was an hbase-1.2.x version to 
> hbase-2.3.5 but it failed because meta didn't have the 'table' column family.
> Up to 2.3.0, the hbase:meta schema was hardcoded. HBASE-12035 added the 'table' CF for 
> hbase-2.0.0. HBASE-23782 (2.3.0) undid the hardcoding of the hbase:meta schema; 
> i.e. the hbase:meta schema is now read from the filesystem. The hbase:meta schema is 
> only created on initial install. On an upgrade over existing data, the 
> hbase-1 hbase:meta will not be suitable for an hbase-2.3.x context as it will be 
> missing column families needed to run (HBASE-23055 made it so hbase:meta could 
> be altered (2.3.0) but that is probably of no use since the Master won't come up).
> It would be a nice-to-have if a user could go from hbase1 to hbase-2.3.0 w/o 
> having to first install an hbase2 that is earlier than 2.3.0, but there needs to be 
> demand before we would work on it; in the meantime, install an intermediate hbase2 
> version before going to hbase-2.3.0+ if coming from hbase-1.x





[jira] [Created] (HBASE-25902) 1.x to 2.3.x upgrade does not work; you must install an hbase2 that is earlier than hbase-2.3.0 first

2021-05-20 Thread Michael Stack (Jira)
Michael Stack created HBASE-25902:
-

 Summary: 1.x to 2.3.x upgrade does not work; you must install an 
hbase2 that is earlier than hbase-2.3.0 first
 Key: HBASE-25902
 URL: https://issues.apache.org/jira/browse/HBASE-25902
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


Making note of this issue in case others run into it. At my place of employ, we 
tried to upgrade a cluster that was an hbase-1.2.x version to hbase-2.3.5 
but it failed because meta didn't have the 'table' column family.

Up to 2.3.0, the hbase:meta schema was hardcoded. HBASE-12035 added the 'table' CF for 
hbase-2.0.0. HBASE-23782 (2.3.0) undid the hardcoding of the hbase:meta schema; 
i.e. the hbase:meta schema is now read from the filesystem. The hbase:meta schema is 
only created on initial install. On an upgrade over existing data, the hbase-1 
hbase:meta will not be suitable for an hbase-2.3.x context as it will be missing 
column families needed to run (HBASE-23055 made it so hbase:meta could be 
altered (2.3.0) but that is probably of no use since the Master won't come up).

It would be a nice-to-have if a user could go from hbase1 to hbase-2.3.0 w/o 
having to first install an hbase2 that is earlier than 2.3.0, but there needs to be 
demand before we would work on it; in the meantime, install an intermediate hbase2 
version before going to hbase-2.3.0+ if coming from hbase-1.x





[jira] [Resolved] (HBASE-25870) Validate only direct ancestors instead of entire history for a particular backup

2021-05-12 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25870.
---
Fix Version/s: 3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master. Thanks for fix [~rda3mon]

> Validate only direct ancestors instead of entire history for a particular 
> backup
> 
>
> Key: HBASE-25870
> URL: https://issues.apache.org/jira/browse/HBASE-25870
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Reporter: Mallikarjun
>Assignee: Mallikarjun
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> While creating the manifest of a particular backup, the code looks at the entire 
> history of backups taken on that cluster and expects their links to still be valid. 
> This need not hold true and is unnecessary. Only the ancestors of a particular 
> incremental backup are necessary and sufficient.
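
To illustrate the idea in the description above, here is a minimal sketch of collecting only the direct ancestors of one backup instead of scanning the whole history. The `Backup` record, its `parentId` field and the `ancestorsOf` helper are hypothetical names invented for this sketch; they are not the HBase backup classes, and the model assumes each incremental backup points at exactly one earlier backup and that the chain has no cycles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BackupAncestors {
  /** Hypothetical, simplified backup record: parentId is null for a full backup. */
  static final class Backup {
    final String id;
    final String parentId;

    Backup(String id, String parentId) {
      this.id = id;
      this.parentId = parentId;
    }
  }

  /**
   * Returns the ids of the direct ancestors of targetId, nearest first.
   * Backups that are not on this chain are never looked at.
   */
  static List<String> ancestorsOf(String targetId, Map<String, Backup> history) {
    Backup current = history.get(targetId);
    if (current == null) {
      throw new IllegalArgumentException("unknown backup: " + targetId);
    }
    List<String> chain = new ArrayList<>();
    // Walk parent links until a full backup (no parent) is reached.
    while (current.parentId != null) {
      Backup parent = history.get(current.parentId);
      if (parent == null) {
        throw new IllegalStateException("missing ancestor: " + current.parentId);
      }
      chain.add(parent.id);
      current = parent;
    }
    return chain;
  }
}
```

With this shape, a stale or deleted backup that is unrelated to the target cannot fail the validation, because only the target's own chain is visited.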





[jira] [Resolved] (HBASE-25876) Add retry if we fail to read all bytes of the protobuf magic marker

2021-05-12 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25876.
---
Fix Version/s: 2.4.4
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Assignee: Michael Stack
   Resolution: Fixed

Pushed to branch-2.4+ (master only took one of the patched changes because there is no 
HRegionInfo in the master branch). Thanks for the reviews [~anoop.hbase] and [~zhangduo]

> Add retry if we fail to read all bytes of the protobuf magic marker
> ---
>
> Key: HBASE-25876
> URL: https://issues.apache.org/jira/browse/HBASE-25876
> Project: HBase
>  Issue Type: Sub-task
>  Components: io
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Trivial
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4
>
>
> The parent issue fixes an instance where we try once to read protobuf magic 
> marker bytes rather than retry till we have enough. This subtask applies the 
> same trick in all cases where we could run into this issue.





[jira] [Resolved] (HBASE-25867) Extra doc around ITBLL

2021-05-11 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25867.
---
Hadoop Flags: Reviewed
Assignee: Michael Stack
  Resolution: Fixed

Pushed to branch-2.4+. Thanks for review [~busbey]

> Extra doc around ITBLL
> --
>
> Key: HBASE-25867
> URL: https://issues.apache.org/jira/browse/HBASE-25867
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4
>
>
> Added some doc around ITBLL to explain stuff I had difficulty with. Minor 
> items such as log message & javadoc edits and explaining how to pass 
> configuration to the ChaosMonkeyRunner.





[jira] [Commented] (HBASE-25774) ServerManager.getOnlineServer may miss some region servers when refreshing state in some procedure implementations

2021-05-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342127#comment-17342127
 ] 

Michael Stack commented on HBASE-25774:
---

Thanks for figuring out the race [~zhangduo] (My bad too for not seeing it on 
review...)

> ServerManager.getOnlineServer may miss some region servers when refreshing 
> state in some procedure implementations
> --
>
> Key: HBASE-25774
> URL: https://issues.apache.org/jira/browse/HBASE-25774
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Xiaolin Ha
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3, 2.3.5.1
>
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3025/9/testReport/org.apache.hadoop.hbase.replication/TestSyncReplicationStandbyKillRS/precommit_checks___yetus_jdk8_Hadoop3_checks__/]
> {code:java}
> ...[truncated 391170 chars]...
> 76d634:45149.replicationSource,1] regionserver.HRegionServer(2351): STOPPED: 
> Unexpected exception in RS:2;ece3af76d634:45149.replicationSource,1
> 2021-04-11T11:14:40,268 INFO  [RS:2;ece3af76d634:45149] 
> regionserver.HeapMemoryManager(218): Stopping
> 2021-04-11T11:14:40,268 INFO  [MemStoreFlusher.0] 
> regionserver.MemStoreFlusher$FlushHandler(384): MemStoreFlusher.0 exiting
> 2021-04-11T11:14:40,268 INFO  [RS:2;ece3af76d634:45149] 
> flush.RegionServerFlushTableProcedureManager(118): Stopping region server 
> flush procedure manager abruptly.
> 2021-04-11T11:14:40,270 INFO  [RS:2;ece3af76d634:45149] 
> snapshot.RegionServerSnapshotManager(136): Stopping 
> RegionServerSnapshotManager abruptly.
> 2021-04-11T11:14:40,270 INFO  [RS:2;ece3af76d634:45149] 
> regionserver.HRegionServer(1146): aborting server 
> ece3af76d634,45149,1618139661734
> 2021-04-11T11:14:40,272 ERROR 
> [ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245] 
> regionserver.ReplicationSource(428): Unexpected exception in 
> ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245 
> currentPath=null
> java.lang.IllegalStateException: Source should be active.
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:547)
>  ~[classes/:?]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
> 2021-04-11T11:14:40,272 DEBUG 
> [ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245] 
> regionserver.HRegionServer(2576): Abort already in progress. Ignoring the 
> current request with reason: Unexpected exception in 
> ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245
> {code}
> Maybe it should use HBASE-24877 to avoid the failure to initialize 
> ReplicationSource.
>  





[jira] [Updated] (HBASE-25876) Add retry if we fail to read all bytes of the protobuf magic marker

2021-05-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25876:
--
Priority: Trivial  (was: Major)

> Add retry if we fail to read all bytes of the protobuf magic marker
> ---
>
> Key: HBASE-25876
> URL: https://issues.apache.org/jira/browse/HBASE-25876
> Project: HBase
>  Issue Type: Sub-task
>  Components: io
>Reporter: Michael Stack
>Priority: Trivial
>
> The parent issue fixes an instance where we try once to read protobuf magic 
> marker bytes rather than retry till we have enough. This subtask applies the 
> same trick in all cases where we could run into this issue.





[jira] [Commented] (HBASE-25876) Add retry if we fail to read all bytes of the protobuf magic marker

2021-05-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342125#comment-17342125
 ] 

Michael Stack commented on HBASE-25876:
---

Just two places (thought there were more). Trivial.

> Add retry if we fail to read all bytes of the protobuf magic marker
> ---
>
> Key: HBASE-25876
> URL: https://issues.apache.org/jira/browse/HBASE-25876
> Project: HBase
>  Issue Type: Sub-task
>  Components: io
>Reporter: Michael Stack
>Priority: Trivial
>
> The parent issue fixes an instance where we try once to read protobuf magic 
> marker bytes rather than retry till we have enough. This subtask applies the 
> same trick in all cases where we could run into this issue.





[jira] [Created] (HBASE-25876) Add retry if we fail to read all bytes of the protobuf magic marker

2021-05-10 Thread Michael Stack (Jira)
Michael Stack created HBASE-25876:
-

 Summary: Add retry if we fail to read all bytes of the protobuf 
magic marker
 Key: HBASE-25876
 URL: https://issues.apache.org/jira/browse/HBASE-25876
 Project: HBase
  Issue Type: Sub-task
  Components: io
Reporter: Michael Stack


The parent issue fixes an instance where we try once to read protobuf magic 
marker bytes rather than retry till we have enough. This subtask applies the 
same trick in all cases where we could run into this issue.
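
As a rough illustration of "retry till we have enough", the sketch below keeps calling `read` until the requested number of bytes has arrived or the stream ends. The class and method names (`ReadUntilFull`, `readExactly`) are made up for this example and are not taken from the HBase patch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadUntilFull {
  /**
   * Reads exactly length bytes from in, retrying after short reads.
   * Throws EOFException if the stream ends before length bytes arrive.
   */
  static byte[] readExactly(InputStream in, int length) throws IOException {
    byte[] buf = new byte[length];
    int filled = 0;
    while (filled < length) {
      // A single read may return fewer bytes than requested.
      int n = in.read(buf, filled, length - filled);
      if (n < 0) {
        throw new EOFException("stream ended after " + filled + " of " + length + " bytes");
      }
      filled += n;
    }
    return buf;
  }
}
```

A caller would then compare the returned bytes against the expected marker, knowing the buffer is never partially filled.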





[jira] [Resolved] (HBASE-25859) Reference class incorrectly parses the protobuf magic marker

2021-05-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25859.
---
Fix Version/s: 2.4.4
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged the PR.

Let me make a subissue to address other instances of the problem here.

> Reference class incorrectly parses the protobuf magic marker
> 
>
> Key: HBASE-25859
> URL: https://issues.apache.org/jira/browse/HBASE-25859
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.4.1
>Reporter: Constantin-Catalin Luca
>Assignee: Constantin-Catalin Luca
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4
>
>
> The Reference class incorrectly parses the protobuf magic marker.
> It uses:
> {code:java}
> // DataInputStream.read(byte[lengthOfPNMagic]){code}
> but this call is not guaranteed to read all the bytes of the marker.
>  The fix is the same as the one for 
> https://issues.apache.org/jira/browse/HBASE-25674
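
The difference described above can be reproduced with a small, self-contained sketch. It contrasts `DataInputStream.read(byte[])`, which may return after a partial read, with `DataInputStream.readFully(byte[])`, which keeps reading until the buffer is full. The `MAGIC` value, the `trickle` stream and the method names are assumptions made for the demonstration, not the code in the Reference class.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class MagicMarkerRead {
  // Illustrative marker bytes for this sketch only.
  static final byte[] MAGIC = {'P', 'B', 'U', 'F'};

  /** Wraps data in a stream that returns at most one byte per read call. */
  static InputStream trickle(byte[] data) {
    return new ByteArrayInputStream(data) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 1));
      }
    };
  }

  /** Fragile: a single read call may fill only part of the buffer. */
  static boolean startsWithMagicSingleRead(InputStream in) throws IOException {
    byte[] buf = new byte[MAGIC.length];
    new DataInputStream(in).read(buf);
    return Arrays.equals(buf, MAGIC);
  }

  /** Robust: readFully loops until the buffer is full or throws EOFException. */
  static boolean startsWithMagicReadFully(InputStream in) throws IOException {
    byte[] buf = new byte[MAGIC.length];
    new DataInputStream(in).readFully(buf);
    return Arrays.equals(buf, MAGIC);
  }
}
```

On the trickling stream the single-read variant sees only the first byte and reports a mismatch, while the `readFully` variant recognizes the marker.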





[jira] [Commented] (HBASE-25032) Wait for region server to become online before adding it to online servers in Master

2021-05-07 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341124#comment-17341124
 ] 

Michael Stack commented on HBASE-25032:
---

I suggest we not do a 2.3.5.1 but a 2.3.6. [~apurtell]


> Wait for region server to become online before adding it to online servers in 
> Master
> 
>
> Key: HBASE-25032
> URL: https://issues.apache.org/jira/browse/HBASE-25032
> Project: HBase
>  Issue Type: Bug
>Reporter: Sandeep Guggilam
>Assignee: Caroline Zhou
>Priority: Major
>  Labels: master, regionserver
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> As part of RS startup, the RS reports for duty to the Master. The Master acknowledges 
> the request and adds it to the onlineServers list for further assigning of 
> regions to the RS.
> Once the Master acknowledges the reportForDuty and sends back the response, the RS 
> does a bunch of stuff like initializing replication sources etc. before 
> becoming online. However, sometimes there could be an issue with initializing 
> replication sources when it is unable to connect to peer clusters because of 
> some kerberos configuration, and there would be a delay of around 20 mins in 
> becoming online.
>  
> Since the master considers it online, it tries to assign regions, which fails 
> with a ServerNotRunningYet exception; then the master tries to unassign, which 
> again fails with the same exception, leading the region to the FAILED_CLOSE state.
>  
> It would be good to have a check to see if the RS is ready to accept 
> assignment requests before adding it to the online servers list, which would 
> account for any such delays as described above.





[jira] [Created] (HBASE-25867) Extra doc around ITBLL

2021-05-07 Thread Michael Stack (Jira)
Michael Stack created HBASE-25867:
-

 Summary: Extra doc around ITBLL
 Key: HBASE-25867
 URL: https://issues.apache.org/jira/browse/HBASE-25867
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Michael Stack
 Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4


Added some doc around ITBLL to explain stuff I had difficulty with. Minor 
items such as log message & javadoc edits and explaining how to pass 
configuration to the ChaosMonkeyRunner.





[jira] [Resolved] (HBASE-25792) Filter out o.a.hadoop.thirdparty building shaded jars

2021-04-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25792.
---
Fix Version/s: 2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2.4+. Thanks for reviews [~weichiu], [~zhangduo], and 
[~ndimiduk]

> Filter out o.a.hadoop.thirdparty building shaded jars
> -
>
> Key: HBASE-25792
> URL: https://issues.apache.org/jira/browse/HBASE-25792
> Project: HBase
>  Issue Type: Bug
>  Components: shading
>Affects Versions: 3.0.0-alpha-1, 2.5.0, 2.4.3
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> Hadoop 3.3.1 (unreleased currently) shades guava. The shaded guava then trips 
> the check in our shading that tries to exclude hadoop bits from the fat jars 
> we build.
> For the issue to trigger, need to build against tip of hadoop branch-3.3. You 
> then get this complaint:
> {code}
> [INFO] --- exec-maven-plugin:1.6.0:exec (check-jar-contents) @ 
> hbase-shaded-check-invariants ---
> [ERROR] Found artifact with unexpected contents: 
> '/Users/stack/.m2/repository/org/apache/hbase/hbase-shaded-mapreduce/2.3.6-SNAPSHOT/hbase-shaded-mapreduce-2.3.6-SNAPSHOT.jar'
> Please check the following and either correct the build or update
> the allowed list with reasoning.
> org/apache/hadoop/thirdparty/
> org/apache/hadoop/thirdparty/com/
> org/apache/hadoop/thirdparty/com/google/
> org/apache/hadoop/thirdparty/com/google/common/
> org/apache/hadoop/thirdparty/com/google/common/annotations/
> org/apache/hadoop/thirdparty/com/google/common/annotations/Beta.class
> 
> org/apache/hadoop/thirdparty/com/google/common/annotations/GwtCompatible.class
> 
> org/apache/hadoop/thirdparty/com/google/common/annotations/GwtIncompatible.class
> 
> org/apache/hadoop/thirdparty/com/google/common/annotations/VisibleForTesting.class
> org/apache/hadoop/thirdparty/com/google/common/base/
> org/apache/hadoop/thirdparty/com/google/common/base/Absent.class
> 
> org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator$1.class
> 
> org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator$State.class
> org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator.class
> org/apache/hadoop/thirdparty/com/google/common/base/Ascii.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$1.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$2.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$3.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$4.class
> 
> {code}
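
For readers unfamiliar with this kind of invariant check, the following is a sketch of the idea only: list the entries of a jar and flag any that fall outside an allow-list of path prefixes. The real check runs via exec-maven-plugin as the log above shows; the class name, method name and the example prefixes used here are assumptions for illustration, not the project's actual check or allow-list.

```java
import java.util.List;
import java.util.stream.Collectors;

class ShadedJarContentsCheck {
  /**
   * Returns the file entries whose path does not start with any allowed prefix.
   * Directory entries (names ending in '/') are ignored in this sketch.
   */
  static List<String> unexpectedEntries(List<String> entryNames, List<String> allowedPrefixes) {
    return entryNames.stream()
        .filter(name -> !name.endsWith("/"))
        .filter(name -> allowedPrefixes.stream().noneMatch(name::startsWith))
        .collect(Collectors.toList());
  }
}
```

With an allow-list that only covers the project's own packages, a relocated class such as `org/apache/hadoop/thirdparty/com/google/common/base/Ascii.class` would be reported as unexpected, which matches the kind of complaint shown in the log.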





[jira] [Updated] (HBASE-25792) Filter out o.a.hadoop.thirdparty building shaded jars

2021-04-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25792:
--
Description: 
Hadoop 3.3.1 (unreleased currently) shades guava. The shaded guava then trips 
the check in our shading that tries to exclude hadoop bits from the fat jars we 
build.

For the issue to trigger, need to build against tip of hadoop branch-3.3. You 
then get this complaint:

{code}
[INFO] --- exec-maven-plugin:1.6.0:exec (check-jar-contents) @ 
hbase-shaded-check-invariants ---
[ERROR] Found artifact with unexpected contents: 
'/Users/stack/.m2/repository/org/apache/hbase/hbase-shaded-mapreduce/2.3.6-SNAPSHOT/hbase-shaded-mapreduce-2.3.6-SNAPSHOT.jar'
Please check the following and either correct the build or update
the allowed list with reasoning.

org/apache/hadoop/thirdparty/
org/apache/hadoop/thirdparty/com/
org/apache/hadoop/thirdparty/com/google/
org/apache/hadoop/thirdparty/com/google/common/
org/apache/hadoop/thirdparty/com/google/common/annotations/
org/apache/hadoop/thirdparty/com/google/common/annotations/Beta.class

org/apache/hadoop/thirdparty/com/google/common/annotations/GwtCompatible.class

org/apache/hadoop/thirdparty/com/google/common/annotations/GwtIncompatible.class

org/apache/hadoop/thirdparty/com/google/common/annotations/VisibleForTesting.class
org/apache/hadoop/thirdparty/com/google/common/base/
org/apache/hadoop/thirdparty/com/google/common/base/Absent.class
org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator$1.class

org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator$State.class
org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator.class
org/apache/hadoop/thirdparty/com/google/common/base/Ascii.class
org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$1.class
org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$2.class
org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$3.class
org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$4.class


{code}

  was:
Hadoop 3.3.1 (unreleased currently) shades guava. The shaded guava then trips 
the check in our shading that tries to exclude hadoop bits from the fat jars we 
build.

For the issue to trigger, need to build against tip of hadoop branch-3.3.


> Filter out o.a.hadoop.thirdparty building shaded jars
> -
>
> Key: HBASE-25792
> URL: https://issues.apache.org/jira/browse/HBASE-25792
> Project: HBase
>  Issue Type: Bug
>  Components: shading
>Affects Versions: 3.0.0-alpha-1, 2.5.0, 2.4.3
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
>
> Hadoop 3.3.1 (unreleased currently) shades guava. The shaded guava then trips 
> the check in our shading that tries to exclude hadoop bits from the fat jars 
> we build.
> For the issue to trigger, need to build against tip of hadoop branch-3.3. You 
> then get this complaint:
> {code}
> [INFO] --- exec-maven-plugin:1.6.0:exec (check-jar-contents) @ 
> hbase-shaded-check-invariants ---
> [ERROR] Found artifact with unexpected contents: 
> '/Users/stack/.m2/repository/org/apache/hbase/hbase-shaded-mapreduce/2.3.6-SNAPSHOT/hbase-shaded-mapreduce-2.3.6-SNAPSHOT.jar'
> Please check the following and either correct the build or update
> the allowed list with reasoning.
> org/apache/hadoop/thirdparty/
> org/apache/hadoop/thirdparty/com/
> org/apache/hadoop/thirdparty/com/google/
> org/apache/hadoop/thirdparty/com/google/common/
> org/apache/hadoop/thirdparty/com/google/common/annotations/
> org/apache/hadoop/thirdparty/com/google/common/annotations/Beta.class
> 
> org/apache/hadoop/thirdparty/com/google/common/annotations/GwtCompatible.class
> 
> org/apache/hadoop/thirdparty/com/google/common/annotations/GwtIncompatible.class
> 
> org/apache/hadoop/thirdparty/com/google/common/annotations/VisibleForTesting.class
> org/apache/hadoop/thirdparty/com/google/common/base/
> org/apache/hadoop/thirdparty/com/google/common/base/Absent.class
> 
> org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator$1.class
> 
> org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator$State.class
> org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator.class
> org/apache/hadoop/thirdparty/com/google/common/base/Ascii.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$1.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$2.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$3.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$4.class
> 
> {code}
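
To illustrate the kind of check that produces the complaint above, here is a minimal sketch in Java. It is not the actual check-jar-contents script; the class name, method name and allowed-prefix list are hypothetical and only show why a relocated org/apache/hadoop/thirdparty tree trips a check that expects just HBase classes under org/apache/hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JarContentsCheck {
  /**
   * Returns the entries that are neither under an allowed prefix nor a
   * parent directory of one. Prefixes are assumptions for this sketch.
   */
  static List<String> unexpected(List<String> entries, List<String> allowedPrefixes) {
    List<String> bad = new ArrayList<>();
    for (String entry : entries) {
      boolean ok = false;
      for (String prefix : allowedPrefixes) {
        // Accept anything below the prefix, and the directories leading to it.
        if (entry.startsWith(prefix) || prefix.startsWith(entry)) {
          ok = true;
          break;
        }
      }
      if (!ok) {
        bad.add(entry);
      }
    }
    return bad;
  }

  public static void main(String[] args) {
    List<String> entries = Arrays.asList(
        "org/apache/hadoop/",
        "org/apache/hadoop/hbase/HConstants.class",
        "org/apache/hadoop/thirdparty/com/google/common/base/Ascii.class");
    List<String> allowed = Arrays.asList("org/apache/hadoop/hbase/");
    // Only the shaded-guava entry is reported as unexpected.
    System.out.println(unexpected(entries, allowed));
  }
}
```

Filtering the thirdparty classes out when the fat jar is built, as the issue title suggests, keeps such a check strict instead of widening its allowed list.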




[jira] [Created] (HBASE-25792) Filter out o.a.hadoop.thirdparty building shaded jars

2021-04-19 Thread Michael Stack (Jira)
Michael Stack created HBASE-25792:
-

 Summary: Filter out o.a.hadoop.thirdparty building shaded jars
 Key: HBASE-25792
 URL: https://issues.apache.org/jira/browse/HBASE-25792
 Project: HBase
  Issue Type: Bug
  Components: shading
Affects Versions: 3.0.0-alpha-1, 2.5.0, 2.4.3
Reporter: Michael Stack
Assignee: Michael Stack


Hadoop 3.3.1 (unreleased currently) shades guava. The shaded guava then trips 
the check in our shading that tries to exclude hadoop bits from the fat jars we 
build.

For the issue to trigger, need to build against tip of hadoop branch-3.3.





[jira] [Commented] (HBASE-25761) POC: hbase:meta,,1 as ROOT

2021-04-12 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319641#comment-17319641
 ] 

Michael Stack commented on HBASE-25761:
---

bq. If we fully follow the bigtable way, then we need to split the meta region 
first to separate the special 'ROOT' region, it will break the old client...

I was thinking that you didn't have to do this; that if meta split is disabled, 
we just carry-on as we do now with a single hbase:meta,,1 Region (old clients 
would continue to work -- no need of a proxy).

bq. In SCP and other places we still need to treat the first meta region 
specially, which is the same with introducing an extra ROOT region, but the 
code will be more confusing...

Yeah, we'd give this Region precedence. It would be named hbase:meta,,1 rather 
than ROOT,,1 but otherwise, the special handling will be there.

In the BT table, they are trying to avoid adding in an extra tier of assign 
which is a concern of ours. Let me play around w/ the idea. Will bring what I 
find back to the design doc.
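
As a rough picture of the idea under discussion, the following Java sketch models a two-step lookup in which the first meta Region also plays the 'ROOT' role. Everything here is hypothetical (class, field and Region names, and the use of sorted maps keyed by start key); it only shows the shape of the lookup, not how HBase does or would implement it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MetaAsRootSketch {
  // 'ROOT' role of the first meta Region: start key -> name of the meta Region covering it.
  final TreeMap<String, String> metaRegionsByStartKey = new TreeMap<>();
  // Contents of each meta Region: start key of a user Region -> hosting server.
  final Map<String, TreeMap<String, String>> metaRegionContents = new HashMap<>();

  /** Finds the server for a row: first pick the meta Region, then look inside it. */
  String locate(String rowKey) {
    String metaRegion = metaRegionsByStartKey.floorEntry(rowKey).getValue();
    return metaRegionContents.get(metaRegion).floorEntry(rowKey).getValue();
  }

  public static void main(String[] args) {
    MetaAsRootSketch sketch = new MetaAsRootSketch();
    // The first meta Region starts at the empty key, so every row has a floor entry.
    sketch.metaRegionsByStartKey.put("", "hbase:meta,,1");
    sketch.metaRegionsByStartKey.put("m", "hbase:meta,m,2");

    TreeMap<String, String> first = new TreeMap<>();
    first.put("", "server-a");
    TreeMap<String, String> second = new TreeMap<>();
    second.put("m", "server-b");
    sketch.metaRegionContents.put("hbase:meta,,1", first);
    sketch.metaRegionContents.put("hbase:meta,m,2", second);

    System.out.println(sketch.locate("apple"));
    System.out.println(sketch.locate("zebra"));
  }
}
```

With meta split disabled there is only the single hbase:meta,,1 entry, so the first step always resolves to the same Region and the lookup degenerates to what clients do today.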

> POC: hbase:meta,,1 as ROOT
> --
>
> Key: HBASE-25761
> URL: https://issues.apache.org/jira/browse/HBASE-25761
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Priority: Major
>
> One of the proposals up in the split-meta design doc suggests a 
> sleight-of-hand where the current hard-coded hbase:meta,,1 Region is 
> leveraged to serve as first Region of a split hbase:meta but also does 
> double-duty as 'ROOT'. This suggestion was put aside as a complicating 
> recursion in chat but then Francis noticed on a re-read of the BigTable 
> paper, that this is how they describe they do 'ROOT': "The root tablet is 
> just the first tablet in the METADATA table, but is treated specially -- it 
> is never split..."
> This issue is for playing around with this notion to see what the problems 
> are so can do a better description of this approach here, in the design:
> https://docs.google.com/document/d/11ChsSb2LGrSzrSJz8pDCAw5IewmaMV0ZDN1LrMkAj4s/edit?ts=606c120f#heading=h.ikbhxlcthjle





[jira] [Commented] (HBASE-25760) Duplicate uploads for hbase-shaded-protobuf sources jar causing nexus deploy failure

2021-04-12 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319634#comment-17319634
 ] 

Michael Stack commented on HBASE-25760:
---

[~vjasani] What [~psomogyi] said. Use the create-release scripts from the 
master branch of hbase; it knows how to do third-party, etc.

> Duplicate uploads for hbase-shaded-protobuf sources jar causing nexus deploy 
> failure
> 
>
> Key: HBASE-25760
> URL: https://issues.apache.org/jira/browse/HBASE-25760
> Project: HBase
>  Issue Type: Bug
>  Components: thirdparty
>Reporter: Anjan Das
>Assignee: Anjan Das
>Priority: Major
>
> There are two configurations in 
> [pom.xml|https://github.com/apache/hbase-thirdparty/blob/master/hbase-shaded-protobuf/pom.xml]
>  of hbase-shaded-protobuf that create the sources jar. Here is 
> [one|https://github.com/apache/hbase-thirdparty/blob/ccc49e6a78a00e61fa49f1292ace2f8bde28c54e/hbase-shaded-protobuf/pom.xml#L150]
>  in the maven-shade-plugin and 
> [other|https://github.com/apache/hbase-thirdparty/blob/ccc49e6a78a00e61fa49f1292ace2f8bde28c54e/hbase-shaded-protobuf/pom.xml#L133]
>  in maven-source-plugin(introduced in HBASE-18313)
> After removing the one from maven-shade-plugin, viz. 
> <createSourcesJar>true</createSourcesJar>, we are able to successfully deploy 
> as it generates source jar only once.
>  
> Error stacktraces:
> Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:2.7:deploy (default-deploy) on 
> project hbase-shaded-protobuf: Failed to deploy artifacts: Could not transfer 
> artifact org.apache.hbase.thirdparty:hbase-shaded-protobuf:jar:sources:3.5.1 
> from/to nexus (https://nexus.xyz.com/nexus/content/repositories/hbase/): 
> Transfer failed for 
> https://nexus.xyz.com/nexus/content/repositories/hbase/org/apache/hbase/thirdparty/hbase-shaded-protobuf/3.5.1/hbase-shaded-protobuf-3.5.1-sources.jar 
> 400 Bad Request -> [Help 1]
>  
> Build Commands: 
> mvn org.codehaus.mojo:versions-maven-plugin:2.7:set 
> -DgenerateBackupPoms=false -DoldVersion='\''*'\'' -DnewVersion='\''3.5.1'\'''
> mvn -X -e -nsu clean deploy -DadditionalJOption=-Xdoclint:none -DskipTests
>  
>  





[jira] [Created] (HBASE-25761) POC: hbase:meta,,1 as ROOT

2021-04-10 Thread Michael Stack (Jira)
Michael Stack created HBASE-25761:
-

 Summary: POC: hbase:meta,,1 as ROOT
 Key: HBASE-25761
 URL: https://issues.apache.org/jira/browse/HBASE-25761
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


One of the proposals up in the split-meta design doc suggests a sleight-of-hand 
where the current hard-coded hbase:meta,,1 Region is leveraged to serve as 
first Region of a split hbase:meta but also does double-duty as 'ROOT'. This 
suggestion was put aside as a complicating recursion in chat but then Francis 
noticed on a re-read of the BigTable paper, that this is how they describe they 
do 'ROOT': "The root tablet is just the first tablet in the METADATA table, but 
is treated specially -- it is never split..."

This issue is for playing around with this notion to see what the problems are 
so can do a better description of this approach here, in the design:

https://docs.google.com/document/d/11ChsSb2LGrSzrSJz8pDCAw5IewmaMV0ZDN1LrMkAj4s/edit?ts=606c120f#heading=h.ikbhxlcthjle





[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2021-04-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318085#comment-17318085
 ] 

Michael Stack commented on HBASE-25709:
---

Thank you [~Xiaolin Ha].  Would it help if we could distinguish compacting 
scanners from user-facing instances? A compacting scanner can be aborted on 
close but a user-scanner not? Will you turn on this feature even though it has 
the correctness issues you note above?
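
A minimal Java sketch of the "compaction can be aborted on close" idea follows. It is not the patch under review: the Cell class, the check interval and the method names are hypothetical. The point it shows is that the close marker is consulted on every N cells examined, including cells that are skipped, so a run of TTL-expired cells cannot starve the check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class AbortableCompactionScan {
  /** Hypothetical cell: only an expiry flag matters for this sketch. */
  static final class Cell {
    final boolean expired;
    Cell(boolean expired) {
      this.expired = expired;
    }
  }

  /**
   * Copies live cells to the output and checks the close marker every
   * checkInterval cells examined, whether they were kept or skipped.
   * Returns false if the scan gave up because the region is closing.
   */
  static boolean compact(List<Cell> input, List<Cell> output,
      AtomicBoolean writesEnabled, int checkInterval) {
    int examined = 0;
    for (Cell cell : input) {
      if (++examined % checkInterval == 0 && !writesEnabled.get()) {
        return false; // close requested; abandon this compaction
      }
      if (!cell.expired) {
        output.add(cell);
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Cell> input = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      input.add(new Cell(true)); // every cell is TTL-expired and will be skipped
    }
    AtomicBoolean writesEnabled = new AtomicBoolean(false); // close already requested
    System.out.println(compact(input, new ArrayList<>(), writesEnabled, 100));
  }
}
```

A user-facing scanner would simply not be given the marker (or would ignore it), which is one way to keep the two kinds of scanner distinct.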

> Close region may stuck when region is compacting and skipped most cells read
> 
>
> Key: HBASE-25709
> URL: https://issues.apache.org/jira/browse/HBASE-25709
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Affects Versions: 1.4.13
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Attachments: Master-UI-RIT.png, RS-region-state.png
>
>
> In our cluster we found a region stuck while closing. The region was compacting, 
> and its store files had many TTL-expired cells. The close-region state 
> marker (HRegion#writestate.writesEnabled) was not checked during the compaction 
> because most cells were skipped. 
> !RS-region-state.png|width=698,height=310!
>  
> !Master-UI-RIT.png|width=693,height=157!
>  
> HBASE-23968 encountered a similar problem, but its solution sits outside 
> the method
> InternalScanner#next(List<Cell> result, ScannerContext scannerContext), which 
> does not return while there are many skipped cells under the current compaction 
> scanner context. As a result, we need to return in time from the next method 
> and then check the stop marker.
>  
>  
>  





[jira] [Resolved] (HBASE-25735) Add target Region to connection exceptions

2021-04-08 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25735.
---
Resolution: Fixed

I pushed this addendum on branch-2.4+

{code}
kalashnikov:hbase.apache.git stack$ git show -1
commit f9819f33b6b1016364c10d80129e3d0faf7ff17e (HEAD -> m, origin/master, 
origin/HEAD)
Author: stack 
Date:   Thu Apr 8 13:24:29 2021 -0700

HBASE-25735 Add target Region to connection exceptions
Restore API for Phoenix (though it shouldn't be using
Private classes).

diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcControllerFactory.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcControllerFactory.java
index 0dcb22fa5b..e6d63fac1f 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcControllerFactory.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcControllerFactory.java
@@ -18,15 +18,14 @@
 package org.apache.hadoop.hbase.ipc;

 import java.util.List;
-
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.CellScannable;
 import org.apache.hadoop.hbase.CellScanner;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.util.ReflectionUtils;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import org.apache.hadoop.hbase.util.ReflectionUtils;

 /**
  * Factory to create a {@link HBaseRpcController}
@@ -52,16 +51,23 @@ public class RpcControllerFactory {
     return new HBaseRpcControllerImpl();
   }

+  public HBaseRpcController newController(CellScanner cellScanner) {
+    return new HBaseRpcControllerImpl(null, cellScanner);
+  }
+
   public HBaseRpcController newController(RegionInfo regionInfo, CellScanner cellScanner) {
     return new HBaseRpcControllerImpl(regionInfo, cellScanner);
   }

+  public HBaseRpcController newController(final List<CellScannable> cellIterables) {
+    return new HBaseRpcControllerImpl(null, cellIterables);
+  }
+
   public HBaseRpcController newController(RegionInfo regionInfo,
       final List<CellScannable> cellIterables) {
     return new HBaseRpcControllerImpl(regionInfo, cellIterables);
   }

-
   public static RpcControllerFactory instantiate(Configuration configuration) {
     String rpcControllerFactoryClazz =
         configuration.get(CUSTOM_CONTROLLER_CONF_KEY,
{code}
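
The pattern in the addendum, restoring the older signatures next to the newer Region-aware ones, can be sketched in isolation as below. The types are simplified, hypothetical stand-ins (plain strings instead of RegionInfo and CellScanner), and the old overload delegates to the new one rather than calling the constructor directly as the diff does; the effect is the same, a null Region for callers that do not supply one.

```java
public class ControllerFactorySketch {
  /** Hypothetical stand-in for the controller; records what it was built with. */
  static final class Controller {
    final String regionInfo; // null when the caller did not supply a Region
    final String cellScanner;

    Controller(String regionInfo, String cellScanner) {
      this.regionInfo = regionInfo;
      this.cellScanner = cellScanner;
    }
  }

  /** Newer overload: the caller names the target Region. */
  static Controller newController(String regionInfo, String cellScanner) {
    return new Controller(regionInfo, cellScanner);
  }

  /** Restored older overload: existing callers keep compiling and get a null Region. */
  static Controller newController(String cellScanner) {
    return newController(null, cellScanner);
  }

  public static void main(String[] args) {
    Controller legacy = newController("scanner");
    Controller current = newController("hbase:meta,,1.1588230740", "scanner");
    System.out.println(legacy.regionInfo + " / " + current.regionInfo);
  }
}
```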

> Add target Region to connection exceptions
> --
>
> Key: HBASE-25735
> URL: https://issues.apache.org/jira/browse/HBASE-25735
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We spent a bit of time making it so exceptions included the remote host name. 
> Looks like we can add the target Region name too with a bit of manipulation; 
> will help figuring hot-spotting or problem Region on serverside.  For 
> example, here is what I was seeing recently on client-side when a RS was 
> timing out requests:
> {code}
> 2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
> pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
> ...
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
> at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
> at 
> 

[jira] [Reopened] (HBASE-25735) Add target Region to connection exceptions

2021-04-08 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25735:
---

Reopening to add back old APIs used by Phoenix (though it shouldn't be down in 
our privates)

> Add target Region to connection exceptions
> --
>
> Key: HBASE-25735
> URL: https://issues.apache.org/jira/browse/HBASE-25735
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We spent a bit of time making it so exceptions included the remote host name. 
> Looks like we can add the target Region name too with a bit of manipulation; 
> will help figuring hot-spotting or problem Region on serverside.  For 
> example, here is what I was seeing recently on client-side when a RS was 
> timing out requests:
> {code}
> 2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
> pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
> ...
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
> at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
> ... 1 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
> ... 4 more
> {code}
> I wanted the region it was hitting. I wanted to know if it was a server 
> problem or a Region issue. If clients were only having issues w/ one Region, then I 
> could focus on it.
> After the PR the exception (from another context) looks like this:
> {code}
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> address=127.0.0.1:12345, regionInfo=hbase:meta,,1.1588230740 failed on local 
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: error
> 
> {code}
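
For illustration, a message of the shape shown above could be assembled as in the Java sketch below. The class and method names are hypothetical, and dropping the regionInfo part when no Region is known is an assumption of this sketch, not something stated in the issue.

```java
public class CallTimeoutMessage {
  /**
   * Builds "Call to address=..., regionInfo=... failed on local exception: ...".
   * The regionInfo segment is omitted when regionInfo is null (an assumption).
   */
  static String describe(String address, String regionInfo, String cause) {
    StringBuilder sb = new StringBuilder("Call to address=").append(address);
    if (regionInfo != null) {
      sb.append(", regionInfo=").append(regionInfo);
    }
    return sb.append(" failed on local exception: ").append(cause).toString();
  }

  public static void main(String[] args) {
    System.out.println(describe("127.0.0.1:12345", "hbase:meta,,1.1588230740",
        "org.apache.hadoop.hbase.ipc.CallTimeoutException: error"));
  }
}
```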





[jira] [Commented] (HBASE-25735) Add target Region to connection exceptions

2021-04-08 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317456#comment-17317456
 ] 

Michael Stack commented on HBASE-25735:
---

Doing

> Add target Region to connection exceptions
> --
>
> Key: HBASE-25735
> URL: https://issues.apache.org/jira/browse/HBASE-25735
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We spent a bit of time making it so exceptions included the remote host name. 
> Looks like we can add the target Region name too with a bit of manipulation; 
> will help figuring hot-spotting or problem Region on serverside.  For 
> example, here is what I was seeing recently on client-side when a RS was 
> timing out requests:
> {code}
> 2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
> pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
> ...
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
> at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
> ... 1 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
> ... 4 more
> {code}
> I wanted the region it was hitting. I wanted to know if it was a server 
> problem or a Region issue. If clients were only having issues w/ one Region, then I 
> could focus on it.
> After the PR the exception (from another context) looks like this:
> {code}
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> address=127.0.0.1:12345, regionInfo=hbase:meta,,1.1588230740 failed on local 
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: error
> 
> {code}





[jira] [Commented] (HBASE-25709) Close region may stuck when region is compacting and skipped most cells read

2021-04-08 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317366#comment-17317366
 ] 

Michael Stack commented on HBASE-25709:
---

Patch looks good. Defaults to off. Why would we not just have this flag enabled 
always [~Xiaolin Ha]? If a Region has been asked to close, compactions should be 
preempted and put aside until we open in the new location? Close should preempt 
everything, I'd suggest, except an ongoing user read?

> Close region may stuck when region is compacting and skipped most cells read
> 
>
> Key: HBASE-25709
> URL: https://issues.apache.org/jira/browse/HBASE-25709
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Affects Versions: 1.4.13
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Attachments: Master-UI-RIT.png, RS-region-state.png
>
>
> In our cluster we found a region stuck while closing. The region was compacting, 
> and its store files had many TTL-expired cells. The close-region state 
> marker (HRegion#writestate.writesEnabled) was not checked during the compaction 
> because most cells were skipped. 
> !RS-region-state.png|width=698,height=310!
>  
> !Master-UI-RIT.png|width=693,height=157!
>  
> HBASE-23968 encountered a similar problem, but its solution sits outside 
> the method
> InternalScanner#next(List<Cell> result, ScannerContext scannerContext), which 
> does not return while there are many skipped cells under the current compaction 
> scanner context. As a result, we need to return in time from the next method 
> and then check the stop marker.
>  
>  
>  





[jira] [Commented] (HBASE-25747) Remove unused getWriteAvailable method in OperationQuota

2021-04-08 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317359#comment-17317359
 ] 

Michael Stack commented on HBASE-25747:
---

What versions are we talking here [~meiyi]? Patch looks good otherwise. Thanks.

> Remove unused getWriteAvailable method in OperationQuota
> 
>
> Key: HBASE-25747
> URL: https://issues.apache.org/jira/browse/HBASE-25747
> Project: HBase
>  Issue Type: Improvement
>  Components: Quotas
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Minor
>
> The getWriteAvailable method is unused in OperationQuota, because for write 
> operation, the size is accurate.





[jira] [Commented] (HBASE-25735) Add target Region to connection exceptions

2021-04-07 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316857#comment-17316857
 ] 

Michael Stack commented on HBASE-25735:
---

Shout if you want me to do something on this end [~apurtell]

> Add target Region to connection exceptions
> --
>
> Key: HBASE-25735
> URL: https://issues.apache.org/jira/browse/HBASE-25735
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We spent a bit of time making it so exceptions included the remote host name. 
> Looks like we can add the target Region name too with a bit of manipulation; 
> will help figuring hot-spotting or problem Region on serverside.  For 
> example, here is what I was seeing recently on client-side when a RS was 
> timing out requests:
> {code}
> 2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
> pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
> ...
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
> at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
> ... 1 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
> ... 4 more
> {code}
> I wanted the region it was hitting. I wanted to know if it was a server 
> problem or a Region issue. If clients were only having issues w/ one Region, then I 
> could focus on it.
> After the PR the exception (from another context) looks like this:
> {code}
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> address=127.0.0.1:12345, regionInfo=hbase:meta,,1.1588230740 failed on local 
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: error
> 
> {code}





[jira] [Commented] (HBASE-25743) Retry REQUESTTIMEOUT KeeperExceptions from ZK

2021-04-07 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316537#comment-17316537
 ] 

Michael Stack commented on HBASE-25743:
---

I missed the second retry... Thanks.

> Retry REQUESTTIMEOUT KeeperExceptions from ZK
> -
>
> Key: HBASE-25743
> URL: https://issues.apache.org/jira/browse/HBASE-25743
> Project: HBase
>  Issue Type: Bug
>  Components: Zookeeper
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3, 2.3.6
>Reporter: Bharath Vissapragada
>Assignee: Bharath Vissapragada
>Priority: Major
>
> Starting with ZOOKEEPER-2251, client requests exceeding a timeout can throw a 
> KeeperException with the REQUESTTIMEOUT code set. RecoverableZookeeper doesn't 
> transparently retry in such cases. This was causing RS aborts when there was a 
> flaky ZK quorum member serving slow requests (especially in cases like 
> rolling upgrades and such).





[jira] [Resolved] (HBASE-25687) Backport "HBASE-25681 Add a switch for server/table queryMeter" to branch-2 and branch-1

2021-04-07 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25687.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to branch-1. Thanks for patch [~DeanZ]

> Backport "HBASE-25681 Add a switch for server/table queryMeter" to branch-2 
> and branch-1
> 
>
> Key: HBASE-25687
> URL: https://issues.apache.org/jira/browse/HBASE-25687
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 1.7.0
>
>






[jira] [Commented] (HBASE-25743) Retry REQUESTTIMEOUT KeeperExceptions from ZK

2021-04-07 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316526#comment-17316526
 ] 

Michael Stack commented on HBASE-25743:
---

ZK 3.6 and 3.5.5 have the ZOOKEEPER-2251 commit. The PR changes 
OPERATIONTIMEOUT to do retry? Is this a change in behavior? Thanks [~bharathv]
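
For readers following the retry question, here is a small self-contained Java sketch of retrying only on selected error codes. It does not use the ZooKeeper client or RecoverableZookeeper: the Code enum, the ZkException class and the set of codes treated as retryable are all assumptions made for illustration, and there is no backoff between attempts.

```java
import java.util.concurrent.Callable;

public class ZkRetrySketch {
  /** Hypothetical subset of error codes, named after the ones discussed. */
  enum Code { CONNECTIONLOSS, OPERATIONTIMEOUT, REQUESTTIMEOUT, NONODE }

  /** Hypothetical stand-in for an exception that carries an error code. */
  static final class ZkException extends Exception {
    final Code code;

    ZkException(Code code) {
      super(code.name());
      this.code = code;
    }
  }

  /** Which codes are worth retrying is an assumption of this sketch. */
  static boolean isRetryable(Code code) {
    switch (code) {
      case CONNECTIONLOSS:
      case OPERATIONTIMEOUT:
      case REQUESTTIMEOUT:
        return true;
      default:
        return false;
    }
  }

  /** Runs op, retrying on retryable codes until maxAttempts is reached. */
  static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (ZkException e) {
        if (!isRetryable(e.code) || attempt >= maxAttempts) {
          throw e; // not retryable, or out of attempts
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    final int[] calls = {0};
    String result = withRetries(() -> {
      if (calls[0]++ < 2) {
        throw new ZkException(Code.REQUESTTIMEOUT); // slow quorum member
      }
      return "ok";
    }, 5);
    System.out.println(result + " after " + calls[0] + " calls");
  }
}
```

Treating a new code as retryable is a one-line change in isRetryable, which is why the question of whether it alters existing behavior for the other codes matters.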



> Retry REQUESTTIMEOUT KeeperExceptions from ZK
> -
>
> Key: HBASE-25743
> URL: https://issues.apache.org/jira/browse/HBASE-25743
> Project: HBase
>  Issue Type: Bug
>  Components: Zookeeper
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3, 2.3.6
>Reporter: Bharath Vissapragada
>Assignee: Bharath Vissapragada
>Priority: Major
>
> Starting with ZOOKEEPER-2251, client requests exceeding a timeout can throw a 
> KeeperException with the REQUESTTIMEOUT code set. RecoverableZookeeper doesn't 
> transparently retry in such cases. This was causing RS aborts when there was a 
> flaky ZK quorum member serving slow requests (especially in cases like 
> rolling upgrades and such).





[jira] [Resolved] (HBASE-25735) Add target Region to connection exceptions

2021-04-07 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25735.
---
Resolution: Fixed

Re-resolved after pushing addendum on master.

> Add target Region to connection exceptions
> --
>
> Key: HBASE-25735
> URL: https://issues.apache.org/jira/browse/HBASE-25735
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We spent a bit of time making it so exceptions included the remote host name. 
> Looks like we can add the target Region name too with a bit of manipulation; 
> this will help in figuring out hot-spotting or a problem Region on the server side. For 
> example, here is what I was seeing recently on the client side when an RS was 
> timing out requests:
> {code}
> 2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
> pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
> ...
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
> at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
> ... 1 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
> ... 4 more
> {code}
> I wanted the region it was hitting. I wanted to know if it was a server 
> problem or a Region issue. If clients were only having issues with one Region, then I 
> could focus on it.
> After the PR the exception (from another context) looks like this:
> {code}
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> address=127.0.0.1:12345, regionInfo=hbase:meta,,1.1588230740 failed on local 
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: error
> 
> {code}





[jira] [Commented] (HBASE-25735) Add target Region to connection exceptions

2021-04-07 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316398#comment-17316398
 ] 

Michael Stack commented on HBASE-25735:
---

Hmm... I had fixed that and checked that it compiled; I must have messed up. Thanks for the pointer, 
[~zhangduo]. Fixed.



[jira] [Resolved] (HBASE-25735) Add target Region to connection exceptions

2021-04-06 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25735.
---
Fix Version/s: 2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-2.4+. Thanks for review [~wchevreuil]



[jira] [Resolved] (HBASE-25713) Make an hbase-wal module

2021-04-06 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25713.
---
Resolution: Won't Fix

Resolving as failed experiment

> Make an hbase-wal module
> 
>
> Key: HBASE-25713
> URL: https://issues.apache.org/jira/browse/HBASE-25713
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Priority: Major
>
> Extract an hbase-wal module upon which hbase-server can depend; makes 
> hbase-server smaller and maybe we could do an hbase-wal standalone... This is 
> an experiment.





[jira] [Created] (HBASE-25735) Add target Region to connection exceptions

2021-04-06 Thread Michael Stack (Jira)
Michael Stack created HBASE-25735:
-

 Summary: Add target Region to connection exceptions
 Key: HBASE-25735
 URL: https://issues.apache.org/jira/browse/HBASE-25735
 Project: HBase
  Issue Type: Bug
  Components: rpc
Reporter: Michael Stack
Assignee: Michael Stack


We spent a bit of time making it so exceptions included the remote host name. 
Looks like we can add the target Region name too with a bit of manipulation; 
this will help in figuring out hot-spotting or a problem Region on the server side. For example, 
here is what I was seeing recently on the client side when an RS was timing out 
requests:

{code}
2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
org.apache.hadoop.hbase.ipc.CallTimeoutException: 
Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
...
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
org.apache.hadoop.hbase.ipc.CallTimeoutException: 
Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
at 
org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
at 
org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
at 
org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
at 
org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
... 1 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: 
Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
at 
org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
... 4 more
{code}

I wanted the region it was hitting. I wanted to know if it was a server problem 
or a Region issue. If clients were only having issues with one Region, then I could 
focus on it.

After the PR the exception (from another context) looks like this:

{code}
org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
address=127.0.0.1:12345, regionInfo=hbase:meta,,1.1588230740 failed on local 
exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: error

{code}
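
As a rough illustration of how a message of the shape shown above might be assembled, here is a small self-contained sketch. It is not the code from the PR: the class RegionAwareExceptionSketch and its methods describeTarget and wrap are hypothetical names, and the region is passed as a plain String rather than a RegionInfo.

```java
import java.io.IOException;
import java.net.InetSocketAddress;

class RegionAwareExceptionSketch {
  // Hypothetical helper: renders the target as "address=host:port[, regionInfo=...]".
  static String describeTarget(InetSocketAddress addr, String regionName) {
    StringBuilder sb = new StringBuilder("address=")
        .append(addr.getHostString())
        .append(':')
        .append(addr.getPort());
    if (regionName != null) {
      // The region is optional; some calls are not bound to a Region.
      sb.append(", regionInfo=").append(regionName);
    }
    return sb.toString();
  }

  // Hypothetical helper: wraps the cause, keeping it as the chained exception.
  static IOException wrap(InetSocketAddress addr, String regionName, Exception cause) {
    return new IOException(
        "Call to " + describeTarget(addr, regionName)
            + " failed on local exception: " + cause,
        cause);
  }
}
```

Keeping the region optional means calls that are not tied to a Region fall back to the address-only form of the message.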





[jira] [Comment Edited] (HBASE-25558) Adding audit log for execMasterService

2021-03-31 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312772#comment-17312772
 ] 

Michael Stack edited comment on HBASE-25558 at 3/31/21, 11:18 PM:
--

Thank you for the improvement [~xiaoheipangzi] Merged to branch-2.4+


was (Author: stack):
Thank you for the improvement [~xiaoheipangzi]

> Adding audit log for execMasterService
> --
>
> Key: HBASE-25558
> URL: https://issues.apache.org/jira/browse/HBASE-25558
> Project: HBase
>  Issue Type: Improvement
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> Hi:
> I have found that APIs like execProcedure and execProcedureWithRet have an 
> audit log to record who executes the master service. The log looks like:
> {code:java}
> LOG.info(master.getClientIdAuditPrefix() + " procedure request for: " + 
> desc.getSignature());
> {code}
> But it seems that we forgot to audit execMasterService. We should add one.
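
A possible shape for the missing audit line, by analogy with the procedure audit quoted above, might look like the sketch below. The class AuditLogSketch, the method auditMessage and the exact wording of the message are assumptions for illustration only; in HBase the prefix would come from master.getClientIdAuditPrefix() and the result would be passed to LOG.info.

```java
class AuditLogSketch {
  // Hypothetical helper: formats an audit line for a master coprocessor service call.
  // clientIdPrefix stands in for the value of master.getClientIdAuditPrefix().
  static String auditMessage(String clientIdPrefix, String serviceName, String methodName) {
    return clientIdPrefix + " master service request for: " + serviceName + "." + methodName;
  }
}
```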





[jira] [Resolved] (HBASE-25558) Adding audit log for execMasterService

2021-03-31 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25558.
---
Fix Version/s: 2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thank you for the improvement [~xiaoheipangzi]



[jira] [Commented] (HBASE-25713) Make an hbase-wal module

2021-03-31 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312764#comment-17312764
 ] 

Michael Stack commented on HBASE-25713:
---

I pushed up my latest changes to the branch and updated the PR. I think this 
experiment is done. See below for what I was able to break out for an 
hbase-coprocessor and an hbase-wal module. Neither is coherent enough to earn 
its name.

hbase-coprocessor has only a few of the base classes in it and no tests to 
speak of:

{{hbase-coprocessor/src/test/java/org/apache/hadoop/hbase/coprocessor/TestReadOnlyConfiguration.java
hbase-coprocessor/src/main/java/org/apache/hadoop/hbase/CoprocessorEnvironment.java
hbase-coprocessor/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseEnvironment.java
hbase-coprocessor/src/main/java/org/apache/hadoop/hbase/coprocessor/MetricsCoprocessor.java
hbase-coprocessor/src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java
hbase-coprocessor/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
hbase-coprocessor/src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContextImpl.java
hbase-coprocessor/src/main/java/org/apache/hadoop/hbase/coprocessor/CoreCoprocessor.java
hbase-coprocessor/src/main/java/org/apache/hadoop/hbase/coprocessor/ReadOnlyConfiguration.java
hbase-coprocessor/src/main/java/org/apache/hadoop/hbase/Coprocessor.java}}

Can't move anything else because the rest needs context from internals.

For hbase-wal, I was not able to move an actual implementation. The implementations are 
too entwined with rpc and region. Here is what I was able to break out:

{{hbase-wal/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestMetricsWAL.java
hbase-wal/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestMetricsWALSource.java
hbase-wal/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCustomWALCellCodec.java
hbase-wal/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestMetricsWALSourceImpl.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/wal/WALProvider.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/wal/WALKeyImpl.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/wal/DisabledWALProvider.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/wal/WAL.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/wal/WALEdit.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/coprocessor/WALCoprocessorEnvironment.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/coprocessor/WALCoprocessor.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/coprocessor/WALObserver.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FailedLogCloseException.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/regionserver/wal/MetricsWAL.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALActionsListener.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/regionserver/wal/MetricsWALSource.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/regionserver/wal/DamagedWALException.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/regionserver/wal/MetricsWALSourceImpl.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/regionserver/SequenceId.java
hbase-wal/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALFileLengthProvider.java}}

It's not enough, I think, for the module to be called the hbase-wal module.



[jira] [Commented] (HBASE-25682) Add a new command to update the configuration of all RSs in a RSGroup

2021-03-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311131#comment-17311131
 ] 

Michael Stack commented on HBASE-25682:
---

Sounds good.

> Add a new command to update the configuration of all RSs in a RSGroup
> -
>
> Key: HBASE-25682
> URL: https://issues.apache.org/jira/browse/HBASE-25682
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, shell
>Affects Versions: 3.0.0-alpha-1, 2.5.0
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
>
> We currently support hot-updating a subset of the configuration on one server or on all 
> servers. Sometimes it may be necessary to hot-update the configuration 
> for the servers of a single rsgroup.





[jira] [Commented] (HBASE-25713) Make an hbase-wal module

2021-03-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311128#comment-17311128
 ] 

Michael Stack commented on HBASE-25713:
---

 #3105 makes an hbase-coprocessor module.



[jira] [Comment Edited] (HBASE-25675) Shrink size of hbase-server module

2021-03-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311108#comment-17311108
 ] 

Michael Stack edited comment on HBASE-25675 at 3/30/21, 4:38 AM:
-

Let me work on branch named for this issue. Made a subtask for making the 
hbase-wal module and will name branch for it, HBASE-25713


was (Author: stack):
Let me work on branch named for this issue.

> Shrink size of hbase-server module
> --
>
> Key: HBASE-25675
> URL: https://issues.apache.org/jira/browse/HBASE-25675
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Michael Stack
>Priority: Major
>
> Umbrella issue for shrinking the size of the hbase-server module. It's too big 
> (see recent notes by [~zhangduo] that hbase-server size was making findbugs 
> OOME).
> Suggested candidate subtasks:
> hbase-io
> hbase-xtra-unit-tests <= Move large hbase-server tests out of hbase-server 
> and into this module
> hbase-region
> HBASE-25190 was about an hbase-tool module





[jira] [Created] (HBASE-25713) Make an hbase-wal module

2021-03-29 Thread Michael Stack (Jira)
Michael Stack created HBASE-25713:
-

 Summary: Make an hbase-wal module
 Key: HBASE-25713
 URL: https://issues.apache.org/jira/browse/HBASE-25713
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


Extract an hbase-wal module upon which hbase-server can depend; makes 
hbase-server smaller and maybe we could do an hbase-wal standalone... This is 
an experiment.





[jira] [Commented] (HBASE-25675) Shrink size of hbase-server module

2021-03-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311108#comment-17311108
 ] 

Michael Stack commented on HBASE-25675:
---

Let me work on branch named for this issue.



[jira] [Commented] (HBASE-25675) Shrink size of hbase-server module

2021-03-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311105#comment-17311105
 ] 

Michael Stack commented on HBASE-25675:
---

Tried hbase-wal. The core will mostly come out, but it needs a little refactor that 
makes an hbase-coprocessor module first; hbase-coprocessor can be shared by 
hbase-wal and by hbase-server, and the hbase-wal module will include support for 
the wal coprocessor. Let me see if it will work; will be back.



[jira] [Commented] (HBASE-25634) The client frequently exceeds the quota, which causes the meta table scan to be too high

2021-03-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310968#comment-17310968
 ] 

Michael Stack commented on HBASE-25634:
---

See PR [~zhengsicheng]

> The client frequently exceeds the quota, which causes the meta table scan to 
> be too high
> 
>
> Key: HBASE-25634
> URL: https://issues.apache.org/jira/browse/HBASE-25634
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.4
>Reporter: zhengsicheng
>Assignee: zhengsicheng
>Priority: Minor
> Attachments: image-2021-03-05-12-00-33-522.png, 
> image-2021-03-05-12-01-08-769.png
>
>
>  When the client performs scan operations, the server frequently returns 
> RpcThrottlingException, which causes the number of requests to the meta table to become 
> high.
>  
>  
> /hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerImpl.java
> {code:java}
> @Override
> public T callWithRetries(RetryingCallable<T> callable, int callTimeout)
>     throws IOException, RuntimeException {
>   List<RetriesExhaustedException.ThrowableWithExtraContext> exceptions = new ArrayList<>();
>   tracker.start();
>   context.clear();
>   for (int tries = 0;; tries++) {
>     long expectedSleep;
>     try {
>       // bad cache entries are cleared in the call to RetryingCallable#throwable() in catch block
>       // callable.prepare() reload force reload of server location
>       callable.prepare(tries != 0);
>       interceptor.intercept(context.prepare(callable, tries));
>       return callable.call(getTimeout(callTimeout));
>     } catch (PreemptiveFastFailException e) {
>       throw e;
>     } catch (Throwable t) {
>       ExceptionUtil.rethrowIfInterrupt(t);
>       Throwable cause = t.getCause();
>       if (cause instanceof DoNotRetryIOException) {
>         // Fail fast
>         throw (DoNotRetryIOException) cause;
>       }
>       // translateException throws exception when should not retry: i.e. when request is bad.
>       interceptor.handleFailure(context, t);
>       t = translateException(t);
> {code}
>  
>  
>  
> !image-2021-03-05-12-00-33-522.png!
> !image-2021-03-05-12-01-08-769.png!
>  
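
The snippet above calls callable.prepare(tries != 0), so every retry forces a fresh lookup of the server location, which is a read against meta. One way to avoid that for throttled requests could be sketched as below. This is only an illustration of the idea and not necessarily what the PR does: ThrottleAwareReloadSketch, SketchThrottlingException and shouldReloadLocation are hypothetical names, with SketchThrottlingException standing in for RpcThrottlingException.

```java
class ThrottleAwareReloadSketch {
  // Hypothetical stand-in for HBase's RpcThrottlingException.
  static class SketchThrottlingException extends RuntimeException {
    SketchThrottlingException(String msg) {
      super(msg);
    }
  }

  // Decides whether the next attempt should re-look-up the region location.
  static boolean shouldReloadLocation(int tries, Throwable lastFailure) {
    if (tries == 0) {
      return false; // first attempt: use the cached location
    }
    // A throttled request did reach the right server, so the cached location
    // is still valid and there is no need to go back to meta.
    return !(lastFailure instanceof SketchThrottlingException);
  }
}
```

Under this assumption, the retry loop would pass the result of shouldReloadLocation to prepare instead of the bare tries != 0 check.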





[jira] [Resolved] (HBASE-25670) Backport HBASE-25665 to branch-1

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25670.
---
Fix Version/s: 1.7.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-1. Thanks for the PR [~lineyshinya] (Let's keep an eye on this 
one in the nightlies to make sure there are no unexpected consequences: 
https://ci-hadoop.apache.org/view/HBase/job/HBase/job/HBase%20Nightly/job/branch-1/
 )

> Backport HBASE-25665 to branch-1
> 
>
> Key: HBASE-25670
> URL: https://issues.apache.org/jira/browse/HBASE-25670
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Shinya Yoshida
>Assignee: Shinya Yoshida
>Priority: Major
> Fix For: 1.7.0
>
>
> Backport 
> [https://github.com/apache/hbase/commit/ebb0adf50009fc133af0cfb0bdce4dfbb81d4fbf]
>  for https://issues.apache.org/jira/browse/HBASE-25665 to branch-1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25711) Setting wrong data block encoding through ColumnFamilyDescriptorBuilder#setValue leading to servers down

2021-03-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310952#comment-17310952
 ] 

Michael Stack commented on HBASE-25711:
---

This is an old issue: if you choose the wrong encoder, or a good encoder that 
is not installed properly, there is no way for us to know until we actually try 
to use the encoder. Folks tried to mitigate this by providing tools to check that an encoder 
works: http://hbase.apache.org/book.html#_data_block_encoding_tool  Any 
suggestions for how to deal w/ this?
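
For the narrower case in this report (a misspelled encoding name stored as a raw string), one option might be to validate the value when the descriptor is built, so the failure surfaces on the client rather than when a region server opens the region. The sketch below only illustrates that idea: EncodingValidationSketch, SketchEncoding and parseOrThrow are hypothetical, with SketchEncoding standing in for DataBlockEncoding, and it does nothing for the "encoder not installed properly" case.

```java
import java.util.Arrays;
import java.util.Locale;

class EncodingValidationSketch {
  // Hypothetical stand-in for org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.
  enum SketchEncoding { NONE, PREFIX, DIFF, FAST_DIFF, ROW_INDEX_V1 }

  // Parses the configured value eagerly and fails with a descriptive message.
  static SketchEncoding parseOrThrow(String value) {
    try {
      return SketchEncoding.valueOf(value.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException(
          "Unknown data block encoding '" + value + "'; expected one of "
              + Arrays.toString(SketchEncoding.values()),
          e);
    }
  }
}
```

Rejecting the value at the point where it is set keeps a bad table descriptor from ever reaching the servers.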

> Setting wrong data block encoding through 
> ColumnFamilyDescriptorBuilder#setValue leading to servers down
> 
>
> Key: HBASE-25711
> URL: https://issues.apache.org/jira/browse/HBASE-25711
> Project: HBase
>  Issue Type: Bug
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
>
> Setting a wrong data block encoding using 
> ColumnFamilyDescriptorBuilder#setValue instead of 
> ColumnFamilyDescriptorBuilder#setDataBlockEncoding brings region servers 
> down and eventually kills the master as well. This is possible from Phoenix, where all the 
> column family properties are passed to descriptors using 
> ColumnFamilyDescriptorBuilder#setValue. 
> {noformat}
> Failed to open region 
> my_case_sensitive_table,,1617040355998.d8a1df22970075b8863d5c39b2c1e08c., 
> will report to master
> java.io.IOException: java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.SDFS
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1134)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1076)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:973)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:925)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7346)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7304)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7276)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7234)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7185)
>   at 
> org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:133)
>   at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.SDFS
>   at java.lang.Enum.valueOf(Enum.java:238)
>   at 
> org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.valueOf(DataBlockEncoding.java:31)
>   at 
> org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder$ModifyableColumnFamilyDescriptor.lambda$getDataBlockEncoding$2(ColumnFamilyDescriptorBuilder.java:806)
>   at 
> org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder$ModifyableColumnFamilyDescriptor.lambda$getStringOrDefault$0(ColumnFamilyDescriptorBuilder.java:708)
>   at 
> org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder$ModifyableColumnFamilyDescriptor.getOrDefault(ColumnFamilyDescriptorBuilder.java:716)
>   at 
> org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder$ModifyableColumnFamilyDescriptor.getStringOrDefault(ColumnFamilyDescriptorBuilder.java:708)
>   at 
> org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder$ModifyableColumnFamilyDescriptor.getDataBlockEncoding(ColumnFamilyDescriptorBuilder.java:805)
>   at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:269)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1098)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1095)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   ... 3 more
> {noformat}
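
One way to catch a bad value such as "SDFS" before it reaches a region server is to validate the string when the descriptor is built rather than when the store opens. The following is only a minimal sketch of that idea: `Encoding` is a stand-in enum (not the real `DataBlockEncoding` class), and `parseEncoding` is a hypothetical helper, not an HBase API.

```java
import java.util.Optional;

public class EncodingValueCheck {
    // Stand-in for org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.
    // The constants listed here are illustrative; consult the enum in your HBase version.
    public enum Encoding { NONE, PREFIX, DIFF, FAST_DIFF, ROW_INDEX_V1 }

    // Hypothetical helper: returns the parsed encoding, or empty if the raw
    // string does not name an enum constant (Enum.valueOf is case-sensitive).
    public static Optional<Encoding> parseEncoding(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Encoding.valueOf(raw));
        } catch (IllegalArgumentException e) {
            // Same exception type as in the trace above, but handled on the client side.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseEncoding("FAST_DIFF").isPresent());
        System.out.println(parseEncoding("SDFS").isPresent());
    }
}
```

A client could reject the table descriptor when the result is empty, so the failure surfaces as a create/alter error instead of a region open failure.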





[jira] [Commented] (HBASE-25682) Add a new command to update the configuration of all RSs in a RSGroup

2021-03-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310945#comment-17310945
 ] 

Michael Stack commented on HBASE-25682:
---

Seems good [~DeanZ].  What versions are you targeting? Master? Where else?

> Add a new command to update the configuration of all RSs in a RSGroup
> -
>
> Key: HBASE-25682
> URL: https://issues.apache.org/jira/browse/HBASE-25682
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, shell
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
>
> We now support hot-updating a subset of the configuration on one server or on 
> all servers. Sometimes it is necessary to hot-update the configuration for 
> the servers in an rsgroup.





[jira] [Commented] (HBASE-25687) Backport "HBASE-25681 Add a switch for server/table queryMeter" to branch-2 and branch-1

2021-03-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310942#comment-17310942
 ] 

Michael Stack commented on HBASE-25687:
---

Merged branch-1 patch. Waiting on re-build before merging branch-2 

> Backport "HBASE-25681 Add a switch for server/table queryMeter" to branch-2 
> and branch-1
> 
>
> Key: HBASE-25687
> URL: https://issues.apache.org/jira/browse/HBASE-25687
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
>






[jira] [Updated] (HBASE-25687) Backport "HBASE-25681 Add a switch for server/table queryMeter" to branch-2 and branch-1

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25687:
--
Release Note: 
Adds flags to disable the server and table query meters. Both are on by default.

"hbase.regionserver.enable.server.query.meter"
"hbase.regionserver.enable.table.query.meter"



> Backport "HBASE-25681 Add a switch for server/table queryMeter" to branch-2 
> and branch-1
> 
>
> Key: HBASE-25687
> URL: https://issues.apache.org/jira/browse/HBASE-25687
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
>






[jira] [Resolved] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25692.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to 2.3+. Shout if you want it to go elsewhere [~elserj].

> Failure to instantiate WALCellCodec leaks socket in replication
> ---
>
> Key: HBASE-25692
> URL: https://issues.apache.org/jira/browse/HBASE-25692
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.1.0, 2.2.0, 2.1.1, 2.1.2, 2.1.3, 2.3.0, 2.3.1, 2.1.4, 
> 2.0.6, 2.1.5, 2.2.1, 2.1.6, 2.1.7, 2.2.2, 2.1.8, 2.2.3, 2.3.3, 2.1.9, 2.2.4, 
> 2.4.0, 2.2.5, 2.2.6, 2.3.2, 2.3.4, 2.4.1, 2.4.2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3, 2.3.6
>
>
> I was looking at an HBase user's cluster with [~danilocop] where they saw two 
> otherwise identical clusters, one of which regularly had sockets in 
> CLOSE_WAIT going from RegionServers to a distributed storage appliance.
> After a lot of analysis, we eventually figured out that these sockets in 
> CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to 
> close inside of the RegionServer. The subtlety was that only one of these 
> HBase clusters was set up to do replication (to the other cluster). The HBase 
> cluster experiencing this problem was shipping edits to a peer, and had 
> previously been using Phoenix. At some point, the cluster had Phoenix removed 
> from it.
> What we found was that replication still had WALs to ship which were for 
> Phoenix tables. Phoenix, in this version, still used the custom WALCellCodec; 
> however, this codec class was missing from the RS classpath after the owner 
> of the cluster removed Phoenix.
> When we try to instantiate the Codec implementation via ReflectionUtils, we 
> end up throwing an UnsupportedOperationException which wraps a 
> ClassNotFoundException. However, in WALFactory, we _only_ close the 
> FSDataInputStream when we catch an IOException. 
> Thus, replication sits in a "fast" loop, trying to ship these edits, each 
> time leaking a new socket because of the InputStream not being closed. There 
> is an obvious workaround for this specific issue, but we should not leak this 
> inside HBase.
> An approximate 2.1.x stack trace which led us to this is below.
> {noformat}
> 2021-03-11 18:19:20,364 ERROR 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader: 
> Failed to read stream of replication entries
> java.io.IOException: Cannot get log reader
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:366)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:291)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:427)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:354)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:302)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:293)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:174)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:100)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:192)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:138)
> Caused by: java.lang.UnsupportedOperationException: Unable to find 
> org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
>   at 
> org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:47)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.WALCellCodec.create(WALCellCodec.java:106)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.getCodec(ProtobufLogReader.java:301)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:311)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:81)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:321)
>   ... 10 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
>
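
The leak described above comes down to closing a stream only on `IOException` while initialization can also fail with an unchecked exception. A minimal sketch of the safer pattern follows. `TrackedStream`, `initReader` and `openReader` are hypothetical names used for illustration and are not the actual WALFactory code; the real fix in HBase may be structured differently.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class CloseOnAnyFailure {
    // Hypothetical stand-in for a stream such as FSDataInputStream.
    public static class TrackedStream implements Closeable {
        public final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public void close() {
            closed.set(true);
        }
    }

    // Hypothetical initializer: fails with an unchecked exception when the
    // codec class is missing, mirroring the UnsupportedOperationException above.
    static void initReader(TrackedStream in, boolean codecAvailable) {
        if (!codecAvailable) {
            throw new UnsupportedOperationException("Unable to find codec class");
        }
    }

    // Closes the stream on ANY failure, not only on IOException.
    public static TrackedStream openReader(TrackedStream in, boolean codecAvailable)
            throws IOException {
        boolean success = false;
        try {
            initReader(in, codecAvailable);
            success = true;
            return in;
        } catch (RuntimeException e) {
            throw new IOException("Cannot get log reader", e);
        } finally {
            if (!success) {
                in.close();
            }
        }
    }
}
```

The `success` flag in the `finally` block means the stream is released whichever exception type escapes, while a successfully opened reader keeps its stream.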

[jira] [Updated] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25692:
--
Fix Version/s: 2.3.6
   2.4.3
   2.5.0
   3.0.0-alpha-1
   Status: In Progress  (was: Patch Available)

> Failure to instantiate WALCellCodec leaks socket in replication
> ---
>
> Key: HBASE-25692
> URL: https://issues.apache.org/jira/browse/HBASE-25692
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.4.2, 2.4.1, 2.3.4, 2.3.2, 2.2.6, 2.2.5, 2.4.0, 2.2.4, 
> 2.1.9, 2.3.3, 2.2.3, 2.1.8, 2.2.2, 2.1.7, 2.1.6, 2.2.1, 2.1.5, 2.0.6, 2.1.4, 
> 2.3.1, 2.3.0, 2.1.3, 2.1.2, 2.1.1, 2.2.0, 2.1.0
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3, 2.3.6
>
>
> I was looking at an HBase user's cluster with [~danilocop] where they saw two 
> otherwise identical clusters, one of which regularly had sockets in 
> CLOSE_WAIT going from RegionServers to a distributed storage appliance.
> After a lot of analysis, we eventually figured out that these sockets in 
> CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to 
> close inside of the RegionServer. The subtlety was that only one of these 
> HBase clusters was set up to do replication (to the other cluster). The HBase 
> cluster experiencing this problem was shipping edits to a peer, and had 
> previously been using Phoenix. At some point, the cluster had Phoenix removed 
> from it.
> What we found was that replication still had WALs to ship which were for 
> Phoenix tables. Phoenix, in this version, still used the custom WALCellCodec; 
> however, this codec class was missing from the RS classpath after the owner 
> of the cluster removed Phoenix.
> When we try to instantiate the Codec implementation via ReflectionUtils, we 
> end up throwing an UnsupportedOperationException which wraps a 
> ClassNotFoundException. However, in WALFactory, we _only_ close the 
> FSDataInputStream when we catch an IOException. 
> Thus, replication sits in a "fast" loop, trying to ship these edits, each 
> time leaking a new socket because of the InputStream not being closed. There 
> is an obvious workaround for this specific issue, but we should not leak this 
> inside HBase.
> An approximate 2.1.x stack trace which led us to this is below.
> {noformat}
> 2021-03-11 18:19:20,364 ERROR 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader: 
> Failed to read stream of replication entries
> java.io.IOException: Cannot get log reader
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:366)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:291)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:427)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:354)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:302)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:293)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:174)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:100)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:192)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:138)
> Caused by: java.lang.UnsupportedOperationException: Unable to find 
> org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
>   at 
> org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:47)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.WALCellCodec.create(WALCellCodec.java:106)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.getCodec(ProtobufLogReader.java:301)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:311)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:81)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:321)
>   ... 10 more
> Caused by: java.lang.ClassNotFoundException: 
> 

[jira] [Resolved] (HBASE-25707) When restoring a table, create a namespace if it does not exist

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25707.
---
Fix Version/s: 3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master. Reviewed by [~wchevreuil]. Thanks for the PR [~shenshengli]

> When restoring a table, create a namespace if it does not exist
> ---
>
> Key: HBASE-25707
> URL: https://issues.apache.org/jira/browse/HBASE-25707
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Affects Versions: 2.0.0
>Reporter: shenshengli
>Assignee: shenshengli
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> The restore does not seem to account for the case where the namespace of the 
> table to be restored does not exist in the target environment; if the 
> namespace does not exist, it simply throws an error 
> (NamespaceNotFoundException), which is unfriendly.





[jira] [Resolved] (HBASE-25705) Convert proto to RSGroupInfo is costly

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25705.
---
Fix Version/s: 3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master branch. It won't backport w/o complaint. Open sub-task if you 
have PRs for backports [~mokai87]. Thanks for the PR.

> Convert proto to RSGroupInfo is costly
> --
>
> Key: HBASE-25705
> URL: https://issues.apache.org/jira/browse/HBASE-25705
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: mokai
>Assignee: mokai
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> Converting RSGroupProtos.RSGroupInfo to RSGroupInfo is costly if the RSGroup has 
> too many RSs and tables. 
> We can use parallelStream to handle the HBaseProtos.ServerName list and 
> TableProtos.TableName list in ProtobufUtil#toGroupInfo as below.
> {quote}Collection<Address> addresses = proto.getServersList()
>  .parallelStream()
>  .map(server -> Address.fromParts(server.getHostName(), server.getPort()))
>  .collect(Collectors.toList());
> Collection<TableName> tables = proto.getTablesList()
>  .parallelStream()
>  .map(tableName -> ProtobufUtil.toTableName(tableName))
>  .collect(Collectors.toList());
> {quote}
> When getting an RSGroupInfo which has 9 RSs and 20k tables, the time cost dropped from 
> 6038 ms to 684 ms.
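
The change quoted above swaps a sequential stream for a parallel one when mapping each proto element. A self-contained sketch of the same pattern is below. `ProtoName` and `toTableName` are hypothetical stand-ins for the protobuf message and the `ProtobufUtil` conversion, and the sketch makes no claim about the speedup, which depends on element count, per-element cost and available cores.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelConvert {
    // Hypothetical stand-in for a protobuf table-name message.
    public static final class ProtoName {
        final String namespace;
        final String qualifier;

        public ProtoName(String namespace, String qualifier) {
            this.namespace = namespace;
            this.qualifier = qualifier;
        }
    }

    // Hypothetical per-element conversion (stands in for ProtobufUtil.toTableName).
    public static String toTableName(ProtoName p) {
        return p.namespace + ":" + p.qualifier;
    }

    // Converts the list sequentially or in parallel; for a List source,
    // Collectors.toList() keeps encounter order in both cases.
    public static List<String> convert(List<ProtoName> protos, boolean parallel) {
        Stream<ProtoName> stream = parallel ? protos.parallelStream() : protos.stream();
        return stream.map(ParallelConvert::toTableName).collect(Collectors.toList());
    }
}
```

Because the mapping function is stateless, both variants produce the same list, so the parallel form can be dropped in without changing results.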





[jira] [Resolved] (HBASE-25710) During the recovery process, an error is thrown if there is an incremental backup of data that has not been updated

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25710.
---
Fix Version/s: 3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master. Thanks for PR [~shenshengli]

> During the recovery process, an error is thrown if there is an incremental 
> backup of data that has not been updated
> ---
>
> Key: HBASE-25710
> URL: https://issues.apache.org/jira/browse/HBASE-25710
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Affects Versions: 2.0.0
>Reporter: shenshengli
>Assignee: shenshengli
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> The error is shown below:
> 19:49:24.213 [main] ERROR org.apache.hadoop.hbase.backup.RestoreDriver - 
> Error while running restore backup
> java.io.IOException: Can not restore from backup directory (check Hadoop and 
> HBase logs)
>  at 
> org.apache.hadoop.hbase.backup.mapreduce.MapReduceRestoreJob.run(MapReduceRestoreJob.java:110)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.incrementalRestoreTable(RestoreTool.java:202)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.restoreImages(RestoreTablesClient.java:178)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.restore(RestoreTablesClient.java:221)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.execute(RestoreTablesClient.java:258)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.restore(BackupAdminImpl.java:520)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.RestoreDriver.parseAndRun(RestoreDriver.java:179)
>  [hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.RestoreDriver.doWork(RestoreDriver.java:220) 
> [hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at org.apache.hadoop.hbase.backup.RestoreDriver.run(RestoreDriver.java:256) 
> [hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) 
> [hadoop-common-3.1.1.3.0.1.0-187.jar:?]
>  at org.apache.hadoop.hbase.backup.RestoreDriver.main(RestoreDriver.java:228) 
> [hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> Caused by: java.io.IOException: No input paths specified in job





[jira] [Commented] (HBASE-25706) Support specifying a base split policy class in KeyPrefixRegionSplitPolicy and DelimitedKeyPrefixRegionSplitPolicy

2021-03-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310807#comment-17310807
 ] 

Michael Stack commented on HBASE-25706:
---

I like the [~zhangduo] suggested break up of concerns; good idea.

On the refactor, even though it is of classes that are to be deprecated, it seems 
fine to me... Let me look at the PR.


> Support specifying a base split policy class in KeyPrefixRegionSplitPolicy 
> and DelimitedKeyPrefixRegionSplitPolicy
> --
>
> Key: HBASE-25706
> URL: https://issues.apache.org/jira/browse/HBASE-25706
> Project: HBase
>  Issue Type: Improvement
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
>
> Basically, I think we can use KeyPrefixRegionSplitPolicy and 
> DelimitedKeyPrefixRegionSplitPolicy along with other split policies. In this 
> Jira, we will support specifying a base split policy class in 
> KeyPrefixRegionSplitPolicy and DelimitedKeyPrefixRegionSplitPolicy to use 
> them with different split policies.
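
The idea of a configurable base policy can be pictured as delegation: the prefix policy asks the base policy whether to split, then adjusts the split point so rows sharing a prefix stay together. The sketch below only illustrates that shape. `SplitPolicy`, `SizeThresholdPolicy` and `PrefixPolicy` are hypothetical types and are not the HBase `RegionSplitPolicy` API.

```java
public class PrefixSplitSketch {
    // Hypothetical minimal policy interface; not the HBase RegionSplitPolicy API.
    public interface SplitPolicy {
        boolean shouldSplit(long regionSizeBytes);

        String getSplitPoint(String midKey);
    }

    // Hypothetical base policy: split once the region exceeds a fixed size.
    public static class SizeThresholdPolicy implements SplitPolicy {
        private final long maxBytes;

        public SizeThresholdPolicy(long maxBytes) {
            this.maxBytes = maxBytes;
        }

        @Override
        public boolean shouldSplit(long regionSizeBytes) {
            return regionSizeBytes > maxBytes;
        }

        @Override
        public String getSplitPoint(String midKey) {
            return midKey;
        }
    }

    // Prefix policy that delegates the split decision to any base policy and
    // truncates the base split point to a fixed-length key prefix.
    public static class PrefixPolicy implements SplitPolicy {
        private final SplitPolicy base;
        private final int prefixLength;

        public PrefixPolicy(SplitPolicy base, int prefixLength) {
            this.base = base;
            this.prefixLength = prefixLength;
        }

        @Override
        public boolean shouldSplit(long regionSizeBytes) {
            return base.shouldSplit(regionSizeBytes);
        }

        @Override
        public String getSplitPoint(String midKey) {
            String point = base.getSplitPoint(midKey);
            return point.substring(0, Math.min(prefixLength, point.length()));
        }
    }
}
```

Taking the base policy as a constructor argument, rather than inheriting from one fixed policy, is what lets the prefix behavior combine with different split decisions.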





[jira] [Resolved] (HBASE-25695) Link to the filter on hbase:meta from user tables panel on master page

2021-03-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25695.
---
Fix Version/s: 2.3.6
   2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Assignee: Michael Stack
   Resolution: Fixed

Pushed to branch-2.3+. Thanks for review [~ndimiduk]

> Link to the filter on hbase:meta from user tables panel on master page
> --
>
> Key: HBASE-25695
> URL: https://issues.apache.org/jira/browse/HBASE-25695
> Project: HBase
>  Issue Type: Sub-task
>  Components: UI
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3, 2.3.6
>
> Attachments: image-2021-03-24-21-41-11-393.png, 
> image-2021-03-24-21-42-16-355.png, image-2021-03-24-21-43-24-426.png
>
>
> This is a follow-on to the parent issue that added a nice filtering mechanism on 
> hbase:meta table. Parent allows finding all Regions in Table XYZ with state 
> OPENING or FAILED_CLOSED.
> The user table panel on the master home page has counts of Regions in each 
> state. The opening and closing counts actually have links under them but they 
> are useless currently as they only show RITs that are CLOSING or OPENING; 
> good but not comprehensive enough.
> This PR adds links under all counts so you can see all CLOSING Regions 
> whether RIT or not; useful when doing fixup on a corrupt cluster.  Adds a bit 
> of help text that tells users about the filter-on-meta feature too.
> Here is how the panel currently looks:
>  !image-2021-03-24-21-41-11-393.png! 
> Here is what it looks like now with the bit of help text
>  !image-2021-03-24-21-42-16-355.png! 
> When you click on the CLOSED number -- '1' in this case -- this is where you go 
> to:
>  !image-2021-03-24-21-43-24-426.png! 
> i.e. it lists all Regions in the TestTable that are in the CLOSED state (not 
> very pretty with the 'Table Stats' and 'Table Regions' preamble but better 
> than what was there before).





[jira] [Comment Edited] (HBASE-25675) Shrink size of hbase-server module

2021-03-26 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309201#comment-17309201
 ] 

Michael Stack edited comment on HBASE-25675 at 3/26/21, 2:50 PM:
-

[link title|http://people.apache.org/~stack/dependency-graph.png]

Dependency graph made with

 {code}mvn com.github.ferstl:depgraph-maven-plugin:3.3.0:aggregate 
-DcreateImage=true -DreduceEdges=false -Dscope=compile 
"-Dincludes=org.apache.hbase*:*"{code}

Needs GraphViz installed.




was (Author: stack):
 !dependency-graph.png! 

Dependency graph made with

 {code}mvn com.github.ferstl:depgraph-maven-plugin:3.3.0:aggregate 
-DcreateImage=true -DreduceEdges=false -Dscope=compile 
"-Dincludes=org.apache.hbase*:*"{code}

Needs GraphViz installed.



> Shrink size of hbase-server module
> --
>
> Key: HBASE-25675
> URL: https://issues.apache.org/jira/browse/HBASE-25675
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Michael Stack
>Priority: Major
>
> Umbrella issue for shrinking the size of the hbase-server module. It's too big 
> (see recent notes by [~zhangduo] that hbase-server size was making findbugs 
> OOME).
> Suggested candidate subtasks:
> hbase-io
> hbase-xtra-unit-tests <= Move large hbase-server tests out of hbase-server 
> and into this module
> hbase-region
> HBASE-25190 was about an hbase-tool module





[jira] [Commented] (HBASE-25675) Shrink size of hbase-server module

2021-03-26 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309230#comment-17309230
 ] 

Michael Stack commented on HBASE-25675:
---

hbase-region would be hard to break out. Depends on everything (hbase-client in 
particular) and region is in o.a.h.h.regionserver. Ideally a hbase-region 
module would not depend on hbase-client but hbase-client has Put, Delete, 
RegionInfo and all are public API.

hbase-wal looks like it might be easier to break out, at least the interfaces 
and core types.

> Shrink size of hbase-server module
> --
>
> Key: HBASE-25675
> URL: https://issues.apache.org/jira/browse/HBASE-25675
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Michael Stack
>Priority: Major
>
> Umbrella issue for shrinking the size of the hbase-server module. It's too big 
> (see recent notes by [~zhangduo] that hbase-server size was making findbugs 
> OOME).
> Suggested candidate subtasks:
> hbase-io
> hbase-xtra-unit-tests <= Move large hbase-server tests out of hbase-server 
> and into this module
> hbase-region
> HBASE-25190 was about an hbase-tool module





[jira] [Commented] (HBASE-25675) Shrink size of hbase-server module

2021-03-26 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309228#comment-17309228
 ] 

Michael Stack commented on HBASE-25675:
---

 !dependency-graph.png! 

> Shrink size of hbase-server module
> --
>
> Key: HBASE-25675
> URL: https://issues.apache.org/jira/browse/HBASE-25675
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Michael Stack
>Priority: Major
>
> Umbrella issue for shrinking the size of the hbase-server module. It's too big 
> (see recent notes by [~zhangduo] that hbase-server size was making findbugs 
> OOME).
> Suggested candidate subtasks:
> hbase-io
> hbase-xtra-unit-tests <= Move large hbase-server tests out of hbase-server 
> and into this module
> hbase-region
> HBASE-25190 was about an hbase-tool module





[jira] [Issue Comment Deleted] (HBASE-25675) Shrink size of hbase-server module

2021-03-26 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25675:
--
Comment: was deleted

(was:  !dependency-graph.png! )

> Shrink size of hbase-server module
> --
>
> Key: HBASE-25675
> URL: https://issues.apache.org/jira/browse/HBASE-25675
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Michael Stack
>Priority: Major
>
> Umbrella issue for shrinking the size of the hbase-server module. It's too big 
> (see recent notes by [~zhangduo] that hbase-server size was making findbugs 
> OOME).
> Suggested candidate subtasks:
> hbase-io
> hbase-xtra-unit-tests <= Move large hbase-server tests out of hbase-server 
> and into this module
> hbase-region
> HBASE-25190 was about an hbase-tool module





[jira] [Commented] (HBASE-25675) Shrink size of hbase-server module

2021-03-26 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309201#comment-17309201
 ] 

Michael Stack commented on HBASE-25675:
---

 !dependency-graph.png! 

Dependency graph made with

 {code}mvn com.github.ferstl:depgraph-maven-plugin:3.3.0:aggregate 
-DcreateImage=true -DreduceEdges=false -Dscope=compile 
"-Dincludes=org.apache.hbase*:*"{code}

Needs GraphViz installed.



> Shrink size of hbase-server module
> --
>
> Key: HBASE-25675
> URL: https://issues.apache.org/jira/browse/HBASE-25675
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Michael Stack
>Priority: Major
>
> Umbrella issue for shrinking the size of the hbase-server module. It's too big 
> (see recent notes by [~zhangduo] that hbase-server size was making findbugs 
> OOME).
> Suggested candidate subtasks:
> hbase-io
> hbase-xtra-unit-tests <= Move large hbase-server tests out of hbase-server 
> and into this module
> hbase-region
> HBASE-25190 was about an hbase-tool module





[jira] [Commented] (HBASE-25032) Wait for region server to become online before adding it to online servers in Master

2021-03-26 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309196#comment-17309196
 ] 

Michael Stack commented on HBASE-25032:
---

Nice job [~caroliney14] (nice reviewing [~bharathv])

> Wait for region server to become online before adding it to online servers in 
> Master
> 
>
> Key: HBASE-25032
> URL: https://issues.apache.org/jira/browse/HBASE-25032
> Project: HBase
>  Issue Type: Bug
>Reporter: Sandeep Guggilam
>Assignee: Caroline
>Priority: Major
>  Labels: master, regionserver
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.3
>
>
> As part of RS start up, the RS reports for duty to the Master. The Master acknowledges 
> the request and adds the RS to the onlineServers list so that regions can be 
> assigned to it.
> Once Master acknowledges the reportForDuty and sends back the response, RS 
> does a bunch of stuff like initializing replication sources etc before 
> becoming online. However, sometimes there could be an issue with initializing 
> replication sources when it is unable to connect to peer clusters because of 
> some kerberos configuration and there would be a delay of around 20 mins in 
> becoming online.
>  
> Since the master considers it online, it tries to assign regions, which fails 
> with a ServerNotRunningYet exception; the master then tries to unassign, which 
> fails with the same exception, leaving the region in FAILED_CLOSE state.
>  
> It would be good to have a check to see if the RS is ready to accept the 
> assignment requests before adding it to online servers list which would 
> account for any such delays as described above





[jira] [Created] (HBASE-25695) Link to the filter on hbase:meta from user tables panel on master page

2021-03-24 Thread Michael Stack (Jira)
Michael Stack created HBASE-25695:
-

 Summary: Link to the filter on hbase:meta from user tables panel 
on master page
 Key: HBASE-25695
 URL: https://issues.apache.org/jira/browse/HBASE-25695
 Project: HBase
  Issue Type: Sub-task
  Components: UI
Reporter: Michael Stack
 Attachments: image-2021-03-24-21-41-11-393.png, 
image-2021-03-24-21-42-16-355.png, image-2021-03-24-21-43-24-426.png

This is a follow-on to the parent issue that added a nice filtering mechanism on 
hbase:meta table. Parent allows finding all Regions in Table XYZ with state 
OPENING or FAILED_CLOSED.

The user table panel on the master home page has counts of Regions in each 
state. The opening and closing counts actually have links under them but they 
are useless currently as they only show RITs that are CLOSING or OPENING; good 
but not comprehensive enough.

This PR adds links under all counts so you can see all CLOSING Regions whether 
RIT or not; useful when doing fixup on a corrupt cluster.  Adds a bit of help 
text that tells users about the filter-on-meta feature too.

Here is how the panel currently looks:

 !image-2021-03-24-21-41-11-393.png! 

Here is what it looks like now with the bit of help text:

 !image-2021-03-24-21-42-16-355.png! 


When you click on the CLOSED number -- '1' in this case -- this is where you go:

 !image-2021-03-24-21-43-24-426.png! 

i.e. it lists all Regions in the TestTable that are in the CLOSED state (not 
very pretty with the 'Table Stats' and 'Table Regions' preamble but better than 
what was there before).






[jira] [Resolved] (HBASE-25676) Move generic classes from hbase-server to hbase-common

2021-03-23 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25676.
---
Resolution: Won't Fix

Resolving as "won't fix"

Let me just close this. Most of the classes moved here are used by hbase-server 
only. Even though a bunch of these classes are generic and could be used 
elsewhere than in hbase-server, AND even though a good portion of the 
content of hbase-common is currently only used by hbase-server, let's favor 
coherent, contained modules. Closing as wrong direction.

Thanks for reviews [~zhangduo] and @dupg

> Move generic classes from hbase-server to hbase-common
> --
>
> Key: HBASE-25676
> URL: https://issues.apache.org/jira/browse/HBASE-25676
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Priority: Major
>
> There's a bunch of classes that are not hbase-server specific on cursory 
> review that could live in hbase-common... not many, about 3% of src/main/java 
> but move them out.
> {code}
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/SslRMIClientSocketFactorySecure.java
>  (99%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/SslRMIServerSocketFactorySecure.java
>  (99%)
>   rename {hbase-server/src/main/java/org/apache/hadoop/hbase => 
> hbase-common/src/main/java/org/apache/hadoop/hbase/healthcheck}/HealthCheckChore.java
>  (93%)
>   rename {hbase-server/src/main/java/org/apache/hadoop/hbase => 
> hbase-common/src/main/java/org/apache/hadoop/hbase/healthcheck}/HealthChecker.java
>  (86%)
>   rename {hbase-server/src/main/java/org/apache/hadoop/hbase => 
> hbase-common/src/main/java/org/apache/hadoop/hbase/healthcheck}/HealthReport.java
>  (94%)
>   rename {hbase-server/src/test/java/org/apache/hadoop/hbase => 
> hbase-common/src/test/java/org/apache/hadoop/hbase/healthcheck}/TestNodeHealthCheckChore.java
>  (86%)
>   delete mode 100644 
> hbase-server/src/main/java/org/apache/hadoop/hbase/DaemonThreadFactory.java
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/security/SecurityUtil.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/ConfigurationUtil.java
>  (99%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/DirectMemoryUtils.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/GetJavaProperty.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/HBaseConfTool.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/HashedBytes.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/IdReadWriteLock.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/JvmVersion.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/KeyRange.java (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/LossyCounting.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/ManualEnvironmentEdge.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/MunkresAssignment.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/NettyEventLoopGroupConfig.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/RegionSplitCalculator.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/RollingStatCalculator.java
>  (99%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/ShutdownHookManager.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/SortedList.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/StealJobQueue.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/test/java/org/apache/hadoop/hbase/util/TestConfigurationUtil.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/test/java/org/apache/hadoop/hbase/util/TestIdReadWriteLock.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/test/java/org/apache/hadoop/hbase/util/TestLossyCounting.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/test/java/org/apache/hadoop/hbase/util/TestRegionSplitCalculator.java
>  (95%)
>   rename {hbase-server => 
> hbase-common}/src/test/java/org/apache/hadoop/hbase/util/TestSortedList.java 
> (100%)
>   rename 

[jira] [Resolved] (HBASE-25685) asyncprofiler2.0 no longer supports svg; wants html

2021-03-22 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25685.
---
Fix Version/s: 2.4.3
   2.3.5
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Release Note: 
With asyncprofiler 1.x on hbase-2.3.x or hbase-2.4.x, all is good. With 
asyncprofiler 2.x on hbase-2.3.x or hbase-2.4.x, add '?output=html' to get 
flamegraphs from the profiler.

On hbase-2.5+, asyncprofiler 2.x works as is. With asyncprofiler 1.x on 
hbase-2.5+, you may have to add '?output=svg' to the query.
   Resolution: Fixed

Thanks for the review [~weichiu]. Pushed #3079 on branch-2.3+branch-2.4. Pushed 
#3078 on branch-2 and master.

> asyncprofiler2.0 no longer supports svg; wants html
> ---
>
> Key: HBASE-25685
> URL: https://issues.apache.org/jira/browse/HBASE-25685
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.5, 2.4.3
>
>
> asyncprofiler2.0 is out. It's a nice tool. Unfortunately, it dropped the svg 
> formatting option that we use in our servlet. Now it wants you to pass html. 
> Let's fix.
> Old -o on asyncprofiler1.x
> -o fmt    output format: summary|traces|flat|collapsed|svg|tree|jfr
> New -o asyncprofiler 2.x
> -o fmt    output format: flat|traces|collapsed|flamegraph|tree|jfr
> If you pass svg to 2.0, it does nothing. If you run the command that hbase 
> runs, you see:
> {code}
> /tmp/prof-output$ sudo -u hbase /usr/lib/async-profiler/profiler.sh -e cpu -d 
> 10 -o svg -f /tmp/prof-output/async-prof-pid-8346-cpu-1x.svg 8346
> [ERROR] SVG format is obsolete, use .html for FlameGraph
> {code}
> At a minimum, we can make it so the OUTPUT param supports HTML. Here is the 
> current enum state:
> {code}
>   enum Output {
>     SUMMARY,
>     TRACES,
>     FLAT,
>     COLLAPSED,
>     SVG,
>     TREE,
>     JFR
>   }
> {code}





[jira] [Commented] (HBASE-25685) asyncprofiler2.0 no longer supports svg; wants html

2021-03-22 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306530#comment-17306530
 ] 

Michael Stack commented on HBASE-25685:
---

Here is a suggestion. For branch-2.3/2.4, #3078: presume asyncprofiler1 but, if 
asyncprofiler2 is there, allow setting output=html. For branch-2 and master, 
#3079: presume asyncprofiler2.

> asyncprofiler2.0 no longer supports svg; wants html
> ---
>
> Key: HBASE-25685
> URL: https://issues.apache.org/jira/browse/HBASE-25685
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> asyncprofiler2.0 is out. It's a nice tool. Unfortunately, it dropped the svg 
> formatting option that we use in our servlet. Now it wants you to pass html. 
> Let's fix.
> Old -o on asyncprofiler1.x
> -o fmt    output format: summary|traces|flat|collapsed|svg|tree|jfr
> New -o asyncprofiler 2.x
> -o fmt    output format: flat|traces|collapsed|flamegraph|tree|jfr
> If you pass svg to 2.0, it does nothing. If you run the command that hbase 
> runs, you see:
> {code}
> /tmp/prof-output$ sudo -u hbase /usr/lib/async-profiler/profiler.sh -e cpu -d 
> 10 -o svg -f /tmp/prof-output/async-prof-pid-8346-cpu-1x.svg 8346
> [ERROR] SVG format is obsolete, use .html for FlameGraph
> {code}
> At a minimum, we can make it so the OUTPUT param supports HTML. Here is the 
> current enum state:
> {code}
>   enum Output {
>     SUMMARY,
>     TRACES,
>     FLAT,
>     COLLAPSED,
>     SVG,
>     TREE,
>     JFR
>   }
> {code}





[jira] [Commented] (HBASE-25685) asyncprofiler2.0 no longer supports svg; wants html

2021-03-22 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306332#comment-17306332
 ] 

Michael Stack commented on HBASE-25685:
---

Thanks for taking it [~weichiu]. What are you thinking?

> asyncprofiler2.0 no longer supports svg; wants html
> ---
>
> Key: HBASE-25685
> URL: https://issues.apache.org/jira/browse/HBASE-25685
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> asyncprofiler2.0 is out. It's a nice tool. Unfortunately, it dropped the svg 
> formatting option that we use in our servlet. Now it wants you to pass html. 
> Let's fix.
> Old -o on asyncprofiler1.x
> -o fmt    output format: summary|traces|flat|collapsed|svg|tree|jfr
> New -o asyncprofiler 2.x
> -o fmt    output format: flat|traces|collapsed|flamegraph|tree|jfr
> If you pass svg to 2.0, it does nothing. If you run the command that hbase 
> runs, you see:
> {code}
> /tmp/prof-output$ sudo -u hbase /usr/lib/async-profiler/profiler.sh -e cpu -d 
> 10 -o svg -f /tmp/prof-output/async-prof-pid-8346-cpu-1x.svg 8346
> [ERROR] SVG format is obsolete, use .html for FlameGraph
> {code}
> At a minimum, we can make it so the OUTPUT param supports HTML. Here is the 
> current enum state:
> {code}
>   enum Output {
>     SUMMARY,
>     TRACES,
>     FLAT,
>     COLLAPSED,
>     SVG,
>     TREE,
>     JFR
>   }
> {code}





[jira] [Resolved] (HBASE-25672) Backport HBASE-25608 to branch-1

2021-03-22 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25672.
---
Fix Version/s: 1.7.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-1. Thanks for the PR [~lineyshinya]

> Backport HBASE-25608 to branch-1
> 
>
> Key: HBASE-25672
> URL: https://issues.apache.org/jira/browse/HBASE-25672
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Shinya Yoshida
>Assignee: Shinya Yoshida
>Priority: Major
> Fix For: 1.7.0
>
>






[jira] [Resolved] (HBASE-25683) Simplify UTs using DummyServer

2021-03-22 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25683.
---
Fix Version/s: 3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged. Nice cleanup. Thanks for the PR [~Ddupg]

> Simplify UTs using DummyServer
> --
>
> Key: HBASE-25683
> URL: https://issues.apache.org/jira/browse/HBASE-25683
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Trivial
> Fix For: 3.0.0-alpha-1
>
>






[jira] [Commented] (HBASE-25679) Size of log queue metric is incorrect in branch-1/branch-2

2021-03-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305566#comment-17305566
 ] 

Michael Stack commented on HBASE-25679:
---

Thanks for fixing 'fix version' [~shahrs87]

> Size of log queue metric is incorrect in branch-1/branch-2
> --
>
> Key: HBASE-25679
> URL: https://issues.apache.org/jira/browse/HBASE-25679
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.7.0, 2.5.0, 2.4.2
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 1.7.0, 2.5.0, 2.4.3
>
>
> In HBASE-25539 I did some refactoring to add a new metric "oldestWalAge" 
> and tried to consolidate the updates to all the metrics related to the 
> ReplicationSource class (size of log queue and oldest wal age) in one place. 
> That refactoring introduced a bug: the size of log queue metric is 
> decremented twice whenever we remove a wal from the replication source 
> queue.
>  
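
A minimal sketch of the consolidation idea described above, assuming the queue and its size metric are updated together in exactly one method. LogQueueSketch and its members are hypothetical names and not the ReplicationSource code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a replication log queue plus its size metric.
final class LogQueueSketch {
  private final Deque<String> queue = new ArrayDeque<>();
  private final AtomicInteger sizeOfLogQueue = new AtomicInteger();

  void enqueue(String wal) {
    queue.addLast(wal);
    sizeOfLogQueue.incrementAndGet();
  }

  // The only place that removes a wal AND touches the metric, so callers
  // cannot decrement a second time by accident.
  String remove() {
    String wal = queue.pollFirst();
    if (wal != null) {
      sizeOfLogQueue.decrementAndGet();
    }
    return wal;
  }

  int metric() {
    return sizeOfLogQueue.get();
  }

  int actualSize() {
    return queue.size();
  }
}
```

A double decrement shows up as metric() drifting below actualSize(), so a test that compares the two after each removal catches this class of bug.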





[jira] [Resolved] (HBASE-25594) graceful_stop.sh fails to unload regions when ran at localhost

2021-03-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25594.
---
Resolution: Fixed

Pushed addendum on branch-2.4+

> graceful_stop.sh fails to unload regions when ran at localhost
> --
>
> Key: HBASE-25594
> URL: https://issues.apache.org/jira/browse/HBASE-25594
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.4.13
>Reporter: Javier Akira Luca de Tena
>Assignee: Javier Akira Luca de Tena
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We usually use graceful_stop.sh from the Master to restart RegionServers. 
> However, in some scenarios we may not have privileges to restart remote 
> RegionServers (it uses ssh).
>  But we can still use graceful_stop.sh on the same host we want to restart.
> In order to detect the execution at localhost, graceful_stop.sh uses 
> /bin/hostname.
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/bin/graceful_stop.sh#L106-L110]
> When RegionMover strips the host to not include it in the list of target 
> hosts, we filter it out by checking all RegionServer hosts in the cluster:
>  
> [https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L382-L384]
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L692]
> But the hosts in the list returned by Admin#getRegionServers are FQDNs, 
> while the hostname provided by graceful_stop.sh is not an FQDN, so the 
> comparison fails.
> The same happens for branch-1 region_mover.rb, which is where I reproduced it 
> in my environment: 
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L305]
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L175]
>  
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L186-L192]
>  
> This can be fixed just by using "/bin/hostname -f" in the graceful_stop.sh 
> script.
> Will provide patch soon.
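
A minimal sketch of why the short hostname fails to match, assuming the filter is an exact string comparison against fully qualified names. HostFilterSketch and the example hostnames are hypothetical and this is not the RegionMover code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the "strip this host from the targets" step.
final class HostFilterSketch {
  // Returns the servers that do not exactly match host (case-insensitive).
  static List<String> stripServer(List<String> servers, String host) {
    List<String> out = new ArrayList<>();
    for (String s : servers) {
      if (!s.equalsIgnoreCase(host)) {
        out.add(s);
      }
    }
    return out;
  }
}
```

Passing a short name such as "rs1" against a list holding "rs1.example.com" removes nothing, which is the failure described above; passing the fully qualified name, as "/bin/hostname -f" would supply, removes the local server from the targets.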





[jira] [Reopened] (HBASE-25594) graceful_stop.sh fails to unload regions when ran at localhost

2021-03-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25594:
---

Reopen to apply addendum below
{code}
commit 326835e8372cc83092e0ec127650438ff153476a (HEAD -> m, origin/master, 
origin/HEAD)
Author: stack 
Date:   Sat Mar 20 13:47:18 2021 -0700

HBASE-25594 Make easier to use graceful_stop on localhost mode (#3054)
Addendum.

diff --git a/bin/graceful_stop.sh b/bin/graceful_stop.sh
index 05919ce72d..fc18239830 100755
--- a/bin/graceful_stop.sh
+++ b/bin/graceful_stop.sh
@@ -105,9 +105,6 @@ filename="/tmp/$hostname"
 local=
 localhostname=`/bin/hostname -f`

-if [ "$localhostname" == "$hostname" ]; then
-  local=true
-fi
 if [ "$localhostname" == "$hostname" ] || [ "$hostname" == "localhost" ]; then
   local=true
   hostname=$localhostname
{code}

> graceful_stop.sh fails to unload regions when ran at localhost
> --
>
> Key: HBASE-25594
> URL: https://issues.apache.org/jira/browse/HBASE-25594
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.4.13
>Reporter: Javier Akira Luca de Tena
>Assignee: Javier Akira Luca de Tena
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We usually use graceful_stop.sh from the Master to restart RegionServers. 
> However, in some scenarios we may not have privileges to restart remote 
> RegionServers (it uses ssh).
>  But we can still use graceful_stop.sh on the same host we want to restart.
> In order to detect the execution at localhost, graceful_stop.sh uses 
> /bin/hostname.
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/bin/graceful_stop.sh#L106-L110]
> When RegionMover strips the host to not include it in the list of target 
> hosts, we filter it out by checking all RegionServer hosts in the cluster:
>  
> [https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L382-L384]
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L692]
> But the hosts in the list returned by Admin#getRegionServers are FQDNs, 
> while the hostname provided by graceful_stop.sh is not an FQDN, so the 
> comparison fails.
> The same happens for branch-1 region_mover.rb, which is where I reproduced it 
> in my environment: 
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L305]
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L175]
>  
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L186-L192]
>  
> This can be fixed just by using "/bin/hostname -f" in the graceful_stop.sh 
> script.
> Will provide patch soon.





[jira] [Commented] (HBASE-25668) TestCurrentHourProvider fails 100% in branch-2.3

2021-03-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305561#comment-17305561
 ] 

Michael Stack commented on HBASE-25668:
---

Thank you [~psomogyi]


> TestCurrentHourProvider fails 100% in branch-2.3
> 
>
> Key: HBASE-25668
> URL: https://issues.apache.org/jira/browse/HBASE-25668
> Project: HBase
>  Issue Type: Sub-task
>  Components: flakies
>Affects Versions: 2.3.4
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 2.3.5
>
> Attachments: image-2021-03-16-13-34-29-412.png, screenshot-1.png
>
>
>  !image-2021-03-16-13-34-29-412.png! 





[jira] [Commented] (HBASE-25594) graceful_stop.sh fails to unload regions when ran at localhost

2021-03-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305562#comment-17305562
 ] 

Michael Stack commented on HBASE-25594:
---

Thanks again [~psomogyi]

> graceful_stop.sh fails to unload regions when ran at localhost
> --
>
> Key: HBASE-25594
> URL: https://issues.apache.org/jira/browse/HBASE-25594
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.4.13
>Reporter: Javier Akira Luca de Tena
>Assignee: Javier Akira Luca de Tena
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We usually use graceful_stop.sh from the Master to restart RegionServers. 
> However, in some scenarios we may not have privileges to restart remote 
> RegionServers (it uses ssh).
>  But we can still use graceful_stop.sh on the same host we want to restart.
> In order to detect the execution at localhost, graceful_stop.sh uses 
> /bin/hostname.
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/bin/graceful_stop.sh#L106-L110]
> When RegionMover strips the host to not include it in the list of target 
> hosts, we filter it out by checking all RegionServer hosts in the cluster:
>  
> [https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L382-L384]
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L692]
> But the hosts in the list returned by Admin#getRegionServers are FQDNs, 
> while the hostname provided by graceful_stop.sh is not an FQDN, so the 
> comparison fails.
> The same happens for branch-1 region_mover.rb, which is where I reproduced it 
> in my environment: 
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L305]
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L175]
>  
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L186-L192]
>  
> This can be fixed just by using "/bin/hostname -f" in the graceful_stop.sh 
> script.
> Will provide patch soon.





[jira] [Created] (HBASE-25685) asyncprofiler2.0 no longer supports svg; wants html

2021-03-19 Thread Michael Stack (Jira)
Michael Stack created HBASE-25685:
-

 Summary: asyncprofiler2.0 no longer supports svg; wants html
 Key: HBASE-25685
 URL: https://issues.apache.org/jira/browse/HBASE-25685
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


asyncprofiler2.0 is out. It's a nice tool. Unfortunately, it dropped the svg 
formatting option that we use in our servlet. Now it wants you to pass html. 
Let's fix.

Old -o on asyncprofiler1.x
-o fmt    output format: summary|traces|flat|collapsed|svg|tree|jfr

New -o asyncprofiler 2.x
-o fmt    output format: flat|traces|collapsed|flamegraph|tree|jfr

If you pass svg to 2.0, it does nothing. If you run the command that hbase 
runs, you see:

{code}
/tmp/prof-output$ sudo -u hbase /usr/lib/async-profiler/profiler.sh -e cpu -d 
10 -o svg -f /tmp/prof-output/async-prof-pid-8346-cpu-1x.svg 8346
[ERROR] SVG format is obsolete, use .html for FlameGraph
{code}

At a minimum, we can make it so the OUTPUT param supports HTML. Here is the 
current enum state:

{code}
  enum Output {
    SUMMARY,
    TRACES,
    FLAT,
    COLLAPSED,
    SVG,
    TREE,
    JFR
  }
{code}
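
A minimal sketch of what supporting HTML in the output param could look like, assuming the param arrives as a plain string and unknown values fall back to a caller-supplied default. ProfilerOutputSketch and parse are hypothetical names and this is not the actual servlet change.

```java
import java.util.Locale;

// Hypothetical illustration; the real servlet may parse its params differently.
final class ProfilerOutputSketch {
  // The enum from the issue with HTML added next to SVG.
  enum Output { SUMMARY, TRACES, FLAT, COLLAPSED, SVG, HTML, TREE, JFR }

  // Maps a "?output=..." value to an Output, using dflt when the value is
  // missing or not a known format.
  static Output parse(String param, Output dflt) {
    if (param == null) {
      return dflt;
    }
    try {
      return Output.valueOf(param.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      return dflt;
    }
  }
}
```

Keeping SVG in the enum leaves asyncprofiler 1.x usable, while the default passed to parse can differ per branch.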





[jira] [Commented] (HBASE-25668) TestCurrentHourProvider fails 100% in branch-2.3

2021-03-19 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305240#comment-17305240
 ] 

Michael Stack commented on HBASE-25668:
---

Please and thank you [~psomogyi]


> TestCurrentHourProvider fails 100% in branch-2.3
> 
>
> Key: HBASE-25668
> URL: https://issues.apache.org/jira/browse/HBASE-25668
> Project: HBase
>  Issue Type: Sub-task
>  Components: flakies
>Affects Versions: 2.3.4
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 2.3.5
>
> Attachments: image-2021-03-16-13-34-29-412.png, screenshot-1.png
>
>
>  !image-2021-03-16-13-34-29-412.png! 





[jira] [Resolved] (HBASE-25681) Add a switch for server/table queryMeter

2021-03-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25681.
---
Resolution: Fixed

> Add a switch for server/table queryMeter
> 
>
> Key: HBASE-25681
> URL: https://issues.apache.org/jira/browse/HBASE-25681
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>






[jira] [Commented] (HBASE-25681) Add a switch for server/table queryMeter

2021-03-19 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305238#comment-17305238
 ] 

Michael Stack commented on HBASE-25681:
---

I applied to Master only. The PR does not go back to branch-2 w/o conflict. Do 
you want to make new PRs [~DeanZ]?  One suggestion is that since this feature 
is currently ON by default, for branch-2 and branch-1, perhaps default should 
be ON rather than OFF? What do you think? Suggest adding backports as new 
JIRAs. Thanks.

> Add a switch for server/table queryMeter
> 
>
> Key: HBASE-25681
> URL: https://issues.apache.org/jira/browse/HBASE-25681
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>






[jira] [Updated] (HBASE-25681) Add a switch for server/table queryMeter

2021-03-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25681:
--
Fix Version/s: (was: 2.4.3)
   (was: 2.3.5)
   (was: 2.5.0)
   (was: 1.7.0)

> Add a switch for server/table queryMeter
> 
>
> Key: HBASE-25681
> URL: https://issues.apache.org/jira/browse/HBASE-25681
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>






[jira] [Reopened] (HBASE-25681) Add a switch for server/table queryMeter

2021-03-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25681:
---

> Add a switch for server/table queryMeter
> 
>
> Key: HBASE-25681
> URL: https://issues.apache.org/jira/browse/HBASE-25681
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.3
>
>






[jira] [Resolved] (HBASE-25681) Add a switch for server/table queryMeter

2021-03-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25681.
---
Fix Version/s: 2.4.3
   2.3.5
   2.5.0
   1.7.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Release Note: 
Adds "hbase.regionserver.enable.server.query.meter" and 
"hbase.regionserver.enable.table.query.meter" switches which are off by default.

Note, these counters used to be ON by default; now they are off.
   Resolution: Fixed

Merged to branch-1 and 2.3+. [~huaxiang] FYI. Thanks for fast turnaround 
[~DeanZ]
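
A minimal sketch of reading the two switches named in the release note with an off-by-default fallback. java.util.Properties is used here only as a stand-in for the HBase configuration object, and QueryMeterSwitchSketch is a hypothetical name.

```java
import java.util.Properties;

// Hypothetical illustration; HBase reads these keys from its own Configuration.
final class QueryMeterSwitchSketch {
  static final String SERVER_KEY = "hbase.regionserver.enable.server.query.meter";
  static final String TABLE_KEY = "hbase.regionserver.enable.table.query.meter";

  // A key that is absent counts as "false", matching the off-by-default note.
  static boolean enabled(Properties conf, String key) {
    return Boolean.parseBoolean(conf.getProperty(key, "false"));
  }
}
```

Operators who relied on these counters being on would need to set the relevant key to true explicitly.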

> Add a switch for server/table queryMeter
> 
>
> Key: HBASE-25681
> URL: https://issues.apache.org/jira/browse/HBASE-25681
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.3
>
>






[jira] [Resolved] (HBASE-25679) Size of log queue metric is incorrect in branch-1/branch-2

2021-03-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25679.
---
Fix Version/s: 2.4.3
   2.5.0
   1.7.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed on branch-1 and on branch-2.4+. Thanks for the fix [~shahrs87]

> Size of log queue metric is incorrect in branch-1/branch-2
> --
>
> Key: HBASE-25679
> URL: https://issues.apache.org/jira/browse/HBASE-25679
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.7.0, 2.5.0, 2.4.2
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3
>
>
> In HBASE-25539 I did some refactoring to add a new metric "oldestWalAge" 
> and tried to consolidate the updates to all the metrics related to the 
> ReplicationSource class (size of log queue and oldest wal age) in one place. 
> That refactoring introduced a bug: the size of log queue metric is 
> decremented twice whenever we remove a wal from the replication source 
> queue.
>  





[jira] [Commented] (HBASE-17362) HBase Backup/Restore Phase 4

2021-03-18 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304534#comment-17304534
 ] 

Michael Stack commented on HBASE-17362:
---

I was seeing them fail in PR test runs but not now.

> HBase Backup/Restore Phase 4
> 
>
> Key: HBASE-17362
> URL: https://issues.apache.org/jira/browse/HBASE-17362
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Vladimir Rodionov
>Assignee: Mallikarjun
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> The umbrella JIRA for next features of the backup/restore module





[jira] [Resolved] (HBASE-25518) Support separate child regions to different region servers

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25518.
---
Fix Version/s: 2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Release Note: 
Adds a config key to enable/disable automatically separating child regions onto 
different region servers during the region split procedure. One child is kept on 
the server where the parent region is, and the other child is assigned to a 
random server.

hbase.master.auto.separate.child.regions.after.split.enabled

Default setting is false/off.
   Resolution: Fixed

Merged to branch-2.4+. Thanks for the feature [~Xiaolin Ha]. 

> Support separate child regions to different region servers
> --
>
> Key: HBASE-25518
> URL: https://issues.apache.org/jira/browse/HBASE-25518
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> Hot/large regions can be split automatically by some split policies, but 
> both child regions stay on the RS that owns the parent region. We can 
> support separating the child regions from the master side, perhaps by adding a 
> step at the end of SplitTableRegionProcedure.
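
For illustration only, the placement rule from the release note could be sketched as below. This is not the code from the patch: the class and method names are hypothetical, and excluding the parent's server from the random pick is an assumption (the release note only says "a random server").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SplitPlacementSketch {
  // Config key quoted from the release note; the default is false/off.
  static final String KEY = "hbase.master.auto.separate.child.regions.after.split.enabled";

  /** Returns the target servers for the two child regions, in order [first, second]. */
  static List<String> place(String parentServer, List<String> servers,
      boolean separate, Random rnd) {
    String second = parentServer;
    if (separate) {
      // Assumption: pick the random target among servers other than the parent's.
      List<String> others = new ArrayList<>(servers);
      others.remove(parentServer);
      if (!others.isEmpty()) {
        second = others.get(rnd.nextInt(others.size()));
      }
    }
    // The first child always stays where the parent region was.
    return List.of(parentServer, second);
  }
}
```

With the flag off, or with a single server in the cluster, both children end up on the parent's server, which matches the behaviour described before this change.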





[jira] [Commented] (HBASE-25622) Result#compareResults should compare tags.

2021-03-18 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304382#comment-17304382
 ] 

Michael Stack commented on HBASE-25622:
---

I just merged the branch-1 PR too (updated the fix version to include 1.7.0)

> Result#compareResults should compare tags.
> --
>
> Key: HBASE-25622
> URL: https://issues.apache.org/jira/browse/HBASE-25622
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.7.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.2
>
>
> Today +Result#compareResults+ compares two cells on the following 
> parameters: row, family, qualifier, timestamp, type, value.
> {noformat}
> for (int i = 0; i < res1.size(); i++) {
>   if (!ourKVs[i].equals(replicatedKVs[i]) ||
>       !CellUtil.matchingValue(ourKVs[i], replicatedKVs[i])) {
>     throw new Exception("This result was different: "
>         + res1.toString() + " compared to " + res2.toString());
>   }
> }
> {noformat}
> We also need to compare tags to determine whether both cells are equal.
>  
>  
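
A minimal sketch of the requested check, using a stand-in cell type rather than the HBase Cell and CellUtil classes (SimpleCell and sameCell are hypothetical names, and the key fields are collapsed into one string for brevity):

```java
import java.util.Arrays;

final class CellCompareSketch {
  /** Minimal stand-in for a cell; real HBase cells carry more fields. */
  static final class SimpleCell {
    final String key;   // row, family, qualifier, timestamp and type collapsed into one string
    final byte[] value;
    final byte[] tags;

    SimpleCell(String key, byte[] value, byte[] tags) {
      this.key = key;
      this.value = value;
      this.tags = tags;
    }
  }

  /** Two cells are equal only if key, value and tags all match. */
  static boolean sameCell(SimpleCell a, SimpleCell b) {
    return a.key.equals(b.key)
        && Arrays.equals(a.value, b.value)
        && Arrays.equals(a.tags, b.tags); // the extra comparison the issue asks for
  }
}
```

Without the last comparison, two cells that differ only in their tags would be reported as identical.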





[jira] [Updated] (HBASE-25622) Result#compareResults should compare tags.

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25622:
--
Fix Version/s: 1.7.0

> Result#compareResults should compare tags.
> --
>
> Key: HBASE-25622
> URL: https://issues.apache.org/jira/browse/HBASE-25622
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.7.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.2
>
>
> Today +Result#compareResults+ compares two cells on the following 
> parameters: row, family, qualifier, timestamp, type, value.
> {noformat}
> for (int i = 0; i < res1.size(); i++) {
>   if (!ourKVs[i].equals(replicatedKVs[i]) ||
>       !CellUtil.matchingValue(ourKVs[i], replicatedKVs[i])) {
>     throw new Exception("This result was different: "
>         + res1.toString() + " compared to " + res2.toString());
>   }
> }
> {noformat}
> We also need to compare tags to determine whether both cells are equal.
>  
>  





[jira] [Resolved] (HBASE-25643) The delayed FlushRegionEntry should be removed when we need a non-delayed one

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25643.
---
Fix Version/s: 2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-2+. Thanks for the nice fix [~filtertip]. It would not go back 
to branch-2.4, so if you'd like it there, please make a backport 
subtask and attach a new PR.
Thanks for the review [~anoop.hbase]

> The delayed FlushRegionEntry should be removed when we need a non-delayed one
> -
>
> Key: HBASE-25643
> URL: https://issues.apache.org/jira/browse/HBASE-25643
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> The regionserver periodically checks all regions; if one has not been flushed 
> for a long time, it creates a delayed FlushRegionEntry with a delay in the 
> range 0~300s.
> During the delay, if a lot of data is suddenly written to the region, we 
> cannot flush immediately because of the existing entry in regionsInQueue, 
> and a RegionTooBusyException occurs.
> It would be better to improve the logic here so that the delayed entry is 
> replaced by the non-delayed one.
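
The replacement rule described above could look roughly like this. It is a sketch only: FlushQueueSketch, requestFlush and queuedDelay are hypothetical names, and a plain map stands in for regionsInQueue and the real delay queue.

```java
import java.util.HashMap;
import java.util.Map;

final class FlushQueueSketch {
  // Region name -> delay in ms of the queued flush entry (0 means "flush now").
  private final Map<String, Long> regionsInQueue = new HashMap<>();

  /** Returns true if the request was queued, either newly or by replacing a delayed entry. */
  synchronized boolean requestFlush(String region, long delayMs) {
    Long queued = regionsInQueue.get(region);
    boolean replaceDelayed = queued != null && queued > 0 && delayMs == 0;
    if (queued != null && !replaceDelayed) {
      // Keep the existing entry unless it is delayed and the new request is immediate.
      return false;
    }
    regionsInQueue.put(region, delayMs);
    return true;
  }

  /** Delay of the queued entry for the region, or -1 if nothing is queued. */
  synchronized long queuedDelay(String region) {
    Long queued = regionsInQueue.get(region);
    return queued == null ? -1 : queued;
  }
}
```

The point of the design is that an immediate request is never silently dropped just because a periodic, delayed entry for the same region is already queued.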





[jira] [Resolved] (HBASE-25594) graceful_stop.sh fails to unload regions when ran at localhost

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25594.
---
Resolution: Fixed

I pushed the below to 2.4+
{code}
I pushed this to 2.3+

commit 728d4f5ab12fd2631b1ef0a7c61203e9acfb05f0 (HEAD -> 2.3, origin/branch-2.3)
Author: Javier Akira Luca de Tena 
Date:   Fri Mar 19 04:04:54 2021 +0900

HBOPS-25594 Make easier to use graceful_stop on localhost mode (#3054)

Co-authored-by: Javier 

diff --git a/bin/graceful_stop.sh b/bin/graceful_stop.sh
index 89e3dd939c..e565929606 100755
--- a/bin/graceful_stop.sh
+++ b/bin/graceful_stop.sh
@@ -32,7 +32,7 @@ moving regions"
   echo " maxthreads xx  Limit the number of threads used by the region mover. Default value is 1."
   echo " movetimeout xx Timeout for moving regions. If regions are not moved by the timeout value,\
 exit with error. Default value is INT_MAX."
-  echo " hostname   Hostname of server we are to stop"
+  echo " hostname   Hostname to stop; match what HBase uses; pass 'localhost' if local to avoid ssh"
   echo " e|failfast Set -e so exit immediately if any command exits with non-zero status"
   echo " nob| nobalancer Do not manage balancer states. This is only used as optimization in \
 rolling_restart.sh to avoid multiple calls to hbase shell"
@@ -100,6 +100,10 @@ localhostname=`/bin/hostname`
 if [ "$localhostname" == "$hostname" ]; then
   local=true
 fi
+if [ "$localhostname" == "$hostname" ] || [ "$hostname" == "localhost" ]; then
+  local=true
+  hostname=$localhostname
+fi

 if [ "$nob" == "true"  ]; then
   log "[ $0 ] skipping disabling balancer -nob argument is used"
{code}

> graceful_stop.sh fails to unload regions when ran at localhost
> --
>
> Key: HBASE-25594
> URL: https://issues.apache.org/jira/browse/HBASE-25594
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.4.13
>Reporter: Javier Akira Luca de Tena
>Assignee: Javier Akira Luca de Tena
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We usually use graceful_stop.sh from the Master to restart RegionServers. 
> However, in some scenarios we may not have privileges to restart remote 
> RegionServers (it uses ssh).
>  But we can still use graceful_stop.sh on the same host we want to restart.
> In order to detect the execution at localhost, graceful_stop.sh uses 
> /bin/hostname.
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/bin/graceful_stop.sh#L106-L110]
> When RegionMover strips the host to not include it in the list of target 
> hosts, we filter it out by checking all RegionServer hosts in the cluster:
>  
> [https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L382-L384]
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L692]
> But the RegionServer hosts returned by Admin#getRegionServers are 
> FQDNs, while the hostname provided by graceful_stop.sh is not an FQDN, so 
> the comparison fails.
> The same happens for branch-1 region_mover.rb, which is where I reproduced it 
> in my environment: 
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L305]
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L175]
>  
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L186-L192]
>  
> This can be fixed just by using "/bin/hostname -f" in the graceful_stop.sh 
> script.
> Will provide patch soon.





[jira] [Reopened] (HBASE-25594) graceful_stop.sh fails to unload regions when ran at localhost

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25594:
---

Reopening to apply addendum.

> graceful_stop.sh fails to unload regions when ran at localhost
> --
>
> Key: HBASE-25594
> URL: https://issues.apache.org/jira/browse/HBASE-25594
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.4.13
>Reporter: Javier Akira Luca de Tena
>Assignee: Javier Akira Luca de Tena
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We usually use graceful_stop.sh from the Master to restart RegionServers. 
> However, in some scenarios we may not have privileges to restart remote 
> RegionServers (it uses ssh).
>  But we can still use graceful_stop.sh on the same host we want to restart.
> In order to detect the execution at localhost, graceful_stop.sh uses 
> /bin/hostname.
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/bin/graceful_stop.sh#L106-L110]
> When RegionMover strips the host to not include it in the list of target 
> hosts, we filter it out by checking all RegionServer hosts in the cluster:
>  
> [https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L382-L384]
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L692]
> But the RegionServer hosts returned by Admin#getRegionServers are 
> FQDNs, while the hostname provided by graceful_stop.sh is not an FQDN, so 
> the comparison fails.
> The same happens for branch-1 region_mover.rb, which is where I reproduced it 
> in my environment: 
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L305]
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L175]
>  
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L186-L192]
>  
> This can be fixed just by using "/bin/hostname -f" in the graceful_stop.sh 
> script.
> Will provide patch soon.





[jira] [Resolved] (HBASE-25671) Backport HBASE-25608 to branch-2

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25671.
---
Fix Version/s: 2.4.3
   2.5.0
 Hadoop Flags: Reviewed
 Assignee: (was: Shinya Yoshida)
   Resolution: Fixed

Merged to branch-2.4+ Thanks for the backport [~lineyshinya]

> Backport HBASE-25608 to branch-2
> 
>
> Key: HBASE-25671
> URL: https://issues.apache.org/jira/browse/HBASE-25671
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Shinya Yoshida
>Priority: Major
> Fix For: 2.5.0, 2.4.3
>
>






[jira] [Resolved] (HBASE-25674) RegionInfo.parseFrom(DataInputStream) sometimes fails to read the protobuf magic marker

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25674.
---
Fix Version/s: 2.4.3
   2.3.5
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-2.3+ Thanks for the nice PR [~catalin.luca] (Thanks for review 
[~wchevreuil])

> RegionInfo.parseFrom(DataInputStream) sometimes fails to read the protobuf 
> magic marker
> ---
>
> Key: HBASE-25674
> URL: https://issues.apache.org/jira/browse/HBASE-25674
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.4.1
>Reporter: Constantin-Catalin Luca
>Assignee: Constantin-Catalin Luca
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.5, 2.4.3
>
> Attachments: HBASE_25674-2.4.1.patch
>
>
> The RegionInfo class uses
> {code:java}
>  DataInputStream.read(byte[lengthOfPBMagic])
> {code}
> to read the protobuf magic marker from the beginning of the stream.
> The code in RegionInfo assumes that the passed byte buffer will be filled, 
> but the DataInputStream class only guarantees that it will read at most 
> lengthOfPBMagic bytes.
> This sometimes causes errors stating that region info file could not be 
> parsed.
> The fix is to simply issue multiple read calls until lengthOfPBMagic bytes 
> have been read.
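
As a sketch of the fix described above (not the code from the attached patch), a read loop that keeps going until the buffer is full might look like this. MagicReader and trickle are hypothetical names; trickle exists only to simulate a stream that returns short reads.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class MagicReader {
  /** Keeps calling read() until buf is full; a single read() may return fewer bytes. */
  static void readFully(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        throw new EOFException("stream ended after " + off + " of " + buf.length + " bytes");
      }
      off += n;
    }
  }

  /** Test helper: a stream that hands out at most one byte per read call. */
  static InputStream trickle(byte[] data) {
    return new ByteArrayInputStream(data) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 1));
      }
    };
  }
}
```

The JDK's DataInputStream#readFully(byte[]) performs the same kind of loop and throws EOFException if the stream ends early, so calling it is an alternative to a hand-written loop.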





[jira] [Updated] (HBASE-25674) RegionInfo.parseFrom(DataInputStream) sometimes fails to read the protobuf magic marker

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25674:
--
Status: In Progress  (was: Patch Available)

> RegionInfo.parseFrom(DataInputStream) sometimes fails to read the protobuf 
> magic marker
> ---
>
> Key: HBASE-25674
> URL: https://issues.apache.org/jira/browse/HBASE-25674
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.4.1
>Reporter: Constantin-Catalin Luca
>Assignee: Constantin-Catalin Luca
>Priority: Minor
> Attachments: HBASE_25674-2.4.1.patch
>
>
> The RegionInfo class uses
> {code:java}
>  DataInputStream.read(byte[lengthOfPBMagic])
> {code}
> to read the protobuf magic marker from the beginning of the stream.
> The code in RegionInfo assumes that the passed byte buffer will be filled, 
> but the DataInputStream class only guarantees that it will read at most 
> lengthOfPBMagic bytes.
> This sometimes causes errors stating that region info file could not be 
> parsed.
> The fix is to simply issue multiple read calls until lengthOfPBMagic bytes 
> have been read.





[jira] [Commented] (HBASE-25677) Server+table counters on each scan #nextRaw invocation becomes a bottleneck when heavy load

2021-03-18 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304346#comment-17304346
 ] 

Michael Stack commented on HBASE-25677:
---

Thank you for reviews [~reidchan] and [~DeanZ]

> Server+table counters on each scan #nextRaw invocation becomes a bottleneck 
> when heavy load
> ---
>
> Key: HBASE-25677
> URL: https://issues.apache.org/jira/browse/HBASE-25677
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.3.2
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.5, 2.4.3
>
>
> On a heavily loaded server mostly doing reads/scan, I saw that 90+% of 
> handlers were BLOCKED in this fashion in thread dumps:
> {code}
> "RpcServer.default.FPBQ.Fifo.handler=117,queue=17,port=16020" #161 daemon prio=5 os_prio=0 tid=0x7f748757f000 nid=0x73e9 waiting for monitor entry [0x7f74783e]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1674)
>     - waiting to lock <0x7f7647e3cc38> (a java.util.concurrent.ConcurrentHashMap$Node)
>     at org.apache.hadoop.hbase.regionserver.MetricsTableQueryMeterImpl.getOrCreateTableMeter(MetricsTableQueryMeterImpl.java:80)
>     at org.apache.hadoop.hbase.regionserver.MetricsTableQueryMeterImpl.updateTableReadQueryMeter(MetricsTableQueryMeterImpl.java:90)
>     at org.apache.hadoop.hbase.regionserver.RegionServerTableMetrics.updateTableReadQueryMeter(RegionServerTableMetrics.java:89)
>     at org.apache.hadoop.hbase.regionserver.MetricsRegionServer.updateReadQueryMeter(MetricsRegionServer.java:274)
>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6742)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3319)
>     - locked <0x7f896c0165a0> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3566)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:44858)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {code}
> It kept up for good periods of time.
> I saw it to a lesser extent on other servers with less load.
> These RS had 400+ Regions, a good few of which were serving out scan reads; 
> the server was doing ~1M hits a second. In this scenario, I saw the above 
> bottleneck.
> Looking at it, it came in when the parent issue's feature was added. There 
> are these read counts and then there are also write counts. The write counts 
> are mostly batch-based. Let me do the same thing here for reads: update the 
> central server+table count after the scan is done rather than on each 
> invocation of #nextRaw.
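
A rough sketch of the batching idea, with hypothetical names (TableReadCounters, addReads, scan) and a LongAdder standing in for the real meter: rows are counted in a local variable, and the shared per-table map is touched once per scan call instead of once per row.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class TableReadCounters {
  private final ConcurrentHashMap<String, LongAdder> perTable = new ConcurrentHashMap<>();

  /** Touches the shared map once; the caller passes the total for the whole scan call. */
  void addReads(String table, long count) {
    if (count > 0) {
      perTable.computeIfAbsent(table, t -> new LongAdder()).add(count);
    }
  }

  long reads(String table) {
    LongAdder adder = perTable.get(table);
    return adder == null ? 0 : adder.sum();
  }

  /** Hypothetical scan loop: count rows locally, publish the total when done. */
  static int scan(TableReadCounters counters, String table, Iterator<String> rows) {
    int local = 0;
    while (rows.hasNext()) {
      rows.next();
      local++; // no shared state is touched per row
    }
    counters.addReads(table, local);
    return local;
  }
}
```

This reduces calls into computeIfAbsent, the frame where the handlers in the thread dump were blocked, from one per row to one per scan call.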




