[jira] [Created] (HBASE-27396) Purge stale pom comment: ""

2022-09-27 Thread Michael Stack (Jira)
Michael Stack created HBASE-27396:
-

 Summary: Purge stale pom comment: ""
 Key: HBASE-27396
 URL: https://issues.apache.org/jira/browse/HBASE-27396
 Project: HBase
  Issue Type: Task
Reporter: Michael Stack


Any pom that has a hadoop-2.0 profile in it – all but the master branch – has 
this comment in the activation clause:

 


[jira] [Created] (HBASE-27340) Artifacts with resolved profiles

2022-08-27 Thread Michael Stack (Jira)
Michael Stack created HBASE-27340:
-

 Summary: Artifacts with resolved profiles
 Key: HBASE-27340
 URL: https://issues.apache.org/jira/browse/HBASE-27340
 Project: HBase
  Issue Type: Brainstorming
Reporter: Michael Stack


Brainstorming/Discussion. The flatten-maven-plugin makes it so published poms 
are 'flattened': they contain only the runtime-necessary dependencies, 'build' 
and 'test' dependencies and plugins are dropped, versions are resolved out of 
properties, and so on. The published poms are the barebones minimum needed to 
run.

With a switch, the plugin can also 'resolve' profiles in the produced poms – 
baking the hadoop2 or hadoop3 dependencies in – based on which profile was 
active during the build.

(I've been interested in this flattening technique since I ran into a 
downstreamer using hbase from a gradle build. Gradle does not respect maven 
profiles: you can't tell a gradle build to pull in hbase with hadoop3 
dependencies via 'profiles'. I notice too our [~gjacoby], [~apurtell] et al. up 
on the dev list talking about making a hadoop3 set of artifacts... they might 
be interested in this direction.)

The attached patch adds the flatten plugin so folks can take a look-see. It 
uncovers some locations where our versioning on dependencies is not explicit. 
The workaround practiced here was adding hadoop2/hadoop3 profiles to 
sub-modules that were missing them, or moving problematic dependencies that 
sat outside of profiles under the profiles of sub-modules that already had 
them. For the latter, if the dependency specified excludes, the excludes were 
moved up to the parent pom profile (parent pom profiles have 
dependencyManagement sections while sub-modules have explicit dependency 
mentions; checks with dependency:tree suggest the excludes remain effective).
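A minimal sketch of that excludes-moved-to-the-parent pattern (the profile id, 
property name, and exclusion shown are illustrative, not copied from the 
actual hbase poms):

```xml
<!-- Parent pom: the profile manages version and excludes centrally. -->
<profile>
  <id>hadoop-3.0</id>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop-three.version}</version>
        <exclusions>
          <exclusion>
            <groupId>com.sun.jersey</groupId>
            <artifactId>jersey-core</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
    </dependencies>
  </dependencyManagement>
</profile>

<!-- Sub-module pom: same profile id, bare dependency mention; version
     and excludes are inherited from the parent's dependencyManagement. -->
<profile>
  <id>hadoop-3.0</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </dependency>
  </dependencies>
</profile>
```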

This is the switch that flattens profiles:   
true
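A sketch of where that 'true' sits in the plugin configuration. The element 
name embedBuildProfileDependencies is my reading of the flatten-maven-plugin 
documentation – treat it as an assumption; only the plugin coordinates and 
version 1.3.0 come from the error output below:

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>flatten-maven-plugin</artifactId>
  <version>1.3.0</version>
  <configuration>
    <!-- Assumed switch: bake the active profile's dependencies
         into the flattened pom. -->
    <embedBuildProfileDependencies>true</embedBuildProfileDependencies>
  </configuration>
  <executions>
    <execution>
      <id>flatten</id>
      <phase>process-resources</phase>
      <goals>
        <goal>flatten</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```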

This is the sort of complaint we got when the flatten plugin had trouble 
figuring out dependency versions – particularly hadoop versions:

{{[ERROR] Failed to execute goal 
org.codehaus.mojo:flatten-maven-plugin:1.3.0:flatten (flatten) on project 
hbase-hadoop2-compat: 3 problems were encountered while building the effective 
model for org.apache.hbase:hbase-hadoop2-compat:2.5.1-SNAPSHOT}}

{{[ERROR] [WARNING] 'build.plugins.plugin.version' for 
org.codehaus.mojo:flatten-maven-plugin is missing. @}}

{{[ERROR] [ERROR] 'dependencies.dependency.version' for 
org.apache.hadoop:hadoop-mapreduce-client-core:jar is missing. @}}

{{[ERROR] [ERROR] 'dependencies.dependency.version' for 
javax.activation:javax.activation-api:jar is missing. @}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27338) brotli compression lib tests fail on arm64

2022-08-26 Thread Michael Stack (Jira)
Michael Stack created HBASE-27338:
-

 Summary: brotli compression lib tests fail on arm64
 Key: HBASE-27338
 URL: https://issues.apache.org/jira/browse/HBASE-27338
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.5.0
Reporter: Michael Stack


The brotli tests fail on M1 macs

 

{{[INFO] Running org.apache.hadoop.hbase.io.compress.brotli.TestBrotliCodec}}
{{[INFO] Running 
org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}
{{[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.33 
s <<< FAILURE! - in 
org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}
{{[ERROR] 
org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli.test  
Time elapsed: 0.225 s  <<< ERROR!}}
{{java.lang.UnsatisfiedLinkError: Failed to load Brotli native library}}

{{...}}

 

The lib is installed on this machine. A new release of the 
[Brotli4j|https://github.com/hyperxpro/Brotli4j] lib, 1.8.0, made a few days 
ago, fixes the issue (see [https://github.com/hyperxpro/Brotli4j/pull/34]). I 
tried it:

{{[INFO] ---}}
{{[INFO]  T E S T S}}
{{[INFO] ---}}
{{[INFO] Running org.apache.hadoop.hbase.io.compress.brotli.TestBrotliCodec}}
{{[INFO] Running 
org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}
{{[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.036 
s - in org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}

{{[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.42 s 
- in org.apache.hadoop.hbase.io.compress.brotli.TestBrotliCodec}}
{{[INFO]}}
{{[INFO] Results:}}
{{[INFO]}}
{{[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0}}
{{[INFO]}}
{{[INFO]}}
{{[INFO] --- maven-surefire-plugin:3.0.0-M6:test (secondPartTestsExecution) @ 
hbase-compression-brotli ---}}
{{[INFO] Tests are skipped.}}
{{[INFO]}}
{{[INFO] --- maven-jar-plugin:3.2.0:test-jar (default) @ 
hbase-compression-brotli ---}}
{{[INFO] Building jar: 
/Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0-tests.jar}}
{{[INFO]}}
{{[INFO] --- maven-jar-plugin:3.2.0:jar (default-jar) @ 
hbase-compression-brotli ---}}
{{[INFO] Building jar: 
/Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0.jar}}
{{[INFO]}}
{{[INFO] --- maven-site-plugin:3.12.0:attach-descriptor (attach-descriptor) @ 
hbase-compression-brotli ---}}
{{[INFO] Skipping because packaging 'jar' is not pom.}}
{{[INFO]}}
{{[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ 
hbase-compression-brotli ---}}
{{[INFO] Installing 
/Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0.jar
 to 
/Users/stack/.m2/repository/org/apache/hbase/hbase-compression-brotli/2.5.0/hbase-compression-brotli-2.5.0.jar}}
{{[INFO] Installing 
/Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/pom.xml
 to 
/Users/stack/.m2/repository/org/apache/hbase/hbase-compression-brotli/2.5.0/hbase-compression-brotli-2.5.0.pom}}
{{[INFO] Installing 
/Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0-tests.jar
 to 
/Users/stack/.m2/repository/org/apache/hbase/hbase-compression-brotli/2.5.0/hbase-compression-brotli-2.5.0-tests.jar}}
{{[INFO] 
}}
{{[INFO] BUILD SUCCESS}}
{{[INFO] 
}}
{{[INFO] Total time:  16.805 s}}
{{[INFO] Finished at: 2022-08-26T11:30:13-07:00}}
{{[INFO] 
}}





[jira] [Resolved] (HBASE-26321) Post blog to hbase.apache.org on SCR cache sizing

2021-10-05 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26321.
---
Fix Version/s: 3.0.0-alpha-2
 Hadoop Flags: Reviewed
 Release Note: Pushed blog at 
https://blogs.apache.org/hbase/entry/an-hbase-hdfs-short-circuit
 Assignee: Michael Stack
   Resolution: Fixed

Thanks for taking a look [~psomogyi]

I pushed it here 
[https://blogs.apache.org/hbase/entry/an-hbase-hdfs-short-circuit]

Shout if anyone wants to add edits.

> Post blog to hbase.apache.org on SCR cache sizing
> -
>
> Key: HBASE-26321
> URL: https://issues.apache.org/jira/browse/HBASE-26321
> Project: HBase
>  Issue Type: Task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>
> [~huaxiangsun] and I wrote up our experience debugging a Short-circuit Read 
> cache size issue. Let me attach the link here and let it hang a few days in 
> case others have edits or input. Intend to put it up here 
> https://blogs.apache.org/hbase/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26321) Post blog to hbase.apache.org on SCR cache sizing

2021-10-01 Thread Michael Stack (Jira)
Michael Stack created HBASE-26321:
-

 Summary: Post blog to hbase.apache.org on SCR cache sizing
 Key: HBASE-26321
 URL: https://issues.apache.org/jira/browse/HBASE-26321
 Project: HBase
  Issue Type: Task
Reporter: Michael Stack


[~huaxiangsun] and I wrote up our experience debugging a Short-circuit Read 
cache size issue. Let me attach the link here and let it hang a few days in 
case others have edits or input. Intend to put it up here 
https://blogs.apache.org/hbase/





[jira] [Resolved] (HBASE-26103) conn.getBufferedMutator(tableName) leaks thread executors and other problems (for master branch)

2021-08-30 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26103.
---
Fix Version/s: 3.0.0-alpha-2
 Hadoop Flags: Reviewed
 Release Note: Deprecate (unused) BufferedMutatorParams#pool and 
BufferedMutatorParams#getPool
   Resolution: Fixed

Merged the PR. Thanks for the contrib [~shahrs87]  (and review [~anoop.hbase] ).

> conn.getBufferedMutator(tableName) leaks thread executors and other problems 
> (for master branch)
> 
>
> Key: HBASE-26103
> URL: https://issues.apache.org/jira/browse/HBASE-26103
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 3.0.0-alpha-1
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>
> This is same as HBASE-26088  but created separate ticket for master branch.





[jira] [Resolved] (HBASE-24337) Backport HBASE-23968 to branch-2

2021-08-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24337.
---
Fix Version/s: 2.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Backported to branch-2. Resolving.

> Backport HBASE-23968 to branch-2
> 
>
> Key: HBASE-24337
> URL: https://issues.apache.org/jira/browse/HBASE-24337
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Minwoo Kang
>Assignee: Minwoo Kang
>Priority: Minor
> Fix For: 2.5.0
>
>






[jira] [Resolved] (HBASE-24842) make export snapshot report size can be config

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24842.
---
Fix Version/s: 3.0.0-alpha-2
 Hadoop Flags: Reviewed
 Release Note: Set new config snapshot.export.report.size to size at which 
you want to see reporting.
   Resolution: Fixed

Merged to master. Make a subtask if you'd like it backported. Thanks for the 
PR [~chenyechao]

> make export snapshot report size can be config
> --
>
> Key: HBASE-24842
> URL: https://issues.apache.org/jira/browse/HBASE-24842
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Minor
> Fix For: 3.0.0-alpha-2
>
>
> Currently export snapshot reports progress every ONE MB (1*1024*1024 bytes);
> we can make that size configurable.





[jira] [Resolved] (HBASE-24652) master-status UI make date type fields sortable

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24652.
---
Fix Version/s: 2.3.7
   2.4.6
   2.5.0
 Hadoop Flags: Reviewed
 Release Note: Makes RegionServer 'Start time' sortable in the Master UI
 Assignee: jeongmin kim
   Resolution: Fixed

[~jeongmin.kim] pardon me, I forgot about this one. I pushed to branch-2.3+. 
Thanks for the fix and please pardon my oversight.

> master-status UI make date type fields sortable
> ---
>
> Key: HBASE-24652
> URL: https://issues.apache.org/jira/browse/HBASE-24652
> Project: HBase
>  Issue Type: Improvement
>  Components: master, Operability, UI, Usability
>Affects Versions: 3.0.0-alpha-1, 2.2.0, 2.3.0, 2.1.5, 2.2.1, 2.1.6
>Reporter: Jeongmin Kim
>Assignee: jeongmin kim
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6, 2.3.7
>
> Attachments: SCREEN_SHOT1.png
>
>
> Revisit of HBASE-21207, HBASE-22543
> Date-type values, such as the regionserver list 'Start time' field on the 
> master-status page, are not sorted by time.
> HBASE-21207 and HBASE-22543 missed it, so before this fix dates sorted as 
> Strings. The first field of a date is the day name, so Friday always sorts 
> first and Wednesday last, no matter the date.
>    * SCREEN_SHOT1.png
>  
> This fix makes date-type values sort by date and time.





[jira] [Resolved] (HBASE-26200) Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of HBASE-24652

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26200.
---
Fix Version/s: 2.3.7
   2.4.6
   3.0.0-alpha-2
   2.5.0
 Release Note: Undid showing RegionServer 'Start time' in ISO-8601 format. 
Revert.
 Assignee: Michael Stack
   Resolution: Fixed

> Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of 
> HBASE-24652
> -
>
> Key: HBASE-26200
> URL: https://issues.apache.org/jira/browse/HBASE-26200
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6, 2.3.7
>
>
> The change below, by me, does not actually work; I found an older, neglected 
> issue that does the job properly. I'm undoing the below in favor of 
> HBASE-24652.
>  
> kalashnikov:hbase.apache.git stack$ git show 
> d07d181ea4a9da316659bb21fd4fffc979b5f77a
> commit d07d181ea4a9da316659bb21fd4fffc979b5f77a
> Author: Michael Stack 
> Date: Thu Oct 8 09:10:30 2020 -0700
> HBASE-25165 Change 'State time' in UI so sorts (#2508)
> Display startcode in iso8601.
> Signed-off-by: Nick Dimiduk 





[jira] [Created] (HBASE-26200) Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of HBASE-24652

2021-08-16 Thread Michael Stack (Jira)
Michael Stack created HBASE-26200:
-

 Summary: Undo 'HBASE-25165 Change 'State time' in UI so sorts 
(#2508)' in favor of HBASE-24652
 Key: HBASE-26200
 URL: https://issues.apache.org/jira/browse/HBASE-26200
 Project: HBase
  Issue Type: Bug
  Components: UI
Reporter: Michael Stack


The change below, by me, does not actually work; I found an older, neglected 
issue that does the job properly. I'm undoing the below in favor of 
HBASE-24652.

 

kalashnikov:hbase.apache.git stack$ git show 
d07d181ea4a9da316659bb21fd4fffc979b5f77a
commit d07d181ea4a9da316659bb21fd4fffc979b5f77a
Author: Michael Stack 
Date: Thu Oct 8 09:10:30 2020 -0700

HBASE-25165 Change 'State time' in UI so sorts (#2508)

Display startcode in iso8601.

Signed-off-by: Nick Dimiduk 





[jira] [Resolved] (HBASE-24339) Backport HBASE-23968 to branch-1

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24339.
---
Resolution: Won't Fix

> Backport HBASE-23968 to branch-1
> 
>
> Key: HBASE-24339
> URL: https://issues.apache.org/jira/browse/HBASE-24339
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Minwoo Kang
>Assignee: Minwoo Kang
>Priority: Minor
>






[jira] [Resolved] (HBASE-26037) Implement namespace and table level access control for thrift & thrift2

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26037.
---
Fix Version/s: 3.0.0-alpha-2
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master. Thanks for the PR [~xytss123] (and review [~zhangduo]). I 
tried to backport to branch-2 so it could be in 2.5.0, but it conflicted. Make 
a sub-task if you want a backport. Thank you.

> Implement namespace and table level access control for thrift & thrift2
> ---
>
> Key: HBASE-26037
> URL: https://issues.apache.org/jira/browse/HBASE-26037
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, Thrift
>Reporter: Yutong Xiao
>Assignee: Yutong Xiao
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>
> Client can grant or revoke ns & table level user permissions through thrift & 
> thrift2. This is implemented with AccessControlClient.





[jira] [Created] (HBASE-26191) Annotate shaded generated protobuf as InterfaceAudience.Private

2021-08-11 Thread Michael Stack (Jira)
Michael Stack created HBASE-26191:
-

 Summary: Annotate shaded generated protobuf as 
InterfaceAudience.Private
 Key: HBASE-26191
 URL: https://issues.apache.org/jira/browse/HBASE-26191
 Project: HBase
  Issue Type: Task
  Components: Coprocessors, Protobufs
Reporter: Michael Stack


Annotate generated shaded protobufs as InterfaceAudience.Private. It might not 
be possible to add the annotation to each class; at a minimum, update the doc 
on our story around shaded internal protobufs.

See the prompting mailing list discussion here: 
[https://lists.apache.org/thread.html/r9e6eb11106727d245f6eb2a5023823901637971d6ed0f0aedaf8d149%40%3Cdev.hbase.apache.org%3E]

So far the consensus has it that the shaded generated protobuf should be made 
IA.Private.  Will wait on it to settle.





[jira] [Resolved] (HBASE-16756) InterfaceAudience annotate our protobuf; distinguish internal; publish public

2021-08-11 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-16756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-16756.
---
Resolution: Won't Fix

Not doing this.

> InterfaceAudience annotate our protobuf; distinguish internal; publish public
> -
>
> Key: HBASE-16756
> URL: https://issues.apache.org/jira/browse/HBASE-16756
> Project: HBase
>  Issue Type: Task
>  Components: Protobufs
>Reporter: Michael Stack
>Priority: Major
>
> This is a follow-on from the work done over in HBASE-15638 Shade protobuf.
> Currently protobufs are not annotated as our java classes are even though 
> they are being used by downstream Coprocessor Endpoints; i.e. if a CPEP wants 
> to update a Cell in HBase or refer to a server in the cluster, 9 times out of 
> 10 they will depend on the HBase Cell.proto and its generated classes or the 
> ServerName definition in HBase.proto file.
> This makes it so we cannot make breaking changes to the Cell type or relocate 
> the ServerName definition to another file if we want CPEPs to keep working.
> The issue gets compounded by HBASE-15638 "Shade protobuf" where protos used 
> internally are relocated, and given another package name altogether. 
> Currently we leave behind the old protos (sort-of duplicated) so CPEPs keep 
> working but going forward, IF WE CONTINUE DOWN THIS PATH OF SHADING PROTOS 
> (we may revisit if hadoop ends up isolating its classpath), then we need to 
> 'publish' protos that we will honor, as we would classes annotated with 
> @InterfaceAudience.Public, as part of our public API.
> What is involved is a review of the current protos under hbase-protocol. Sort 
> out what is to be made public. We will likely have to break up current proto 
> files into smaller collections since they currently contain mixes of public 
> and private types. Deprecate the fat Admin and Client protos.  This will 
> allow us to better narrow the set of what we make public. These new files 
> could live in the hbase-protocol module suitably annotated or they could be 
> done up in a new module altogether. TODO.





[jira] [Resolved] (HBASE-6908) Pluggable Call BlockingQueue for HBaseServer

2021-08-09 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-6908.
--
Fix Version/s: 2.4.6
   3.0.0-alpha-2
   2.5.0
 Hadoop Flags: Reviewed
 Release Note: 
Can pass in a FQCN to load as the call queue implementation.

Standardized arguments to the constructor are the max queue length, the 
PriorityFunction, and the Configuration.

PluggableBlockingQueue abstract class provided to help guide the correct 
constructor signature.

Hard fails with PluggableRpcQueueNotFound if the class fails to load as a 
BlockingQueue

Upstreaming on behalf of Hubspot, we are interested in defining our own custom 
RPC queue and don't want to get involved in necessarily upstreaming internal 
requirements/iterations. 

   Resolution: Fixed

Merged to branch-2.4+. Thanks for the clean pluggable interface [~rmarsch]. 
I put your PR comment in as the release note. Edit if you see fit.

> Pluggable Call BlockingQueue for HBaseServer
> 
>
> Key: HBASE-6908
> URL: https://issues.apache.org/jira/browse/HBASE-6908
> Project: HBase
>  Issue Type: New Feature
>  Components: IPC/RPC
>Reporter: James Taylor
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6
>
>
> Allow the BlockingQueue implementation class to be specified in the HBase 
> config to enable different behavior than a FIFO queue. The use case we have 
> is around fairness and starvation for big scans that are parallelized on the 
> client. It's easy to fill up the HBase server Call BlockingQueue when 
> processing a single parallelized scan, leadng other scans to time out. 
> Instead, doing round robin processesing on a dequeue through a different 
> BlockingQueue implementation will prevent this from occurring.





[jira] [Resolved] (HBASE-26170) handleTooBigRequest in NettyRpcServer didn't skip enough bytes

2021-08-05 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26170.
---
Fix Version/s: 2.3.7
   2.4.6
   3.0.0-alpha-2
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-2.3+. Nice fix [~Xiaolin Ha]

> handleTooBigRequest in NettyRpcServer didn't skip enough bytes
> --
>
> Key: HBASE-26170
> URL: https://issues.apache.org/jira/browse/HBASE-26170
> Project: HBase
>  Issue Type: Bug
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 3.0.0-alpha-2, 2.4.6, 2.3.7
>
> Attachments: error-logs.png
>
>
> We found there are always coredump problems after too big requests, the logs 
> are as follows,
> !error-logs.png|width=1040,height=187!





[jira] [Resolved] (HBASE-26153) [create-release] Use cmd-line defined env vars

2021-08-04 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26153.
---
Fix Version/s: 3.0.0-alpha-2
 Assignee: Michael Stack
   Resolution: Fixed

Merged trivial create-release script changes.

> [create-release] Use cmd-line defined env vars
> --
>
> Key: HBASE-26153
> URL: https://issues.apache.org/jira/browse/HBASE-26153
> Project: HBase
>  Issue Type: Improvement
>  Components: RC
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Trivial
> Fix For: 3.0.0-alpha-2
>
>
> Minor item. The create-release scripts allows defining some of the variables 
> used on the command line but not all. Fix.





[jira] [Resolved] (HBASE-26162) Release 2.3.6

2021-08-02 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26162.
---
Fix Version/s: 2.3.7
 Assignee: Michael Stack
   Resolution: Fixed

Sent announcement email, ran all steps in above list. Downloads will update 
tonight. Resolving.

> Release 2.3.6
> -
>
> Key: HBASE-26162
> URL: https://issues.apache.org/jira/browse/HBASE-26162
> Project: HBase
>  Issue Type: Task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 2.3.7
>
> Attachments: image-2021-08-02-09-54-56-469.png
>
>
> 2.3.6RC3 was voted in as the 2.3.6 release.
> Run the release steps listed here for 2.3.6
> !image-2021-08-02-09-54-56-469.png!





[jira] [Created] (HBASE-26162) Release 2.3.6

2021-08-02 Thread Michael Stack (Jira)
Michael Stack created HBASE-26162:
-

 Summary: Release 2.3.6
 Key: HBASE-26162
 URL: https://issues.apache.org/jira/browse/HBASE-26162
 Project: HBase
  Issue Type: Task
Reporter: Michael Stack
 Attachments: image-2021-08-02-09-54-56-469.png

2.3.6RC3 was voted in as the 2.3.6 release.

Run the release steps listed here for 2.3.6

!image-2021-08-02-09-54-56-469.png!





[jira] [Created] (HBASE-26153) [create-release] Use cmd-line defined env vars

2021-07-29 Thread Michael Stack (Jira)
Michael Stack created HBASE-26153:
-

 Summary: [create-release] Use cmd-line defined env vars
 Key: HBASE-26153
 URL: https://issues.apache.org/jira/browse/HBASE-26153
 Project: HBase
  Issue Type: Improvement
  Components: RC
Reporter: Michael Stack


Minor item. The create-release scripts allows defining some of the variables 
used on the command line but not all. Fix.





[jira] [Resolved] (HBASE-26146) Allow custom opts for hbck in hbase bin

2021-07-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26146.
---
Fix Version/s: 2.4.6
 Release Note: Adds HBASE_HBCK_OPTS environment variable to bin/hbase for 
passing extra options to hbck/hbck2. Defaults to HBASE_SERVER_JAAS_OPTS if 
specified, or HBASE_REGIONSERVER_OPTS.
   Resolution: Fixed

Pushed #3537 to branch-2.4. Re-resolving.

 

Added a release note [~anoop.hbase]
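The release note boils down to: give hbck its own JVM options instead of 
letting it inherit the regionserver's. A hedged sketch of how an operator 
might use the new variable (the flag values are illustrative):

```shell
# In hbase-env.sh, or exported in the shell before running `hbase hbck`:
# hbck gets modest, dedicated JVM options rather than inheriting
# HBASE_REGIONSERVER_OPTS (which may pin a JMX port or set a large -Xms).
export HBASE_HBCK_OPTS="-Xms512m -Xmx2g"
```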

> Allow custom opts for hbck in hbase bin
> ---
>
> Key: HBASE-26146
> URL: https://issues.apache.org/jira/browse/HBASE-26146
> Project: HBase
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6
>
>
> https://issues.apache.org/jira/browse/HBASE-15145 made it so that when you 
> execute {{hbase hbck}}, the regionserver or JAAS opts are added automatically 
> to the command line. This is problematic in some cases depending on what 
> regionserver opts have been set. For instance, one might configure a jmx port 
> for the regionserver but then hbck will fail due to a port conflict if run on 
> the same host as a regionserver. Another example would be that a regionserver 
> might define an {{-Xms}} value which is significantly more than hbck requires.
>  
> We should make it possible for users to define their own HBASE_HBCK_OPTS 
> which take precedence over the server opts added by default.





[jira] [Reopened] (HBASE-26146) Allow custom opts for hbck in hbase bin

2021-07-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-26146:
---

Reopening to apply backport to branch-2.3 (#3537)

> Allow custom opts for hbck in hbase bin
> ---
>
> Key: HBASE-26146
> URL: https://issues.apache.org/jira/browse/HBASE-26146
> Project: HBase
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> https://issues.apache.org/jira/browse/HBASE-15145 made it so that when you 
> execute {{hbase hbck}}, the regionserver or JAAS opts are added automatically 
> to the command line. This is problematic in some cases depending on what 
> regionserver opts have been set. For instance, one might configure a jmx port 
> for the regionserver but then hbck will fail due to a port conflict if run on 
> the same host as a regionserver. Another example would be that a regionserver 
> might define an {{-Xms}} value which is significantly more than hbck requires.
>  
> We should make it possible for users to define their own HBASE_HBCK_OPTS 
> which take precedence over the server opts added by default.





[jira] [Resolved] (HBASE-26148) Backport HBASE-26146 to branch-2.4

2021-07-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26148.
---
Resolution: Invalid

> Backport HBASE-26146 to branch-2.4
> --
>
> Key: HBASE-26148
> URL: https://issues.apache.org/jira/browse/HBASE-26148
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Minor
>






[jira] [Resolved] (HBASE-26146) Allow custom opts for hbck in hbase bin

2021-07-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26146.
---
Fix Version/s: 3.0.0-alpha-2
   2.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the patch [~bbeaudreault]. Merged to branch-2+. It didn't go to 
branch-2.4 because of conflicts. Make a subtask for a backport if you want it 
in 2.4/2.3, boss.

> Allow custom opts for hbck in hbase bin
> ---
>
> Key: HBASE-26146
> URL: https://issues.apache.org/jira/browse/HBASE-26146
> Project: HBase
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> https://issues.apache.org/jira/browse/HBASE-15145 made it so that when you 
> execute {{hbase hbck}}, the regionserver or JAAS opts are added automatically 
> to the command line. This is problematic in some cases depending on what 
> regionserver opts have been set. For instance, one might configure a jmx port 
> for the regionserver but then hbck will fail due to a port conflict if run on 
> the same host as a regionserver. Another example would be that a regionserver 
> might define an {{-Xms}} value which is significantly more than hbck requires.
>  
> We should make it possible for users to define their own HBASE_HBCK_OPTS 
> which take precedence over the server opts added by default.





[jira] [Resolved] (HBASE-26001) When turn on access control, the cell level TTL of Increment and Append operations is invalid.

2021-07-26 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26001.
---
Resolution: Fixed

Removed 2.3.6 as fix version after revert.

> When turn on access control, the cell level TTL of Increment and Append 
> operations is invalid.
> --
>
> Key: HBASE-26001
> URL: https://issues.apache.org/jira/browse/HBASE-26001
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Reporter: Yutong Xiao
>Assignee: Yutong Xiao
>Priority: Minor
> Fix For: 2.6.7, 2.5.0, 2.4.5, 3.0.0-alpha-1
>
>
> AccessController's postIncrementBeforeWAL() and postAppendBeforeWAL() methods 
> rewrite the new cell's tags from the old cell's, which makes the other kinds 
> of tag on the new cell (such as the TTL tag) invisible afterwards. In 
> Increment and Append operations the new cell has already carried forward all 
> tags of the old cell, plus the TTL tag from the mutation operation, so here 
> in AccessController we do not need to rewrite the tags once again. The 
> oldCell is useless here.
> {code:java}
> private Cell createNewCellWithTags(Mutation mutation, Cell oldCell, Cell newCell) {
>   // Collect any ACLs from the old cell
>   List<Tag> tags = Lists.newArrayList();
>   List<Tag> aclTags = Lists.newArrayList();
>   ListMultimap<String, Permission> perms = ArrayListMultimap.create();
>   if (oldCell != null) {
>     Iterator<Tag> tagIterator = PrivateCellUtil.tagsIterator(oldCell);
>     while (tagIterator.hasNext()) {
>       Tag tag = tagIterator.next();
>       if (tag.getType() != PermissionStorage.ACL_TAG_TYPE) {
>         // Not an ACL tag, just carry it through
>         if (LOG.isTraceEnabled()) {
>           LOG.trace("Carrying forward tag from " + oldCell + ": type " + tag.getType()
>             + " length " + tag.getValueLength());
>         }
>         tags.add(tag);
>       } else {
>         aclTags.add(tag);
>       }
>     }
>   }
>   // Do we have an ACL on the operation?
>   byte[] aclBytes = mutation.getACL();
>   if (aclBytes != null) {
>     // Yes, use it
>     tags.add(new ArrayBackedTag(PermissionStorage.ACL_TAG_TYPE, aclBytes));
>   } else {
>     // No, use what we carried forward
>     if (perms != null) {
>       // TODO: If we collected ACLs from more than one tag we may have a
>       // List of size > 1, this can be collapsed into a single
>       // Permission
>       if (LOG.isTraceEnabled()) {
>         LOG.trace("Carrying forward ACLs from " + oldCell + ": " + perms);
>       }
>       tags.addAll(aclTags);
>     }
>   }
>   // If we have no tags to add, just return
>   if (tags.isEmpty()) {
>     return newCell;
>   }
>   // Here the new cell's tags will be invisible.
>   return PrivateCellUtil.createCell(newCell, tags);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-26001) When turn on access control, the cell level TTL of Increment and Append operations is invalid.

2021-07-26 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-26001:
---

Reopening to revert from branch-2.3. The new test added here is failing 100% 
on branch-2.3. See the bottom of 
[https://ci-hadoop.apache.org/view/HBase/job/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.3/lastSuccessfulBuild/artifact/output/dashboard.html]
Thanks.

> When turn on access control, the cell level TTL of Increment and Append 
> operations is invalid.
> --
>
> Key: HBASE-26001
> URL: https://issues.apache.org/jira/browse/HBASE-26001
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Reporter: Yutong Xiao
>Assignee: Yutong Xiao
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.6.7, 2.5.0, 2.3.6, 2.4.5
>
>
> AccessController's postIncrementBeforeWAL() and postAppendBeforeWAL() methods 
> rewrite the new cell's tags from the old cell's, which makes the other kinds 
> of tag on the new cell (such as the TTL tag) invisible afterwards. In 
> Increment and Append operations the new cell has already carried forward all 
> tags of the old cell, plus the TTL tag from the mutation operation, so here 
> in AccessController we do not need to rewrite the tags once again. The 
> oldCell is useless here.
> {code:java}
> private Cell createNewCellWithTags(Mutation mutation, Cell oldCell, Cell newCell) {
>   // Collect any ACLs from the old cell
>   List<Tag> tags = Lists.newArrayList();
>   List<Tag> aclTags = Lists.newArrayList();
>   ListMultimap<String, Permission> perms = ArrayListMultimap.create();
>   if (oldCell != null) {
>     Iterator<Tag> tagIterator = PrivateCellUtil.tagsIterator(oldCell);
>     while (tagIterator.hasNext()) {
>       Tag tag = tagIterator.next();
>       if (tag.getType() != PermissionStorage.ACL_TAG_TYPE) {
>         // Not an ACL tag, just carry it through
>         if (LOG.isTraceEnabled()) {
>           LOG.trace("Carrying forward tag from " + oldCell + ": type " + tag.getType()
>             + " length " + tag.getValueLength());
>         }
>         tags.add(tag);
>       } else {
>         aclTags.add(tag);
>       }
>     }
>   }
>   // Do we have an ACL on the operation?
>   byte[] aclBytes = mutation.getACL();
>   if (aclBytes != null) {
>     // Yes, use it
>     tags.add(new ArrayBackedTag(PermissionStorage.ACL_TAG_TYPE, aclBytes));
>   } else {
>     // No, use what we carried forward
>     if (perms != null) {
>       // TODO: If we collected ACLs from more than one tag we may have a
>       // List of size > 1, this can be collapsed into a single
>       // Permission
>       if (LOG.isTraceEnabled()) {
>         LOG.trace("Carrying forward ACLs from " + oldCell + ": " + perms);
>       }
>       tags.addAll(aclTags);
>     }
>   }
>   // If we have no tags to add, just return
>   if (tags.isEmpty()) {
>     return newCell;
>   }
>   // Here the new cell's tags will be invisible.
>   return PrivateCellUtil.createCell(newCell, tags);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-25165) Change 'State time' in UI so sorts

2021-07-24 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25165:
---

Reopen to push on branch-2.3

> Change 'State time' in UI so sorts
> --
>
> Key: HBASE-25165
> URL: https://issues.apache.org/jira/browse/HBASE-25165
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
> Attachments: Screen Shot 2020-10-07 at 4.15.32 PM.png, Screen Shot 
> 2020-10-07 at 4.15.42 PM.png
>
>
> Here is a minor issue.
> I had an issue w/ crashing servers. The servers were auto-restarted on crash.
> To find the crashing servers, I was sorting on the 'Start time' column in the 
> Master UI. This basically worked, but the sort is unreliable because the date 
> we display starts with the day-of-the-week.
> This issue is about moving to displaying the start time in ISO 8601, which is 
> sortable (and occupies less real estate). Let me add some images.
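For background on why the switch helps: ISO 8601 timestamps are fixed-width with the most significant fields first, so a plain lexicographic string sort is also a chronological sort. A minimal sketch (hypothetical class and method names, not HBase code):

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortableStartTime {
    /** Render a start time in ISO 8601; lexicographic order matches chronological order. */
    static String iso8601(long startTimeMillis) {
        return DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(startTimeMillis));
    }

    public static void main(String[] args) {
        // "Wed Oct 07 ..."-style strings sort by day-of-week name, not by date.
        // The ISO renderings below sort correctly with a plain string sort.
        List<String> starts = new ArrayList<>();
        starts.add(iso8601(1602112532000L)); // 2020-10-07T23:15:32Z
        starts.add(iso8601(1601427600000L)); // 2020-09-30T01:00:00Z
        starts.add(iso8601(1609459200000L)); // 2021-01-01T00:00:00Z
        Collections.sort(starts);
        System.out.println(starts); // chronological order
    }
}
```

Because the string sort and the time sort agree, the UI's existing column-sorting code needs no special date handling.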



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25165) Change 'State time' in UI so sorts

2021-07-24 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25165.
---
Resolution: Fixed

Pushed on branch-2.3. Re-resolving.

> Change 'State time' in UI so sorts
> --
>
> Key: HBASE-25165
> URL: https://issues.apache.org/jira/browse/HBASE-25165
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 2.3.6, 2.4.0, 3.0.0-alpha-1
>
> Attachments: Screen Shot 2020-10-07 at 4.15.32 PM.png, Screen Shot 
> 2020-10-07 at 4.15.42 PM.png
>
>
> Here is a minor issue.
> I had an issue w/ crashing servers. The servers were auto-restarted on crash.
> To find the crashing servers, I was sorting on the 'Start time' column in the 
> Master UI. This basically worked, but the sort is unreliable because the date 
> we display starts with the day-of-the-week.
> This issue is about moving to displaying the start time in ISO 8601, which is 
> sortable (and occupies less real estate). Let me add some images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-26062) SIGSEGV in AsyncFSWAL consume

2021-07-21 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26062.
---
Resolution: Duplicate

Thanks [~anoop.hbase]. There is ASYNC_WAL on this cluster after all (when I 
wrote the above, I thought there was none). Resolving as a duplicate of what we 
see over on HBASE-24984.

> SIGSEGV in AsyncFSWAL consume
> -
>
> Key: HBASE-26062
> URL: https://issues.apache.org/jira/browse/HBASE-26062
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Major
>
> Seems related to the parent issue. It's happened a few times on one of our 
> clusters here. Below are two examples. More detail is needed, but perhaps the 
> call timed out and the buffer was thus freed, while the late consume on the 
> other side of the ring buffer doesn't know that and goes ahead (just 
> speculation).
>  
> {code:java}
> #  SIGSEGV (0xb) at pc=0x7f8b3ef5b77c, pid=37631, tid=0x7f61560ed700
>
> RAX=0xdf6e is an unknown value
> RBX=0x7f8a38d7b6f8 is an oop
> java.nio.DirectByteBuffer - klass: 'java/nio/DirectByteBuffer'
> RCX=0x7f60e2767898 is pointing into metadata
> RDX=0x0de7 is an unknown value
> RSP=0x7f61560ec6f0 is pointing into the stack for thread: 0x7f8b3017b800
> RBP=[error occurred during error reporting (printing register info), id 0xb]
>
> Stack: [0x7f6155fed000,0x7f61560ee000],  sp=0x7f61560ec6f0,  free space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> J 23901 C2 java.util.stream.MatchOps$1MatchSink.accept(Ljava/lang/Object;)V (44 bytes) @ 0x7f8b3ef5b77c [0x7f8b3ef5b640+0x13c]
> J 16165 C2 java.util.ArrayList$ArrayListSpliterator.tryAdvance(Ljava/util/function/Consumer;)Z (79 bytes) @ 0x7f8b3d67b344 [0x7f8b3d67b2c0+0x84]
> J 16160 C2 java.util.stream.MatchOps$MatchOp.evaluateSequential(Ljava/util/stream/PipelineHelper;Ljava/util/Spliterator;)Ljava/lang/Object; (7 bytes) @ 0x7f8b3d67bc9c [0x7f8b3d67b900+0x39c]
> J 17729 C2 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALActionListener.visitLogEntryBeforeWrite(Lorg/apache/hadoop/hbase/wal/WALKey;Lorg/apache/hadoop/hbase/wal/WALEdit;)V (10 bytes) @ 0x7f8b3fc39010 [0x7f8b3fc388a0+0x770]
> J 29991 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.appendAndSync()V (261 bytes) @ 0x7f8b3fd03d90 [0x7f8b3fd039e0+0x3b0]
> J 20773 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume()V (474 bytes) @ 0x7f8b40283728 [0x7f8b40283480+0x2a8]
> J 15191 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL$$Lambda$76.run()V (8 bytes) @ 0x7f8b3ed69ecc [0x7f8b3ed69ea0+0x2c]
> J 17383% C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x7f8b3d9423f8 [0x7f8b3d942260+0x198]
> j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
> V  [libjvm.so+0x66b9ba]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xe1a
> V  [libjvm.so+0x669073]  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x263
> V  [libjvm.so+0x669647]  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x57
> V  [libjvm.so+0x6aaa4c]  thread_entry(JavaThread*, Thread*)+0x6c
> V  [libjvm.so+0xa224cb]  JavaThread::thread_main_inner()+0xdb
> V  [libjvm.so+0xa22816]  JavaThread::run()+0x316
> V  [libjvm.so+0x8c4202]  java_start(Thread*)+0x102
> C  [libpthread.so.0+0x76ba]  start_thread+0xca
> {code}
>  
> This one is from a month previous and has a deeper stack... we're trying to 
> read a Cell...
>  
> {code:java}
> Stack: [0x7fa1d5fb8000,0x7fa1d60b9000],  sp=0x7fa1d60b7660,  free space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> J 30665 C2 org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(Lorg/apache/hadoop/hbase/Cell;[BII)Z (59 bytes) @ 0x7fcc2d29eeb2 [0x7fcc2d29e7c0+0x6f2]
> J 25816 C2 org.apache.hadoop.hbase.CellUtil.matchingFamily(Lorg/apache/hadoop/hbase/Cell;[B)Z (28 bytes) @ 0x7fcc2a0430f8 [0x7fcc2a0430e0+0x18]
> J 17236 C2 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALActionListener$$Lambda$254.test(Ljava/lang/Object;)Z (8 bytes) @ 0x7fcc2b40bc68 [0x7fcc2b40bc20+0x48]
> J 13735 C2 java.util.ArrayList$ArrayListSpliterator.tryAdvance(Ljava/util/function/Consumer;)Z (79 bytes) @ 0x7fcc2b7d936c [0x7fcc2b7d92c0+0xac]
> J 17162 C2 java.util.stream.MatchOps$MatchOp.evaluateSequential(Ljava/util/stream/PipelineHelper;Ljava/util/Spliterator;)Ljava/lang/Object; (7 bytes) @ 0x7fcc29bc05e8

[jira] [Resolved] (HBASE-25739) TableSkewCostFunction need to use aggregated deviation

2021-07-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25739.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Resolving after merging PRs for branch-2.3+. Thanks for the patch 
[~clarax98007] (and reviews [~busbey] and [~zhangduo]).

> TableSkewCostFunction need to use aggregated deviation
> --
>
> Key: HBASE-25739
> URL: https://issues.apache.org/jira/browse/HBASE-25739
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, master
>Reporter: Clara Xiong
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 2.5.0, 2.3.6, 3.0.0-alpha-2, 2.4.5
>
> Attachments: 
> TEST-org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerBalanceCluster.xml,
>  
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerBalanceCluster.txt
>
>
> TableSkewCostFunction uses the sum of the max deviation region per server for 
> all tables as the measure of unevenness. It doesn't work in a very common 
> operational scenario. Say we have 100 regions on 50 nodes, two on each. We 
> add 50 new nodes and they have 0 each. The max deviation from the mean is 1, 
> compared to 99 in the worst-case scenario of 100 regions on a single server. 
> The normalized cost is 1/99 ≈ 0.01, below the default threshold of 0.05, so 
> the balancer wouldn't move. The proposal is to use the aggregated deviation 
> of the count per region server to detect this scenario, generating a cost of 
> 100/198 ≈ 0.5 in this case.
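The arithmetic in the description can be checked with a toy sketch of the two cost functions. This is illustrative only, under the assumptions stated in the description; it is not the actual StochasticLoadBalancer code, and the method names are made up:

```java
import java.util.Arrays;

public class TableSkewCost {
    /** Cost from the single largest deviation from the mean (the old measure). */
    static double maxDeviationCost(int[] counts) {
        double mean = Arrays.stream(counts).average().orElse(0);
        double maxDev = Arrays.stream(counts).mapToDouble(c -> Math.abs(c - mean)).max().orElse(0);
        int total = Arrays.stream(counts).sum();
        double worst = total - mean; // worst case: every region piled on one server
        return worst == 0 ? 0 : maxDev / worst;
    }

    /** Cost from the aggregated deviation across all servers (the proposal). */
    static double aggregatedDeviationCost(int[] counts) {
        double mean = Arrays.stream(counts).average().orElse(0);
        double sumDev = Arrays.stream(counts).mapToDouble(c -> Math.abs(c - mean)).sum();
        int total = Arrays.stream(counts).sum();
        // Worst case: one server holds every region, the rest hold none.
        double worst = (total - mean) + (counts.length - 1) * mean;
        return worst == 0 ? 0 : sumDev / worst;
    }

    public static void main(String[] args) {
        // 100 regions, 100 servers: 50 servers with 2 regions, 50 with 0.
        int[] counts = new int[100];
        Arrays.fill(counts, 0, 50, 2);
        System.out.println("max-deviation cost: " + maxDeviationCost(counts));        // 1/99, ~0.01
        System.out.println("aggregated cost:    " + aggregatedDeviationCost(counts)); // 100/198, ~0.5
    }
}
```

The max-deviation cost stays far under the 0.05 move threshold in this scenario, while the aggregated cost clearly flags the imbalance, matching the numbers in the description.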



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26062) SIGSEGV in AsyncFSWAL consume

2021-07-02 Thread Michael Stack (Jira)
Michael Stack created HBASE-26062:
-

 Summary: SIGSEGV in AsyncFSWAL consume
 Key: HBASE-26062
 URL: https://issues.apache.org/jira/browse/HBASE-26062
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


Seems related to the parent issue. It's happened a few times on one of our 
clusters here. Below are two examples. More detail is needed, but perhaps the 
call timed out and the buffer was thus freed, while the late consume on the 
other side of the ring buffer doesn't know that and goes ahead (just 
speculation).

 
{code:java}
#  SIGSEGV (0xb) at pc=0x7f8b3ef5b77c, pid=37631, tid=0x7f61560ed700

RAX=0xdf6e is an unknown value
RBX=0x7f8a38d7b6f8 is an oop
java.nio.DirectByteBuffer - klass: 'java/nio/DirectByteBuffer'
RCX=0x7f60e2767898 is pointing into metadata
RDX=0x0de7 is an unknown value
RSP=0x7f61560ec6f0 is pointing into the stack for thread: 0x7f8b3017b800
RBP=[error occurred during error reporting (printing register info), id 0xb]

Stack: [0x7f6155fed000,0x7f61560ee000],  sp=0x7f61560ec6f0,  free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 23901 C2 java.util.stream.MatchOps$1MatchSink.accept(Ljava/lang/Object;)V (44 bytes) @ 0x7f8b3ef5b77c [0x7f8b3ef5b640+0x13c]
J 16165 C2 java.util.ArrayList$ArrayListSpliterator.tryAdvance(Ljava/util/function/Consumer;)Z (79 bytes) @ 0x7f8b3d67b344 [0x7f8b3d67b2c0+0x84]
J 16160 C2 java.util.stream.MatchOps$MatchOp.evaluateSequential(Ljava/util/stream/PipelineHelper;Ljava/util/Spliterator;)Ljava/lang/Object; (7 bytes) @ 0x7f8b3d67bc9c [0x7f8b3d67b900+0x39c]
J 17729 C2 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALActionListener.visitLogEntryBeforeWrite(Lorg/apache/hadoop/hbase/wal/WALKey;Lorg/apache/hadoop/hbase/wal/WALEdit;)V (10 bytes) @ 0x7f8b3fc39010 [0x7f8b3fc388a0+0x770]
J 29991 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.appendAndSync()V (261 bytes) @ 0x7f8b3fd03d90 [0x7f8b3fd039e0+0x3b0]
J 20773 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume()V (474 bytes) @ 0x7f8b40283728 [0x7f8b40283480+0x2a8]
J 15191 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL$$Lambda$76.run()V (8 bytes) @ 0x7f8b3ed69ecc [0x7f8b3ed69ea0+0x2c]
J 17383% C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x7f8b3d9423f8 [0x7f8b3d942260+0x198]
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [libjvm.so+0x66b9ba]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xe1a
V  [libjvm.so+0x669073]  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x263
V  [libjvm.so+0x669647]  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x57
V  [libjvm.so+0x6aaa4c]  thread_entry(JavaThread*, Thread*)+0x6c
V  [libjvm.so+0xa224cb]  JavaThread::thread_main_inner()+0xdb
V  [libjvm.so+0xa22816]  JavaThread::run()+0x316
V  [libjvm.so+0x8c4202]  java_start(Thread*)+0x102
C  [libpthread.so.0+0x76ba]  start_thread+0xca
{code}
 

This one is from a month previous and has a deeper stack... we're trying to 
read a Cell...

 
{code:java}
Stack: [0x7fa1d5fb8000,0x7fa1d60b9000],  sp=0x7fa1d60b7660,  free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 30665 C2 org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(Lorg/apache/hadoop/hbase/Cell;[BII)Z (59 bytes) @ 0x7fcc2d29eeb2 [0x7fcc2d29e7c0+0x6f2]
J 25816 C2 org.apache.hadoop.hbase.CellUtil.matchingFamily(Lorg/apache/hadoop/hbase/Cell;[B)Z (28 bytes) @ 0x7fcc2a0430f8 [0x7fcc2a0430e0+0x18]
J 17236 C2 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALActionListener$$Lambda$254.test(Ljava/lang/Object;)Z (8 bytes) @ 0x7fcc2b40bc68 [0x7fcc2b40bc20+0x48]
J 13735 C2 java.util.ArrayList$ArrayListSpliterator.tryAdvance(Ljava/util/function/Consumer;)Z (79 bytes) @ 0x7fcc2b7d936c [0x7fcc2b7d92c0+0xac]
J 17162 C2 java.util.stream.MatchOps$MatchOp.evaluateSequential(Ljava/util/stream/PipelineHelper;Ljava/util/Spliterator;)Ljava/lang/Object; (7 bytes) @ 0x7fcc29bc05e8 [0x7fcc29bbfe80+0x768]
J 16934 C2 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALActionListener.visitLogEntryBeforeWrite(Lorg/apache/hadoop/hbase/wal/WALKey;Lorg/apache/hadoop/hbase/wal/WALEdit;)V (10 bytes) @ 0x7fcc2bb313f8 [0x7fcc2bb30c60+0x798]
J 30732 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.appendAndSync()V (261 bytes) @ 0x7fcc2ae5a420 [0x7fcc2ae59d60+0x6c0]
J 22203 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume()V (474 

[jira] [Resolved] (HBASE-25821) Sorting by "Start time" in the Master UI does so alphabetically and not by date

2021-06-30 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25821.
---
Resolution: Duplicate

Resolving as duplicate. Looks like it was done a while ago and will show in 
hbase-2.4.x. See HBASE-25165. Shout if it ain't what you want, [~ekrettek].

> Sorting by "Start time" in the Master UI does so alphabetically and not by 
> date
> ---
>
> Key: HBASE-25821
> URL: https://issues.apache.org/jira/browse/HBASE-25821
> Project: HBase
>  Issue Type: Bug
>  Components: master, UI
>Affects Versions: 2.4.0, 2.3.4, 2.4.1, 2.3.5, 2.4.2
>Reporter: Evan Krettek
>Priority: Minor
>
> When sorting by the start time column in the Master UI, it sorts 
> alphabetically and not by date.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25729) Upgrade to latest hbase-thirdparty

2021-06-30 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25729.
---
Hadoop Flags: Reviewed
Assignee: Andrew Kyle Purtell
  Resolution: Duplicate

Making this a duplicate of [~pankajkumar]'s issue; Pankaj did the work of this 
issue's subject matter over in HBASE-25918.

This is a re-resolve because we reopened the issue to discuss whether or not 
the latest hbase-thirdparty is appropriate in branch-2.4, and concluded it is 
fine. Re-resolving.

> Upgrade to latest hbase-thirdparty
> --
>
> Key: HBASE-25729
> URL: https://issues.apache.org/jira/browse/HBASE-25729
> Project: HBase
>  Issue Type: Sub-task
>  Components: build, thirdparty
>Affects Versions: 2.4.2
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.5
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25960) Build includes unshaded netty .so; clashes w/ downstreamers who would use a different version of netty

2021-06-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25960.
---
Fix Version/s: thirdparty-4.0.0
   thirdparty-3.5.2
 Assignee: Michael Stack
   Resolution: Cannot Reproduce

Resolving this issue as "Cannot Reproduce" (could be "Implemented"). Whatever 
was up w/ thirdparty in older versions no longer seems to happen. I tried to 
add enforcer rules, but there is no amenable plugin for checking for a file 
inside a jar. I could add a bit of ant hackery to the thirdparty netty build 
to set a property if the file is present and fail the build if so, but it'd 
be ugly. Closing this out.

> Build includes unshaded netty .so; clashes w/ downstreamers who would use a 
> different version of netty
> --
>
> Key: HBASE-25960
> URL: https://issues.apache.org/jira/browse/HBASE-25960
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: thirdparty-3.5.2, thirdparty-4.0.0
>
>
> A coworker was trying to use the hbase client in a fat application that uses 
> a different netty version from what hbase uses internally. Their app would 
> fail to launch because it kept bumping into an incompatible netty .so lib. 
> Here are the unshaded netty .so's we bundle, looking at hbase-2.4.1...:
> ./lib/hbase-shaded-netty-3.4.1.jar has:
> {code}
> META-INF/native/libnetty_transport_native_epoll_aarch_64.so
> META-INF/native/liborg_apache_hbase_thirdparty_netty_transport_native_epoll_x86_64.so
> META-INF/native/libnetty_transport_native_epoll_x86_64.so
> {code}
> (HBASE-25959 should fix the non-relocation of 
> libnetty_transport_native_epoll_aarch_64).
> ./lib/shaded-clients/hbase-shaded-client-byo-hadoop-2.4.1.1-apple.jar has the 
> same three .sos as does 
> ./lib/shaded-clients/hbase-shaded-mapreduce-2.4.1.1-apple.jar
> and ./lib/shaded-clients/hbase-shaded-client-2.4.1.1-apple.jar
> We even bundle ./lib/netty-all-4.1.17.Final.jar which unsurprisingly has the 
> netty .sos in it.
> Looking at published builds of hbase-thirdparty, I see that these too include 
> the above trio of .sos... The hbase-shaded-netty includes them in 3.4.1 
> https://repo1.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-netty/3.4.1/
>  as does 3.5.0.
> I just tried running a build of hbase-thirdparty and it does NOT include the 
> extras
> META-INF/native/liborg_apache_hbase_thirdparty_netty_transport_native_epoll_aarch_64.so
> META-INF/native/liborg_apache_hbase_thirdparty_netty_transport_native_epoll_x86_64.so
> (it has the fix for aarch included... when I built)
> Here is link to the snapshot I made:
> https://repository.apache.org/content/repositories/orgapachehbase-1451/org/apache/hbase/thirdparty/hbase-shaded-netty/3.5.1-stack4/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26042) WAL lockup on 'sync failed'

2021-06-29 Thread Michael Stack (Jira)
Michael Stack created HBASE-26042:
-

 Summary: WAL lockup on 'sync failed'
 Key: HBASE-26042
 URL: https://issues.apache.org/jira/browse/HBASE-26042
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.3.5
Reporter: Michael Stack


Making note of an issue seen in a production cluster.

The node had been struggling under load for a few days, with slow syncs of up 
to 10 seconds, a few STUCK MVCCs from which it recovered, and some java pauses 
up to three seconds in length.

Then the below happened:
{code:java}
2021-06-27 13:41:27,604 WARN  [AsyncFSWAL-0-hdfs://:8020/hbase] wal.AsyncFSWAL: sync failed
org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer {code}
... and the WAL turned dead in the water. Scanners started expiring. RPC 
printed text versions of requests, complaining requestsTooSlow. Then we 
started to see these:
{code:java}
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
result after 30 ms for txid=552128301, WAL system stuck? {code}
What's supposed to happen when the other side goes away like this is that we 
roll the WAL, i.e. go set up a new one. You can see it happening if you run
{code:java}
mvn test 
-Dtest=org.apache.hadoop.hbase.regionserver.wal.TestAsyncFSWAL#testBrokenWriter 
{code}
I tried hacking the test to repro the above hang by throwing the same 
exception in the test (on linux, because epoll is needed to repro), but 
everything just worked.

Thread dumps of the hung-up WAL subsystem are a little odd. The log roller is 
stuck, without a timeout, trying to write out the WAL header:

 
{code:java}
Thread 9464: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may 
be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
line=175 (Compiled frame)
 - java.util.concurrent.CompletableFuture$Signaller.block() @bci=19, line=1707 
(Compiled frame)
 - 
java.util.concurrent.ForkJoinPool.managedBlock(java.util.concurrent.ForkJoinPool$ManagedBlocker)
 @bci=119, line=3323 (Compiled frame)
 - java.util.concurrent.CompletableFuture.waitingGet(boolean) @bci=115, 
line=1742 (Compiled frame)
 - java.util.concurrent.CompletableFuture.get() @bci=11, line=1908 (Compiled 
frame)
 - 
org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.write(java.util.function.Consumer)
 @bci=16, line=189 (Compiled frame)
 - 
org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeMagicAndWALHeader(byte[],
 org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALHeader) @bci=9, 
line=202 (Compiled frame)
 - 
org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(org.apache.hadoop.fs.FileSystem,
 org.apache.hadoop.fs.Path, org.apache.hadoop.conf.Configuration, boolean, 
long) @bci=107, line=170 (Compiled frame)
 - 
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(org.apache.hadoop.conf.Configuration,
 org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path, boolean, long, 
org.apache.hbase.thirdparty.io.netty.channel.EventLoopGroup, java.lang.Class) 
@bci=61, line=113 (Compiled frame)
 - 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(org.apache.hadoop.fs.Path)
 @bci=22, line=651 (Compiled frame)
 - 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(org.apache.hadoop.fs.Path)
 @bci=2, line=128 (Compiled frame)
 - org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(boolean) 
@bci=101, line=797 (Compiled frame)
 - org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(long) 
@bci=18, line=263 (Compiled frame)
 - org.apache.hadoop.hbase.wal.AbstractWALRoller.run() @bci=198, line=179 
(Compiled frame) {code}
 

Other threads are BLOCKED trying to append flush markers etc. to the WAL, 
unable to add to the ring buffer:

 
{code:java}
Thread 9465: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may 
be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(long) @bci=11, line=338 
(Compiled frame)
 - com.lmax.disruptor.MultiProducerSequencer.next(int) @bci=82, line=136 
(Compiled frame)
 - com.lmax.disruptor.MultiProducerSequencer.next() @bci=2, line=105 
(Interpreted frame)
 - com.lmax.disruptor.RingBuffer.next() @bci=4, line=263 (Compiled frame)
 - 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$stampSequenceIdAndPublishToRingBuffer$1(org.apache.commons.lang3.mutable.MutableLong,
 com.lmax.disruptor.RingBuffer) @bci=2, line=1031 (Compiled frame)
 - org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$$Lambda$270.run() 
@bci=8 (Compiled frame)
 - 
org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.begin(java.lang.Runnable)
 @bci=36, line=140 (Interpreted frame)
 - 
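The shape of the hang described above (a stuck consumer starves a bounded ring buffer, so producers park with no timeout) can be illustrated with a plain bounded queue. This is a toy sketch with made-up names, not the LMAX Disruptor API or HBase code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class StuckConsumerSketch {
    /** Try to append to a bounded "ring"; false means a producer would block here. */
    static boolean tryAppend(ArrayBlockingQueue<Integer> ring, int v, long timeoutMs)
            throws InterruptedException {
        return ring.offer(v, timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Tiny bounded buffer; nothing drains it, simulating the WAL consume()
        // side hung after the failed sync.
        ArrayBlockingQueue<Integer> ring = new ArrayBlockingQueue<>(2);
        ring.put(1);
        ring.put(2); // buffer is now full
        // The real RingBuffer.next() parks with no timeout in this situation;
        // a timed offer just demonstrates that the append cannot proceed.
        System.out.println("append succeeded: " + tryAppend(ring, 3, 100)); // false
    }
}
```

This is why the appending threads above are parked in MultiProducerSequencer.next() with no way out until the consumer side drains or the process dies.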

[jira] [Resolved] (HBASE-25990) Add donated buildbots for jenkins

2021-06-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25990.
---
Resolution: Incomplete

Closing. I was unable to hook up billing. I might be back, but the present 
effort is being aborted.

> Add donated buildbots for jenkins
> -
>
> Key: HBASE-25990
> URL: https://issues.apache.org/jira/browse/HBASE-25990
> Project: HBase
>  Issue Type: Task
>  Components: build
>Reporter: Michael Stack
>Priority: Major
> Attachments: Screen Shot 2021-06-22 at 1.43.12 PM.png
>
>
> This issue is for keeping notes on how to add a donated buildbot to our 
> apache build.
> My employer donated budget (I badly under-estimated cost but whatever...). 
> This issue is about adding 5 GCP nodes.
> There is this page up on apache about donating machines for builds, 
> https://infra.apache.org/hosting-external-agent.html It got me some of the 
> way... at least as far as the bit about mailing root@a.o (nada).
> At [~zhangduo]'s encouragement -- he has been this route already adding in 
> the xiaomi donation -- I filed a JIRA after deploying a machine on GCP, 
> INFRA-21973.
> I then reached out on slack and the gentleman Gavin MacDonald picked up the 
> task.
> He told me run this script on all hosts after making edits (comment out line 
> #39 where we set hostname -- doesn't work):
> https://github.com/apache/cassandra-builds/blob/trunk/jenkins-dsl/agent-install.sh
> (For more context on the above script and as a good backgrounder, see the 
> nice C* page on how to do this setup: 
> https://github.com/apache/cassandra-builds/blob/trunk/ASF-jenkins-agents.md)
> After doing the above, I had to do a visudo on each host to add a line for an 
> infra account to allow passwordless access.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25989) FanOutOneBlockAsyncDFSOutput using shaded protobuf in hdfs 3.3+

2021-06-12 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25989.
---
Fix Version/s: 2.4.5
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Assignee: Michael Stack
   Resolution: Fixed

Pushed to branch-2.4+ Thanks for reviews [~zhangduo] and [~weichiu]

> FanOutOneBlockAsyncDFSOutput using shaded protobuf in hdfs 3.3+
> ---
>
> Key: HBASE-25989
> URL: https://issues.apache.org/jira/browse/HBASE-25989
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.5
>
>
> The parent added some fancy dancing to make it so on hadoop-3.3.0+ we'd use 
> hadoop's shaded protobuf rather than the non-relocated protobuf. With hdfs 
> 3.3.1, the 'trick' is not working, so we continue to use the unshaded 
> protobuf. The fix is trivial.
> Found this testing the 3.3.1RC3. Hard to see because the line saying whether 
> we use shaded or unshaded is logged at DEBUG level. If you set DEBUG level and 
> run TestFanOutOneBlockAsyncDFSOutput with the hdfs 3.3.1 RC candidate in place 
> you'll see it uses the unshaded protobuf.





[jira] [Resolved] (HBASE-25920) Support Hadoop 3.3.1

2021-06-09 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25920.
---
Hadoop Flags: Reviewed
Release Note: Fixes to make unit tests pass and to make it so an hbase 
built from branch-2 against a 3.3.1RC can run on a 3.3.1RC small cluster.
  Resolution: Fixed

Resolving as done at least for now.

> Support Hadoop 3.3.1
> 
>
> Key: HBASE-25920
> URL: https://issues.apache.org/jira/browse/HBASE-25920
> Project: HBase
>  Issue Type: Task
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> Hadoop 3.3.1 is a big release, quite different from 3.3.0.
> Filing this jira to track support for Hadoop 3.3.1.





[jira] [Resolved] (HBASE-25971) FanOutOneBlockAsyncDFSOutputHelper stuck when run against hadoop-3.3.1-RC3

2021-06-09 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25971.
---
Resolution: Not A Problem

Resolving as 'Not a problem'. This seems to have been some issue around 
building artifacts for RC testing. Subsequent builds worked (after HBASE-25989)

> FanOutOneBlockAsyncDFSOutputHelper stuck when run against hadoop-3.3.1-RC3
> --
>
> Key: HBASE-25971
> URL: https://issues.apache.org/jira/browse/HBASE-25971
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Major
>
> This is in the log:
> {code}
> 2021-06-04 21:29:39,138 DEBUG [master/oss-master-1:16000:becomeActiveMaster] 
> ipc.ProtobufRpcEngine: Call: addBlock took 6ms
> 2021-06-04 21:29:39,169 WARN  [RS-EventLoopGroup-1-1] 
> concurrent.DefaultPromise: An exception was thrown by 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$4.operationComplete()
> java.lang.IllegalArgumentException: object is not an instance of declaring 
> class
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> org.apache.hadoop.hbase.io.asyncfs.ProtobufDecoder.<init>(ProtobufDecoder.java:69)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.processWriteBlockResponse(FanOutOneBlockAsyncDFSOutputHelper.java:343)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$100(FanOutOneBlockAsyncDFSOutputHelper.java:112)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$4.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:425)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:184)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.initialize(FanOutOneBlockAsyncDFSOutputHelper.java:419)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$300(FanOutOneBlockAsyncDFSOutputHelper.java:112)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$5.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:477)
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$5.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:472)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
> at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:653)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:691)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
> at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
> at 
> 

[jira] [Created] (HBASE-25990) Add donated buildbots for jenkins

2021-06-09 Thread Michael Stack (Jira)
Michael Stack created HBASE-25990:
-

 Summary: Add donated buildbots for jenkins
 Key: HBASE-25990
 URL: https://issues.apache.org/jira/browse/HBASE-25990
 Project: HBase
  Issue Type: Task
  Components: build
Reporter: Michael Stack


This issue is for keeping notes on how to add a donated buildbot to our apache 
build.

My employer donated budget (I badly under-estimated cost but whatever...). This 
issue is about adding 5 GCP nodes.

There is a page up on apache on donating machines for builds 
https://infra.apache.org/hosting-external-agent.html It got me some of the 
way... at least as far as the bit about mailing root@a.o(nada).

At [~zhangduo]'s encouragement -- he has been down this route already, adding in 
the xiaomi donation -- I filed a JIRA after deploying a machine on GCP, INFRA-21973.

I then reached out on slack and the gentleman Gavin MacDonald picked up the 
task.

He told me to run this script on all hosts after making edits (comment out line 
#39 where we set the hostname -- it doesn't work):

https://github.com/apache/cassandra-builds/blob/trunk/jenkins-dsl/agent-install.sh

(For more context on the above script and as a good backgrounder, see the nice 
C* page on how to do this setup: 
https://github.com/apache/cassandra-builds/blob/trunk/ASF-jenkins-agents.md)

After doing the above, I had to do a visudo on each host to add a line for an 
infra account to allow passwordless access.
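
The visudo step above comes down to one sudoers entry per host. A sketch of 
what such an entry might look like (the file name and the `jenkins` account 
name here are placeholders; Infra supplies the actual account):

```
# Hypothetical /etc/sudoers.d/asf-infra entry: grant the ASF infra
# account passwordless sudo on the donated build agent.
jenkins ALL=(ALL) NOPASSWD: ALL
```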



[jira] [Created] (HBASE-25989) FanOutOneBlockAsyncDFSOutput using shaded protobuf in hdfs 3.3+

2021-06-09 Thread Michael Stack (Jira)
Michael Stack created HBASE-25989:
-

 Summary: FanOutOneBlockAsyncDFSOutput using shaded protobuf in 
hdfs 3.3+
 Key: HBASE-25989
 URL: https://issues.apache.org/jira/browse/HBASE-25989
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


The parent added some fancy dancing to make it so on hadoop-3.3.0+ we'd use 
hadoop's shaded protobuf rather than the non-relocated protobuf. With hdfs 3.3.1, 
the 'trick' is not working, so we continue to use the unshaded protobuf. The fix 
is trivial.

Found this testing the 3.3.1RC3. Hard to see because the line saying whether we 
use shaded or unshaded is logged at DEBUG level. If you set DEBUG level and run 
TestFanOutOneBlockAsyncDFSOutput with the hdfs 3.3.1 RC candidate in place you'll 
see it uses the unshaded protobuf.





[jira] [Resolved] (HBASE-25969) Cleanup netty-all transitive includes

2021-06-08 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25969.
---
Fix Version/s: 2.4.5
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Release Note: We have an (old) netty-all in our produced artifacts. It is 
transitively included from hadoop. It is needed by MiniMRCluster referenced 
from a few MR tests in hbase. This commit adds netty-all excludes everywhere 
else but where tests will fail unless the transitive is allowed through. TODO: 
move MR and/or MR tests out of hbase core.
 Assignee: Michael Stack
   Resolution: Fixed

Thanks for reviews [~haxiaolin] and [~pankajkumar]. Pushed to branch-2.4+

> Cleanup netty-all transitive includes
> -
>
> Key: HBASE-25969
> URL: https://issues.apache.org/jira/browse/HBASE-25969
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.5
>
>
> Our releases include lib/netty-all.jar as a transitive include from hadoop. 
> -Purge.-
> ... looks like I can't purge the transitive netty-all includes just yet, not 
> w/o moving MR out of hbase core. The transitively included netty-all w/ the 
> old version is needed to run the tests that put up a MR cluster.
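
The "netty-all excludes everywhere" described in the release note amount to 
Maven dependency exclusions on the hadoop artifacts. A hedged sketch of what 
one such exclusion might look like (the dependency shown is illustrative; in 
the real poms versions come from properties, and the MR-test modules must NOT 
carry the exclusion):

```xml
<!-- Illustrative only: drop hadoop's transitive (old) netty-all so it
     does not land in our produced artifacts. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <exclusions>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty-all</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```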





[jira] [Resolved] (HBASE-18562) [AMv2] expireServers and ServerCrashProcedure cleanup

2021-06-05 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-18562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-18562.
---
Resolution: Not A Problem

Agree.

Resolving as 'Not a problem' since so much has changed since.

> [AMv2] expireServers and ServerCrashProcedure cleanup
> -
>
> Key: HBASE-18562
> URL: https://issues.apache.org/jira/browse/HBASE-18562
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Michael Stack
>Priority: Critical
>
> In review of HBASE-18551, [~uagashe] posed a scenario that revealed a hole in 
> our processing of unassigns; there is a case where a UP might not get a 
> notification from ServerCrashProcedure if the UP is scheduled AFTER an SCP has 
> gotten past its handleRIT call (no new SCP will be queued because 
> expireServer won't let it happen if the crashed server is in the dead server 
> list, which it will be).
> Chatting on it, expireServers is doing checks that belong inside 
> ServerCrashProcedure. expireServers scheduling an SCP each time it is called 
> would make it so SCP processing is serialized one behind the other. If the 
> first does the clean up all subsequent will do no work but Procedures 
> dependent on them will get their wakeup call.
> This issue is about implementing the above cleanup.





[jira] [Resolved] (HBASE-20179) [TESTING] Scale for 2.0.0

2021-06-05 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-20179.
---
Fix Version/s: (was: 3.0.0-alpha-1)
   2.0.0
 Assignee: Michael Stack
   Resolution: Fixed

Finished task. Set fix version as 2.0.0 though it was released long ago.

> [TESTING] Scale for 2.0.0
> -
>
> Key: HBASE-20179
> URL: https://issues.apache.org/jira/browse/HBASE-20179
> Project: HBase
>  Issue Type: Umbrella
>  Components: test
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Critical
> Fix For: 2.0.0
>
>
> Umbrella issue for scale testing for 2.0.
> At least keep account of what testing has been done in here.
> TODO as subtasks
>  * Long-running test
>  * ITBLL w/o killing Master
>  * Big ITBLL (1B works, 10B needs more verification, 100B, and 1T).
>  * Many Regions (43k regions takes about 12 minutes on a 6node cluster to 
> deploy and 3 1/2 minutes to go down -- needs tuning).





[jira] [Created] (HBASE-25971) FanOutOneBlockAsyncDFSOutputHelper stuck when run against hadoop-3.3.1-RC3

2021-06-04 Thread Michael Stack (Jira)
Michael Stack created HBASE-25971:
-

 Summary: FanOutOneBlockAsyncDFSOutputHelper stuck when run against 
hadoop-3.3.1-RC3
 Key: HBASE-25971
 URL: https://issues.apache.org/jira/browse/HBASE-25971
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


This is in the log:

{code}
2021-06-04 21:29:39,138 DEBUG [master/oss-master-1:16000:becomeActiveMaster] 
ipc.ProtobufRpcEngine: Call: addBlock took 6ms
2021-06-04 21:29:39,169 WARN  [RS-EventLoopGroup-1-1] 
concurrent.DefaultPromise: An exception was thrown by 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$4.operationComplete()
java.lang.IllegalArgumentException: object is not an instance of declaring class
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.apache.hadoop.hbase.io.asyncfs.ProtobufDecoder.<init>(ProtobufDecoder.java:69)
at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.processWriteBlockResponse(FanOutOneBlockAsyncDFSOutputHelper.java:343)
at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$100(FanOutOneBlockAsyncDFSOutputHelper.java:112)
at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$4.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:425)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:184)
at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.initialize(FanOutOneBlockAsyncDFSOutputHelper.java:419)
at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$300(FanOutOneBlockAsyncDFSOutputHelper.java:112)
at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$5.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:477)
at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$5.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:472)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
at 
org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:653)
at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:691)
at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at 
org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
{code}

These are WARNs but Master startup is stuck here:

{code}
"master/oss-master-1:16000:becomeActiveMaster" #88 daemon prio=5 os_prio=0 
cpu=666.20ms 

[jira] [Created] (HBASE-25969) Purge netty-all transitive includes

2021-06-03 Thread Michael Stack (Jira)
Michael Stack created HBASE-25969:
-

 Summary: Purge netty-all transitive includes
 Key: HBASE-25969
 URL: https://issues.apache.org/jira/browse/HBASE-25969
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


Our releases include lib/netty-all.jar as a transitive include from hadoop. 
Purge.





[jira] [Resolved] (HBASE-19515) Region server left in online servers list forever if it went down after registering to master and before creating ephemeral node

2021-06-02 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-19515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-19515.
---
Resolution: Not A Problem

Resolving as 'Not a Problem', fixed by HBASE-25032. Thanks [~anoop.hbase] for 
taking a look.

> Region server left in online servers list forever if it went down after 
> registering to master and before creating ephemeral node
> 
>
> Key: HBASE-19515
> URL: https://issues.apache.org/jira/browse/HBASE-19515
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Michael Stack
>Priority: Critical
> Fix For: 3.0.0-alpha-2
>
>
> This one is interesting. It was supposedly fixed a long time ago back in 
> HBASE-9593 (the issue has the same subject as this one) but there was a problem 
> w/ the fix reported later, post-commit, long after the issue was closed. The 
> 'fix' was registering the ephemeral node in ZK BEFORE reporting in to the 
> Master for the first time. The problem w/ this approach is that the Master 
> tells the RS what name it should use reporting in. If we register in ZK before 
> we talk to the Master, the name in ZK and the one the RS ends up using could 
> deviate.
> In hbase2, we do the right thing, registering the ephemeral node after we 
> report to the Master. So the issue reported in HBASE-9593 -- that a RS which 
> dies between reporting to the master and registering in ZK stays registered 
> at the Master forever -- is back; we'll keep trying to assign it regions. 
> It's a real problem.
> That hbase2 has this issue has been suppressed up until now. The test that 
> was written for HBASE-9593, TestRSKilledWhenInitializing, is a good test but 
> a little sloppy. It puts up two RSs, aborting one only after it has registered 
> at the Master but before posting to ZK. That leaves one healthy server up, 
> hosting hbase:meta. This is enough for the test to bluster through. The only 
> assign it does is the namespace table, which goes to the hbase:meta server. 
> If the test created a new table and did roundrobin, it'd fail.
> After HBASE-18946, where we do round robin on table create -- a desirable 
> attribute -- via the balancer so all is kosher, the test 
> TestRSKilledWhenInitializing now starts to fail because we choose the hobbled 
> server most of the time.
> So, this issue is about fixing the original issue properly for hbase2. We 
> don't have a timeout on assign in AMv2, not yet; that might be the fix, or 
> perhaps a double report before we online a server, with the second report 
> coming in after the ZK node goes up (or we stop doing ephemeral nodes for RS 
> up in ZK and just rely on heartbeats).
> Making this a critical issue.





[jira] [Created] (HBASE-25960) Build includes unshaded netty .so; clashes w/ downstreamers who would use a different version of netty

2021-06-01 Thread Michael Stack (Jira)
Michael Stack created HBASE-25960:
-

 Summary: Build includes unshaded netty .so; clashes w/ 
downstreamers who would use a different version of netty
 Key: HBASE-25960
 URL: https://issues.apache.org/jira/browse/HBASE-25960
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: Michael Stack


A coworker was trying to use the hbase client in a fat application that uses a 
different netty version from what hbase uses internally. Their app would fail to 
launch because it kept bumping into an incompatible netty .so lib. Here are the 
unshaded netty .so's we bundle, looking at hbase-2.4.1...:

./lib/hbase-shaded-netty-3.4.1.jar has:

{code}
META-INF/native/libnetty_transport_native_epoll_aarch_64.so
META-INF/native/liborg_apache_hbase_thirdparty_netty_transport_native_epoll_x86_64.so
META-INF/native/libnetty_transport_native_epoll_x86_64.so
{code}

(HBASE-25959 should fix the non-relocation of 
libnetty_transport_native_epoll_aarch_64).

./lib/shaded-clients/hbase-shaded-client-byo-hadoop-2.4.1.1-apple.jar has the 
same three .sos as does 
./lib/shaded-clients/hbase-shaded-mapreduce-2.4.1.1-apple.jar
and ./lib/shaded-clients/hbase-shaded-client-2.4.1.1-apple.jar

We even bundle ./lib/netty-all-4.1.17.Final.jar which unsurprisingly has the 
netty .sos in it.

Looking at published builds of hbase-thirdparty, I see that these too include 
the above trio of .sos... The hbase-shaded-netty includes them in 3.4.1 
https://repo1.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-netty/3.4.1/
 as does 3.5.0.

I just tried running a build of hbase-thirdparty and it does NOT include the 
extras

META-INF/native/liborg_apache_hbase_thirdparty_netty_transport_native_epoll_aarch_64.so
META-INF/native/liborg_apache_hbase_thirdparty_netty_transport_native_epoll_x86_64.so

(it has the fix for aarch included... when I built)

Here is link to the snapshot I made:

https://repository.apache.org/content/repositories/orgapachehbase-1451/org/apache/hbase/thirdparty/hbase-shaded-netty/3.5.1-stack4/
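
The .so inventories above can be reproduced against a release tarball by 
listing a jar's contents and filtering for native libraries. A minimal sketch 
of the filtering step (the canned listing here stands in for real 
`unzip -l lib/hbase-shaded-netty-3.4.1.jar` output):

```shell
# Sketch: filter a jar listing for bundled native .so entries.
# In practice the listing comes from: unzip -l lib/hbase-shaded-netty-3.4.1.jar
listing='META-INF/MANIFEST.MF
META-INF/native/libnetty_transport_native_epoll_x86_64.so
META-INF/native/libnetty_transport_native_epoll_aarch_64.so
org/apache/hbase/thirdparty/io/netty/buffer/ByteBuf.class'
# Print only the entries that are native shared libraries.
printf '%s\n' "$listing" | grep '\.so$'
```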



[jira] [Resolved] (HBASE-25959) Relocate libnetty_transport_native_epoll_aarch_64.so in hbase-thirdparty

2021-06-01 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25959.
---
Fix Version/s: hbase-thirdparty-3.5.1
 Assignee: Michael Stack
   Resolution: Fixed

Pushed the one-liner.

> Relocate libnetty_transport_native_epoll_aarch_64.so in hbase-thirdparty
> 
>
> Key: HBASE-25959
> URL: https://issues.apache.org/jira/browse/HBASE-25959
> Project: HBase
>  Issue Type: Bug
>  Components: hbase-thirdparty
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: hbase-thirdparty-3.5.1
>
> Attachments: 
> 0001-HBASE-25959-Relocate-libnetty_transport_native_epoll.patch
>
>
> Minor item I came across while trying to figure out where all the netty 
> native_epoll .so instances are coming from when I look at an hbase release.  
> We've relocated the x86 lib but not the aarch_64... Minor item.





[jira] [Created] (HBASE-25959) Relocate libnetty_transport_native_epoll_aarch_64.so in hbase-thirdparty

2021-06-01 Thread Michael Stack (Jira)
Michael Stack created HBASE-25959:
-

 Summary: Relocate libnetty_transport_native_epoll_aarch_64.so in 
hbase-thirdparty
 Key: HBASE-25959
 URL: https://issues.apache.org/jira/browse/HBASE-25959
 Project: HBase
  Issue Type: Bug
  Components: hbase-thirdparty
Reporter: Michael Stack


Minor item I came across while trying to figure out where all the netty 
native_epoll .so instances are coming from when I look at an hbase release.  
We've relocated the x86 lib but not the aarch_64... Minor item.





[jira] [Resolved] (HBASE-19701) Close without justification following successful open

2021-06-01 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-19701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-19701.
---
Resolution: Cannot Reproduce

Closing as 'Cannot Reproduce' ...  haven't seen it since the original filing... 
May still be there but no work done on this item... Will open a new one if seen 
again.

> Close without justification following successful open
> 
>
> Key: HBASE-19701
> URL: https://issues.apache.org/jira/browse/HBASE-19701
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Priority: Critical
> Fix For: 3.0.0-alpha-2
>
>
> [~jmspaggi] conjured an interesting condition where we close a region soon 
> after open WITHOUT seemingly saying why (It looks like Master is asking for 
> region CLOSE but that is not clear looking at RegionServer log).
> Here is log snippet from https://pastebin.com/0r76Y6ap (in case the pastebin 
> evaporates)
> {code}
> 
> 2017-12-31 09:54:20,864 INFO  
> [PostOpenDeployTasks:f49f3cbb7f3db4cf96c7eb3b0cf83869] 
> regionserver.HRegionServer: Post open deploy tasks for 
> TestTable,0408944640,1505391191559.f49f3cbb7f3db4cf96c7eb3b0cf83869.
> 2017-12-31 09:54:20,870 INFO  
> [StoreOpener-330f09f4a0eaf26811c320fbf1b14e70-1] 
> regionserver.CompactingMemStore: Setting in-memory flush size threshold to 
> 13421772 and immutable segments index to be of type CHUNK_MAP
> 2017-12-31 09:54:20,870 INFO  
> [StoreOpener-330f09f4a0eaf26811c320fbf1b14e70-1] regionserver.HStore: 
> Memstore class name is org.apache.hadoop.hbase.regionserver.CompactingMemStore
> 2017-12-31 09:54:20,870 INFO  
> [StoreOpener-330f09f4a0eaf26811c320fbf1b14e70-1] hfile.CacheConfig: Created 
> cacheConfig for info: blockCache=LruBlockCache{blockCount=0, 
> currentSize=2454760, freeSize=3347745560, maxSize=3350200320, 
> heapSize=2454760, minSize=3182690304, minFactor=0.95, multiSize=1591345152, 
> multiFactor=0.5, singleSize=795672576, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2017-12-31 09:54:20,872 INFO  
> [StoreOpener-330f09f4a0eaf26811c320fbf1b14e70-1] 
> compactions.CompactionConfiguration: size [134217728, 9223372036854775807, 
> 9223372036854775807); files [3, 10); ratio 1,20; off-peak ratio 5,00; 
> throttle point 2684354560; major period 60480, major jitter 0,50, min 
> locality to compact 0,00; tiered compaction: max_age 9223372036854775807, 
> incoming window min 6, compaction policy for tiered window 
> org.apache.hadoop.hbase.regionserver.compactions.ExploringCompactionPolicy, 
> single output for minor true, compaction window factory 
> org.apache.hadoop.hbase.regionserver.compactions.ExponentialCompactionWindowFactory
> 2017-12-31 09:54:20,903 INFO  
> [StoreOpener-166b9c45d7724f72fd126adb4445d6ec-1] 
> regionserver.CompactingMemStore: Setting in-memory flush size threshold to 
> 13421772 and immutable segments index to be of type CHUNK_MAP
> 2017-12-31 09:54:20,904 INFO  
> [StoreOpener-166b9c45d7724f72fd126adb4445d6ec-1] regionserver.HStore: 
> Memstore class name is org.apache.hadoop.hbase.regionserver.CompactingMemStore
> 2017-12-31 09:54:20,904 INFO  
> [StoreOpener-166b9c45d7724f72fd126adb4445d6ec-1] hfile.CacheConfig: Created 
> cacheConfig for info: blockCache=LruBlockCache{blockCount=0, 
> currentSize=2454760, freeSize=3347745560, maxSize=3350200320, 
> heapSize=2454760, minSize=3182690304, minFactor=0.95, multiSize=1591345152, 
> multiFactor=0.5, singleSize=795672576, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2017-12-31 09:54:20,905 INFO  
> [StoreOpener-166b9c45d7724f72fd126adb4445d6ec-1] 
> compactions.CompactionConfiguration: size [134217728, 9223372036854775807, 
> 9223372036854775807); files [3, 10); ratio 1,20; off-peak ratio 5,00; 
> throttle point 2684354560; major period 60480, major jitter 0,50, min 
> locality to compact 0,00; tiered compaction: max_age 9223372036854775807, 
> incoming window min 6, compaction policy for tiered window 
> org.apache.hadoop.hbase.regionserver.compactions.ExploringCompactionPolicy, 
> single output for minor true, compaction window factory 
> org.apache.hadoop.hbase.regionserver.compactions.ExponentialCompactionWindowFactory
> 2017-12-31 09:54:20,929 INFO  [RS_OPEN_REGION-node1:16020-1] 
> regionserver.HRegion: Setting FlushNonSloppyStoresFirstPolicy for the 
> region=TestTable,0262144000,1505391191559.166b9c45d7724f72fd126adb4445d6ec.
> 2017-12-31 09:54:20,956 INFO  [RS_OPEN_REGION-node1:16020-0] 
> 

[jira] [Resolved] (HBASE-25941) TestRESTServerSSL fails because of jdk bug

2021-05-30 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25941.
---
Fix Version/s: 2.4.4
   2.3.6
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Assignee: Michael Stack
   Resolution: Fixed

Pushed to branch-2.3+. Thanks for the review [~weichiu] (and for finding how to 
fix it)

> TestRESTServerSSL fails because of jdk bug
> --
>
> Key: HBASE-25941
> URL: https://issues.apache.org/jira/browse/HBASE-25941
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.6, 2.4.4
>
>
> [~weijing329] identified an issue in TestRESTServerSSL when using jdk8 292+. It 
> came up in a comment in the parent issue. I verified it fails for me using jdk8 
> v292. Here is the failure:
> {code}
> [INFO] Running org.apache.hadoop.hbase.rest.TestRESTServerSSL
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.4 s 
> <<< FAILURE! - in org.apache.hadoop.hbase.rest.TestRESTServerSSL
> [ERROR] org.apache.hadoop.hbase.rest.TestRESTServerSSL  Time elapsed: 1.387 s 
>  <<< ERROR!
> java.security.NoSuchAlgorithmException: unrecognized algorithm name: 
> PBEWithSHA1AndDESede
>   at 
> org.apache.hadoop.hbase.rest.TestRESTServerSSL.beforeClass(TestRESTServerSSL.java:74)
> {code}
> For workaround, see https://github.com/bcgit/bc-java/issues/941





[jira] [Resolved] (HBASE-25940) Update Compression/TestCompressionTest: LZ4, SNAPPY, LZO

2021-05-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25940.
---
Hadoop Flags: Reviewed
Assignee: Michael Stack
  Resolution: Fixed

Merged to branch-2.4+. Thanks for review [~zhangduo]

> Update Compression/TestCompressionTest: LZ4, SNAPPY, LZO
> 
>
> Key: HBASE-25940
> URL: https://issues.apache.org/jira/browse/HBASE-25940
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4
>
>
> LZ4 was changed in hadoop-3.3.1 to use a more amenable library, one that did 
> not require gymnastics installing lz4 native libs everywhere; rather, you 
> just add a jar to the classpath. See HADOOP-17292.
> Similar was done for SNAPPY. See HADOOP-17125.
> What this means is that our TestCompressionTest passes for hadoop before 
> 3.3.1 but at 3.3.1, the assert that SNAPPY and LZ4 compressors should fail -- 
> because no lib installed -- now no longer asserts.
> While in here, LZO is GPL and requires extra install to setup [1]. When 
> TestCompressionTest runs, it emits the below for the LZO check. The check is 
> kinda useless.
> {code}
> 2021-05-28T10:05:36,513 WARN  [Time-limited test] util.CompressionTest(75): 
> Can't instantiate codec: lzo
> org.apache.hadoop.hbase.DoNotRetryIOException: Compression algorithm 'lzo' 
> previously failed test.
> {code}
> I think the best thing for now is to comment out the asserts that LZ4 and SNAPPY 
> do NOT work when the binary cannot be found -- since this holds only if hadoop < 
> 3.3.1; the test is a little weak anyways.
> 1. 
> https://stackoverflow.com/questions/23441142/class-com-hadoop-compression-lzo-lzocodec-not-found-for-spark-on-cdh-5



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25941) TestRESTServerSSL fails because of jdk bug

2021-05-28 Thread Michael Stack (Jira)
Michael Stack created HBASE-25941:
-

 Summary: TestRESTServerSSL fails because of jdk bug
 Key: HBASE-25941
 URL: https://issues.apache.org/jira/browse/HBASE-25941
 Project: HBase
  Issue Type: Sub-task
  Components: test
Reporter: Michael Stack


[~weijing329] identified issue in TestRESTServerSSL when using jdk8 252+. It 
came up in comment in the parent issue. I verified it fails for me using jdk8 
v292. Here is the failure

{code}
[INFO] Running org.apache.hadoop.hbase.rest.TestRESTServerSSL
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.4 s 
<<< FAILURE! - in org.apache.hadoop.hbase.rest.TestRESTServerSSL
[ERROR] org.apache.hadoop.hbase.rest.TestRESTServerSSL  Time elapsed: 1.387 s  
<<< ERROR!
java.security.NoSuchAlgorithmException: unrecognized algorithm name: 
PBEWithSHA1AndDESede
at 
org.apache.hadoop.hbase.rest.TestRESTServerSSL.beforeClass(TestRESTServerSSL.java:74)
{code}

For workaround, see https://github.com/bcgit/bc-java/issues/941



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25940) Update Compression/TestCompressionTest: LZ4, SNAPPY, LZO

2021-05-28 Thread Michael Stack (Jira)
Michael Stack created HBASE-25940:
-

 Summary: Update Compression/TestCompressionTest: LZ4, SNAPPY, LZO
 Key: HBASE-25940
 URL: https://issues.apache.org/jira/browse/HBASE-25940
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


LZ4 was changed in hadoop-3.3.1 to use a more amenable library, one that did 
not require gymnastics installing lz4 native libs everywhere; rather, you just 
add a jar to the classpath. See HADOOP-17292.

Similar was done for SNAPPY. See HADOOP-17125.

What this means is that our TestCompressionTest passes for hadoop before 3.3.1 
but at 3.3.1, the assert that SNAPPY and LZ4 compressors should fail -- because 
no lib installed -- now no longer asserts.

While in here, LZO is GPL and requires extra install to setup [1]. When 
TestCompressionTest runs, it emits the below for the LZO check. The check is 
kinda useless.

{code}
2021-05-28T10:05:36,513 WARN  [Time-limited test] util.CompressionTest(75): 
Can't instantiate codec: lzo
org.apache.hadoop.hbase.DoNotRetryIOException: Compression algorithm 'lzo' 
previously failed test.
{code}

I think the best thing for now is to comment out the asserts that LZ4 and SNAPPY do 
NOT work when the binary cannot be found -- since this holds only if hadoop < 
3.3.1; the test is a little weak anyways.



1. 
https://stackoverflow.com/questions/23441142/class-com-hadoop-compression-lzo-lzocodec-not-found-for-spark-on-cdh-5



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25861) Correct the usage of Configuration#addDeprecation

2021-05-28 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25861.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Re-resolving. Issue addressed over in HBASE-25928

> Correct the usage of Configuration#addDeprecation
> -
>
> Key: HBASE-25861
> URL: https://issues.apache.org/jira/browse/HBASE-25861
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.5.0
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> When I was solving HBASE-25745 
> ([PR3139|https://github.com/apache/hbase/pull/3139]), I found that our use of 
> Configuration#addDeprecation API was wrong. 
>  
> At present, we call Configuration#addDeprecation in a static block for 
> the deprecated configuration. But after testing, it turns out this does 
> not provide complete backward compatibility. When a user upgrades HBase and does 
> not change the deprecated configuration to the new configuration, they will find 
> that the deprecated configuration does not take effect, which may not be 
> consistent with expectations. The specific test results can be seen in the PR 
> above, and we can see that the calling order of Configuration#addDeprecation is 
> very important.
>  
> Configuration#addDeprecation is a Hadoop API, looking through the Hadoop 
> source code, we will find that before creating the Configuration object, the 
> addDeprecatedKeys() method will be called first: 
> [https://github.com/apache/hadoop/blob/b93e448f9aa66689f1ce5059f6cdce8add130457/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java#L34]
>  .



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25928) TestHBaseConfiguration#testDeprecatedConfigurations is broken with Hadoop 3.3

2021-05-28 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25928.
---
Fix Version/s: 2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2 and master. Thanks for finding the issue [~weichiu] and 
thanks for the fix [~DeanZ]

> TestHBaseConfiguration#testDeprecatedConfigurations is broken with Hadoop 3.3
> -
>
> Key: HBASE-25928
> URL: https://issues.apache.org/jira/browse/HBASE-25928
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.5.0
>Reporter: Wei-Chiu Chuang
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> The test TestHBaseConfiguration#testDeprecatedConfigurations was added 
> recently by HBASE-25861 to address the usage of Hadoop Configuration 
> addDeprecations API.
> However, the API's behavior was changed to fix a bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25908) Exclude jakarta.activation-api

2021-05-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25908.
---
Fix Version/s: 2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Tags: hadoop-3.3.1
   Resolution: Fixed

Merged to master and branch-2. Should I backport to branch-2.4 [~apurtell] ? 
You want to run on hadoop 3.3.1? Otherwise, hbase-2.5 to run on hadoop-3.3.1?

Thanks for the fix and the nice background [~weichiu].

> Exclude jakarta.activation-api
> --
>
> Key: HBASE-25908
> URL: https://issues.apache.org/jira/browse/HBASE-25908
> Project: HBase
>  Issue Type: Improvement
>  Components: hadoop3, shading
>Affects Versions: 2.3.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> Hadoop 3.3.1 replaced its dependency of javax.activation 1.2.0 with 
> jakarta.activation 1.2.1.
> They are essentially the same thing (they even have the same classpath name), 
> but Eclipse took over JavaEE development and therefore changed group/artifact 
> id. 
> (https://stackoverflow.com/questions/46493613/what-is-the-replacement-for-javax-activation-package-in-java-9)
> See HADOOP-17049 for more details. Hadoop 3.3.0 updated jackson-databind to 
> 2.10 which shades jakarta.activation, causing classpath conflict.
> The solution to this issue will be similar to HBASE-22268
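A sketch of the kind of exclusion involved, modeled on how HBASE-22268 handled the javax flavor. The dependency shown is illustrative; the actual patch may place the exclusion elsewhere in the pom:

```xml
<!-- Illustrative pom fragment: exclude the jakarta flavor pulled in
     transitively by Hadoop 3.3.1 so it cannot conflict on the classpath. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <exclusions>
    <exclusion>
      <groupId>jakarta.activation</groupId>
      <artifactId>jakarta.activation-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```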



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25902) 1.x to 2.3.x upgrade does not work; you must install an hbase2 that is earlier than hbase-2.3.0 first

2021-05-20 Thread Michael Stack (Jira)
Michael Stack created HBASE-25902:
-

 Summary: 1.x to 2.3.x upgrade does not work; you must install an 
hbase2 that is earlier than hbase-2.3.0 first
 Key: HBASE-25902
 URL: https://issues.apache.org/jira/browse/HBASE-25902
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


Making note of this issue in case others run into it. At my place of employ, we 
tried to upgrade a cluster that was an hbase-1.2.x version to an hbase-2.3.5 
but it failed because meta didn't have the 'table' column family.

Up to 2.3.0, hbase:meta was hardcoded. HBASE-12035 added the 'table' CF for 
hbase-2.0.0. HBASE-23782 (2.3.0) undid hardcoding of the hbase:meta schema; 
i.e. reading hbase:meta schema from the filesystem. The hbase:meta schema is 
only created on initial install. If upgrading over existing data, the hbase-1 
hbase:meta will not be suitable in an hbase-2.3.x context as it will be missing 
column families needed to run (HBASE-23055 made it so hbase:meta could be 
altered (2.3.0) but that is probably of no use since the Master won't come up).

It would be a nice-to-have if a user could go from hbase1 to hbase-2.3.0 w/o 
having to first install an hbase2 that is earlier than 2.3.0, but there needs to 
be demand before we would work on it; meantime, install an intermediate hbase2 
version before going to hbase-2.3.0+ if coming from hbase-1.x.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25870) Validate only direct ancestors instead of entire history for a particular backup

2021-05-12 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25870.
---
Fix Version/s: 3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master. Thanks for fix [~rda3mon]

> Validate only direct ancestors instead of entire history for a particular 
> backup
> 
>
> Key: HBASE-25870
> URL: https://issues.apache.org/jira/browse/HBASE-25870
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Reporter: Mallikarjun
>Assignee: Mallikarjun
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> While creating a manifest of a particular backup, it looks through the entire 
> history of backups taken on that cluster and checks their links are still valid. 
> This need not hold true and is unnecessary. Only the ancestors of a particular 
> incremental backup are necessary and sufficient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25876) Add retry if we fail to read all bytes of the protobuf magic marker

2021-05-12 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25876.
---
Fix Version/s: 2.4.4
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Assignee: Michael Stack
   Resolution: Fixed

Pushed to branch-2.4+ (Master only took one of the patched changes because no 
HRegionInfo in master branch). Thanks for reviews [~anoop.hbase] and [~zhangduo]

> Add retry if we fail to read all bytes of the protobuf magic marker
> ---
>
> Key: HBASE-25876
> URL: https://issues.apache.org/jira/browse/HBASE-25876
> Project: HBase
>  Issue Type: Sub-task
>  Components: io
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Trivial
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4
>
>
> The parent issue fixes an instance where we try once to read protobuf magic 
> marker bytes rather than retry till we have enough. This subtask applies the 
> same trick in all cases where we could run into this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25867) Extra doc around ITBLL

2021-05-11 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25867.
---
Hadoop Flags: Reviewed
Assignee: Michael Stack
  Resolution: Fixed

Pushed to branch-2.4+. Thanks for review [~busbey]

> Extra doc around ITBLL
> --
>
> Key: HBASE-25867
> URL: https://issues.apache.org/jira/browse/HBASE-25867
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4
>
>
> Added some doc around ITBLL to explain stuff I had difficulty with. Minor 
> items such as log message & javadoc edits and explaining how to pass 
> configuration to the ChaosMonkeyRunner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25876) Add retry if we fail to read all bytes of the protobuf magic marker

2021-05-10 Thread Michael Stack (Jira)
Michael Stack created HBASE-25876:
-

 Summary: Add retry if we fail to read all bytes of the protobuf 
magic marker
 Key: HBASE-25876
 URL: https://issues.apache.org/jira/browse/HBASE-25876
 Project: HBase
  Issue Type: Sub-task
  Components: io
Reporter: Michael Stack


The parent issue fixes an instance where we try once to read protobuf magic 
marker bytes rather than retry till we have enough. This subtask applies the 
same trick in all cases where we could run into this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25859) Reference class incorrectly parses the protobuf magic marker

2021-05-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25859.
---
Fix Version/s: 2.4.4
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged the PR.

Let me make a subissue to address other instances of the problem here.

> Reference class incorrectly parses the protobuf magic marker
> 
>
> Key: HBASE-25859
> URL: https://issues.apache.org/jira/browse/HBASE-25859
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.4.1
>Reporter: Constantin-Catalin Luca
>Assignee: Constantin-Catalin Luca
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4
>
>
> The Reference class incorrectly parses the protobuf magic marker.
> It uses:
> {code:java}
> // DataInputStream.read(byte[lengthOfPNMagic]){code}
> but this call does not guarantee to read all the bytes of the marker.
>  The fix is the same as the one for 
> https://issues.apache.org/jira/browse/HBASE-25674
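The fix pattern referenced above -- loop until the requested bytes arrive rather than trusting a single read() -- can be sketched as below. This is an illustrative stand-in, not the actual HBase patch; `MagicRead`/`readMagic` are made-up names, and in practice `DataInputStream.readFully` does the same job:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class MagicRead {
    // Read exactly len bytes: InputStream.read(byte[]) may return fewer
    // bytes than requested without being at end-of-stream, so loop.
    static byte[] readMagic(DataInputStream in, int len) throws IOException {
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) {
                throw new IOException("EOF after " + off + " of " + len + " magic bytes");
            }
            off += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // "PBUF" is HBase's protobuf magic marker.
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream("PBUF".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(new String(readMagic(in, 4), StandardCharsets.US_ASCII));
    }
}
```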



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25867) Extra doc around ITBLL

2021-05-07 Thread Michael Stack (Jira)
Michael Stack created HBASE-25867:
-

 Summary: Extra doc around ITBLL
 Key: HBASE-25867
 URL: https://issues.apache.org/jira/browse/HBASE-25867
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Michael Stack
 Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4


Added some doc around ITBLL to explain stuff I had difficulty with. Minor 
items such as log message & javadoc edits and explaining how to pass 
configuration to the ChaosMonkeyRunner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25792) Filter out o.a.hadoop.thirdparty building shaded jars

2021-04-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25792.
---
Fix Version/s: 2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2.4+. Thanks for reviews [~weichiu], [~zhangduo], and 
[~ndimiduk]

> Filter out o.a.hadoop.thirdparty building shaded jars
> -
>
> Key: HBASE-25792
> URL: https://issues.apache.org/jira/browse/HBASE-25792
> Project: HBase
>  Issue Type: Bug
>  Components: shading
>Affects Versions: 3.0.0-alpha-1, 2.5.0, 2.4.3
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> Hadoop 3.3.1 (unreleased currently) shades guava. The shaded guava then trips 
> the check in our shading that tries to exclude hadoop bits from the fat jars 
> we build.
> For the issue to trigger, need to build against tip of hadoop branch-3.3. You 
> then get this complaint:
> {code}
> [INFO] --- exec-maven-plugin:1.6.0:exec (check-jar-contents) @ 
> hbase-shaded-check-invariants ---
> [ERROR] Found artifact with unexpected contents: 
> '/Users/stack/.m2/repository/org/apache/hbase/hbase-shaded-mapreduce/2.3.6-SNAPSHOT/hbase-shaded-mapreduce-2.3.6-SNAPSHOT.jar'
> Please check the following and either correct the build or update
> the allowed list with reasoning.
> org/apache/hadoop/thirdparty/
> org/apache/hadoop/thirdparty/com/
> org/apache/hadoop/thirdparty/com/google/
> org/apache/hadoop/thirdparty/com/google/common/
> org/apache/hadoop/thirdparty/com/google/common/annotations/
> org/apache/hadoop/thirdparty/com/google/common/annotations/Beta.class
> 
> org/apache/hadoop/thirdparty/com/google/common/annotations/GwtCompatible.class
> 
> org/apache/hadoop/thirdparty/com/google/common/annotations/GwtIncompatible.class
> 
> org/apache/hadoop/thirdparty/com/google/common/annotations/VisibleForTesting.class
> org/apache/hadoop/thirdparty/com/google/common/base/
> org/apache/hadoop/thirdparty/com/google/common/base/Absent.class
> 
> org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator$1.class
> 
> org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator$State.class
> org/apache/hadoop/thirdparty/com/google/common/base/AbstractIterator.class
> org/apache/hadoop/thirdparty/com/google/common/base/Ascii.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$1.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$2.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$3.class
> org/apache/hadoop/thirdparty/com/google/common/base/CaseFormat$4.class
> 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25792) Filter out o.a.hadoop.thirdparty building shaded jars

2021-04-19 Thread Michael Stack (Jira)
Michael Stack created HBASE-25792:
-

 Summary: Filter out o.a.hadoop.thirdparty building shaded jars
 Key: HBASE-25792
 URL: https://issues.apache.org/jira/browse/HBASE-25792
 Project: HBase
  Issue Type: Bug
  Components: shading
Affects Versions: 3.0.0-alpha-1, 2.5.0, 2.4.3
Reporter: Michael Stack
Assignee: Michael Stack


Hadoop 3.3.1 (unreleased currently) shades guava. The shaded guava then trips 
the check in our shading that tries to exclude hadoop bits from the fat jars we 
build.

For the issue to trigger, need to build against tip of hadoop branch-3.3.
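One way to address it, sketched here as a hypothetical maven-shade-plugin filter; the real patch may differ in where and how the exclusion is expressed:

```xml
<!-- Illustrative shade-plugin filter: drop Hadoop's relocated thirdparty
     (shaded guava) classes so they don't land in our fat jars. -->
<filter>
  <artifact>*:*</artifact>
  <excludes>
    <exclude>org/apache/hadoop/thirdparty/**</exclude>
  </excludes>
</filter>
```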



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25761) POC: hbase:meta,,1 as ROOT

2021-04-10 Thread Michael Stack (Jira)
Michael Stack created HBASE-25761:
-

 Summary: POC: hbase:meta,,1 as ROOT
 Key: HBASE-25761
 URL: https://issues.apache.org/jira/browse/HBASE-25761
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


One of the proposals up in the split-meta design doc suggests a sleight-of-hand 
where the current hard-coded hbase:meta,,1 Region is leveraged to serve as 
first Region of a split hbase:meta but also does double-duty as 'ROOT'. This 
suggestion was put aside in chat as a complicating recursion, but then Francis 
noticed on a re-read of the BigTable paper that this is how they describe doing 
'ROOT': "The root tablet is just the first tablet in the METADATA table, but 
is treated specially -- it is never split..."

This issue is for playing around with this notion to see what the problems are 
so we can do a better description of this approach here, in the design:

https://docs.google.com/document/d/11ChsSb2LGrSzrSJz8pDCAw5IewmaMV0ZDN1LrMkAj4s/edit?ts=606c120f#heading=h.ikbhxlcthjle



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25735) Add target Region to connection exceptions

2021-04-08 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25735.
---
Resolution: Fixed

I pushed this addendum on branch-2.4+

{code}
kalashnikov:hbase.apache.git stack$ git show -1
commit f9819f33b6b1016364c10d80129e3d0faf7ff17e (HEAD -> m, origin/master, 
origin/HEAD)
Author: stack 
Date:   Thu Apr 8 13:24:29 2021 -0700

HBASE-25735 Add target Region to connection exceptions
Restore API for Phoenix (though it shouldn't be using
Private classes).

diff --git 
a/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcControllerFactory.java
 
b/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcControllerFactory.java
index 0dcb22fa5b..e6d63fac1f 100644
--- 
a/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcControllerFactory.java
+++ 
b/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcControllerFactory.java
@@ -18,15 +18,14 @@
 package org.apache.hadoop.hbase.ipc;

 import java.util.List;
-
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.CellScannable;
 import org.apache.hadoop.hbase.CellScanner;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.util.ReflectionUtils;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import org.apache.hadoop.hbase.util.ReflectionUtils;

 /**
  * Factory to create a {@link HBaseRpcController}
@@ -52,16 +51,23 @@ public class RpcControllerFactory {
     return new HBaseRpcControllerImpl();
   }
 
+  public HBaseRpcController newController(CellScanner cellScanner) {
+    return new HBaseRpcControllerImpl(null, cellScanner);
+  }
+
   public HBaseRpcController newController(RegionInfo regionInfo, CellScanner cellScanner) {
     return new HBaseRpcControllerImpl(regionInfo, cellScanner);
   }
 
+  public HBaseRpcController newController(final List<CellScannable> cellIterables) {
+    return new HBaseRpcControllerImpl(null, cellIterables);
+  }
+
   public HBaseRpcController newController(RegionInfo regionInfo,
       final List<CellScannable> cellIterables) {
     return new HBaseRpcControllerImpl(regionInfo, cellIterables);
   }

-
   public static RpcControllerFactory instantiate(Configuration configuration) {
 String rpcControllerFactoryClazz =
 configuration.get(CUSTOM_CONTROLLER_CONF_KEY,
{code}

> Add target Region to connection exceptions
> --
>
> Key: HBASE-25735
> URL: https://issues.apache.org/jira/browse/HBASE-25735
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We spent a bit of time making it so exceptions included the remote host name. 
> Looks like we can add the target Region name too with a bit of manipulation; 
> will help figuring hot-spotting or problem Region on serverside.  For 
> example, here is what I was seeing recently on client-side when a RS was 
> timing out requests:
> {code}
> 2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
> pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
> ...
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
> at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
> at 
> 

[jira] [Reopened] (HBASE-25735) Add target Region to connection exceptions

2021-04-08 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25735:
---

Reopening to add back old APIs used by Phoenix (though it shouldn't be down in 
our privates)

> Add target Region to connection exceptions
> --
>
> Key: HBASE-25735
> URL: https://issues.apache.org/jira/browse/HBASE-25735
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We spent a bit of time making it so exceptions included the remote host name. 
> Looks like we can add the target Region name too with a bit of manipulation; 
> will help figuring hot-spotting or problem Region on serverside.  For 
> example, here is what I was seeing recently on client-side when a RS was 
> timing out requests:
> {code}
> 2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
> pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
> ...
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
> at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
> ... 1 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
> ... 4 more
> {code}
> I wanted the region it was hitting. I wanted to know if it was a server 
> problem or a Region issue. If clients were only having an issue w/ one Region, 
> then I could focus on it.
> could focus on it.
> After the PR the exception (from another context) looks like this:
> {code}
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> address=127.0.0.1:12345, regionInfo=hbase:meta,,1.1588230740 failed on local 
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: error
> 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25687) Backport "HBASE-25681 Add a switch for server/table queryMeter" to branch-2 and branch-1

2021-04-07 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25687.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to branch-1. Thanks for patch [~DeanZ]

> Backport "HBASE-25681 Add a switch for server/table queryMeter" to branch-2 
> and branch-1
> 
>
> Key: HBASE-25687
> URL: https://issues.apache.org/jira/browse/HBASE-25687
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 1.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25735) Add target Region to connection exceptions

2021-04-07 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25735.
---
Resolution: Fixed

Re-resolved after pushing addendum on master.

> Add target Region to connection exceptions
> --
>
> Key: HBASE-25735
> URL: https://issues.apache.org/jira/browse/HBASE-25735
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We spent a bit of time making it so exceptions included the remote host name. 
> Looks like we can add the target Region name too with a bit of manipulation; 
> will help figuring hot-spotting or problem Region on serverside.  For 
> example, here is what I was seeing recently on client-side when a RS was 
> timing out requests:
> {code}
> 2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
> pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
> ...
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
> at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
> ... 1 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
> ... 4 more
> {code}
> I wanted the region it was hitting. I wanted to know if it was a server 
> problem or a Region issue. If clients were only having issues with one Region, 
> then I could focus on it.
> After the PR the exception (from another context) looks like this:
> {code}
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> address=127.0.0.1:12345, regionInfo=hbase:meta,,1.1588230740 failed on local 
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: error
> 
> {code}





[jira] [Resolved] (HBASE-25735) Add target Region to connection exceptions

2021-04-06 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25735.
---
Fix Version/s: 2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-2.4+. Thanks for review [~wchevreuil]

> Add target Region to connection exceptions
> --
>
> Key: HBASE-25735
> URL: https://issues.apache.org/jira/browse/HBASE-25735
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We spent a bit of time making it so exceptions included the remote host name. 
> Looks like we can add the target Region name too with a bit of manipulation; 
> it will help in figuring out hot-spotting or a problem Region on the serverside. 
> For example, here is what I was seeing recently on the client-side when a RS 
> was timing out requests:
> {code}
> 2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
> pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
> ...
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
> at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
> at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
> ... 1 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: 
> Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
> at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
> ... 4 more
> {code}
> I wanted the region it was hitting. I wanted to know if it was a server 
> problem or a Region issue. If clients were only having issues with one Region, 
> then I could focus on it.
> After the PR the exception (from another context) looks like this:
> {code}
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> address=127.0.0.1:12345, regionInfo=hbase:meta,,1.1588230740 failed on local 
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: error
> 
> {code}





[jira] [Resolved] (HBASE-25713) Make an hbase-wal module

2021-04-06 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25713.
---
Resolution: Won't Fix

Resolving as a failed experiment.

> Make an hbase-wal module
> 
>
> Key: HBASE-25713
> URL: https://issues.apache.org/jira/browse/HBASE-25713
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Priority: Major
>
> Extract an hbase-wal module upon which hbase-server can depend; makes 
> hbase-server smaller and maybe we could do an hbase-wal standalone... This is 
> an experiment.





[jira] [Created] (HBASE-25735) Add target Region to connection exceptions

2021-04-06 Thread Michael Stack (Jira)
Michael Stack created HBASE-25735:
-

 Summary: Add target Region to connection exceptions
 Key: HBASE-25735
 URL: https://issues.apache.org/jira/browse/HBASE-25735
 Project: HBase
  Issue Type: Bug
  Components: rpc
Reporter: Michael Stack
Assignee: Michael Stack


We spent a bit of time making it so exceptions included the remote host name. 
Looks like we can add the target Region name too with a bit of manipulation; 
it will help in figuring out hot-spotting or a problem Region on the serverside. 
For example, here is what I was seeing recently on the client-side when a RS 
was timing out requests:

{code}
2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, 
pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: 
Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: 
org.apache.hadoop.hbase.ipc.CallTimeoutException: 
Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
...
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: 
org.apache.hadoop.hbase.ipc.CallTimeoutException: 
Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
at 
org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
at 
org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
at 
org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
at 
org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
... 1 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: 
Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
at 
org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
... 4 more
{code}

I wanted the region it was hitting. I wanted to know if it was a server problem 
or a Region issue. If clients were only having issues with one Region, then I 
could focus on it.

After the PR the exception (from another context) looks like this:

{code}
org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
address=127.0.0.1:12345, regionInfo=hbase:meta,,1.1588230740 failed on local 
exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: error

{code}
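The improved message shown above can be approximated with a small sketch. All class and helper names below are hypothetical stand-ins, not HBase's actual rpc classes; this only illustrates folding the region name into the wrapped exception's message.

```java
import java.io.IOException;

public class CallTimeoutDemo {
  // Stand-in for org.apache.hadoop.hbase.ipc.CallTimeoutException; illustrative only.
  static class CallTimeoutException extends IOException {
    CallTimeoutException(String msg, Throwable cause) { super(msg, cause); }
  }

  // Hypothetical helper showing the idea behind the PR: fold both the target
  // address and the region name into the wrapped exception's message so a
  // client-side log identifies the problem Region, not just the server.
  static CallTimeoutException wrap(Throwable cause, String address, String regionInfo) {
    String msg = "Call to address=" + address + ", regionInfo=" + regionInfo
        + " failed on local exception: " + cause;
    return new CallTimeoutException(msg, cause);
  }

  public static void main(String[] args) {
    CallTimeoutException e =
        wrap(new IOException("error"), "127.0.0.1:12345", "hbase:meta,,1.1588230740");
    System.out.println(e.getMessage());
  }
}
```

With region info in the message, a retry-loop log line is enough to tell a hot Region apart from a sick server.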





[jira] [Resolved] (HBASE-25558) Adding audit log for execMasterService

2021-03-31 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25558.
---
Fix Version/s: 2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thank you for the improvement [~xiaoheipangzi]

> Adding audit log for execMasterService
> --
>
> Key: HBASE-25558
> URL: https://issues.apache.org/jira/browse/HBASE-25558
> Project: HBase
>  Issue Type: Improvement
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> Hi:
> I have found that APIs like execProcedure and execProcedureWithRet have an 
> audit log to record who executed the master service. The log can be like:
> {code:java}
> LOG.info(master.getClientIdAuditPrefix() + " procedure request for: " + 
> desc.getSignature());
> {code}
> But it seems that we forgot to audit execMasterService. We should add one.





[jira] [Created] (HBASE-25713) Make an hbase-wal module

2021-03-29 Thread Michael Stack (Jira)
Michael Stack created HBASE-25713:
-

 Summary: Make an hbase-wal module
 Key: HBASE-25713
 URL: https://issues.apache.org/jira/browse/HBASE-25713
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


Extract an hbase-wal module upon which hbase-server can depend; makes 
hbase-server smaller and maybe we could do an hbase-wal standalone... This is 
an experiment.





[jira] [Resolved] (HBASE-25670) Backport HBASE-25665 to branch-1

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25670.
---
Fix Version/s: 1.7.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-1. Thanks for the PR [~lineyshinya] (Let's keep an eye on this 
one in the nightlies to make sure there are no unexpected consequences...   
https://ci-hadoop.apache.org/view/HBase/job/HBase/job/HBase%20Nightly/job/branch-1/
 )

> Backport HBASE-25665 to branch-1
> 
>
> Key: HBASE-25670
> URL: https://issues.apache.org/jira/browse/HBASE-25670
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Shinya Yoshida
>Assignee: Shinya Yoshida
>Priority: Major
> Fix For: 1.7.0
>
>
> Backport 
> [https://github.com/apache/hbase/commit/ebb0adf50009fc133af0cfb0bdce4dfbb81d4fbf]
>  for https://issues.apache.org/jira/browse/HBASE-25665 to branch-1





[jira] [Resolved] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25692.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to 2.3+. Shout if you want it to go elsewhere [~elserj].

> Failure to instantiate WALCellCodec leaks socket in replication
> ---
>
> Key: HBASE-25692
> URL: https://issues.apache.org/jira/browse/HBASE-25692
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.1.0, 2.2.0, 2.1.1, 2.1.2, 2.1.3, 2.3.0, 2.3.1, 2.1.4, 
> 2.0.6, 2.1.5, 2.2.1, 2.1.6, 2.1.7, 2.2.2, 2.1.8, 2.2.3, 2.3.3, 2.1.9, 2.2.4, 
> 2.4.0, 2.2.5, 2.2.6, 2.3.2, 2.3.4, 2.4.1, 2.4.2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3, 2.3.6
>
>
> I was looking at an HBase user's cluster with [~danilocop] where they saw two 
> otherwise identical clusters, one of which regularly had sockets in 
> CLOSE_WAIT going from RegionServers to a distributed storage appliance.
> After a lot of analysis, we eventually figured out that these sockets in 
> CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to 
> close inside of the RegionServer. The subtlety was that only one of these 
> HBase clusters was set up to do replication (to the other cluster). The HBase 
> cluster experiencing this problem was shipping edits to a peer, and had 
> previously been using Phoenix. At some point, the cluster had Phoenix removed 
> from it.
> What we found was that replication still had WALs to ship which were for 
> Phoenix tables. Phoenix, in this version, still used the custom WALCellCodec; 
> however, this codec class was missing from the RS classpath after the owner 
> of the cluster removed Phoenix.
> When we try to instantiate the Codec implementation via ReflectionUtils, we 
> end up throwing an UnsupportedOperationException which wraps a 
> NoClassDefFoundException. However, in WALFactory, we _only_ close the 
> FSDataInputStream when we catch an IOException. 
> Thus, replication sits in a "fast" loop, trying to ship these edits, each 
> time leaking a new socket because of the InputStream not being closed. There 
> is an obvious workaround for this specific issue, but we should not leak this 
> inside HBase.
> Approximate, 2.1.x stack trace which lead us to this is below.
> {noformat}
> 2021-03-11 18:19:20,364 ERROR 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader: 
> Failed to read stream of replication entries
> java.io.IOException: Cannot get log reader
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:366)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:291)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:427)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:354)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:302)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:293)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:174)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:100)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:192)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:138)
> Caused by: java.lang.UnsupportedOperationException: Unable to find 
> org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
>   at 
> org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:47)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.WALCellCodec.create(WALCellCodec.java:106)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.getCodec(ProtobufLogReader.java:301)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:311)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:81)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:321)
>   ... 10 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
>
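The close-only-on-IOException gap described in HBASE-25692 above can be sketched in plain Java. All class and method names here are illustrative stand-ins, not HBase's actual WALFactory code; the point is that a failure surfacing as an unchecked exception bypasses a `catch (IOException)` cleanup path.

```java
import java.io.Closeable;
import java.io.IOException;

public class ReaderFactoryDemo {
  static class Stream implements Closeable {
    boolean closed = false;
    @Override public void close() { closed = true; }
  }

  // Stand-in for reader initialization; ReflectionUtils-style failures surface
  // as an unchecked UnsupportedOperationException, not an IOException.
  static void init() throws IOException {
    throw new UnsupportedOperationException("Unable to find codec class");
  }

  // Buggy shape: only an IOException triggers the close, so the unchecked
  // codec failure flies past and the underlying stream (socket) leaks.
  static Stream createReaderLeaky(Stream in) throws IOException {
    try {
      init();
      return in;
    } catch (IOException e) {
      in.close();
      throw e;
    }
  }

  // Fixed shape: close on any failure before rethrowing.
  static Stream createReaderSafe(Stream in) throws IOException {
    try {
      init();
      return in;
    } catch (Throwable t) {
      in.close();
      throw t instanceof IOException ? (IOException) t
          : new IOException("Cannot get log reader", t);
    }
  }

  public static void main(String[] args) {
    Stream in = new Stream();
    try {
      createReaderSafe(in);
    } catch (IOException e) {
      System.out.println("stream closed? " + in.closed);
    }
  }
}
```

In the replication retry loop described above, the leaky shape opens a fresh stream on every attempt, which is exactly the pattern that accumulates CLOSE_WAIT sockets.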

[jira] [Resolved] (HBASE-25707) When restoring a table, create a namespace if it does not exist

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25707.
---
Fix Version/s: 3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master. Reviewed by [~wchevreuil]. Thanks for the PR [~shenshengli]

> When restoring a table, create a namespace if it does not exist
> ---
>
> Key: HBASE-25707
> URL: https://issues.apache.org/jira/browse/HBASE-25707
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Affects Versions: 2.0.0
>Reporter: shenshengli
>Assignee: shenshengli
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> It does not seem to have been taken into account that the namespace of the 
> table being restored may not exist in the target environment; if the 
> namespace does not exist, the restore simply throws an error 
> (NamespaceNotFoundException), which is unfriendly.





[jira] [Resolved] (HBASE-25705) Convert proto to RSGroupInfo is costly

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25705.
---
Fix Version/s: 3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master branch. It won't backport w/o complaint. Open sub-task if you 
have PRs for backports [~mokai87]. Thanks for the PR.

> Convert proto to RSGroupInfo is costly
> --
>
> Key: HBASE-25705
> URL: https://issues.apache.org/jira/browse/HBASE-25705
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: mokai
>Assignee: mokai
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> Converting RSGroupProtos.RSGroupInfo to RSGroupInfo is costly if the RSGroup 
> has too many RSs and tables. 
> We can use parallelStream to handle the HBaseProtos.ServerName list and 
> TableProtos.TableName list in ProtobufUtil#toGroupInfo as below.
> {quote}Collection<Address> addresses = proto.getServersList()
>  .parallelStream()
>  .map(server -> Address.fromParts(server.getHostName(), server.getPort()))
>  .collect(Collectors.toList());
> Collection<TableName> tables = proto.getTablesList()
>  .parallelStream()
>  .map(tableName -> ProtobufUtil.toTableName(tableName))
>  .collect(Collectors.toList());
> {quote}
> Getting an RSGroupInfo that has 9 RSs and 20k tables, the time cost was 
> reduced from 6038 ms to 684 ms.
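The quoted parallelStream conversion can be sketched standalone. The Address class below is an illustrative stand-in for HBase's real type, and the string-splitting input format is invented for the demo:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelConvertDemo {
  // Illustrative stand-in for HBase's Address type; not the real class.
  static class Address {
    final String host; final int port;
    Address(String host, int port) { this.host = host; this.port = port; }
  }

  static List<Address> toAddresses(List<String> protoServers) {
    // parallelStream() fans the per-element conversion out across the common
    // fork-join pool; collect() still preserves the input order in the result.
    return protoServers.parallelStream()
        .map(s -> {
          String[] parts = s.split(":");
          return new Address(parts[0], Integer.parseInt(parts[1]));
        })
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Address> out =
        toAddresses(Arrays.asList("rs0:16020", "rs1:16021", "rs2:16022"));
    System.out.println(out.size() + " " + out.get(0).host);
  }
}
```

The speedup reported above comes from the conversions being independent per element, which is the case parallel streams handle well; for small lists the fork-join overhead can outweigh the gain.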





[jira] [Resolved] (HBASE-25710) During the recovery process, an error is thrown if there is an incremental backup of data that has not been updated

2021-03-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25710.
---
Fix Version/s: 3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master. Thanks for PR [~shenshengli]

> During the recovery process, an error is thrown if there is an incremental 
> backup of data that has not been updated
> ---
>
> Key: HBASE-25710
> URL: https://issues.apache.org/jira/browse/HBASE-25710
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Affects Versions: 2.0.0
>Reporter: shenshengli
>Assignee: shenshengli
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> The error is shown below:
> 19:49:24.213 [main] ERROR org.apache.hadoop.hbase.backup.RestoreDriver - 
> Error while running restore backup
> java.io.IOException: Can not restore from backup directory (check Hadoop and 
> HBase logs)
>  at 
> org.apache.hadoop.hbase.backup.mapreduce.MapReduceRestoreJob.run(MapReduceRestoreJob.java:110)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.util.RestoreTool.incrementalRestoreTable(RestoreTool.java:202)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.restoreImages(RestoreTablesClient.java:178)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.restore(RestoreTablesClient.java:221)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.impl.RestoreTablesClient.execute(RestoreTablesClient.java:258)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.restore(BackupAdminImpl.java:520)
>  ~[hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.RestoreDriver.parseAndRun(RestoreDriver.java:179)
>  [hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hbase.backup.RestoreDriver.doWork(RestoreDriver.java:220) 
> [hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at org.apache.hadoop.hbase.backup.RestoreDriver.run(RestoreDriver.java:256) 
> [hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) 
> [hadoop-common-3.1.1.3.0.1.0-187.jar:?]
>  at org.apache.hadoop.hbase.backup.RestoreDriver.main(RestoreDriver.java:228) 
> [hbase-backup-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> Caused by: java.io.IOException: No input paths specified in job





[jira] [Resolved] (HBASE-25695) Link to the filter on hbase:meta from user tables panel on master page

2021-03-27 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25695.
---
Fix Version/s: 2.3.6
   2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Assignee: Michael Stack
   Resolution: Fixed

Pushed to branch-2.3+. Thanks for review [~ndimiduk]

> Link to the filter on hbase:meta from user tables panel on master page
> --
>
> Key: HBASE-25695
> URL: https://issues.apache.org/jira/browse/HBASE-25695
> Project: HBase
>  Issue Type: Sub-task
>  Components: UI
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3, 2.3.6
>
> Attachments: image-2021-03-24-21-41-11-393.png, 
> image-2021-03-24-21-42-16-355.png, image-2021-03-24-21-43-24-426.png
>
>
> This is a follow-on to the parent issue that added a nice filtering mechanism 
> on the hbase:meta table. The parent allows finding all Regions in Table XYZ 
> with state OPENING or FAILED_CLOSED.
> The user table panel on the master home page has counts of Regions in each 
> state. The opening and closing counts actually have links under them but they 
> are useless currently as they only show RITs that are CLOSING or OPENING; 
> good but not comprehensive enough.
> This PR adds links under all counts so you can see all CLOSING Regions 
> whether RIT or not; useful when doing fixup on a corrupt cluster.  Adds a bit 
> of help text that tells users about the filter-on-meta feature too.
> Here is how the panel currently looks:
>  !image-2021-03-24-21-41-11-393.png! 
> Here is what it looks like now with the bit of help text
>  !image-2021-03-24-21-42-16-355.png! 
> When you click on the CLOSED number -- '1' in this case -- this is where you 
> go to:
>  !image-2021-03-24-21-43-24-426.png! 
> i.e. it lists all Regions in the TestTable that are in the CLOSED state (not 
> very pretty with the 'Table Stats' and 'Table Regions' preamble but better 
> than what was there before).





[jira] [Created] (HBASE-25695) Link to the filter on hbase:meta from user tables panel on master page

2021-03-24 Thread Michael Stack (Jira)
Michael Stack created HBASE-25695:
-

 Summary: Link to the filter on hbase:meta from user tables panel 
on master page
 Key: HBASE-25695
 URL: https://issues.apache.org/jira/browse/HBASE-25695
 Project: HBase
  Issue Type: Sub-task
  Components: UI
Reporter: Michael Stack
 Attachments: image-2021-03-24-21-41-11-393.png, 
image-2021-03-24-21-42-16-355.png, image-2021-03-24-21-43-24-426.png

This is a follow-on to the parent issue that added a nice filtering mechanism on 
the hbase:meta table. The parent allows finding all Regions in Table XYZ with 
state OPENING or FAILED_CLOSED.

The user table panel on the master home page has counts of Regions in each 
state. The opening and closing counts actually have links under them but they 
are useless currently as they only show RITs that are CLOSING or OPENING; good 
but not comprehensive enough.

This PR adds links under all counts so you can see all CLOSING Regions whether 
RIT or not; useful when doing fixup on a corrupt cluster.  Adds a bit of help 
text that tells users about the filter-on-meta feature too.

Here is how the panel currently looks:

 !image-2021-03-24-21-41-11-393.png! 

Here is what it looks like now with the bit of help text

 !image-2021-03-24-21-42-16-355.png! 


When you click on the CLOSED number -- '1' in this case -- this is where you go to:

 !image-2021-03-24-21-43-24-426.png! 

i.e. it lists all Regions in the TestTable that are in the CLOSED state (not very 
pretty with the 'Table Stats' and 'Table Regions' preamble but better than what 
was there before).






[jira] [Resolved] (HBASE-25676) Move generic classes from hbase-server to hbase-common

2021-03-23 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25676.
---
Resolution: Won't Fix

Resolving as "won't fix"

Let me just close this. Most of the classes moved here are used by hbase-server 
only. Even though a bunch of these classes are generic and could be used 
elsewhere other than by hbase-server, AND even though a good portion of the 
content of hbase-common is currently only used by hbase-server, let's favor 
coherent, contained modules. Closing as the wrong direction.

Thanks for reviews [~zhangduo] and @dupg

> Move generic classes from hbase-server to hbase-common
> --
>
> Key: HBASE-25676
> URL: https://issues.apache.org/jira/browse/HBASE-25676
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Priority: Major
>
> There's a bunch of classes that are not hbase-server specific on cursory 
> review that could live in hbase-common... not many, about 3% of src/main/java 
> but move them out.
> {code}
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/SslRMIClientSocketFactorySecure.java
>  (99%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/SslRMIServerSocketFactorySecure.java
>  (99%)
>   rename {hbase-server/src/main/java/org/apache/hadoop/hbase => 
> hbase-common/src/main/java/org/apache/hadoop/hbase/healthcheck}/HealthCheckChore.java
>  (93%)
>   rename {hbase-server/src/main/java/org/apache/hadoop/hbase => 
> hbase-common/src/main/java/org/apache/hadoop/hbase/healthcheck}/HealthChecker.java
>  (86%)
>   rename {hbase-server/src/main/java/org/apache/hadoop/hbase => 
> hbase-common/src/main/java/org/apache/hadoop/hbase/healthcheck}/HealthReport.java
>  (94%)
>   rename {hbase-server/src/test/java/org/apache/hadoop/hbase => 
> hbase-common/src/test/java/org/apache/hadoop/hbase/healthcheck}/TestNodeHealthCheckChore.java
>  (86%)
>   delete mode 100644 
> hbase-server/src/main/java/org/apache/hadoop/hbase/DaemonThreadFactory.java
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/security/SecurityUtil.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/ConfigurationUtil.java
>  (99%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/DirectMemoryUtils.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/GetJavaProperty.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/HBaseConfTool.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/HashedBytes.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/IdReadWriteLock.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/JvmVersion.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/KeyRange.java (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/LossyCounting.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/ManualEnvironmentEdge.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/MunkresAssignment.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/NettyEventLoopGroupConfig.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/RegionSplitCalculator.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/RollingStatCalculator.java
>  (99%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/ShutdownHookManager.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/SortedList.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/main/java/org/apache/hadoop/hbase/util/StealJobQueue.java 
> (100%)
>   rename {hbase-server => 
> hbase-common}/src/test/java/org/apache/hadoop/hbase/util/TestConfigurationUtil.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/test/java/org/apache/hadoop/hbase/util/TestIdReadWriteLock.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/test/java/org/apache/hadoop/hbase/util/TestLossyCounting.java
>  (100%)
>   rename {hbase-server => 
> hbase-common}/src/test/java/org/apache/hadoop/hbase/util/TestRegionSplitCalculator.java
>  (95%)
>   rename {hbase-server => 
> hbase-common}/src/test/java/org/apache/hadoop/hbase/util/TestSortedList.java 
> (100%)
>   rename 

[jira] [Resolved] (HBASE-25685) asyncprofiler2.0 no longer supports svg; wants html

2021-03-22 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25685.
---
Fix Version/s: 2.4.3
   2.3.5
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Release Note: 
If asyncprofiler 1.x, all is good. If asyncprofiler 2.x and it is hbase-2.3.x 
or hbase-2.4.x, add '?output=html' to get flamegraphs from the profiler.

Otherwise, if hbase-2.5+ and asyncprofiler 2.x, all works. If asyncprofiler 1.x 
and hbase-2.5+, you may have to add '?output=svg' to the query.
   Resolution: Fixed

Thanks for the review [~weichiu]. Pushed #3079 on branch-2.3+branch-2.4. Pushed 
#3078 on branch-2 and master.

> asyncprofiler2.0 no longer supports svg; wants html
> ---
>
> Key: HBASE-25685
> URL: https://issues.apache.org/jira/browse/HBASE-25685
> Project: HBase
>  Issue Type: Bug
>Reporter: Michael Stack
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.5, 2.4.3
>
>
> async-profiler 2.0 is out. It's a nice tool. Unfortunately, it dropped the svg 
> formatting option that we use in our servlet; it now wants html. Let's fix 
> that.
> Old -o on async-profiler 1.x:
> -o fmt  output format: summary|traces|flat|collapsed|svg|tree|jfr
> New -o on async-profiler 2.x:
> -o fmt  output format: flat|traces|collapsed|flamegraph|tree|jfr
> If you pass svg to 2.0, it does nothing. If you run the same command HBase 
> runs, you see:
> {code}
> /tmp/prof-output$ sudo -u hbase /usr/lib/async-profiler/profiler.sh -e cpu -d 
> 10 -o svg -f /tmp/prof-output/async-prof-pid-8346-cpu-1x.svg 8346
> [ERROR] SVG format is obsolete, use .html for FlameGraph
> {code}
> At a minimum, we can make the OUTPUT param support HTML. Here is the current 
> enum state:
> {code}
>   enum Output {
> SUMMARY,
> TRACES,
> FLAT,
> COLLAPSED,
> SVG,
> TREE,
> JFR
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25672) Backport HBASE-25608 to branch-1

2021-03-22 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25672.
---
Fix Version/s: 1.7.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-1. Thanks for the PR [~lineyshinya]

> Backport HBASE-25608 to branch-1
> 
>
> Key: HBASE-25672
> URL: https://issues.apache.org/jira/browse/HBASE-25672
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Shinya Yoshida
>Assignee: Shinya Yoshida
>Priority: Major
> Fix For: 1.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25683) Simplify UTs using DummyServer

2021-03-22 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25683.
---
Fix Version/s: 3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged. Nice cleanup. Thanks for the PR [~Ddupg]

> Simplify UTs using DummyServer
> --
>
> Key: HBASE-25683
> URL: https://issues.apache.org/jira/browse/HBASE-25683
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Trivial
> Fix For: 3.0.0-alpha-1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25594) graceful_stop.sh fails to unload regions when ran at localhost

2021-03-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25594.
---
Resolution: Fixed

Pushed addendum on branch-2.4+

> graceful_stop.sh fails to unload regions when ran at localhost
> --
>
> Key: HBASE-25594
> URL: https://issues.apache.org/jira/browse/HBASE-25594
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.4.13
>Reporter: Javier Akira Luca de Tena
>Assignee: Javier Akira Luca de Tena
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We usually use graceful_stop.sh from the Master to restart RegionServers. 
> However, in some scenarios we may not have privileges to restart remote 
> RegionServers (it uses ssh).
>  But we can still use graceful_stop.sh on the same host we want to restart.
> In order to detect the execution at localhost, graceful_stop.sh uses 
> /bin/hostname.
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/bin/graceful_stop.sh#L106-L110]
> When RegionMover strips the host to not include it in the list of target 
> hosts, we filter it out by checking all RegionServer hosts in the cluster:
>  
> [https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L382-L384]
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L692]
> But the list of RegionServer hosts returned by Admin#getRegionServers contains 
> FQDNs, while the hostname provided by graceful_stop.sh is not an FQDN, making 
> the comparison fail.
> The same happens for branch-1 region_mover.rb, which is where I reproduced it in 
> my environment: 
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L305]
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L175]
>  
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L186-L192]
>  
> This can be fixed just by using "/bin/hostname -f" in the graceful_stop.sh 
> script.
> Will provide patch soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-25594) graceful_stop.sh fails to unload regions when ran at localhost

2021-03-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25594:
---

Reopening to apply the addendum below
{code}
commit 326835e8372cc83092e0ec127650438ff153476a (HEAD -> m, origin/master, 
origin/HEAD)
Author: stack 
Date:   Sat Mar 20 13:47:18 2021 -0700

HBASE-25594 Make easier to use graceful_stop on localhost mode (#3054)
Addendum.

diff --git a/bin/graceful_stop.sh b/bin/graceful_stop.sh
index 05919ce72d..fc18239830 100755
--- a/bin/graceful_stop.sh
+++ b/bin/graceful_stop.sh
@@ -105,9 +105,6 @@ filename="/tmp/$hostname"
 local=
 localhostname=`/bin/hostname -f`

-if [ "$localhostname" == "$hostname" ]; then
-  local=true
-fi
 if [ "$localhostname" == "$hostname" ] || [ "$hostname" == "localhost" ]; then
   local=true
   hostname=$localhostname
{code}

> graceful_stop.sh fails to unload regions when ran at localhost
> --
>
> Key: HBASE-25594
> URL: https://issues.apache.org/jira/browse/HBASE-25594
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.4.13
>Reporter: Javier Akira Luca de Tena
>Assignee: Javier Akira Luca de Tena
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We usually use graceful_stop.sh from the Master to restart RegionServers. 
> However, in some scenarios we may not have privileges to restart remote 
> RegionServers (it uses ssh).
>  But we can still use graceful_stop.sh on the same host we want to restart.
> In order to detect the execution at localhost, graceful_stop.sh uses 
> /bin/hostname.
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/bin/graceful_stop.sh#L106-L110]
> When RegionMover strips the host to not include it in the list of target 
> hosts, we filter it out by checking all RegionServer hosts in the cluster:
>  
> [https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L382-L384]
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L692]
> But the list of RegionServer hosts returned by Admin#getRegionServers contains 
> FQDNs, while the hostname provided by graceful_stop.sh is not an FQDN, making 
> the comparison fail.
> The same happens for branch-1 region_mover.rb, which is where I reproduced it in 
> my environment: 
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L305]
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L175]
>  
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L186-L192]
>  
> This can be fixed just by using "/bin/hostname -f" in the graceful_stop.sh 
> script.
> Will provide patch soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25685) asyncprofiler2.0 no longer supports svg; wants html

2021-03-19 Thread Michael Stack (Jira)
Michael Stack created HBASE-25685:
-

 Summary: asyncprofiler2.0 no longer supports svg; wants html
 Key: HBASE-25685
 URL: https://issues.apache.org/jira/browse/HBASE-25685
 Project: HBase
  Issue Type: Bug
Reporter: Michael Stack


async-profiler 2.0 is out. It's a nice tool. Unfortunately, it dropped the svg 
formatting option that we use in our servlet; it now wants html. Let's fix 
that.

Old -o on async-profiler 1.x:
-o fmt  output format: summary|traces|flat|collapsed|svg|tree|jfr

New -o on async-profiler 2.x:
-o fmt  output format: flat|traces|collapsed|flamegraph|tree|jfr

If you pass svg to 2.0, it does nothing. If you run the same command HBase 
runs, you see:

{code}
/tmp/prof-output$ sudo -u hbase /usr/lib/async-profiler/profiler.sh -e cpu -d 
10 -o svg -f /tmp/prof-output/async-prof-pid-8346-cpu-1x.svg 8346
[ERROR] SVG format is obsolete, use .html for FlameGraph
{code}

At a minimum, we can make the OUTPUT param support HTML. Here is the current 
enum state:

{code}
  enum Output {
SUMMARY,
TRACES,
FLAT,
COLLAPSED,
SVG,
TREE,
JFR
  }
{code}
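One way the enum above could accommodate both profiler versions is sketched below. This is a hypothetical illustration, not HBase's actual servlet code: the `parse` helper and the version-dependent default are my assumptions; only the original enum constants and the new HTML/FLAMEGRAPH names come from the issue discussion.

```java
// Sketch of extending the profiler servlet's output formats so both
// async-profiler 1.x ("svg") and 2.x ("html"/"flamegraph") are accepted.
// The parse() helper and its defaulting behavior are hypothetical.
public class ProfileOutputFormats {
  enum Output {
    SUMMARY, TRACES, FLAT, COLLAPSED, SVG, TREE, JFR,
    HTML, FLAMEGRAPH; // additions for async-profiler 2.x

    // Parse the servlet's ?output= parameter, defaulting per profiler version.
    static Output parse(String param, boolean profilerV2) {
      if (param == null || param.isEmpty()) {
        return profilerV2 ? HTML : SVG;
      }
      return valueOf(param.toUpperCase(java.util.Locale.ROOT));
    }
  }

  public static void main(String[] args) {
    System.out.println(Output.parse(null, true));   // HTML
    System.out.println(Output.parse("svg", false)); // SVG
  }
}
```

With this shape, old clients passing `output=svg` keep working against async-profiler 1.x, while 2.x users get the html flamegraph by default.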



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25681) Add a switch for server/table queryMeter

2021-03-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25681.
---
Resolution: Fixed

> Add a switch for server/table queryMeter
> 
>
> Key: HBASE-25681
> URL: https://issues.apache.org/jira/browse/HBASE-25681
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-25681) Add a switch for server/table queryMeter

2021-03-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25681:
---

> Add a switch for server/table queryMeter
> 
>
> Key: HBASE-25681
> URL: https://issues.apache.org/jira/browse/HBASE-25681
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.3
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25681) Add a switch for server/table queryMeter

2021-03-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25681.
---
Fix Version/s: 2.4.3
   2.3.5
   2.5.0
   1.7.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Release Note: 
Adds "hbase.regionserver.enable.server.query.meter" and 
"hbase.regionserver.enable.table.query.meter" switches which are off by default.

Note, these counters used to be ON by default; now they are off.
   Resolution: Fixed

Merged to branch-1 and 2.3+. [~huaxiang] FYI. Thanks for the fast turnaround 
[~DeanZ]
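Per the release note above, the meters are now off by default. A minimal hbase-site.xml fragment to turn them back on would look like this (the keys are from the release note; boolean values are the assumed format):

```xml
<!-- hbase-site.xml: re-enable the query meters (off by default after HBASE-25681) -->
<property>
  <name>hbase.regionserver.enable.server.query.meter</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.enable.table.query.meter</name>
  <value>true</value>
</property>
```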

> Add a switch for server/table queryMeter
> 
>
> Key: HBASE-25681
> URL: https://issues.apache.org/jira/browse/HBASE-25681
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.3
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25679) Size of log queue metric is incorrect in branch-1/branch-2

2021-03-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25679.
---
Fix Version/s: 2.4.3
   2.5.0
   1.7.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed on branch-1 and on branch-2.4+. Thanks for the fix [~shahrs87]

> Size of log queue metric is incorrect in branch-1/branch-2
> --
>
> Key: HBASE-25679
> URL: https://issues.apache.org/jira/browse/HBASE-25679
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.7.0, 2.5.0, 2.4.2
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3
>
>
> In HBASE-25539 I did some refactoring to add a new metric, "oldestWalAge", 
> and tried to consolidate updates to all the metrics related to the 
> ReplicationSource class (size of log queue and oldest WAL age) in one place. 
> That refactoring introduced a bug where the size-of-log-queue metric is 
> decremented twice whenever we remove a WAL from the ReplicationSource 
> queue.
>  
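The double-decrement described above is easy to reproduce in miniature. The sketch below is illustrative only, not HBase's actual ReplicationSource code; the class and field names are hypothetical. The point is that queue mutation and metric update should live in exactly one place so they cannot drift apart.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of keeping a size metric in lockstep with its queue:
// the metric is decremented in a single place, only when a removal succeeds.
public class LogQueueMetrics {
  private final Queue<String> walQueue = new ArrayDeque<>();
  private final AtomicLong sizeOfLogQueue = new AtomicLong();

  public void enqueue(String wal) {
    walQueue.add(wal);
    sizeOfLogQueue.incrementAndGet();
  }

  // Single point of removal: dequeue and metric update happen together.
  public String dequeue() {
    String wal = walQueue.poll();
    if (wal != null) {
      sizeOfLogQueue.decrementAndGet(); // decrement once, only on success
    }
    return wal;
  }

  public long size() {
    return sizeOfLogQueue.get();
  }
}
```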



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25518) Support separate child regions to different region servers

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25518.
---
Fix Version/s: 2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Release Note: 
Config key to enable/disable automatically separating child regions onto 
different region servers in the split-region procedure. One child is kept on 
the server hosting the parent region, and the other child is assigned to a 
random server.

hbase.master.auto.separate.child.regions.after.split.enabled

Default setting is false/off.
   Resolution: Fixed

Merged to branch-2.4+. Thanks for the feature [~Xiaolin Ha]. 
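Since the feature is off by default, opting in would be a one-property change in hbase-site.xml (the key is taken verbatim from the release note above):

```xml
<!-- hbase-site.xml: place split children on different servers (off by default) -->
<property>
  <name>hbase.master.auto.separate.child.regions.after.split.enabled</name>
  <value>true</value>
</property>
```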

> Support separate child regions to different region servers
> --
>
> Key: HBASE-25518
> URL: https://issues.apache.org/jira/browse/HBASE-25518
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> Hot/large regions can be split automatically by some split policies, but both 
> child regions will then be on the RS that owned the parent region. We can 
> support separating the child regions from the master side, perhaps by adding 
> a step at the end of SplitTableRegionProcedure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25643) The delayed FlushRegionEntry should be removed when we need a non-delayed one

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25643.
---
Fix Version/s: 2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-2+. Thanks for the nice fix [~filtertip]. It would not apply 
cleanly to branch-2.4, so if you'd like it there, please file a backport 
subtask and attach a new PR.
Thanks for the review [~anoop.hbase]

> The delayed FlushRegionEntry should be removed when we need a non-delayed one
> -
>
> Key: HBASE-25643
> URL: https://issues.apache.org/jira/browse/HBASE-25643
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> The regionserver periodically checks all the regions; if one has not been 
> flushed for a long time, it creates a delayed FlushRegionEntry, with a delay 
> in the range 0~300s.
> During the delay, if a lot of data is suddenly written to the region, we 
> cannot flush immediately because of the existing entry in regionsInQueue, 
> and a RegionTooBusyException will occur.
> It is better to improve the logic here so that the delayed entry is replaced 
> by the non-delayed one.
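The replacement rule described above can be sketched as follows. This is a hypothetical simplification, not HBase's MemStoreFlusher code: `FlushRequestTracker` and its fields are invented names; only `regionsInQueue` and the delayed/non-delayed distinction come from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a non-delayed flush request supersedes an already-queued delayed
// request for the same region, instead of being dropped as a duplicate.
public class FlushRequestTracker {
  static final class FlushEntry {
    final String region;
    final long delayMillis; // 0 means flush immediately
    FlushEntry(String region, long delayMillis) {
      this.region = region;
      this.delayMillis = delayMillis;
    }
  }

  private final Map<String, FlushEntry> regionsInQueue = new HashMap<>();

  // Returns the entry that is queued for the region after this request.
  public FlushEntry request(String region, long delayMillis) {
    FlushEntry existing = regionsInQueue.get(region);
    // Queue a new entry if none exists, or replace a pending delayed entry
    // when an immediate (non-delayed) flush is requested.
    if (existing == null || (existing.delayMillis > 0 && delayMillis == 0)) {
      FlushEntry e = new FlushEntry(region, delayMillis);
      regionsInQueue.put(region, e);
      return e;
    }
    return existing; // keep the existing equal-or-better entry
  }
}
```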



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25594) graceful_stop.sh fails to unload regions when ran at localhost

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25594.
---
Resolution: Fixed

I pushed the below to 2.4+
{code}
I pushed this to 2.3+

commit 728d4f5ab12fd2631b1ef0a7c61203e9acfb05f0 (HEAD -> 2.3, origin/branch-2.3)
Author: Javier Akira Luca de Tena 
Date:   Fri Mar 19 04:04:54 2021 +0900

HBOPS-25594 Make easier to use graceful_stop on localhost mode (#3054)

Co-authored-by: Javier 

diff --git a/bin/graceful_stop.sh b/bin/graceful_stop.sh
index 89e3dd939c..e565929606 100755
--- a/bin/graceful_stop.sh
+++ b/bin/graceful_stop.sh
@@ -32,7 +32,7 @@ moving regions"
   echo " maxthreads xx  Limit the number of threads used by the region mover. 
Default value is 1."
   echo " movetimeout xx Timeout for moving regions. If regions are not moved 
by the timeout value,\
 exit with error. Default value is INT_MAX."
-  echo " hostname   Hostname of server we are to stop"
+  echo " hostname   Hostname to stop; match what HBase uses; pass 
'localhost' if local to avoid ssh"
   echo " e|failfast Set -e so exit immediately if any command exits with 
non-zero status"
   echo " nob| nobalancer Do not manage balancer states. This is only used as 
optimization in \
 rolling_restart.sh to avoid multiple calls to hbase shell"
@@ -100,6 +100,10 @@ localhostname=`/bin/hostname`
 if [ "$localhostname" == "$hostname" ]; then
   local=true
 fi
+if [ "$localhostname" == "$hostname" ] || [ "$hostname" == "localhost" ]; then
+  local=true
+  hostname=$localhostname
+fi

 if [ "$nob" == "true"  ]; then
   log "[ $0 ] skipping disabling balancer -nob argument is used"
{code}
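The script-side fix above uses `/bin/hostname -f`; the underlying mismatch is that the cluster reports FQDNs while the operator may pass a short name. A minimal Java illustration of normalizing both sides before comparing (this is not RegionMover's actual code; `sameHost` and its short-name strategy are assumptions, and it presumes short hostnames are unique in the cluster):

```java
// Sketch of hostname matching that tolerates FQDN vs short-name input:
// compare case-insensitively on the first DNS label, so "rs1" matches
// "rs1.example.com". Hypothetical helper, not HBase's implementation.
public class HostnameMatch {
  static boolean sameHost(String a, String b) {
    String shortA = a.split("\\.", 2)[0];
    String shortB = b.split("\\.", 2)[0];
    return shortA.equalsIgnoreCase(shortB);
  }

  public static void main(String[] args) {
    System.out.println(sameHost("rs1", "rs1.example.com")); // true
    System.out.println(sameHost("rs1", "rs2.example.com")); // false
  }
}
```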

> graceful_stop.sh fails to unload regions when ran at localhost
> --
>
> Key: HBASE-25594
> URL: https://issues.apache.org/jira/browse/HBASE-25594
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.4.13
>Reporter: Javier Akira Luca de Tena
>Assignee: Javier Akira Luca de Tena
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We usually use graceful_stop.sh from the Master to restart RegionServers. 
> However, in some scenarios we may not have privileges to restart remote 
> RegionServers (it uses ssh).
>  But we can still use graceful_stop.sh on the same host we want to restart.
> In order to detect the execution at localhost, graceful_stop.sh uses 
> /bin/hostname.
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/bin/graceful_stop.sh#L106-L110]
> When RegionMover strips the host to not include it in the list of target 
> hosts, we filter it out by checking all RegionServer hosts in the cluster:
>  
> [https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L382-L384]
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L692]
> But the list of RegionServer hosts returned by Admin#getRegionServers contains 
> FQDNs, while the hostname provided by graceful_stop.sh is not an FQDN, making 
> the comparison fail.
> The same happens for branch-1 region_mover.rb, which is where I reproduced it in 
> my environment: 
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L305]
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L175]
>  
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L186-L192]
>  
> This can be fixed just by using "/bin/hostname -f" in the graceful_stop.sh 
> script.
> Will provide patch soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-25594) graceful_stop.sh fails to unload regions when ran at localhost

2021-03-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25594:
---

Reopening to apply addendum.

> graceful_stop.sh fails to unload regions when ran at localhost
> --
>
> Key: HBASE-25594
> URL: https://issues.apache.org/jira/browse/HBASE-25594
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.4.13
>Reporter: Javier Akira Luca de Tena
>Assignee: Javier Akira Luca de Tena
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We usually use graceful_stop.sh from the Master to restart RegionServers. 
> However, in some scenarios we may not have privileges to restart remote 
> RegionServers (it uses ssh).
>  But we can still use graceful_stop.sh on the same host we want to restart.
> In order to detect the execution at localhost, graceful_stop.sh uses 
> /bin/hostname.
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/bin/graceful_stop.sh#L106-L110]
> When RegionMover strips the host to not include it in the list of target 
> hosts, we filter it out by checking all RegionServer hosts in the cluster:
>  
> [https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L382-L384]
>  
> [https://github.com/apache/hbase/blob/cfbae4d3a37e7ac4d795461c3e19406a2786838d/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L692]
> But the list of RegionServer hosts returned by Admin#getRegionServers contains 
> FQDNs, while the hostname provided by graceful_stop.sh is not an FQDN, making 
> the comparison fail.
> The same happens for branch-1 region_mover.rb, which is where I reproduced it in 
> my environment: 
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L305]
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L175]
>  
> [https://github.com/apache/hbase/blob/f9a91488b2c39320bed502619bf7adb765c79de6/bin/region_mover.rb#L186-L192]
>  
> This can be fixed just by using "/bin/hostname -f" in the graceful_stop.sh 
> script.
> Will provide patch soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

