[jira] [Commented] (HBASE-27414) Search order for locations in HFileLink

2022-11-04 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629073#comment-17629073
 ] 

Michael Stack commented on HBASE-27414:
---

Looking at the PR, it is odd the way the dir order was flipped without comment. 
There may have been evidence or thinking behind the change, but it is lost 
now. Agree to flip it back and move on.

> Search order for locations in  HFileLink
> 
>
> Key: HBASE-27414
> URL: https://issues.apache.org/jira/browse/HBASE-27414
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Huaxiang Sun
>Priority: Minor
>
> Found that the search order for locations follows the order in which these 
> locations were added to the HFileLink object:
>  
> setLocations(originPath, tempPath, mobPath, archivePath);
> archivePath is the last one to be searched. In most cases the hfile exists in 
> archivePath, so we can move archivePath to the first position to avoid 
> unnecessary NN queries.
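
Purely as an illustration (not the HBASE-27414 patch itself), the sketch below 
shows why the ordering matters: link resolution probes each candidate location 
in turn, so every location that does not exist costs a NameNode existence check 
before the next one is tried. Placing the most likely location (archivePath) 
first avoids those extra checks in the common case.

{code:java}
// Illustrative sketch only, not the actual HFileLink/FileLink code: probe
// candidate locations in order and return the first that exists. Every miss
// costs one NameNode existence check, which is why ordering the candidates by
// likelihood (archivePath first) saves RPCs for the common case.
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class FirstExistingLocation {
  public static Path resolve(FileSystem fs, Path... candidates) throws IOException {
    for (Path candidate : candidates) {
      if (fs.exists(candidate)) { // one NameNode round-trip per probe
        return candidate;
      }
    }
    throw new FileNotFoundException("None of the candidate locations exist");
  }

  private FirstExistingLocation() {}
}
{code}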



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27437) TestHeapSize is flaky

2022-10-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17621405#comment-17621405
 ] 

Michael Stack commented on HBASE-27437:
---

Makes sense to me.

> TestHeapSize is flaky
> -
>
> Key: HBASE-27437
> URL: https://issues.apache.org/jira/browse/HBASE-27437
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Priority: Major
>
> I believe it is just in-memory computation, so it is weird that it can be 
> flaky.
> Need to dig more.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27396) Purge stale pom comment: ""

2022-09-27 Thread Michael Stack (Jira)
Michael Stack created HBASE-27396:
-

 Summary: Purge stale pom comment: ""
 Key: HBASE-27396
 URL: https://issues.apache.org/jira/browse/HBASE-27396
 Project: HBase
  Issue Type: Task
Reporter: Michael Stack


Any pom that has a hadoop-2.0 profile in it – all but the master branch – has 
this comment in the activation clause:

 


[jira] [Commented] (HBASE-27340) Artifacts with resolved profiles

2022-09-06 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600870#comment-17600870
 ] 

Michael Stack commented on HBASE-27340:
---

(Thanks for fixing 'Environment' vs 'Release Note' [~zhangduo] )

> Artifacts with resolved profiles
> 
>
> Key: HBASE-27340
> URL: https://issues.apache.org/jira/browse/HBASE-27340
> Project: HBase
>  Issue Type: Brainstorming
>  Components: build, pom
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 2.6.0, 2.5.1, 3.0.0-alpha-4
>
>
> Brainstorming/Discussion. The maven-flatten-plugin makes it so published poms 
> are 'flattened'. The poms contain the runtime-necessary dependencies only, 
> 'build' and 'test' dependencies and plugins are dropped, versions are 
> resolved out of properties, and so on. The published poms are the barebones 
> minimum needed to run.
> With a switch, the plugin can also make it so the produced poms have all 
> profiles 'resolved' – making it so the produced poms have all resolved 
> hadoop2 or hadoop3 dependencies baked-in – based off which profile we used 
> building.
> (I've been interested in this flattening technique since I ran into a 
> downstreamer using hbase from a gradle build. Gradle does not respect 
> profiles. You can't specify that the gradle build pull in hbase with hadoop3 
> dependencies using 'profiles'. I notice too our [~gjacoby] , [~apurtell] et 
> al. up on the dev list talking about making a hadoop3 set of artifacts...who 
> might be interested in this direction).
> The attached patch adds the flatten plugin so folks can take a look-see. It 
> uncovers some locations where our versioning on dependencies is not explicit. 
> The workaround practiced here was adding hadoop2/hadoop3 profiles into 
> sub-modules that were missing them or moving problematic dependencies that 
> were outside of profiles under profiles in sub-modules that had them already. 
> For the latter, if the dependency specified excludes, the excludes were moved 
> up to the parent pom profile (parent pom profiles have dependencyManagement 
> sections... sub-modules have explicit dependency mentions... checks with 
> dependency:tree seem to show excludes continue to be effective).
> This is the switch that flattens profiles:   
> true
> This is the sort of complaint we had when the flatten plugin was having 
> trouble figuring out dependency versions – particularly hadoop versions:
> {{[ERROR] Failed to execute goal 
> org.codehaus.mojo:flatten-maven-plugin:1.3.0:flatten (flatten) on project 
> hbase-hadoop2-compat: 3 problems were encountered while building the 
> effective model for org.apache.hbase:hbase-hadoop2-compat:2.5.1-SNAPSHOT}}
> {{[ERROR] [WARNING] 'build.plugins.plugin.version' for 
> org.codehaus.mojo:flatten-maven-plugin is missing. @}}
> {{[ERROR] [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-mapreduce-client-core:jar is missing. @}}
> {{[ERROR] [ERROR] 'dependencies.dependency.version' for 
> javax.activation:javax.activation-api:jar is missing. @}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27340) Artifacts with resolved profiles

2022-09-06 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-27340:
--
Environment: Published poms now contain runtime dependencies only; build 
and test time dependencies are stripped. Profiles are also now resolved and 
in-lined at publish time. This removes the need/ability of downstreamers 
shaping hbase dependencies via enable/disable of hbase profile settings 
(Implication is that now the hbase project publishes artifacts for hadoop2 and 
for hadoop3, and so on).

> Artifacts with resolved profiles
> 
>
> Key: HBASE-27340
> URL: https://issues.apache.org/jira/browse/HBASE-27340
> Project: HBase
>  Issue Type: Brainstorming
>  Components: build, pom
> Environment: Published poms now contain runtime dependencies only; 
> build and test time dependencies are stripped. Profiles are also now resolved 
> and in-lined at publish time. This removes the need/ability of downstreamers 
> shaping hbase dependencies via enable/disable of hbase profile settings 
> (Implication is that now the hbase project publishes artifacts for hadoop2 
> and for hadoop3, and so on).
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 2.6.0, 2.5.1, 3.0.0-alpha-4
>
>
> Brainstorming/Discussion. The maven-flatten-plugin makes it so published poms 
> are 'flattened'. The poms contain the runtime-necessary dependencies only, 
> 'build' and 'test' dependencies and plugins are dropped, versions are 
> resolved out of properties, and so on. The published poms are the barebones 
> minimum needed to run.
> With a switch, the plugin can also make it so the produced poms have all 
> profiles 'resolved' – making it so the produced poms have all resolved 
> hadoop2 or hadoop3 dependencies baked-in – based off which profile we used 
> building.
> (I've been interested in this flattening technique since I ran into a 
> downstreamer using hbase from a gradle build. Gradle does not respect 
> profiles. You can't specify that the gradle build pull in hbase with hadoop3 
> dependencies using 'profiles'. I notice too our [~gjacoby] , [~apurtell] et 
> al. up on the dev list talking about making a hadoop3 set of artifacts...who 
> might be interested in this direction).
> The attached patch adds the flatten plugin so folks can take a look-see. It 
> uncovers some locations where our versioning on dependencies is not explicit. 
> The workaround practiced here was adding hadoop2/hadoop3 profiles into 
> sub-modules that were missing them or moving problematic dependencies that 
> were outside of profiles under profiles in sub-modules that had them already. 
> For the latter, if the dependency specified excludes, the excludes were moved 
> up to the parent pom profile (parent pom profiles have dependencyManagement 
> sections... sub-modules have explicit dependency mentions... checks with 
> dependency:tree seem to show excludes continue to be effective).
> This is the switch that flattens profiles:   
> true
> This is the sort of complaint we had when the flatten plugin was having 
> trouble figuring out dependency versions – particularly hadoop versions:
> {{[ERROR] Failed to execute goal 
> org.codehaus.mojo:flatten-maven-plugin:1.3.0:flatten (flatten) on project 
> hbase-hadoop2-compat: 3 problems were encountered while building the 
> effective model for org.apache.hbase:hbase-hadoop2-compat:2.5.1-SNAPSHOT}}
> {{[ERROR] [WARNING] 'build.plugins.plugin.version' for 
> org.codehaus.mojo:flatten-maven-plugin is missing. @}}
> {{[ERROR] [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-mapreduce-client-core:jar is missing. @}}
> {{[ERROR] [ERROR] 'dependencies.dependency.version' for 
> javax.activation:javax.activation-api:jar is missing. @}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27338) brotli compression lib tests fail on arm64

2022-08-29 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-27338:
--
Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

Pushed to master, branch-2, and branch-2.5 (<= Hope this is ok [~ndimiduk]). 
Thanks for the reviews [~bbeaudreault] and [~apurtell]. Resolving.

> brotli compression lib tests fail on arm64
> --
>
> Key: HBASE-27338
> URL: https://issues.apache.org/jira/browse/HBASE-27338
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.5.0
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 2.6.0, 2.5.1, 3.0.0-alpha-4
>
>
> The brotli tests fail on M1 macs
>  
> {{[INFO] Running org.apache.hadoop.hbase.io.compress.brotli.TestBrotliCodec}}
> {{[INFO] Running 
> org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}
> {{[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 0.33 s <<< FAILURE! - in 
> org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}
> {{[ERROR] 
> org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli.test  
> Time elapsed: 0.225 s  <<< ERROR!}}
> {{java.lang.UnsatisfiedLinkError: Failed to load Brotli native library}}
> {{...}}
>  
> The lib is installed on this machine. A new release of the 
> *[Brotli4j|https://github.com/hyperxpro/Brotli4j]* lib, 1.8.0, done a few 
> days ago, fixes the issue (see 
> [https://github.com/hyperxpro/Brotli4j/pull/34]). I tried it:
> {{[INFO] ---}}
> {{[INFO]  T E S T S}}
> {{[INFO] ---}}
> {{[INFO] Running org.apache.hadoop.hbase.io.compress.brotli.TestBrotliCodec}}
> {{[INFO] Running 
> org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}
> {{[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 1.036 s - in 
> org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}
> {{[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.42 
> s - in org.apache.hadoop.hbase.io.compress.brotli.TestBrotliCodec}}
> {{[INFO]}}
> {{[INFO] Results:}}
> {{[INFO]}}
> {{[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0}}
> {{[INFO]}}
> {{[INFO]}}
> {{[INFO] --- maven-surefire-plugin:3.0.0-M6:test (secondPartTestsExecution) @ 
> hbase-compression-brotli ---}}
> {{[INFO] Tests are skipped.}}
> {{[INFO]}}
> {{[INFO] --- maven-jar-plugin:3.2.0:test-jar (default) @ 
> hbase-compression-brotli ---}}
> {{[INFO] Building jar: 
> /Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0-tests.jar}}
> {{[INFO]}}
> {{[INFO] --- maven-jar-plugin:3.2.0:jar (default-jar) @ 
> hbase-compression-brotli ---}}
> {{[INFO] Building jar: 
> /Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0.jar}}
> {{[INFO]}}
> {{[INFO] --- maven-site-plugin:3.12.0:attach-descriptor (attach-descriptor) @ 
> hbase-compression-brotli ---}}
> {{[INFO] Skipping because packaging 'jar' is not pom.}}
> {{[INFO]}}
> {{[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ 
> hbase-compression-brotli ---}}
> {{[INFO] Installing 
> /Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0.jar
>  to 
> /Users/stack/.m2/repository/org/apache/hbase/hbase-compression-brotli/2.5.0/hbase-compression-brotli-2.5.0.jar}}
> {{[INFO] Installing 
> /Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/pom.xml
>  to 
> /Users/stack/.m2/repository/org/apache/hbase/hbase-compression-brotli/2.5.0/hbase-compression-brotli-2.5.0.pom}}
> {{[INFO] Installing 
> /Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0-tests.jar
>  to 
> /Users/stack/.m2/repository/org/apache/hbase/hbase-compression-brotli/2.5.0/hbase-compression-brotli-2.5.0-tests.jar}}
> {{[INFO] 
> }}
> {{[INFO] BUILD SUCCESS}}
> {{[INFO] 
> }}
> {{[INFO] Total time:  16.805 s}}
> {{[INFO] Finished at: 2022-08-26T11:30:13-07:00}}
> {{[INFO] 
> }}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27340) Artifacts with resolved profiles

2022-08-27 Thread Michael Stack (Jira)
Michael Stack created HBASE-27340:
-

 Summary: Artifacts with resolved profiles
 Key: HBASE-27340
 URL: https://issues.apache.org/jira/browse/HBASE-27340
 Project: HBase
  Issue Type: Brainstorming
Reporter: Michael Stack


Brainstorming/Discussion. The maven-flatten-plugin makes it so published poms 
are 'flattened'. The poms contain the runtime-necessary dependencies only, 
'build' and 'test' dependencies and plugins are dropped, versions are resolved 
out of properties, and so on. The published poms are the barebones minimum 
needed to run.

With a switch, the plugin can also make it so the produced poms have all 
profiles 'resolved' – making it so the produced poms have all resolved hadoop2 
or hadoop3 dependencies baked-in – based off which profile we used building.

(I've been interested in this flattening technique since I ran into a 
downstreamer using hbase from a gradle build. Gradle does not respect profiles. 
You can't specify that the gradle build pull in hbase with hadoop3 dependencies 
using 'profiles'. I notice too our [~gjacoby] , [~apurtell] et al. up on the 
dev list talking about making a hadoop3 set of artifacts...who might be 
interested in this direction).

The attached patch adds the flatten plugin so folks can take a look-see. It 
uncovers some locations where our versioning on dependencies is not explicit. 
The workaround practiced here was adding hadoop2/hadoop3 profiles into 
sub-modules that were missing them or moving problematic dependencies that were 
outside of profiles under profiles in sub-modules that had them already. For 
the latter, if the dependency specified excludes, the excludes were moved up to 
the parent pom profile (parent pom profiles have dependencyManagement 
sections... sub-modules have explicit dependency mentions... checks with 
dependency:tree seem to show excludes continue to be effective).

This is the switch that flattens profiles:   
true

This is the sort of complaint we had when the flatten plugin was having trouble 
figuring out dependency versions – particularly hadoop versions:

{{[ERROR] Failed to execute goal 
org.codehaus.mojo:flatten-maven-plugin:1.3.0:flatten (flatten) on project 
hbase-hadoop2-compat: 3 problems were encountered while building the effective 
model for org.apache.hbase:hbase-hadoop2-compat:2.5.1-SNAPSHOT}}

{{[ERROR] [WARNING] 'build.plugins.plugin.version' for 
org.codehaus.mojo:flatten-maven-plugin is missing. @}}

{{[ERROR] [ERROR] 'dependencies.dependency.version' for 
org.apache.hadoop:hadoop-mapreduce-client-core:jar is missing. @}}

{{[ERROR] [ERROR] 'dependencies.dependency.version' for 
javax.activation:javax.activation-api:jar is missing. @}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27338) brotli compression lib tests fail on arm64

2022-08-26 Thread Michael Stack (Jira)
Michael Stack created HBASE-27338:
-

 Summary: brotli compression lib tests fail on arm64
 Key: HBASE-27338
 URL: https://issues.apache.org/jira/browse/HBASE-27338
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.5.0
Reporter: Michael Stack


The brotli tests fail on M1 macs

 

{{[INFO] Running org.apache.hadoop.hbase.io.compress.brotli.TestBrotliCodec}}
{{[INFO] Running 
org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}
{{[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.33 
s <<< FAILURE! - in 
org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}
{{[ERROR] 
org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli.test  
Time elapsed: 0.225 s  <<< ERROR!}}
{{java.lang.UnsatisfiedLinkError: Failed to load Brotli native library}}

{{...}}

 

The lib is installed on this machine. A new release of the 
*[Brotli4j|https://github.com/hyperxpro/Brotli4j]* lib, 1.8.0, done a few days 
ago, fixes the issue (see [https://github.com/hyperxpro/Brotli4j/pull/34]). I 
tried it:

{{[INFO] ---}}
{{[INFO]  T E S T S}}
{{[INFO] ---}}
{{[INFO] Running org.apache.hadoop.hbase.io.compress.brotli.TestBrotliCodec}}
{{[INFO] Running 
org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}
{{[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.036 
s - in org.apache.hadoop.hbase.io.compress.brotli.TestHFileCompressionBrotli}}

{{[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.42 s 
- in org.apache.hadoop.hbase.io.compress.brotli.TestBrotliCodec}}
{{[INFO]}}
{{[INFO] Results:}}
{{[INFO]}}
{{[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0}}
{{[INFO]}}
{{[INFO]}}
{{[INFO] --- maven-surefire-plugin:3.0.0-M6:test (secondPartTestsExecution) @ 
hbase-compression-brotli ---}}
{{[INFO] Tests are skipped.}}
{{[INFO]}}
{{[INFO] --- maven-jar-plugin:3.2.0:test-jar (default) @ 
hbase-compression-brotli ---}}
{{[INFO] Building jar: 
/Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0-tests.jar}}
{{[INFO]}}
{{[INFO] --- maven-jar-plugin:3.2.0:jar (default-jar) @ 
hbase-compression-brotli ---}}
{{[INFO] Building jar: 
/Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0.jar}}
{{[INFO]}}
{{[INFO] --- maven-site-plugin:3.12.0:attach-descriptor (attach-descriptor) @ 
hbase-compression-brotli ---}}
{{[INFO] Skipping because packaging 'jar' is not pom.}}
{{[INFO]}}
{{[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ 
hbase-compression-brotli ---}}
{{[INFO] Installing 
/Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0.jar
 to 
/Users/stack/.m2/repository/org/apache/hbase/hbase-compression-brotli/2.5.0/hbase-compression-brotli-2.5.0.jar}}
{{[INFO] Installing 
/Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/pom.xml
 to 
/Users/stack/.m2/repository/org/apache/hbase/hbase-compression-brotli/2.5.0/hbase-compression-brotli-2.5.0.pom}}
{{[INFO] Installing 
/Users/stack/checkouts/hbase/2.5.0RC1/hbase-2.5.0/hbase-compression/hbase-compression-brotli/target/hbase-compression-brotli-2.5.0-tests.jar
 to 
/Users/stack/.m2/repository/org/apache/hbase/hbase-compression-brotli/2.5.0/hbase-compression-brotli-2.5.0-tests.jar}}
{{[INFO] 
}}
{{[INFO] BUILD SUCCESS}}
{{[INFO] 
}}
{{[INFO] Total time:  16.805 s}}
{{[INFO] Finished at: 2022-08-26T11:30:13-07:00}}
{{[INFO] 
}}
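
As an aside, below is a minimal standalone probe, not HBase test code, assuming 
only that Brotli4j is on the classpath, for checking whether the Brotli native 
library loads on a given platform (with Brotli4j older than 1.8.0 on an arm64 
Mac it reports unavailable, matching the failure above).

{code:java}
// Hypothetical standalone probe, not part of HBase: ask Brotli4j whether its
// native library could be loaded on the current platform.
import com.aayushatharva.brotli4j.Brotli4jLoader;

public final class BrotliNativeProbe {
  public static void main(String[] args) {
    if (Brotli4jLoader.isAvailable()) {
      System.out.println("Brotli native library loaded OK");
    } else {
      System.out.println("Brotli native library is NOT available on this platform");
    }
  }
}
{code}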

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-26982) Add index and bloom filter statistics of LruBlockCache on rs web UI

2022-08-15 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579846#comment-17579846
 ] 

Michael Stack commented on HBASE-26982:
---

Below is an image that shows what this PR adds:

 

!165520264-e172911b-58c5-4e74-8d61-00bd66524c94.png!

> Add index and bloom filter statistics of LruBlockCache on rs web UI
> ---
>
> Key: HBASE-26982
> URL: https://issues.apache.org/jira/browse/HBASE-26982
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, UI
>Reporter: Xuesen Liang
>Assignee: Xuesen Liang
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-4
>
> Attachments: 165520264-e172911b-58c5-4e74-8d61-00bd66524c94.png
>
>
> When _CombinedBlockCache_ is configured, _LruBlockCache_ is used for 
> index/bloom/meta block cache, and _BucketCache_ is used for data block cache. 
> Index and bloom filter statistics on rs web UI will be helpful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-26982) Add index and bloom filter statistics of LruBlockCache on rs web UI

2022-08-15 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26982:
--
Attachment: 165520264-e172911b-58c5-4e74-8d61-00bd66524c94.png

> Add index and bloom filter statistics of LruBlockCache on rs web UI
> ---
>
> Key: HBASE-26982
> URL: https://issues.apache.org/jira/browse/HBASE-26982
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, UI
>Reporter: Xuesen Liang
>Assignee: Xuesen Liang
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-4
>
> Attachments: 165520264-e172911b-58c5-4e74-8d61-00bd66524c94.png
>
>
> When _CombinedBlockCache_ is configured, _LruBlockCache_ is used for 
> index/bloom/meta block cache, and _BucketCache_ is used for data block cache. 
> Index and bloom filter statistics on rs web UI will be helpful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-26982) Add index and bloom filter statistics of LruBlockCache on rs web UI

2022-08-15 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26982:
--
Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

Merged to master and then cherry-picked to branch-2 and branch-2.5. Thanks for 
the nice patch [~liangxs] and the review [~andrew.purt...@gmail.com].

> Add index and bloom filter statistics of LruBlockCache on rs web UI
> ---
>
> Key: HBASE-26982
> URL: https://issues.apache.org/jira/browse/HBASE-26982
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, UI
>Reporter: Xuesen Liang
>Assignee: Xuesen Liang
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-4
>
>
> When _CombinedBlockCache_ is configured, _LruBlockCache_ is used for 
> index/bloom/meta block cache, and _BucketCache_ is used for data block cache. 
> Index and bloom filter statistics on rs web UI will be helpful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27112) Investigate Netty resource usage limits

2022-07-12 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566179#comment-17566179
 ] 

Michael Stack commented on HBASE-27112:
---

[~vjasani] This compares runs without and then with the configuration below:

 

{{  <property>}}
{{    <name>hbase.netty.eventloop.rpcserver.thread.count</name>}}
{{    <value>1</value>}}
{{    <description>See the end of https://issues.apache.org/jira/browse/HBASE-27112. Default}}
{{  is 2xCPU_COUNT which seems way too much. 1 thread seems fine for the siri workload at least.</description>}}
{{  </property>}}
{{  <property>}}
{{    <name>hbase.netty.worker.count</name>}}
{{    <value>1</value>}}
{{    <description>See the end of https://issues.apache.org/jira/browse/HBASE-27112. Default}}
{{  is 2xCPU_COUNT which seems way too much. 1 thread seems fine for the siri workload at least.</description>}}
{{  </property>}}

The job without the above configs takes 100-102 minutes to complete. With the 
above config, the job takes 84-88 minutes to complete. Below, the first hump is 
the run without the above configs; the second hump is with them. Here the 
throughput seems a little elevated (interesting how the handlers are less 
consumed when there is only the single event loop). I'm going to try these 
configs over here. Here are metrics showing before and after.

!Image 7-12-22 at 10.45 PM.jpg!

 

 

> Investigate Netty resource usage limits
> ---
>
> Key: HBASE-27112
> URL: https://issues.apache.org/jira/browse/HBASE-27112
> Project: HBase
>  Issue Type: Sub-task
>  Components: IPC/RPC
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: Image 7-11-22 at 10.12 PM.jpg, Image 7-12-22 at 10.45 
> PM.jpg
>
>
> We leave Netty level resource limits unbounded. The number of threads to use 
> for the event loop is default 0 (unbounded). The default for 
> io.netty.eventLoop.maxPendingTasks is INT_MAX. 
> We don't do that for our own RPC handlers. We have a notion of maximum 
> handler pool size, with a default of 30, typically raised in production by 
> the user. We constrain the depth of the request queue in multiple ways... 
> limits on the number of queued calls, limits on the total size of call data that 
> can be queued (to avoid memory usage overrun), CoDel conditioning of the call 
> queues if it is enabled, and so on.
> Under load can we pile up an excess of pending request state, such as direct 
> buffers containing request bytes, at the netty layer because of downstream 
> resource limits? Those limits will act as a bottleneck, as intended, and 
> before would have also applied backpressure through RPC too, because 
> SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", 
> default 10), but Netty may be able to queue up a lot more, in comparison, 
> because Netty has been optimized to prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
> (unbounded). I don't know what it can actually get up to in production, 
> because we lack the metric, but there are diminishing returns when threads > 
> cores so a reasonable default here could be 
> Runtime.getRuntime().availableProcessors() instead of unbounded?
> maxPendingTasks probably should not be INT_MAX, but that may matter less.
> The tasks here are:
> - Instrument netty level resources to understand better actual resource 
> allocations under load. Investigate what we need to plug in where to gain 
> visibility. 
> - Where instrumentation designed for this issue can be implemented as low 
> overhead metrics, consider formally adding them as a metric. 
> - Based on the findings from this instrumentation, consider and implement 
> next steps. The goal would be to limit concurrency at the Netty layer in such 
> a way that performance is still good, and under load we don't balloon 
> resource usage at the Netty layer.
> If the instrumentation and experimental results indicate no changes are 
> necessary, we can close this as Not A Problem or WontFix. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27112) Investigate Netty resource usage limits

2022-07-12 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-27112:
--
Attachment: Image 7-12-22 at 10.45 PM.jpg

> Investigate Netty resource usage limits
> ---
>
> Key: HBASE-27112
> URL: https://issues.apache.org/jira/browse/HBASE-27112
> Project: HBase
>  Issue Type: Sub-task
>  Components: IPC/RPC
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: Image 7-11-22 at 10.12 PM.jpg, Image 7-12-22 at 10.45 
> PM.jpg
>
>
> We leave Netty level resource limits unbounded. The number of threads to use 
> for the event loop is default 0 (unbounded). The default for 
> io.netty.eventLoop.maxPendingTasks is INT_MAX. 
> We don't do that for our own RPC handlers. We have a notion of maximum 
> handler pool size, with a default of 30, typically raised in production by 
> the user. We constrain the depth of the request queue in multiple ways... 
> limits on the number of queued calls, limits on the total size of call data that 
> can be queued (to avoid memory usage overrun), CoDel conditioning of the call 
> queues if it is enabled, and so on.
> Under load can we pile up an excess of pending request state, such as direct 
> buffers containing request bytes, at the netty layer because of downstream 
> resource limits? Those limits will act as a bottleneck, as intended, and 
> before would have also applied backpressure through RPC too, because 
> SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", 
> default 10), but Netty may be able to queue up a lot more, in comparison, 
> because Netty has been optimized to prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
> (unbounded). I don't know what it can actually get up to in production, 
> because we lack the metric, but there are diminishing returns when threads > 
> cores so a reasonable default here could be 
> Runtime.getRuntime().availableProcessors() instead of unbounded?
> maxPendingTasks probably should not be INT_MAX, but that may matter less.
> The tasks here are:
> - Instrument netty level resources to understand better actual resource 
> allocations under load. Investigate what we need to plug in where to gain 
> visibility. 
> - Where instrumentation designed for this issue can be implemented as low 
> overhead metrics, consider formally adding them as a metric. 
> - Based on the findings from this instrumentation, consider and implement 
> next steps. The goal would be to limit concurrency at the Netty layer in such 
> a way that performance is still good, and under load we don't balloon 
> resource usage at the Netty layer.
> If the instrumentation and experimental results indicate no changes are 
> necessary, we can close this as Not A Problem or WontFix. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27112) Investigate Netty resource usage limits

2022-07-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565279#comment-17565279
 ] 

Michael Stack commented on HBASE-27112:
---

[~apurtell]  Pardon me. I took another look in here. Mr. Maurer suggested we 
should be able to do w/ just one EventLoop. I tried it for a well-known loading 
over here (80% random gets w/ 20% writes) and indeed, it seems to perform 
pretty much the same as default (2*CPU_COUNT which in my case turns out to be 
48 EventLoops). Below are some graphs from a small cluster. There are two runs 
w/ one EventLoop thread only and then two runs with default (48 threads). The 
two job runs took about the same time to complete – little difference. I'm 
going to deploy hbase.netty.eventloop.rpcserver.thread.count=1 in favor of the 
default.

 

!Image 7-11-22 at 10.12 PM.jpg!

> Investigate Netty resource usage limits
> ---
>
> Key: HBASE-27112
> URL: https://issues.apache.org/jira/browse/HBASE-27112
> Project: HBase
>  Issue Type: Sub-task
>  Components: IPC/RPC
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: Image 7-11-22 at 10.12 PM.jpg
>
>
> We leave Netty level resource limits unbounded. The number of threads to use 
> for the event loop is default 0 (unbounded). The default for 
> io.netty.eventLoop.maxPendingTasks is INT_MAX. 
> We don't do that for our own RPC handlers. We have a notion of maximum 
> handler pool size, with a default of 30, typically raised in production by 
> the user. We constrain the depth of the request queue in multiple ways... 
> limits on the number of queued calls, limits on the total size of call data that 
> can be queued (to avoid memory usage overrun), CoDel conditioning of the call 
> queues if it is enabled, and so on.
> Under load can we pile up an excess of pending request state, such as direct 
> buffers containing request bytes, at the netty layer because of downstream 
> resource limits? Those limits will act as a bottleneck, as intended, and 
> before would have also applied backpressure through RPC too, because 
> SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", 
> default 10), but Netty may be able to queue up a lot more, in comparison, 
> because Netty has been optimized to prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
> (unbounded). I don't know what it can actually get up to in production, 
> because we lack the metric, but there are diminishing returns when threads > 
> cores so a reasonable default here could be 
> Runtime.getRuntime().availableProcessors() instead of unbounded?
> maxPendingTasks probably should not be INT_MAX, but that may matter less.
> The tasks here are:
> - Instrument netty level resources to understand better actual resource 
> allocations under load. Investigate what we need to plug in where to gain 
> visibility. 
> - Where instrumentation designed for this issue can be implemented as low 
> overhead metrics, consider formally adding them as a metric. 
> - Based on the findings from this instrumentation, consider and implement 
> next steps. The goal would be to limit concurrency at the Netty layer in such 
> a way that performance is still good, and under load we don't balloon 
> resource usage at the Netty layer.
> If the instrumentation and experimental results indicate no changes are 
> necessary, we can close this as Not A Problem or WontFix. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27112) Investigate Netty resource usage limits

2022-07-11 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-27112:
--
Attachment: Image 7-11-22 at 10.12 PM.jpg

> Investigate Netty resource usage limits
> ---
>
> Key: HBASE-27112
> URL: https://issues.apache.org/jira/browse/HBASE-27112
> Project: HBase
>  Issue Type: Sub-task
>  Components: IPC/RPC
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: Image 7-11-22 at 10.12 PM.jpg
>
>
> We leave Netty level resource limits unbounded. The number of threads to use 
> for the event loop is default 0 (unbounded). The default for 
> io.netty.eventLoop.maxPendingTasks is INT_MAX. 
> We don't do that for our own RPC handlers. We have a notion of maximum 
> handler pool size, with a default of 30, typically raised in production by 
> the user. We constrain the depth of the request queue in multiple ways... 
> limits on the number of queued calls, limits on the total size of call data that 
> can be queued (to avoid memory usage overrun), CoDel conditioning of the call 
> queues if it is enabled, and so on.
> Under load can we pile up an excess of pending request state, such as direct 
> buffers containing request bytes, at the netty layer because of downstream 
> resource limits? Those limits will act as a bottleneck, as intended, and 
> before would have also applied backpressure through RPC too, because 
> SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", 
> default 10), but Netty may be able to queue up a lot more, in comparison, 
> because Netty has been optimized to prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
> (unbounded). I don't know what it can actually get up to in production, 
> because we lack the metric, but there are diminishing returns when threads > 
> cores so a reasonable default here could be 
> Runtime.getRuntime().availableProcessors() instead of unbounded?
> maxPendingTasks probably should not be INT_MAX, but that may matter less.
> The tasks here are:
> - Instrument netty level resources to understand better actual resource 
> allocations under load. Investigate what we need to plug in where to gain 
> visibility. 
> - Where instrumentation designed for this issue can be implemented as low 
> overhead metrics, consider formally adding them as a metric. 
> - Based on the findings from this instrumentation, consider and implement 
> next steps. The goal would be to limit concurrency at the Netty layer in such 
> a way that performance is still good, and under load we don't balloon 
> resource usage at the Netty layer.
> If the instrumentation and experimental results indicate no changes are 
> necessary, we can close this as Not A Problem or WontFix. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27112) Investigate Netty resource usage limits

2022-06-29 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17560816#comment-17560816
 ] 

Michael Stack commented on HBASE-27112:
---

{quote}Instrument netty level resources to understand better actual resource 
allocations under load. Investigate what we need to plug in where to gain 
visibility.
{quote}
There doesn't seem to be an amenable, native metrics export from netty, and the 
core classes/interfaces expose little beyond functionality, so it is tough 
getting counts w/o gymnastics (it seems this is an old ask of netty: 
[https://github.com/netty/netty/issues/6523] or 
https://groups.google.com/g/netty/c/-ZNx-L75csc/m/z8u5rp6lCUQJ).

Thread-dumping while under load w/ the default config, I see 2*CPU_COUNT netty 
RS-EventLoopGroup-N threads, all RUNNABLE, seemingly doing nothing, stuck on 
epoll.Native.epollWait.

I am running w/ 2*CPU_COUNT handlers.

Netty thread count does seem excessive.

I tried running w/ 8 threads in the RS-EventLoopGroup pool on a 24 "CPU" node 
w/ load on, and throughput seemed less, but pretty close; the 75th percentile 
seemed the same before as after, but the 99.9th percentile was elevated some 
(~70ms vs ~90ms). With fewer threads I started to get a few 
CallQueueTooBigExceptions on a few servers, which I didn't notice happening 
with 2*CPU_COUNT.

I tried with CPU_COUNT netty RS-EventLoopGroup threads. All seems to be about 
the same as 2*CPU_COUNT – perhaps slightly less throughput, though the 99.9th 
percentile seemed less (~70ms vs ~50ms). There were a few transient 
CallQueueTooBigExceptions. I think I'm going to leave the default 2*CPU_COUNT 
in place for now on this cluster, though it seems profligate (80% read/20% 
write), until someone does a deeper dig than the cursory one done here. Thanks.

Hope this helps.

> Investigate Netty resource usage limits
> ---
>
> Key: HBASE-27112
> URL: https://issues.apache.org/jira/browse/HBASE-27112
> Project: HBase
>  Issue Type: Sub-task
>  Components: IPC/RPC
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-4
>
>
> We leave Netty level resource limits unbounded. The number of threads to use 
> for the event loop is default 0 (unbounded). The default for 
> io.netty.eventLoop.maxPendingTasks is INT_MAX. 
> We don't do that for our own RPC handlers. We have a notion of maximum 
> handler pool size, with a default of 30, typically raised in production by 
> the user. We constrain the depth of the request queue in multiple ways... 
> limits on the number of queued calls, limits on the total size of call data that 
> can be queued (to avoid memory usage overrun), CoDel conditioning of the call 
> queues if it is enabled, and so on.
> Under load can we pile up an excess of pending request state, such as direct 
> buffers containing request bytes, at the netty layer because of downstream 
> resource limits? Those limits will act as a bottleneck, as intended, and 
> before would have also applied backpressure through RPC too, because 
> SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", 
> default 10), but Netty may be able to queue up a lot more, in comparison, 
> because Netty has been optimized to prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
> (unbounded). I don't know what it can actually get up to in production, 
> because we lack the metric, but there are diminishing returns when threads > 
> cores so a reasonable default here could be 
> Runtime.getRuntime().availableProcessors() instead of unbounded?
> maxPendingTasks probably should not be INT_MAX, but that may matter less.
> The tasks here are:
> - Instrument netty level resources to understand better actual resource 
> allocations under load. Investigate what we need to plug in where to gain 
> visibility. 
> - Where instrumentation designed for this issue can be implemented as low 
> overhead metrics, consider formally adding them as a metric. 
> - Based on the findings from this instrumentation, consider and implement 
> next steps. The goal would be to limit concurrency at the Netty layer in such 
> a way that performance is still good, and under load we don't balloon 
> resource usage at the Netty layer.
> If the instrumentation and experimental results indicate no changes are 
> necessary, we can close this as Not A Problem or WontFix. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-21065) Try ROW_INDEX_V1 encoding on meta table (fix bloomfilters on meta while we are at it)

2022-03-19 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509330#comment-17509330
 ] 

Michael Stack commented on HBASE-21065:
---

[~bbeaudreault] making the hbase:meta schema enable ROW_INDEX_V1 by default? 
Probably not a technical issue. IIRC, the thinking was that changing the default 
hbase:meta schema should wait on a major version release (I think the patch and 
the subject on this Jira are out of alignment, so there might be some confusion 
here as to what this Jira did).

If you are just interested in addressing meta hotspot issues, edit your 
hbase:meta and enable ROW_INDEX_V1 and BLOOMFILTER... You have been able to 
since 2.3.0. I checked that at least one cluster where I work has this in place 
– 2.4.x, with BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' on 
all hbase:meta column families. Enabling meta replicas also helped.
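
For anyone wanting to apply the above on their own cluster, here is a hedged 
sketch of one way to do it through the Java Admin API (the equivalent can be 
done with an 'alter' of 'hbase:meta' from the shell; depending on your version 
you may also need to explicitly permit altering hbase:meta in the cluster 
configuration):

{code:java}
// Sketch only: enable ROW_INDEX_V1 encoding and a ROW bloom filter on the
// hbase:meta 'info' family via the Admin API (possible since 2.3.0, per above).
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.regionserver.BloomType;

public final class MetaEncodingExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Start from the current 'info' family descriptor and only change the
      // data block encoding and bloom filter type.
      ColumnFamilyDescriptor current =
          admin.getDescriptor(TableName.META_TABLE_NAME).getColumnFamily(HConstants.CATALOG_FAMILY);
      ColumnFamilyDescriptor updated = ColumnFamilyDescriptorBuilder.newBuilder(current)
          .setDataBlockEncoding(DataBlockEncoding.ROW_INDEX_V1)
          .setBloomFilterType(BloomType.ROW)
          .build();
      admin.modifyColumnFamily(TableName.META_TABLE_NAME, updated);
    }
  }
}
{code}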

 

> Try ROW_INDEX_V1 encoding on meta table (fix bloomfilters on meta while we 
> are at it)
> -
>
> Key: HBASE-21065
> URL: https://issues.apache.org/jira/browse/HBASE-21065
> Project: HBase
>  Issue Type: Improvement
>  Components: meta, Performance
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Some users end up hitting meta hard. The bulk of it is probably because our 
> client goes to meta too often, and the real 'fix' for a saturated meta is splitting it, 
> but the encoding that came in with HBASE-16213, ROW_INDEX_V1, could help in 
> the near term. It adds an index on hfile blocks and helped improve random 
> reads against user-space tables (less compares as we used index to go direct 
> to requested Cells rather than look at each Cell in turn until we found what 
> we wanted -- see RN on HBASE-16213 for citation).
> I also noticed code-reading that we don't enable blooms on hbase:meta tables; 
> that could save some CPU and speed things up a bit too:
> {code}
> // Disable blooms for meta.  Needs work.  Seems to mess w/ 
> getClosestOrBefore.
> .setBloomFilterType(BloomType.NONE)
> {code}
> This issue is about doing a bit of perf compare of encoding *on* vs current 
> default (and will check diff in size of indexed blocks).
> Meta access is mostly random-read I believe (A review of a user's access 
> showed this so at least for their workload). The nice addition, HBASE-19722 
> Meta query statistics metrics source, would help verify if it saw some usage 
> on a prod cluster.
> If all is good, I'd like to make a small patch, one that could be easily 
> backported, with minimal changes in it.
> As is, it's all a little awkward as the meta table schema is hard-coded and 
> meta is immutable -- stuff we'll have to fix if we want to split meta -- so 
> in the meantime it requires a code change to enable (and a backport of 
> HBASE-16213 -- this patch is in 1.4.0 only currently, perhaps that is 
> enough). Code change to enable is small:
> {code}
> diff --git 
> a/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
>  
> b/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
> index 28c7ec3c2f..8f08f94dc1 100644
> --- 
> a/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
> +++ 
> b/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
> @@ -160,6 +160,7 @@ public class FSTableDescriptors implements 
> TableDescriptors {
>  .setScope(HConstants.REPLICATION_SCOPE_LOCAL)
>  // Disable blooms for meta.  Needs work.  Seems to mess w/ 
> getClosestOrBefore.
>  .setBloomFilterType(BloomType.NONE)
> +
> .setDataBlockEncoding(org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.ROW_INDEX_V1)
>  .build())
>
> .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(HConstants.TABLE_FAMILY)
>  .setMaxVersions(conf.getInt(HConstants.HBASE_META_VERSIONS,
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (HBASE-26546) hbase-shaded-client missing required thirdparty classes under hadoop 3.3.1

2021-12-07 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454806#comment-17454806
 ] 

Michael Stack edited comment on HBASE-26546 at 12/7/21, 8:14 PM:
-

Makes sense [~bbeaudreault]. Thanks for digging in. Shout if you want me to 
implement your suggestion (sounds like you have a better test setup than I do, 
though).


was (Author: stack):
Makes sense [~bbeaudreault] . Thanks for digging in.

> hbase-shaded-client missing required thirdparty classes under hadoop 3.3.1
> --
>
> Key: HBASE-26546
> URL: https://issues.apache.org/jira/browse/HBASE-26546
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> In HBASE-25792, the shaded thirdparty libraries from hadoop were removed from 
> the hbase-shaded-client fat jar to satisfy invariant checks. Unfortunately 
> this causes users of hbase-shaded-client to fail, because required classes 
> are not available at runtime.
> The specific failure I'm seeing is when trying to call new Configuration(), 
> which results in:
>  
>  
> {code:java}
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/thirdparty/com/google/common/base/Preconditions
>   at 
> org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:430)
>   at 
> org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:443)
>   at 
> org.apache.hadoop.conf.Configuration.<init>(Configuration.java:525){code}
>  
>  
> If you take a look at the hbase-shaded-client fat jar, it contains the 
> org.apache.hadoop.conf.Configuration class as you'd expect. If you decompile 
> that class (or look at the 3.3.1 source), you'll see that there is an import 
> for org.apache.hadoop.thirdparty.com.google.common.base.Preconditions but the 
> fat jar does not provide it.
>  
> One way for clients to get around this is to add an explicit dependency on 
> hadoop-shaded-guava, but this is problematic for a few reasons:
>  
> - it's best practice to use maven-dependency-plugin to disallow declared, 
> unused dependencies (which this would be)
> - it requires users to continually keep the version of hadoop-shaded-guava 
> up-to-date over time.
> - it only covers guava, but there is also protobuf and potentially other 
> shaded libraries in the future.
>  
> I think we should remove the exclusion of 
> {{org/apache/hadoop/thirdparty/**/*}} from the shading config and instead add 
> that pattern to the allowlist so that hbase-shaded-client is all clients need 
> to get started with hbase.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26546) hbase-shaded-client missing required thirdparty classes under hadoop 3.3.1

2021-12-07 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454806#comment-17454806
 ] 

Michael Stack commented on HBASE-26546:
---

Makes sense [~bbeaudreault] . Thanks for digging in.

> hbase-shaded-client missing required thirdparty classes under hadoop 3.3.1
> --
>
> Key: HBASE-26546
> URL: https://issues.apache.org/jira/browse/HBASE-26546
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> In HBASE-25792, the shaded thirdparty libraries from hadoop were removed from 
> the hbase-shaded-client fat jar to satisfy invariant checks. Unfortunately 
> this causes users of hbase-shaded-client to fail, because required classes 
> are not available at runtime.
> The specific failure I'm seeing is when trying to call new Configuration(), 
> which results in:
>  
>  
> {code:java}
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/thirdparty/com/google/common/base/Preconditions
>   at 
> org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:430)
>   at 
> org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:443)
>   at 
> org.apache.hadoop.conf.Configuration.<init>(Configuration.java:525){code}
>  
>  
> If you take a look at the hbase-shaded-client fat jar, it contains the 
> org.apache.hadoop.conf.Configuration class as you'd expect. If you decompile 
> that class (or look at the 3.3.1 source), you'll see that there is an import 
> for org.apache.hadoop.thirdparty.com.google.common.base.Preconditions but the 
> fat jar does not provide it.
>  
> One way for clients to get around this is to add an explicit dependency on 
> hadoop-shaded-guava, but this is problematic for a few reasons:
>  
> - it's best practice to use maven-dependency-plugin to disallow declared, 
> unused dependencies (which this would be)
> - it requires users to continually keep the version of hadoop-shaded-guava 
> up-to-date over time.
> - it only covers guava, but there is also protobuf and potentially other 
> shaded libraries in the future.
>  
> I think we should remove the exclusion of 
> {{org/apache/hadoop/thirdparty/**/*}} from the shading config and instead add 
> that pattern to the allowlist so that hbase-shaded-client is all clients need 
> to get started with hbase.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26321) Post blog to hbase.apache.org on SCR cache sizing

2021-10-05 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26321.
---
Fix Version/s: 3.0.0-alpha-2
 Hadoop Flags: Reviewed
 Release Note: Pushed blog at 
https://blogs.apache.org/hbase/entry/an-hbase-hdfs-short-circuit
 Assignee: Michael Stack
   Resolution: Fixed

Thanks for taking a look [~psomogyi]

I pushed it here 
[https://blogs.apache.org/hbase/entry/an-hbase-hdfs-short-circuit]

Shout if anyone wants to add edits.

> Post blog to hbase.apache.org on SCR cache sizing
> -
>
> Key: HBASE-26321
> URL: https://issues.apache.org/jira/browse/HBASE-26321
> Project: HBase
>  Issue Type: Task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>
> [~huaxiangsun] and I wrote up our experience debugging a Short-circuit Read 
> cache size issue. Let me attach the link here and leave it hang a few days 
> in case of edits or input from others. I intend to put it up here: 
> https://blogs.apache.org/hbase/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26321) Post blog to hbase.apache.org on SCR cache sizing

2021-10-01 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423455#comment-17423455
 ] 

Michael Stack commented on HBASE-26321:
---

Linked to google doc 
https://docs.google.com/document/d/15uhD2T2atpAJRT2lY2lm1vkekbfhb0Q0DQT2IOpKliQ/edit#heading=h.qnb3dkb8gw59

> Post blog to hbase.apache.org on SCR cache sizing
> -
>
> Key: HBASE-26321
> URL: https://issues.apache.org/jira/browse/HBASE-26321
> Project: HBase
>  Issue Type: Task
>Reporter: Michael Stack
>Priority: Major
>
> [~huaxiangsun] and I wrote up our experience debugging a Short-circuit Read 
> cache size issue. Let me attach the link here and leave it hang a few days 
> in case of edits or input from others. I intend to put it up here: 
> https://blogs.apache.org/hbase/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26321) Post blog to hbase.apache.org on SCR cache sizing

2021-10-01 Thread Michael Stack (Jira)
Michael Stack created HBASE-26321:
-

 Summary: Post blog to hbase.apache.org on SCR cache sizing
 Key: HBASE-26321
 URL: https://issues.apache.org/jira/browse/HBASE-26321
 Project: HBase
  Issue Type: Task
Reporter: Michael Stack


[~huaxiangsun] and I wrote up our experience debugging a Short-circuit Read 
cache size issue. Let me attach the link here and leave it hang a few days in 
case of edits or input from others. I intend to put it up here: 
https://blogs.apache.org/hbase/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26198) RegionServer dead on hadoop 3.3.1: NoSuchMethodError LocatedBlocks.getLocations()

2021-09-02 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409262#comment-17409262
 ] 

Michael Stack commented on HBASE-26198:
---

Thanks [~mengqi] ...

The change in the LocatedBlock.getLocations() signature looks to have happened 
here, committed for hadoop-2.7, way back when.
{code:java}
commit ab934e85947dcf2092050023909dd81ae274ff45
Author: Arpit Agarwal 
Date:   Mon Feb 9 12:17:40 2015 -0800HDFS-7647. 
DatanodeManager.sortLocatedBlocks sorts DatanodeInfos but not StorageIDs. 
(Contributed by Milan Desai){code}
I'm a bit baffled as to why I did not run into this while testing – or why I 
don't see it in the 3.x clusters we usually run on. Let me try 2.4.5 on 3.3.1 
again (it'll be a few days, I think...). Thanks.

> RegionServer dead on hadoop 3.3.1: NoSuchMethodError 
> LocatedBlocks.getLocations()
> -
>
> Key: HBASE-26198
> URL: https://issues.apache.org/jira/browse/HBASE-26198
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: mengqi
>Priority: Major
> Attachments: 4ad46153842c29898189b90fc986925c87966ce6.diff, a.diff, 
> image-2021-08-16-16-24-32-418.png
>
>
> !image-2021-08-16-16-24-32-418.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-26103) conn.getBufferedMutator(tableName) leaks thread executors and other problems (for master branch)

2021-08-30 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26103.
---
Fix Version/s: 3.0.0-alpha-2
 Hadoop Flags: Reviewed
 Release Note: Deprecate (unused) BufferedMutatorParams#pool and 
BufferedMutatorParams#getPool
   Resolution: Fixed

Merged the PR. Thanks for the contrib [~shahrs87]  (and review [~anoop.hbase] ).

> conn.getBufferedMutator(tableName) leaks thread executors and other problems 
> (for master branch)
> 
>
> Key: HBASE-26103
> URL: https://issues.apache.org/jira/browse/HBASE-26103
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 3.0.0-alpha-1
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>
> This is same as HBASE-26088  but created separate ticket for master branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-18612) Update comparators to be more declarative

2021-08-30 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406765#comment-17406765
 ] 

Michael Stack commented on HBASE-18612:
---

Added you as a contributor [~harjitdotsingh] and assigned you this issue (FYI 
[~mdrob])

> Update comparators to be more declarative
> -
>
> Key: HBASE-18612
> URL: https://issues.apache.org/jira/browse/HBASE-18612
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mike Drob
>Assignee: Harjit Singh
>Priority: Major
>  Labels: beginner, java8
> Attachments: HBASE-18612-WIP.patch
>
>
> We can write less code if we use factory methods present on Comparator to 
> build the chains. Also has the advantage of being easier to update in the 
> future and hopefully easier for incoming folks to understand.
> See also: 
> https://praveer09.github.io/technology/2016/06/21/writing-comparators-the-java8-way/
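
A rough illustration of what the issue is asking for, on a toy type rather than any 
actual HBase class (the class and field names below are made up):
{code:java}
import java.util.Comparator;

// Toy example of the declarative style the issue describes: build the compare
// chain from Comparator factory methods instead of hand-written if/else blocks.
public class DeclarativeComparatorExample {
  static class Entry {
    final String family;
    final String qualifier;
    final long timestamp;
    Entry(String family, String qualifier, long timestamp) {
      this.family = family;
      this.qualifier = qualifier;
      this.timestamp = timestamp;
    }
  }

  // Sort by family, then qualifier, then newest timestamp first.
  static final Comparator<Entry> COMPARATOR =
      Comparator.comparing((Entry e) -> e.family)
          .thenComparing(e -> e.qualifier)
          .thenComparing(Comparator.comparingLong((Entry e) -> e.timestamp).reversed());
}
{code}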



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-18612) Update comparators to be more declarative

2021-08-30 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-18612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reassigned HBASE-18612:
-

Assignee: Harjit Singh

> Update comparators to be more declarative
> -
>
> Key: HBASE-18612
> URL: https://issues.apache.org/jira/browse/HBASE-18612
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mike Drob
>Assignee: Harjit Singh
>Priority: Major
>  Labels: beginner, java8
> Attachments: HBASE-18612-WIP.patch
>
>
> We can write less code if we use factory methods present on Comparator to 
> build the chains. Also has the advantage of being easier to update in the 
> future and hopefully easier for incoming folks to understand.
> See also: 
> https://praveer09.github.io/technology/2016/06/21/writing-comparators-the-java8-way/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-20503) [AsyncFSWAL] Failed to get sync result after 300000 ms for txid=160912, WAL system stuck?

2021-08-26 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405575#comment-17405575
 ] 

Michael Stack edited comment on HBASE-20503 at 8/27/21, 4:47 AM:
-

[~kingWang] -have you looked in your datanode logs? In our case, the datanode 
was restarting unexpectedly – resource constraint.- <= Ignore. Wrong context.


was (Author: stack):
[~kingWang] have you looked in your datanode logs? In our case, the datanode 
was restarting unexpectedly – resource constraint.

> [AsyncFSWAL] Failed to get sync result after 30 ms for txid=160912, WAL 
> system stuck?
> -
>
> Key: HBASE-20503
> URL: https://issues.apache.org/jira/browse/HBASE-20503
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Michael Stack
>Priority: Major
> Attachments: 
> 0001-HBASE-20503-AsyncFSWAL-Failed-to-get-sync-result-aft.patch, 
> 0001-HBASE-20503-AsyncFSWAL-Failed-to-get-sync-result-aft.patch
>
>
> Scale test. Startup w/ 30k regions over ~250nodes. This RS is trying to 
> furiously open regions assigned by Master. It is importantly carrying 
> hbase:meta. Twenty minutes in, meta goes dead after an exception up out of 
> AsyncFSWAL. The process had been restarted so I couldn't get a thread dump. 
> Suspicious: we archive a WAL and then get a FNFE because we go to access the 
> WAL in its old location. [~Apache9] mind taking a look? Does this FNFE rolling kill 
> the WAL sub-system? Thanks.
> DFS complaining on file open for a few files getting blocks from remote dead 
> DNs: e.g. {{2018-04-25 10:05:21,506 WARN 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing 
> remote block reader.
> java.net.ConnectException: Connection refused}}
> AsyncFSWAL complaining: "AbstractFSWAL: Slow sync cost: 103 ms" .
> About ten minutes in, we get this:
> {code}
> 2018-04-25 10:15:16,532 WARN 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL: sync failed
> java.io.IOException: stream already broken
>   at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:424)
>   at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:513)
>   
>   
>   
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.sync(AsyncProtobufLogWriter.java:134)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:364)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume(AsyncFSWAL.java:547)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2018-04-25 10:15:16,680 INFO 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Rolled WAL 
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
>  with entries=10819, filesize=7.57 MB; new WAL 
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676516535.meta
> 2018-04-25 10:15:16,680 INFO 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Archiving 
> hdfs://ns1/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524675848653.meta
>  to 
> hdfs://ns1/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524675848653.meta
> 2018-04-25 10:15:16,686 WARN 
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter: Failed to 
> write trailer, non-fatal, continuing...
> java.io.IOException: stream already broken
>   at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:424)
>   at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:513)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.lambda$writeWALTrailerAndMagic$3(AsyncProtobufLogWriter.java:210)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.write(AsyncProtobufLogWriter.java:166)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeWALTrailerAndMagic(AsyncProtobufLogWriter.java:201)
>   at 
> 

[jira] [Commented] (HBASE-20503) [AsyncFSWAL] Failed to get sync result after 300000 ms for txid=160912, WAL system stuck?

2021-08-26 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405575#comment-17405575
 ] 

Michael Stack commented on HBASE-20503:
---

[~kingWang] have you looked in your datanode logs? In our case, the datanode 
was restarting unexpectedly – resource constraint.

> [AsyncFSWAL] Failed to get sync result after 30 ms for txid=160912, WAL 
> system stuck?
> -
>
> Key: HBASE-20503
> URL: https://issues.apache.org/jira/browse/HBASE-20503
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Michael Stack
>Priority: Major
> Attachments: 
> 0001-HBASE-20503-AsyncFSWAL-Failed-to-get-sync-result-aft.patch, 
> 0001-HBASE-20503-AsyncFSWAL-Failed-to-get-sync-result-aft.patch
>
>
> Scale test. Startup w/ 30k regions over ~250nodes. This RS is trying to 
> furiously open regions assigned by Master. It is importantly carrying 
> hbase:meta. Twenty minutes in, meta goes dead after an exception up out of 
> AsyncFSWAL. The process had been restarted so I couldn't get a thread dump. 
> Suspicious: we archive a WAL and then get a FNFE because we go to access the 
> WAL in its old location. [~Apache9] mind taking a look? Does this FNFE rolling kill 
> the WAL sub-system? Thanks.
> DFS complaining on file open for a few files getting blocks from remote dead 
> DNs: e.g. {{2018-04-25 10:05:21,506 WARN 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing 
> remote block reader.
> java.net.ConnectException: Connection refused}}
> AsyncFSWAL complaining: "AbstractFSWAL: Slow sync cost: 103 ms" .
> About ten minutes in, we get this:
> {code}
> 2018-04-25 10:15:16,532 WARN 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL: sync failed
> java.io.IOException: stream already broken
>   at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:424)
>   at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:513)
>   
>   
>   
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.sync(AsyncProtobufLogWriter.java:134)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:364)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume(AsyncFSWAL.java:547)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2018-04-25 10:15:16,680 INFO 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Rolled WAL 
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
>  with entries=10819, filesize=7.57 MB; new WAL 
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676516535.meta
> 2018-04-25 10:15:16,680 INFO 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Archiving 
> hdfs://ns1/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524675848653.meta
>  to 
> hdfs://ns1/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524675848653.meta
> 2018-04-25 10:15:16,686 WARN 
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter: Failed to 
> write trailer, non-fatal, continuing...
> java.io.IOException: stream already broken
>   at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:424)
>   at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:513)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.lambda$writeWALTrailerAndMagic$3(AsyncProtobufLogWriter.java:210)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.write(AsyncProtobufLogWriter.java:166)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeWALTrailerAndMagic(AsyncProtobufLogWriter.java:201)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.writeWALTrailer(AbstractProtobufLogWriter.java:233)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.close(AsyncProtobufLogWriter.java:143)
>   at 
> 

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-08-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402421#comment-17402421
 ] 

Michael Stack commented on HBASE-26042:
---

Update. We cured the provocation that was causing lots of IOEs during WAL 
rolling so we don't see this issue anymore.

> WAL lockup on 'sync failed' 
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: 
> readAddress(..) failed: Connection reset by peer
> 
>
> Key: HBASE-26042
> URL: https://issues.apache.org/jira/browse/HBASE-26042
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.5
>Reporter: Michael Stack
>Priority: Major
> Attachments: HBASE-26042-test-repro.patch, js1, js2
>
>
> Making note of issue seen in production cluster.
> Node had been struggling under load for a few days with slow syncs up to 10 
> seconds, a few STUCK MVCCs from which it recovered and some java pauses up to 
> three seconds in length.
> Then the below happened:
> {code:java}
> 2021-06-27 13:41:27,604 WARN  [AsyncFSWAL-0-hdfs://:8020/hbase] 
> wal.AsyncFSWAL: sync 
> failedorg.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException:
>  readAddress(..) failed: Connection reset by peer {code}
> ... and WAL turned dead in the water. Scanners start expiring. RPC prints 
> text versions of requests complaining requestsTooSlow. Then we start to see 
> these:
> {code:java}
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
> result after 30 ms for txid=552128301, WAL system stuck? {code}
> What's supposed to happen when the other side goes away like this is that we 
> roll the WAL – go set up a new one. You can see it happening if you run
> {code:java}
> mvn test 
> -Dtest=org.apache.hadoop.hbase.regionserver.wal.TestAsyncFSWAL#testBrokenWriter
>  {code}
> I tried hacking the test to repro the above hang by throwing same exception 
> in above test (on linux because need epoll to repro) but all just worked.
> Thread dumps of the hungup WAL subsystem are a little odd. The log roller is 
> stuck w/o timeout trying to write a long on the WAL header:
>  
> {code:java}
> Thread 9464: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information 
> may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
> line=175 (Compiled frame)
>  - java.util.concurrent.CompletableFuture$Signaller.block() @bci=19, 
> line=1707 (Compiled frame)
>  - 
> java.util.concurrent.ForkJoinPool.managedBlock(java.util.concurrent.ForkJoinPool$ManagedBlocker)
>  @bci=119, line=3323 (Compiled frame)
>  - java.util.concurrent.CompletableFuture.waitingGet(boolean) @bci=115, 
> line=1742 (Compiled frame)
>  - java.util.concurrent.CompletableFuture.get() @bci=11, line=1908 (Compiled 
> frame)
>  - 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.write(java.util.function.Consumer)
>  @bci=16, line=189 (Compiled frame)
>  - 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeMagicAndWALHeader(byte[],
>  org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALHeader) 
> @bci=9, line=202 (Compiled frame)
>  - 
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(org.apache.hadoop.fs.FileSystem,
>  org.apache.hadoop.fs.Path, org.apache.hadoop.conf.Configuration, boolean, 
> long) @bci=107, line=170 (Compiled frame)
>  - 
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(org.apache.hadoop.conf.Configuration,
>  org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path, boolean, long, 
> org.apache.hbase.thirdparty.io.netty.channel.EventLoopGroup, java.lang.Class) 
> @bci=61, line=113 (Compiled frame)
>  - 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(org.apache.hadoop.fs.Path)
>  @bci=22, line=651 (Compiled frame)
>  - 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(org.apache.hadoop.fs.Path)
>  @bci=2, line=128 (Compiled frame)
>  - org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(boolean) 
> @bci=101, line=797 (Compiled frame)
>  - org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(long) 
> @bci=18, line=263 (Compiled frame)
>  - org.apache.hadoop.hbase.wal.AbstractWALRoller.run() @bci=198, line=179 
> (Compiled frame) {code}
>  
> Other threads are BLOCKED trying to append the WAL w/ flush markers etc. 
> unable to add the ringbuffer:
>  
> {code:java}
> Thread 9465: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information 
> may be imprecise)
>  - java.util.concurrent.locks.LockSupport.parkNanos(long) @bci=11, line=338 
> (Compiled 

[jira] [Resolved] (HBASE-24337) Backport HBASE-23968 to branch-2

2021-08-18 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24337.
---
Fix Version/s: 2.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Backported to branch-2. Resolving.

> Backport HBASE-23968 to branch-2
> 
>
> Key: HBASE-24337
> URL: https://issues.apache.org/jira/browse/HBASE-24337
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Minwoo Kang
>Assignee: Minwoo Kang
>Priority: Minor
> Fix For: 2.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26206) Add UT for masterless mode

2021-08-18 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401194#comment-17401194
 ] 

Michael Stack commented on HBASE-26206:
---

{quote}[~stack] Could you please provide some context about what is going on in 
HBASE-18846?
{quote}
A downstreamer project started up a RS just so it could capture replication 
stream output and send it to an indexer.  The downstreamer would break anytime 
RS internals changed. Over in HBASE-18846, the change made it so you could 
start up a skeleton of the RS only... all 'services' but Replication were 
turned off... so just the core of a RS was running w/ Replication. Shout if you 
need more context.

> Add UT for masterless mode
> --
>
> Key: HBASE-26206
> URL: https://issues.apache.org/jira/browse/HBASE-26206
> Project: HBase
>  Issue Type: Task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> It is not used in code any more and there is no UT to make sure it works.
> Let's just remove it as it is only designed to be used in tests.
>  Update 
> This is used in hbase-connector, so we'd better add a UT in hbase to confirm 
> that it still works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23917) [SimpleRpcServer] Subsequent requests will have no response in case of request IO mess up

2021-08-17 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400783#comment-17400783
 ] 

Michael Stack commented on HBASE-23917:
---

[~javaman_chen] I closed the PR because the bulk of the PR is for code we don't 
use anymore (and releases from branch-1 are finishing). There is nice 
refactoring in RpcServer in the PR that we could use but otherwise, the PR 
won't be applied. Thanks for the contribution.

> [SimpleRpcServer] Subsequent requests will have no response in case of 
> request IO mess up
> -
>
> Key: HBASE-23917
> URL: https://issues.apache.org/jira/browse/HBASE-23917
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
>
> Code in SimpleServerRpcConnection#readAndProcess work like this
> {code:java}
> public int readAndProcess() throws IOException, InterruptedException {
>   …
>   if (data == null) {
>     …
>     initByteBuffToReadInto(dataLength);
>     …
>   }
>   count = channelDataRead(channel, data);
>   if (count >= 0 && data.remaining() == 0) { // count==0 if dataLength == 0
>     process();
>   }
>   return count;
> }
> {code}
> If the request IO gets messed up, _data.remaining()_ may stay greater than 0, so 
> the _process()_ method will not be executed.
> There are some cleanup operations in _process()_
> {code:java}
> private void process() throws IOException, InterruptedException {
>   data.rewind();
>   try {
> ..
>   } finally {
> dataLengthBuffer.clear(); // Clean for the next call
> data = null; // For the GC
> this.callCleanup = null;
>   }
> }
> {code}
> If _process()_ is not executed, the variable _data_ stays non-null and 
> _data.remaining()_ stays greater than 0, so _process()_ will never 
> be executed again and subsequent requests will get no response; this has 
> occurred in our production environment.
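
To make the failure mode concrete, here is a tiny self-contained model of the read 
loop described above. This is not HBase code, just a sketch of why process() – and 
the cleanup inside it – is never reached once the buffer stops draining:
{code:java}
import java.nio.ByteBuffer;

// Simplified model of the readAndProcess() flow: the buffer is allocated once
// and the cleanup that would reset it lives inside process().
public class ReadLoopModel {
  private ByteBuffer data;   // corresponds to SimpleServerRpcConnection#data

  // Stand-in for one channelDataRead(): copy whatever the "client" sent.
  private int channelDataRead(ByteBuffer incoming) {
    int n = Math.min(incoming.remaining(), data.remaining());
    for (int i = 0; i < n; i++) {
      data.put(incoming.get());
    }
    return n;
  }

  public void readAndProcess(ByteBuffer incoming, int declaredLength) {
    if (data == null) {
      data = ByteBuffer.allocate(declaredLength);   // sized from the request header
    }
    int count = channelDataRead(incoming);
    if (count >= 0 && data.remaining() == 0) {
      process();                                    // only runs when the buffer is full
    }
    // If the request is malformed and fewer bytes than declaredLength ever arrive,
    // data.remaining() stays > 0 forever and process() - and with it the cleanup
    // that sets data back to null - never runs.
  }

  private void process() {
    System.out.println("processed " + data.capacity() + " bytes");
    data = null;                                    // cleanup happens only here
  }

  public static void main(String[] args) {
    ReadLoopModel conn = new ReadLoopModel();
    // Header said 8 bytes but the client only ever sends 4: the connection is now stuck.
    conn.readAndProcess(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}), 8);
  }
}
{code}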



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23584) Decrease rpc getFileStatus count when open a storefile (cache filestatus in storefileinfo rather than load each time)

2021-08-17 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400782#comment-17400782
 ] 

Michael Stack commented on HBASE-23584:
---

Refreshed the PR. Nice idea hanging out in our PR queue.

> Decrease rpc getFileStatus count when open a storefile  (cache filestatus in 
> storefileinfo rather than load each time)
> --
>
> Key: HBASE-23584
> URL: https://issues.apache.org/jira/browse/HBASE-23584
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 2.1.1
>Reporter: yuhuiyang
>Priority: Minor
> Attachments: HBASE-23584-branch-2.1-01.patch, 
> HBASE-23584-master-001.patch
>
>
> When a store needs to open a storefile, it issues the getFileStatus rpc 
> twice or more. So opening a region with many files, or opening many regions 
> at once, takes a very long time if the namenode is slow in processing each 
> rpc (in my case sometimes 5s) due to the namenode's own problems. 
> So I think we can decrease the number of getFileStatus calls; this will reduce 
> stress on the namenode and take less time when a store opens a storefile.
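
A minimal sketch of the caching idea – illustrative only, not the actual 
StoreFileInfo change; the class and field names here are made up:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Fetch the FileStatus once and reuse it instead of calling
// fs.getFileStatus() on every use: one NameNode round trip per store file.
public class CachedFileStatus {
  private final FileSystem fs;
  private final Path path;
  private FileStatus status;   // cached after the first getFileStatus RPC

  public CachedFileStatus(FileSystem fs, Path path) {
    this.fs = fs;
    this.path = path;
  }

  public synchronized FileStatus getStatus() throws IOException {
    if (status == null) {
      status = fs.getFileStatus(path);
    }
    return status;
  }

  public long getLength() throws IOException {
    return getStatus().getLen();   // no extra RPC once cached
  }
}
{code}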



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-23584) Decrease rpc getFileStatus count when open a storefile (cache filestatus in storefileinfo rather than load each time)

2021-08-17 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-23584:
--
Summary: Decrease rpc getFileStatus count when open a storefile  (cache 
filestatus in storefileinfo rather than load each time)  (was: Descrease rpc 
getFileStatus count when open a storefile )

> Decrease rpc getFileStatus count when open a storefile  (cache filestatus in 
> storefileinfo rather than load each time)
> --
>
> Key: HBASE-23584
> URL: https://issues.apache.org/jira/browse/HBASE-23584
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 2.1.1
>Reporter: yuhuiyang
>Priority: Minor
> Attachments: HBASE-23584-branch-2.1-01.patch, 
> HBASE-23584-master-001.patch
>
>
> When a store needs to open a storefile, it issues the getFileStatus rpc 
> twice or more. So opening a region with many files, or opening many regions 
> at once, takes a very long time if the namenode is slow in processing each 
> rpc (in my case sometimes 5s) due to the namenode's own problems. 
> So I think we can decrease the number of getFileStatus calls; this will reduce 
> stress on the namenode and take less time when a store opens a storefile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24053) hbase/bin/region_status.rb Not available in hbase2

2021-08-17 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400757#comment-17400757
 ] 

Michael Stack commented on HBASE-24053:
---

I tried the PR and it fails still w/ the following:

 

{{NoMethodError: undefined method `allTableRegions' for 
Java::OrgApacheHadoopHbase::MetaTableAccessor:Class}}
{{Did you mean? getAllRegions}}
{{ method_missing at org/jruby/RubyBasicObject.java:1657}}
{{ block in ./bin/region_status.rb at ./bin/region_status.rb:139}}
{{ loop at org/jruby/RubyKernel.java:1316}}
{{  at ./bin/region_status.rb:134}}

 

> hbase/bin/region_status.rb Not available in hbase2
> --
>
> Key: HBASE-24053
> URL: https://issues.apache.org/jira/browse/HBASE-24053
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: shenshengli
>Assignee: shenshengli
>Priority: Minor
>
> A series of strange errors occurred while running this script against hbase2; 
> they were corrected as prompted. Note that in hbase2's MetaTableAccessor, 
> getHRegionInfo has been replaced by getRegionInfo



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25165) Change 'State time' in UI so sorts

2021-08-17 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400476#comment-17400476
 ] 

Michael Stack commented on HBASE-25165:
---

Just to say I tested on big cluster before merging...

> Change 'State time' in UI so sorts
> --
>
> Key: HBASE-25165
> URL: https://issues.apache.org/jira/browse/HBASE-25165
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.3.6
>
> Attachments: Screen Shot 2020-10-07 at 4.15.32 PM.png, Screen Shot 
> 2020-10-07 at 4.15.42 PM.png
>
>
> Here is a minor issue.
> I had an issue w/ crashing servers. The servers were auto-restarted on crash.
> To find the crashing servers, I was sorting on the 'Start time' column in the 
> Master UI. This basically worked but the sort is unreliable as the date we 
> display starts with days-of-the-week.
> This issue is about moving to display start time in iso8601 which is sortable 
> (and occupies less real estate). Let me add some images.
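
For the sortability claim, a tiny illustration (the timestamps are arbitrary example 
values): ISO-8601 strings compare correctly as plain strings, unlike the default date 
rendering that leads with the day of the week.
{code:java}
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;

// Example only: ISO-8601 timestamps sort chronologically under a plain String sort.
public class StartTimeSortExample {
  public static void main(String[] args) {
    long[] startCodes = {1602170400000L, 1601950000000L, 1602600000000L};
    String[] iso = new String[startCodes.length];
    for (int i = 0; i < startCodes.length; i++) {
      iso[i] = DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(startCodes[i]));
    }
    Arrays.sort(iso); // lexicographic order == chronological order for ISO-8601 UTC strings
    System.out.println(Arrays.toString(iso));
  }
}
{code}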



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26198) RegionServer dead on hadoop 3.3.1: NoSuchMethodError LocatedBlocks.getLocations()

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400115#comment-17400115
 ] 

Michael Stack commented on HBASE-26198:
---

We cannot vouch for vendor versions of HBase.

> RegionServer dead on hadoop 3.3.1: NoSuchMethodError 
> LocatedBlocks.getLocations()
> -
>
> Key: HBASE-26198
> URL: https://issues.apache.org/jira/browse/HBASE-26198
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: mengqi
>Priority: Major
> Attachments: 4ad46153842c29898189b90fc986925c87966ce6.diff, 
> image-2021-08-16-16-24-32-418.png
>
>
> !image-2021-08-16-16-24-32-418.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26196) Support configuration override for remote cluster of HFileOutputFormat locality sensitive

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400113#comment-17400113
 ] 

Michael Stack commented on HBASE-26196:
---

Thanks for improving the release note. Make a sub-task for branch-1, but my 
understanding is that releases off branch-1 are coming to an end. See the dev 
mailing list for discussion. Thanks

> Support configuration override for remote cluster of HFileOutputFormat 
> locality sensitive
> -
>
> Key: HBASE-26196
> URL: https://issues.apache.org/jira/browse/HBASE-26196
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.8.0, 3.0.0-alpha-2, 2.4.5
>Reporter: Shinya Yoshida
>Assignee: Shinya Yoshida
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6
>
>
> We introduced support to generate hfile with good locality for a remote 
> cluster even in HBASE-25608.
> I realized we need to override other configurations for the remote cluster in 
> addition to the zookeeper cluster key.
> For example, read from a non-secure cluster and write hfiles for a secure 
> cluster.
>  In this case, we use TableInputFormat for non-secure cluster with 
> hbase.security.authentication=simple in job configuration.
>  So HFileOutputFormat failed to connect to remote secure cluster because 
> requires hbase.security.authentication=kerberos in job conf.
>  
> Thus let's introduce configuration override for remote-cluster-aware 
> HFileOutputFormat locality-sensitive feature.
>  
> -Another example is to read from a secure cluster (A) and write hfiles for 
> another secure cluster (B) and we use different principal for each cluster.-
>  -For instance, we use cluster-a/_h...@example.com for A and 
> cluster-b/_h...@example.com for B.-
>  -Then we need to override MASTER_KRB_PRINCIPAL and 
> REGIONSERVER_KRB_PRINCIPAL using cluster-b/_h...@example.com to connect 
> cluster B.-
> ^ This is not true; we use token-based digest auth in the mapper/reducer, so a 
> principal difference for kerberos should be fine
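
A rough sketch of the scenario in the description. The configuration keys shown are 
standard HBase keys; the exact override mechanism/entry point added by this change is 
deliberately left out – check HFileOutputFormat2 in your release. Host names are 
placeholders.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Read from a non-secure source cluster, write HFiles for a secure sink cluster.
public class RemoteClusterConfSketch {
  public static void main(String[] args) {
    // Job-level configuration points at the (non-secure) source cluster.
    Configuration jobConf = HBaseConfiguration.create();
    jobConf.set("hbase.zookeeper.quorum", "source-zk-1,source-zk-2");
    jobConf.set("hbase.security.authentication", "simple");

    // Overrides describing the (secure) sink cluster that the locality-sensitive
    // HFileOutputFormat setup should use instead of the job-level settings.
    Configuration remote = new Configuration(false);
    remote.set("hbase.zookeeper.quorum", "sink-zk-1,sink-zk-2");
    remote.set("hbase.security.authentication", "kerberos");

    System.out.println("source auth = " + jobConf.get("hbase.security.authentication"));
    System.out.println("sink   auth = " + remote.get("hbase.security.authentication"));
  }
}
{code}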



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24842) make export snapshot report size can be config

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24842.
---
Fix Version/s: 3.0.0-alpha-2
 Hadoop Flags: Reviewed
 Release Note: Set new config snapshot.export.report.size to size at which 
you want to see reporting.
   Resolution: Fixed

Merged to master. Make subtask if you'd like it backported. Thanks for the PR 
[~chenyechao]

> make export snapshot report size can be config
> --
>
> Key: HBASE-24842
> URL: https://issues.apache.org/jira/browse/HBASE-24842
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Minor
> Fix For: 3.0.0-alpha-2
>
>
> currently export snapshot reports progress every ONE MB (1*1024*1024 bytes);
> we can make this configurable 
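
A minimal usage sketch: the key name comes from the release note above, the 8 MB value 
is just an example. The same setting should also be passable on the ExportSnapshot 
command line via the usual -D key=value mechanism.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Example only: raise the ExportSnapshot progress-report threshold from 1 MB to 8 MB.
public class ExportSnapshotReportSizeExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    conf.setLong("snapshot.export.report.size", 8L * 1024 * 1024);
    System.out.println(conf.getLong("snapshot.export.report.size", 1024 * 1024));
  }
}
{code}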



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24570) connection#close throws NPE

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-24570:
--
Fix Version/s: 2.3.7
   2.4.6
   2.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Appled to branch-2.3+ (doesn't apply to master... doesn't make sense there). 
Thanks for the PR [~Bo Cui]

> connection#close throws NPE
> ---
>
> Key: HBASE-24570
> URL: https://issues.apache.org/jira/browse/HBASE-24570
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Minor
> Fix For: 2.5.0, 2.4.6, 2.3.7
>
>
> In the ConnectionImplementation constructor, if getRegistry throws an 
> exception, registry will be null, and then close() will throw an NPE
> {code:java}
> try {
>   this.registry = AsyncRegistryFactory.getRegistry(conf);
>   ...
> } catch (Throwable e) {
>   // avoid leaks: registry, rpcClient, ...
>   LOG.debug("connection construction failed", e);
>   close();
>   throw e;
> }
> {code}
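
A bare-bones sketch of the defensive pattern the report calls for – not the actual 
ConnectionImplementation fix; the field names are placeholders. The point is that 
close() must tolerate partially constructed state.
{code:java}
// Illustrative pattern only: guard each field because construction may have
// thrown before the field was assigned.
public class SafeCloseExample implements AutoCloseable {
  private AutoCloseable registry;   // may still be null if construction failed early
  private AutoCloseable rpcClient;

  @Override
  public void close() {
    closeQuietly(registry);
    closeQuietly(rpcClient);
  }

  private static void closeQuietly(AutoCloseable c) {
    if (c == null) {
      return;                       // nothing to release; avoids the NPE described above
    }
    try {
      c.close();
    } catch (Exception e) {
      // best effort on shutdown; log in real code
    }
  }
}
{code}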



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26191) Annotate shaded generated protobuf as InterfaceAudience.Private

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400055#comment-17400055
 ] 

Michael Stack commented on HBASE-26191:
---

{quote}Perhaps we can use something like 
[Rewrite|https://github.com/openrewrite/rewrite-maven-plugin] to annotate the 
generated classes during the build process.
{quote}
Yeah. Was thinking. We do some gymnastics w/ it already IIRC.

> Annotate shaded generated protobuf as InterfaceAudience.Private
> ---
>
> Key: HBASE-26191
> URL: https://issues.apache.org/jira/browse/HBASE-26191
> Project: HBase
>  Issue Type: Task
>  Components: Coprocessors, Protobufs
>Reporter: Michael Stack
>Priority: Major
>
> Annotate generated shaded protobufs as InterfaceAudience.Private. It might 
> not be able to add the annotation to each class; at a minimum update the doc 
> on our story around shaded internal protobufs.
> See the prompting mailing list discussion here: 
> [https://lists.apache.org/thread.html/r9e6eb11106727d245f6eb2a5023823901637971d6ed0f0aedaf8d149%40%3Cdev.hbase.apache.org%3E]
> So far the consensus has it that the shaded generated protobuf should be made 
> IA.Private.  Will wait on it to settle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24652) master-status UI make date type fields sortable

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400049#comment-17400049
 ] 

Michael Stack commented on HBASE-24652:
---

HBASE-25165 is where I took us in the wrong direction before re-surfacing this 
old issue.

> master-status UI make date type fields sortable
> ---
>
> Key: HBASE-24652
> URL: https://issues.apache.org/jira/browse/HBASE-24652
> Project: HBase
>  Issue Type: Improvement
>  Components: master, Operability, UI, Usability
>Affects Versions: 3.0.0-alpha-1, 2.2.0, 2.3.0, 2.1.5, 2.2.1, 2.1.6
>Reporter: Jeongmin Kim
>Assignee: jeongmin kim
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6, 2.3.7
>
> Attachments: SCREEN_SHOT1.png
>
>
> Revisit of HBASE-21207, HBASE-22543
> date type values such as regionserver list 'Start time' field on 
> master-status page, are not sorted by time.
> HBASE-21207, HBASE-22543 missed it, so before this fix dates sorted as Strings.
> The first field is the day of the week, therefore Friday always sorts first and 
> Wednesday last, no matter what the date is.
>    * SCREEN_SHOT1.png
>  
> This fix makes date-type values sort by date and time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24652) master-status UI make date type fields sortable

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24652.
---
Fix Version/s: 2.3.7
   2.4.6
   2.5.0
 Hadoop Flags: Reviewed
 Release Note: Makes RegionServer 'Start time' sortable in the Master UI
 Assignee: jeongmin kim
   Resolution: Fixed

[~jeongmin.kim] pardon me. I forgot about this one.  I pushed to branch-2.3+  
Thanks for the fix and please pardon my oversight.

> master-status UI make date type fields sortable
> ---
>
> Key: HBASE-24652
> URL: https://issues.apache.org/jira/browse/HBASE-24652
> Project: HBase
>  Issue Type: Improvement
>  Components: master, Operability, UI, Usability
>Affects Versions: 3.0.0-alpha-1, 2.2.0, 2.3.0, 2.1.5, 2.2.1, 2.1.6
>Reporter: Jeongmin Kim
>Assignee: jeongmin kim
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6, 2.3.7
>
> Attachments: SCREEN_SHOT1.png
>
>
> Revisit of HBASE-21207, HBASE-22543
> date type values such as regionserver list 'Start time' field on 
> master-status page, are not sorted by time.
> HBASE-21207, HBASE-22543 missed it, so before this fix dates sorted as Strings.
> The first field is the day of the week, therefore Friday always sorts first and 
> Wednesday last, no matter what the date is.
>    * SCREEN_SHOT1.png
>  
> This fix makes date-type values sort by date and time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-26200) Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of HBASE-24652

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26200.
---
Fix Version/s: 2.3.7
   2.4.6
   3.0.0-alpha-2
   2.5.0
 Release Note: Undid showing RegionServer 'Start time' in ISO-8601 format. 
Revert.
 Assignee: Michael Stack
   Resolution: Fixed

> Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of 
> HBASE-24652
> -
>
> Key: HBASE-26200
> URL: https://issues.apache.org/jira/browse/HBASE-26200
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6, 2.3.7
>
>
> The below change by me does not actually work and I found an old issue that 
> does the proper job that was neglected. I'm undoing the below in favor of 
> HBASE-24652.
>  
> kalashnikov:hbase.apache.git stack$ git show 
> d07d181ea4a9da316659bb21fd4fffc979b5f77a
> commit d07d181ea4a9da316659bb21fd4fffc979b5f77a
> Author: Michael Stack 
> Date: Thu Oct 8 09:10:30 2020 -0700
> HBASE-25165 Change 'State time' in UI so sorts (#2508)
> Display startcode in iso8601.
> Signed-off-by: Nick Dimiduk 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-26200) Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of HBASE-24652

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1737#comment-1737
 ] 

Michael Stack edited comment on HBASE-26200 at 8/16/21, 8:43 PM:
-

HBASE-25165 had us show 'start time' as ISO8601, which our String sort in the UI 
doesn't handle. The other issue w/ ISO8601 is that it renders as UTC, which isn't 
what all want. Let me undo HBASE-25165 then apply HBASE-24652.


was (Author: stack):
HBASE-25165 had us show 'start time' as ISO8601 which our String sort in UI 
doesn't. The other issue w/ ISO8601 is that it rendors as UTC which isn't what 
all want. Let me revert this then apply HBASE-24652.

> Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of 
> HBASE-24652
> -
>
> Key: HBASE-26200
> URL: https://issues.apache.org/jira/browse/HBASE-26200
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: Michael Stack
>Priority: Major
>
> The below change by me does not actually work and I found an old issue that 
> does the proper job that was neglected. I'm undoing the below in favor of 
> HBASE-24652.
>  
> kalashnikov:hbase.apache.git stack$ git show 
> d07d181ea4a9da316659bb21fd4fffc979b5f77a
> commit d07d181ea4a9da316659bb21fd4fffc979b5f77a
> Author: Michael Stack 
> Date: Thu Oct 8 09:10:30 2020 -0700
> HBASE-25165 Change 'State time' in UI so sorts (#2508)
> Display startcode in iso8601.
> Signed-off-by: Nick Dimiduk 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26200) Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of HBASE-24652

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1737#comment-1737
 ] 

Michael Stack commented on HBASE-26200:
---

HBASE-25165 had us show 'start time' as ISO8601, which our String sort in the UI 
doesn't handle. The other issue w/ ISO8601 is that it renders as UTC, which isn't 
what all want. Let me revert this then apply HBASE-24652.

> Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of 
> HBASE-24652
> -
>
> Key: HBASE-26200
> URL: https://issues.apache.org/jira/browse/HBASE-26200
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: Michael Stack
>Priority: Major
>
> The below change by me does not actually work and I found an old issue that 
> does the proper job that was neglected. I'm undoing the below in favor of 
> HBASE-24652.
>  
> kalashnikov:hbase.apache.git stack$ git show 
> d07d181ea4a9da316659bb21fd4fffc979b5f77a
> commit d07d181ea4a9da316659bb21fd4fffc979b5f77a
> Author: Michael Stack 
> Date: Thu Oct 8 09:10:30 2020 -0700
> HBASE-25165 Change 'State time' in UI so sorts (#2508)
> Display startcode in iso8601.
> Signed-off-by: Nick Dimiduk 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26200) Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of HBASE-24652

2021-08-16 Thread Michael Stack (Jira)
Michael Stack created HBASE-26200:
-

 Summary: Undo 'HBASE-25165 Change 'State time' in UI so sorts 
(#2508)' in favor of HBASE-24652
 Key: HBASE-26200
 URL: https://issues.apache.org/jira/browse/HBASE-26200
 Project: HBase
  Issue Type: Bug
  Components: UI
Reporter: Michael Stack


The below change by me does not actually work and I found an old issue that 
does the proper job that was neglected. I'm undoing the below in favor of 
HBASE-24652.

 

kalashnikov:hbase.apache.git stack$ git show 
d07d181ea4a9da316659bb21fd4fffc979b5f77a
commit d07d181ea4a9da316659bb21fd4fffc979b5f77a
Author: Michael Stack 
Date: Thu Oct 8 09:10:30 2020 -0700

HBASE-25165 Change 'State time' in UI so sorts (#2508)

Display startcode in iso8601.

Signed-off-by: Nick Dimiduk 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24339) Backport HBASE-23968 to branch-1

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-24339.
---
Resolution: Won't Fix

> Backport HBASE-23968 to branch-1
> 
>
> Key: HBASE-24339
> URL: https://issues.apache.org/jira/browse/HBASE-24339
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Minwoo Kang
>Assignee: Minwoo Kang
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24339) Backport HBASE-23968 to branch-1

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399945#comment-17399945
 ] 

Michael Stack commented on HBASE-24339:
---

I closed the PR. No more releases off branch-1. Resolving as won't fix.

> Backport HBASE-23968 to branch-1
> 
>
> Key: HBASE-24339
> URL: https://issues.apache.org/jira/browse/HBASE-24339
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Minwoo Kang
>Assignee: Minwoo Kang
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24337) Backport HBASE-23968 to branch-2

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399944#comment-17399944
 ] 

Michael Stack commented on HBASE-24337:
---

Refreshed PR.

> Backport HBASE-23968 to branch-2
> 
>
> Key: HBASE-24337
> URL: https://issues.apache.org/jira/browse/HBASE-24337
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Minwoo Kang
>Assignee: Minwoo Kang
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-26037) Implement namespace and table level access control for thrift & thrift2

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26037.
---
Fix Version/s: 3.0.0-alpha-2
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to master. Thanks for the PR [~xytss123]  (and review [~zhangduo] ). I 
tried to backport to branch-2 so it could be in 2.5.0 but hit a CONFLICT. Make a 
sub-task if you want a backport. Thank you.

> Implement namespace and table level access control for thrift & thrift2
> ---
>
> Key: HBASE-26037
> URL: https://issues.apache.org/jira/browse/HBASE-26037
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, Thrift
>Reporter: Yutong Xiao
>Assignee: Yutong Xiao
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>
> Client can grant or revoke ns & table level user permissions through thrift & 
> thrift2. This is implemented with AccessControlClient.
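
For context, a rough sketch of what grant through AccessControlClient looks like from 
Java. The method signatures here are recalled from that API and should be verified 
against your release; the connection setup, namespace, table, and user names are all 
placeholders.
{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.security.access.AccessControlClient;
import org.apache.hadoop.hbase.security.access.Permission;

// Sketch only: grant namespace- and table-level permissions to a user.
public class GrantExample {
  public static void main(String[] args) throws Throwable {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create())) {
      // Namespace-level grant.
      AccessControlClient.grant(conn, "my_ns", "some_user",
          Permission.Action.READ, Permission.Action.WRITE);
      // Table-level grant (null family/qualifier == whole table).
      AccessControlClient.grant(conn, TableName.valueOf("my_ns:my_table"), "some_user",
          null, null, Permission.Action.READ);
    }
  }
}
{code}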



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26017) hbase performance evaluation tool could not support datasize more than 2048g

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399898#comment-17399898
 ] 

Michael Stack commented on HBASE-26017:
---

The PR looks good. I left some comments. Can do the refactor of pe to do bigger 
sizes in a follow-on.

> hbase performance evaluation tool  could not support datasize more than 2048g
> -
>
> Key: HBASE-26017
> URL: https://issues.apache.org/jira/browse/HBASE-26017
> Project: HBase
>  Issue Type: Bug
>  Components: PE
>Affects Versions: 2.1.0, 2.3.2, 2.4.4
>Reporter: dingwei2019
>Assignee: dingwei2019
>Priority: Minor
>
> In our daily testing we may want to test a datasize of more than 2048g; when we 
> set --size to more than 2048g, pe prints an abnormal message like this:
> [TestClient-1] hbase.PerformanceEvaluation: Start class 
> org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at offset 
> -1138166308 for -21474836 rows
>  
> This is due to the variable totalRows in TestOptions being declared as an int 
> (-2147483648 to 2147483647). One GB is 1048576 (1024*1024) rows by default and 
> the max value of totalRows is 2147483647, so under these conditions we cannot 
> write more than 2147483647/1048576 = 2047.999G.
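
The arithmetic, shown as a tiny standalone example (the 2100g size is an arbitrary 
illustrative value):
{code:java}
// Demonstrates the 32-bit overflow behind the negative row count in the log line above.
public class TotalRowsOverflowExample {
  public static void main(String[] args) {
    int rowsPerGB = 1024 * 1024;                   // 1,048,576 rows per GB in PE
    int sizeInGB = 2100;                           // e.g. --size=2100g

    int totalRowsAsInt = sizeInGB * rowsPerGB;     // overflows int: negative value
    long totalRowsAsLong = (long) sizeInGB * rowsPerGB;

    System.out.println("int  totalRows = " + totalRowsAsInt);   // -2092957696
    System.out.println("long totalRows = " + totalRowsAsLong);  // 2202009600
  }
}
{code}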



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26016) HFilePrettyPrinter tool can not print the last LEAF_INDEX block or BLOOM_CHUNK.

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399889#comment-17399889
 ] 

Michael Stack commented on HBASE-26016:
---

[~dingwei2019] I asked over on the PR for an example of the fix in action. Thanks.

> HFilePrettyPrinter tool can not print the last LEAF_INDEX block or 
> BLOOM_CHUNK.
> ---
>
> Key: HBASE-26016
> URL: https://issues.apache.org/jira/browse/HBASE-26016
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.1.0, 2.3.2, 2.4.4
>Reporter: dingwei2019
>Assignee: dingwei2019
>Priority: Minor
> Attachments: HBASE-26016-prettyPrintTool-1.patch
>
>
> When I use the pretty printer tool to print the block headers, I cannot get 
> the last LEAF_INDEX block or BLOOM_CHUNK. The last output of the tool is below:
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25996) add hbase hbck result on jmx

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399888#comment-17399888
 ] 

Michael Stack commented on HBASE-25996:
---

Please say more about why we would do this?

> add hbase hbck result on jmx
> 
>
> Key: HBASE-25996
> URL: https://issues.apache.org/jira/browse/HBASE-25996
> Project: HBase
>  Issue Type: Improvement
>Reporter: xijiawen
>Assignee: xijiawen
>Priority: Major
>
> https://github.com/apache/hbase/pull/3379



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26196) Support configuration override for remote cluster of HFileOutputFormat locality sensitive

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399867#comment-17399867
 ] 

Michael Stack commented on HBASE-26196:
---

Thanks for the PR [~lineyshinya] . Pushed to branch-2.4+. It would not go back to 
branch-1 w/o CONFLICT and I think we're done w/ making releases off branch-1 
(Make a sub-task for the backport if you want it to go in still. Thanks).

> Support configuration override for remote cluster of HFileOutputFormat 
> locality sensitive
> -
>
> Key: HBASE-26196
> URL: https://issues.apache.org/jira/browse/HBASE-26196
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.8.0, 3.0.0-alpha-2, 2.4.5
>Reporter: Shinya Yoshida
>Assignee: Shinya Yoshida
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6
>
>
> We introduced support to generate hfile with good locality for a remote 
> cluster even in HBASE-25608.
> I realized we need to override other configurations for the remote cluster in 
> addition to the zookeeper cluster key.
> For example, read from a non-secure cluster and write hfiles for a secure 
> cluster.
>  In this case, we use TableInputFormat for non-secure cluster with 
> hbase.security.authentication=simple in job configuration.
>  So HFileOutputFormat failed to connect to remote secure cluster because 
> requires hbase.security.authentication=kerberos in job conf.
>  
> Thus let's introduce configuration override for remote-cluster-aware 
> HFileOutputFormat locality-sensitive feature.
>  
> -Another example is to read from a secure cluster (A) and write hfiles for 
> another secure cluster (B) and we use different principal for each cluster.-
>  -For instance, we use cluster-a/_h...@example.com for A and 
> cluster-b/_h...@example.com for B.-
>  -Then we need to override MASTER_KRB_PRINCIPAL and 
> REGIONSERVER_KRB_PRINCIPAL using cluster-b/_h...@example.com to connect 
> cluster B.-
> ^ This is not true; we use token-based digest auth in the mapper/reducer, so a 
> principal difference for kerberos should be fine



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-26196) Support configuration override for remote cluster of HFileOutputFormat locality sensitive

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26196:
--
Fix Version/s: 2.4.6
   3.0.0-alpha-2
   2.5.0
 Hadoop Flags: Reviewed
 Release Note: Allow zookeeper configuration for remote cluster in 
HFileOutputFormat2.
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Support configuration override for remote cluster of HFileOutputFormat 
> locality sensitive
> -
>
> Key: HBASE-26196
> URL: https://issues.apache.org/jira/browse/HBASE-26196
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.8.0, 3.0.0-alpha-2, 2.4.5
>Reporter: Shinya Yoshida
>Assignee: Shinya Yoshida
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6
>
>
> We introduced support to generate hfile with good locality for a remote 
> cluster even in HBASE-25608.
> I realized we need to override other configurations for the remote cluster in 
> addition to the zookeeper cluster key.
> For example, read from a non-secure cluster and write hfiles for a secure 
> cluster.
>  In this case, we use TableInputFormat for non-secure cluster with 
> hbase.security.authentication=simple in job configuration.
>  So HFileOutputFormat failed to connect to remote secure cluster because 
> requires hbase.security.authentication=kerberos in job conf.
>  
> Thus let's introduce configuration override for remote-cluster-aware 
> HFileOutputFormat locality-sensitive feature.
>  
> -Another example is to read from a secure cluster (A) and write hfiles for 
> another secure cluster (B) and we use different principal for each cluster.-
>  -For instance, we use cluster-a/_h...@example.com for A and 
> cluster-b/_h...@example.com for B.-
>  -Then we need to override MASTER_KRB_PRINCIPAL and 
> REGIONSERVER_KRB_PRINCIPAL using cluster-b/_h...@example.com to connect 
> cluster B.-
> ^ This is not true; we use token-based digest auth in the mapper/reducer, so a 
> principal difference for kerberos should be fine



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26198) RegionServer dead on hadoop 3.3.1: NoSuchMethodError LocatedBlocks.getLocations()

2021-08-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399824#comment-17399824
 ] 

Michael Stack commented on HBASE-26198:
---

Interesting. I tested 3.3.1. Which hbase version is this? Thanks.

> RegionServer dead on hadoop 3.3.1: NoSuchMethodError 
> LocatedBlocks.getLocations()
> -
>
> Key: HBASE-26198
> URL: https://issues.apache.org/jira/browse/HBASE-26198
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: mengqi
>Priority: Major
> Attachments: 4ad46153842c29898189b90fc986925c87966ce6.diff, 
> image-2021-08-16-16-24-32-418.png
>
>
> !image-2021-08-16-16-24-32-418.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-26198) RegionServer dead on hadoop 3.3.1: NoSuchMethodError LocatedBlocks.getLocations()

2021-08-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26198:
--
Summary: RegionServer dead on hadoop 3.3.1: NoSuchMethodError 
LocatedBlocks.getLocations()  (was: regionserver dead on hadoop 3.3.1)

> RegionServer dead on hadoop 3.3.1: NoSuchMethodError 
> LocatedBlocks.getLocations()
> -
>
> Key: HBASE-26198
> URL: https://issues.apache.org/jira/browse/HBASE-26198
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: mengqi
>Priority: Major
> Attachments: 4ad46153842c29898189b90fc986925c87966ce6.diff, 
> image-2021-08-16-16-24-32-418.png
>
>
> !image-2021-08-16-16-24-32-418.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24286) HMaster won't become healthy after after cloning or creating a new cluster pointing at the same file system

2021-08-14 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399210#comment-17399210
 ] 

Michael Stack commented on HBASE-24286:
---

Linking HBASE-26193... has a nice summary of what the issue is here by [~zyork] 
 w/ some color added by [~zhangduo]

> HMaster won't become healthy after after cloning or creating a new cluster 
> pointing at the same file system
> ---
>
> Key: HBASE-24286
> URL: https://issues.apache.org/jira/browse/HBASE-24286
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Affects Versions: 3.0.0-alpha-1, 2.2.3, 2.2.4, 2.2.5
>Reporter: Jack Ye
>Assignee: Tak-Lon (Stephen) Wu
>Priority: Major
>
> h1. How to reproduce:
>  # user starts an HBase cluster on top of a file system
>  # user performs some operations and shuts down the cluster, all the data are 
> still persisted in the file system
>  # user creates a new HBase cluster using a different set of servers on top 
> of the same file system with the same root directory
>  # HMaster cannot initialize
> h1. Root cause:
> During HMaster initialization phase, the following happens:
>  # HMaster waits for namespace table online
>  # AssignmentManager gets all namespace table regions info
>  # region servers of namespace table are already dead, online check fails
>  # HMaster waits for the namespace regions to come online, retrying 1000 times, 
> which effectively means forever
> Code waiting for namespace table to be online: 
> https://github.com/apache/hbase/blob/rel/2.2.3/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1102
> h1. Stack trace (running on S3):
> 2020-04-23 08:15:57,185 WARN [master/ip-10-12-13-14:16000:becomeActiveMaster] 
> master.HMaster: 
> hbase:namespace,,1587628169070.d34b65b91a52644ed3e77c5fbb065c2b. is NOT 
> online; state=\{d34b65b91a52644ed3e77c5fbb065c2b state=OPEN, 
> ts=1587629742129, server=ip-10-12-13-14.ec2.internal,16020,1587628031614}; 
> ServerCrashProcedures=false. Master startup cannot progress, in 
> holding-pattern until region onlined.
> where ip-10-12-13-14.ec2.internal is the old region server hosting the region 
> of hbase:namespace.
> h1. Discussion for the fix
> We see there is a fix for this at branch-3: 
> https://issues.apache.org/jira/browse/HBASE-21154. Before we provide a patch, 
> we would like to know from the community if we should backport this change to 
> branch-2, or if we should just perform a fix with minimum code change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26193) Do not store meta region location on zookeeper

2021-08-12 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398424#comment-17398424
 ] 

Michael Stack commented on HBASE-26193:
---

{quote}But obviously that's a different problem than this is trying to solve.
{quote}
Thanks [~zyork]  My bad. Thought it would help (Would be kinda silly though if 
the Master renewed a meta location that was for a server that was not a member 
of the current cluster ... can look more if this goes in).

> Do not store meta region location on zookeeper
> --
>
> Key: HBASE-26193
> URL: https://issues.apache.org/jira/browse/HBASE-26193
> Project: HBase
>  Issue Type: Improvement
>  Components: meta, Zookeeper
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> As it breaks one of our design rules
> https://hbase.apache.org/book.html#design.invariants.zk.data
> We used to think hbase should be recovered automatically when all the data on 
> zk (except the replication data) are cleared, but obviously, if you clear the 
> meta region location, the cluster will be in trouble, and need to use 
> operation tools to recover the cluster.
> So here, along with the ConnectionRegistry improvements, we should also 
> consider move the meta region location off zookeeper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26193) Do not store meta region location on zookeeper

2021-08-12 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398341#comment-17398341
 ] 

Michael Stack commented on HBASE-26193:
---

[~zyork] Pardon. I was flagging you for the first part of the doc, i.e. a 
fix so a populated hbase could start even though zk is tabula rasa... (The second 
part of the doc seems to be an active working doc whose product would be a 
proposal over on the split meta Jira – will let Duo clarify).

> Do not store meta region location on zookeeper
> --
>
> Key: HBASE-26193
> URL: https://issues.apache.org/jira/browse/HBASE-26193
> Project: HBase
>  Issue Type: Improvement
>  Components: meta, Zookeeper
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> As it breaks one of our design rules
> https://hbase.apache.org/book.html#design.invariants.zk.data
> We used to think hbase should be recovered automatically when all the data on 
> zk (except the replication data) are cleared, but obviously, if you clear the 
> meta region location, the cluster will be in trouble, and need to use 
> operation tools to recover the cluster.
> So here, along with the ConnectionRegistry improvements, we should also 
> consider move the meta region location off zookeeper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-26193) Do not store meta region location on zookeeper

2021-08-12 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398288#comment-17398288
 ] 

Michael Stack edited comment on HBASE-26193 at 8/12/21, 9:02 PM:
-

+1 on your suggestion of keeping meta location in master-local-region w/ the 
master publishing to zk on restart.

Thanks for looking into this.

(FYI [~zyork] )


was (Author: stack):
+1 on your suggestion of keeping meta location in master-local-region w/ the 
master publishing to zk on restart.

> Do not store meta region location on zookeeper
> --
>
> Key: HBASE-26193
> URL: https://issues.apache.org/jira/browse/HBASE-26193
> Project: HBase
>  Issue Type: Improvement
>  Components: meta, Zookeeper
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> As it breaks one of our design rules
> https://hbase.apache.org/book.html#design.invariants.zk.data
> We used to think hbase should be recovered automatically when all the data on 
> zk (except the replication data) are cleared, but obviously, if you clear the 
> meta region location, the cluster will be in trouble, and need to use 
> operation tools to recover the cluster.
> So here, along with the ConnectionRegistry improvements, we should also 
> consider move the meta region location off zookeeper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26193) Do not store meta region location on zookeeper

2021-08-12 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398288#comment-17398288
 ] 

Michael Stack commented on HBASE-26193:
---

+1 on your suggestion of keeping meta location in master-local-region w/ the 
master publishing to zk on restart.

> Do not store meta region location on zookeeper
> --
>
> Key: HBASE-26193
> URL: https://issues.apache.org/jira/browse/HBASE-26193
> Project: HBase
>  Issue Type: Improvement
>  Components: meta, Zookeeper
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> As it breaks one of our design rules
> https://hbase.apache.org/book.html#design.invariants.zk.data
> We used to think hbase should be recovered automatically when all the data on 
> zk (except the replication data) are cleared, but obviously, if you clear the 
> meta region location, the cluster will be in trouble, and need to use 
> operation tools to recover the cluster.
> So here, along with the ConnectionRegistry improvements, we should also 
> consider move the meta region location off zookeeper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26191) Annotate shaded generated protobuf as InterfaceAudience.Private

2021-08-11 Thread Michael Stack (Jira)
Michael Stack created HBASE-26191:
-

 Summary: Annotate shaded generated protobuf as 
InterfaceAudience.Private
 Key: HBASE-26191
 URL: https://issues.apache.org/jira/browse/HBASE-26191
 Project: HBase
  Issue Type: Task
  Components: Coprocessors, Protobufs
Reporter: Michael Stack


Annotate generated shaded protobufs as InterfaceAudience.Private. It might not 
be possible to add the annotation to each class; at a minimum, update the doc on 
our story around shaded internal protobufs.

See the prompting mailing list discussion here: 
[https://lists.apache.org/thread.html/r9e6eb11106727d245f6eb2a5023823901637971d6ed0f0aedaf8d149%40%3Cdev.hbase.apache.org%3E]

So far the consensus has it that the shaded generated protobuf should be made 
IA.Private.  Will wait on it to settle.
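
For reference, the annotation in question is the usual Yetus audience marker. A
minimal sketch of what marking a class IA.Private looks like (sketch only; the
real targets are generated classes, so this would have to be applied at
generation time or be covered by doc instead):

{code:java}
import org.apache.yetus.audience.InterfaceAudience;

// Illustrative class name only; the actual targets are the generated shaded
// protobuf classes, which cannot easily be hand-edited.
@InterfaceAudience.Private
public final class ExampleShadedProtoHolder {
  private ExampleShadedProtoHolder() {
  }
}
{code}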



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-16756) InterfaceAudience annotate our protobuf; distinguish internal; publish public

2021-08-11 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-16756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-16756.
---
Resolution: Won't Fix

Not doing this.

> InterfaceAudience annotate our protobuf; distinguish internal; publish public
> -
>
> Key: HBASE-16756
> URL: https://issues.apache.org/jira/browse/HBASE-16756
> Project: HBase
>  Issue Type: Task
>  Components: Protobufs
>Reporter: Michael Stack
>Priority: Major
>
> This is a follow-on from the work done over in HBASE-15638 Shade protobuf.
> Currently protobufs are not annotated as our java classes are even though 
> they are being used by downstream Coprocessor Endpoints; i.e. if a CPEP wants 
> to update a Cell in HBase or refer to a server in the cluster, 9 times out of 
> 10 they will depend on the HBase Cell.proto and its generated classes or the 
> ServerName definition in HBase.proto file.
> This makes it so we cannot make breaking changes to the Cell type or relocate 
> the ServerName definition to another file if we want CPEPs to keep working.
> The issue gets compounded by HBASE-15638 "Shade protobuf" where protos used 
> internally are relocated, and given another package name altogether. 
> Currently we leave behind the old protos (sort-of duplicated) so CPEPs keep 
> working but going forward, IF WE CONTINUE DOWN THIS PATH OF SHADING PROTOS 
> (we may revisit if hadoop ends up isolating its classpath), then we need to 
> 'publish' protos that we will honor as we would classes annotate with 
> @InterfaceAudience.Public as part of our public API going forward.
> What is involved is a review of the current protos under hbase-protocol. Sort 
> out what is to be made public. We will likely have to break up current proto 
> files into smaller collections since they currently contain mixes of public 
> and private types. Deprecate the fat Admin and Client protos.  This will 
> allow us to better narrow the set of what we make public. These new files 
> could live in the hbase-protocol module suitably annotated or they could be 
> done up in a new module altogether. TODO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-16756) InterfaceAudience annotate our protobuf; distinguish internal; publish public

2021-08-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-16756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397789#comment-17397789
 ] 

Michael Stack commented on HBASE-16756:
---

{quote}I'm not sure what would constitute "Fixed" on this issue, it looks to 
mostly document some decisions that have already been made.
{quote}
I think this issue had grand notions of a curated list of pbs that could 
somehow be consumed by downstreamers freely, marked w/ interfaceaudience.public. 
It hasn't happened. Not sure it is even possible given pb2 vs pb3 and that our 
protos were a glob, probably hard to disentangle into public/private. Let me 
resolve as "won't implement".

Since then, hbase-protocol has been removed for hbase3 (see the nice release note 
on HBASE-23797). Let me follow your suggestion of underlining pbs as 
limitedprivate in the doc over in another issue [~bbeaudreault]

> InterfaceAudience annotate our protobuf; distinguish internal; publish public
> -
>
> Key: HBASE-16756
> URL: https://issues.apache.org/jira/browse/HBASE-16756
> Project: HBase
>  Issue Type: Task
>  Components: Protobufs
>Reporter: Michael Stack
>Priority: Major
>
> This is a follow-on from the work done over in HBASE-15638 Shade protobuf.
> Currently protobufs are not annotated as our java classes are even though 
> they are being used by downstream Coprocessor Endpoints; i.e. if a CPEP wants 
> to update a Cell in HBase or refer to a server in the cluster, 9 times out of 
> 10 they will depend on the HBase Cell.proto and its generated classes or the 
> ServerName definition in HBase.proto file.
> This makes it so we cannot make breaking changes to the Cell type or relocate 
> the ServerName definition to another file if we want CPEPs to keep working.
> The issue gets compounded by HBASE-15638 "Shade protobuf" where protos used 
> internally are relocated, and given another package name altogether. 
> Currently we leave behind the old protos (sort-of duplicated) so CPEPs keep 
> working but going forward, IF WE CONTINUE DOWN THIS PATH OF SHADING PROTOS 
> (we may revisit if hadoop ends up isolating its classpath), then we need to 
> 'publish' protos that we will honor as we would classes annotate with 
> @InterfaceAudience.Public as part of our public API going forward.
> What is involved is a review of the current protos under hbase-protocol. Sort 
> out what is to be made public. We will likely have to break up current proto 
> files into smaller collections since they currently contain mixes of public 
> and private types. Deprecate the fat Admin and Client protos.  This will 
> allow us to better narrow the set of what we make public. These new files 
> could live in the hbase-protocol module suitably annotated or they could be 
> done up in a new module altogether. TODO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-16756) InterfaceAudience annotate our protobuf; distinguish internal; publish public

2021-08-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-16756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397772#comment-17397772
 ] 

Michael Stack commented on HBASE-16756:
---

HBASE-23797 removes hbase-protocol; CPEPs are to use shaded protos (see release 
note on the issue).
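
Roughly, the change for a CPEP is a package swap onto the shaded generated
classes and the relocated protobuf runtime; a minimal sketch (class and method
names here are only illustrative of the pattern):

{code:java}
// Before (hbase-protocol, removed in hbase3):
//   import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;
// After (hbase-protocol-shaded):
import org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos;
import org.apache.hbase.thirdparty.com.google.protobuf.ByteString;

public class ShadedProtoUsageSketch {
  // Builds a request against the shaded generated types.
  public static ClientProtos.Get buildGet(String row) {
    return ClientProtos.Get.newBuilder().setRow(ByteString.copyFromUtf8(row)).build();
  }
}
{code}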

> InterfaceAudience annotate our protobuf; distinguish internal; publish public
> -
>
> Key: HBASE-16756
> URL: https://issues.apache.org/jira/browse/HBASE-16756
> Project: HBase
>  Issue Type: Task
>  Components: Protobufs
>Reporter: Michael Stack
>Priority: Major
>
> This is a follow-on from the work done over in HBASE-15638 Shade protobuf.
> Currently protobufs are not annotated as our java classes are even though 
> they are being used by downstream Coprocessor Endpoints; i.e. if a CPEP wants 
> to update a Cell in HBase or refer to a server in the cluster, 9 times out of 
> 10 they will depend on the HBase Cell.proto and its generated classes or the 
> ServerName definition in HBase.proto file.
> This makes it so we cannot make breaking changes to the Cell type or relocate 
> the ServerName definition to another file if we want CPEPs to keep working.
> The issue gets compounded by HBASE-15638 "Shade protobuf" where protos used 
> internally are relocated, and given another package name altogether. 
> Currently we leave behind the old protos (sort-of duplicated) so CPEPs keep 
> working but going forward, IF WE CONTINUE DOWN THIS PATH OF SHADING PROTOS 
> (we may revisit if hadoop ends up isolating its classpath), then we need to 
> 'publish' protos that we will honor as we would classes annotate with 
> @InterfaceAudience.Public as part of our public API going forward.
> What is involved is a review of the current protos under hbase-protocol. Sort 
> out what is to be made public. We will likely have to break up current proto 
> files into smaller collections since they currently contain mixes of public 
> and private types. Deprecate the fat Admin and Client protos.  This will 
> allow us to better narrow the set of what we make public. These new files 
> could live in the hbase-protocol module suitably annotated or they could be 
> done up in a new module altogether. TODO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26185) Fix TestMaster#testMoveRegionWhenNotInitialized with hbase.min.version.move.system.tables

2021-08-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397471#comment-17397471
 ] 

Michael Stack commented on HBASE-26185:
---

Thanks [~vjasani]

> Fix TestMaster#testMoveRegionWhenNotInitialized with 
> hbase.min.version.move.system.tables
> -
>
> Key: HBASE-26185
> URL: https://issues.apache.org/jira/browse/HBASE-26185
> Project: HBase
>  Issue Type: Test
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.4.6, 2.3.7
>
>
> In order to protect against unexpected meta region movement during upgrade with 
> rsGroup enabled, it is good practice to keep 
> hbase.min.version.move.system.tables in hbase-default for the specific branch so 
> that the use-case for the specific version of HBase is well under control. 
> However, TestMaster#testMoveRegionWhenNotInitialized would fail because it 
> would not find a server to move meta to. We should fix this.
>  
> {code:java}
> INFO  [Time-limited test] master.HMaster(2029): Passed destination servername 
> is null/empty so choosing a server at random
> java.lang.UnsupportedOperationExceptionjava.lang.UnsupportedOperationException
>  at java.util.AbstractList.add(AbstractList.java:148) at 
> java.util.AbstractList.add(AbstractList.java:108) at 
> org.apache.hadoop.hbase.master.HMaster.move(HMaster.java:2031) at 
> org.apache.hadoop.hbase.master.TestMaster.testMoveRegionWhenNotInitialized(TestMaster.java:181)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {code}
>  
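
For anyone puzzling over the stack trace: java.util.AbstractList.add throws
UnsupportedOperationException by default, so a fixed-size or immutable List
(e.g. from Arrays.asList or Collections.emptyList) fails exactly like this when
something tries to add to it. A minimal demo, outside of HBase code:

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnsupportedAddDemo {
  public static void main(String[] args) {
    // Arrays.asList returns a fixed-size list; add() falls through to
    // AbstractList.add, which throws UnsupportedOperationException.
    List<String> fixedSize = Arrays.asList("rs1.example.com,16020,1");
    try {
      fixedSize.add("rs2.example.com,16020,1");
    } catch (UnsupportedOperationException e) {
      System.out.println("fixed-size list: " + e);
    }

    // Collections.emptyList is immutable and behaves the same way.
    List<String> empty = Collections.emptyList();
    try {
      empty.add("rs1.example.com,16020,1");
    } catch (UnsupportedOperationException e) {
      System.out.println("immutable empty list: " + e);
    }
  }
}
{code}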



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26185) Fix TestMaster#testMoveRegionWhenNotInitialized with hbase.min.version.move.system.tables

2021-08-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397437#comment-17397437
 ] 

Michael Stack commented on HBASE-26185:
---

Wondering what's happening here. The issue is reopened; why, and what is to be 
done to re-resolve? Thanks. Is 
[https://github.com/apache/hbase/pull/3577] a backport?

> Fix TestMaster#testMoveRegionWhenNotInitialized with 
> hbase.min.version.move.system.tables
> -
>
> Key: HBASE-26185
> URL: https://issues.apache.org/jira/browse/HBASE-26185
> Project: HBase
>  Issue Type: Test
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.4.6, 2.3.7
>
>
> In order to protect against unexpected meta region movement during upgrade with 
> rsGroup enabled, it is good practice to keep 
> hbase.min.version.move.system.tables in hbase-default for the specific branch so 
> that the use-case for the specific version of HBase is well under control. 
> However, TestMaster#testMoveRegionWhenNotInitialized would fail because it 
> would not find a server to move meta to. We should fix this.
>  
> {code:java}
> INFO  [Time-limited test] master.HMaster(2029): Passed destination servername 
> is null/empty so choosing a server at random
> java.lang.UnsupportedOperationExceptionjava.lang.UnsupportedOperationException
>  at java.util.AbstractList.add(AbstractList.java:148) at 
> java.util.AbstractList.add(AbstractList.java:108) at 
> org.apache.hadoop.hbase.master.HMaster.move(HMaster.java:2031) at 
> org.apache.hadoop.hbase.master.TestMaster.testMoveRegionWhenNotInitialized(TestMaster.java:181)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26122) Limit max result size of individual Gets

2021-08-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397396#comment-17397396
 ] 

Michael Stack commented on HBASE-26122:
---

No harm. I reverted the commit so we can avoid having to do an addendum.

> Limit max result size of individual Gets
> 
>
> Key: HBASE-26122
> URL: https://issues.apache.org/jira/browse/HBASE-26122
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Scans have the ability to have a configured max result size, which causes 
> them to return a partial result once the limit has been reached. MultiGets 
> also can throw MultiActionResultTooLarge if the response size is over a 
> configured quota. Neither of these really accounts for a single Get of a 
> too-large row. Such too-large Gets can cause substantial GC pressure or worse 
> if sent at volume.
> Currently one can work around this by converting their Get to a single row 
> Scan, but this requires a developer to proactively know about and prepare for 
> the issue by using a Scan upfront or wait for the RegionServer to choke on a 
> large request and only then rewrite the Get for future requests.
> We should implement the same response size limits for Get as for Scan, 
> whereby the server returns a partial result to the client for handling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26097) Resolve dependency conflicts of hbase-endpoint third-party libraries

2021-08-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397071#comment-17397071
 ] 

Michael Stack commented on HBASE-26097:
---

Commented on the branch-2 PR.

> Resolve dependency conflicts of hbase-endpoint third-party libraries
> 
>
> Key: HBASE-26097
> URL: https://issues.apache.org/jira/browse/HBASE-26097
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase-operator-tools
>Affects Versions: 2.0.0, 2.0.6
>Reporter: zyxxoo
>Priority: Major
> Fix For: 2.0.6
>
> Attachments: screenshot-1.png
>
>
> Hi, our project uses “hbase-endpoint”, but it depends on protobuf 2.5, 
> which conflicts with protobuf 3.0 in our project. I want to contribute a 
> "hbase-shaded-endpoint" module to resolve the "hbase-endpoint" dependency 
> issue for version 2.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25859) Reference class incorrectly parses the protobuf magic marker

2021-08-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397067#comment-17397067
 ] 

Michael Stack commented on HBASE-25859:
---

[~catalin.luca] I closed the PR against branch-1.4. 1.4 is EOL'd. The PR 
applies to branch-1 and compiles but we think we've made our last release of 
branch-1 recently... it was 1.7.1.  FYI.

> Reference class incorrectly parses the protobuf magic marker
> 
>
> Key: HBASE-25859
> URL: https://issues.apache.org/jira/browse/HBASE-25859
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.4.1
>Reporter: Constantin-Catalin Luca
>Assignee: Constantin-Catalin Luca
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> The Reference class incorrectly parses the protobuf magic marker.
> It uses:
> {code:java}
> // DataInputStream.read(byte[lengthOfPNMagic]){code}
> but this call does not guarantee to read all the bytes of the marker.
>  The fix is the same as the one for 
> https://issues.apache.org/jira/browse/HBASE-25674
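
A minimal illustration of the read() vs readFully() difference the fix relies
on, outside of HBase code (the bytes below just stand in for the magic marker):

{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ReadFullyDemo {
  public static void main(String[] args) throws IOException {
    byte[] magic = new byte[] { 'P', 'B', 'U', 'F' }; // stand-in for the marker bytes

    // read(byte[]) may fill only part of the buffer; the return value must be
    // checked (and the call looped) to be safe.
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(magic))) {
      byte[] buf = new byte[magic.length];
      int n = in.read(buf);
      System.out.println("read() returned " + n + " byte(s); not guaranteed to be " + magic.length);
    }

    // readFully(byte[]) blocks until the whole buffer is filled, or throws
    // EOFException if the stream ends early.
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(magic))) {
      byte[] buf = new byte[magic.length];
      in.readFully(buf);
      System.out.println("readFully() filled all " + buf.length + " bytes");
    }
  }
}
{code}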



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26122) Limit max result size of individual Gets

2021-08-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397031#comment-17397031
 ] 

Michael Stack commented on HBASE-26122:
---

Pushed the branch-2 PR. Make the master PR the same as the branch-2 one and I'll 
merge it to master too [~bbeaudreault] (then we can resolve this JIRA). FYI, not 
putting this into 2.4... it adds API. Better to do this in a new minor version.

> Limit max result size of individual Gets
> 
>
> Key: HBASE-26122
> URL: https://issues.apache.org/jira/browse/HBASE-26122
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Scans have the ability to have a configured max result size, which causes 
> them to return a partial result once the limit has been reached. MultiGets 
> also can throw MultiActionResultTooLarge if the response size is over a 
> configured quota. Neither of these really accounts for a single Get of a 
> too-large row. Such too-large Gets can cause substantial GC pressure or worse 
> if sent at volume.
> Currently one can work around this by converting their Get to a single row 
> Scan, but this requires a developer to proactively know about and prepare for 
> the issue by using a Scan upfront or wait for the RegionServer to choke on a 
> large request and only then rewrite the Get for future requests.
> We should implement the same response size limits for Get as for Scan, 
> whereby the server returns a partial result to the client for handling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26122) Limit max result size of individual Gets

2021-08-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397033#comment-17397033
 ] 

Michael Stack commented on HBASE-26122:
---

Oh, put the shell example in as the release note. It's a nice illustration, I'd say.
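
While the release note is being written, here is roughly what the existing
workaround described above looks like in code, i.e. converting the Get into a
single-row Scan with a bounded result size (sketch only; table and row names are
made up):

{code:java}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleRowScanWorkaround {
  // Reads one (possibly huge) row in bounded chunks instead of a single Get.
  static void readBigRow(Connection conn) throws Exception {
    byte[] row = Bytes.toBytes("big-row");
    Scan scan = new Scan()
        .withStartRow(row)
        .withStopRow(row, true)            // single-row scan, stop row inclusive
        .setMaxResultSize(2 * 1024 * 1024) // cap each response at ~2MB
        .setAllowPartialResults(true);     // accept partial Results per RPC
    try (Table table = conn.getTable(TableName.valueOf("my_table"));
        ResultScanner scanner = table.getScanner(scan)) {
      for (Result partial : scanner) {
        // Process each (possibly partial) chunk of the row here.
      }
    }
  }
}
{code}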

> Limit max result size of individual Gets
> 
>
> Key: HBASE-26122
> URL: https://issues.apache.org/jira/browse/HBASE-26122
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Scans have the ability to have a configured max result size, which causes 
> them to return a partial result once the limit has been reached. MultiGets 
> also can throw MultiActionResultTooLarge if the response size is over a 
> configured quota. Neither of these really accounts for a single Get of a 
> too-large row. Such too-large Gets can cause substantial GC pressure or worse 
> if sent at volume.
> Currently one can work around this by converting their Get to a single row 
> Scan, but this requires a developer to proactively know about and prepare for 
> the issue by using a Scan upfront or wait for the RegionServer to choke on a 
> large request and only then rewrite the Get for future requests.
> We should implement the same response size limits for Get as for Scan, 
> whereby the server returns a partial result to the client for handling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-26122) Limit max result size of individual Gets

2021-08-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26122:
--
Fix Version/s: 2.5.0

> Limit max result size of individual Gets
> 
>
> Key: HBASE-26122
> URL: https://issues.apache.org/jira/browse/HBASE-26122
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Scans have the ability to have a configured max result size, which causes 
> them to return a partial result once the limit has been reached. MultiGets 
> also can throw MultiActionResultTooLarge if the response size is over a 
> configured quota. Neither of these really accounts for a single Get of a 
> too-large row. Such too-large Gets can cause substantial GC pressure or worse 
> if sent at volume.
> Currently one can work around this by converting their Get to a single row 
> Scan, but this requires a developer to proactively know about and prepare for 
> the issue by using a Scan upfront or wait for the RegionServer to choke on a 
> large request and only then rewrite the Get for future requests.
> We should implement the same response size limits for Get as for Scan, 
> whereby the server returns a partial result to the client for handling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-26122) Limit max result size of individual Gets

2021-08-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26122:
--
Fix Version/s: 3.0.0-alpha-2

> Limit max result size of individual Gets
> 
>
> Key: HBASE-26122
> URL: https://issues.apache.org/jira/browse/HBASE-26122
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>
> Scans have the ability to have a configured max result size, which causes 
> them to return a partial result once the limit has been reached. MultiGets 
> also can throw MultiActionResultTooLarge if the response size is over a 
> configured quota. Neither of these really accounts for a single Get of a 
> too-large row. Such too-large Gets can cause substantial GC pressure or worse 
> if sent at volume.
> Currently one can work around this by converting their Get to a single row 
> Scan, but this requires a developer to proactively know about and prepare for 
> the issue by using a Scan upfront or wait for the RegionServer to choke on a 
> large request and only then rewrite the Get for future requests.
> We should implement the same response size limits for Get as for Scan, 
> whereby the server returns a partial result to the client for handling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26149) Further improvements on ConnectionRegistry implementations

2021-08-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396887#comment-17396887
 ] 

Michael Stack commented on HBASE-26149:
---

The one-pager helped. Thanks. I put it here as the Jira description, copying to 
the sub-tasks' descriptions the text that was in the document but missing from 
the sub-task JIRAs' descriptions. Hopefully that makes it easier on others trying 
to follow along with what's going on here (put some questions on the document for 
my own clarification). Thanks.

> Further improvements on ConnectionRegistry implementations
> --
>
> Key: HBASE-26149
> URL: https://issues.apache.org/jira/browse/HBASE-26149
> Project: HBase
>  Issue Type: Umbrella
>  Components: Client
>Reporter: Duo Zhang
>Priority: Major
>
> (Copied in-line from the attached 'Documentation' with some filler as 
> connecting script)
> HBASE-23324 Deprecate clients that connect to Zookeeper
> ^^^ This is always our goal, to remove the zookeeper dependency from the 
> client side.
>  
> See the sub-task HBASE-25051 DIGEST based auth broken for MasterRegistry
> When constructing RpcClient, we will pass the clusterid in, and it will be 
> used to select the authentication method. More specifically, it will be used 
> to select the tokens for digest based authentication, please see the code in 
> BuiltInProviderSelector. For ZKConnectionRegistry, we do not need to use 
> RpcClient to connect to zookeeper, so we could get the cluster id first, and 
> then create the RpcClient. But for MasterRegistry/RpcConnectionRegistry, we 
> need to use RpcClient to connect to the ClientMetaService endpoints and then 
> we can call the getClusterId method to get the cluster id. Because of this, 
> when creating RpcClient for MasterRegistry/RpcConnectionRegistry, we can only 
> pass null or the default cluster id, which means the digest based 
> authentication is broken.
> This is a cyclic dependency problem. Maybe a possible way forward, is to make 
> getClusterId method available to all users, which means it does not require 
> any authentication, so we can always call getClusterId with simple 
> authentication, and then at client side, once we get the cluster id, we 
> create a new RpcClient to select the correct authentication way.
> The work in the sub-task, HBASE-26150 Let region server also carry 
> ClientMetaService, is work to make it so the RegionServers can carry a 
> ConnectionRegistry (rather than have the Masters-only carry it as is the case 
> now). Adds a new method getBootstrapNodes to ClientMetaService, the 
> ConnectionRegistry proto Service, for refreshing the bootstrap nodes 
> periodically or on error. The new *RpcConnectionRegistry*  [Created here but 
> defined in the next sub-task]will use this method to refresh the bootstrap 
> nodes, while the old MasterRegistry will use the getMasters method to refresh 
> the ‘bootstrap’ nodes.
> The getBootstrapNodes method will return all the region servers, so after the 
> first refreshing, the client will go to region servers for later rpc calls. 
> But since masters and region servers both implement the ClientMetaService 
> interface, it is free for the client to configure master as the initial 
> bootstrap nodes.
> The following sub-task then deprecates MasterRegistry, HBASE-26172 Deprecated 
> MasterRegistry
> The implementation of MasterRegistry is almost the same with 
> RpcConnectionRegistry except that it uses getMasters instead of 
> getBootstrapNodes to refresh the ‘bootstrap’ nodes connected to. So we could 
> add configs in server side to control what nodes we want to return to client 
> in getBootstrapNodes, i.e, master or region server, then the 
> RpcConnectionRegistry can fully replace the old MasterRegistry. Deprecates 
> the MasterRegistry.
> Sub-task HBASE-26173 Return only a sub set of region servers as bootstrap 
> nodes
> For a large cluster which may have thousands of region servers, it is not a 
> good idea to return all the region servers as bootstrap nodes to clients. So 
> we should add a config at server side to control the max number of bootstrap 
> nodes we want to return to clients. I think the default value could be 5 or 
> 10, which is enough.
> Sub-task HBASE-26174 Make rpc connection registry the default registry on 
> 3.0.0
> Just a follow up of HBASE-26172. MasterRegistry has been deprecated, we 
> should not make it default for 3.0.0 any more.
> Sub-task HBASE-26180 Introduce a initial refresh interval for 
> RpcConnectionRegistry
> As end users could configure any nodes in a cluster as the initial bootstrap 
> nodes, it is possible that different end users will configure the same 
> machine which makes the machine over load. So we should have a shorter delay 
> for the initial refresh, to let users quickly switch to the 

[jira] [Updated] (HBASE-26149) Further improvements on ConnectionRegistry implementations

2021-08-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26149:
--
Description: 
(Copied in-line from the attached 'Documentation' with some filler as 
connecting script)

HBASE-23324 Deprecate clients that connect to Zookeeper

^^^ This is always our goal, to remove the zookeeper dependency from the client 
side.

 

See the sub-task HBASE-25051 DIGEST based auth broken for MasterRegistry

When constructing RpcClient, we will pass the clusterid in, and it will be used 
to select the authentication method. More specifically, it will be used to 
select the tokens for digest based authentication, please see the code in 
BuiltInProviderSelector. For ZKConnectionRegistry, we do not need to use 
RpcClient to connect to zookeeper, so we could get the cluster id first, and 
then create the RpcClient. But for MasterRegistry/RpcConnectionRegistry, we 
need to use RpcClient to connect to the ClientMetaService endpoints and then we 
can call the getClusterId method to get the cluster id. Because of this, when 
creating RpcClient for MasterRegistry/RpcConnectionRegistry, we can only pass 
null or the default cluster id, which means the digest based authentication is 
broken.

This is a cyclic dependency problem. A possible way forward is to make the 
getClusterId method available to all users, which means it does not require any 
authentication; then we can always call getClusterId with simple authentication, 
and at the client side, once we get the cluster id, we create a new RpcClient 
that selects the correct authentication method.

The work in the sub-task, HBASE-26150 Let region server also carry 
ClientMetaService, is work to make it so the RegionServers can carry a 
ConnectionRegistry (rather than have the Masters-only carry it as is the case 
now). Adds a new method getBootstrapNodes to ClientMetaService, the 
ConnectionRegistry proto Service, for refreshing the bootstrap nodes 
periodically or on error. The new *RpcConnectionRegistry*  [Created here but 
defined in the next sub-task] will use this method to refresh the bootstrap 
nodes, while the old MasterRegistry will use the getMasters method to refresh 
the ‘bootstrap’ nodes.

The getBootstrapNodes method will return all the region servers, so after the 
first refreshing, the client will go to region servers for later rpc calls. But 
since masters and region servers both implement the ClientMetaService 
interface, it is free for the client to configure master as the initial 
bootstrap nodes.
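
For illustration, the client-side wiring for the new registry might look roughly
like the following. The config key names are assumed from the
RpcConnectionRegistry work and may change before release; the host names are
made up:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RpcRegistryClientSketch {
  public static Connection connect() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Assumed key: which ConnectionRegistry implementation the client uses.
    conf.set("hbase.client.registry.impl",
      "org.apache.hadoop.hbase.client.RpcConnectionRegistry");
    // Assumed key: initial bootstrap nodes; any ClientMetaService endpoints
    // (region servers and/or masters) should work here.
    conf.set("hbase.client.bootstrap.servers",
      "rs1.example.com:16020,rs2.example.com:16020,master1.example.com:16000");
    return ConnectionFactory.createConnection(conf);
  }
}
{code}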

The following sub-task then deprecates MasterRegistry, HBASE-26172 Deprecated 
MasterRegistry

The implementation of MasterRegistry is almost the same as 
RpcConnectionRegistry except that it uses getMasters instead of 
getBootstrapNodes to refresh the ‘bootstrap’ nodes connected to. So we could 
add configs on the server side to control what nodes we want to return to the 
client in getBootstrapNodes, i.e., master or region server; then the 
RpcConnectionRegistry can fully replace the old MasterRegistry. Deprecates the 
MasterRegistry.

Sub-task HBASE-26173 Return only a sub set of region servers as bootstrap nodes

For a large cluster which may have thousands of region servers, it is not a 
good idea to return all the region servers as bootstrap nodes to clients. So we 
should add a config at server side to control the max number of bootstrap nodes 
we want to return to clients. I think the default value could be 5 or 10, which 
is enough.

Sub-task HBASE-26174 Make rpc connection registry the default registry on 3.0.0

Just a follow up of HBASE-26172. MasterRegistry has been deprecated, we should 
not make it default for 3.0.0 any more.

Sub-task HBASE-26180 Introduce a initial refresh interval for 
RpcConnectionRegistry

As end users could configure any nodes in a cluster as the initial bootstrap 
nodes, it is possible that different end users will configure the same machine 
which can overload that machine. So we should have a shorter delay for the 
initial refresh, to let users quickly switch to the bootstrap nodes we want 
them to connect to.

Sub-task HBASE-26181 Region server and master could use itself as 
ConnectionRegistry

This is an optimization to reduce the pressure on zookeeper. For 
MasterRegistry, we do not want to use it as the ConnectionRegistry for our 
cluster connection because:

// We use ZKConnectionRegistry for all the internal communication, 
primarily for these reasons:

// - Decouples RS and master life cycles. RegionServers can continue be up 
independent of

//   masters' availability.

// - Configuration management for region servers (cluster internal) is much 
simpler when adding

//   new masters or removing existing masters, since only clients' config 
needs to be updated.

// - We need to retain ZKConnectionRegistry for replication use anyway, so 
we just extend it for

//   

[jira] [Updated] (HBASE-26181) Region server and master could use itself as ConnectionRegistry

2021-08-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26181:
--
Description: 
As they already cache everything for the connection registry in memory, the 
cluster connection can fetch the in-memory data directly instead of going to 
zookeeper again.

This is an optimization to reduce the pressure on zookeeper.

For MasterRegistry, we do not want to use it as the ConnectionRegistry for our 
cluster connection because:

// We use ZKConnectionRegistry for all the internal communication, primarily for these reasons:
// - Decouples RS and master life cycles. RegionServers can continue be up independent of
//   masters' availability.
// - Configuration management for region servers (cluster internal) is much simpler when adding
//   new masters or removing existing masters, since only clients' config needs to be updated.
// - We need to retain ZKConnectionRegistry for replication use anyway, so we just extend it for
//   other internal connections too.

The above comments are in our code, in the HRegionServer.cleanupConfiguration 
method.

But since masters and regionservers now both implement the ClientMetaService 
interface, we are free to just let the ConnectionRegistry make use of this 
in-memory information directly, instead of going to zookeeper again.

  was:As they already cached everything for connection registry in memory, the 
cluster connection can fetch the in memory data directly instead of go to 
zookeeper again.


> Region server and master could use itself as ConnectionRegistry
> ---
>
> Key: HBASE-26181
> URL: https://issues.apache.org/jira/browse/HBASE-26181
> Project: HBase
>  Issue Type: Sub-task
>  Components: master, regionserver
>Reporter: Duo Zhang
>Priority: Major
>
> As they already cached everything for connection registry in memory, the 
> cluster connection can fetch the in memory data directly instead of go to 
> zookeeper again.
> This is an optimization to reduce the pressure on zookeeper.
> For MasterRegistry, we do not want to use it as the ConnectionRegistry for 
> our cluster connection because:
> // We use ZKConnectionRegistry for all the internal communication, 
> primarily for these reasons:
> // - Decouples RS and master life cycles. RegionServers can continue be 
> up independent of
> //   masters' availability.
> // - Configuration management for region servers (cluster internal) is 
> much simpler when adding
> //   new masters or removing existing masters, since only clients' config 
> needs to be updated.
> // - We need to retain ZKConnectionRegistry for replication use anyway, 
> so we just extend it for
> //   other internal connections too.
> The above comments are in our code, in the HRegionServer.cleanupConfiguration 
> method.
> But since now, masters and regionservers both implement the ClientMetaService 
> interface, we are free to just let the ConnectionRegistry to make use of 
> these in memory information directly, instead of going to zookeeper again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-26180) Introduce a initial refresh interval for RpcConnectionRegistry

2021-08-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26180:
--
Description: 
So we can get the new list soon once we connect to the cluster.

As end users could configure any nodes in a cluster as the initial bootstrap 
nodes, it is possible that different end users will configure the same machine 
which can overload that machine. So we should have a shorter delay for the 
initial refresh, to let users quickly switch to the bootstrap nodes we want 
them to connect to.

  was:So we can get the new list soon once we connect to the cluster.


> Introduce a initial refresh interval for RpcConnectionRegistry
> --
>
> Key: HBASE-26180
> URL: https://issues.apache.org/jira/browse/HBASE-26180
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Duo Zhang
>Priority: Major
>
> So we can get the new list soon once we connect to the cluster.
> As end users could configure any nodes in a cluster as the initial bootstrap 
> nodes, it is possible that different end users will configure the same 
> machine, which can overload that machine. So we should have a shorter delay 
> for the initial refresh, to let users quickly switch to the bootstrap nodes 
> we want them to connect to.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-26172) Deprecate MasterRegistry and allow getBootstrapNodes to return master address instead of region server

2021-08-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26172:
--
Description: 
Maybe in some environment we still want to use master as registry endpoint, but 
this should be controlled at cluster side, not client side.

 

The implementation of MasterRegistry is almost the same as 
RpcConnectionRegistry except that it uses getMasters instead of 
getBootstrapNodes to refresh the ‘bootstrap’ nodes connected to. So we could 
add configs on the server side to control what nodes we want to return to the 
client in getBootstrapNodes, i.e., master or region server; then the 
RpcConnectionRegistry can fully replace the old MasterRegistry. As part of this 
change, we deprecate the MasterRegistry.

  was:
Maybe in some environment we still want to use master as registry endpoint, but 
this should be controlled at cluster side, not client side.

 

The implementation of MasterRegistry is almost the same with 
RpcConnectionRegistry except that it uses getMasters instead of 
getBootstrapNodes to refresh the ‘bootstrap’ nodes connected to. So we could 
add configs in server side to control what nodes we want to return to client in 
getBootstrapNodes, i.e, master or region server, then the RpcConnectionRegistry 
can fully replace the old MasterRegistry. So after this change, we could 
deprecate the MasterRegistry.


> Deprecate MasterRegistry and allow getBootstrapNodes to return master address 
> instead of region server
> --
>
> Key: HBASE-26172
> URL: https://issues.apache.org/jira/browse/HBASE-26172
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> Maybe in some environment we still want to use master as registry endpoint, 
> but this should be controlled at cluster side, not client side.
>  
> The implementation of MasterRegistry is almost the same with 
> RpcConnectionRegistry except that it uses getMasters instead of 
> getBootstrapNodes to refresh the ‘bootstrap’ nodes connected to. So we could 
> add configs in server side to control what nodes we want to return to client 
> in getBootstrapNodes, i.e, master or region server, then the 
> RpcConnectionRegistry can fully replace the old MasterRegistry. As part of 
> this change, we deprecate the MasterRegistry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-26172) Deprecate MasterRegistry and allow getBootstrapNodes to return master address instead of region server

2021-08-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26172:
--
Summary: Deprecate MasterRegistry and allow getBootstrapNodes to return 
master address instead of region server  (was: Deprecated MasterRegistry and 
allow getBootstrapNodes to return master address instead of region server)

> Deprecate MasterRegistry and allow getBootstrapNodes to return master address 
> instead of region server
> --
>
> Key: HBASE-26172
> URL: https://issues.apache.org/jira/browse/HBASE-26172
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> Maybe in some environment we still want to use master as registry endpoint, 
> but this should be controlled at cluster side, not client side.
>  
> The implementation of MasterRegistry is almost the same with 
> RpcConnectionRegistry except that it uses getMasters instead of 
> getBootstrapNodes to refresh the ‘bootstrap’ nodes connected to. So we could 
> add configs in server side to control what nodes we want to return to client 
> in getBootstrapNodes, i.e, master or region server, then the 
> RpcConnectionRegistry can fully replace the old MasterRegistry. So after this 
> change, we could deprecate the MasterRegistry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-26172) Deprecated MasterRegistry and allow getBootstrapNodes to return master address instead of region server

2021-08-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26172:
--
Description: 
Maybe in some environment we still want to use master as registry endpoint, but 
this should be controlled at cluster side, not client side.

 

The implementation of MasterRegistry is almost the same with 
RpcConnectionRegistry except that it uses getMasters instead of 
getBootstrapNodes to refresh the ‘bootstrap’ nodes connected to. So we could 
add configs in server side to control what nodes we want to return to client in 
getBootstrapNodes, i.e, master or region server, then the RpcConnectionRegistry 
can fully replace the old MasterRegistry. So after this change, we could 
deprecate the MasterRegistry.

  was:Maybe in some environment we still want to use master as registry 
endpoint, but this should be controlled at cluster side, not client side.


> Deprecated MasterRegistry and allow getBootstrapNodes to return master 
> address instead of region server
> ---
>
> Key: HBASE-26172
> URL: https://issues.apache.org/jira/browse/HBASE-26172
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> Maybe in some environment we still want to use master as registry endpoint, 
> but this should be controlled at cluster side, not client side.
>  
> The implementation of MasterRegistry is almost the same with 
> RpcConnectionRegistry except that it uses getMasters instead of 
> getBootstrapNodes to refresh the ‘bootstrap’ nodes connected to. So we could 
> add configs in server side to control what nodes we want to return to client 
> in getBootstrapNodes, i.e, master or region server, then the 
> RpcConnectionRegistry can fully replace the old MasterRegistry. So after this 
> change, we could deprecate the MasterRegistry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-26149) Further improvements on ConnectionRegistry implementations

2021-08-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26149:
--
Description: 
(Copied in-line from the attached 'Documentation' with some filler as 
connecting script)

HBASE-23324 Deprecate clients that connect to Zookeeper

^^^ This is always our goal, to remove the zookeeper dependency from the client 
side.

 

See the sub-task HBASE-25051 DIGEST based auth broken for MasterRegistry

When constructing RpcClient, we will pass the clusterid in, and it will be used 
to select the authentication method. More specifically, it will be used to 
select the tokens for digest based authentication, please see the code in 
BuiltInProviderSelector. For ZKConnectionRegistry, we do not need to use 
RpcClient to connect to zookeeper, so we could get the cluster id first, and 
then create the RpcClient. But for MasterRegistry/RpcConnectionRegistry, we 
need to use RpcClient to connect to the ClientMetaService endpoints and then we 
can call the getClusterId method to get the cluster id. Because of this, when 
creating RpcClient for MasterRegistry/RpcConnectionRegistry, we can only pass 
null or the default cluster id, which means the digest based authentication is 
broken.

This is a cyclic dependency problem. A possible way forward is to make the 
getClusterId method available to all users, which means it does not require any 
authentication; then we can always call getClusterId with simple authentication, 
and at the client side, once we get the cluster id, we create a new RpcClient 
that selects the correct authentication method.

The work in the sub-task, HBASE-26150 Let region server also carry 
ClientMetaService, is work to make it so the RegionServers can carry a 
ConnectionRegistry (rather than have the Masters-only carry it as is the case 
now). Adds a new method getBootstrapNodes to ClientMetaService, the 
ConnectionRegistry proto Service, for refreshing the bootstrap nodes 
periodically or on error. The new *RpcConnectionRegistry*  [Created here but 
defined in the next sub-task] will use this method to refresh the bootstrap 
nodes, while the old MasterRegistry will use the getMasters method to refresh 
the ‘bootstrap’ nodes.

The getBootstrapNodes method will return all the region servers, so after the 
first refreshing, the client will go to region servers for later rpc calls. But 
since masters and region servers both implement the ClientMetaService 
interface, it is free for the client to configure master as the initial 
bootstrap nodes.

HBASE-26172 Deprecated MasterRegistry and allow getBootstrapNodes to return 
master address instead of region server

The implementation of MasterRegistry is almost the same with 
RpcConnectionRegistry except that it uses getMasters instead of 
getBootstrapNodes to refresh the ‘bootstrap’ nodes connected to. So we could 
add configs in server side to control what nodes we want to return to client in 
getBootstrapNodes, i.e, master or region server, then the RpcConnectionRegistry 
can fully replace the old MasterRegistry. So after this change, we could 
deprecate the MasterRegistry.
h1. HBASE-26173 Return only a sub set of region servers as bootstrap nodes

For a large cluster which may have thousands of region servers, it is not a 
good idea to return all the region servers as bootstrap nodes to clients. So we 
should add a config at server side to control the max number of bootstrap nodes 
we want to return to clients. I think the default value could be 5 or 10, which 
is enough.
h1. HBASE-26174 Make rpc connection registry the default registry on 3.0.0

Just a follow up of HBASE-26172. MasterRegistry has been deprecated, we should 
not make it default for 3.0.0 any more.
h1. HBASE-26180 Introduce a initial refresh interval for RpcConnectionRegistry

As end users could configure any nodes in a cluster as the initial bootstrap 
nodes, it is possible that different end users will configure the same machine 
which can overload that machine. So we should have a shorter delay for the 
initial refresh, to let users quickly switch to the bootstrap nodes we want 
them to connect to.
h1. HBASE-26181 Region server and master could use itself as ConnectionRegistry

This is an optimization to reduce the pressure on zookeeper. For 
MasterRegistry, we do not want to use it as the ConnectionRegistry for our 
cluster connection because:

// We use ZKConnectionRegistry for all the internal communication, 
primarily for these reasons:

// - Decouples RS and master life cycles. RegionServers can continue be up 
independent of

//   masters' availability.

// - Configuration management for region servers (cluster internal) is much 
simpler when adding

//   new masters or removing existing masters, since only clients' config 
needs to be updated.

// - We need to retain ZKConnectionRegistry for replication use anyway, so 
we just 

[jira] [Updated] (HBASE-26150) Let region server also carry ClientMetaService

2021-08-10 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26150:
--
Description: 
Usually region servers will be deployed on machines which are more powerful than 
masters, so it will be good to let region servers carry more load.
h2. Background

This is preliminary work needed for one of the [splittable meta 
designs|https://docs.google.com/document/d/11ChsSb2LGrSzrSJz8pDCAw5IewmaMV0ZDN1LrMkAj4s/edit#heading=h.90th11txi153].
 We want to hide the implementation of ROOT from the client side, by adding new 
methods in ClientMetaService for locating meta Regions.


One of the concerns is that the original implementation of the Master Registry 
over in HBASE-18095 puts master inline in the normal read/write path, which is 
not always a good choice. This issue aims to move the ConnectionRegistry off of 
the masters.

In this issue, we let the region server implement the ClientMetaService 
interface, so the client could also get the cluster id, active master, meta 
region location, etc. from RegionServers (and not just Masters as it is 
currently).


We introduce a new method getBootstrapNodes in ClientMetaService, for 
refreshing the bootstrap nodes periodically or on error. The new 
RpcConnectionRegistry will use this method to refresh the bootstrap nodes, 
while the old MasterRegistry will use the getMasters method to refresh the 
‘bootstrap’ nodes.

The getBootstrapNodes method will return all the region servers, so after the 
first refreshing, the client will go to region servers for later rpc calls. But 
since masters and region servers both implement the ClientMetaService 
interface, it is free for the client to configure master as the initial 
bootstrap nodes.

  was:Usually region server will be deployed on machines which are more 
powerful than masters, so it will be good to let region servers carries more 
load.


> Let region server also carry ClientMetaService
> --
>
> Key: HBASE-26150
> URL: https://issues.apache.org/jira/browse/HBASE-26150
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, meta
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Usually region servers will be deployed on machines which are more powerful 
> than masters, so it will be good to let region servers carry more load.
> h2. Background
> This is preliminary work needed for one of the [splittable meta 
> designs|https://docs.google.com/document/d/11ChsSb2LGrSzrSJz8pDCAw5IewmaMV0ZDN1LrMkAj4s/edit#heading=h.90th11txi153].
>  We want to hide the implementation of ROOT from the client side, by adding 
> new methods in ClientMetaService for locating meta Regions.
> One of the concerns is that the original implementation of the Master 
> Registry over in HBASE-18095 puts the master inline in the normal read/write 
> path, which is not always a good choice. This issue aims to move the 
> ConnectionRegistry off of the masters.
> In this issue, we let the region server implement the ClientMetaService 
> interface, so the client could also get the cluster id, active master, meta 
> region location, etc. from RegionServers (and not just Masters as it is 
> currently).
> We introduce a new method getBootstrapNodes in ClientMetaService, for 
> refreshing the bootstrap nodes periodically or on error. The new 
> RpcConnectionRegistry will use this method to refresh the bootstrap nodes, 
> while the old MasterRegistry will use the getMasters method to refresh the 
> ‘bootstrap’ nodes.
> The getBootstrapNodes method will return all the region servers, so after the 
> first refreshing, the client will go to region servers for later rpc calls. 
> But since masters and region servers both implement the ClientMetaService 
> interface, it is free for the client to configure master as the initial 
> bootstrap nodes.
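
For illustration only, a minimal client-side sketch of what the description above 
implies. The property names used here ({{hbase.client.registry.impl}} and 
{{hbase.client.bootstrap.servers}}) are assumptions made for the sketch; check the 
hbase-default.xml shipped with your release before relying on them.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Sketch only: point the client at the RPC-based registry and give it an initial
// list of bootstrap nodes. Property names are assumptions, not confirmed by this issue.
public class RpcRegistryClientSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.client.registry.impl",
      "org.apache.hadoop.hbase.client.RpcConnectionRegistry");
    // Masters or region servers both work here, since both implement ClientMetaService.
    conf.set("hbase.client.bootstrap.servers",
      "rs1.example.com:16020,rs2.example.com:16020");
    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      // Later RPCs locate meta and user regions through the refreshed bootstrap nodes.
    }
  }
}
{code}

Since both masters and region servers implement ClientMetaService, the initial 
list can name either; the client then keeps it fresh via getBootstrapNodes as 
described above.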



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26175) MetricsHBaseServer should record all kinds of Exceptions

2021-08-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396735#comment-17396735
 ] 

Michael Stack commented on HBASE-26175:
---

Say more please [~Xiaolin Ha] ... I don't follow what it is that you are 
suggesting. Thanks.

> MetricsHBaseServer should record all kinds of Exceptions
> 
>
> Key: HBASE-26175
> URL: https://issues.apache.org/jira/browse/HBASE-26175
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Xiaolin Ha
>Priority: Major
> Attachments: RequestTooBigException.png
>
>
> We could define a kind of Exception counter, such as OtherExceptions, to record 
> exceptions that don't fall into the following kinds of exceptions. Only 
> debugging those exceptions via LOG.debug("Unknown exception type", throwable); 
> is not helpful for finding errors.
> {code:java}
> if (throwable != null) {
>   if (throwable instanceof OutOfOrderScannerNextException) {
> source.outOfOrderException();
>   } else if (throwable instanceof RegionTooBusyException) {
> source.tooBusyException();
>   } else if (throwable instanceof UnknownScannerException) {
> source.unknownScannerException();
>   } else if (throwable instanceof ScannerResetException) {
> source.scannerResetException();
>   } else if (throwable instanceof RegionMovedException) {
> source.movedRegionException();
>   } else if (throwable instanceof NotServingRegionException) {
> source.notServingRegionException();
>   } else if (throwable instanceof FailedSanityCheckException) {
> source.failedSanityException();
>   } else if (throwable instanceof MultiActionResultTooLarge) {
> source.multiActionTooLargeException();
>   } else if (throwable instanceof CallQueueTooBigException) {
> source.callQueueTooBigException();
>   } else if (throwable instanceof QuotaExceededException) {
> source.quotaExceededException();
>   } else if (throwable instanceof RpcThrottlingException) {
> source.rpcThrottlingException();
>   } else if (LOG.isDebugEnabled()) {
> LOG.debug("Unknown exception type", throwable);
>   }
> }
> {code}
> !RequestTooBigException.png|width=787,height=336!
>  
>  
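
Below is a standalone sketch of the catch-all counter the description suggests. 
The {{otherExceptions}} counter is hypothetical (the real change would add an 
equivalent method to MetricsHBaseServerSource); the class only illustrates the idea.

{code:java}
import java.util.concurrent.atomic.LongAdder;

import org.apache.hadoop.hbase.RegionTooBusyException;

// Sketch only: count every exception that falls through the known instanceof checks,
// so unknown error types show up in metrics instead of only in a debug log.
public class ExceptionMetricsSketch {
  private final LongAdder tooBusyExceptions = new LongAdder();
  private final LongAdder otherExceptions = new LongAdder();

  public void record(Throwable throwable) {
    if (throwable == null) {
      return;
    }
    if (throwable instanceof RegionTooBusyException) {
      // Stands in for one of the existing per-type counters, e.g. source.tooBusyException().
      tooBusyExceptions.increment();
    } else {
      // Stands in for the proposed catch-all, e.g. a new source.otherExceptions().
      otherExceptions.increment();
    }
  }

  public long otherExceptionCount() {
    return otherExceptions.sum();
  }
}
{code}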



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26183) Size of the Result object while querying huge data from HBASE table

2021-08-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396723#comment-17396723
 ] 

Michael Stack commented on HBASE-26183:
---

Do you have a bound on the size of the JSON object? Do you have to fetch it all 
in one go? Is the OOME on the client side or on the server side?

Also, this sort of query is best suited to the user mailing list rather than 
here in JIRA. Mind posting there? Thanks.

> Size of the Result object while querying huge data from HBASE table
> ---
>
> Key: HBASE-26183
> URL: https://issues.apache.org/jira/browse/HBASE-26183
> Project: HBase
>  Issue Type: New Feature
>  Components: scan
>Affects Versions: 1.1.13
>Reporter: shriram
>Priority: Major
>  Labels: performance
>
>  
> I am trying to query an HBase table with rowkeys. We have the following structure:
>  * an index table which has the rowkeys of the actual table
>  * the actual table which contains JSON data in compressed format.
> When querying HBase, I first have to scan the index table for rowkeys using a 
> scan with some filters, which results in byte arrays (row keys). Once we have 
> obtained the rowkeys, we invoke get() with a list of Gets on the Table object. 
> Once obtained, we iterate over the results and prepare a list which contains 
> compressed JSON objects. Here we are not sure about the size and number of the 
> objects. If the number of objects is huge we may end up with an OOM. Do we have 
> any options to return an Iterator, or to buffer the results, so that we can 
> avoid the OOM?
>  {{for (byte[] rowkey : indexTableOutput) {
>   Get get = new Get(rowkey).addFamily(Bytes.toBytes(columnFamilty)).setMaxVersions(MAX_VERSIONS);
>   listOfget.add(get);
> }}}
> The above piece of code is used to retrieve the keys from the index table.
>  {{TableName tableName = TableName.valueOf("table1");
> Table tableObj = conn.getTable(tableName);
> Result[] results = tableObj.get(listOfget);}}
> From the above piece of code we have a few questions. Any help would be 
> appreciated.
>  * If we have a huge amount of data, will Result[] contain all the results?
>  * How can we return an iterator-like object so that we can leave processing to 
> the consumer, because keeping all the data and processing it will result in an OOM?
>  * Are there any other options to return a limited amount of data so that the 
> consumer can do its processing and continue?
> I could find that a ResultScanner is returned for Scan objects, but couldn't 
> find any other option for a list of Gets. Here we know the exact keys from the 
> index table.
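
One client-side way to bound memory here, sketched under the assumption that the 
rowkeys from the index scan are available up front, is to issue the multigets in 
fixed-size batches instead of a single large {{Table.get(List<Get>)}} call:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

// Sketch only: hold at most batchSize Results at a time and hand each batch to a
// consumer callback instead of materialising every row in one Result[].
public class BatchedGets {
  public interface ResultConsumer {
    void accept(Result[] batch) throws IOException;
  }

  public static void getInBatches(Table table, List<byte[]> rowKeys, int batchSize,
      ResultConsumer consumer) throws IOException {
    List<Get> batch = new ArrayList<>(batchSize);
    for (byte[] rowKey : rowKeys) {
      batch.add(new Get(rowKey));
      if (batch.size() == batchSize) {
        consumer.accept(table.get(batch)); // one bounded multiget per batch
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      consumer.accept(table.get(batch));
    }
  }
}
{code}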



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26171) MetricsRegionServer may throws NullPointerException

2021-08-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396409#comment-17396409
 ] 

Michael Stack commented on HBASE-26171:
---

How is IllegalStateException better than NPE? Thanks.

> MetricsRegionServer may throws NullPointerException
> ---
>
> Key: HBASE-26171
> URL: https://issues.apache.org/jira/browse/HBASE-26171
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zixuan Liu
>Priority: Major
>
> MetricsRegionServer may throw a NullPointerException when optional.get() is 
> called.
>  
>  
> {code:java}
> metricRegistry = 
> MetricRegistries.global().get(serverSource.getMetricRegistryInfo()).get();
>  
> {code}
>  
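
For reference, a small sketch of the safer lookup pattern being discussed (fail 
with a descriptive message rather than a bare NPE); the names here are 
illustrative only:

{code:java}
import java.util.Optional;

// Sketch only: surface a descriptive error instead of a NullPointerException when
// the metric registry lookup comes back empty.
public class RegistryLookupSketch {
  public static <T> T getOrFail(Optional<T> maybeRegistry, String registryName) {
    return maybeRegistry.orElseThrow(
      () -> new IllegalStateException("Metric registry not found: " + registryName));
  }
}
{code}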



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26122) Limit max result size of individual Gets

2021-08-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396407#comment-17396407
 ] 

Michael Stack commented on HBASE-26122:
---

The above shell example would make a nice release note. I added some comments 
on the PR.

> Limit max result size of individual Gets
> 
>
> Key: HBASE-26122
> URL: https://issues.apache.org/jira/browse/HBASE-26122
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>
> Scans have the ability to have a configured max result size, which causes 
> them to return a partial result once the limit has been reached. MultiGets 
> also can throw MultiActionResultTooLarge if the response size is over a 
> configured quota. Neither of these really accounts for a single Get of a 
> too-large row. Such too-large Gets can cause substantial GC pressure or worse 
> if sent at volume.
> Currently one can work around this by converting their Get to a single row 
> Scan, but this requires a developer to proactively know about and prepare for 
> the issue by using a Scan upfront or wait for the RegionServer to choke on a 
> large request and only then rewrite the Get for future requests.
> We should implement the same response size limits for Get as for Scan, 
> whereby the server returns a partial result to the client for handling.
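
The workaround mentioned above can be sketched roughly as follows: rewrite the 
potentially huge Get as a single-row Scan with a result-size cap and partial 
results allowed. This only illustrates the existing workaround, not the feature 
this issue proposes; the 2 MB cap is an arbitrary example value.

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

// Sketch only: a single-row Scan lets the server return the row in size-bounded
// partial Results instead of one possibly huge Get response.
public class GetAsScanSketch {
  public static void scanSingleRow(Table table, byte[] row) throws IOException {
    Scan scan = new Scan(new Get(row));      // same row, families and filters as the Get
    scan.setMaxResultSize(2L * 1024 * 1024); // cap each response at roughly 2 MB
    scan.setAllowPartialResults(true);       // accept the row split across Results
    try (ResultScanner scanner = table.getScanner(scan)) {
      for (Result partial : scanner) {
        // process each partial chunk instead of holding the whole row in memory
      }
    }
  }
}
{code}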



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26150) Let region server also carry ClientMetaService

2021-08-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396390#comment-17396390
 ] 

Michael Stack commented on HBASE-26150:
---

Any chance of a page writeup on what's being done here – up in the parent? – so 
we can do better reviews and follow along with what's going on? Thanks.

> Let region server also carry ClientMetaService
> --
>
> Key: HBASE-26150
> URL: https://issues.apache.org/jira/browse/HBASE-26150
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, meta
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Usually region servers will be deployed on machines which are more powerful 
> than the masters, so it will be good to let region servers carry more load.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-26149) Further improvements on ConnectionRegistry implementations

2021-08-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396385#comment-17396385
 ] 

Michael Stack edited comment on HBASE-26149 at 8/10/21, 3:26 AM:
-

Thanks.

This is interesting => " This is a cyclic dependency" (This must have been 
in place always?)

What's the refresh of? Registry content?

Master hosts Registry currently. Do these changes make it so RS can host 
Registry? Or is it just the mechanics that is being moved over?

-RPC-based registry as opposed to which sort of Registry? One where we read 
from configs?- (Figured it out – or rather, I didn't. RpcConnectionRegistry is 
a new notion it seems) Thanks.


was (Author: stack):
Thanks.

This is interesting => " This is a cyclic dependency" (This must have been 
in place always?)

What's the refresh of? Registry content?

Master hosts Registry currently. Do these changes make it so RS can host 
Registry? Or is it just the mechanics that is being moved over?

-RPC-based registry as opposed to which sort of Registry? One where we read 
from configs?- (Figured it out) Thanks.

> Further improvements on ConnectionRegistry implementations
> --
>
> Key: HBASE-26149
> URL: https://issues.apache.org/jira/browse/HBASE-26149
> Project: HBase
>  Issue Type: Umbrella
>  Components: Client
>Reporter: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-26149) Further improvements on ConnectionRegistry implementations

2021-08-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396385#comment-17396385
 ] 

Michael Stack edited comment on HBASE-26149 at 8/10/21, 3:22 AM:
-

Thanks.

This is interesting => " This is a cyclic dependency" (This must have been 
in place always?)

What's the refresh of? Registry content?

Master hosts Registry currently. Do these changes make it so RS can host 
Registry? Or is it just the mechanics that is being moved over?

-RPC-based registry as opposed to which sort of Registry? One where we read 
from configs?- (Figured it out) Thanks.


was (Author: stack):
Thanks.

This is interesting => " This is a cyclic dependency" (This must have been 
in place always?)

What's the refresh of? Registry content?

Master hosts Registry currently. Do these changes make it so RS can host 
Registry? Or is it just the mechanics that is being moved over?

RPC-based registry as opposed to which sort of Registry? One where we read from 
configs? Thanks.

> Further improvements on ConnectionRegistry implementations
> --
>
> Key: HBASE-26149
> URL: https://issues.apache.org/jira/browse/HBASE-26149
> Project: HBase
>  Issue Type: Umbrella
>  Components: Client
>Reporter: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26149) Further improvements on ConnectionRegistry implementations

2021-08-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396385#comment-17396385
 ] 

Michael Stack commented on HBASE-26149:
---

Thanks.

This is interesting => " This is a cyclic dependency" (This must have been 
in place always?)

What's the refresh of? Registry content?

Master hosts Registry currently. Do these changes make it so RS can host 
Registry? Or is it just the mechanics that is being moved over?

RPC-based registry as opposed to which sort of Registry? One where we read from 
configs? Thanks.

> Further improvements on ConnectionRegistry implementations
> --
>
> Key: HBASE-26149
> URL: https://issues.apache.org/jira/browse/HBASE-26149
> Project: HBase
>  Issue Type: Umbrella
>  Components: Client
>Reporter: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26150) Let region server also carry ClientMetaService

2021-08-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396271#comment-17396271
 ] 

Michael Stack commented on HBASE-26150:
---

Trying to follow along... Seems like RS can carry Registry now too...not just 
Master?

When I read the release note I want to know what the RpcConnectionRegistry 
does. Is 'hbase.client.bootstrap.servers' the config we used to use for 
specifying masters carrying the registry, or is it a new one?

Thanks

> Let region server also carry ClientMetaService
> --
>
> Key: HBASE-26150
> URL: https://issues.apache.org/jira/browse/HBASE-26150
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client, meta
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Usually region servers will be deployed on machines which are more powerful 
> than the masters, so it will be good to let region servers carry more load.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26149) Further improvements on ConnectionRegistry implementations

2021-08-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396263#comment-17396263
 ] 

Michael Stack commented on HBASE-26149:
---

Is there an overarching intent here [~zhangduo] or just random improvements? 
Thanks.

> Further improvements on ConnectionRegistry implementations
> --
>
> Key: HBASE-26149
> URL: https://issues.apache.org/jira/browse/HBASE-26149
> Project: HBase
>  Issue Type: Umbrella
>  Components: Client
>Reporter: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-6908) Pluggable Call BlockingQueue for HBaseServer

2021-08-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396251#comment-17396251
 ] 

Michael Stack commented on HBASE-6908:
--

Added you as a contributor [~rmarscher]  and assigned you this resolved issue.

> Pluggable Call BlockingQueue for HBaseServer
> 
>
> Key: HBASE-6908
> URL: https://issues.apache.org/jira/browse/HBASE-6908
> Project: HBase
>  Issue Type: New Feature
>  Components: IPC/RPC
>Reporter: James Taylor
>Assignee: Richard Marscher
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6
>
>
> Allow the BlockingQueue implementation class to be specified in the HBase 
> config to enable different behavior than a FIFO queue. The use case we have 
> is around fairness and starvation for big scans that are parallelized on the 
> client. It's easy to fill up the HBase server Call BlockingQueue when 
> processing a single parallelized scan, leadng other scans to time out. 
> Instead, doing round robin processesing on a dequeue through a different 
> BlockingQueue implementation will prevent this from occurring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-6908) Pluggable Call BlockingQueue for HBaseServer

2021-08-09 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reassigned HBASE-6908:


Assignee: Richard Marscher  (was: Michael Stack)

> Pluggable Call BlockingQueue for HBaseServer
> 
>
> Key: HBASE-6908
> URL: https://issues.apache.org/jira/browse/HBASE-6908
> Project: HBase
>  Issue Type: New Feature
>  Components: IPC/RPC
>Reporter: James Taylor
>Assignee: Richard Marscher
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6
>
>
> Allow the BlockingQueue implementation class to be specified in the HBase 
> config to enable different behavior than a FIFO queue. The use case we have 
> is around fairness and starvation for big scans that are parallelized on the 
> client. It's easy to fill up the HBase server Call BlockingQueue when 
> processing a single parallelized scan, leading other scans to time out. 
> Instead, doing round robin processing on a dequeue through a different 
> BlockingQueue implementation will prevent this from occurring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-6908) Pluggable Call BlockingQueue for HBaseServer

2021-08-09 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reassigned HBASE-6908:


Assignee: Michael Stack

> Pluggable Call BlockingQueue for HBaseServer
> 
>
> Key: HBASE-6908
> URL: https://issues.apache.org/jira/browse/HBASE-6908
> Project: HBase
>  Issue Type: New Feature
>  Components: IPC/RPC
>Reporter: James Taylor
>Assignee: Michael Stack
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6
>
>
> Allow the BlockingQueue implementation class to be specified in the HBase 
> config to enable different behavior than a FIFO queue. The use case we have 
> is around fairness and starvation for big scans that are parallelized on the 
> client. It's easy to fill up the HBase server Call BlockingQueue when 
> processing a single parallelized scan, leading other scans to time out. 
> Instead, doing round robin processing on a dequeue through a different 
> BlockingQueue implementation will prevent this from occurring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-6908) Pluggable Call BlockingQueue for HBaseServer

2021-08-09 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-6908.
--
Fix Version/s: 2.4.6
   3.0.0-alpha-2
   2.5.0
 Hadoop Flags: Reviewed
 Release Note: 
Can pass in a FQCN to load as the call queue implementation.

Standardized arguments to the constructor are the max queue length, the 
PriorityFunction, and the Configuration.

PluggableBlockingQueue abstract class provided to help guide the correct 
constructor signature.

Hard fails with PluggableRpcQueueNotFound if the class fails to load as a 
BlockingQueue.

Upstreaming on behalf of Hubspot: we are interested in defining our own custom 
RPC queue and don't necessarily want to get involved in upstreaming internal 
requirements/iterations.

   Resolution: Fixed

Merged to branch-2.4+. Thanks for the clean pluggable interface [~rmarsch]. 
I put your PR comment in as the release note. Edit it if you see fit.
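
For anyone following along, a rough sketch of a custom queue, going only by the 
constructor signature described in the release note (max queue length, 
PriorityFunction, Configuration). A real implementation would likely extend the 
provided PluggableBlockingQueue abstract class and add its own ordering/fairness 
logic; this one just bounds a FIFO queue to show the wiring.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ipc.CallRunner;
import org.apache.hadoop.hbase.ipc.PriorityFunction;

// Sketch only: a bounded FIFO call queue exposing the constructor signature the
// release note describes. Real fairness/round-robin logic would go in the queue
// operations, using the PriorityFunction to classify calls.
public class ExampleCallQueue extends LinkedBlockingQueue<CallRunner> {
  private final PriorityFunction priority; // kept for use by real ordering logic
  private final Configuration conf;

  public ExampleCallQueue(int maxQueueLength, PriorityFunction priority,
      Configuration conf) {
    super(maxQueueLength);
    this.priority = priority;
    this.conf = conf;
  }
}
{code}

Loading it is then a matter of pointing the call-queue configuration at this 
class's FQCN as the release note says; take the exact property names from the 
merged patch rather than from this sketch.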

> Pluggable Call BlockingQueue for HBaseServer
> 
>
> Key: HBASE-6908
> URL: https://issues.apache.org/jira/browse/HBASE-6908
> Project: HBase
>  Issue Type: New Feature
>  Components: IPC/RPC
>Reporter: James Taylor
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.6
>
>
> Allow the BlockingQueue implementation class to be specified in the HBase 
> config to enable different behavior than a FIFO queue. The use case we have 
> is around fairness and starvation for big scans that are parallelized on the 
> client. It's easy to fill up the HBase server Call BlockingQueue when 
> processing a single parallelized scan, leading other scans to time out. 
> Instead, doing round robin processing on a dequeue through a different 
> BlockingQueue implementation will prevent this from occurring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-26170) handleTooBigRequest in NettyRpcServer didn't skip enough bytes

2021-08-05 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-26170.
---
Fix Version/s: 2.3.7
   2.4.6
   3.0.0-alpha-2
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-2.3+. Nice fix [~Xiaolin Ha]

> handleTooBigRequest in NettyRpcServer didn't skip enough bytes
> --
>
> Key: HBASE-26170
> URL: https://issues.apache.org/jira/browse/HBASE-26170
> Project: HBase
>  Issue Type: Bug
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 3.0.0-alpha-2, 2.4.6, 2.3.7
>
> Attachments: error-logs.png
>
>
> We found there are always coredump problems after too-big requests; the logs 
> are as follows:
> !error-logs.png|width=1040,height=187!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

