[jira] [Commented] (HBASE-28506) Remove hbase-compression-xz

2024-04-12 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836748#comment-17836748
 ] 

Andrew Kyle Purtell commented on HBASE-28506:
-

Thanks [~bbeaudreault] and [~zhangduo] for the reviews

> Remove hbase-compression-xz
> ---
>
> Key: HBASE-28506
> URL: https://issues.apache.org/jira/browse/HBASE-28506
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x, remove in 2.6.
> I will add a release note when resolving this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28507) Deprecate hbase-compression-xz

2024-04-12 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836747#comment-17836747
 ] 

Andrew Kyle Purtell commented on HBASE-28507:
-

Thanks [~bbeaudreault] and [~zhangduo] for the reviews

> Deprecate hbase-compression-xz
> --
>
> Key: HBASE-28507
> URL: https://issues.apache.org/jira/browse/HBASE-28507
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.5.9
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x.
> I will add a release note when resolving this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28507) Deprecate hbase-compression-xz

2024-04-12 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28507:

Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

> Deprecate hbase-compression-xz
> --
>
> Key: HBASE-28507
> URL: https://issues.apache.org/jira/browse/HBASE-28507
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.5.9
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x.
> I will add a release note when resolving this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-04-12 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836743#comment-17836743
 ] 

Andrew Kyle Purtell commented on HBASE-27826:
-

We can use this issue as an umbrella and break up changes into subtasks with 
small scope that will be easy to review.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.7.0, 3.0.0-beta-2
>
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 (_Use StoreFileTracker when splitting and merging_) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation, and creating a “file” on S3 can 
> take well over a second. If, for example, there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous.
> There are two issues, but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE file tracker interface, 
> requiring only one update of the manifest file in S3, bringing the time 
> requirement for this operation down from O(n) to O(1).
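As a sketch of the batching idea described above, assuming entirely hypothetical 
names (BatchingStoreFileTracker, addLink, and commitLinks are illustrative, not 
the actual HBase StoreFileTracker API):

{code:java}
import java.io.IOException;

// Hypothetical batch-commit extension to the store file tracker. Instead of
// one filesystem operation per link file, a split/merge collects all links
// and persists them with a single manifest update (one S3 PUT).
interface BatchingStoreFileTracker {

  // Record a virtual HFile link; no physical file is created.
  void addLink(String parentRegion, String storeFileName);

  // Persist all pending links in one manifest write, making the commit
  // O(1) in manifest updates instead of O(n) file creations.
  void commitLinks() throws IOException;
}
{code}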



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-28447) New site configuration option "hfile.block.size"

2024-04-10 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835852#comment-17835852
 ] 

Andrew Kyle Purtell edited comment on HBASE-28447 at 4/10/24 6:55 PM:
--

PR #5820 introduces a new configuration setting, "hfile.block.size", that, if 
set, will define the default blocksize to use when writing HFiles if a column 
family schema does not define its own non-default block size. This is a bit 
complicated but required for compatibility. The rules are:
 * If the schema specifies a non-default block size, use it.
 * Otherwise, if the configuration specifies a non-default block size, use it.
 * Otherwise, use the default block size.

The default is defined by HConstants.DEFAULT_BLOCKSIZE.

Given how compound configurations work, the precedence order for a non-default 
block size is: BLOCKSIZE in the column family schema > "hfile.block.size" in CF 
or table level schema > "hfile.block.size" in site configuration > 
HConstants.DEFAULT_BLOCKSIZE
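A minimal sketch of that resolution order, assuming a hypothetical helper 
(resolveBlockSize and the merge of schema-level overrides into 'conf' are 
assumptions; ColumnFamilyDescriptor.getBlocksize(), Configuration.getInt(), and 
HConstants.DEFAULT_BLOCKSIZE are real APIs):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;

// Sketch: resolve the block size for an HFile writer per the rules above.
// Schema-level "hfile.block.size" overrides are assumed to already be
// merged into 'conf' ahead of the site configuration value.
final class BlockSizeResolver {
  static int resolveBlockSize(ColumnFamilyDescriptor family, Configuration conf) {
    int schemaBlockSize = family.getBlocksize();
    if (schemaBlockSize != HConstants.DEFAULT_BLOCKSIZE) {
      return schemaBlockSize; // non-default BLOCKSIZE in the CF schema wins
    }
    return conf.getInt("hfile.block.size", HConstants.DEFAULT_BLOCKSIZE);
  }
}
{code}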


was (Author: apurtell):
PR #5820 introduces a new configuration setting, "hfile.block.size", that, if 
set, will define the default blocksize to use when writing HFiles if a column 
family schema does not define its own non-default block size. This is a bit 
complicated but required for compatibility. The rules are:
 * If the schema specifies a non-default block size, use it.
 * Otherwise, if the configuration specifies a non-default block size, use it.
 * Otherwise, use the default block size.

The default is defined by HConstants.DEFAULT_BLOCKSIZE.

> New site configuration option "hfile.block.size"
> 
>
> Key: HBASE-28447
> URL: https://issues.apache.org/jira/browse/HBASE-28447
> Project: HBase
>  Issue Type: Improvement
>Reporter: Gourab Taparia
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> Right now there is no config attached to the HFile block size by which we can 
> override the default. The default is set to 64 KB in 
> HConstants.DEFAULT_BLOCKSIZE. We need a global config property that would go 
> in hbase-site.xml which can control this value.
> Since the BLOCKSIZE is tracked at the column family level, we will need to 
> respect the CFD value first. Also, configuration settings can be set in 
> schema, at the column or table level, and will override the relevant values 
> from the site file. Below is the precedence order we can use to get the final 
> blocksize value:
> {code:java}
> ColumnFamilyDescriptor.BLOCKSIZE > schema level site configuration overrides 
> > site configuration > HConstants.DEFAULT_BLOCKSIZE{code}
> PS: There is one related config “hbase.mapreduce.hfileoutputformat.blocksize”, 
> however that is specific to map-reduce jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28447) New site configuration option "hfile.block.size"

2024-04-10 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28447:

Summary: New site configuration option "hfile.block.size"  (was: New 
configuration to override the hfile specific blocksize)

> New site configuration option "hfile.block.size"
> 
>
> Key: HBASE-28447
> URL: https://issues.apache.org/jira/browse/HBASE-28447
> Project: HBase
>  Issue Type: Improvement
>Reporter: Gourab Taparia
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> Right now there is no config attached to the HFile block size by which we can 
> override the default. The default is set to 64 KB in 
> HConstants.DEFAULT_BLOCKSIZE. We need a global config property that would go 
> in hbase-site.xml which can control this value.
> Since the BLOCKSIZE is tracked at the column family level, we will need to 
> respect the CFD value first. Also, configuration settings can be set in 
> schema, at the column or table level, and will override the relevant values 
> from the site file. Below is the precedence order we can use to get the final 
> blocksize value:
> {code:java}
> ColumnFamilyDescriptor.BLOCKSIZE > schema level site configuration overrides 
> > site configuration > HConstants.DEFAULT_BLOCKSIZE{code}
> PS: There is one related config “hbase.mapreduce.hfileoutputformat.blocksize”, 
> however that is specific to map-reduce jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-28447) New configuration to override the hfile specific blocksize

2024-04-10 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835852#comment-17835852
 ] 

Andrew Kyle Purtell edited comment on HBASE-28447 at 4/10/24 6:45 PM:
--

PR #5820 introduces a new configuration setting, "hfile.block.size", that, if 
set, will define the default blocksize to use when writing HFiles if a column 
family schema does not define its own non-default block size. This is a bit 
complicated but required for compatibility. The rules are:
 * If the schema specifies a non-default block size, use it.
 * Otherwise, if the configuration specifies a non-default block size, use it.
 * Otherwise, use the default block size.

The default is defined by HConstants.DEFAULT_BLOCKSIZE.


was (Author: apurtell):
PR #5820 introduces a new configuration setting, "hfile.block.size", that, if 
set, will define the default blocksize to use when writing HFiles if a column 
family schema does not define its own non-default block size. This is a bit 
complicated but required for compatibility. The rules are:
 * If the schema specifies a non-default block size, use it.
 * Otherwise, if the configuration specifies a non-default block size, use it.
 * Otherwise, use the default block size. The default is defined by 
HConstants.DEFAULT_BLOCKSIZE.

> New configuration to override the hfile specific blocksize
> --
>
> Key: HBASE-28447
> URL: https://issues.apache.org/jira/browse/HBASE-28447
> Project: HBase
>  Issue Type: Improvement
>Reporter: Gourab Taparia
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> Right now there is no config attached to the HFile block size by which we can 
> override the default. The default is set to 64 KB in 
> HConstants.DEFAULT_BLOCKSIZE. We need a global config property that would go 
> in hbase-site.xml which can control this value.
> Since the BLOCKSIZE is tracked at the column family level, we will need to 
> respect the CFD value first. Also, configuration settings can be set in 
> schema, at the column or table level, and will override the relevant values 
> from the site file. Below is the precedence order we can use to get the final 
> blocksize value:
> {code:java}
> ColumnFamilyDescriptor.BLOCKSIZE > schema level site configuration overrides 
> > site configuration > HConstants.DEFAULT_BLOCKSIZE{code}
> PS: There is one related config “hbase.mapreduce.hfileoutputformat.blocksize”, 
> however that is specific to map-reduce jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (HBASE-28447) New configuration to override the hfile specific blocksize

2024-04-10 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-28447 ]


Andrew Kyle Purtell deleted comment on HBASE-28447:
-

was (Author: apurtell):
(y)

> New configuration to override the hfile specific blocksize
> --
>
> Key: HBASE-28447
> URL: https://issues.apache.org/jira/browse/HBASE-28447
> Project: HBase
>  Issue Type: Improvement
>Reporter: Gourab Taparia
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> Right now there is no config attached to the HFile block size by which we can 
> override the default. The default is set to 64 KB in 
> HConstants.DEFAULT_BLOCKSIZE. We need a global config property that would go 
> in hbase-site.xml which can control this value.
> Since the BLOCKSIZE is tracked at the column family level, we will need to 
> respect the CFD value first. Also, configuration settings can be set in 
> schema, at the column or table level, and will override the relevant values 
> from the site file. Below is the precedence order we can use to get the final 
> blocksize value:
> {code:java}
> ColumnFamilyDescriptor.BLOCKSIZE > schema level site configuration overrides 
> > site configuration > HConstants.DEFAULT_BLOCKSIZE{code}
> PS: There is one related config “hbase.mapreduce.hfileoutputformat.blocksize”, 
> however that is specific to map-reduce jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28447) New configuration to override the hfile specific blocksize

2024-04-10 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28447:

Status: Patch Available  (was: Open)

> New configuration to override the hfile specific blocksize
> --
>
> Key: HBASE-28447
> URL: https://issues.apache.org/jira/browse/HBASE-28447
> Project: HBase
>  Issue Type: Improvement
>Reporter: Gourab Taparia
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> Right now there is no config attached to the HFile block size by which we can 
> override the default. The default is set to 64 KB in 
> HConstants.DEFAULT_BLOCKSIZE. We need a global config property that would go 
> in hbase-site.xml which can control this value.
> Since the BLOCKSIZE is tracked at the column family level, we will need to 
> respect the CFD value first. Also, configuration settings can be set in 
> schema, at the column or table level, and will override the relevant values 
> from the site file. Below is the precedence order we can use to get the final 
> blocksize value:
> {code:java}
> ColumnFamilyDescriptor.BLOCKSIZE > schema level site configuration overrides 
> > site configuration > HConstants.DEFAULT_BLOCKSIZE{code}
> PS: There is one related config “hbase.mapreduce.hfileoutputformat.blocksize”, 
> however that is specific to map-reduce jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28447) New configuration to override the hfile specific blocksize

2024-04-10 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835852#comment-17835852
 ] 

Andrew Kyle Purtell commented on HBASE-28447:
-

PR #5820 introduces a new configuration setting, "hfile.block.size", that, if 
set, will define the default blocksize to use when writing HFiles if a column 
family schema does not define its own non-default block size. This is a bit 
complicated but required for compatibility. The rules are:
 * If the schema specifies a non-default block size, use it.
 * Otherwise, if the configuration specifies a non-default block size, use it.
 * Otherwise, use the default block size. The default is defined by 
HConstants.DEFAULT_BLOCKSIZE.

> New configuration to override the hfile specific blocksize
> --
>
> Key: HBASE-28447
> URL: https://issues.apache.org/jira/browse/HBASE-28447
> Project: HBase
>  Issue Type: Improvement
>Reporter: Gourab Taparia
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> Right now there is no config attached to the HFile block size by which we can 
> override the default. The default is set to 64 KB in 
> HConstants.DEFAULT_BLOCKSIZE. We need a global config property that would go 
> in hbase-site.xml which can control this value.
> Since the BLOCKSIZE is tracked at the column family level, we will need to 
> respect the CFD value first. Also, configuration settings can be set in 
> schema, at the column or table level, and will override the relevant values 
> from the site file. Below is the precedence order we can use to get the final 
> blocksize value:
> {code:java}
> ColumnFamilyDescriptor.BLOCKSIZE > schema level site configuration overrides 
> > site configuration > HConstants.DEFAULT_BLOCKSIZE{code}
> PS: There is one related config “hbase.mapreduce.hfileoutputformat.blocksize”, 
> however that is specific to map-reduce jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28506) Remove hbase-compression-xz

2024-04-10 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28506:

Hadoop Flags: Incompatible change, Reviewed
Release Note: 
CVE-2024-3094 implicated recent releases of the native liblzma library as a 
vector for malicious code. While this does not include xz-java, the LZMA 
algorithm implementation we use to support XZ compression in 
hbase-compression-xz, the way the backdoor was introduced calls into question 
the trustworthiness and viability of the XZ project. XZ compression provides 
little to no value over more modern alternatives, like ZStandard, which can 
achieve similar compression ratios, and to our knowledge no HBase users of XZ 
compression exist.

XZ compression support has been deprecated in 2.5 and removed in 2.6 and up. 
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

The subtask to deprecate in 2.5 (HBASE-28507) is still unresolved, but review 
feedback has been addressed and it will land shortly.

> Remove hbase-compression-xz
> ---
>
> Key: HBASE-28506
> URL: https://issues.apache.org/jira/browse/HBASE-28506
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x, remove in 2.6.
> I will add a release note when resolving this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28506) Remove hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28506:

Status: Patch Available  (was: Open)

> Remove hbase-compression-xz
> ---
>
> Key: HBASE-28506
> URL: https://issues.apache.org/jira/browse/HBASE-28506
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x, remove in 2.6.
> I will add a release note when resolving this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28507) Deprecate hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28507:

Status: Patch Available  (was: Open)

> Deprecate hbase-compression-xz
> --
>
> Key: HBASE-28507
> URL: https://issues.apache.org/jira/browse/HBASE-28507
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.5.9
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x.
> I will add a release note when resolving this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28506) Remove hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28506:

Description: 
Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .

Deprecate in 2.5.x, remove in 2.6.

I will add a release note when resolving this issue.

  was:
Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .

Deprecate in 2.5.x, remove in 2.6.

 


> Remove hbase-compression-xz
> ---
>
> Key: HBASE-28506
> URL: https://issues.apache.org/jira/browse/HBASE-28506
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x, remove in 2.6.
> I will add a release note when resolving this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28507) Deprecate hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28507:

Description: 
Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .

Deprecate in 2.5.x.

I will add a release note when resolving this issue.

  was:
Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .

Deprecate in 2.5.x


> Deprecate hbase-compression-xz
> --
>
> Key: HBASE-28507
> URL: https://issues.apache.org/jira/browse/HBASE-28507
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.5.9
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x.
> I will add a release note when resolving this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28506) Remove hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28506:

Description: 
Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .

Deprecate in 2.5.x, remove in 2.6.

 

> Remove hbase-compression-xz
> ---
>
> Key: HBASE-28506
> URL: https://issues.apache.org/jira/browse/HBASE-28506
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x, remove in 2.6.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28507) Deprecate hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28507:

Description: 
Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .

Deprecate in 2.5.x

> Deprecate hbase-compression-xz
> --
>
> Key: HBASE-28507
> URL: https://issues.apache.org/jira/browse/HBASE-28507
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.5.9
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28506) Remove hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835475#comment-17835475
 ] 

Andrew Kyle Purtell commented on HBASE-28506:
-

Sorry for the late breaking action, [~bbeaudreault], I hope this is not too 
inconvenient.

> Remove hbase-compression-xz
> ---
>
> Key: HBASE-28506
> URL: https://issues.apache.org/jira/browse/HBASE-28506
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> Refer to [https://lists.apache.org/thread/on62z40rwotrcc8w1l5n55rd4zldho5g] .
> Deprecate in 2.5.x, remove in 2.6.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28506) Remove hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28506:
---

 Summary: Remove hbase-compression-xz
 Key: HBASE-28506
 URL: https://issues.apache.org/jira/browse/HBASE-28506
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 2.6.0, 3.0.0-beta-2






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28507) Deprecate hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28507:
---

 Summary: Deprecate hbase-compression-xz
 Key: HBASE-28507
 URL: https://issues.apache.org/jira/browse/HBASE-28507
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 2.5.9






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-26192) Master UI hbck should provide a JSON formatted output option

2024-04-08 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-26192:

Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

> Master UI hbck should provide a JSON formatted output option
> 
>
> Key: HBASE-26192
> URL: https://issues.apache.org/jira/browse/HBASE-26192
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Kyle Purtell
>Assignee: Mihir Monani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
> Attachments: HBCK Report in JSON Format.png, Screen Shot 2022-05-31 
> at 5.18.15 PM.png
>
>
> It used to be possible to get hbck's verdict of cluster status from the 
> command line, especially useful for headless deployments, i.e. without 
> requiring a browser with sufficient connectivity to load a UI, or scrape 
> information out of raw HTML, or write regex to comb over log4j output. The 
> hbck tool's output wasn't particularly convenient to parse but it was 
> straightforward to extract the desired information with a handful of regular 
> expressions. 
> HBCK2 has a different design philosophy than the old hbck, which is to serve 
> as a collection of small and discrete recovery and repair functions, rather 
> than attempt to be a universal repair tool. This makes a lot of sense and 
> isn't the issue at hand. Unfortunately the old hbck's utility for reporting 
> the current cluster health assessment has not been replaced either in whole 
> or in part. Instead:
> {quote}
> HBCK2 is for fixes. For listings of inconsistencies or blockages in the 
> running cluster, you go elsewhere, to the logs and UI of the running cluster 
> Master. Once an issue has been identified, you use the HBCK2 tool to ask the 
> Master to effect fixes or to skip-over bad state. Asking the Master to make 
> the fixes rather than try and effect the repair locally in a fix-it tool's 
> context is another important difference between HBCK2 and hbck1. 
> {quote}
> Developing custom tooling to mine logs and scrape UI simply to gain a top 
> level assessment of system health is unsatisfying. There should be a 
> convenient means for querying the system if issues that rise to the level of 
> _inconsistency_, in the hbck parlance, are believed to be present. It would 
> be relatively simple to bring back the experience of invoking a command line 
> tool to deliver a verdict. This could be added to the hbck2 tool itself, but 
> given that hbase-operator-tools is a separate project, an intrinsic solution 
> is desirable. 
> An option that immediately comes to mind is modification of the Master's 
> hbck.jsp page to provide a JSON formatted output option if the HTTP Accept 
> header asks for text/json. However, looking at the source of hbck.jsp, it 
> makes more sense to leave it as is and implement a convenient machine 
> parseable output format elsewhere. This can be trivially accomplished with a 
> new servlet. Like hbck.jsp the servlet implementation would get a reference 
> to HbckChore and present the information this class makes available via its 
> various getters.  
> The machine parseable output is sufficient to enable headless hbck status 
> checking but it still would be nice if we could provide operators a command 
> line tool that formats the information for convenient viewing in a terminal. 
> That part could be implemented in the hbck2 tool after this proposal is 
> implemented.
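As a rough sketch of the proposed servlet, under stated assumptions: the class 
name, the HbckChoreView stand-in, and its getter names are illustrative, and 
wiring the servlet into the Master's HTTP server is omitted:

{code:java}
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.gson.Gson; // any JSON serializer would do here

// Stand-in for the getters HbckChore makes available; names are assumptions.
interface HbckChoreView {
  long getCheckingStartTimestamp();
  Map<String, ?> getOrphanRegionsOnRS();
  Map<String, ?> getInconsistentRegions();
}

// Sketch: a new Master servlet that reports the HbckChore's findings as JSON,
// alongside the existing hbck.jsp, for headless/scripted status checks.
public class HbckJsonServlet extends HttpServlet {
  private final HbckChoreView chore; // supplied by the Master at registration

  public HbckJsonServlet(HbckChoreView chore) {
    this.chore = chore;
  }

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    Map<String, Object> report = new LinkedHashMap<>();
    report.put("checkingStartTimestamp", chore.getCheckingStartTimestamp());
    report.put("orphanRegionsOnRS", chore.getOrphanRegionsOnRS());
    report.put("inconsistentRegions", chore.getInconsistentRegions());
    resp.setContentType("application/json");
    resp.getWriter().write(new Gson().toJson(report));
  }
}
{code}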



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28447) New configuration to override the hfile specific blocksize

2024-04-08 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28447:

Fix Version/s: 2.6.0
   2.7.0
   3.0.0-beta-2
   2.5.9

> New configuration to override the hfile specific blocksize
> --
>
> Key: HBASE-28447
> URL: https://issues.apache.org/jira/browse/HBASE-28447
> Project: HBase
>  Issue Type: Improvement
>Reporter: Gourab Taparia
>Assignee: Andrew Kyle Purtell
>Priority: Minor
> Fix For: 2.6.0, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> Right now there is no config attached to the HFile block size by which we can 
> override the default. The default is set to 64 KB in 
> HConstants.DEFAULT_BLOCKSIZE. We need a global config property that would go 
> in hbase-site.xml which can control this value.
> Since the BLOCKSIZE is tracked at the column family level, we will need to 
> respect the CFD value first. Also, configuration settings can be set in 
> schema, at the column or table level, and will override the relevant values 
> from the site file. Below is the precedence order we can use to get the final 
> blocksize value:
> {code:java}
> ColumnFamilyDescriptor.BLOCKSIZE > schema level site configuration overrides 
> > site configuration > HConstants.DEFAULT_BLOCKSIZE{code}
> PS: There is one related config “hbase.mapreduce.hfileoutputformat.blocksize”, 
> however that is specific to map-reduce jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28447) New configuration to override the hfile specific blocksize

2024-04-08 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell reassigned HBASE-28447:
---

Assignee: Andrew Kyle Purtell  (was: Gourab Taparia)

> New configuration to override the hfile specific blocksize
> --
>
> Key: HBASE-28447
> URL: https://issues.apache.org/jira/browse/HBASE-28447
> Project: HBase
>  Issue Type: Improvement
>Reporter: Gourab Taparia
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>
> Right now there is no config attached to the HFile block size by which we can 
> override the default. The default is set to 64 KB in 
> HConstants.DEFAULT_BLOCKSIZE. We need a global config property that would go 
> in hbase-site.xml which can control this value.
> Since the BLOCKSIZE is tracked at the column family level, we will need to 
> respect the CFD value first. Also, configuration settings can be set in 
> schema, at the column or table level, and will override the relevant values 
> from the site file. Below is the precedence order we can use to get the final 
> blocksize value:
> {code:java}
> ColumnFamilyDescriptor.BLOCKSIZE > schema level site configuration overrides 
> > site configuration > HConstants.DEFAULT_BLOCKSIZE{code}
> PS: There is one related config “hbase.mapreduce.hfileoutputformat.blocksize”, 
> however that is specific to map-reduce jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28447) New configuration to override the hfile specific blocksize

2024-04-08 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834963#comment-17834963
 ] 

Andrew Kyle Purtell commented on HBASE-28447:
-

(y)

> New configuration to override the hfile specific blocksize
> --
>
> Key: HBASE-28447
> URL: https://issues.apache.org/jira/browse/HBASE-28447
> Project: HBase
>  Issue Type: Improvement
>Reporter: Gourab Taparia
>Assignee: Gourab Taparia
>Priority: Minor
>
> Right now there is no config attached to the HFile block size by which we can 
> override the default. The default is set to 64 KB in 
> HConstants.DEFAULT_BLOCKSIZE. We need a global config property that would go 
> in hbase-site.xml which can control this value.
> Since the BLOCKSIZE is tracked at the column family level, we will need to 
> respect the CFD value first. Also, configuration settings can be set in 
> schema, at the column or table level, and will override the relevant values 
> from the site file. Below is the precedence order we can use to get the final 
> blocksize value:
> {code:java}
> ColumnFamilyDescriptor.BLOCKSIZE > schema level site configuration overrides 
> > site configuration > HConstants.DEFAULT_BLOCKSIZE{code}
> PS: There is one related config “hbase.mapreduce.hfileoutputformat.blocksize”, 
> however that is specific to map-reduce jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28485) Re-use ZstdDecompressCtx/ZstdCompressCtx for performance

2024-04-08 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834962#comment-17834962
 ] 

Andrew Kyle Purtell commented on HBASE-28485:
-

[~charlesconnell] thank you.
I approved the PR.

> Re-use ZstdDecompressCtx/ZstdCompressCtx for performance
> 
>
> Key: HBASE-28485
> URL: https://issues.apache.org/jira/browse/HBASE-28485
> Project: HBase
>  Issue Type: Improvement
>Reporter: Charles Connell
>Assignee: Charles Connell
>Priority: Major
>  Labels: pull-request-available
> Attachments: async-prof-flamegraph-cpu_event-1712150670836-cpu.html, 
> async-prof-pid-1324144-cpu-1.html
>
>
> The zstd documentation 
> [recommends|https://facebook.github.io/zstd/zstd_manual.html#Chapter4] 
> re-using context objects when possible, because their creation has some 
> expense. They can be more cheaply reset than re-created. In 
> {{ZstdDecompressor}} and {{ZstdCompressor}}, we create a new context 
> object for every call to {{decompress()}} and {{compress()}}. In CPU 
> profiles I've taken at my company, the constructor of {{ZstdDecompressCtx}} 
> can sometimes represent 10-25% of the time spent in zstd decompression, which 
> itself is 5-10% of a RegionServer's total CPU time. Avoiding this performance 
> penalty won't lead to any massive performance boost, but is a nice little win.
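A minimal sketch of the reuse pattern with zstd-jni's ZstdCompressCtx and 
ZstdDecompressCtx; note the contexts are not thread-safe, so real code would 
keep one per thread (e.g. in a ThreadLocal) rather than a shared static as in 
this single-threaded example:

{code:java}
import com.github.luben.zstd.ZstdCompressCtx;
import com.github.luben.zstd.ZstdDecompressCtx;

public class ZstdCtxReuse {
  // Created once and reused across calls, instead of a new context per
  // compress()/decompress(); this is the change the issue proposes.
  private static final ZstdCompressCtx CCTX = new ZstdCompressCtx();
  private static final ZstdDecompressCtx DCTX = new ZstdDecompressCtx();

  public static void main(String[] args) {
    CCTX.setLevel(3); // 3 is the zstd default compression level
    byte[] data = "hello hbase".getBytes();
    byte[] compressed = CCTX.compress(data);                    // reuses CCTX
    byte[] restored = DCTX.decompress(compressed, data.length); // reuses DCTX
    System.out.println(new String(restored));
  }
}
{code}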



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28447) New configuration to override the hfile specific blocksize

2024-04-05 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834450#comment-17834450
 ] 

Andrew Kyle Purtell commented on HBASE-28447:
-

[~gourab.taparia] Are you planning to open a PR for this? 

> New configuration to override the hfile specific blocksize
> --
>
> Key: HBASE-28447
> URL: https://issues.apache.org/jira/browse/HBASE-28447
> Project: HBase
>  Issue Type: Improvement
>Reporter: Gourab Taparia
>Assignee: Gourab Taparia
>Priority: Minor
>
> Right now there is no config attached to the HFile block size by which we can 
> override the default. The default is set to 64 KB in 
> HConstants.DEFAULT_BLOCKSIZE. We need a global config property that would go 
> in hbase-site.xml which can control this value.
> Since the BLOCKSIZE is tracked at the column family level, we will need to 
> respect the CFD value first. Also, configuration settings can be set in 
> schema, at the column or table level, and will override the relevant values 
> from the site file. Below is the precedence order we can use to get the final 
> blocksize value:
> {code:java}
> ColumnFamilyDescriptor.BLOCKSIZE > schema level site configuration overrides 
> > site configuration > HConstants.DEFAULT_BLOCKSIZE{code}
> PS: There is one related config “hbase.mapreduce.hfileoutputformat.blocksize”, 
> however that is specific to map-reduce jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-26192) Master UI hbck should provide a JSON formatted output option

2024-03-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-26192:

Fix Version/s: (was: 2.4.18)

> Master UI hbck should provide a JSON formatted output option
> 
>
> Key: HBASE-26192
> URL: https://issues.apache.org/jira/browse/HBASE-26192
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Kyle Purtell
>Assignee: Mihir Monani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
> Attachments: HBCK Report in JSON Format.png, Screen Shot 2022-05-31 
> at 5.18.15 PM.png
>
>
> It used to be possible to get hbck's verdict of cluster status from the 
> command line, especially useful for headless deployments, i.e. without 
> requiring a browser with sufficient connectivity to load a UI, or scrape 
> information out of raw HTML, or write regex to comb over log4j output. The 
> hbck tool's output wasn't particularly convenient to parse but it was 
> straightforward to extract the desired information with a handful of regular 
> expressions. 
> HBCK2 has a different design philosophy than the old hbck, which is to serve 
> as a collection of small and discrete recovery and repair functions, rather 
> than attempt to be a universal repair tool. This makes a lot of sense and 
> isn't the issue at hand. Unfortunately the old hbck's utility for reporting 
> the current cluster health assessment has not been replaced either in whole 
> or in part. Instead:
> {quote}
> HBCK2 is for fixes. For listings of inconsistencies or blockages in the 
> running cluster, you go elsewhere, to the logs and UI of the running cluster 
> Master. Once an issue has been identified, you use the HBCK2 tool to ask the 
> Master to effect fixes or to skip-over bad state. Asking the Master to make 
> the fixes rather than try and effect the repair locally in a fix-it tool's 
> context is another important difference between HBCK2 and hbck1. 
> {quote}
> Developing custom tooling to mine logs and scrape UI simply to gain a top 
> level assessment of system health is unsatisfying. There should be a 
> convenient means for querying the system if issues that rise to the level of 
> _inconsistency_, in the hbck parlance, are believed to be present. It would 
> be relatively simple to bring back the experience of invoking a command line 
> tool to deliver a verdict. This could be added to the hbck2 tool itself, but 
> given that hbase-operator-tools is a separate project, an intrinsic solution 
> is desirable. 
> An option that immediately comes to mind is modification of the Master's 
> hbck.jsp page to provide a JSON formatted output option if the HTTP Accept 
> header asks for text/json. However, looking at the source of hbck.jsp, it 
> makes more sense to leave it as is and implement a convenient machine 
> parseable output format elsewhere. This can be trivially accomplished with a 
> new servlet. Like hbck.jsp the servlet implementation would get a reference 
> to HbckChore and present the information this class makes available via its 
> various getters.  
> The machine parseable output is sufficient to enable headless hbck status 
> checking but it still would be nice if we could provide operators a command 
> line tool that formats the information for convenient viewing in a terminal. 
> That part could be implemented in the hbck2 tool after this proposal is 
> implemented.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-26192) Master UI hbck should provide a JSON formatted output option

2024-03-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-26192:

Fix Version/s: 2.4.18
   2.7.0
   2.6.1
   2.5.9

> Master UI hbck should provide a JSON formatted output option
> 
>
> Key: HBASE-26192
> URL: https://issues.apache.org/jira/browse/HBASE-26192
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Kyle Purtell
>Assignee: Mihir Monani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
> Attachments: HBCK Report in JSON Format.png, Screen Shot 2022-05-31 
> at 5.18.15 PM.png
>
>
> It used to be possible to get hbck's verdict of cluster status from the 
> command line, especially useful for headless deployments, i.e. without 
> requiring a browser with sufficient connectivity to load a UI, or scrape 
> information out of raw HTML, or write regex to comb over log4j output. The 
> hbck tool's output wasn't particularly convenient to parse but it was 
> straightforward to extract the desired information with a handful of regular 
> expressions. 
> HBCK2 has a different design philosophy than the old hbck, which is to serve 
> as a collection of small and discrete recovery and repair functions, rather 
> than attempt to be a universal repair tool. This makes a lot of sense and 
> isn't the issue at hand. Unfortunately the old hbck's utility for reporting 
> the current cluster health assessment has not been replaced either in whole 
> or in part. Instead:
> {quote}
> HBCK2 is for fixes. For listings of inconsistencies or blockages in the 
> running cluster, you go elsewhere, to the logs and UI of the running cluster 
> Master. Once an issue has been identified, you use the HBCK2 tool to ask the 
> Master to effect fixes or to skip-over bad state. Asking the Master to make 
> the fixes rather than try and effect the repair locally in a fix-it tool's 
> context is another important difference between HBCK2 and hbck1. 
> {quote}
> Developing custom tooling to mine logs and scrape UI simply to gain a top 
> level assessment of system health is unsatisfying. There should be a 
> convenient means for querying the system if issues that rise to the level of 
> _inconsistency_, in the hbck parlance, are believed to be present. It would 
> be relatively simple to bring back the experience of invoking a command line 
> tool to deliver a verdict. This could be added to the hbck2 tool itself, but 
> given that hbase-operator-tools is a separate project, an intrinsic solution 
> is desirable. 
> An option that immediately comes to mind is modification of the Master's 
> hbck.jsp page to provide a JSON formatted output option if the HTTP Accept 
> header asks for text/json. However, looking at the source of hbck.jsp, it 
> makes more sense to leave it as is and implement a convenient machine 
> parseable output format elsewhere. This can be trivially accomplished with a 
> new servlet. Like hbck.jsp the servlet implementation would get a reference 
> to HbckChore and present the information this class makes available via its 
> various getters.  
> The machine parseable output is sufficient to enable headless hbck status 
> checking but it still would be nice if we could provide operators a command 
> line tool that formats the information for convenient viewing in a terminal. 
> That part could be implemented in the hbck2 tool after this proposal is 
> implemented.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-24 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-27826:

Fix Version/s: 2.7.0
   3.0.0-beta-2

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.7.0, 3.0.0-beta-2
>
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 (_Use StoreFileTracker when splitting and merging_) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation, and creating a “file” on S3 can 
> take well over a second. If, for example, there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous.
> There are two issues, but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE file tracker interface, 
> requiring only one update of the manifest file in S3, bringing the time 
> requirement for this operation down from O(n) to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-22 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829754#comment-17829754
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/22/24 6:40 AM:
--

{quote}Links and back references should also be encoded into the tracker file 
directly, but this is not fully covered by the design doc
{quote}
This issue really should cover link and ref files too, because it is the time 
required to create them all, when we have a lot of store files in the region, 
that causes splits to stay in the offline state for a very long time on S3. If 
they are still real files, it will take S3 about a second to create each one. 
We could perhaps create them in parallel with a thread pool, but that is a 
workaround, not a solution. This is the main pain point.

Agreed, the solution for link and back references is not fully covered by the 
design doc yet, so we will update the design doc to add the missing coverage. 
There are some issues to work out, related to supporting rollback in 
particular. Migration will need to update both the new virtual entries in the 
manifest and the real "files" in the filesystem or bucket until the user 
decides they can fully switch over, or we design a way for it to be safe and 
automatic.

bq. File a new issue, to add a version field in the tracker file definition, so 
when we find out that we are reading a tracker file with a higher version, we 
will fail so end users will know that they should do something before 
downgrading.

This one we can surely do now in a subtask, agreed.
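A sketch of that version-gating idea; the constant, field, and class names are 
illustrative only, since the actual tracker file format is defined elsewhere:

{code:java}
import java.io.IOException;

// Sketch: refuse to load tracker files written by a newer format version,
// so a downgraded server fails loudly instead of misreading the manifest.
final class TrackerFileVersionCheck {
  // Illustrative: the highest tracker file format this release understands.
  static final int MAX_SUPPORTED_VERSION = 1;

  static void checkVersion(int fileVersion) throws IOException {
    if (fileVersion > MAX_SUPPORTED_VERSION) {
      throw new IOException("Tracker file version " + fileVersion
          + " is newer than supported version " + MAX_SUPPORTED_VERSION
          + "; complete migration before downgrading");
    }
  }
}
{code}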


was (Author: apurtell):
{quote}Links and back references should also be encoded into the tracker file 
directly, but this is not fully covered by the design doc
{quote}
This issue really should cover link and ref files too, because it is the time 
required to create them all, when we have a lot of store files in the region, 
that causes splits to stay in the offline state for a very long time on S3. If 
they are still real files, it will take S3 about a second to create each one. 
We could perhaps create them in parallel with a thread pool, but that is a 
workaround, not a solution. This is the main pain point.

Agreed, the solution for link and back references is not fully covered by the 
design doc yet, so we will update the design doc to add the missing coverage. 
There are some issues to work out, related to supporting rollback in 
particular. Migration will need to update both the new virtual entries in the 
manifest and the real "files" in the filesystem or bucket until the user 
decides they can fully switch over, or we design a way for it to be safe and 
automatic.

 

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 (_Use StoreFileTracker when splitting and merging_) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation, and creating a “file” on S3 can 
> take well over a second. If, for example, there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous.
> There are two issues, but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE file tracker interface, 
> requiring only one update of the manifest file in S3, bringing the time 
> requirement for this operation down from O(n) to O(1).

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-22 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829754#comment-17829754
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/22/24 6:32 AM:
--

{quote}Links and back references should also be encoded into the tracker file 
directly, but this is not fully covered by the design doc
{quote}
This issue really should cover link and ref files too, because it is the time 
required to create them all, when we have a lot of store files in the region, 
that causes splits to stay in the offline state for a very long time on S3. If 
they are still real files, it will take S3 about a second to create each one. 
We could perhaps create them in parallel with a thread pool, but that is a 
workaround, not a solution. This is the main pain point.

Agreed, the solution for link and back references is not fully covered by the 
design doc yet, so we will update the design doc to add the missing coverage. 
There are some issues to work out, related to supporting rollback in 
particular. Migration will need to update both the new virtual entries in the 
manifest and the real "files" in the filesystem or bucket until the user 
decides they can fully switch over, or we design a way for it to be safe and 
automatic.

 


was (Author: apurtell):
{quote}Links and back references should also be encoded into the tracker file 
directly, but this is not fully covered by the design doc
{quote}
 

This issue really should cover link and ref files too, because it is the time 
required to create them all, when we have a lot of store files in the region, 
that causes splits to stay in the offline state for a very long time on S3. If 
they are still real files, it will take S3 about a second to create each one. 
We could perhaps create them in parallel with a thread pool, but that is a 
workaround, not a solution.

Agreed, the solution for link and back references is not fully covered by the 
design doc yet, so we will update the design doc to add the missing coverage. 
There are some issues to work out, related to supporting rollback in 
particular. Migration will need to update both the new virtual entries in the 
manifest and the real "files" in the filesystem or bucket until the user 
decides they can fully switch over, or we design a way for it to be safe and 
automatic.

 

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE file tracker interface, 
> requiring only one update of the manifest file in S3, bringing the time 
> requirement for this operation from O(n) down to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-22 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829754#comment-17829754
 ] 

Andrew Kyle Purtell commented on HBASE-27826:
-

{quote}Links and back references should also be encoded into the tracker file 
directly, but this is not fully covered by the design doc
{quote}
 

This issue really should cover link and ref files too, because it is the time 
required to create them all, when a region has many store files, that causes 
splits to remain in the offline state for a very long time on S3. If they 
are still real files, it will take S3 about a second to create each one. We 
could perhaps create them in parallel with a thread pool, but that is a 
workaround, not a solution.

Agreed, the solution for link and back references is not fully covered by the 
design doc yet, so we will update the design doc to add the missing coverage. 
There are some issues to work out, related to supporting rollback in particular. 
Migration will need to update both the new virtual entries in the manifest and 
real "files" in the filesystem or bucket until the user decides they can fully 
switch over, or until we design a way for it to be safe and automatic. 

 

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE file tracker interface, 
> requiring only one update of the manifest file in S3, bringing the time 
> requirement for this operation from O(n) down to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28444) Bump org.apache.zookeeper:zookeeper from 3.8.3 to 3.8.4

2024-03-20 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828614#comment-17828614
 ] 

Andrew Kyle Purtell commented on HBASE-28444:
-

Right, we have to remove the exists check, and so this test is basically 
removed. That's good enough for now? We can add a test later once we determine 
how to write the test after this ZooKeeper change.

> Bump org.apache.zookeeper:zookeeper from 3.8.3 to 3.8.4
> ---
>
> Key: HBASE-28444
> URL: https://issues.apache.org/jira/browse/HBASE-28444
> Project: HBase
>  Issue Type: Task
>  Components: dependabot, security, Zookeeper
>Reporter: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28415) Remove Curator dependency from hbase-endpoint

2024-03-19 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828484#comment-17828484
 ] 

Andrew Kyle Purtell commented on HBASE-28415:
-

I have no concerns about it being removed. Sure, I may cherry pick it. We can 
file a new issue for that work.

> Remove Curator dependency from hbase-endpoint
> -
>
> Key: HBASE-28415
> URL: https://issues.apache.org/jira/browse/HBASE-28415
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta-1
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> TestRpcControllerFactory used to erroneously import some classes from 
> Curator's relocated Guava.
> This has been fixed now, but the (compile scope) dependency for Curator has 
> not been removed.
> I propose removing the dependency in master, branch-3 and branch-2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28441) Update downloads.xml for 2.5.8

2024-03-13 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28441:
---

 Summary: Update downloads.xml for 2.5.8
 Key: HBASE-28441
 URL: https://issues.apache.org/jira/browse/HBASE-28441
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28441) Update downloads.xml for 2.5.8

2024-03-13 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28441.
-
Resolution: Fixed

> Update downloads.xml for 2.5.8
> --
>
> Key: HBASE-28441
> URL: https://issues.apache.org/jira/browse/HBASE-28441
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826829#comment-17826829
 ] 

Andrew Kyle Purtell commented on HBASE-27826:
-

I started a design document. Find it in the issue links. Anyone who has this 
link can edit. [~prathyu6] [~zhangduo] [~wchevreuil] 

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE file tracker interface, 
> requiring only one update of the manifest file in S3, bringing the time 
> requirement for this operation from O(n) down to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 4:16 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on. The SFT becomes responsible 
for listing the link and reference files among the store contents. Today we 
sometimes go directly to the filesystem for listing stores, still. This is 
wrong! SFT should be the exclusive way we track and discover store contents. 

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine links and references are entries in the manifest, not real files. We 
don't take the cost of creating files in the S3 bucket, we only update the 
manifest, and that can be optimized further. We can gather all of the links and 
references we want to create into a list, and we submit them to SFT all at 
once, using an interface method that accepts an array or list of SFT mutations 
to perform in batch, so there is only one manifest update required, and then 
this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but these will be real link and reference files; they will maintain their 
current naming and structure, so this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.
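
To illustrate the shape being proposed, a rough sketch of the SFT additions 
follows; the method and type names are illustrative only, not a committed 
design:

{code:java}
import java.io.IOException;
import java.util.List;

// Marker for a single add/remove of a link or reference in the tracker.
interface TrackerMutation {
}

// Illustrative descriptor types; fields elided.
class LinkDescriptor implements TrackerMutation {
  // parent region, family, hfile name ...
}
class ReferenceDescriptor implements TrackerMutation {
  // split point, top/bottom half ...
}

// Sketch of the proposed SFT additions.
interface StoreFileTrackerLinkSupport {
  void createLink(LinkDescriptor link) throws IOException;
  void deleteLink(LinkDescriptor link) throws IOException;
  void createReference(ReferenceDescriptor ref) throws IOException;
  void deleteReference(ReferenceDescriptor ref) throws IOException;

  // Batch form: apply many link/reference mutations with one manifest update,
  // so the commit step of a split costs O(1) manifest writes instead of O(n)
  // per-file creations.
  void applyMutations(List<TrackerMutation> mutations) throws IOException;
}
{code}

With this shape, FileBasedStoreFileTracker can implement applyMutations() as a 
single manifest rewrite, while DefaultStoreFileTracker realizes each mutation 
as a real link or reference file, preserving compatibility.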


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine links and references are entries in the manifest, not real files. We 
don't take the cost of creating files in the S3 bucket, we only update the 
manifest, and that can be optimized further. We can gather all of the links and 
references we want to create into a list, and we submit them to SFT all at 
once, using an interface method that accepts an array or list of SFT mutations 
to perform in batch, so there is only one manifest update required, and then 
this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. 

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 4:17 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on. The SFT becomes responsible 
for listing the link and reference files among the store contents. Today we 
still sometimes go directly to the filesystem for listing stores, and we do 
direct filesystem access for making and discovering link and reference files. 
This is wrong. SFT should be the exclusive way we track and discover store 
contents.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine links and references are entries in the manifest, not real files. We 
don't take the cost of creating files in the S3 bucket, we only update the 
manifest, and that can be optimized further. We can gather all of the links and 
references we want to create into a list, and we submit them to SFT all at 
once, using an interface method that accepts an array or list of SFT mutations 
to perform in batch, so there is only one manifest update required, and then 
this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but these will be real link and reference files; they will maintain their 
current naming and structure, so this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on. The SFT becomes responsible 
for listing the link and reference files among the store contents. Today we 
sometimes go directly to the filesystem for listing stores, still. This is 
wrong! SFT should be the exclusive way we track and discover store contents. 

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine links and references are entries in the manifest, not real files. We 
don't take the cost of creating files in the S3 bucket, we only update the 
manifest, and that can be optimized further. We can gather all of the links and 
references we want to create into a list, and we submit them to SFT all at 
once, using an interface method that accepts an array or list of SFT mutations 
to perform in batch, so there is only one manifest update required, and then 
this aspect of splitting becomes O(1) in time.

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 4:14 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine links and references are entries in the manifest, not real files. We 
don't take the cost of creating files in the S3 bucket, we only update the 
manifest, and that can be optimized further. We can gather all of the links and 
references we want to create into a list, and we submit them to SFT all at 
once, using an interface method that accepts an array or list of SFT mutations 
to perform in batch, so there is only one manifest update required, and then 
this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine we gather all of the links and references we want to create into a 
list, and we submit them to SFT all at once, using an interface method that 
accepts an array or list of SFT mutations to perform in batch, so there is only 
one manifest update required, and then this aspect of splitting becomes O(1) in 
time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:39 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. References and links are 
aspects of maintaining a directory of store file contents. SFT is the 
appropriate place to make design changes (in my opinion). And once SFT is 
managing references and links, they do not need to be real files, they can be 
virtual concepts maintained in the manifest. So SFT gets new additional methods 
for adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine we gather all of the links and references we want to create into a 
list, and we submit them to SFT all at once, using an interface method that 
accepts an array or list of SFT mutations to perform in batch, so there is only 
one manifest update required, and then this aspect of splitting becomes O(1) in 
time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine we gather all of the links and references we want to create into a 
list, and we submit them to SFT all at once, using an interface method that 
accepts an array or list of SFT mutations to perform in batch, so there is only 
one manifest update required, and then this aspect of splitting becomes O(1) in 
time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs.

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:37 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on.

Once references and links are concepts managed by SFT, we can have the 
different SFT implementations optimize for their design cases. When using the 
FileBasedStoreFileTracker we would not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. Instead 
imagine we gather all of the links and references we want to create into a 
list, and we submit them to SFT all at once, using an interface method that 
accepts an array or list of SFT mutations to perform in batch, so there is only 
one manifest update required, and then this aspect of splitting becomes O(1) in 
time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on. 

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we do not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region.

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once using an interface 
method that accepts an array or list of SFT mutations to perform in batch, so 
there is only one manifest update required, and then this aspect of splitting 
becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker.

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:36 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. Like createLink(), deleteLink(), 
createReference(), deleteReference(), and so on. 

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we do not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region.

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once using an interface 
method that accepts an array or list of SFT mutations to perform in batch, so 
there is only one manifest update required, and then this aspect of splitting 
becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links.

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we do not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region.

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once by some method like 
SFT.createLink(HFileLink links[]), so there is only one manifest update 
required, and then this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking. 

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second.

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:33 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links.

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we do not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region.

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once by some method like 
SFT.createLink(HFileLink links[]), so there is only one manifest update 
required, and then this aspect of splitting becomes O(1) in time.

Regarding the DefaultStoreFileTracker, it maintains existing functionality. 
DefaultStoreFileTracker needs new methods for creating and managing links too, 
but they will be real link and reference files, they will maintain their 
current naming and structure, this will be fully compatible with existing 
stores. This amounts to refactoring some of the code in HFileLink and 
ReferenceFile into DefaultStoreFileTracker. This is our current thinking. 

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. 

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we do not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. 

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once by some method like 
SFT.createLink(HFileLink links[]), so there is only one manifest update 
required, and then this aspect of splitting becomes O(1) in time.

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 

[jira] [Comment Edited] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell edited comment on HBASE-27826 at 3/13/24 3:29 PM:
--

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this.

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. 

Once references and links are virtual concepts when using the 
FileBasedStoreFileTracker, we do not wait for up to a second or two when 
creating each link or reference in the S3 bucket, causing long offline times 
during splits proportional to the number of store files in the region. 

We can further optimize by gathering all of the links and references we want to 
create into a list and submitting them to SFT all at once by some method like 
SFT.createLink(HFileLink links[]), so there is only one manifest update 
required, and then this aspect of splitting becomes O(1) in time.

A design doc will help clarify the proposals and discussion.


was (Author: apurtell):
{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this. 

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. 

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE file tracker interface, 
> requiring only one update of the manifest file in S3, bringing the time 
> requirement for this operation from O(n) down to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-13 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826771#comment-17826771
 ] 

Andrew Kyle Purtell commented on HBASE-27826:
-

{quote}We will define a splitFiles method in StoreFileTracker interface
{quote}
No. Split logic should remain in SplitTransaction. Breaking this encapsulation 
and diluting the split implementation does not seem like a good idea to me, but 
we could discuss it, if someone actually wants this. 

StoreFileTracker is a directory of store files. This concept is neatly extended 
to include management of reference and link files. And once SFT is managing 
references and links, they do not need to be real files, they can be virtual 
concepts maintained in the manifest. So SFT gets new additional methods for 
adding and removing references and links. 

A design doc will help clarify the proposals and discussion.

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> this file tracker: the manifest file replaces the requirement to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE file tracker interface, 
> requiring only one update of the manifest file in S3, bringing the time 
> requirement for this operation from O(n) down to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27826) Region split and merge time while offline is O(n) with respect to number of store files

2024-03-10 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825138#comment-17825138
 ] 

Andrew Kyle Purtell commented on HBASE-27826:
-

We have someone working on this at my employer. I will connect you. 

> Region split and merge time while offline is O(n) with respect to number of 
> store files
> ---
>
> Key: HBASE-27826
> URL: https://issues.apache.org/jira/browse/HBASE-27826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.4
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> This is a significant availability issue when HFiles are on S3.
> HBASE-26079 ({_}Use StoreFileTracker when splitting and merging{_}) changed 
> the split and merge table procedure implementations to indirect through the 
> StoreFileTracker implementation when selecting HFiles to be merged or split, 
> rather than directly listing those using file system APIs. It also changed 
> the commit logic in HRegionFileSystem to add the link/ref files on resulting 
> split or merged regions to the StoreFileTracker. However, the creation of a 
> link file is still a filesystem operation and creating a “file” on S3 can 
> take well over a second. If, for example there are 20 store files in a 
> region, which is not uncommon, after the region is taken offline for a split 
> (or merge) it may require more than 20 seconds to create the link files 
> before the results can be brought back online, creating a severe availability 
> problem. Splits and merges are supposed to be fast, completing in less than a 
> second, certainly less than a few seconds. This has been true when HFiles are 
> stored on HDFS only because file creation operations there are nearly 
> instantaneous. 
> There are two issues but both can be handled with modifications to the store 
> file tracker interface and the file based store file tracker implementation. 
> When the file based store file tracker is enabled, the HFile links should 
> be virtual entities that only exist in the file manifest. We do not require 
> physical files in the filesystem to serve as links now. That is the magic of 
> the this file tracker, the manifest file replaces requirements to list the 
> filesystem.
> Then, when splitting or merging, the HFile links should be collected into a 
> list and committed in one batch using a new FILE store file tracker interface, 
> requiring only one update of the manifest file in S3, bringing the time 
> requirement for this operation down from O(n) to O(1).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28411) Remove direct dependency on Curator

2024-03-04 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28411:

Fix Version/s: 3.0.0-beta-2

> Remove direct dependency on Curator
> ---
>
> Key: HBASE-28411
> URL: https://issues.apache.org/jira/browse/HBASE-28411
> Project: HBase
>  Issue Type: Improvement
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0-beta-2
>
>
> The only place where Curator is used is 
> ZooKeeperScanPolicyObserver.java in hbase-examples.
> That functionality can be re-implemented without Curator, and a problematic 
> dependency can be removed from HBase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28415) Remove Curator dependency from hbase-endpoint

2024-03-04 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28415:

Fix Version/s: 2.6.0
   2.4.18
   3.0.0-beta-2
   2.5.9

> Remove Curator dependency from hbase-endpoint
> -
>
> Key: HBASE-28415
> URL: https://issues.apache.org/jira/browse/HBASE-28415
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta-1
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> TestRpcControllerFactory used to erroneously import some classes from 
> Curator's relocated Guava.
> This has been fixed now, but the (compile scope) dependency for curator has 
> not been removed.
> I propose removing the dependency in master, branch-3 and branch-2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28415) Remove Curator dependency from hbase-endpoint

2024-03-04 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28415:

Affects Version/s: 3.0.0-beta-1
   Status: Patch Available  (was: Open)

> Remove Curator dependency from hbase-endpoint
> -
>
> Key: HBASE-28415
> URL: https://issues.apache.org/jira/browse/HBASE-28415
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta-1
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> TestRpcControllerFactory used to erroneously import some classes from 
> Curator's relocated Guava.
> This has been fixed now, but the (compile scope) dependency for curator has 
> not been removed.
> I propose removing the dependency in master, branch-3 and branch-2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28416) Remove hbase-examples from hbase-assembly

2024-03-04 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28416:

Affects Version/s: 3.0.0-beta-1
   Status: Patch Available  (was: Open)

> Remove hbase-examples from hbase-assembly
> -
>
> Key: HBASE-28416
> URL: https://issues.apache.org/jira/browse/HBASE-28416
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-beta-1
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> hbase-examples is supposed to contain programming examples for HBase.
> However, it is added to the assembly, and becomes part of the HBase 
> distributions.
> On one hand this adds some potentially useful components and coprocessors to 
> HBase; on the other hand many of those are not production quality, and were 
> never meant to be used as-is.
> It also adds the Curator libraries to the general HBase classpath, which are 
> used by but a single example.
> Removing hbase-examples from the assembly would fix both problems.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28403) Improve debugging for failures in procedure tests

2024-02-29 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822322#comment-17822322
 ] 

Andrew Kyle Purtell commented on HBASE-28403:
-

No concerns [~ndimiduk]. I'd approve a PR. Approved the branch-2 and 
branch-2.6 PRs.

> Improve debugging for failures in procedure tests
> -
>
> Key: HBASE-28403
> URL: https://issues.apache.org/jira/browse/HBASE-28403
> Project: HBase
>  Issue Type: Task
>  Components: proc-v2, test
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1, 3.0.0-beta-2
>
>
> We see unit test failures in Jenkins that look like this:
> {noformat}
> java.lang.IllegalArgumentException: run queue not empty
>   at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:332)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:665)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureTestingUtility.restart(ProcedureTestingUtility.java:132)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureTestingUtility.restart(ProcedureTestingUtility.java:100)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureTestingUtility.restartMasterProcedureExecutor(MasterProcedureTestingUtility.java:85)
>   at 
> org.apache.hadoop.hbase.master.assignment.TestRollbackSCP.testFailAndRollback(TestRollbackSCP.java:180)
> {noformat}
> This isn't enough information to debug the situation. The test code in 
> question looks reasonable enough -- it clears the object for re-use between 
> tests. However, somewhere between stop/clear/start we miss something. Add 
> some toString implementations and dump the objects in the preconditions.
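A minimal sketch of what such a self-describing precondition could look like (the run queue field here is a stand-in, not the actual ProcedureExecutor internals):

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;
import org.apache.hbase.thirdparty.com.google.common.base.Preconditions;

// Sketch only: include the leftover state in the precondition message so
// a Jenkins failure reports what was still queued, not just that the
// queue was non-empty.
final class PreconditionMessageSketch {
  private final Queue<Object> runQueue = new ArrayDeque<>();

  void load() {
    Preconditions.checkArgument(runQueue.isEmpty(),
      "run queue not empty, size=%s, contents=%s", runQueue.size(), runQueue);
    // ... proceed with load ...
  }
}
{code}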



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (HBASE-28048) RSProcedureDispatcher to abort executing request after configurable retries

2024-02-28 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-28048 ]


Andrew Kyle Purtell deleted comment on HBASE-28048:
-

was (Author: apurtell):
Moving out of 2.5.6

> RSProcedureDispatcher to abort executing request after configurable retries
> ---
>
> Key: HBASE-28048
> URL: https://issues.apache.org/jira/browse/HBASE-28048
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-4, 2.4.17, 2.5.5
>Reporter: Viraj Jasani
>Priority: Major
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> In a recent incident, we observed that RSProcedureDispatcher continues 
> executing region open/close procedures with unbounded retries even in the 
> presence of known failures like GSS initiate failure:
>  
> {code:java}
> 2023-08-25 02:21:02,821 WARN [ispatcher-pool-40777] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=0, retrying... {code}
>  
>  
> If the remote execution results in IOException, the dispatcher attempts to 
> schedule the procedure for further retries:
>  
> {code:java}
>     private boolean scheduleForRetry(IOException e) {
>       LOG.debug("Request to {} failed, try={}", serverName, 
> numberOfAttemptsSoFar, e);
>       // Should we wait a little before retrying? If the server is starting 
> it's yes.
>       ...
>       ...
>       ...
>       numberOfAttemptsSoFar++;
>       // Add some backoff here as the attempts rise otherwise if a stuck 
> condition, will fill logs
>       // with failed attempts. None of our backoff classes -- RetryCounter or 
> ClientBackoffPolicy
>       // -- fit here nicely so just do something simple; increment by 
> rsRpcRetryInterval millis *
>       // retry^2 on each try
>       // up to max of 10 seconds (don't want to back off too much in case of 
> situation change).
>       submitTask(this,
>         Math.min(rsRpcRetryInterval * (this.numberOfAttemptsSoFar * 
> this.numberOfAttemptsSoFar),
>           10 * 1000),
>         TimeUnit.MILLISECONDS);
>       return true;
>     }
>  {code}
>  
>  
> Even though we try to provide backoff while retrying, max wait time is 10s:
>  
> {code:java}
> submitTask(this,
>   Math.min(rsRpcRetryInterval * (this.numberOfAttemptsSoFar * 
> this.numberOfAttemptsSoFar),
> 10 * 1000),
>   TimeUnit.MILLISECONDS); {code}
>  
>  
> This results in an endless loop of retries, until either the underlying issue is 
> fixed (e.g. krb issue in this case) or regionserver is killed and the ongoing 
> open/close region procedure (and perhaps entire SCP) for the affected 
> regionserver is sidelined manually.
> {code:java}
> 2023-08-25 03:04:18,918 WARN  [ispatcher-pool-41274] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=217, retrying...
> 2023-08-25 03:04:18,916 WARN  [ispatcher-pool-41280] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=193, retrying...
> 2023-08-25 03:04:28,968 WARN  [ispatcher-pool-41315] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=266, retrying...
> 2023-08-25 03:04:28,969 WARN  [ispatcher-pool-41240] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=266, retrying...{code}
>  
> While external issues like "krb ticket expiry" require operator 
> intervention, it is not prudent to fill up the active handlers with endless 
> retries.
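A minimal sketch of the proposed behavior, assuming a hypothetical configurable maximum attempt count (the names below are illustrative, not the committed change): keep the existing capped quadratic backoff, but give up and fail the remote call once the cap is reached. For example, with rsRpcRetryInterval=100ms the delays run 100ms, 400ms, 900ms, and so on up to the 10s ceiling.

{code:java}
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; the real change would live in
// RSProcedureDispatcher. submitTask and failRemoteCall are stubs standing
// in for the dispatcher's scheduling and failure paths.
abstract class BoundedRetrySketch {
  private final long rsRpcRetryInterval = 100L; // millis, illustrative
  private final int maxAttempts; // hypothetical new config value
  private int numberOfAttemptsSoFar;

  BoundedRetrySketch(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  boolean scheduleForRetry() {
    if (numberOfAttemptsSoFar >= maxAttempts) {
      failRemoteCall(); // surface the failure instead of retrying forever
      return false;
    }
    numberOfAttemptsSoFar++;
    // Same capped quadratic backoff as today: interval * attempts^2, max 10s.
    long delayMillis = Math.min(
      rsRpcRetryInterval * numberOfAttemptsSoFar * numberOfAttemptsSoFar,
      10L * 1000L);
    submitTask(delayMillis, TimeUnit.MILLISECONDS);
    return true;
  }

  abstract void submitTask(long delay, TimeUnit unit);

  abstract void failRemoteCall();
}
{code}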

[jira] [Commented] (HBASE-28048) RSProcedureDispatcher to abort executing request after configurable retries

2024-02-28 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821980#comment-17821980
 ] 

Andrew Kyle Purtell commented on HBASE-28048:
-

{quote}When Master recognizes that the {{ServerName}} that is the target of a 
{{RemoteProcedure}} has left the cluster, the remote procedure must be failed. 
How the failure is handled is up to each procedure.
{quote}
This is a good idea.

> RSProcedureDispatcher to abort executing request after configurable retries
> ---
>
> Key: HBASE-28048
> URL: https://issues.apache.org/jira/browse/HBASE-28048
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-4, 2.4.17, 2.5.5
>Reporter: Viraj Jasani
>Priority: Major
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> In a recent incident, we observed that RSProcedureDispatcher continues 
> executing region open/close procedures with unbounded retries even in the 
> presence of known failures like GSS initiate failure:
>  
> {code:java}
> 2023-08-25 02:21:02,821 WARN [ispatcher-pool-40777] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=0, retrying... {code}
>  
>  
> If the remote execution results in IOException, the dispatcher attempts to 
> schedule the procedure for further retries:
>  
> {code:java}
>     private boolean scheduleForRetry(IOException e) {
>       LOG.debug("Request to {} failed, try={}", serverName, 
> numberOfAttemptsSoFar, e);
>       // Should we wait a little before retrying? If the server is starting 
> it's yes.
>       ...
>       ...
>       ...
>       numberOfAttemptsSoFar++;
>       // Add some backoff here as the attempts rise otherwise if a stuck 
> condition, will fill logs
>       // with failed attempts. None of our backoff classes -- RetryCounter or 
> ClientBackoffPolicy
>       // -- fit here nicely so just do something simple; increment by 
> rsRpcRetryInterval millis *
>       // retry^2 on each try
>       // up to max of 10 seconds (don't want to back off too much in case of 
> situation change).
>       submitTask(this,
>         Math.min(rsRpcRetryInterval * (this.numberOfAttemptsSoFar * 
> this.numberOfAttemptsSoFar),
>           10 * 1000),
>         TimeUnit.MILLISECONDS);
>       return true;
>     }
>  {code}
>  
>  
> Even though we try to provide backoff while retrying, max wait time is 10s:
>  
> {code:java}
> submitTask(this,
>   Math.min(rsRpcRetryInterval * (this.numberOfAttemptsSoFar * 
> this.numberOfAttemptsSoFar),
> 10 * 1000),
>   TimeUnit.MILLISECONDS); {code}
>  
>  
> This results in an endless loop of retries, until either the underlying issue is 
> fixed (e.g. krb issue in this case) or regionserver is killed and the ongoing 
> open/close region procedure (and perhaps entire SCP) for the affected 
> regionserver is sidelined manually.
> {code:java}
> 2023-08-25 03:04:18,918 WARN  [ispatcher-pool-41274] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=217, retrying...
> 2023-08-25 03:04:18,916 WARN  [ispatcher-pool-41280] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=193, retrying...
> 2023-08-25 03:04:28,968 WARN  [ispatcher-pool-41315] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=266, retrying...
> 2023-08-25 03:04:28,969 WARN  [ispatcher-pool-41240] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> 

[jira] [Comment Edited] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821361#comment-17821361
 ] 

Andrew Kyle Purtell edited comment on HBASE-28405 at 2/27/24 5:49 PM:
--

{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge. 
In the rollback we can set MERGING state back to OPEN. 

Also, there is a related and interesting finding. 
Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo]
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It would be ideal if the master does not make redundant requests, but if it 
does make one, the RS should handle the request and return success to the 
master because the request to open an already open region on the same server is 
idempotent with the earlier request that caused the region to be opened there 
in the first place. So why would this hang the rollback? Maybe because today 
the RS won't ack the new request? So we can change the RS code to do that if so.
{quote}If we assign the region to the same RS, it will hang the rollback, a 
worse scenario is we try to assign it to another region server, then it will 
lead to double assign and cause data loss
{quote}
For sure we must ensure a region already OPEN on one server is never assigned 
to a different server concurrently. 
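A minimal sketch of the idempotent-ack idea discussed above (all names here are illustrative stand-ins for AssignRegionHandler and the report-back path, not the actual code):

{code:java}
// Illustrative sketch only: if a duplicate OPEN arrives for a region that
// is already online on this RS, ack success back to the master instead of
// returning silently, so the parent procedure is not left hanging.
final class IdempotentOpenSketch {
  interface RegionServerServices {
    boolean isRegionOnline(String encodedRegionName);
    // Stand-in for reporting an OPENED transition back to the master.
    void reportRegionOpened(String encodedRegionName);
  }

  static void handleOpen(RegionServerServices rs, String encodedRegionName) {
    if (rs.isRegionOnline(encodedRegionName)) {
      // The duplicate OPEN is idempotent with the request that opened the
      // region here in the first place, so treat it as a success.
      rs.reportRegionOpened(encodedRegionName);
      return;
    }
    // ... normal region open path ...
  }
}
{code}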


was (Author: apurtell):
{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo]
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It would be ideal if the master does not make redundant requests, but if it 
does make one, the RS should handle the request and return success to the 
master because the request to open an already open region on the same server is 
idempotent with the earlier request that caused the region to be opened there 
in the first place. So why would this hang the rollback? Maybe because today 
the RS won't ack the new request? So we can change the RS code to do that if so.
{quote}If we assign the region to the same RS, it will hang the rollback, a 
worse scenario is we try to assign it to another region server, then it will 
lead to double assign and cause data loss
{quote}
For sure we must ensure a region already OPEN on one server is never assigned 
to a different server concurrently. 

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> 

[jira] [Comment Edited] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821361#comment-17821361
 ] 

Andrew Kyle Purtell edited comment on HBASE-28405 at 2/27/24 5:48 PM:
--

{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo]
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It would be ideal if the master does not make redundant requests, but if it 
does make one, the RS should handle the request and return success to the 
master because the request to open an already open region on the same server is 
idempotent with the earlier request that caused the region to be opened there 
in the first place. So why would this hang the rollback? Maybe because today 
the RS won't ack the new request? So we can change the RS code to do that if so.
{quote}If we assign the region to the same RS, it will hang the rollback, a 
worse scenario is we try to assign it to another region server, then it will 
lead to double assign and cause data loss
{quote}
For sure we must ensure a region already OPEN on one server is never assigned 
to a different server concurrently. 


was (Author: apurtell):
{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo]
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It would be ideal if the master does not make redundant requests, but if it 
does make one, the RS should handle the request and return success to the 
master because the request to open an already open region on the same server is 
idempotent with the earlier request that caused the region to be opened there 
in the first place. So why would this hang the rollback? Maybe because today 
the RS won't ack the new request? So we can change the RS code to do that if so.

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> 

[jira] [Comment Edited] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821361#comment-17821361
 ] 

Andrew Kyle Purtell edited comment on HBASE-28405 at 2/27/24 5:46 PM:
--

{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo]
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It would be ideal if the master does not make redundant requests, but if it 
does make one, the RS should handle the request and return success to the 
master because the request to open an already open region on the same server is 
idempotent with the earlier request that caused the region to be opened there 
in the first place. So why would this hang the rollback? Maybe because today 
the RS won't ack the new request? So we can change the RS code to do that if so.

Although it would be good to optimize the master so it isn't making redundant 
requests. 


was (Author: apurtell):
{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo]
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It is unnecessary to make a new TRSP to assign a region already online on the 
regionserver, but if we do make one, the RS should handle the request and 
return success to the master because the request to open an already open region 
on the same server is idempotent with the earlier request that caused the 
region to be opened there in the first place. So why would this hang the 
rollback? Maybe because today the RS won't ack the new request? So we can 
change the RS code to do that if so.

Although it would be good to optimize the master so it isn't making redundant 
requests. 

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> 

[jira] [Comment Edited] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821361#comment-17821361
 ] 

Andrew Kyle Purtell edited comment on HBASE-28405 at 2/27/24 5:46 PM:
--

{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo]
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It would be ideal if the master does not make redundant requests, but if it 
does make one, the RS should handle the request and return success to the 
master because the request to open an already open region on the same server is 
idempotent with the earlier request that caused the region to be opened there 
in the first place. So why would this hang the rollback? Maybe because today 
the RS won't ack the new request? So we can change the RS code to do that if so.


was (Author: apurtell):
{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo]
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It would be ideal if the master does not make redundant requests, but if it 
does make one, the RS should handle the request and return success to the 
master because the request to open an already open region on the same server is 
idempotent with the earlier request that caused the region to be opened there 
in the first place. So why would this hang the rollback? Maybe because today 
the RS won't ack the new request? So we can change the RS code to do that if so.

Although it would be good to optimize the master so it isn't making redundant 
requests. 

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> 

[jira] [Comment Edited] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821361#comment-17821361
 ] 

Andrew Kyle Purtell edited comment on HBASE-28405 at 2/27/24 5:45 PM:
--

{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo]
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It is unnecessary to make a new TRSP to assign a region already online on the 
regionserver, but if we do make one, the RS should handle the request and 
return success to the master because the request to open an already open region 
on the same server is idempotent with the earlier request that caused the 
region to be opened there in the first place. So why would this hang the 
rollback? Maybe because today the RS won't ack the new request? So we can 
change the RS code to do that if so.

Although it would be good to optimize the master so it isn't making redundant 
requests. 


was (Author: apurtell):
{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo] I don't understand this part:
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It is unnecessary to make a new TRSP to assign a region already online on the 
regionserver, but if we do make one, the RS should handle the request and 
return success to the master because the request to open an already open region 
on the same server is idempotent with the earlier request that caused the 
region to be opened there in the first place. So why would this hang the 
rollback? Maybe because today the RS won't ack the new request? So we can 
change the RS code to do that if so.

Although it would be good to optimize the master so it isn't making redundant 
requests. 

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> 

[jira] [Comment Edited] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821361#comment-17821361
 ] 

Andrew Kyle Purtell edited comment on HBASE-28405 at 2/27/24 5:45 PM:
--

{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
The best solution is fixing the state in the master to reflect the region is 
still online on the RS after the failed merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.

[~zhangduo] I don't understand this part:
{quote}So the problem is that, we should not issue a TRSP if the region is 
already online, when rolling back the MergeTableRegionsProcedure. If we assign 
the region to the same RS, it will hang the rollback
{quote}
It is unnecessary to make a new TRSP to assign a region already online on the 
regionserver, but if we do make one, the RS should handle the request and 
return success to the master because the request to open an already open region 
on the same server is idempotent with the earlier request that caused the 
region to be opened there in the first place. So why would this hang the 
rollback? Maybe because today the RS won't ack the new request? So we can 
change the RS code to do that if so.

Although it would be good to optimize the master so it isn't making redundant 
requests. 


was (Author: apurtell):
{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
+1, I think this is what Duo was getting at. The solution is fixing the state 
in the master to reflect the region is still online on the RS after the failed 
merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.


[~zhangduo] I don't understand this part:

bq. So the problem is that, we should not issue a TRSP if the region is already 
online, when rolling back the MergeTableRegionsProcedure. If we assign the 
region to the same RS, it will hang the rollback

It is unnecessary to make a new TRSP to assign a region already online on the 
regionserver, but if we do make one, the RS should handle the request and 
return success to the master because the request to open an already open region 
on the same server is idempotent with the earlier request that caused the 
region to be opened there in the first place. So why would this hang the 
rollback? Maybe because today the RS won't ack the new request. So we can 
change the RS code to do that if so.

Although it would be good to optimize the master so it isn't making redundant 
requests. 

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821361#comment-17821361
 ] 

Andrew Kyle Purtell commented on HBASE-28405:
-

{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sufficed in such cases.
{quote}
+1, I think this is what Duo was getting at. The solution is fixing the state 
in the master to reflect the region is still online on the RS after the failed 
merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.


[~zhangduo] I don't understand this part:

bq. So the problem is that, we should not issue a TRSP if the region is already 
online, when rolling back the MergeTableRegionsProcedure. If we assign the 
region to the same RS, it will hang the rollback

It is unnecessary to make a new TRSP to assign a region already online on the 
regionserver, but if we do make one, the RS should handle the request and 
return success to the master because the request to open an already open region 
on the same server is idempotent with the earlier request that caused the 
region to be opened there in the first place. So why would this hang the 
rollback? Maybe because today the RS won't ack the new request. So we can 
change the RS code to do that if so.

Although it would be good to optimize the master so it isn't making redundant 
requests. 

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of a failed merge operation we see an issue where 
> the region is in state opened until the RS holding it is stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> 

[jira] [Updated] (HBASE-27022) SFT seems to apparently tracking invalid/malformed store files

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-27022:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> SFT seems to apparently tracking invalid/malformed store files
> --
>
> Key: HBASE-27022
> URL: https://issues.apache.org/jira/browse/HBASE-27022
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Wellington Chevreuil
>Priority: Minor
> Fix For: 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> Opening this on behalf of [~apurtell], who first reported this issue on 
> HBASE-26999: When running scale tests using ITLCC, the following errors were 
> observed:
> {noformat}
> [00]2022-05-05 15:59:52,280 WARN [region-location-0] 
> regionserver.StoreFileInfo:
> Skipping 
> hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/9eafc10e1b5a25532a4f0adf550828fc/c/9d07757144a7404fac02e161b5bd035e
> because it is empty. HBASE-646 DATA LOSS?
> ...
> [00]2022-05-05 15:59:52,320 WARN [region-location-2] 
> regionserver.StoreFileInfo: 
> Skipping 
> hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/5322c54b9a899eae03cb16e956a836d5/c/184b4f55ab1a4dbc813e77aeae1343ae
>  
> because it is empty. HBASE-646 DATA LOSS? {noformat}
>  
> From some discussions in HBASE-26999, it seems that SFT has wrongly tracked 
> an incomplete/unfinished store file. 
> For further context, follow the [comments thread on 
> HBASE-26999|https://issues.apache.org/jira/browse/HBASE-26999?focusedCommentId=17533508=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17533508].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27380) RitDuration histogram metric is broken

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-27380:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> RitDuration histogram metric is broken
> --
>
> Key: HBASE-27380
> URL: https://issues.apache.org/jira/browse/HBASE-27380
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Minor
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> Looks like the method which updates it in MetricsAssignmentManager, 
> updateRitDuration, got broken somewhere along the way. It's no longer used. 
> We should wire this back up.
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManager.java#L82
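A minimal sketch of the re-wiring (the surrounding names are illustrative; only updateRitDuration itself is the method cited above): call the histogram update when a region finishes its transition.

{code:java}
// Illustrative sketch only: invoke the existing but currently unused
// histogram update whenever a region transit completes.
final class RitDurationWiringSketch {
  interface MetricsAssignmentManager {
    void updateRitDuration(long durationMs); // exists, but no longer called
  }

  private final MetricsAssignmentManager metrics;

  RitDurationWiringSketch(MetricsAssignmentManager metrics) {
    this.metrics = metrics;
  }

  void onRegionTransitionFinished(long ritStartTimeMs) {
    metrics.updateRitDuration(System.currentTimeMillis() - ritStartTimeMs);
  }
}
{code}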



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-26844) Fix flaky TestBasicWALEntryStreamFSHLog.testSizeOfLogQueue

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-26844:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> Fix flaky TestBasicWALEntryStreamFSHLog.testSizeOfLogQueue
> --
>
> Key: HBASE-26844
> URL: https://issues.apache.org/jira/browse/HBASE-26844
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.0, 2.4.12
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Minor
> Fix For: 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The failure info is described in HBASE-26843.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28158) Decouple RIT list management from TRSP invocation

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell reassigned HBASE-28158:
---

Assignee: (was: Andrew Kyle Purtell)

> Decouple RIT list management from TRSP invocation
> -
>
> Key: HBASE-28158
> URL: https://issues.apache.org/jira/browse/HBASE-28158
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.6
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Fix For: 4.0.0-alpha-1, 2.5.8, 3.0.0-beta-2, 2.6.1
>
>
> Operators bypassed some in progress TRSPs leading to a state where some 
> regions were persistently in transition but hidden. Because the master builds 
> its list of regions in transition by tracking TRSP, the bypass of TRSP 
> removed the regions from the RIT list. 
> Although I can see from reading the code this is the expected behavior, it is 
> surprising for operators and should be changed. Operators expect that regions 
> which should be open but are not will appear in the master's RIT list, provided by 
> /rits.jsp, the output of the shell's 'rit' command, and in ClusterStatus.
> We should only remove a region from the RIT map when assignment reaches a 
> suitable terminal state.
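A minimal sketch of the decoupling (state names and structure are illustrative, not the actual AssignmentManager code): the RIT map entry lives independently of the TRSP, and is dropped only when assignment reaches a terminal state, so a bypassed procedure no longer hides a stuck region.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: RIT tracking keyed by region, removed on
// terminal states rather than on TRSP completion or bypass.
final class RitTrackingSketch {
  enum RegionState { OPENING, OPEN, CLOSING, CLOSED }

  private final Map<String, RegionState> regionsInTransition =
    new ConcurrentHashMap<>();

  void onStateChange(String encodedRegionName, RegionState state) {
    if (state == RegionState.OPEN || state == RegionState.CLOSED) {
      // Terminal states: the region is no longer in transition.
      regionsInTransition.remove(encodedRegionName);
    } else {
      regionsInTransition.put(encodedRegionName, state);
    }
  }
}
{code}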



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28158) Decouple RIT list management from TRSP invocation

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28158:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> Decouple RIT list management from TRSP invocation
> -
>
> Key: HBASE-28158
> URL: https://issues.apache.org/jira/browse/HBASE-28158
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.6
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Fix For: 4.0.0-alpha-1, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> Operators bypassed some in progress TRSPs leading to a state where some 
> regions were persistently in transition but hidden. Because the master builds 
> its list of regions in transition by tracking TRSP, the bypass of TRSP 
> removed the regions from the RIT list. 
> Although I can see from reading the code this is the expected behavior, it is 
> surprising for operators and should be changed. Operators expect that regions 
> which should be open but are not will appear in the master's RIT list, provided by 
> /rits.jsp, the output of the shell's 'rit' command, and in ClusterStatus.
> We should only remove a region from the RIT map when assignment reaches a 
> suitable terminal state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28221) Introduce regionserver metric for delayed flushes

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28221:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> Introduce regionserver metric for delayed flushes
> -
>
> Key: HBASE-28221
> URL: https://issues.apache.org/jira/browse/HBASE-28221
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.4.17, 2.5.6
>Reporter: Viraj Jasani
>Assignee: Rahul Kumar
>Priority: Major
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> If compaction is disabled temporarily to allow the hdfs load to stabilize, we can 
> forget to re-enable compaction. This can result in flushes getting delayed for the 
> "hbase.hstore.blockingWaitTime" period (90s by default). While flushes do happen 
> eventually after waiting for the max blocking time, it is important to realize 
> that no cluster can function well with compaction disabled for a significant 
> amount of time.
>  
> We would also block any write requests until region is flushed (90+ sec, by 
> default):
> {code:java}
> 2023-11-27 20:40:52,124 WARN  [,queue=18,port=60020] regionserver.HRegion - 
> Region is too busy due to exceeding memstore size limit.
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=table1,1699923733811.4fd5e52e2133df1e347f32c646f23ab4., 
> server=server-1,60020,1699421714454, memstoreSize=1073820928, 
> blockingMemStoreSize=1073741824
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4200)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3264)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3215)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:967)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:895)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2524)
>     at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36812)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2432)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291) 
> {code}
>  
> Delayed flush logs:
> {code:java}
> LOG.warn("{} has too many store files({}); delaying flush up to {} ms",
>   region.getRegionInfo().getEncodedName(), getStoreFileCount(region),
>   this.blockingWaitTime); {code}
> Suggestion: Introduce a regionserver metric (MetricsRegionServerSource) for the 
> number of flushes getting delayed due to too many store files.
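
A minimal sketch of the suggested counter, assuming a hypothetical hook next
to the existing WARN above; the real change would add the counter to
MetricsRegionServerSource:

{code:java}
import java.util.concurrent.atomic.LongAdder;

// Sketch only: count flushes delayed because a store has too many files.
public class DelayedFlushMetricSketch {
  private final LongAdder flushesDelayedCount = new LongAdder();

  // parameters mirror the arguments of the WARN log quoted above
  void onFlushDelayed(String encodedRegionName, int storeFileCount, long blockingWaitTimeMs) {
    flushesDelayedCount.increment(); // bumped alongside the existing WARN
  }

  long getFlushesDelayedCount() {
    return flushesDelayedCount.sum();
  }
}
{code}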



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28192) Master should recover if meta region state is inconsistent

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28192:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> Master should recover if meta region state is inconsistent
> --
>
> Key: HBASE-28192
> URL: https://issues.apache.org/jira/browse/HBASE-28192
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.4.17, 2.5.6
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 2.4.18, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> During active master initialization, before we set the master as active (i.e. 
> {_}setInitialized(true){_}), we need both the meta and namespace regions to be 
> online. If the region state of meta or namespace is inconsistent, the active 
> master can get stuck in the initialization step:
> {code:java}
> private boolean isRegionOnline(RegionInfo ri) {
>   RetryCounter rc = null;
>   while (!isStopped()) {
> ...
> ...
> ...
> // Check once-a-minute.
> if (rc == null) {
>   rc = new RetryCounterFactory(Integer.MAX_VALUE, 1000, 60_000).create();
> }
> Threads.sleep(rc.getBackoffTimeAndIncrementAttempts());
>   }
>   return false;
> }
>  {code}
> In a recent outage, we observed that meta was online on a server, 
> which was correctly reflected in the meta znode, but the server start time was 
> different. This means that as per the latest transition record, meta was 
> marked online on the old server (the same server with the old start time). This 
> kept active master initialization waiting forever, and some SCPs got stuck in 
> the initial stage where they need to access the meta table before getting 
> candidates for region moves.
> The only way out of this outage was for the operator to schedule recoveries using 
> hbck for the old server, which triggers an SCP for the old server address of 
> meta. Since many SCPs were stuck, processing the new SCP also took some time. A 
> manual restart of the active master triggered a failover, and the new master was 
> able to complete the SCP for the old meta server, correcting the meta assignment 
> details. This eventually marked the master as active, and only then were we able 
> to see the really large number of RITs that had been hidden so far.
> We need to let the master recover from this state to avoid manual intervention.
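
One possible shape of such recovery, sketched with assumed interfaces; none of
these names are the master's actual code:

{code:java}
// Sketch: if the recorded meta location names a server incarnation (host,
// port, start code) that is no longer live, schedule an SCP for that dead
// incarnation instead of polling forever. All names are assumptions.
public class MetaRecoverySketch {
  interface LiveServers {
    boolean isLive(String host, int port, long startCode);
  }

  interface Procedures {
    void submitServerCrashProcedure(String host, int port, long startCode);
  }

  private final LiveServers liveServers;
  private final Procedures procedures;

  MetaRecoverySketch(LiveServers liveServers, Procedures procedures) {
    this.liveServers = liveServers;
    this.procedures = procedures;
  }

  /** Returns true when recovery was triggered for a stale meta location. */
  boolean recoverIfStale(String host, int port, long recordedStartCode) {
    if (liveServers.isLive(host, port, recordedStartCode)) {
      return false; // transition record matches a live server; keep waiting normally
    }
    // The recorded incarnation is dead (e.g. same host/port, older start code).
    // Expire it automatically rather than requiring operator-run hbck.
    procedures.submitServerCrashProcedure(host, port, recordedStartCode);
    return true;
  }
}
{code}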



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28048) RSProcedureDispatcher to abort executing request after configurable retries

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28048:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> RSProcedureDispatcher to abort executing request after configurable retries
> ---
>
> Key: HBASE-28048
> URL: https://issues.apache.org/jira/browse/HBASE-28048
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-4, 2.4.17, 2.5.5
>Reporter: Viraj Jasani
>Priority: Major
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> In a recent incident, we observed that RSProcedureDispatcher continues 
> executing region open/close procedures with unbounded retries even in the 
> presence of known failures like GSS initiate failure:
>  
> {code:java}
> 2023-08-25 02:21:02,821 WARN [ispatcher-pool-40777] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=0, retrying... {code}
>  
>  
> If the remote execution results in IOException, the dispatcher attempts to 
> schedule the procedure for further retries:
>  
> {code:java}
>     private boolean scheduleForRetry(IOException e) {
>       LOG.debug("Request to {} failed, try={}", serverName, 
> numberOfAttemptsSoFar, e);
>       // Should we wait a little before retrying? If the server is starting 
> it's yes.
>       ...
>       ...
>       ...
>       numberOfAttemptsSoFar++;
>       // Add some backoff here as the attempts rise otherwise if a stuck 
> condition, will fill logs
>       // with failed attempts. None of our backoff classes -- RetryCounter or 
> ClientBackoffPolicy
>       // -- fit here nicely so just do something simple; increment by 
> rsRpcRetryInterval millis *
>       // retry^2 on each try
>       // up to max of 10 seconds (don't want to back off too much in case of 
> situation change).
>       submitTask(this,
>         Math.min(rsRpcRetryInterval * (this.numberOfAttemptsSoFar * 
> this.numberOfAttemptsSoFar),
>           10 * 1000),
>         TimeUnit.MILLISECONDS);
>       return true;
>     }
>  {code}
>  
>  
> Even though we try to provide backoff while retrying, max wait time is 10s:
>  
> {code:java}
> submitTask(this,
>   Math.min(rsRpcRetryInterval * (this.numberOfAttemptsSoFar * 
> this.numberOfAttemptsSoFar),
> 10 * 1000),
>   TimeUnit.MILLISECONDS); {code}
>  
>  
> This results in an endless loop of retries until either the underlying issue is 
> fixed (e.g. the krb issue in this case) or the regionserver is killed and the 
> ongoing open/close region procedure (and perhaps the entire SCP) for the affected 
> regionserver is sidelined manually.
> {code:java}
> 2023-08-25 03:04:18,918 WARN  [ispatcher-pool-41274] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=217, retrying...
> 2023-08-25 03:04:18,916 WARN  [ispatcher-pool-41280] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=193, retrying...
> 2023-08-25 03:04:28,968 WARN  [ispatcher-pool-41315] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=266, retrying...
> 2023-08-25 03:04:28,969 WARN  [ispatcher-pool-41240] 
> procedure.RSProcedureDispatcher - request to rs1,61020,1692930044498 failed 
> due to java.io.IOException: Call to address=rs1:61020 failed on local 
> exception: java.io.IOException: 
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS 
> initiate failed, try=266, retrying...{code}
>  
> While external issues like "krb ticket expiry" require operator 
> intervention, it is not 
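
A minimal sketch of the bounded-retry idea, reusing the backoff formula quoted
above; the max-attempts knob and the abort signal are assumptions, not the
actual patch:

{code:java}
// Sketch only: mirror scheduleForRetry's backoff but give up after a
// configurable number of attempts instead of retrying forever.
public class BoundedRetrySketch {
  private final long rsRpcRetryInterval; // existing retry interval, in ms
  private final int maxAttempts;         // hypothetical new configuration knob
  private int numberOfAttemptsSoFar;

  BoundedRetrySketch(long rsRpcRetryIntervalMs, int maxAttempts) {
    this.rsRpcRetryInterval = rsRpcRetryIntervalMs;
    this.maxAttempts = maxAttempts;
  }

  /** Returns the next delay in ms, or -1 when the dispatcher should abort. */
  long nextDelayMs() {
    if (numberOfAttemptsSoFar >= maxAttempts) {
      return -1; // abort: fail the remote procedure instead of retrying forever
    }
    numberOfAttemptsSoFar++;
    // same quadratic backoff and 10s cap as the code quoted above
    long backoff = rsRpcRetryInterval * numberOfAttemptsSoFar * numberOfAttemptsSoFar;
    return Math.min(backoff, 10_000L);
  }
}
{code}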

[jira] [Comment Edited] (HBASE-27694) Exclude the older versions of netty pulling from Hadoop dependencies

2024-02-26 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820911#comment-17820911
 ] 

Andrew Kyle Purtell edited comment on HBASE-27694 at 2/26/24 11:49 PM:
---

We can't fix this on our side because some Hadoop code still requires netty 3. 
We need to wait for HADOOP-15327. The fix version there is 3.4.0. 
Reopen or file a new issue once 3.4.0 is available and we can agree to make it 
our minimum dependency version.


was (Author: apurtell):
We can't fix this on our side because some Hadoop code still requires netty 3. 
We need to wait for HADOOP-15327 . Fix version is 3.4.0. 

> Exclude the older versions of netty pulling from Hadoop dependencies
> 
>
> Key: HBASE-27694
> URL: https://issues.apache.org/jira/browse/HBASE-27694
> Project: HBase
>  Issue Type: Bug
>Reporter: Rajeshbabu Chintaguntla
>Priority: Major
>
> Currently netty version 3.10.6 is getting pulled in from the hdfs 
> dependencies, and tools like Sonatype report the CVEs against HBase. To get 
> rid of this, it is better to exclude netty wherever the hdfs or mapred client 
> jars are used.
>  * org.apache.hbase : hbase-it : jar : tests : 2.5.2
>  ** org.apache.hadoop : hadoop-mapreduce-client-core : 3.2.2
>  *** io.netty : netty : 3.10.6.final
>  ** org.apache.hbase : hbase-endpoint : 2.5.2
>  *** org.apache.hadoop : hadoop-hdfs : jar : tests : 3.2.2
>   io.netty : netty : 3.10.6.final
>  *** org.apache.hadoop : hadoop-hdfs : 3.2.2
>   io.netty : netty : 3.10.6.final
>  * org.apache.hadoop : hadoop-mapreduce-client-jobclient : 3.2.2
>  ** io.netty : netty : 3.10.6.final
>  ** org.apache.hadoop : hadoop-mapreduce-client-common : 3.2.2
>  *** io.netty : netty : 3.10.6.final
>  * org.apache.hadoop : hadoop-mapreduce-client-jobclient : jar : tests : 3.2.2
>  ** io.netty : netty : 3.10.6.final
>  * org.apache.hadoop : hadoop-mapreduce-client-hs : 3.2.2
>  ** io.netty : netty : 3.10.6.final
>  ** org.apache.hadoop : hadoop-mapreduce-client-app : 3.2.2
>  *** io.netty : netty : 3.10.6.final
>  *** org.apache.hadoop : hadoop-mapreduce-client-shuffle : 3.2.2
>   io.netty : netty : 3.10.6.final



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27694) Exclude the older versions of netty pulling from Hadoop dependencies

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-27694.
-
Fix Version/s: (was: 2.5.8)
   (was: 3.0.0-beta-2)
   (was: 2.6.1)
 Assignee: (was: Rajeshbabu Chintaguntla)
   Resolution: Won't Fix

We can't fix this on our side because some Hadoop code still requires netty 3. 
We need to wait for HADOOP-15327. The fix version there is 3.4.0. 

> Exclude the older versions of netty pulling from Hadoop dependencies
> 
>
> Key: HBASE-27694
> URL: https://issues.apache.org/jira/browse/HBASE-27694
> Project: HBase
>  Issue Type: Bug
>Reporter: Rajeshbabu Chintaguntla
>Priority: Major
>
> Currently netty version 3.10.6 is getting pulled in from the hdfs 
> dependencies, and tools like Sonatype report the CVEs against HBase. To get 
> rid of this, it is better to exclude netty wherever the hdfs or mapred client 
> jars are used.
>  * org.apache.hbase : hbase-it : jar : tests : 2.5.2
>  ** org.apache.hadoop : hadoop-mapreduce-client-core : 3.2.2
>  *** io.netty : netty : 3.10.6.final
>  ** org.apache.hbase : hbase-endpoint : 2.5.2
>  *** org.apache.hadoop : hadoop-hdfs : jar : tests : 3.2.2
>   io.netty : netty : 3.10.6.final
>  *** org.apache.hadoop : hadoop-hdfs : 3.2.2
>   io.netty : netty : 3.10.6.final
>  * org.apache.hadoop : hadoop-mapreduce-client-jobclient : 3.2.2
>  ** io.netty : netty : 3.10.6.final
>  ** org.apache.hadoop : hadoop-mapreduce-client-common : 3.2.2
>  *** io.netty : netty : 3.10.6.final
>  * org.apache.hadoop : hadoop-mapreduce-client-jobclient : jar : tests : 3.2.2
>  ** io.netty : netty : 3.10.6.final
>  * org.apache.hadoop : hadoop-mapreduce-client-hs : 3.2.2
>  ** io.netty : netty : 3.10.6.final
>  ** org.apache.hadoop : hadoop-mapreduce-client-app : 3.2.2
>  *** io.netty : netty : 3.10.6.final
>  *** org.apache.hadoop : hadoop-mapreduce-client-shuffle : 3.2.2
>   io.netty : netty : 3.10.6.final



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-26958) Improve the comments and test coverage for compaction progress implementation

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-26958:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> Improve the comments and test coverage for compaction progress implementation
> -
>
> Key: HBASE-26958
> URL: https://issues.apache.org/jira/browse/HBASE-26958
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> When fixing HBASE-26938, we also found the root cause of why we sometimes 
> get a broken compaction progress value, i.e. a negative value or more than 100%: 
> multiple compactions can happen at the same time but share a single compaction 
> progress object, which is not thread safe for tracking them all.
> We have fixed this problem in HBASE-26938, but it is suggested to improve the 
> test coverage of this area so we will not make the same mistake in the future.
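
A minimal sketch of the thread-safety point being described: one progress
object per running compaction, aggregated for reporting, so the value stays
within [0, 100]. Names here are illustrative, not the fixed HBase code:

{code:java}
import java.util.Collection;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: per-compaction progress, aggregated safely across threads.
public class CompactionProgressSketch {
  static final class Progress {
    volatile long totalCompactingKVs;
    volatile long currentCompactedKVs;
  }

  private final Collection<Progress> active = new ConcurrentLinkedQueue<>();

  Progress startCompaction(long totalKVs) {
    Progress p = new Progress();
    p.totalCompactingKVs = totalKVs;
    active.add(p);
    return p;
  }

  void finishCompaction(Progress p) {
    active.remove(p);
  }

  /** Aggregate completion in [0, 100]; never negative or above 100. */
  double progressPercent() {
    long total = 0, done = 0;
    for (Progress p : active) {
      total += p.totalCompactingKVs;
      done += Math.min(p.currentCompactedKVs, p.totalCompactingKVs);
    }
    return total == 0 ? 100.0 : (100.0 * done) / total;
  }
}
{code}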



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27127) Should use FileStatus to archive expired MOB files instead of construct HStoreFile object

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-27127:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> Should use FileStatus to archive expired MOB files instead of construct 
> HStoreFile object
> -
>
> Key: HBASE-27127
> URL: https://issues.apache.org/jira/browse/HBASE-27127
> Project: HBase
>  Issue Type: Improvement
>  Components: mob
>Affects Versions: 2.4.12
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> MobUtils#removeMobFiles reused code from HFileArchiver#archiveStoreFiles, 
> which is used to archive compacted files on RSes under normal conditions. In 
> HFileArchiver#archiveStoreFiles, store files are closed and removed, while the 
> MOB cleaner should only rename the expired files; there is no need to build a 
> new HStoreFile object before archiving and close it after removal. 
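
A minimal sketch of the FileStatus-based approach, using only the Hadoop
FileSystem API; the method name and error handling here are assumptions:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: archive an expired MOB file by moving it using only its FileStatus,
// instead of constructing (and later closing) an HStoreFile around it.
public class MobArchiveSketch {
  static void archiveExpiredMobFile(FileSystem fs, FileStatus expired, Path archiveDir)
      throws IOException {
    if (!fs.exists(archiveDir) && !fs.mkdirs(archiveDir)) {
      throw new IOException("Failed to create archive dir " + archiveDir);
    }
    Path target = new Path(archiveDir, expired.getPath().getName());
    // a simple rename; no reader is ever opened, so there is nothing to close
    if (!fs.rename(expired.getPath(), target)) {
      throw new IOException("Failed to archive " + expired.getPath());
    }
  }
}
{code}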



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27243) Fix TestQuotaThrottle after HBASE-27046

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-27243:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> Fix TestQuotaThrottle after HBASE-27046
> ---
>
> Key: HBASE-27243
> URL: https://issues.apache.org/jira/browse/HBASE-27243
> Project: HBase
>  Issue Type: Test
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.4.18, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> TestQuotaThrottle breaks monotonic WAL numbering after HBASE-20746 because of 
> how it manipulates the EnvironmentEdge and was disabled by HBASE-27087. Fix 
> the test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27698) Migrate meta locations from zookeeper to master data may not always possible if we migrate from 1.x HBase

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-27698:

Fix Version/s: (was: 2.5.8)

> Migrate meta locations from zookeeper to master data may not always possible 
> if we migrate from 1.x HBase
> -
>
> Key: HBASE-27698
> URL: https://issues.apache.org/jira/browse/HBASE-27698
> Project: HBase
>  Issue Type: Bug
>  Components: migration
>Affects Versions: 2.5.0
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
> Fix For: 2.7.0
>
>
> In HBase 1.x versions the meta server location is removed from zookeeper 
> when the server is stopped. In such cases migrating to the 2.5.x branches may 
> not create any meta entries in the master data. So if we cannot find the meta 
> location in zookeeper, we can derive it from the wal directories containing 
> files with the .meta extension and add it to the master data.
> {noformat}
>   private void tryMigrateMetaLocationsFromZooKeeper() throws IOException, 
> KeeperException {
> // try migrate data from zookeeper
> try (ResultScanner scanner =
>   masterRegion.getScanner(new 
> Scan().addFamily(HConstants.CATALOG_FAMILY))) {
>   if (scanner.next() != null) {
> // notice that all replicas for a region are in the same row, so the 
> migration can be
> // done with in a one row put, which means if we have data in catalog 
> family then we can
> // make sure that the migration is done.
> LOG.info("The {} family in master local region already has data in 
> it, skip migrating...",
>   HConstants.CATALOG_FAMILY_STR);
> return;
>   }
> }
> // start migrating
> byte[] row = 
> CatalogFamilyFormat.getMetaKeyForRegion(RegionInfoBuilder.FIRST_META_REGIONINFO);
> Put put = new Put(row);
> List<String> metaReplicaNodes = zooKeeper.getMetaReplicaNodes();
> StringBuilder info = new StringBuilder("Migrating meta locations:");
> for (String metaReplicaNode : metaReplicaNodes) {
>   int replicaId = 
> zooKeeper.getZNodePaths().getMetaReplicaIdFromZNode(metaReplicaNode);
>   RegionState state = MetaTableLocator.getMetaRegionState(zooKeeper, 
> replicaId);
>   info.append(" ").append(state);
>   put.setTimestamp(state.getStamp());
>   MetaTableAccessor.addRegionInfo(put, state.getRegion());
>   if (state.getServerName() != null) {
> MetaTableAccessor.addLocation(put, state.getServerName(), 
> HConstants.NO_SEQNUM, replicaId);
>   }
>   
> put.add(CellBuilderFactory.create(CellBuilderType.SHALLOW_COPY).setRow(put.getRow())
> .setFamily(HConstants.CATALOG_FAMILY)
> 
> .setQualifier(RegionStateStore.getStateColumn(replicaId)).setTimestamp(put.getTimestamp())
> 
> .setType(Cell.Type.Put).setValue(Bytes.toBytes(state.getState().name())).build());
> }
> if (!put.isEmpty()) {
>   LOG.info(info.toString());
>   masterRegion.update(r -> r.put(put));
> } else {
>   LOG.info("No meta location available on zookeeper, skip migrating...");
> }
>   }
> {noformat}
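
A minimal sketch of the proposed fallback, assuming the usual layout of one
WAL directory per server name under the WAL root; this is an illustration,
not the migration patch itself:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: when zookeeper has no meta location, find the WAL directory whose
// files carry the ".meta" suffix and treat its name as the last meta server.
public class MetaWalScanSketch {
  static String findLastMetaServer(FileSystem fs, Path walRootDir) throws IOException {
    for (FileStatus serverDir : fs.listStatus(walRootDir)) {
      if (!serverDir.isDirectory()) {
        continue;
      }
      for (FileStatus wal : fs.listStatus(serverDir.getPath())) {
        if (wal.getPath().getName().endsWith(".meta")) {
          // the directory name encodes host,port,startcode
          return serverDir.getPath().getName();
        }
      }
    }
    return null; // no meta WAL found; nothing to migrate
  }
}
{code}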



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27635) Shutdown zookeeper logs coming via ReadOnlyZKClient when hbase shell started

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-27635:

Fix Version/s: 2.5.9
   (was: 2.5.8)

> Shutdown zookeeper logs coming via ReadOnlyZKClient when hbase shell started
> 
>
> Key: HBASE-27635
> URL: https://issues.apache.org/jira/browse/HBASE-27635
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
> Fix For: 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> When the hbase shell is started with HBase 2.5.2 there is too much logging of zk 
> connection related details, classpaths, etc., even though we enabled the ERROR 
> log level for the zookeeper package.
> {noformat}
> 2023-02-10 17:34:25,211 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client 
> environment:zookeeper.version=3.5.9-5-a433770fc7b303332f10174221799495a26bbca2,
>  built on 02/07/2023 13:02 GMT
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client environment:host.name=host1
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client environment:java.version=1.8.0_352
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client environment:java.vendor=Red Hat, Inc.
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client 
> environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre
> {noformat}
> It is better to change the org.apache.hadoop.hbase.zookeeper package log level to 
> error as well.
> {noformat}
> # Set logging level to avoid verboseness
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.zookeeper',
>  log_level)
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.hadoop',
>  log_level)
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.hadoop.hbase.zookeeper',
>  log_level)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27635) Shutdown zookeeper logs coming via ReadOnlyZKClient when hbase shell started

2024-02-26 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820910#comment-17820910
 ] 

Andrew Kyle Purtell commented on HBASE-27635:
-

Current code on branch-2.5 and master will set the zookeeper logging level to 
ERROR unless the debug (-d) parameter is supplied to the shell. Is this still 
an issue?

> Shutdown zookeeper logs coming via ReadOnlyZKClient when hbase shell started
> 
>
> Key: HBASE-27635
> URL: https://issues.apache.org/jira/browse/HBASE-27635
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
> Fix For: 2.5.8, 3.0.0-beta-2, 2.6.1
>
>
> When the hbase shell is started with HBase 2.5.2 there is too much logging of zk 
> connection related details, classpaths, etc., even though we enabled the ERROR 
> log level for the zookeeper package.
> {noformat}
> 2023-02-10 17:34:25,211 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client 
> environment:zookeeper.version=3.5.9-5-a433770fc7b303332f10174221799495a26bbca2,
>  built on 02/07/2023 13:02 GMT
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client environment:host.name=host1
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client environment:java.version=1.8.0_352
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client environment:java.vendor=Red Hat, Inc.
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client 
> environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre
> {noformat}
> It is better to change the org.apache.hadoop.hbase.zookeeper package log level to 
> error as well.
> {noformat}
> # Set logging level to avoid verboseness
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.zookeeper',
>  log_level)
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.hadoop',
>  log_level)
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.hadoop.hbase.zookeeper',
>  log_level)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28390) WAL value compression fails for cells with large values

2024-02-22 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819846#comment-17819846
 ] 

Andrew Kyle Purtell commented on HBASE-28390:
-

(y)

> WAL value compression fails for cells with large values
> ---
>
> Key: HBASE-28390
> URL: https://issues.apache.org/jira/browse/HBASE-28390
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>
> We are testing out WAL compression and noticed that it fails for large values 
> when both features (wal compression and wal value compression) are enabled. 
> It works fine with either feature independently, but not when combined. It 
> seems to fail for all of the value compressor types, and the failure is in 
> the LRUDictionary of wal key compression:
>  
> {code:java}
> java.io.IOException: Error  while reading 2 WAL KVs; started reading at 230 
> and read up to 396
>     at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94)
>  ~[classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94)
>  ~[test-classes/:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  ~[junit-4.13.2.jar:4.13.2]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.lang.IndexOutOfBoundsException: index (21) must be less than 
> size (1)
>     at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1371)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>     at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1353)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>     at 
> org.apache.hadoop.hbase.io.util.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:153)
>  ~[classes/:?]
>     at 
> org.apache.hadoop.hbase.io.util.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:79)
>  ~[classes/:?]
>     at 
> 

[jira] [Commented] (HBASE-28390) WAL value compression fails for cells with large values

2024-02-21 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819407#comment-17819407
 ] 

Andrew Kyle Purtell commented on HBASE-28390:
-

Again with ITLCC some HFile blocks were definitely in excess of 1MB, with 
single values, compressed, being of that size. 
Makes sense you aren't seeing an issue with HFile reads. I didn't either.

bq. maybe we can just make WALCellCodec more resilient to hanging 0 ints (seems 
ugly of course)
Maybe
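
For illustration, a sketch of what that resilience could look like: consume a
dangling zero int after a fully-read value, pushing anything else back. This
assumes the reader wraps its input as new PushbackInputStream(raw, 4); it is a
sketch of the discussed workaround, not the actual WALCellCodec change.

{code:java}
import java.io.IOException;
import java.io.PushbackInputStream;

// Sketch: tolerate a trailing zero-length chunk header (four 0x00 bytes)
// after a decompressed value, assuming Hadoop's block-compressed framing.
public final class DanglingZeroSketch {
  static void skipDanglingZeroInt(PushbackInputStream in) throws IOException {
    byte[] peek = new byte[4];
    int n = 0;
    while (n < 4) {
      int r = in.read(peek, n, 4 - n);
      if (r < 0) {
        break; // end of stream before 4 bytes; push back whatever was read
      }
      n += r;
    }
    boolean allZero = n == 4 && peek[0] == 0 && peek[1] == 0 && peek[2] == 0 && peek[3] == 0;
    if (!allZero && n > 0) {
      in.unread(peek, 0, n); // not the dangling 0; leave bytes for the next reader
    }
  }
}
{code}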

> WAL value compression fails for cells with large values
> ---
>
> Key: HBASE-28390
> URL: https://issues.apache.org/jira/browse/HBASE-28390
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We are testing out WAL compression and noticed that it fails for large values 
> when both features (wal compression and wal value compression) are enabled. 
> It works fine with either feature independently, but not when combined. It 
> seems to fail for all of the value compressor types, and the failure is in 
> the LRUDictionary of wal key compression:
>  
> {code:java}
> java.io.IOException: Error  while reading 2 WAL KVs; started reading at 230 
> and read up to 396
>     at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94)
>  ~[classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94)
>  ~[test-classes/:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  ~[junit-4.13.2.jar:4.13.2]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.lang.IndexOutOfBoundsException: index (21) must be less than 
> size (1)
>     at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1371)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>     at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1353)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>     at 
> 

[jira] [Comment Edited] (HBASE-28390) WAL value compression fails for cells with large values

2024-02-21 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819403#comment-17819403
 ] 

Andrew Kyle Purtell edited comment on HBASE-28390 at 2/21/24 10:40 PM:
---

IntegrationTestLoadCommonCrawl loads and verifies the data. It does not read 
the WAL per se but I had replication enabled and was verifying on the far side 
at least some of the time.


was (Author: apurtell):
IntegrationTestLoadCommonCrawl loads and verifies the data. It does not read 
the WAL per se but I had replication enabled and was verifying on the far side.

> WAL value compression fails for cells with large values
> ---
>
> Key: HBASE-28390
> URL: https://issues.apache.org/jira/browse/HBASE-28390
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We are testing out WAL compression and noticed that it fails for large values 
> when both features (wal compression and wal value compression) are enabled. 
> It works fine with either feature independently, but not when combined. It 
> seems to fail for all of the value compressor types, and the failure is in 
> the LRUDictionary of wal key compression:
>  
> {code:java}
> java.io.IOException: Error  while reading 2 WAL KVs; started reading at 230 
> and read up to 396
>     at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94)
>  ~[classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94)
>  ~[test-classes/:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  ~[junit-4.13.2.jar:4.13.2]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.lang.IndexOutOfBoundsException: index (21) must be less than 
> size (1)
>     at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1371)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>     at 
> 

[jira] [Commented] (HBASE-28390) WAL value compression fails for cells with large values

2024-02-21 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819403#comment-17819403
 ] 

Andrew Kyle Purtell commented on HBASE-28390:
-

IntegrationTestLoadCommonCrawl loads and verifies the data. It does not read 
the WAL per se but I had replication enabled and was verifying on the far side.

> WAL value compression fails for cells with large values
> ---
>
> Key: HBASE-28390
> URL: https://issues.apache.org/jira/browse/HBASE-28390
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We are testing out WAL compression and noticed that it fails for large values 
> when both features (wal compression and wal value compression) are enabled. 
> It works fine with either feature independently, but not when combined. It 
> seems to fail for all of the value compressor types, and the failure is in 
> the LRUDictionary of wal key compression:
>  
> {code:java}
> java.io.IOException: Error  while reading 2 WAL KVs; started reading at 230 
> and read up to 396
>     at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94)
>  ~[classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94)
>  ~[test-classes/:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  ~[junit-4.13.2.jar:4.13.2]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>     at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.lang.IndexOutOfBoundsException: index (21) must be less than 
> size (1)
>     at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1371)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>     at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1353)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>     at 
> org.apache.hadoop.hbase.io.util.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:153)
>  ~[classes/:?]
>     at 
> 

[jira] [Comment Edited] (HBASE-28390) WAL value compression fails for cells with large values

2024-02-21 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819399#comment-17819399
 ] 

Andrew Kyle Purtell edited comment on HBASE-28390 at 2/21/24 10:15 PM:
---

I find the failure interesting because I tested WAL compression using 
IntegrationTestLoadCommonCrawl which would have value payloads well in excess 
of ~200kb, up to > 1MB. Let me go back and look at my test plan because perhaps 
somehow I failed to always read back the written WALs... 

bq. But I guess I'm not sure how it's working to begin with, since 
BlockDecompressorStream does not know to read an extra 0 int. So maybe not 
breaking to fix after all? 

I don't understand this either.

Maybe {{len > MAX_INPUT_SIZE}} is rare? 

bq. Or maybe we can work around this in hbase.

We can certainly do that. We could even import BlockCompressorStream and 
BlockDecompressorStream and bug fix in our tree. Unfortunately I worry about 
cases where people have used Hadoop compression codecs for compressing WALs and 
HFiles, and then upgrade to a version where we have our fixed compressor stream 
implementations, and they are not compatible unless we add special handling. 
Maintaining compatibility would hinge on knowing under what circumstances we 
should read that extra 0. I have not thought about this in detail so do not 
know how challenging that might be.


was (Author: apurtell):
I find the failure interesting because I tested WAL compression using 
IntegrationTestLoadCommonCrawl which would have value payloads well in excess 
of ~200kb, up to > 1MB. Let me go back and look at my test plan because perhaps 
somehow I failed to always read back the written WALs... 

bq. But I guess I'm not sure how it's working to begin with, since 
BlockDecompressorStream does not know to read an extra 0 int. So maybe not 
breaking to fix after all? 

I don't understand this either.

bq. Or maybe we can work around this in hbase.

We can certainly do that. We could even import BlockCompressorStream and 
BlockDecompressorStream and bug fix in our tree. Unfortunately I worry about 
cases where people have used Hadoop compression codecs for compressing WALs and 
HFiles, and then upgrade to a version where we have our fixed compressor stream 
implementations, and they are not compatible unless we add special handling. 
Maintaining compatibility would hinge on knowing under what circumstances we 
should read that extra 0. I have not thought about this in detail so do not 
know how challenging that might be.

> WAL value compression fails for cells with large values
> ---
>
> Key: HBASE-28390
> URL: https://issues.apache.org/jira/browse/HBASE-28390
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We are testing out WAL compression and noticed that it fails for large values 
> when both features (wal compression and wal value compression) are enabled. 
> It works fine with either feature independently, but not when combined. It 
> seems to fail for all of the value compressor types, and the failure is in 
> the LRUDictionary of wal key compression:
>  
> {code:java}
> java.io.IOException: Error  while reading 2 WAL KVs; started reading at 230 
> and read up to 396
>     at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94)
>  ~[classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94)
>  ~[test-classes/:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> 

[jira] [Comment Edited] (HBASE-28390) WAL value compression fails for cells with large values

2024-02-21 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819399#comment-17819399
 ] 

Andrew Kyle Purtell edited comment on HBASE-28390 at 2/21/24 10:13 PM:
---

I find the failure interesting because I tested WAL compression using 
IntegrationTestLoadCommonCrawl which would have value payloads well in excess 
of ~200kb, up to > 1MB. Let me go back and look at my test plan because perhaps 
somehow I failed to always read back the written WALs... 

bq. But I guess I'm not sure how it's working to begin with, since 
BlockDecompressorStream does not know to read an extra 0 int. So maybe not 
breaking to fix after all? 

I don't understand this either.

bq. Or maybe we can work around this in hbase.

We can certainly do that. We could even import BlockCompressorStream and 
BlockDecompressorStream and bug fix in our tree. Unfortunately I worry about 
cases where people have used Hadoop compression codecs for compressing WALs and 
HFiles, and then upgrade to a version where we have our fixed compressor stream 
implementations, and they are not compatible unless we add special handling. 
Maintaining compatibility would hinge on knowing under what circumstances we 
should read that extra 0. I have not thought about this in detail so do not 
know how challenging that might be.


was (Author: apurtell):
I find the failure interesting because I tested WAL compression using 
IntegrationTestLoadCommonCrawl which would have value payloads well in excess 
of ~200kb, up to > 1MB. Let me go back and look at my test plan because perhaps 
somehow I failed to always read back the written WALs... 

bq. But I guess I'm not sure how it's working to begin with, since 
BlockDecompressorStream does not know to read an extra 0 int. So maybe not 
breaking to fix after all? 

I don't understand this either.

bq. Or maybe we can work around this in hbase.

We can certainly do that. We could even import BlockCompressorStream and 
BlockDecompressorStream and bug fix in our tree. Unfortunately I worry about 
cases where people have used Hadoop compression codecs for compressing WALs and 
HFiles, and then upgrade to a version where we have our fixed compressor stream 
implementations, and they are not compatibility. Maintaining compatibility 
would hinge on knowing under what circumstances we should read that extra 0. I 
have not thought about this in detail so do not know how challenging that might 
be.

> WAL value compression fails for cells with large values
> ---
>
> Key: HBASE-28390
> URL: https://issues.apache.org/jira/browse/HBASE-28390
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We are testing out WAL compression and noticed that it fails for large values 
> when both features (wal compression and wal value compression) are enabled. 
> It works fine with either feature independently, but not when combined. It 
> seems to fail for all of the value compressor types, and the failure is in 
> the LRUDictionary of wal key compression:
>  
> {code:java}
> java.io.IOException: Error  while reading 2 WAL KVs; started reading at 230 
> and read up to 396
>     at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94)
>  ~[classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94)
>  ~[test-classes/:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  

[jira] [Commented] (HBASE-28390) WAL value compression fails for cells with large values

2024-02-21 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819399#comment-17819399
 ] 

Andrew Kyle Purtell commented on HBASE-28390:
-

I find the failure interesting because I tested WAL compression using 
IntegrationTestLoadCommonCrawl which would have value payloads well in excess 
of ~200kb, up to > 1MB. 

bq. But I guess I'm not sure how it's working to begin with, since 
BlockDecompressorStream does not know to read an extra 0 int. So maybe not 
breaking to fix after all? 

I don't understand this either.

bq. Or maybe we can work around this in hbase.

We can certainly do that. We could even import BlockCompressorStream and 
BlockDecompressorStream and bug fix them in our tree. Unfortunately I worry 
about cases where people have used Hadoop compression codecs for compressing 
WALs and HFiles, and then upgrade to a version where we have our fixed 
compressor stream implementations, and the two are not compatible. Maintaining 
compatibility would hinge on knowing under what circumstances we should read 
that extra 0. I have not thought about this in detail, so I do not know how 
challenging that might be.

> WAL value compression fails for cells with large values
> ---
>
> Key: HBASE-28390
> URL: https://issues.apache.org/jira/browse/HBASE-28390
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We are testing out WAL compression and noticed that it fails for large values 
> when both features (wal compression and wal value compression) are enabled. 
> It works fine with either feature independently, but not when combined. It 
> seems to fail for all of the value compressor types, and the failure is in 
> the LRUDictionary of wal key compression:
>  
> {code:java}
> java.io.IOException: Error  while reading 2 WAL KVs; started reading at 230 
> and read up to 396
>     at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94)
>  ~[classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94)
>  ~[test-classes/:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
> ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> 
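
To make the compatibility question concrete, here is a hedged sketch of the 
kind of read-side workaround being discussed: consuming the extra trailing 
4-byte zero only when it is present. This assumes the incompatibility amounts 
to that single extra int; deciding when it is safe to apply this is exactly 
the open question.

{code:java}
import java.io.IOException;
import java.io.PushbackInputStream;

public final class TrailingZeroSkipper {
  private TrailingZeroSkipper() {
  }

  /**
   * Hedged sketch: consume the extra 4-byte zero that some block compressor
   * streams append, but only when it is actually present. The stream must be
   * constructed with a pushback buffer of at least 4 bytes. A single read()
   * is a simplification; a robust version would loop until 4 bytes or EOF.
   */
  public static void maybeSkipTrailingZero(PushbackInputStream in) throws IOException {
    byte[] peek = new byte[4];
    int n = in.read(peek);
    if (n == 4 && peek[0] == 0 && peek[1] == 0 && peek[2] == 0 && peek[3] == 0) {
      return; // consumed the trailing zero-length marker
    }
    if (n > 0) {
      in.unread(peek, 0, n); // not the marker, push the bytes back
    }
  }
}
{code}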

[jira] [Comment Edited] (HBASE-28390) WAL value compression fails for cells with large values

2024-02-21 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819399#comment-17819399
 ] 

Andrew Kyle Purtell edited comment on HBASE-28390 at 2/21/24 10:12 PM:
---

I find the failure interesting because I tested WAL compression using 
IntegrationTestLoadCommonCrawl which would have value payloads well in excess 
of ~200kb, up to > 1MB. Let me go back and look at my test plan because perhaps 
somehow I failed to always read back the written WALs... 

bq. But I guess I'm not sure how it's working to begin with, since 
BlockDecompressorStream does not know to read an extra 0 int. So maybe not 
breaking to fix after all? 

I don't understand this either.

bq. Or maybe we can work around this in hbase.

We can certainly do that. We could even import BlockCompressorStream and 
BlockDecompressorStream and bug fix them in our tree. Unfortunately I worry 
about cases where people have used Hadoop compression codecs for compressing 
WALs and HFiles, and then upgrade to a version where we have our fixed 
compressor stream implementations, and the two are not compatible. Maintaining 
compatibility would hinge on knowing under what circumstances we should read 
that extra 0. I have not thought about this in detail, so I do not know how 
challenging that might be.


was (Author: apurtell):
I find the failure interesting because I tested WAL compression using 
IntegrationTestLoadCommonCrawl which would have value payloads well in excess 
of ~200kb, up to > 1MB. 

bq. But I guess I'm not sure how it's working to begin with, since 
BlockDecompressorStream does not know to read an extra 0 int. So maybe not 
breaking to fix after all? 

I don't understand this either.

bq. Or maybe we can work around this in hbase.

We can certainly do that. We could even import BlockCompressorStream and 
BlockDecompressorStream and bug fix them in our tree. Unfortunately I worry 
about cases where people have used Hadoop compression codecs for compressing 
WALs and HFiles, and then upgrade to a version where we have our fixed 
compressor stream implementations, and the two are not compatible. Maintaining 
compatibility would hinge on knowing under what circumstances we should read 
that extra 0. I have not thought about this in detail, so I do not know how 
challenging that might be.

> WAL value compression fails for cells with large values
> ---
>
> Key: HBASE-28390
> URL: https://issues.apache.org/jira/browse/HBASE-28390
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We are testing out WAL compression and noticed that it fails for large values 
> when both features (wal compression and wal value compression) are enabled. 
> It works fine with either feature independently, but not when combined. It 
> seems to fail for all of the value compressor types, and the failure is in 
> the LRUDictionary of wal key compression:
>  
> {code:java}
> java.io.IOException: Error  while reading 2 WAL KVs; started reading at 230 
> and read up to 396
>     at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94)
>  ~[classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129)
>  ~[test-classes/:?]
>     at 
> org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94)
>  ~[test-classes/:?]
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:?]
>     at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  ~[?:?]
>     at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  ~[junit-4.13.2.jar:4.13.2]
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
> ~[junit-4.13.2.jar:4.13.2]
>     at 
> 

[jira] [Assigned] (HBASE-27706) Additional Zstandard codec compatible with the Hadoop native one

2024-02-05 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell reassigned HBASE-27706:
---

Assignee: (was: Andrew Kyle Purtell)

> Additional Zstandard codec compatible with the Hadoop native one
> 
>
> Key: HBASE-27706
> URL: https://issues.apache.org/jira/browse/HBASE-27706
> Project: HBase
>  Issue Type: Bug
>  Components: compatibility
>Affects Versions: 2.5.3
>Reporter: Frens Jan Rumph
>Priority: Major
>
>  
> We're in the process of upgrading a HBase installation from 2.2.4 to 2.5.3. 
> We're currently using Zstd compression from our Hadoop installation. Due to 
> some other class path issues (Netty issues in relation to the async WAL 
> provider), we would like to remove Hadoop from the class path.
> However, using the Zstd compression from HBase (which uses 
> [https://github.com/luben/zstd-jni]) we seem to hit some incompatibility. 
> When restarting a node to use this implementation we had errors like the 
> following:
> {code:java}
> 2023-03-10 16:33:01,925 WARN  [RS_OPEN_REGION-regionserver/n2:16020-0] 
> handler.AssignRegionHandler: Failed to open region 
> NAMESPACE:TABLE,,1673888962751.cdb726dad4eaabf765969f195e91c737., will report 
> to master
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading data 
> index and meta index from file 
> hdfs://CLUSTER/hbase/data/NAMESPACE/TABLE/cdb726dad4eaabf765969f195e91c737/e/aea6eddaa8ee476197d064a4b4c345b9
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1148)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1091)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:994)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:941)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7228)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7183)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7159)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7118)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7074)
> at 
> org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:147)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:100)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading data 
> index and meta index from file 
> hdfs://CLUSTER/hbase/data/NAMESPACE/TABLE/cdb726dad4eaabf765969f195e91c737/e/aea6eddaa8ee476197d064a4b4c345b9
> at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:288)
> at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:338)
> at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:297)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6359)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1114)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> ... 3 more
> Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem 
> reading data index and meta index from file 
> hdfs://CLUSTER/hbase/data/NAMESPACE/TABLE/cdb726dad4eaabf765969f195e91c737/e/aea6eddaa8ee476197d064a4b4c345b9
> at 
> org.apache.hadoop.hbase.io.hfile.HFileInfo.initMetaAndIndex(HFileInfo.java:392)
> at 
> org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:394)
> at 
> org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:518)
> at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.createStoreFileAndReader(StoreEngine.java:225)
> at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.lambda$openStoreFiles$0(StoreEngine.java:266)
> ... 6 more
> Caused by: java.io.IOException: Premature EOF from inputStream, but 
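
One way to avoid mixing implementations is to pin a single zstd codec class 
cluster-wide before any data is written. A hedged sketch follows; the property 
name "hbase.io.compress.zstd.codec" is assumed to be how HBase 2.5 selects the 
implementation class and should be verified against the running version.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ZstdCodecPinningSketch {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // Pin one zstd implementation cluster-wide before any data is written,
    // and never mix implementations on live data. The bundled zstd-jni codec:
    conf.set("hbase.io.compress.zstd.codec",
      "org.apache.hadoop.hbase.io.compress.zstd.ZstdCodec");
    // Alternatively, the Hadoop native codec (requires Hadoop on the classpath):
    // conf.set("hbase.io.compress.zstd.codec",
    //   "org.apache.hadoop.io.compress.ZStandardCodec");
    return conf;
  }
}
{code}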

[jira] [Commented] (HBASE-28343) Write codec class into hfile header/trailer

2024-02-05 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814521#comment-17814521
 ] 

Andrew Kyle Purtell commented on HBASE-28343:
-

We write the compression algorithm ordinal into the trailer, and that used to 
be sufficient. Then I added these new codecs, where some of the implementation 
options have limitations and one flavor might not be compatible with another, 
especially for Zstandard. It was assumed that an operator never changes codec 
configuration once data is live in the cluster, because a codec option is 
always compatible with itself.

bq. I think this problem could be solved by writing the classname of the codec 
used into the hfile. This could be used as a hint so that a regionserver can 
read hfiles compressed with any compression codec that it supports.

+1, makes sense to me.
It adds some safety and improves handling when codec implementations of a 
given algorithm may have been mixed, although that should not be recommended 
practice. 

There is also HBASE-27706. The idea there is to implement a Hadoop codec 
compatible HBase side codec using zstd-jni, which I think is possible. 
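
As a rough illustration of the proposal, the codec class name could be 
recorded through the existing HFile file-info mechanism. In this sketch the 
key name is hypothetical and this is not a committed design:

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.compress.CompressionCodec;

public final class CodecClassHintSketch {
  // Hypothetical file-info key; not an existing HBase constant.
  static final byte[] COMPRESSION_CODEC_CLASS = Bytes.toBytes("COMPRESSION_CODEC_CLASS");

  /** Record which codec implementation wrote the file, as a hint for readers. */
  static void recordCodecClass(HFile.Writer writer, CompressionCodec codec)
    throws IOException {
    writer.appendFileInfo(COMPRESSION_CODEC_CLASS,
      Bytes.toBytes(codec.getClass().getName()));
  }
}
{code}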

> Write codec class into hfile header/trailer
> ---
>
> Key: HBASE-28343
> URL: https://issues.apache.org/jira/browse/HBASE-28343
> Project: HBase
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Priority: Major
>
> We recently started playing around with the new bundled compression libraries 
> as of 2.5.0. Specifically, we are experimenting with the different zstd 
> codecs. The book says that aircompressor's zstd is not data compatible with 
> Hadoop's, but doesn't say the same about zstd-jni.
> In our experiments we ended up in a state where some hfiles were encoded with 
> zstd-jni (zstd.ZstdCodec) while others were encoded with Hadoop 
> (ZStandardCodec). At this point the cluster became extremely unstable, with 
> some files unable to be read because they were encoded with a codec that 
> didn't match the current runtime configuration. Changing the runtime 
> configuration caused the other files to not be readable.
> I think this problem could be solved by writing the classname of the codec 
> used into the hfile. This could be used as a hint so that a regionserver can 
> read hfiles compressed with any compression codec that it supports.
> [~apurtell] do you have any thoughts here since you brought us all of these 
> great compression options?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28282) Update downloads.xml for release 2.5.7

2023-12-24 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28282.
-
Resolution: Fixed

> Update downloads.xml for release 2.5.7
> --
>
> Key: HBASE-28282
> URL: https://issues.apache.org/jira/browse/HBASE-28282
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28282) Update downloads.xml for release 2.5.7

2023-12-24 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28282:
---

 Summary: Update downloads.xml for release 2.5.7
 Key: HBASE-28282
 URL: https://issues.apache.org/jira/browse/HBASE-28282
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28267) create-release should run spotless

2023-12-16 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28267:
---

 Summary: create-release should run spotless
 Key: HBASE-28267
 URL: https://issues.apache.org/jira/browse/HBASE-28267
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell


Before committing generated files like CHANGES.md and RELEASENOTES.md, we 
should run 'mvn spotless:apply' first to ensure what is committed is formatted 
per our rules and will not be modified when someone invokes spotless later.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28262) Fix spotless error on branch-2.5

2023-12-16 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28262:

Fix Version/s: 2.5.8
 Assignee: Duo Zhang
   Status: Patch Available  (was: Open)

> Fix spotless error on branch-2.5
> 
>
> Key: HBASE-28262
> URL: https://issues.apache.org/jira/browse/HBASE-28262
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.5.8
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-28262) Fix spotless error on branch-2.5

2023-12-16 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17797468#comment-17797468
 ] 

Andrew Kyle Purtell edited comment on HBASE-28262 at 12/16/23 6:33 PM:
---

Yes.
Filed HBASE-28267


was (Author: apurtell):
Yes.

> Fix spotless error on branch-2.5
> 
>
> Key: HBASE-28262
> URL: https://issues.apache.org/jira/browse/HBASE-28262
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28262) Fix spotless error on branch-2.5

2023-12-16 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17797468#comment-17797468
 ] 

Andrew Kyle Purtell commented on HBASE-28262:
-

Yes.

> Fix spotless error on branch-2.5
> 
>
> Key: HBASE-28262
> URL: https://issues.apache.org/jira/browse/HBASE-28262
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27380) RitDuration histogram metric is broken

2023-12-14 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-27380:

Fix Version/s: 2.7.0
   2.5.8
   (was: 2.5.7)

> RitDuration histogram metric is broken
> --
>
> Key: HBASE-27380
> URL: https://issues.apache.org/jira/browse/HBASE-27380
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Priority: Minor
> Fix For: 2.6.0, 3.0.0-beta-1, 2.7.0, 2.5.8
>
>
> Looks like the method which updates it in MetricsAssignmentManager, 
> updateRitDuration, got broken somewhere along the way. It's no longer used. 
> We should wire this back up.
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManager.java#L82



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-26844) Fix flaky TestBasicWALEntryStreamFSHLog.testSizeOfLogQueue

2023-12-14 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-26844:

Fix Version/s: 2.7.0

> Fix flaky TestBasicWALEntryStreamFSHLog.testSizeOfLogQueue
> --
>
> Key: HBASE-26844
> URL: https://issues.apache.org/jira/browse/HBASE-26844
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.0, 2.4.12
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Minor
> Fix For: 2.6.0, 3.0.0-beta-1, 2.7.0, 2.5.8
>
>
> The failed info is described in HBASE-26843.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28221) Introduce regionserver metric for delayed flushes

2023-12-14 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28221:

Fix Version/s: 2.7.0

> Introduce regionserver metric for delayed flushes
> -
>
> Key: HBASE-28221
> URL: https://issues.apache.org/jira/browse/HBASE-28221
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.4.17, 2.5.6
>Reporter: Viraj Jasani
>Assignee: Rahul Kumar
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 2.7.0, 2.5.8
>
>
> If compaction is disabled temporarily to allow stabilizing HDFS load, we can 
> forget to re-enable compaction. This can result in flushes getting delayed 
> for the "hbase.hstore.blockingWaitTime" period (90s by default). While 
> flushes do happen eventually after waiting for the max blocking time, it is 
> important to realize that no cluster can function well with compaction 
> disabled for a significant amount of time.
>  
> We would also block any write requests until region is flushed (90+ sec, by 
> default):
> {code:java}
> 2023-11-27 20:40:52,124 WARN  [,queue=18,port=60020] regionserver.HRegion - 
> Region is too busy due to exceeding memstore size limit.
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=table1,1699923733811.4fd5e52e2133df1e347f32c646f23ab4., 
> server=server-1,60020,1699421714454, memstoreSize=1073820928, 
> blockingMemStoreSize=1073741824
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4200)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3264)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3215)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:967)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:895)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2524)
>     at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36812)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2432)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291) 
> {code}
>  
> Delayed flush logs:
> {code:java}
> LOG.warn("{} has too many store files({}); delaying flush up to {} ms",
>   region.getRegionInfo().getEncodedName(), getStoreFileCount(region),
>   this.blockingWaitTime); {code}
> Suggestion: Introduce regionserver metric (MetricsRegionServerSource) for the 
> num of flushes getting delayed due to too many store files.
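
To make the suggestion concrete, a minimal sketch of such a counter follows. 
The class and method names are assumptions, not committed 
MetricsRegionServerSource additions; the real change would live in that 
interface and its wrapper:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hedged sketch of the suggested counter; names are assumptions.
 */
class DelayedFlushMetricSketch {
  private final AtomicLong delayedFlushCount = new AtomicLong();

  /** Called from the code path that logs "delaying flush up to {} ms". */
  void onFlushDelayed() {
    delayedFlushCount.incrementAndGet();
  }

  /** In a real metrics source this would be exposed via metrics2/JMX. */
  long getDelayedFlushCount() {
    return delayedFlushCount.get();
  }
}
{code}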



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-26844) Fix flaky TestBasicWALEntryStreamFSHLog.testSizeOfLogQueue

2023-12-14 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-26844:

Fix Version/s: 2.5.8
   (was: 2.5.7)

> Fix flaky TestBasicWALEntryStreamFSHLog.testSizeOfLogQueue
> --
>
> Key: HBASE-26844
> URL: https://issues.apache.org/jira/browse/HBASE-26844
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.0, 2.4.12
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Minor
> Fix For: 2.6.0, 3.0.0-beta-1, 2.5.8
>
>
> The failed info is described in HBASE-26843.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27022) SFT seems to apparently tracking invalid/malformed store files

2023-12-14 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-27022:

Fix Version/s: 2.7.0
   2.5.8
   (was: 2.5.7)

> SFT seems to apparently tracking invalid/malformed store files
> --
>
> Key: HBASE-27022
> URL: https://issues.apache.org/jira/browse/HBASE-27022
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Wellington Chevreuil
>Priority: Minor
> Fix For: 2.6.0, 3.0.0-beta-1, 2.7.0, 2.5.8
>
>
> Opening this on behalf of [~apurtell], who first reported this issue on 
> HBASE-26999: When running scale tests using ITLCC, the following errors were 
> observed:
> {noformat}
> [00]2022-05-05 15:59:52,280 WARN [region-location-0] 
> regionserver.StoreFileInfo:
> Skipping 
> hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/9eafc10e1b5a25532a4f0adf550828fc/c/9d07757144a7404fac02e161b5bd035e
> because it is empty. HBASE-646 DATA LOSS?
> ...
> [00]2022-05-05 15:59:52,320 WARN [region-location-2] 
> regionserver.StoreFileInfo: 
> Skipping 
> hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/5322c54b9a899eae03cb16e956a836d5/c/184b4f55ab1a4dbc813e77aeae1343ae
>  
> because it is empty. HBASE-646 DATA LOSS? {noformat}
>  
> From some discussions in HBASE-26999, it seems that SFT has wrongly tracked 
> an incomplete/unfinished store file. 
> For further context, follow the [comments thread on 
> HBASE-26999|https://issues.apache.org/jira/browse/HBASE-26999?focusedCommentId=17533508=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17533508].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28158) Decouple RIT list management from TRSP invocation

2023-12-14 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28158:

Status: Open  (was: Patch Available)

Cancelling patch. PR needs attention. Will get to it after the holidays.

> Decouple RIT list management from TRSP invocation
> -
>
> Key: HBASE-28158
> URL: https://issues.apache.org/jira/browse/HBASE-28158
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.6
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.8
>
>
> Operators bypassed some in-progress TRSPs, leading to a state where some 
> regions were persistently in transition but hidden. Because the master builds 
> its list of regions in transition by tracking TRSP, the bypass of TRSP 
> removed the regions from the RIT list. 
> Although I can see from reading the code that this is the expected behavior, 
> it is surprising for operators and should be changed. Operators expect 
> regions that should be open but are not to appear in the master's RIT list, 
> as provided by /rits.jsp, the output of the shell's 'rit' command, and 
> ClusterStatus.
> We should only remove a region from the RIT map when assignment reaches a 
> suitable terminal state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28221) Introduce regionserver metric for delayed flushes

2023-12-14 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-28221:

Fix Version/s: 2.5.8
   (was: 2.5.7)

> Introduce regionserver metric for delayed flushes
> -
>
> Key: HBASE-28221
> URL: https://issues.apache.org/jira/browse/HBASE-28221
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.4.17, 2.5.6
>Reporter: Viraj Jasani
>Assignee: Rahul Kumar
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 2.5.8
>
>
> If compaction is disabled temporarily to allow stabilizing HDFS load, we can 
> forget to re-enable compaction. This can result in flushes getting delayed 
> for the "hbase.hstore.blockingWaitTime" period (90s by default). While 
> flushes do happen eventually after waiting for the max blocking time, it is 
> important to realize that no cluster can function well with compaction 
> disabled for a significant amount of time.
>  
> We would also block any write requests until region is flushed (90+ sec, by 
> default):
> {code:java}
> 2023-11-27 20:40:52,124 WARN  [,queue=18,port=60020] regionserver.HRegion - 
> Region is too busy due to exceeding memstore size limit.
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=table1,1699923733811.4fd5e52e2133df1e347f32c646f23ab4., 
> server=server-1,60020,1699421714454, memstoreSize=1073820928, 
> blockingMemStoreSize=1073741824
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4200)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3264)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3215)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:967)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:895)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2524)
>     at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36812)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2432)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291) 
> {code}
>  
> Delayed flush logs:
> {code:java}
> LOG.warn("{} has too many store files({}); delaying flush up to {} ms",
>   region.getRegionInfo().getEncodedName(), getStoreFileCount(region),
>   this.blockingWaitTime); {code}
> Suggestion: Introduce regionserver metric (MetricsRegionServerSource) for the 
> num of flushes getting delayed due to too many store files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

