[jira] [Commented] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-04-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322510#comment-17322510
 ] 

Hudson commented on HBASE-25692:


Results for branch branch-2.2
[build #205 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/205/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/205//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/205//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/205//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/205//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Failure to instantiate WALCellCodec leaks socket in replication
> ---
>
> Key: HBASE-25692
> URL: https://issues.apache.org/jira/browse/HBASE-25692
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.1.0, 2.2.0, 2.1.1, 2.1.2, 2.1.3, 2.3.0, 2.3.1, 2.1.4, 
> 2.0.6, 2.1.5, 2.2.1, 2.1.6, 2.1.7, 2.2.2, 2.1.8, 2.2.3, 2.3.3, 2.1.9, 2.2.4, 
> 2.4.0, 2.2.5, 2.2.6, 2.3.2, 2.3.4, 2.4.1, 2.4.2
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.3, 2.3.6
>
>
> I was looking at an HBase user's clusters with [~danilocop]. They had two 
> otherwise identical clusters, and one of them regularly had sockets in 
> CLOSE_WAIT going from RegionServers to a distributed storage appliance.
> After a lot of analysis, we eventually figured out that these sockets in 
> CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to 
> close inside of the RegionServer. The subtlety was that only one of these 
> HBase clusters was set up to do replication (to the other cluster). The HBase 
> cluster experiencing this problem was shipping edits to a peer, and had 
> previously been using Phoenix. At some point, the cluster had Phoenix removed 
> from it.
> What we found was that replication still had WALs to ship which were for 
> Phoenix tables. Phoenix, in this version, still used the custom WALCellCodec; 
> however, this codec class was missing from the RS classpath after the owner 
> of the cluster removed Phoenix.
> When we try to instantiate the Codec implementation via ReflectionUtils, we 
> end up throwing an UnsupportedOperationException which wraps a 
> ClassNotFoundException. However, in WALFactory, we _only_ close the 
> FSDataInputStream when we catch an IOException, and an 
> UnsupportedOperationException is not one. 
> Thus, replication sits in a "fast" loop, trying to ship these edits, each 
> time leaking a new socket because of the InputStream not being closed. There 
> is an obvious workaround for this specific issue, but we should not leak this 
> inside HBase.
> An approximate 2.1.x stack trace which led us to this is below.
> {noformat}
> 2021-03-11 18:19:20,364 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader: Failed to read stream of replication entries
> java.io.IOException: Cannot get log reader
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:366)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:291)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:427)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:354)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:302)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:293)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:174)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:100)
>   at 
> 
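
For illustration, the leak described in the issue can be reproduced in miniature. The sketch below is not HBase code and is not necessarily the shape of the committed fix: TrackedStream is a hypothetical stand-in for FSDataInputStream, initReader is a hypothetical stand-in for the codec setup that fails, and the counter plays the role of the sockets left in CLOSE_WAIT. It only assumes what the description states, namely that the stream is closed solely in a catch block for IOException while the failure arrives as an unchecked exception.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical miniature of the pattern in the issue; not HBase code. */
final class WalReaderLeakSketch {

  /** Number of streams currently open, i.e. opened and not yet closed. */
  static final AtomicInteger OPEN_STREAMS = new AtomicInteger();

  /** Stand-in for FSDataInputStream that only tracks open/close. */
  static final class TrackedStream implements Closeable {
    private boolean closed;

    TrackedStream() {
      OPEN_STREAMS.incrementAndGet();
    }

    @Override
    public void close() {
      if (!closed) {
        closed = true;
        OPEN_STREAMS.decrementAndGet();
      }
    }
  }

  /** Stand-in for reader setup: the codec lookup fails with an unchecked exception. */
  static void initReader(TrackedStream in) throws IOException {
    throw new UnsupportedOperationException("Unable to find codec class");
  }

  /** Leaky shape: the stream is closed only when an IOException is caught. */
  static void openLeaky() throws IOException {
    TrackedStream in = new TrackedStream();
    try {
      initReader(in);
    } catch (IOException e) {
      in.close();
      throw e;
    }
  }

  /** One possible safer shape: close on unchecked failures too, then rethrow. */
  static void openSafely() throws IOException {
    TrackedStream in = new TrackedStream();
    try {
      initReader(in);
    } catch (IOException e) {
      in.close();
      throw e;
    } catch (RuntimeException e) {
      in.close();
      throw new IOException("Cannot get log reader", e);
    }
  }

  /** Simulates the retry loop and returns how many streams are still open. */
  static int leakedAfter(int attempts, boolean safe) {
    OPEN_STREAMS.set(0);
    for (int i = 0; i < attempts; i++) {
      try {
        if (safe) {
          openSafely();
        } else {
          openLeaky();
        }
      } catch (IOException | RuntimeException e) {
        // A replication reader would log the failure and retry here.
      }
    }
    return OPEN_STREAMS.get();
  }

  public static void main(String[] args) {
    System.out.println("leaky: " + leakedAfter(5, false));
    System.out.println("safe:  " + leakedAfter(5, true));
  }
}
```

In the leaky variant every retry leaves one more stream open, which matches the "fast" loop in the description; in the safer variant the count stays at zero because the unchecked failure also triggers close().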

[jira] [Commented] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-04-03 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314342#comment-17314342
 ] 

Hudson commented on HBASE-25692:


Results for branch branch-1
[build #108 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/108/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/108//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/108//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/108//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Failure to instantiate WALCellCodec leaks socket in replication
> ---
>
> Key: HBASE-25692
> URL: https://issues.apache.org/jira/browse/HBASE-25692
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.1.0, 2.2.0, 2.1.1, 2.1.2, 2.1.3, 2.3.0, 2.3.1, 2.1.4, 
> 2.0.6, 2.1.5, 2.2.1, 2.1.6, 2.1.7, 2.2.2, 2.1.8, 2.2.3, 2.3.3, 2.1.9, 2.2.4, 
> 2.4.0, 2.2.5, 2.2.6, 2.3.2, 2.3.4, 2.4.1, 2.4.2
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.3, 2.3.6
>
>
> I was looking at an HBase user's clusters with [~danilocop]. They had two 
> otherwise identical clusters, and one of them regularly had sockets in 
> CLOSE_WAIT going from RegionServers to a distributed storage appliance.
> After a lot of analysis, we eventually figured out that these sockets in 
> CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to 
> close inside of the RegionServer. The subtlety was that only one of these 
> HBase clusters was set up to do replication (to the other cluster). The HBase 
> cluster experiencing this problem was shipping edits to a peer, and had 
> previously been using Phoenix. At some point, the cluster had Phoenix removed 
> from it.
> What we found was that replication still had WALs to ship which were for 
> Phoenix tables. Phoenix, in this version, still used the custom WALCellCodec; 
> however, this codec class was missing from the RS classpath after the owner 
> of the cluster removed Phoenix.
> When we try to instantiate the Codec implementation via ReflectionUtils, we 
> end up throwing an UnsupportedOperationException which wraps a 
> ClassNotFoundException. However, in WALFactory, we _only_ close the 
> FSDataInputStream when we catch an IOException, and an 
> UnsupportedOperationException is not one. 
> Thus, replication sits in a "fast" loop, trying to ship these edits, each 
> time leaking a new socket because of the InputStream not being closed. There 
> is an obvious workaround for this specific issue, but we should not leak this 
> inside HBase.
> An approximate 2.1.x stack trace which led us to this is below.
> {noformat}
> 2021-03-11 18:19:20,364 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader: Failed to read stream of replication entries
> java.io.IOException: Cannot get log reader
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:366)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:291)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:427)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:354)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:302)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:293)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:174)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:100)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:192)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:138)
> Caused by: java.lang.UnsupportedOperationException: Unable to find org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
>   at 
> 
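
The "Caused by" line above shows why the failure slips past an IOException handler. As a rough sketch (plain JDK reflection, not the HBase ReflectionUtils source), the hypothetical instantiate helper below looks a class up by name and, when the class is absent, rethrows the checked lookup failure as an unchecked UnsupportedOperationException with the same "Unable to find" message. The class name org.example.MissingWalCodec is made up and is assumed not to be on the classpath.

```java
import java.io.IOException;

/** Hypothetical sketch of a reflective codec lookup; not HBase code. */
final class CodecLookupSketch {

  /** Instantiates a class by name, wrapping lookup failures in an unchecked exception. */
  static Object instantiate(String className) {
    try {
      return Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ClassNotFoundException e) {
      throw new UnsupportedOperationException("Unable to find " + className, e);
    } catch (ReflectiveOperationException e) {
      throw new UnsupportedOperationException("Unable to instantiate " + className, e);
    }
  }

  /** Reports which handler observed the failure: "io", "runtime", or "none". */
  static String whichHandler(String className) {
    try {
      Object codec = instantiate(className);
      if (codec == null) {
        // Never true in practice; keeps the IOException handler reachable.
        throw new IOException("no codec");
      }
      return "none";
    } catch (IOException e) {
      return "io";
    } catch (RuntimeException e) {
      return "runtime";
    }
  }

  public static void main(String[] args) {
    System.out.println(whichHandler("org.example.MissingWalCodec"));
    System.out.println(whichHandler("java.lang.Object"));
  }
}
```

Because the wrapper is a RuntimeException, a caller that cleans up only in catch (IOException e) never gets a chance to close what it opened before the lookup.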

[jira] [Commented] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-03-31 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1731#comment-1731
 ] 

Hudson commented on HBASE-25692:


Results for branch branch-2.4
[build #85 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/85/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/85/General_20Nightly_20Build_20Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/85/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/85/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/85/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}



[jira] [Commented] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-03-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311808#comment-17311808
 ] 

Hudson commented on HBASE-25692:


Results for branch branch-2.2
[build #199 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/199/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/199//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/199//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/199//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/199//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)



[jira] [Commented] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-03-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311713#comment-17311713
 ] 

Hudson commented on HBASE-25692:


Results for branch branch-2
[build #214 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/214/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/214/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/214/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/214/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/214/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}



[jira] [Commented] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-03-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311666#comment-17311666
 ] 

Hudson commented on HBASE-25692:


Results for branch branch-2.3
[build #197 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/197/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/197/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/197/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/197/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/197/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}



[jira] [Commented] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-03-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311372#comment-17311372
 ] 

Hudson commented on HBASE-25692:


Results for branch master
[build #250 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/250/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/250/General_20Nightly_20Build_20Report/]






(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/250/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/250/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Failure to instantiate WALCellCodec leaks socket in replication
> ---
>
> Key: HBASE-25692
> URL: https://issues.apache.org/jira/browse/HBASE-25692
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.1.0, 2.2.0, 2.1.1, 2.1.2, 2.1.3, 2.3.0, 2.3.1, 2.1.4, 
> 2.0.6, 2.1.5, 2.2.1, 2.1.6, 2.1.7, 2.2.2, 2.1.8, 2.2.3, 2.3.3, 2.1.9, 2.2.4, 
> 2.4.0, 2.2.5, 2.2.6, 2.3.2, 2.3.4, 2.4.1, 2.4.2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.3, 2.3.6
>
>
> I was looking at an HBase user's cluster with [~danilocop] where they saw two 
> otherwise identical clusters where one of them was regularly had sockets in 
> CLOSE_WAIT going from RegionServers to a distributed storage appliance.
> After a lot of analysis, we eventually figured out that these sockets in 
> CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to 
> close inside of the RegionServer. The subtlety was that only one of these 
> HBase clusters was set up to do replication (to the other cluster). The HBase 
> cluster experiencing this problem was shipping edits to a peer, and had 
> previously been using Phoenix. At some point, the cluster had Phoenix removed 
> from it.
> What we found was that replication still had WALs to ship which were for 
> Phoenix tables. Phoenix, in this version, still used the custom WALCellCodec; 
> however, this codec class was missing from the RS classpath after the owner 
> of the cluster removed Phoenix.
> When we try to instantiate the Codec implementation via ReflectionUtils, we 
> end up throwing an UnsupportedOperationException which wraps a 
> NoClassDefFoundException. However, in WALFactory, we _only_ close the 
> FSDataInputStream when we catch an IOException. 
> Thus, replication sits in a "fast" loop, trying to ship these edits, each 
> time leaking a new socket because of the InputStream not being closed. There 
> is an obvious workaround for this specific issue, but we should not leak this 
> inside HBase.
> Approximate, 2.1.x stack trace which lead us to this is below.
> {noformat}
> 2021-03-11 18:19:20,364 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader: Failed to read stream of replication entries
> java.io.IOException: Cannot get log reader
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:366)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:291)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:427)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:354)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:302)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:293)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:174)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:100)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:192)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:138)
> Caused by: java.lang.UnsupportedOperationException: Unable to find 
> 

[jira] [Commented] (HBASE-25692) Failure to instantiate WALCellCodec leaks socket in replication

2021-03-24 Thread Josh Elser (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307996#comment-17307996
 ] 

Josh Elser commented on HBASE-25692:


I've verified that all 2.x and master branches have this problem. I'll 
double-check what's going on with 1.x when we have a fix that can be cherry-picked.
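
For illustration only, the leak pattern described in the issue can be sketched in plain Java. This is not the WALFactory source and not necessarily the fix that was committed. Every name here (CodecLeakSketch, TrackedStream, initReader, openLeaky, openSafe, leakedAfter) is hypothetical. TrackedStream stands in for FSDataInputStream, and a counter stands in for sockets left in CLOSE_WAIT. The sketch assumes only what the description states: reader initialization fails with an unchecked UnsupportedOperationException, while the cleanup path catches IOException alone.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class CodecLeakSketch {
    // Number of streams opened and not yet closed (stand-in for leaked sockets).
    static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for FSDataInputStream.
    static final class TrackedStream implements Closeable {
        TrackedStream() { OPEN.incrementAndGet(); }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    // Hypothetical stand-in for reader init: the codec class is missing, so an
    // unchecked exception (not an IOException) escapes.
    static void initReader(TrackedStream in) throws IOException {
        throw new UnsupportedOperationException("Unable to find codec class",
            new ClassNotFoundException("missing.Codec"));
    }

    // Leaky shape: the stream is closed only when an IOException is caught.
    static void openLeaky() throws IOException {
        TrackedStream in = new TrackedStream();
        try {
            initReader(in);
        } catch (IOException e) {
            in.close();
            throw e;
        }
    }

    // One possible safe shape: close the stream on any failure.
    static void openSafe() throws IOException {
        TrackedStream in = new TrackedStream();
        boolean initialized = false;
        try {
            initReader(in);
            initialized = true;
        } finally {
            if (!initialized) {
                in.close();
            }
        }
    }

    // Retries the open several times, as a retry loop would, and reports how
    // many streams are still open afterwards.
    static int leakedAfter(boolean safe, int attempts) {
        OPEN.set(0);
        for (int i = 0; i < attempts; i++) {
            try {
                if (safe) {
                    openSafe();
                } else {
                    openLeaky();
                }
            } catch (IOException | RuntimeException e) {
                // Swallow and retry.
            }
        }
        return OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + leakedAfter(false, 5));
        System.out.println("safe:  " + leakedAfter(true, 5));
    }
}
```

With five attempts, the leaky variant leaves five streams open and the safe variant leaves none, which matches the one-socket-per-retry behaviour described in the issue.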

> Failure to instantiate WALCellCodec leaks socket in replication
> ---
>
> Key: HBASE-25692
> URL: https://issues.apache.org/jira/browse/HBASE-25692
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.1.0, 2.2.0, 2.1.1, 2.1.2, 2.1.3, 2.3.0, 2.3.1, 2.1.4, 
> 2.0.6, 2.1.5, 2.2.1, 2.1.6, 2.1.7, 2.2.2, 2.1.8, 2.2.3, 2.3.3, 2.1.9, 2.2.4, 
> 2.4.0, 2.2.5, 2.2.6, 2.3.2, 2.3.4, 2.4.1, 2.4.2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
>
> I was looking at an HBase user's setup with [~danilocop] where they had two 
> otherwise identical clusters, one of which regularly had sockets in 
> CLOSE_WAIT going from RegionServers to a distributed storage appliance.
> After a lot of analysis, we eventually figured out that these sockets in 
> CLOSE_WAIT were directly related to an FSDataInputStream which we forgot to 
> close inside of the RegionServer. The subtlety was that only one of these 
> HBase clusters was set up to do replication (to the other cluster). The HBase 
> cluster experiencing this problem was shipping edits to a peer, and had 
> previously been using Phoenix. At some point, the cluster had Phoenix removed 
> from it.
> What we found was that replication still had WALs to ship which were for 
> Phoenix tables. Phoenix, in this version, still used the custom WALCellCodec; 
> however, this codec class was missing from the RS classpath after the owner 
> of the cluster removed Phoenix.
> When we try to instantiate the Codec implementation via ReflectionUtils, we 
> end up throwing an UnsupportedOperationException which wraps a 
> ClassNotFoundException. However, in WALFactory, we _only_ close the 
> FSDataInputStream when we catch an IOException. 
> Thus, replication sits in a "fast" loop, trying to ship these edits, each 
> time leaking a new socket because of the InputStream not being closed. There 
> is an obvious workaround for this specific issue, but we should not leak this 
> inside HBase.
> The approximate 2.1.x stack trace which led us to this is below.
> {noformat}
> 2021-03-11 18:19:20,364 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader: Failed to read stream of replication entries
> java.io.IOException: Cannot get log reader
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:366)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:291)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:427)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:354)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:302)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:293)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:174)
>   at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:100)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:192)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:138)
> Caused by: java.lang.UnsupportedOperationException: Unable to find org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
>   at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:47)
>   at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec.create(WALCellCodec.java:106)
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.getCodec(ProtobufLogReader.java:301)
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:311)
>   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:81)
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
>   at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:321)
>   ... 10 more
> Caused by: java.lang.ClassNotFoundException: 
>