[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361335#comment-17361335 ]

Hudson commented on HBASE-25924:
--------------------------------

Results for branch branch-2.3
[build #235 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/235/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/235/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/235/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/235/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/235/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> Seeing a spike in uncleanlyClosedWALs metric.
> ---------------------------------------------
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
> Issue Type: Bug
> Components: Replication, wal
> Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
> Reporter: Rushabh Shah
> Assignee: Rushabh Shah
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4, 1.7.1
>
>
> Getting the following log line in all of our production clusters when WALEntryStream is dequeuing WAL file.
> {noformat}
> 2021-05-02 04:01:30,437 DEBUG [04901996] regionserver.WALEntryStream - Reached the end of WAL file hdfs://. It was not closed cleanly, so we did not parse 8 bytes of data. This is normally ok.
> {noformat}
> The 8 bytes are usually the trailer serialized size (SIZE_OF_INT (4bytes) + "LAWP" (4 bytes) = 8 bytes)
> While dequeue'ing the WAL file from WALEntryStream, we reset the reader here.
> [WALEntryStream|https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L199-L221]
> {code:java}
>   private void tryAdvanceEntry() throws IOException {
>     if (checkReader()) {
>       readNextEntryAndSetPosition();
>       if (currentEntry == null) { // no more entries in this log file - see if log was rolled
>         if (logQueue.getQueue(walGroupId).size() > 1) { // log was rolled
>           // Before dequeueing, we should always get one more attempt at reading.
>           // This is in case more entries came in after we opened the reader,
>           // and a new log was enqueued while we were reading. See HBASE-6758
>           resetReader(); // ---> HERE
>           readNextEntryAndSetPosition();
>           if (currentEntry == null) {
>             if (checkAllBytesParsed()) { // now we're certain we're done with this log file
>               dequeueCurrentLog();
>               if (openNextLog()) {
>                 readNextEntryAndSetPosition();
>               }
>             }
>           }
>         } // no other logs, we've simply hit the end of the current open log. Do nothing
>       }
>     }
>     // do nothing if we don't have a WAL Reader (e.g. if there's no logs in queue)
>   }
> {code}
> In resetReader, we call the following methods, WALEntryStream#resetReader ---> ProtobufLogReader#reset ---> ProtobufLogReader#initInternal.
> In ProtobufLogReader#initInternal, we try to create the whole reader object from scratch to see if any new data has been written.
> We reset all the fields of ProtobufLogReader except for ReaderBase#fileLength.
> We calculate whether trailer is present or not depending on fileLength.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
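The stale-length behaviour described in the issue above can be illustrated with a small, hypothetical sketch. It is not HBase code: StaleFileLengthSketch, closeCleanly and hasTrailer are made-up names, and the sketch assumes an empty trailer payload, so a clean close appends exactly SIZE_OF_INT (4 bytes) + "LAWP" (4 bytes) = 8 bytes. It only models the length arithmetic: a reader that keeps the file length it saw while the WAL was still open looks for the trailer in the wrong place and concludes there is none.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model only; names and layout are assumptions, not the ProtobufLogReader implementation.
public class StaleFileLengthSketch {
  static final byte[] MAGIC = "LAWP".getBytes(StandardCharsets.US_ASCII);
  // 4-byte trailer size + 4-byte magic = 8 bytes
  static final int TRAILER_SUFFIX = Integer.BYTES + MAGIC.length;

  /** Simulates a clean close: appends an (assumed empty) trailer's size and the magic. */
  static byte[] closeCleanly(byte[] entries) {
    ByteBuffer buf = ByteBuffer.allocate(entries.length + TRAILER_SUFFIX);
    buf.put(entries).putInt(0).put(MAGIC);
    return buf.array();
  }

  /** Checks for the magic at the end of the first assumedLength bytes of the file. */
  static boolean hasTrailer(byte[] file, int assumedLength) {
    if (assumedLength < TRAILER_SUFFIX || assumedLength > file.length) {
      return false;
    }
    byte[] tail = Arrays.copyOfRange(file, assumedLength - MAGIC.length, assumedLength);
    return Arrays.equals(tail, MAGIC);
  }

  public static void main(String[] args) {
    byte[] entries = new byte[100];        // WAL entries written so far
    int lengthAtOpen = entries.length;     // length cached when the reader was opened
    byte[] closed = closeCleanly(entries); // writer rolls the log and closes it cleanly

    // Reset that reuses the cached length: the trailer is not found.
    System.out.println("trailer seen with stale length: " + hasTrailer(closed, lengthAtOpen));
    // Reset that refreshes the length: the trailer is found.
    System.out.println("trailer seen with fresh length: " + hasTrailer(closed, closed.length));
    // Bytes that look unparsed when the stale length is used.
    System.out.println("bytes beyond stale length: " + (closed.length - lengthAtOpen));
  }
}
```

Under these assumptions the file looks uncleanly closed with 8 unparsed bytes when the cached length is reused, which matches the figure in the log line quoted in the issue.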
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361205#comment-17361205 ]

Rushabh Shah commented on HBASE-25924:
--------------------------------------

The branches have diverged a lot, so the backport is not trivial. Unfortunately, I don't have cycles for the next few days. I will get back to it later.
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361144#comment-17361144 ]

Bharath Vissapragada commented on HBASE-25924:
----------------------------------------------

[~shahrs87] I think it was committed to and then reverted from 2.3. It would still be nice to have a working patch in that branch (if you have spare cycles) :-).
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361114#comment-17361114 ]

Rushabh Shah commented on HBASE-25924:
--------------------------------------

[~psomogyi] Instead of reverting this commit, could we pick HBASE-25932 to branch-2.3? If yes, then I can put up a PR quickly.
Cc [~bharathv] [~apurtell]
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361105#comment-17361105 ]

Peter Somogyi commented on HBASE-25924:
---------------------------------------

This commit also landed on branch-2.3, which was not planned based on the previous comments and the HBASE-25957 subtask. This commit broke the branch-2.3 builds, so let me revert the change there.
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355221#comment-17355221 ]

Bharath Vissapragada commented on HBASE-25924:
----------------------------------------------

Thanks, opened HBASE-25957 as a subtask.
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355208#comment-17355208 ]

Andrew Kyle Purtell commented on HBASE-25924:
---------------------------------------------

bq. What's the general guidance on backporting to branch-2.3?

It is a live branch that we are still releasing from, so it should receive all relevant bug fixes and changes that are meaningful for cross-branch compatibility (i.e. impacting an upgrade from 1.x, or impacting an upgrade to 2.4 or later).

bq. That branch has diverged quite a bit and this patch doesn't apply cleanly.

It's fine to resolve this issue without a 2.3 fix version and open a subtask or another jira for a backport to 2.3.
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355178#comment-17355178 ]

Bharath Vissapragada commented on HBASE-25924:
----------------------------------------------

HBASE-25932 is now committed to master/branch-2/branch-2.4.

[~apurtell] What's the general guidance on backporting to branch-2.3? That branch has diverged quite a bit and this patch doesn't apply cleanly.
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353455#comment-17353455 ]

Andrew Kyle Purtell commented on HBASE-25924:
---------------------------------------------

HBASE-25932 is making progress. We shouldn't close this until the issue is resolved one way or another, though. I've linked the JIRAs.
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352729#comment-17352729 ]

Rushabh Shah commented on HBASE-25924:
--------------------------------------

[~apurtell] Let's wait until tomorrow. If we are unable to find a fix, then let's revert in master and branch-2.
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352720#comment-17352720 ]

Andrew Kyle Purtell commented on HBASE-25924:
---------------------------------------------

The test failure is being tracked by HBASE-25932. Still leaving this open, because we shouldn't have a consistently failing test checked in for long. If it's going to take a while to resolve, better to revert this until it's ready.

> Seeing a spike in uncleanlyClosedWALs metric.
> ---------------------------------------------
>
>                 Key: HBASE-25924
>                 URL: https://issues.apache.org/jira/browse/HBASE-25924
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>            Reporter: Rushabh Shah
>            Assignee: Rushabh Shah
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.6, 2.4.4, 1.7.1
>
>
> Getting the following log line in all of our production clusters when
> WALEntryStream is dequeuing a WAL file.
> {noformat}
> 2021-05-02 04:01:30,437 DEBUG [04901996] regionserver.WALEntryStream -
> Reached the end of WAL file hdfs://. It was not closed
> cleanly, so we did not parse 8 bytes of data. This is normally ok.
> {noformat}
> The 8 bytes are usually the trailer serialized size (SIZE_OF_INT (4 bytes) +
> "LAWP" (4 bytes) = 8 bytes).
> While dequeuing the WAL file from WALEntryStream, we reset the reader here.
> [WALEntryStream|https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L199-L221]
> {code:java}
>   private void tryAdvanceEntry() throws IOException {
>     if (checkReader()) {
>       readNextEntryAndSetPosition();
>       if (currentEntry == null) { // no more entries in this log file - see if log was rolled
>         if (logQueue.getQueue(walGroupId).size() > 1) { // log was rolled
>           // Before dequeueing, we should always get one more attempt at reading.
>           // This is in case more entries came in after we opened the reader,
>           // and a new log was enqueued while we were reading. See HBASE-6758
>           resetReader(); // ---> HERE
>           readNextEntryAndSetPosition();
>           if (currentEntry == null) {
>             if (checkAllBytesParsed()) { // now we're certain we're done with this log file
>               dequeueCurrentLog();
>               if (openNextLog()) {
>                 readNextEntryAndSetPosition();
>               }
>             }
>           }
>         } // no other logs, we've simply hit the end of the current open log. Do nothing
>       }
>     }
>     // do nothing if we don't have a WAL Reader (e.g. if there's no logs in queue)
>   }
> {code}
> In resetReader, we call the following methods:
> WALEntryStream#resetReader ---> ProtobufLogReader#reset ---> ProtobufLogReader#initInternal.
> In ProtobufLogReader#initInternal, we try to create the whole reader object
> from scratch to see if any new data has been written.
> We reset all the fields of ProtobufLogReader except for ReaderBase#fileLength.
> We calculate whether the trailer is present or not depending on fileLength.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
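As a rough illustration of the description above, the following self-contained sketch models a WAL as an in-memory byte array and shows how a length cached before the file was closed hides the trailer. It is not HBase code: the class StaleLengthSketch and the helpers closeCleanly and trailerPresent are hypothetical names, and the empty trailer (a zero size field followed by "LAWP") is an assumption chosen so that the unparsed remainder comes out to the 8 bytes seen in the log line.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of the stale-length problem; not the HBase implementation.
final class StaleLengthSketch {
    // Magic marker from the issue text; size field (4 bytes) + magic (4 bytes) = 8 bytes.
    static final byte[] MAGIC = "LAWP".getBytes(StandardCharsets.US_ASCII);
    static final int TRAILER_SUFFIX = Integer.BYTES + MAGIC.length;

    // Appends an empty trailer (assumption: size 0) followed by the magic marker.
    static byte[] closeCleanly(byte[] entries) {
        return ByteBuffer.allocate(entries.length + TRAILER_SUFFIX)
            .put(entries)
            .putInt(0)
            .put(MAGIC)
            .array();
    }

    // Checks for the magic marker at the end of the file as the reader believes it to be,
    // i.e. at knownLength rather than at the real end of the data.
    static boolean trailerPresent(byte[] file, int knownLength) {
        if (knownLength < TRAILER_SUFFIX || knownLength > file.length) {
            return false;
        }
        byte[] tail = Arrays.copyOfRange(file, knownLength - MAGIC.length, knownLength);
        return Arrays.equals(tail, MAGIC);
    }

    public static void main(String[] args) {
        byte[] entries = new byte[100];          // stand-in for the WAL header and edits
        int staleLength = entries.length;        // length cached while the file was still open
        byte[] closed = closeCleanly(entries);   // the writer closes the file and adds the trailer
        int freshLength = closed.length;         // length a re-check of the file would return

        System.out.println("trailer seen with stale length: " + trailerPresent(closed, staleLength));
        System.out.println("trailer seen with fresh length: " + trailerPresent(closed, freshLength));
        System.out.println("bytes left unparsed: " + (freshLength - staleLength));
    }
}
```

With the length cached at open time, the check looks 8 bytes too early and finds entry data instead of the magic marker, so the file appears to have been closed uncleanly. Refreshing the length before the check lets the same file report its trailer.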
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352715#comment-17352715 ]

Hudson commented on HBASE-25924:
--------------------------------

Results for branch branch-2.4
[build #128 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/128/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/128/General_20Nightly_20Build_20Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/128/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/128/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/128/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352708#comment-17352708 ]

Rushabh Shah commented on HBASE-25924:
--------------------------------------

[~apurtell] I created this ticket https://issues.apache.org/jira/browse/HBASE-25932 to track the fix.

> Seeing a spike in uncleanlyClosedWALs metric.
> ---------------------------------------------
>
>                 Key: HBASE-25924
>                 URL: https://issues.apache.org/jira/browse/HBASE-25924
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>            Reporter: Rushabh Shah
>            Assignee: Rushabh Shah
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352632#comment-17352632 ]

Hudson commented on HBASE-25924:
--------------------------------

Results for branch branch-2
[build #262 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/262/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/262/General_20Nightly_20Build_20Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/262/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/262/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/262/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352561#comment-17352561 ]

Hudson commented on HBASE-25924:
--------------------------------

Results for branch branch-1
[build #132 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/132/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- Something went wrong running this stage, please [check relevant console output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/132//console].

(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/132//JDK7_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/132//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352481#comment-17352481 ]

Hudson commented on HBASE-25924:
--------------------------------

Results for branch master
[build #306 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/306/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/306/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/306/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/306/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352362#comment-17352362 ]

Hudson commented on HBASE-25924:
--------------------------------

Results for branch branch-2.3
[build #225 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225/General_20Nightly_20Build_20Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(x) {color:red}-1 source release artifact{color}
-- See build output for details.

(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225//console].
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351972#comment-17351972 ]

Andrew Kyle Purtell commented on HBASE-25924:
---------------------------------------------

Thanks for the fix [~shahrs87] !
[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
[ https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351973#comment-17351973 ]

Rushabh Shah commented on HBASE-25924:
--------------------------------------

Thank you [~apurtell] for the review and commit and [~bharathv] [~vjasani] for the reviews.