[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-06-10 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361335#comment-17361335
 ] 

Hudson commented on HBASE-25924:


Results for branch branch-2.3
[build #235 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/235/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/235/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/235/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/235/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/235/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
> Issue Type: Bug
> Components: Replication, wal
> Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
> Reporter: Rushabh Shah
> Assignee: Rushabh Shah
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4, 1.7.1
>
>
> We are seeing the following log line in all of our production clusters when 
> WALEntryStream is dequeuing a WAL file.
> {noformat}
>  2021-05-02 04:01:30,437 DEBUG [04901996] regionserver.WALEntryStream - 
> Reached the end of WAL file hdfs://. It was not closed 
> cleanly, so we did not parse 8 bytes of data. This is normally ok.
> {noformat}
> The 8 bytes are usually the trailer serialized size (SIZE_OF_INT (4 bytes) + 
> "LAWP" (4 bytes) = 8 bytes).
> While dequeuing the WAL file from WALEntryStream, we reset the reader here:
> [WALEntryStream|https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L199-L221]
> {code:java}
>   private void tryAdvanceEntry() throws IOException {
>     if (checkReader()) {
>       readNextEntryAndSetPosition();
>       if (currentEntry == null) { // no more entries in this log file - see if log was rolled
>         if (logQueue.getQueue(walGroupId).size() > 1) { // log was rolled
>           // Before dequeueing, we should always get one more attempt at reading.
>           // This is in case more entries came in after we opened the reader,
>           // and a new log was enqueued while we were reading. See HBASE-6758
>           resetReader(); // <--- HERE
>           readNextEntryAndSetPosition();
>           if (currentEntry == null) {
>             if (checkAllBytesParsed()) { // now we're certain we're done with this log file
>               dequeueCurrentLog();
>               if (openNextLog()) {
>                 readNextEntryAndSetPosition();
>               }
>             }
>           }
>         } // no other logs, we've simply hit the end of the current open log. Do nothing
>       }
>     }
>     // do nothing if we don't have a WAL Reader (e.g. if there's no logs in queue)
>   }
> {code}
> In resetReader, the call chain is WALEntryStream#resetReader ---> 
> ProtobufLogReader#reset ---> ProtobufLogReader#initInternal.
> In ProtobufLogReader#initInternal, we re-create the whole reader state 
> from scratch to see whether any new data has been written.
> We reset all the fields of ProtobufLogReader except for ReaderBase#fileLength.
> We determine whether the trailer is present based on fileLength.
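
The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase code: the class StaleLengthSketch, its methods, the zero-filled entry bytes, and the empty trailer payload are all hypothetical. The only facts taken from the description are that the file ends with a 4-byte trailer size followed by the 4-byte "LAWP" magic, and that trailer detection depends on a file length that is not refreshed on reset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy model (hypothetical, not HBase code) of trailer detection with a cached file length. */
final class StaleLengthSketch {
  static final byte[] MAGIC = "LAWP".getBytes(StandardCharsets.US_ASCII);
  // Trailer size int (4 bytes) + magic (4 bytes) = 8 bytes, as in the description.
  static final int SUFFIX_BYTES = Integer.BYTES + MAGIC.length;

  /** Builds a toy "closed" file: zero-filled entry bytes, an empty trailer's size, then the magic. */
  static byte[] closedFile(int entryBytes) {
    ByteBuffer buf = ByteBuffer.allocate(entryBytes + SUFFIX_BYTES);
    buf.position(entryBytes); // leave the entry region zero-filled
    buf.putInt(0);            // serialized trailer size (assumed empty trailer)
    buf.put(MAGIC);
    return buf.array();
  }

  /** Looks for the magic at the end of what the reader believes is the file. */
  static boolean trailerPresent(byte[] file, long believedLength) {
    if (believedLength < SUFFIX_BYTES || believedLength > file.length) {
      return false;
    }
    int from = (int) believedLength - MAGIC.length;
    return Arrays.equals(Arrays.copyOfRange(file, from, from + MAGIC.length), MAGIC);
  }

  /** Bytes reported as unparsed when the reader stops at position and finds no trailer. */
  static long skippedBytes(byte[] file, long believedLength, long position) {
    return trailerPresent(file, believedLength) ? 0 : file.length - position;
  }

  public static void main(String[] args) {
    int entryBytes = 100;
    byte[] file = closedFile(entryBytes);
    long staleLength = entryBytes;  // length cached while the file was still being written
    long freshLength = file.length; // length after the trailer was appended on close
    System.out.println(skippedBytes(file, staleLength, entryBytes)); // prints 8
    System.out.println(skippedBytes(file, freshLength, entryBytes)); // prints 0
  }
}
```

With the stale length, the model looks for the magic inside the entry region, concludes the file was not closed cleanly, and reports the 8 trailing bytes as unparsed. With the refreshed length, the trailer is found and nothing is reported. Whether the real reader behaves exactly this way should be checked against the ProtobufLogReader source.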



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-06-10 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361205#comment-17361205
 ] 

Rushabh Shah commented on HBASE-25924:
--

The branches have diverged a lot, so the backport is not trivial. 
Unfortunately, I don't have cycles for the next few days. I will get back to it later.



[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-06-10 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361144#comment-17361144
 ] 

Bharath Vissapragada commented on HBASE-25924:
--

[~shahrs87] I think it was committed to and then reverted from 2.3. It would 
still be nice to have a working patch in that branch (if you have spare cycles) :-).



[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-06-10 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361114#comment-17361114
 ] 

Rushabh Shah commented on HBASE-25924:
--

[~psomogyi] Instead of reverting this commit, could we pick HBASE-25932 to 
branch-2.3? If yes, then I can put up a PR quickly. Cc [~bharathv] [~apurtell]



[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-06-10 Thread Peter Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361105#comment-17361105
 ] 

Peter Somogyi commented on HBASE-25924:
---

This commit also landed on branch-2.3, which was not planned based on the 
previous comments and the HBASE-25957 subtask. It broke the branch-2.3 builds, 
so let me revert the change there.



[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-06-01 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355221#comment-17355221
 ] 

Bharath Vissapragada commented on HBASE-25924:
--

Thanks, opened HBASE-25957 as a subtask. 

> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
> Issue Type: Bug
> Components: Replication, wal
> Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
> Reporter: Rushabh Shah
> Assignee: Rushabh Shah
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.6, 2.4.4, 1.7.1


[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-06-01 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355208#comment-17355208
 ] 

Andrew Kyle Purtell commented on HBASE-25924:
-

bq. What's the general guidance on backporting to branch-2.3? 

It is a live branch that we are still releasing from, so it should receive all 
relevant bug fixes and changes that are meaningful for cross-branch 
compatibility (i.e. impacting an upgrade from 1.x, or impacting an upgrade to 
2.4 or later).

bq. That branch has diverged quite a bit and this patch doesn't apply cleanly. 

It's fine to resolve this issue without a 2.3 fix version and open a subtask or 
another jira for a backport to 2.3.



[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-06-01 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355178#comment-17355178
 ] 

Bharath Vissapragada commented on HBASE-25924:
--

HBASE-25932 is now committed to master/branch-2/branch-2.4.

[~apurtell] What's the general guidance on backporting to branch-2.3? That 
branch has diverged quite a bit and this patch doesn't apply cleanly. 



[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-28 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353455#comment-17353455
 ] 

Andrew Kyle Purtell commented on HBASE-25924:
-

HBASE-25932 is making progress. We shouldn't close this until the issue is 
resolved one way or another, though. I've linked the JIRAs.



[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-27 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352729#comment-17352729
 ] 

Rushabh Shah commented on HBASE-25924:
--

[~apurtell] Let's wait until tomorrow. If we are unable to find a fix, then 
let's revert in master and branch-2.



[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-27 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352720#comment-17352720
 ] 

Andrew Kyle Purtell commented on HBASE-25924:
-

The test failure is being tracked by HBASE-25932. Still leaving this open, 
because we shouldn't have a consistently failing test checked in for long. If 
it's going to take a while to resolve, better to revert this until it's ready. 

> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.6, 2.4.4, 1.7.1
>
>


[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-27 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352715#comment-17352715
 ] 

Hudson commented on HBASE-25924:


Results for branch branch-2.4
[build #128 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/128/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/128/General_20Nightly_20Build_20Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/128/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/128/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/128/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.6, 2.4.4, 1.7.1
>
>


[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-27 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352708#comment-17352708
 ] 

Rushabh Shah commented on HBASE-25924:
--

[~apurtell] I created this ticket 
https://issues.apache.org/jira/browse/HBASE-25932 to track the fix.

> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4
>
>


[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-27 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352632#comment-17352632
 ] 

Hudson commented on HBASE-25924:


Results for branch branch-2
[build #262 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/262/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/262/General_20Nightly_20Build_20Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/262/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/262/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/262/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4
>
>


[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-27 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352561#comment-17352561
 ] 

Hudson commented on HBASE-25924:


Results for branch branch-1
[build #132 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/132/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/132//console].


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/132//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/132//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4
>
>


[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-27 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352481#comment-17352481
 ] 

Hudson commented on HBASE-25924:


Results for branch master
[build #306 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/306/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/306/General_20Nightly_20Build_20Report/]






(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/306/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/306/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4
>
>


[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-27 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352362#comment-17352362
 ] 

Hudson commented on HBASE-25924:


Results for branch branch-2.3
[build #225 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225/General_20Nightly_20Build_20Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console 
output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/225//console].


> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4
>
>


[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-26 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351972#comment-17351972
 ] 

Andrew Kyle Purtell commented on HBASE-25924:
-

Thanks for the fix [~shahrs87] !

> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4
>
>


[jira] [Commented] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-26 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351973#comment-17351973
 ] 

Rushabh Shah commented on HBASE-25924:
--

Thank you [~apurtell] for the review and commit and [~bharathv] [~vjasani] for 
the reviews. 

> Seeing a spike in uncleanlyClosedWALs metric.
> -
>
> Key: HBASE-25924
> URL: https://issues.apache.org/jira/browse/HBASE-25924
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.4
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4
>
>