[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-11-09 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228932#comment-17228932
 ] 

Anoop Sam John commented on HBASE-25065:


[~ram_krish] pls add RN to highlight the configs

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214601#comment-17214601
 ] 

Hudson commented on HBASE-25065:


Results for branch branch-2
[build #77 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/77/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/77/General_20Nightly_20Build_20Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/77/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/77/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/77/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console 
output|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/77//console].


> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214584#comment-17214584
 ] 

Hudson commented on HBASE-25065:


Results for branch master
[build #95 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/95/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/95/General_20Nightly_20Build_20Report/]






(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/95/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/95/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213808#comment-17213808
 ] 

ramkrishna.s.vasudevan commented on HBASE-25065:


[~zhangduo] I just raised an issue 
https://issues.apache.org/jira/browse/HBASE-25186. Marked it as blocker.

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-14 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213708#comment-17213708
 ] 

Duo Zhang commented on HBASE-25065:
---

I think we can open a new issue and set it as blocker for this. I do not have a 
solution in mind for now.

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213679#comment-17213679
 ] 

ramkrishna.s.vasudevan commented on HBASE-25065:


It is not a logical issue just that when the afterRoll is called in the flow 
the archival has not been completed. 

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213678#comment-17213678
 ] 

ramkrishna.s.vasudevan commented on HBASE-25065:


The problem is related to the earlier test case failure. The AbstractWalRoller 
in master and branch-2 has a afterRoll() method. Mainly for the master region. 
That depends on the fact that the wal roll and archival happens in sync and 
that it expects the WAL file to be available in the walArchive path . If it is 
available it moves it to the globalArchive path. Now since this archival is 
async the afterRoll() does not happen and that causes the test case failure. 
[~zhangduo] - Is it possible to move this afterRoll() to that async archive 
method only? But to do that we need the LogRoller with the WAL implementation 
classes. 

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-13 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213604#comment-17213604
 ] 

ramkrishna.s.vasudevan commented on HBASE-25065:


[~zhangduo] Thanks for pointing out. I was not available yesterday. Will have a 
look at this. I found a test failing in branch-2 and hence fixed it but not in 
master. Let me check 

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-13 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213520#comment-17213520
 ] 

Duo Zhang commented on HBASE-25065:
---

[~ram_krish] Any feedbacks? If still no response today I will revert the patch 
to get the builds back first.

Thanks.

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213279#comment-17213279
 ] 

Hudson commented on HBASE-25065:


Results for branch branch-2
[build #76 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/76/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/76/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/76/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/76/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/76/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-12 Thread Nick Dimiduk (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212658#comment-17212658
 ] 

Nick Dimiduk commented on HBASE-25065:
--

[~ram_krish] Reviewed the PRs. I'm leaning towards "no" for a patch release, 
but I'd like to hear why you think otherwise.

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-11 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211965#comment-17211965
 ] 

Hudson commented on HBASE-25065:


Results for branch master
[build #92 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/92/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/92/General_20Nightly_20Build_20Report/]






(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/92/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/92/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211832#comment-17211832
 ] 

ramkrishna.s.vasudevan commented on HBASE-25065:


[~ndimiduk] and [~zghao]
Do you want this in 2.3 and 2.2 branches? 

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a seperate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)