[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721185#comment-16721185 ]

Hudson commented on HBASE-18451:

Results for branch branch-1.3
[build #576 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/576/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/576//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk7 checks{color}
-- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/576//JDK7_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/576//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Xu Cang
>            Priority: Major
>             Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1
>
>         Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150
> regions per server), at the end all the regions might have some data to be
> flushed, and we want, after one hour, to trigger a periodic flush. That's
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a
> "randomDelay" to the delayed flush, so that the flushes are spread out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 minutes,
> which is very good.
> However, because we don't check if there is already a request in the queue,
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes,
> you end up getting them all much more quickly, for example within the first minute.
> This not only floods the queue with too many flush requests, but also defeats the
> purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion)r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>           r.getRegionInfo().getRegionNameAsString() + " because " +
>           whyFlush.toString() +
>           " after random delay " + randomDelay + "ms");
>         //Throttle the flushes by putting a delay. If we don't throttle, and there
>         //is a balanced write-load on the regions in a table, we might end up
>         //overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
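
To make the effect described in the issue concrete, here is a minimal, self-contained Java sketch. It is a toy model of the reported behaviour, not HBase code: the class DelayedFlushSketch, the method simulateFlushTime and the constant CHORE_PERIOD_MS are hypothetical names, MIN_DELAY_TIME is left out for brevity, and the sketch assumes that when several requests exist for one region the earliest one wins. That assumption follows the wording of the description and has not been checked against MemStoreFlusher.

```java
import java.util.Random;

/** Toy model of the behaviour described in the issue; none of this is HBase code. */
final class DelayedFlushSketch {
    static final int RANGE_OF_DELAY_MS = 5 * 60 * 1000; // issue text: RANGE_OF_DELAY is 5 minutes
    static final int CHORE_PERIOD_MS = 10 * 1000;       // issue text: a new request every 10 seconds

    /**
     * Simulates one region that stays flushable and returns the time, in ms after
     * the first chore run, at which its flush becomes due.
     *
     * @param checkQueueFirst if true, a region that already has a pending request
     *                        keeps its original delay (what the issue asks for)
     */
    static long simulateFlushTime(long seed, boolean checkQueueFirst) {
        Random random = new Random(seed);
        long pendingDueTime = -1; // -1 means nothing is queued for the region
        for (long now = 0; ; now += CHORE_PERIOD_MS) {
            if (pendingDueTime >= 0 && pendingDueTime <= now) {
                return pendingDueTime; // the delayed flush has come due
            }
            boolean alreadyQueued = pendingDueTime >= 0;
            if (checkQueueFirst && alreadyQueued) {
                continue; // inspect the queue first: do not add a second request
            }
            long candidate = now + random.nextInt(RANGE_OF_DELAY_MS);
            // Modelling assumption: without the check, the earliest request wins.
            pendingDueTime = alreadyQueued ? Math.min(pendingDueTime, candidate) : candidate;
        }
    }

    public static void main(String[] args) {
        long sumChecked = 0;
        long sumUnchecked = 0;
        int runs = 1000;
        for (long seed = 0; seed < runs; seed++) {
            sumChecked += simulateFlushTime(seed, true);
            sumUnchecked += simulateFlushTime(seed, false);
        }
        System.out.println("mean flush time with queue check (ms):    " + sumChecked / runs);
        System.out.println("mean flush time without queue check (ms): " + sumUnchecked / runs);
    }
}
```

For a given seed both modes draw the same first delay, so in this model the run without the check can never flush later than the run with it. Averaged over many seeds, the unchecked mode should come due well before the roughly 2.5 minute mean of a single uniform draw, which is the clustering the reporter observed.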
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720792#comment-16720792 ]

Hudson commented on HBASE-18451:

SUCCESS: Integrated in Jenkins build HBase-1.3-IT #509 (See [https://builds.apache.org/job/HBase-1.3-IT/509/])
HBASE-18451 PeriodicMemstoreFlusher should inspect the queue before (apurtell: rev 785e21fe545da33811a50e0718d7cfeb7dc74df7)
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHeapMemoryManager.java
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632986#comment-16632986 ]

Hudson commented on HBASE-18451:

Results for branch master
[build #517 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/517/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/517//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/517//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/517//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632941#comment-16632941 ]

Hudson commented on HBASE-18451:

Results for branch branch-1.4
[build #484 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/484/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/484//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/484//JDK7_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/484//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 source release artifact{color}
-- See build output for details.
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632937#comment-16632937 ]

Hudson commented on HBASE-18451:

Results for branch branch-1
[build #482 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/482/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/482//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/482//JDK7_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/482//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 source release artifact{color}
-- See build output for details.
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632810#comment-16632810 ]

Hudson commented on HBASE-18451:

Results for branch branch-2.1
[build #392 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632790#comment-16632790 ]

Hudson commented on HBASE-18451:

Results for branch branch-2
[build #1317 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632532#comment-16632532 ]

Hadoop QA commented on HBASE-18451:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 5m 10s | master passed |
| +1 | compile | 1m 52s | master passed |
| +1 | checkstyle | 1m 14s | master passed |
| +1 | shadedjars | 4m 28s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 10s | master passed |
| +1 | javadoc | 0m 30s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 12s | the patch passed |
| +1 | compile | 1m 53s | the patch passed |
| +1 | javac | 1m 53s | the patch passed |
| +1 | checkstyle | 1m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 24s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 10m 52s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 12s | the patch passed |
| +1 | javadoc | 0m 33s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 124m 32s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 167m 26s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941715/HBASE-18451.master.004.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 272ce9242570 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 6bc7089f9e |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14531/testReport/ |
| Max. process+thread count | 5240 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14531/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632526#comment-16632526 ]

Andrew Purtell commented on HBASE-18451:

Done, pushed to master, branch-2, branch-2.1, branch-1, branch-1.4.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: Xu Cang
> Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
> Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, HBASE-18451.master.patch
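The thread does not show the committed patch, so the following is only a sketch of what "inspect the queue before adding a delayed flush request" could look like in isolation. DedupingFlushScheduler, its methods, and the use of region names as keys are hypothetical and are not HBase's FlushRequester API. The idea illustrated is that a region with a pending request keeps its original random delay instead of drawing a new one on every chore run.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical stand-in for a flush requester; not the actual HBase implementation. */
final class DedupingFlushScheduler {
  // Regions that already have a delayed flush request waiting.
  private final Set<String> pending = ConcurrentHashMap.newKeySet();
  private final long rangeOfDelayMs;

  DedupingFlushScheduler(long rangeOfDelayMs) {
    if (rangeOfDelayMs <= 0) {
      throw new IllegalArgumentException("rangeOfDelayMs must be positive");
    }
    this.rangeOfDelayMs = rangeOfDelayMs;
  }

  /**
   * Registers a delayed flush for the region unless one is already pending.
   *
   * @return the random delay in milliseconds chosen for a new request,
   *         or -1 if a request for this region was already pending
   */
  long requestDelayedFlushIfAbsent(String regionName) {
    if (!pending.add(regionName)) {
      return -1L; // keep the delay that was drawn the first time
    }
    return ThreadLocalRandom.current().nextLong(rangeOfDelayMs);
  }

  /** Marks the region's flush as done so that a later request can be queued again. */
  void flushCompleted(String regionName) {
    pending.remove(regionName);
  }
}
```

With a guard of this kind, repeated chore runs for the same region would leave the first delay in place, so the flushes stay spread over the whole delay range rather than converging on the smallest delay drawn.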
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632249#comment-16632249 ]

Andrew Purtell commented on HBASE-18451:

Doing a few local checks now

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: Xu Cang
> Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632239#comment-16632239 ]

Andrew Purtell commented on HBASE-18451:

Ok, I'll do the commit now unless someone beats me to it.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: Xu Cang
> Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632195#comment-16632195 ]

Mike Drob commented on HBASE-18451:
---

For branch-1 patch, the javac warnings are showing up because the indent level changed so the line number filtering doesn't work as expected. Fine to ignore that for now as a false positive.

+1 for both

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: Xu Cang
> Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627733#comment-16627733 ]

Hadoop QA commented on HBASE-18451:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 2s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 1s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-1 Compile Tests ||
| +1 | mvninstall | 1m 54s | branch-1 passed |
| +1 | compile | 0m 35s | branch-1 passed with JDK v1.8.0_181 |
| +1 | compile | 0m 37s | branch-1 passed with JDK v1.7.0_191 |
| +1 | checkstyle | 1m 21s | branch-1 passed |
| +1 | shadedjars | 2m 35s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 26s | branch-1 passed with JDK v1.8.0_181 |
| +1 | javadoc | 0m 35s | branch-1 passed with JDK v1.7.0_191 |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 34s | the patch passed |
| +1 | compile | 0m 34s | the patch passed with JDK v1.8.0_181 |
| +1 | javac | 0m 34s | the patch passed |
| +1 | compile | 0m 38s | the patch passed with JDK v1.7.0_191 |
| -1 | javac | 0m 38s | hbase-server-jdk1.7.0_191 with JDK v1.7.0_191 generated 2 new + 4 unchanged - 2 fixed = 6 total (was 6) |
| +1 | checkstyle | 1m 20s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 35s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 1m 32s | Patch does not cause any errors with Hadoop 2.7.4. |
| +1 | javadoc | 0m 26s | the patch passed with JDK v1.8.0_181 |
| +1 | javadoc | 0m 35s | the patch passed with JDK v1.7.0_191 |
|| || || || Other Tests ||
| +1 | unit | 106m 9s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 124m 41s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:61288f8 |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941237/HBASE-18451.branch-1.002.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 20367d59c081 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627555#comment-16627555 ]

Xu Cang commented on HBASE-18451:

Fixed code style for the master branch. Re-uploaded the branch-1 patch to trigger another hadoop-qa run. The javac error was strange; let's try it again.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: Xu Cang
> Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, HBASE-18451.master.004.patch, HBASE-18451.master.patch
>
> If you run a big job every 4 hours, impacting many tables (they have 150 regions per server), at the end all the regions might have some data to be flushed, and we want, after one hour, to trigger a periodic flush. That's totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a "randomDelay" to the delayed flush; that way we spread them out. RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 minutes, which is very good.
> However, because we don't check whether there is already a request in the queue, 10 seconds later we create a new request with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, you end up getting them all much more quickly, like within the first minute. This not only feeds the queue with too many flush requests, but also defeats the purpose of the randomDelay.
> {code}
>   @Override
>   protected void chore() {
>     final StringBuffer whyFlush = new StringBuffer();
>     for (Region r : this.server.onlineRegions.values()) {
>       if (r == null) continue;
>       if (((HRegion)r).shouldFlush(whyFlush)) {
>         FlushRequester requester = server.getFlushRequester();
>         if (requester != null) {
>           long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>           LOG.info(getName() + " requesting flush of " +
>             r.getRegionInfo().getRegionNameAsString() + " because " +
>             whyFlush.toString() +
>             " after random delay " + randomDelay + "ms");
>           //Throttle the flushes by putting a delay. If we don't throttle, and there
>           //is a balanced write-load on the regions in a table, we might end up
>           //overwhelming the filesystem with too many flushes at once.
>           requester.requestDelayedFlush(r, randomDelay, false);
>         }
>       }
>     }
>   }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
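
Editor's note: the effect described in the issue text quoted above can be illustrated with a small model. The sketch below is not HBase code. It assumes requests arrive exactly 10 seconds apart (the interval mentioned in the issue), ignores MIN_DELAY_TIME, and treats the flush as firing at the earliest deadline among all queued requests. The class and method names (`RandomDelaySketch`, `fireTimeWithDedup`, `fireTimeWithoutDedup`) are hypothetical.

```java
import java.util.Random;

// Illustrative model only; names and structure are assumptions, not HBase APIs.
final class RandomDelaySketch {
    // Interval between repeated requests, as described in the issue text.
    static final long CHORE_PERIOD_MS = 10_000L;
    // "RANGE_OF_DELAY is 5 minutes" per the issue text, expressed in milliseconds.
    static final int RANGE_OF_DELAY_MS = 5 * 60 * 1000;

    /**
     * Fire time (ms after the first request) when only the first request is kept.
     * Expects a non-empty array of per-request random delays.
     */
    static long fireTimeWithDedup(long[] delays) {
        return delays[0];
    }

    /**
     * Fire time when every chore run enqueues a fresh request:
     * the earliest deadline among all requests made before the flush fires wins.
     */
    static long fireTimeWithoutDedup(long[] delays) {
        long earliest = Long.MAX_VALUE;
        for (int i = 0; i < delays.length; i++) {
            long requestedAt = i * CHORE_PERIOD_MS;
            if (requestedAt >= earliest) {
                break; // the flush already happened; later requests are irrelevant
            }
            earliest = Math.min(earliest, requestedAt + delays[i]);
        }
        return earliest;
    }

    public static void main(String[] args) {
        Random random = new Random(42L);
        int trials = 1_000;
        int requestsPerTrial = RANGE_OF_DELAY_MS / (int) CHORE_PERIOD_MS;
        long sumDedup = 0;
        long sumNoDedup = 0;
        for (int t = 0; t < trials; t++) {
            long[] delays = new long[requestsPerTrial];
            for (int i = 0; i < delays.length; i++) {
                delays[i] = random.nextInt(RANGE_OF_DELAY_MS);
            }
            sumDedup += fireTimeWithDedup(delays);
            sumNoDedup += fireTimeWithoutDedup(delays);
        }
        System.out.printf("mean fire time with dedup:    %d ms%n", sumDedup / trials);
        System.out.printf("mean fire time without dedup: %d ms%n", sumNoDedup / trials);
    }
}
```

Feeding the five complete delays from the log excerpt above (270785, 200143, 191082, 92532, 238780 ms) into this model, the deduplicated flush fires at 270785 ms, while the non-deduplicated one fires at 122532 ms, because the fourth request (made at 30 s with a 92532 ms delay) sets the earliest deadline.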
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627523#comment-16627523 ]

Mike Drob commented on HBASE-18451:

+1 after checkstyle fixed

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: Xu Cang
> Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627130#comment-16627130 ]

Hadoop QA commented on HBASE-18451:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 14m 51s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 1s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-1 Compile Tests ||
| +1 | mvninstall | 7m 49s | branch-1 passed |
| +1 | compile | 0m 34s | branch-1 passed with JDK v1.8.0_181 |
| +1 | compile | 0m 39s | branch-1 passed with JDK v1.7.0_191 |
| +1 | checkstyle | 1m 27s | branch-1 passed |
| +1 | shadedjars | 2m 46s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 34s | branch-1 passed with JDK v1.8.0_181 |
| +1 | javadoc | 0m 36s | branch-1 passed with JDK v1.7.0_191 |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 35s | the patch passed |
| +1 | compile | 0m 37s | the patch passed with JDK v1.8.0_181 |
| +1 | javac | 0m 37s | the patch passed |
| +1 | compile | 0m 38s | the patch passed with JDK v1.7.0_191 |
| -1 | javac | 0m 38s | hbase-server-jdk1.7.0_191 with JDK v1.7.0_191 generated 2 new + 4 unchanged - 2 fixed = 6 total (was 6) |
| +1 | checkstyle | 1m 23s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 42s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 1m 39s | Patch does not cause any errors with Hadoop 2.7.4. |
| +1 | javadoc | 0m 28s | the patch passed with JDK v1.8.0_181 |
| +1 | javadoc | 0m 35s | the patch passed with JDK v1.7.0_191 |
|| || || || Other Tests ||
| +1 | unit | 109m 19s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 149m 11s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:61288f8 |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941182/HBASE-18451.branch-1.002.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux d14dd42fe65e 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627117#comment-16627117 ]

Hadoop QA commented on HBASE-18451:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 5m 42s | master passed |
| +1 | compile | 1m 59s | master passed |
| +1 | checkstyle | 1m 18s | master passed |
| +1 | shadedjars | 4m 31s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 2s | master passed |
| +1 | javadoc | 0m 31s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 12s | the patch passed |
| +1 | compile | 1m 53s | the patch passed |
| +1 | javac | 1m 53s | the patch passed |
| -1 | checkstyle | 1m 16s | hbase-server: The patch generated 1 new + 145 unchanged - 0 fixed = 146 total (was 145) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 24s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 11m 12s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 15s | the patch passed |
| +1 | javadoc | 0m 30s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 134m 44s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 178m 48s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.procedure.TestDisableTableProcedure |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941177/HBASE-18451.master.003.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux dedff5b5a286 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / c686b535c2 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14492/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14492/artifact/patchprocess/patch-unit-hbase-server.txt |
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626938#comment-16626938 ]

Xu Cang commented on HBASE-18451:

Uploaded another set of patches to address Mike's review comments.
Also ran the previously failing (branch-1) unit test locally, and it passes, as shown below:

[INFO] Running org.apache.hadoop.hbase.util.TestHBaseFsck
[WARNING] Tests run: 59, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 331.094 s - in org.apache.hadoop.hbase.util.TestHBaseFsck
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 59, Failures: 0, Errors: 0, Skipped: 1

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: Xu Cang
> Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.branch-1.002.patch, HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626838#comment-16626838 ]

Allan Yang commented on HBASE-18451:

{quote}
Let me create a new Jira for compaction queue deduplication if you don't mind.
{quote}
[~xucang], yes, you can open another Jira so we can discuss it there.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: Xu Cang
> Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.master.002.patch, HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626830#comment-16626830 ]

Xu Cang commented on HBASE-18451:

[~allan163] thanks for the review. I took a quick look at the compaction-related code. It seems to have far more sophisticated policies and strategies for throttling. I understand that what you want is deduplication. Let me create a new Jira for compaction queue deduplication if you don't mind.

[~mdrob] Nice catch. Fixed. Thanks

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: Xu Cang
> Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.master.002.patch, HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
    [ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625924#comment-16625924 ]

Mike Drob commented on HBASE-18451:
-----------------------------------

{noformat}
+          LOG.info(getName() + " requesting flush of " +
+              r.getRegionInfo().getRegionNameAsString() + " because " +
+              whyFlush.toString() +
+              " after random delay " + randomDelay + "ms");
{noformat}
nit: can we switch this to parameterized logging?

{noformat}
  @Override
  public boolean requestDelayedFlush(HRegion r, long delay, boolean forceFlushAllStores) {
    r.incrementFlushesQueuedCount();
    synchronized (regionsInQueue) {
      if (!regionsInQueue.containsKey(r)) {
        // This entry has some delay
        FlushRegionEntry fqe =
            new FlushRegionEntry(r, forceFlushAllStores, FlushLifeCycleTracker.DUMMY);
        fqe.requeue(delay);
        this.regionsInQueue.put(r, fqe);
        this.flushQueue.add(fqe);
        return true;
      }
      return false;
    }
  }
{noformat}
Is that call to {{incrementFlushesQueuedCount}} correct? We attempt to queue one, but don't always add one to the queue, so the metric is going to be over-inflated. The same thing happens in {{requestFlush}}.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Xu Cang
>            Priority: Major
>         Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.master.002.patch, HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150
> regions per server), at the end all the regions might have some data to be
> flushed, and we want to trigger a periodic flush after one hour. That's
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a
> "randomDelay" to the delayed flush, so that the flushes are spread out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 minutes,
> which is very good.
> However, because we don't check whether there is already a request in the queue,
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes,
> you end up getting them all much more quickly, for example within the first minute.
> This not only fills the queue with too many flush requests, but also defeats the
> purpose of the randomDelay.
> {code}
>     @Override
>     protected void chore() {
>       final StringBuffer whyFlush = new StringBuffer();
>       for (Region r : this.server.onlineRegions.values()) {
>         if (r == null) continue;
>         if (((HRegion)r).shouldFlush(whyFlush)) {
>           FlushRequester requester = server.getFlushRequester();
>           if (requester != null) {
>             long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>             LOG.info(getName() + " requesting flush of " +
>               r.getRegionInfo().getRegionNameAsString() + " because " +
>               whyFlush.toString() +
>               " after random delay " + randomDelay + "ms");
>             //Throttle the flushes by putting a delay. If we don't throttle, and there
>             //is a balanced write-load on the regions in a table, we might end up
>             //overwhelming the filesystem with too many flushes at once.
>             requester.requestDelayedFlush(r, randomDelay, false);
>           }
>         }
>       }
>     }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO
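The issue description argues that drawing a fresh random delay on every chore run pulls the flush forward. That claim can be checked numerically with a small simulation. This is only a sketch under stated assumptions: the class and method names are hypothetical (not HBase code), the chore period of 10 seconds is taken from the spacing of the log lines, MIN_DELAY_TIME is treated as 0, and the flush is assumed to fire as soon as the earliest queued request expires.

```java
import java.util.Random;

// Illustrative simulation only; names are hypothetical and this is not HBase code.
class FlushDelayResampleSketch {
  static final int RANGE_OF_DELAY_MS = 5 * 60 * 1000; // 5 minutes, as stated in the issue
  static final int CHORE_PERIOD_MS = 10 * 1000;       // assumed from the 10 s log spacing

  /** One request per region: the flush fires after a single random delay. */
  static long flushTimeSingleRequest(Random rnd) {
    return rnd.nextInt(RANGE_OF_DELAY_MS);
  }

  /**
   * A new request with a fresh random delay is added on every chore run
   * until the earliest queued request expires and the flush happens.
   */
  static long flushTimeWithRequeue(Random rnd) {
    long earliest = Long.MAX_VALUE;
    for (long now = 0; now < earliest; now += CHORE_PERIOD_MS) {
      earliest = Math.min(earliest, now + rnd.nextInt(RANGE_OF_DELAY_MS));
    }
    return earliest;
  }

  /** Mean time to flush, in milliseconds, over the given number of trials. */
  static double meanFlushTimeMs(boolean requeue, int trials, long seed) {
    Random rnd = new Random(seed);
    long sum = 0;
    for (int i = 0; i < trials; i++) {
      sum += requeue ? flushTimeWithRequeue(rnd) : flushTimeSingleRequest(rnd);
    }
    return (double) sum / trials;
  }

  public static void main(String[] args) {
    System.out.printf("single request:       mean flush after %.0f ms%n",
        meanFlushTimeMs(false, 100_000, 42L));
    System.out.printf("re-queued every 10 s: mean flush after %.0f ms%n",
        meanFlushTimeMs(true, 100_000, 42L));
  }
}
```

With a single request the mean time to flush sits near the middle of the 5 minute range; with re-queuing it is noticeably shorter, because the flush time becomes the minimum over several independent draws. The exact figures depend on the assumptions above.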
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
    [ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625867#comment-16625867 ]

Allan Yang commented on HBASE-18451:
------------------------------------

+1 for the patch. The compaction queue has the same issue. [~xucang] maybe you can take a look. We may queue the same Store in the compaction queue to compact over and over again, making the compaction queue very big. But it is unlikely to cause as much trouble as this one, so it is not urgent.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Xu Cang
>            Priority: Major
>         Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.master.002.patch, HBASE-18451.master.patch
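The guard discussed in this thread (skip the request if the region is already queued) and the metric concern raised in review (count a flush as queued only when an entry is really added) can be sketched together as follows. This is a minimal illustration, not the HBase implementation: the class `DedupFlushQueueSketch`, its `Entry` type, and the use of a region name string as the key are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a flusher queue; this is not the HBase MemStoreFlusher.
class DedupFlushQueueSketch {

  /** Hypothetical queue entry, keyed by region name, that becomes due after a delay. */
  static final class Entry implements Delayed {
    final String region;
    final long dueAtMillis;

    Entry(String region, long delayMillis) {
      this.region = region;
      this.dueAtMillis = System.currentTimeMillis() + delayMillis;
    }

    @Override
    public long getDelay(TimeUnit unit) {
      return unit.convert(dueAtMillis - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
    }
  }

  private final Map<String, Entry> regionsInQueue = new HashMap<>();
  private final DelayQueue<Entry> flushQueue = new DelayQueue<>();
  private long flushesQueued; // metric: number of entries actually added to the queue

  /** Returns true only if a new entry was queued; duplicates are ignored and not counted. */
  boolean requestDelayedFlush(String region, long delayMillis) {
    synchronized (regionsInQueue) {
      if (regionsInQueue.containsKey(region)) {
        return false; // already queued: keep the original delay, leave the metric alone
      }
      Entry entry = new Entry(region, delayMillis);
      regionsInQueue.put(region, entry);
      flushQueue.add(entry);
      flushesQueued++; // incremented inside the guard, so it matches the queue contents
      return true;
    }
  }

  long getFlushesQueued() {
    synchronized (regionsInQueue) {
      return flushesQueued;
    }
  }

  int queueSize() {
    return flushQueue.size();
  }
}
```

A second request for the same region returns false and leaves both the queue and the counter unchanged, so the first random delay stays in effect. A real implementation would also have to remove the map entry once the flush runs, which this sketch leaves out.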
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
    [ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625557#comment-16625557 ]

Hadoop QA commented on HBASE-18451:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 34s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 34s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 1m 34s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}118m 19s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}136m 52s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.util.TestHBaseFsck |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:61288f8 |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941012/HBASE-18451.branch-1.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 324b4fa3dcc5 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
    [ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625428#comment-16625428 ]

Xu Cang commented on HBASE-18451:
---------------------------------

Rebased [~nihed]'s patch and uploaded it for both master and branch-1.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>            Priority: Major
>         Attachments: HBASE-18451.branch-1.001.patch, HBASE-18451.master.002.patch, HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
    [ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607444#comment-16607444 ]

Andrew Purtell commented on HBASE-18451:
----------------------------------------

Cancelling patch and unassigning abandoned issue. Someone else can pick it up if they like.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>            Priority: Major
>         Attachments: HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
    [ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379378#comment-16379378 ]

Hadoop QA commented on HBASE-18451:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HBASE-18451 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879325/HBASE-18451.master.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11714/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>            Assignee: nihed mbarek
>            Priority: Major
>         Attachments: HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
    [ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379372#comment-16379372 ]

Jean-Marc Spaggiari commented on HBASE-18451:
---------------------------------------------

Ping. Can we get this rebased and committed somehow?

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>            Assignee: nihed mbarek
>            Priority: Major
>         Attachments: HBASE-18451.master.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277210#comment-16277210 ]

Hadoop QA commented on HBASE-18451:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 4s | HBASE-18451 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.6.0/precommit-patchnames for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879325/HBASE-18451.master.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/10209/console |
| Powered by | Apache Yetus 0.6.0 http://yetus.apache.org |

This message was automatically generated.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: nihed mbarek
> Attachments: HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150
> regions per server), at the end all the regions might have some data to be
> flushed, and we want, after one hour, to trigger a periodic flush. That's
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 minutes,
> which is very good.
> However, because we don't check whether there is already a request in the
> queue, 10 seconds later we create a new request with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes,
> you end up getting them all much more quickly, like within the first minute.
> This not only feeds the queue with too many flush requests, but also defeats
> the purpose of the randomDelay.
> {code}
>   @Override
>   protected void chore() {
>     final StringBuffer whyFlush = new StringBuffer();
>     for (Region r : this.server.onlineRegions.values()) {
>       if (r == null) continue;
>       if (((HRegion)r).shouldFlush(whyFlush)) {
>         FlushRequester requester = server.getFlushRequester();
>         if (requester != null) {
>           long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>           LOG.info(getName() + " requesting flush of " +
>             r.getRegionInfo().getRegionNameAsString() + " because " +
>             whyFlush.toString() +
>             " after random delay " + randomDelay + "ms");
>           //Throttle the flushes by putting a delay. If we don't throttle, and there
>           //is a balanced write-load on the regions in a table, we might end up
>           //overwhelming the filesystem with too many flushes at once.
>           requester.requestDelayedFlush(r, randomDelay, false);
>         }
>       }
>     }
>   }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277188#comment-16277188 ]

Jean-Marc Spaggiari commented on HBASE-18451:
---------------------------------------------

bump

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: nihed mbarek
> Attachments: HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150
> regions per server), at the end all the regions might have some data to be
> flushed, and we want, after one hour, to trigger a periodic flush. That's
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 minutes,
> which is very good.
> However, because we don't check whether there is already a request in the
> queue, 10 seconds later we create a new request with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes,
> you end up getting them all much more quickly, like within the first minute.
> This not only feeds the queue with too many flush requests, but also defeats
> the purpose of the randomDelay.
> {code}
>   @Override
>   protected void chore() {
>     final StringBuffer whyFlush = new StringBuffer();
>     for (Region r : this.server.onlineRegions.values()) {
>       if (r == null) continue;
>       if (((HRegion)r).shouldFlush(whyFlush)) {
>         FlushRequester requester = server.getFlushRequester();
>         if (requester != null) {
>           long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>           LOG.info(getName() + " requesting flush of " +
>             r.getRegionInfo().getRegionNameAsString() + " because " +
>             whyFlush.toString() +
>             " after random delay " + randomDelay + "ms");
>           //Throttle the flushes by putting a delay. If we don't throttle, and there
>           //is a balanced write-load on the regions in a table, we might end up
>           //overwhelming the filesystem with too many flushes at once.
>           requester.requestDelayedFlush(r, randomDelay, false);
>         }
>       }
>     }
>   }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
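
The effect described in the quoted issue can be illustrated with a small simulation. It takes the description's model at face value: a new request is added every 10 seconds with a fresh random delay drawn from the 5-minute range, and the request that falls due first triggers the flush. The class and method names are hypothetical, a minimum delay of 0 is assumed, and this is a sketch of the arithmetic only, not a statement about how the region server queue actually behaves.

```java
import java.util.Random;

/** Hypothetical simulation of the delay model from the issue description. */
class FlushDelaySimulation {
    static final int RANGE_OF_DELAY_MS = 5 * 60 * 1000; // 5 minutes, per the description
    static final int CHORE_PERIOD_MS = 10_000;          // one request every 10 seconds

    /** Time until the flush fires when only the first request is kept. */
    static long singleRequest(Random rnd) {
        return rnd.nextInt(RANGE_OF_DELAY_MS);
    }

    /**
     * Time until the flush fires when every chore tick adds another request
     * with a fresh random delay and the earliest due request wins.
     */
    static long repeatedRequests(Random rnd) {
        long earliest = Long.MAX_VALUE;
        // Ticks stop once the flush has fired, i.e. once tick >= earliest.
        for (long tick = 0; tick < earliest; tick += CHORE_PERIOD_MS) {
            earliest = Math.min(earliest, tick + rnd.nextInt(RANGE_OF_DELAY_MS));
        }
        return earliest;
    }

    /** Average time to flush over the given number of trials, with a fixed seed. */
    static double meanDelayMs(boolean repeated, int trials, long seed) {
        Random rnd = new Random(seed);
        long total = 0;
        for (int i = 0; i < trials; i++) {
            total += repeated ? repeatedRequests(rnd) : singleRequest(rnd);
        }
        return (double) total / trials;
    }

    public static void main(String[] args) {
        System.out.printf("single request:    %.0f ms on average%n",
            meanDelayMs(false, 10_000, 42L));
        System.out.printf("repeated requests: %.0f ms on average%n",
            meanDelayMs(true, 10_000, 42L));
    }
}
```

Under these assumptions the average for repeated requests comes out well below the average for a single request, which is the loss of spreading that the description complains about.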
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104888#comment-16104888 ]

Jean-Marc Spaggiari commented on HBASE-18451:
---------------------------------------------

Thanks Anoop.
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104884#comment-16104884 ]

Jean-Marc Spaggiari commented on HBASE-18451:
---------------------------------------------

You got it! ;) LGTM.
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104880#comment-16104880 ]

Hadoop QA commented on HBASE-18451:
-----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 4m 39s | master passed |
| +1 | compile | 0m 50s | master passed |
| +1 | checkstyle | 0m 57s | master passed |
| +1 | mvneclipse | 0m 20s | master passed |
| +1 | findbugs | 3m 11s | master passed |
| +1 | javadoc | 0m 28s | master passed |
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| +1 | javac | 0m 37s | the patch passed |
| +1 | checkstyle | 0m 47s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 31m 9s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | findbugs | 3m 10s | the patch passed |
| +1 | javadoc | 0m 34s | the patch passed |
| +1 | unit | 119m 11s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 167m 51s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:bdc94b1 |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879325/HBASE-18451.master.patch |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 963f4e251072 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 2d06a06 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7829/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7829/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104706#comment-16104706 ]

Hadoop QA commented on HBASE-18451:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HBASE-18451 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879322/HBASE-18451.master.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7828/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104694#comment-16104694 ]

Hadoop QA commented on HBASE-18451:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HBASE-18451 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879321/ISSUE.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7827/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: nihed mbarek
> Attachments: ISSUE.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104681#comment-16104681 ]

Hadoop QA commented on HBASE-18451:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 7s | HBASE-18451 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879318/ISSUE.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7826/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> --------------------------------------------------------------------------------------
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0-alpha-1
> Reporter: Jean-Marc Spaggiari
> Assignee: nihed mbarek
> Attachments: ISSUE.patch
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104649#comment-16104649 ]

Hadoop QA commented on HBASE-18451:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HBASE-18451 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879313/0001-HBASE-15134-Add-visibility-into-Flush-and-Compaction.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7825/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>            Assignee: nihed mbarek
>         Attachments: 0001-HBASE-15134-Add-visibility-into-Flush-and-Compaction.patch,
>                      0001-HBASE-18451-PeriodicMemstoreFlusher-should-inspect-t.patch
>
> If you run a big job every 4 hours, impacting many tables (they have 150
> regions per server), at the end all the regions might have some data to be
> flushed, and we want, after one hour, to trigger a periodic flush. That's
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes.
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104618#comment-16104618 ]

Hadoop QA commented on HBASE-18451:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HBASE-18451 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879311/0001-HBASE-18451-PeriodicMemstoreFlusher-should-inspect-t.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7824/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>            Assignee: nihed mbarek
>         Attachments: 0001-HBASE-18451-PeriodicMemstoreFlusher-should-inspect-t.patch
>
> If you run a big job every 4 hours, impacting many tables (they have 150
> regions per server), at the end all the regions might have some data to be
> flushed, and we want, after one hour, to trigger a periodic flush. That's
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 minutes,
> which is very good.
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104426#comment-16104426 ]

Anoop Sam John commented on HBASE-18451:

Done JMS.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>            Assignee: nihed mbarek
>
> If you run a big job every 4 hours, impacting many tables (they have 150
> regions per server), at the end all the regions might have some data to be
> flushed, and we want, after one hour, to trigger a periodic flush. That's
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 minutes,
> which is very good.
> However, because we don't check whether there is already a request in the queue,
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes,
> you end up getting them all much more quickly, e.g. within the first minute.
> This not only feeds the queue with too many flush requests, but also defeats the
> purpose of the randomDelay.
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103472#comment-16103472 ]

Jean-Marc Spaggiari commented on HBASE-18451:
---------------------------------------------

@stack Can you please assign this JIRA to Nihed? Thanks.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>
> If you run a big job every 4 hours, impacting many tables (they have 150
> regions per server), at the end all the regions might have some data to be
> flushed, and we want, after one hour, to trigger a periodic flush. That's
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 minutes,
> which is very good.
> However, because we don't check whether there is already a request in the queue,
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes,
> you end up getting them all much more quickly, e.g. within the first minute.
> This not only feeds the queue with too many flush requests, but also defeats the
> purpose of the randomDelay.
> {code}
>   @Override
>   protected void chore() {
>     final StringBuffer whyFlush = new StringBuffer();
>     for (Region r : this.server.onlineRegions.values()) {
>       if (r == null) continue;
>       if (((HRegion)r).shouldFlush(whyFlush)) {
>         FlushRequester requester = server.getFlushRequester();
>         if (requester != null) {
>           long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>           LOG.info(getName() + " requesting flush of " +
>             r.getRegionInfo().getRegionNameAsString() + " because " +
>             whyFlush.toString() +
>             " after random delay " + randomDelay + "ms");
>           //Throttle the flushes by putting a delay. If we don't throttle, and there
>           //is a balanced write-load on the regions in a table, we might end up
>           //overwhelming the filesystem with too many flushes at once.
>           requester.requestDelayedFlush(r, randomDelay, false);
>         }
>       }
>     }
>   }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of
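To put numbers on the log excerpt above, here is a minimal sketch. It assumes that every logged request was actually queued with its own delay, which is the scenario the description worries about; whether that really happens is exactly what this issue is about. It takes the six complete log entries, adds each random delay to the time the request was logged, and reports the earliest moment at which any of them would fire. The class and method names are made up for illustration and are not part of HBase.

```java
public class EarliestFlushExample {
    /**
     * Given the offsets (ms since the first request) at which requests were
     * logged and the random delay attached to each, return the earliest time
     * (ms since the first request) at which any of them would fire.
     */
    static long earliestFire(long[] requestedAtMs, long[] randomDelayMs) {
        long earliest = Long.MAX_VALUE;
        for (int i = 0; i < requestedAtMs.length; i++) {
            earliest = Math.min(earliest, requestedAtMs[i] + randomDelayMs[i]);
        }
        return earliest;
    }

    public static void main(String[] args) {
        // Offsets derived from the log timestamps 18:44:33,338 ... 18:45:24,195.
        long[] requestedAt = {0, 9_990, 20_616, 30_190, 40_863, 50_857};
        // "after random delay ...ms" values from the same six log lines.
        long[] delay = {270_785, 200_143, 191_082, 92_532, 238_780, 35_390};

        System.out.println("first request alone fires at " + delay[0] + " ms");
        System.out.println("earliest of all six fires at "
            + earliestFire(requestedAt, delay) + " ms");
    }
}
```

Under that assumption the flush would fire 86247 ms (about 86 seconds) after the first request instead of 270785 ms (about 271 seconds), which is the "pulled forward" effect the description refers to.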
[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
[ https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103307#comment-16103307 ]

nihed mbarek commented on HBASE-18451:
--------------------------------------

After investigating, I found that the check is already implemented in requestDelayedFlush, so this is more an issue of the log message being written before the insert.
{code}
requester.requestDelayedFlush(r, randomDelay, false);
{code}
calls
{code}
  @Override
  public void requestDelayedFlush(Region r, long delay, boolean forceFlushAllStores) {
    synchronized (regionsInQueue) {
      if (!regionsInQueue.containsKey(r)) {
        // This entry has some delay
        FlushRegionEntry fqe = new FlushRegionEntry(r, forceFlushAllStores);
        fqe.requeue(delay);
        this.regionsInQueue.put(r, fqe);
        this.flushQueue.add(fqe);
      }
    }
  }
{code}
My proposal is to change the return type of requestFlush and requestDelayedFlush in the FlushRequester interface from void to boolean: true if the region was added to the queue, false if not.
The chore would then become
{code}
  @Override
  protected void chore() {
    final StringBuffer whyFlush = new StringBuffer();
    for (Region r : this.server.onlineRegions.values()) {
      if (r == null) continue;
      if (((HRegion)r).shouldFlush(whyFlush)) {
        FlushRequester requester = server.getFlushRequester();
        if (requester != null) {
          long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
          //Throttle the flushes by putting a delay. If we don't throttle, and there
          //is a balanced write-load on the regions in a table, we might end up
          //overwhelming the filesystem with too many flushes at once.
          if (requester.requestDelayedFlush(r, randomDelay, false)) {
            LOG.info(getName() + " requesting flush of " +
              r.getRegionInfo().getRegionNameAsString() + " because " +
              whyFlush.toString() +
              " after random delay " + randomDelay + "ms");
          }
        }
      }
    }
  }
{code}

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-18451
>                 URL: https://issues.apache.org/jira/browse/HBASE-18451
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Jean-Marc Spaggiari
>
> If you run a big job every 4 hours, impacting many tables (they have 150
> regions per server), at the end all the regions might have some data to be
> flushed, and we want, after one hour, to trigger a periodic flush. That's
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 minutes,
> which is very good.
> However, because we don't check whether there is already a request in the queue,
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes,
> you end up getting them all much more quickly, e.g. within the first minute.
> This not only feeds the queue with too many flush requests, but also defeats the
> purpose of the randomDelay.
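The idea in the comment of this message (return a boolean from requestDelayedFlush and log only when it is true) can be tried out in isolation. The following is a simplified stand-in, not HBase code: regions are plain strings, Entry takes the place of FlushRegionEntry, and the class name DedupFlushQueueSketch is hypothetical. It only illustrates that a second request for an already-queued region is rejected, keeps the original delay, and produces no log line.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class DedupFlushQueueSketch {
    /** Hypothetical stand-in for FlushRegionEntry: a region name plus a due time. */
    static final class Entry implements Delayed {
        final String region;
        final long dueAtMillis;

        Entry(String region, long delayMillis) {
            this.region = region;
            this.dueAtMillis = System.currentTimeMillis() + delayMillis;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(dueAtMillis - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    private final Map<String, Entry> regionsInQueue = new HashMap<>();
    private final DelayQueue<Entry> flushQueue = new DelayQueue<>();

    /** Returns true only if a new entry was queued for this region. */
    public boolean requestDelayedFlush(String region, long delayMillis) {
        synchronized (regionsInQueue) {
            if (regionsInQueue.containsKey(region)) {
                return false; // already pending: the first delay stays in force
            }
            Entry entry = new Entry(region, delayMillis);
            regionsInQueue.put(region, entry);
            flushQueue.add(entry);
            return true;
        }
    }

    /** Number of entries currently waiting in the queue. */
    public int pending() {
        synchronized (regionsInQueue) {
            return flushQueue.size();
        }
    }

    public static void main(String[] args) {
        DedupFlushQueueSketch queue = new DedupFlushQueueSketch();
        // Three chore runs for the same region, using delays from the quoted log.
        for (long delay : new long[] {270_785L, 200_143L, 191_082L}) {
            if (queue.requestDelayedFlush("testflush", delay)) {
                System.out.println("requesting flush of testflush after random delay "
                    + delay + "ms");
            }
        }
        System.out.println("pending entries: " + queue.pending());
    }
}
```

With this shape only the first of the three requests is logged and queued, so the log would no longer suggest that the delay is being re-drawn every 10 seconds.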