[jira] [Commented] (HBASE-20165) Shell command to make a normal peer to be a serial replication peer
[ https://issues.apache.org/jira/browse/HBASE-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394825#comment-16394825 ]

Guanghao Zhang commented on HBASE-20165:
----------------------------------------

Do we also need to show the serial state in the list_peers output?

> Shell command to make a normal peer to be a serial replication peer
> -------------------------------------------------------------------
>
>              Key: HBASE-20165
>              URL: https://issues.apache.org/jira/browse/HBASE-20165
>          Project: HBase
>       Issue Type: Sub-task
>         Reporter: Zheng Hu
>         Assignee: Zheng Hu
>         Priority: Major
>      Attachments: HBASE-20165.v1.patch

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader
[ https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394822#comment-16394822 ]

Guanghao Zhang commented on HBASE-20167:
----------------------------------------

+1.

> Optimize the implementation of ReplicationSourceWALReader
> ----------------------------------------------------------
>
>              Key: HBASE-20167
>              URL: https://issues.apache.org/jira/browse/HBASE-20167
>          Project: HBase
>       Issue Type: Sub-task
>       Components: Replication
>         Reporter: Duo Zhang
>         Assignee: Duo Zhang
>         Priority: Major
>          Fix For: 3.0.0
>      Attachments: HBASE-20167-v1.patch, HBASE-20167-v2.patch, HBASE-20167.patch
>
> After HBASE-20148, serial replication will be an option for a peer. Since an
> instance of ReplicationSourceWALReader can only belong to one peer, we do not
> need so many 'if' checks in the implementation of readWALEntries to decide
> whether we should consider serial replication. We can just make a subclass or
> something similar for serial replication to keep the code clean.
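The subclassing approach the description proposes can be sketched roughly as follows. This is an illustrative sketch only, not the actual HBase patch; all class and method names here are hypothetical stand-ins:

```java
// Hypothetical sketch: move serial-replication handling into a subclass so
// the base read path carries no 'if (serial)' branches at all.
public class Main {
    public static class WALReader {
        // Base read path, free of serial-replication conditionals.
        public String readWALEntries() {
            return "batch";
        }
    }

    public static class SerialWALReader extends WALReader {
        // Serial-only logic is confined to the subclass.
        @Override
        public String readWALEntries() {
            // e.g. here one would wait until the previous range of the
            // region has finished replicating before releasing the batch
            return "serial:" + super.readWALEntries();
        }
    }

    // Callers work against the base type; the peer's configuration decides
    // which concrete reader gets instantiated.
    public static String read(WALReader reader) {
        return reader.readWALEntries();
    }

    public static void main(String[] args) {
        System.out.println(read(new WALReader()));        // batch
        System.out.println(read(new SerialWALReader()));  // serial:batch
    }
}
```

The design choice is the usual one: since a reader instance belongs to exactly one peer for its whole lifetime, the serial/non-serial decision can be made once at construction time instead of on every read.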
[jira] [Commented] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted
[ https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394814#comment-16394814 ]

Hadoop QA commented on HBASE-19389:
-----------------------------------

(/) +1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec      |   0m 19s | Docker mode activated. |
|| || Prechecks || ||
| +1 | hbaseanti   |   0m  0s | Patch does not have any anti-patterns. |
| +1 | @author     |   0m  0s | The patch does not contain any @author tags. |
| +1 | test4tests  |   0m  0s | The patch appears to include 2 new or modified test files. |
|| || master Compile Tests || ||
|  0 | mvndep      |   0m 21s | Maven dependency ordering for branch |
| +1 | mvninstall  |   4m 19s | master passed |
| +1 | compile     |   0m 59s | master passed |
| +1 | checkstyle  |   1m 41s | master passed |
| +1 | shadedjars  |   6m 34s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs    |   2m 26s | master passed |
| +1 | javadoc     |   1m 23s | master passed |
|| || Patch Compile Tests || ||
|  0 | mvndep      |   0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall  |   4m 35s | the patch passed |
| +1 | compile     |   0m 59s | the patch passed |
| +1 | javac       |   0m 59s | the patch passed |
| +1 | checkstyle  |   0m 23s | The patch hbase-common passed checkstyle |
| +1 | checkstyle  |   1m 20s | hbase-server: The patch generated 0 new + 388 unchanged - 1 fixed = 388 total (was 389) |
| +1 | whitespace  |   0m  0s | The patch has no whitespace issues. |
| +1 | shadedjars  |   4m 54s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck |  18m 47s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | findbugs    |   2m 42s | the patch passed |
| +1 | javadoc     |   0m 44s | the patch passed |
|| || Other Tests || ||
| +1 | unit        |   2m 16s | hbase-common in the patch passed. |
| +1 | unit        | 110m  3s | hbase-server in the patch passed. |
| +1 | asflicense  |   0m 39s | The patch does not generate ASF License warnings. |
|    |             | 159m 46s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19389 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913955/HBASE-19389.master.v2.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 6bde3f6ecac2 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.
[ https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394811#comment-16394811 ]

ramkrishna.s.vasudevan commented on HBASE-20045:
------------------------------------------------

bq. I see the argument above about not bothering to cache a block if all its cells are weeks old. In our case, the data is advertising identifiers and can come in unpredictably, and like I said we have a big enough bucket cache anyway, so why not just cache everything? The old blocks from the compacted away files are going to be evicted anyway, so we should never run out of bucket cache if we have sized it much larger than our entire data size.

[~saadmufti] Thanks for chiming in here. I like your argument, but one thing to note: even if the blocks of the old, compacted-away files are evicted when the new file is created after compaction (assuming there are no deletes), almost the same number of blocks will be created again for the new file, unless the column family has a TTL. I also thought we could do this, but the discussion here helped me understand that it may not always be possible; perhaps we should cache only recent data, as JMS suggests here. We should also try out your suggestion, maybe behind a config, but warn the user that it only helps with a big enough bucket cache. So, roughly, what is your bucket cache size? I assume it is in file mode and the file is on S3.

> When running compaction, cache recent blocks.
> ---------------------------------------------
>
>              Key: HBASE-20045
>              URL: https://issues.apache.org/jira/browse/HBASE-20045
>          Project: HBase
>       Issue Type: New Feature
>       Components: BlockCache, Compaction
> Affects Versions: 2.0.0-beta-1
>         Reporter: Jean-Marc Spaggiari
>         Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for use
> cases where most queries are against recent data. However, as soon as there
> is a compaction, those blocks are evicted. It would be interesting to have a
> table-level parameter to say "when compacting, cache blocks less than 24
> hours old". That way, when running compaction, all blocks where some data is
> less than 24 hours old will be automatically cached.
>
> Very useful for table designs where there is a TS in the key but a long
> history (like a year of sensor data).
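The proposed table-level policy boils down to one predicate evaluated while compaction writes each block. A minimal sketch, assuming a hypothetical hook that sees the newest cell timestamp of the block being written (the method and constant names are illustrative, not HBase API):

```java
// Hypothetical sketch of "when compacting, cache blocks less than 24 hours
// old": cache a compacted block iff any of its cells is newer than the
// configured age.
public class Main {
    // The "24 hours old" table parameter from the proposal, as milliseconds.
    public static final long MAX_AGE_MS = 24L * 60 * 60 * 1000;

    // maxCellTimestamp would come from the block currently being written
    // by the compaction; 'now' is the wall clock at write time.
    public static boolean shouldCacheOnCompaction(long maxCellTimestamp, long now) {
        return now - maxCellTimestamp <= MAX_AGE_MS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // A block whose newest cell is one hour old: cached.
        System.out.println(shouldCacheOnCompaction(now - 3_600_000L, now));      // true
        // A block whose newest cell is a week old: skipped.
        System.out.println(shouldCacheOnCompaction(now - 7L * MAX_AGE_MS, now)); // false
    }
}
```

Using the block's newest cell rather than its oldest matches the wording "where some data is less than 24 hours old": a block mixing old and fresh cells still gets cached.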
[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader
[ https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394801#comment-16394801 ]

Duo Zhang commented on HBASE-20167:
-----------------------------------

Changed several fields to be private.
[jira] [Updated] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader
[ https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-20167:
------------------------------
    Attachment: HBASE-20167-v2.patch
[jira] [Commented] (HBASE-20124) Make hbase-spark module work with hadoop3
[ https://issues.apache.org/jira/browse/HBASE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394799#comment-16394799 ]

Ted Yu commented on HBASE-20124:
--------------------------------

[~mdrob]: Can you take another look?

> Make hbase-spark module work with hadoop3
> ------------------------------------------
>
>              Key: HBASE-20124
>              URL: https://issues.apache.org/jira/browse/HBASE-20124
>          Project: HBase
>       Issue Type: Bug
>         Reporter: Ted Yu
>         Assignee: Ted Yu
>         Priority: Major
>      Attachments: 20124.v1.txt, 20124.v2.txt, 20124.v3.txt
>
> The following error can be observed when running tests in the hbase-spark
> module against hadoop3:
> {code}
> HBaseDStreamFunctionsSuite:
> *** RUN ABORTED ***
> java.lang.NoClassDefFoundError: org/apache/hadoop/ipc/ExternalCall
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getStorageDirs(FSNamesystem.java:1464)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNamespaceDirs(FSNamesystem.java:1444)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:939)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:815)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:746)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:668)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:640)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:979)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:859)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:853)
>   ...
> Cause: java.lang.ClassNotFoundException: org.apache.hadoop.ipc.ExternalCall
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getStorageDirs(FSNamesystem.java:1464)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNamespaceDirs(FSNamesystem.java:1444)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:939)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:815)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:746)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:668)
> {code}
> The dependency tree shows a mixture of hadoop 2.7.4 and hadoop3 for the
> hbase-spark module.
> This should be addressed by adding a proper profile in pom.xml
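The "proper profile" fix mentioned above might look roughly like the following fragment. This is a hedged sketch only: the profile id, activation property, and version value are illustrative assumptions, not the actual patch attached to the issue. The point is that one profile pins all Hadoop artifacts to a single version so the hbase-spark dependency tree cannot mix hadoop 2.7.4 with hadoop3 jars:

```xml
<!-- Illustrative sketch, not the actual HBASE-20124 patch: activate with
     -Dhadoop.profile=3.0 so every hadoop.version reference in the module
     resolves to the same Hadoop 3 release. -->
<profile>
  <id>hadoop-3.0</id>
  <activation>
    <property>
      <name>hadoop.profile</name>
      <value>3.0</value>
    </property>
  </activation>
  <properties>
    <hadoop.version>3.0.0</hadoop.version>
  </properties>
</profile>
```

With a consistent `hadoop.version`, classes like `org.apache.hadoop.ipc.ExternalCall` (added in newer Hadoop) are present at the version the HDFS test jars expect, which is exactly what the `NoClassDefFoundError` above indicates was missing.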
[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader
[ https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394797#comment-16394797 ]

Duo Zhang commented on HBASE-20167:
-----------------------------------

{quote}
Why was this block removed?
{quote}

See the code in ReplicationSource.tryStartNewShipper; I changed the order. Now we set the WALReader first and then start the shipper, so it will never be null.
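The ordering change Duo describes can be sketched as follows. This is an illustrative sketch, not the actual HBase code; the class names and the `ship()` helper are hypothetical:

```java
// Hypothetical sketch: hand the reader to the shipper before starting the
// shipper thread, so the shipper can never observe a null reader and the
// old "wait until the reader is initialized" retry loop becomes unnecessary.
public class Main {
    public static class Reader {
        public String poll() { return "entry"; }
    }

    public static class Shipper extends Thread {
        // Set in the constructor, before start(): safely published and
        // never null from the run() thread's point of view.
        private final Reader reader;

        public Shipper(Reader reader) {
            this.reader = reader;
        }

        public String ship() {
            // No "while (reader == null) sleepForRetries(...)" needed here.
            return "shipping " + reader.poll();
        }

        @Override
        public void run() {
            System.out.println(ship()); // prints: shipping entry
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Reader reader = new Reader();           // 1. create the WAL reader
        Shipper shipper = new Shipper(reader);  // 2. hand it over
        shipper.start();                        // 3. only then start the thread
        shipper.join();
    }
}
```

Making the field `final` and assigned before `start()` also gives the happens-before guarantee the retry loop was papering over.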
[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader
[ https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394780#comment-16394780 ]

Guanghao Zhang commented on HBASE-20167:
----------------------------------------

bq. protected final ReplicationSource source;
bq. protected final long replicationBatchSizeCapacity;

Can these be private?

{code:java}
while (entryReader == null) {
  if (sleepForRetries("Replication WAL entry reader thread not initialized",
      sleepMultiplier)) {
    sleepMultiplier++;
  }
}
{code}

Why was this block removed?
[jira] [Commented] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted
[ https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394738#comment-16394738 ]

Chance Li commented on HBASE-19389:
-----------------------------------

Fixed checkstyle; running the UTs again.

> Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted
> ------------------------------------------------------------------------------------------
>
>              Key: HBASE-19389
>              URL: https://issues.apache.org/jira/browse/HBASE-19389
>          Project: HBase
>       Issue Type: Improvement
>       Components: Performance
> Affects Versions: 2.0.0
>      Environment: 2000+ Region Servers
>                   PCI-E ssd
>         Reporter: Chance Li
>         Assignee: Chance Li
>         Priority: Critical
>          Fix For: 2.0.0
>      Attachments: CSLM-concurrent-write.png, HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2-V3.patch, HBASE-19389-branch-2-V4.patch, HBASE-19389-branch-2-V5.patch, HBASE-19389-branch-2-V6.patch, HBASE-19389-branch-2-V7.patch, HBASE-19389-branch-2-V8.patch, HBASE-19389-branch-2-V9.patch, HBASE-19389-branch-2.patch, HBASE-19389.master.patch, HBASE-19389.master.v2.patch, metrics-1.png, ycsb-result.png
>
> In a large cluster with a large number of clients, we found that the RS's
> handlers were sometimes all busy. After investigation we found the root cause
> is in the CSLM, e.g. heavy load on its compare function. We reviewed the
> related WALs and found that many columns (more than 1000) were being written
> at that time.
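The idea behind the issue, throttling only the "dense" puts so that slow ConcurrentSkipListMap insertions cannot occupy every write handler, can be sketched with a semaphore. This is an illustrative sketch under stated assumptions, not the attached patch; the threshold, permit count, and method names are hypothetical:

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: a burst of puts with 1000+ columns contends for a
// small permit pool, leaving most handlers free for normal-sized puts.
public class Main {
    // Illustrative cutoff for what counts as a "dense" put.
    public static final int DENSE_COLUMN_THRESHOLD = 100;
    // Illustrative cap: at most 4 dense puts insert into the CSLM at once.
    public static final Semaphore densePutPermits = new Semaphore(4);

    public static void put(int columnCount) {
        if (columnCount >= DENSE_COLUMN_THRESHOLD) {
            densePutPermits.acquireUninterruptibly();
            try {
                // ... insert the many cells into the memstore CSLM ...
            } finally {
                densePutPermits.release();
            }
        }
        // Sparse puts proceed without any extra throttling.
    }

    public static void main(String[] args) {
        put(3);    // sparse: unthrottled
        put(1500); // dense: bounded by the semaphore
        System.out.println("permits left: " + densePutPermits.availablePermits()); // permits left: 4
    }
}
```

The trade-off is that dense puts queue behind each other instead of behind every other write, which bounds the worst-case handler occupancy at the price of added latency for the dense writers themselves.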
[jira] [Updated] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted
[ https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chance Li updated HBASE-19389:
------------------------------
    Attachment: HBASE-19389.master.v2.patch
[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage
[ https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394733#comment-16394733 ]

Hadoop QA commented on HBASE-20105:
-----------------------------------

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec      |   0m 22s | Docker mode activated. |
|| || Prechecks || ||
| +1 | hbaseanti   |   0m  0s | Patch does not have any anti-patterns. |
| +1 | @author     |   0m  0s | The patch does not contain any @author tags. |
| -1 | test4tests  |   0m  0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || master Compile Tests || ||
|  0 | mvndep      |   0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall  |   5m 23s | master passed |
| +1 | compile     |   1m  2s | master passed |
| +1 | checkstyle  |   1m 39s | master passed |
| +1 | shadedjars  |   6m 29s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs    |   2m 21s | master passed |
| +1 | javadoc     |   0m 45s | master passed |
|| || Patch Compile Tests || ||
|  0 | mvndep      |   0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall  |   4m 23s | the patch passed |
| +1 | compile     |   0m 59s | the patch passed |
| +1 | javac       |   0m 59s | the patch passed |
| -1 | checkstyle  |   1m 11s | hbase-server: The patch generated 1 new + 98 unchanged - 0 fixed = 99 total (was 98) |
| +1 | whitespace  |   0m  0s | The patch has no whitespace issues. |
| +1 | shadedjars  |   4m 45s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck |  18m 35s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | findbugs    |   2m 33s | the patch passed |
| +1 | javadoc     |   0m 44s | the patch passed |
|| || Other Tests || ||
| +1 | unit        |   2m 18s | hbase-common in the patch passed. |
| +1 | unit        | 106m 57s | hbase-server in the patch passed. |
| +1 | asflicense  |   0m 40s | The patch does not generate ASF License warnings. |
|    |             | 156m 10s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20105 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913951/HBASE-20105-v5.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 48a8ae329a4c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
[jira] [Commented] (HBASE-20152) [AMv2] DisableTableProcedure versus ServerCrashProcedure
[ https://issues.apache.org/jira/browse/HBASE-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394721#comment-16394721 ]

Duo Zhang commented on HBASE-20152:
-----------------------------------

{quote}
Can't schedule an SCP. Fails because server is going down already not present as online. Procedure suspended.
{quote}

So the SCP is suspended? This is a little confusing to me... I think an SCP is for a crashed server? Why is it suspended?

> [AMv2] DisableTableProcedure versus ServerCrashProcedure
> ---------------------------------------------------------
>
>              Key: HBASE-20152
>              URL: https://issues.apache.org/jira/browse/HBASE-20152
>          Project: HBase
>       Issue Type: Bug
>       Components: amv2
>         Reporter: stack
>         Assignee: stack
>         Priority: Major
>
> Seeing a small spate of issues where disabled tables/regions are being
> assigned. Usually they happen when a DisableTableProcedure is running
> concurrently with a ServerCrashProcedure. See below. See associated
> HBASE-20131. This is the umbrella issue for fixing.
> h3. Deadlock
> From HBASE-20137, 'TestRSGroups is Flakey',
> https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325
> {code}
> * SCP is running because a server was aborted in the test.
> * SCP starts an AssignProcedure of region X from the crashed server.
> * DisableTableProcedure runs because the test has finished and we're doing
>   a table delete. Queues an UnassignProcedure for region X.
> * The Disable Unassign gets the lock on region X first.
> * The SCP AssignProcedure tries to get the lock, waits on it.
> * The DisableTableProcedure's UnassignProcedure RPC fails because the server
>   is down (that's why the SCP).
> * Tries to expire the server it failed the RPC against. Fails (currently
>   being SCP'd).
> * The DisableTableProcedure Unassign is suspended, with the lock on region X
>   still held.
> * The SCP can't run because the lock on X is held.
> * The test times out.
> {code}
> Here is the actual log from around the deadlock. pid=308 is the SCP, pid=309
> is the disable table:
> {code}
> 2018-03-05 11:29:21,224 DEBUG [PEWorker-7] procedure.ServerCrashProcedure(225): Done splitting WALs pid=308, state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS; ServerCrashProcedure server=1cfd208ff882,40584,1520249102524, splitWal=true, meta=false
> 2018-03-05 11:29:21,300 INFO [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] rsgroup.RSGroupAdminServer(371): Move server done: default=>appInfo
> 2018-03-05 11:29:21,307 INFO [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl(279): Client=jenkins//172.17.0.2 list rsgroup
> 2018-03-05 11:29:21,312 INFO [Time-limited test] client.HBaseAdmin$15(901): Started disable of Group_ns:testKillRS
> 2018-03-05 11:29:21,313 INFO [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] master.HMaster$7(2278): Client=jenkins//172.17.0.2 disable Group_ns:testKillRS
> 2018-03-05 11:29:21,384 INFO [PEWorker-9] procedure2.ProcedureExecutor(1495): Initialized subprocedures=[{pid=310, ppid=308, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=Group_ns:testKillRS, region=de7534c208a06502537cd95c248b3043}]
> 2018-03-05 11:29:21,534 DEBUG [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] procedure2.ProcedureExecutor(865): Stored pid=309, state=RUNNABLE:DISABLE_TABLE_PREPARE; DisableTableProcedure table=Group_ns:testKillRS
> 2018-03-05 11:29:21,542 DEBUG [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:21,644 DEBUG [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:21,847 DEBUG [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:22,118 DEBUG [PEWorker-5] hbase.MetaTableAccessor(1944): Put {"totalColumns":1,"row":"Group_ns:testKillRS","families":{"table":[{"qualifier":"state","vlen":2,"tag":[],"timestamp":1520249362117}]},"ts":1520249362117}
> 2018-03-05 11:29:22,123 INFO [PEWorker-5] hbase.MetaTableAccessor(1646): Updated table Group_ns:testKillRS state to DISABLING in META
> 2018-03-05 11:29:22,148 DEBUG [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:22,345 INFO [PEWorker-5] procedure2.ProcedureExecutor(1495): Initialized subprocedures=[{pid=311, ppid=309, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=Group_ns:testKillRS, region=de7534c208a06502537cd95c248b3043,
> {code}
[jira] [Commented] (HBASE-20152) [AMv2] DisableTableProcedure versus ServerCrashProcedure
[ https://issues.apache.org/jira/browse/HBASE-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394719#comment-16394719 ] Duo Zhang commented on HBASE-20152: --- {quote} Trying to think through abandon... Would work in the small but hard part is how to fail compound procedures like Move and Disable Table... Split, Merge.. .. {quote} I do not think we should abandon it. In this case suspend is OK, but the SCP should be finished first, and then we wake up the unassign procedure and it can go on. Fail an unassign procedure may have lots of side effect, as you describe above, how do we process the parent procedures... Thanks. > [AMv2] DisableTableProcedure versus ServerCrashProcedure > > > Key: HBASE-20152 > URL: https://issues.apache.org/jira/browse/HBASE-20152 > Project: HBase > Issue Type: Bug > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > > Seeing a small spate of issues where disabled tables/regions are being > assigned. Usually they happen when a DisableTableProcedure is running > concurrent with a ServerCrashProcedure. See below. See associated > HBASE-20131. This is umbrella issue for fixing. > h3. Deadlock > From HBASE-20137, 'TestRSGroups is Flakey', > https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325 > {code} > * SCP is running because a server was aborted in test. > * SCP starts AssignProcedure of region X from crashed server. > * DisableTable Procedure runs because test has finished and we're doing > table delete. Queues > * UnassignProcedure for region X. > * Disable Unassign gets Lock on region X first. > * SCP AssignProcedure tries to get lock, waits on lock. > * DisableTable Procedure UnassignProcedure RPC fails because server is down > (Thats why the SCP). > * Tries to expire the server it failed the RPC against. Fails (currently > being SCP'd). 
> * DisableTable Procedure Unassign is suspended. It is a suspend with lock on > region X held > * SCP can't run because lock on X is held > * Test timesout. > {code} > Here is the actual log from around the deadlock. pid=308 is the SCP. pid=309 > is the disable table: > {code} > 2018-03-05 11:29:21,224 DEBUG [PEWorker-7] > procedure.ServerCrashProcedure(225): Done splitting WALs pid=308, > state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS; ServerCrashProcedure > server=1cfd208ff882,40584,1520249102524, splitWal=true, meta=false > 2018-03-05 11:29:21,300 INFO > [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] > rsgroup.RSGroupAdminServer(371): Move server done: default=>appInfo > 2018-03-05 11:29:21,307 INFO > [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] > rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl(279): > Client=jenkins//172.17.0.2 list rsgroup > 2018-03-05 11:29:21,312 INFO [Time-limited test] client.HBaseAdmin$15(901): > Started disable of Group_ns:testKillRS > 2018-03-05 11:29:21,313 INFO > [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] > master.HMaster$7(2278): Client=jenkins//172.17.0.2 disable Group_ns:testKillRS > 2018-03-05 11:29:21,384 INFO [PEWorker-9] > procedure2.ProcedureExecutor(1495): Initialized subprocedures=[{pid=310, > ppid=308, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure > table=Group_ns:testKillRS, region=de7534c208a06502537cd95c248b3043}] > 2018-03-05 11:29:21,534 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] > procedure2.ProcedureExecutor(865): Stored pid=309, > state=RUNNABLE:DISABLE_TABLE_PREPARE; DisableTableProcedure > table=Group_ns:testKillRS > 2018-03-05 11:29:21,542 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] > master.MasterRpcServices(1134): Checking to see if procedure is done pid=309 > 2018-03-05 11:29:21,644 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] > master.MasterRpcServices(1134): Checking to see if procedure is 
done pid=309 > 2018-03-05 11:29:21,847 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] > master.MasterRpcServices(1134): Checking to see if procedure is done pid=309 > 2018-03-05 11:29:22,118 DEBUG [PEWorker-5] hbase.MetaTableAccessor(1944): Put > {"totalColumns":1,"row":"Group_ns:testKillRS","families":{"table":[{"qualifier":"state","vlen":2,"tag":[],"timestamp":1520249362117}]},"ts":1520249362117} > 2018-03-05 11:29:22,123 INFO [PEWorker-5] hbase.MetaTableAccessor(1646): > Updated table Group_ns:testKillRS state to DISABLING in META > 2018-03-05 11:29:22,148 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] > master.MasterRpcServices(1134): Checking to see if procedure is done pid=309 > 2018-03-05 11:29:22,345 INFO [PEWorker-5] >
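The wait-for cycle in the scenario above can be sketched as a tiny model. The pids are taken from the log; the graph representation and helper function are illustrative only, not HBase code:

```python
# Hypothetical model of the deadlock above. Each edge "X -> Y" means
# "X cannot proceed until Y releases a resource or completes".
def find_cycle(wait_for, start):
    """Follow wait-for edges from `start`; return the cycle if a node repeats."""
    seen = []
    node = start
    while node in wait_for:
        if node in seen:
            return seen[seen.index(node):] + [node]
        seen.append(node)
        node = wait_for[node]
    return None

# The three dependencies from the log:
#  - the SCP (pid=308) waits on its child AssignProcedure (pid=310);
#  - the AssignProcedure waits on the region X lock, held by the
#    DisableTable's UnassignProcedure (pid=311);
#  - the suspended UnassignProcedure waits for the SCP to finish
#    expiring the crashed server.
wait_for = {
    "SCP(pid=308)": "AssignProcedure(pid=310)",
    "AssignProcedure(pid=310)": "UnassignProcedure(pid=311)",
    "UnassignProcedure(pid=311)": "SCP(pid=308)",
}

cycle = find_cycle(wait_for, "SCP(pid=308)")
print(cycle)  # the three procedures close a cycle -> deadlock, test times out
```

Breaking any one edge (e.g. finishing the SCP before waking the suspended unassign, as suggested above) removes the cycle.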
[jira] [Commented] (HBASE-20152) [AMv2] DisableTableProcedure versus ServerCrashProcedure
[ https://issues.apache.org/jira/browse/HBASE-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394716#comment-16394716 ] Duo Zhang commented on HBASE-20152: --- {quote} Cluster shutdown is set. This means cluster down flag is set, we expire servers, but no ServerCrashProcedure gets scheduled. {quote} Just asking: after restarting we will re-schedule an SCP, right? > [AMv2] DisableTableProcedure versus ServerCrashProcedure > > > Key: HBASE-20152 > URL: https://issues.apache.org/jira/browse/HBASE-20152
[jira] [Commented] (HBASE-20173) [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
[ https://issues.apache.org/jira/browse/HBASE-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394695#comment-16394695 ] Hadoop QA commented on HBASE-20173: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 35s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 29s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 38s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 40s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 26s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 13s{color} | {color:red} hbase-server: The patch generated 9 new + 85 unchanged - 6 fixed = 94 total (was 91) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 3s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 14m 58s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}120m 29s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 2s{color} | {color:green} hbase-it in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 58s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}169m 24s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db | | JIRA Issue | HBASE-20173 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913944/HBASE-20173.branch-2.001.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars
[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage
[ https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-20105: Status: Patch Available (was: Open) Tested a backport on 1.2.0 and it works well. Passed the tests locally too. Looks done to me. > Allow flushes to target SSD storage > --- > > Key: HBASE-20105 > URL: https://issues.apache.org/jira/browse/HBASE-20105 > Project: HBase > Issue Type: New Feature > Components: Performance, regionserver >Affects Versions: hbase-2.0.0-alpha-4 >Reporter: Jean-Marc Spaggiari >Assignee: Jean-Marc Spaggiari >Priority: Major > Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, > HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, > HBASE-20105-v5.patch > > > On heavy-write use cases, flushes are compacted together pretty quickly. > Allowing flushes to go on SSD allows faster flushes and faster first > compactions, with subsequent compactions going on regular storage. > > It would be interesting to have an option to target SSD for flushes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
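The tiering the description asks for boils down to a per-file-kind policy choice. "ONE_SSD" and "HOT" are real HDFS storage policy names, but the function below and its wiring into the flush path are purely illustrative assumptions, not the patch's actual API:

```python
# Sketch of the decision HBASE-20105 proposes (hypothetical, not HBase code):
# flush output lands on SSD so the flush and the first compactions are fast;
# compaction output falls back to the regular (spinning-disk) tier.
def storage_policy_for(file_kind, flush_policy="ONE_SSD", default_policy="HOT"):
    """Return the HDFS storage policy name to apply to a new store file."""
    return flush_policy if file_kind == "flush" else default_policy

print(storage_policy_for("flush"))       # ONE_SSD
print(storage_policy_for("compaction"))  # HOT
```

On a real cluster the policy would presumably be applied to the output directory with HDFS storage policies (e.g. the `hdfs storagepolicies -setStoragePolicy` CLI); the exact hook used by the patch is not shown here.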
[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage
[ https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-20105: Attachment: HBASE-20105-v5.patch
[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage
[ https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-20105: Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage
[ https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394684#comment-16394684 ] Jean-Marc Spaggiari commented on HBASE-20105: - Made some small changes and tested it locally:
{code:java}
root@hbasetest1:~# hdfs fsck /hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029 -files -blocks -locations
Connecting to namenode via http://hbasetest1.distparser.com:50070/fsck?ugi=root=1=1=1=%2Fhbase%2Fdata%2Fdefault%2Ft1%2F1c925d870fcf663dd3f48d31bf2b98d8%2Ff1%2Ff22416c954df4e24b499e5fc707cb029
FSCK started by root (auth:SIMPLE) from /192.168.23.51 for path /hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029 at Sun Mar 11 18:44:49 EDT 2018
/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029 4908 bytes, 1 block(s): OK
0. BP-2069742952-192.168.23.51-1431229364576:blk_1074774898_1034473 len=4908 Live_repl=3 [DatanodeInfoWithStorage[192.168.23.54:50010,DS-6c810995-115c-42cd-af32-c34f5095e45c,SSD], DatanodeInfoWithStorage[192.168.23.52:50010,DS-d4e1790e-b7d3-492f-bb4f-7fb11b7ceff4,SSD], DatanodeInfoWithStorage[192.168.23.53:50010,DS-04dac874-1eaf-43b5-ac1d-2768572b7a36,SSD]]

Status: HEALTHY
 Total size:    4908 B
 Total dirs:    0
 Total files:   1
 Total symlinks: 0
 Total blocks (validated): 1 (avg. block size 4908 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 3.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 3
 Number of racks: 1
FSCK ended at Sun Mar 11 18:44:49 EDT 2018 in 0 milliseconds

The filesystem under path '/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029' is HEALTHY
{code}
Sounds like it works. Updated patch coming soon. 
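As a rough way to machine-check output like the above, a throwaway helper (not part of HBase or of the patch) can pull the storage-type tag out of each `DatanodeInfoWithStorage[...]` entry and confirm every replica of the flushed file landed on SSD:

```python
import re

def replica_storage_types(fsck_line):
    """Extract the trailing storage-type tag (SSD, DISK, ...) from each
    DatanodeInfoWithStorage[host:port,storage-id,TYPE] entry in an
    `hdfs fsck ... -locations` block line."""
    return re.findall(r"DatanodeInfoWithStorage\[[^\]]*?,([A-Z_]+)\]", fsck_line)

# Block line copied from the fsck output above:
line = ("0. BP-2069742952-192.168.23.51-1431229364576:blk_1074774898_1034473 "
        "len=4908 Live_repl=3 "
        "[DatanodeInfoWithStorage[192.168.23.54:50010,DS-6c810995-115c-42cd-af32-c34f5095e45c,SSD], "
        "DatanodeInfoWithStorage[192.168.23.52:50010,DS-d4e1790e-b7d3-492f-bb4f-7fb11b7ceff4,SSD], "
        "DatanodeInfoWithStorage[192.168.23.53:50010,DS-04dac874-1eaf-43b5-ac1d-2768572b7a36,SSD]]")

types = replica_storage_types(line)
print(types)  # ['SSD', 'SSD', 'SSD'] -> all three replicas are on SSD
```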
[jira] [Updated] (HBASE-20173) [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
[ https://issues.apache.org/jira/browse/HBASE-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-20173: -- Status: Patch Available (was: Open) Trying hadoopqa. This one is hard to write a test for since it is dependent on aligning two macro procedure steps exactly. My best bet I think is the test IntegrationTestDDLMasterFailover on a cluster. Will try it concurrent with this hadoopqa run. > [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock > > > Key: HBASE-20173 > URL: https://issues.apache.org/jira/browse/HBASE-20173 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-20173.branch-2.001.patch > > > See 'Deadlock' scenario in parent issue. Doing as focused subtask since > parent has a few things going on in it.
[jira] [Updated] (HBASE-20173) [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
[ https://issues.apache.org/jira/browse/HBASE-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-20173: -- Priority: Critical (was: Major)
[jira] [Updated] (HBASE-20173) [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
[ https://issues.apache.org/jira/browse/HBASE-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-20173: -- Attachment: HBASE-20173.branch-2.001.patch
[jira] [Created] (HBASE-20173) [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
stack created HBASE-20173: - Summary: [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock Key: HBASE-20173 URL: https://issues.apache.org/jira/browse/HBASE-20173 Project: HBase Issue Type: Sub-task Components: amv2 Reporter: stack Assignee: stack Fix For: 2.0.0 See 'Deadlock' scenario in parent issue. Doing as focused subtask since parent has a few things going on in it.
[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.
[ https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394597#comment-16394597 ] Saad Mufti commented on HBASE-20045: Also, if such a setting is ever contemplated, it would be great to backport it to 1.4.0 or earlier. Thanks. > When running compaction, cache recent blocks. > - > > Key: HBASE-20045 > URL: https://issues.apache.org/jira/browse/HBASE-20045 > Project: HBase > Issue Type: New Feature > Components: BlockCache, Compaction >Affects Versions: 2.0.0-beta-1 >Reporter: Jean-Marc Spaggiari >Priority: Major > > HBase already allows caching blocks on flush. This is very useful for > use cases where most queries are against recent data. However, as soon as > there is a compaction, those blocks are evicted. It would be interesting to > have a table-level parameter to say "When compacting, cache blocks less than > 24 hours old". That way, when running compaction, all blocks where some data > is less than 24h old will be automatically cached. > > Very useful for table designs where there is a TS in the key but a long history > (like a year of sensor data).
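The proposed "cache blocks less than 24 hours old" knob boils down to a time-window predicate on the newest cell in a block. The sketch below is hypothetical; the function name and its wiring into the compaction write path are assumptions, not an HBase API:

```python
# Hypothetical cache-on-compaction predicate: a block written by compaction
# is cached if its newest cell falls inside the configured age window.
DAY_MS = 24 * 60 * 60 * 1000  # 24 hours in milliseconds (HBase timestamps are ms)

def cache_on_compaction(max_cell_ts_ms, now_ms, max_age_ms=DAY_MS):
    """True if the newest cell in the block is younger than max_age_ms."""
    return now_ms - max_cell_ts_ms < max_age_ms

now = 1_520_000_000_000
print(cache_on_compaction(now - 3_600_000, now))   # True: newest cell is 1 hour old
print(cache_on_compaction(now - 2 * DAY_MS, now))  # False: block is 2 days old
```

Setting `max_age_ms` very large would approximate the "just cache everything" behavior requested later in this thread for bucket caches sized above the full data set.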
[jira] [Comment Edited] (HBASE-20045) When running compaction, cache recent blocks.
[ https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394587#comment-16394587 ] Saad Mufti edited comment on HBASE-20045 at 3/11/18 6:15 PM: - I hope I'm not interrupting the whole discussion, but I would like to describe a current use case for which this would be incredibly useful in my current line of work. We are using HBase on AWS EMR where the actual storage is all on S3 using AWS's proprietary EMRFS filesystem. We have configured our bucket cache to be more than large enough to store all our data and more, but the S3 buys us failure recoverability and AWS EMR does not support more than a single master yet, so loss of the master means loss of the entire cluster and having our data in S3 lets us survive cluster failure cleanly. We have a heavy read and write load, and performance is more than good enough when everything is coming from the bucket cache (we have set the prefetch on open flag to true in the relevant column families' schema, except for one where we do heavy write but never read from it in the HBase cluster). Now we come to compaction, both minor and major compaction. We have tuned minor compaction min and max filesize very low to avoid as much minor compaction as practical, and run a homegrown tool that does major compaction across the whole cluster in batches, with each batch being one region per region server. In past HBase clusters where storage was in HDFS, this served our needs well and we'd like to keep this tool for its operational flexibility and not having to take any downtime to compact. The problem of course is that compaction evicts from the bucket cache blocks from the files being compacted away, and also refuses to put blocks from the newly compacted file in the bucket cache. 
In the face of ongoing traffic that does a lot of checkAndPut, this causes the reads to go to S3 and be slow, causing the write lock for checkAndPut to be held for a long time, causing timeouts in other operations trying to get the same row lock. Also our overall performance suffers while the batch is going on due to requests building up in IPC queues. Response time means and other percentiles look like a sawtooth pattern. In our case each batch of compactions lasts roughly 10 minutes or so and the response time sawtooth pattern has the same periodicity. For now we have worked around this by a) using the new setting in HBase 1.4.0 in our client to avoid one slow region server from blocking all client ops to other region servers, b) accepting some timeouts as the cost of business and requeuing them in a special Kafka retry topic for our upstream system to reprocess. This also limits traffic on the slow region server, letting it clean out its backed up IPC queues instead of being hammered with traffic that doesn't let it recover. Also we run the tool once a day and it finishes in 8-10 hours, so performance is great the rest of the day. But if we got a setting that would let us tell HBase to always cache even newly compacted files, our performance hit would totally go away. I see the argument above about not bothering to cache a block if all its cells are weeks old. In our case, the data is advertising identifiers and can come in unpredictably, and like I said we have a big enough bucket cache anyway, so why not just cache everything? The old blocks from the compacted away files are going to be evicted anyway, so we should never run out of bucket cache if we have sized it much larger than our entire data size. Also of course the default behavior would continue, any new configuration around this can come with heavy warnings of the potential consequences and only being used by advanced users etc. 
Again, I apologize for this long description if it distracts from the current state of the discussion. Even a setting to only cache newly compacted blocks if they had "new" cells would still be hugely beneficial to us. Cheers. 
[jira] [Comment Edited] (HBASE-20045) When running compaction, cache recent blocks.
[ https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394587#comment-16394587 ] Saad Mufti edited comment on HBASE-20045 at 3/11/18 6:14 PM: - I hope I'm not interrupting the whole discussion, but I would like to describe a current use case for which this would be incredibly useful in my current line of work. We are using HBase on AWS EMR where the actual storage is all on S3 using AWS's proprietary EMRFS filesystem. We have configured our bucket cache to be more than large enough to store all our data and more, but the S3 buys us failure recoverability and AWS EMR does not support more than a single master yet, so loss of the master means loss of the entire cluster and having our data in S3 lets us survive cluster failure cleanly. We have a heavy read and write load, and performance is more than good enough when everything is coming from the bucket cache (we have set the prefetch on open flag to true in the relevant column families' schema, except for one where we do heavy write but never read from it in the HBase cluster). Now we come to compaction, both minor and major compaction. We have tuned minor compaction min and max filesize very low to avoid as much minor compaction as practical, and run a homegrown tool that does major compaction across the whole cluster in batches, with each batch being one region per region server. In past HBase clusters where storage was in HDFS, this served our needs well and we'd like to keep this tool for its operational flexibility and not having to take any downtime to compact. The problem of course is that compaction evicts from the bucket cache blocks from the files being compacted away. In the face of ongoing traffic that does a lot of checkAndPut, this causes the reads to go to S3 and be slow, causing the write lock for checkAndPut to be held for a long time, causing timeouts in other operations trying to get the same row lock. 
Also our overall performance suffers while the batch is going on due to requests building up in IPC queues. Response time means and other percentiles look like a sawtooth pattern. In our case each batch of compactions lasts roughly 10 minutes or so and the response time sawtooth pattern has the same periodicity. For now we have worked around this by a) using the new setting in HBase 1.4.0 in our client to avoid one slow region server from blocking all client ops to other region servers, b) accepting some timeouts as the cost of business and requeuing them in a special Kafka retry topic for our upstream system to reprocess. This also limits traffic on the slow region server, letting it clean out its backed up IPC queues instead of being hammered with traffic that doesn't let it recover. Also we run the tool once a day and it finishes in 8-10 hours, so performance is great the rest of the day. But if we got a setting that would let us tell HBase to always cache even newly compacted files, our performance hit would totally go away. I see the argument above about not bothering to cache a block if all its cells are weeks old. In our case, the data is advertising identifiers and can come in unpredictably, and like I said we have a big enough bucket cache anyway, so why not just cache everything? The old blocks from the compacted away files are going to be evicted anyway, so we should never run out of bucket cache if we have sized it much larger than our entire data size. Also of course the default behavior would continue, any new configuration around this can come with heavy warnings of the potential consequences and only being used by advanced users etc. Again I apologize for this long description if it distracts from where the current state of the discussion was. Even a setting to only cache newly compacted blocks if they had "new" cells would still be hugely benficial to us. Cheers. 
was (Author: saadmufti): I hope I'm not interrupting the whole discussion, but I would like to describe a current use case for which this would be incredibly useful in my current line of work. We are using HBase on AWS EMR where the actual storage is all on S3 using AWS's proprietary EMRFS filesystem. We have configured our bucket cache to be more than large enough to store all our data and more, but the S3 buys us failure recoverability and AWS EMR does not support more than a single master yet, so loss of the master means loss of the entire cluster and having our data in S3 lets us survive cluster failure cleanly. We have a heavy read and write load, and performance is more than good enough when everything is coming from the bucket cache (we have set the prefetch on open flag to true in the relevant column families' schema, except for one where we do heavy write but never read from it in the HBase cluster). Now we come to compaction, both minor and major
[jira] [Comment Edited] (HBASE-20045) When running compaction, cache recent blocks.
[ https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394587#comment-16394587 ] Saad Mufti edited comment on HBASE-20045 at 3/11/18 5:51 PM: - I hope I'm not interrupting the whole discussion, but I would like to describe a current use case for which this would be incredibly useful in my current line of work. We are using HBase on AWS EMR, where the actual storage is all on S3 using AWS's proprietary EMRFS filesystem. We have configured our bucket cache to be more than large enough to store all our data and more. S3 buys us failure recoverability: AWS EMR does not support more than a single master yet, so loss of the master means loss of the entire cluster, and having our data in S3 lets us survive cluster failure cleanly. We have a heavy read and write load, and performance is more than good enough when everything is coming from the bucket cache (we have set the prefetch-on-open flag to true in the relevant column families' schema, except for one that we write to heavily but never read from in the HBase cluster). Now we come to compaction, both minor and major. We have tuned the minor compaction min and max file sizes to avoid as much minor compaction as practical, and run a homegrown tool that does major compaction across the whole cluster in batches, with each batch being one region per region server. In past HBase clusters where storage was in HDFS, this served our needs well, and we'd like to keep this tool for its operational flexibility and for not having to take any downtime to compact. The problem, of course, is that compaction evicts from the bucket cache the blocks of the files being compacted away. In the face of ongoing traffic that does a lot of checkAndPut, this causes reads to go to S3 and be slow, causing the write lock for checkAndPut to be held for a long time, causing timeouts in other operations trying to get the same row lock. 
Also, our overall performance suffers while the batch is going on due to requests building up in IPC queues. Response time means and other percentiles look like a sawtooth pattern. In our case each batch of compactions lasts roughly 10 minutes, and the response time sawtooth has the same periodicity. For now we have worked around this by a) using the new setting in HBase 1.4.0 in our client to avoid one slow region server blocking all client ops to other region servers, and b) accepting some timeouts as the cost of business and requeuing them in a special Kafka retry topic for our upstream system to reprocess. This also limits traffic on the slow region server, letting it clean out its backed-up IPC queues instead of being hammered with traffic that doesn't let it recover. Also, we run the tool once a day and it finishes in 8-10 hours, so performance is great the rest of the day. But if we got a setting that would let us tell HBase to always cache even newly compacted files, our performance hit would go away entirely. I see the argument above about not bothering to cache a block if all its cells are weeks old. In our case the data is advertising identifiers and can come in unpredictably, and as I said we have a big enough bucket cache anyway, so why not just cache everything? The old blocks from the compacted-away files are going to be evicted anyway, so we should never run out of bucket cache if we have sized it much larger than our entire data size. Also, of course, the default behavior would continue; any new configuration around this could come with heavy warnings about the potential consequences and only be used by advanced users, etc. Again, I apologize for this long description if it distracts from the current state of the discussion. Even a setting to only cache newly compacted blocks if they had "new" cells would still be hugely beneficial to us. Cheers. 
[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.
[ https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394587#comment-16394587 ] Saad Mufti commented on HBASE-20045: > When running compaction, cache recent blocks. 
> - > > Key: HBASE-20045 > URL: https://issues.apache.org/jira/browse/HBASE-20045 > Project: HBase > Issue Type: New Feature > Components: BlockCache, Compaction >Affects Versions: 2.0.0-beta-1 >Reporter: Jean-Marc Spaggiari >Priority: Major > > HBase already allows caching blocks on flush. This is very useful for > use cases where most queries are against recent data. However, as soon as > there is a compaction, those blocks are evicted. It would be interesting to > have a table-level parameter to say "When compacting, cache blocks less than > 24 hours old". That way, when running compaction, all blocks where some data > are less than 24h old will be automatically cached. > > Very useful for table designs where there is a TS in the key but a long history > (like a year of sensor data). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
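The table-level policy this issue proposes can be sketched as a small predicate (illustrative only; this class and its threshold are hypothetical, not actual HBase code): when a compaction finishes writing a block, cache it only if the newest cell in the block is younger than a configured age. A threshold of Long.MAX_VALUE reproduces the "always cache newly compacted files" behavior the commenter above asks for.

```java
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch (not actual HBase code) of the policy discussed in
 * HBASE-20045: cache a freshly compacted block only when its newest cell
 * is younger than a configured age.
 */
public class CompactionCachePolicy {
    private final long maxAgeMs;

    public CompactionCachePolicy(long maxAgeMs) {
        this.maxAgeMs = maxAgeMs;
    }

    /** Decide whether a freshly compacted block should go to the block cache. */
    public boolean shouldCache(long newestCellTimestampMs, long nowMs) {
        // Cache the block when any of its data is "recent enough".
        return nowMs - newestCellTimestampMs <= maxAgeMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        CompactionCachePolicy dayOld =
            new CompactionCachePolicy(TimeUnit.HOURS.toMillis(24));
        // A block whose newest cell is an hour old is cached; one whose
        // newest cell is a month old is not.
        System.out.println(dayOld.shouldCache(now - TimeUnit.HOURS.toMillis(1), now));
        System.out.println(dayOld.shouldCache(now - TimeUnit.DAYS.toMillis(30), now));
    }
}
```

The table owner would choose the age per column family; the default (no caching on compaction) would stay as it is today.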
[jira] [Commented] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.
[ https://issues.apache.org/jira/browse/HBASE-20172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394575#comment-16394575 ] Ankit Singhal commented on HBASE-20172: --- bq. is there a reason to do this in java8? I believe setting the current classloader to null and relying on Java to fall back to the system classloader is not a good idea, as not every Java API handles this consistently. > During coprocessor load, switch classloader only if it's a custom CP. > - > > Key: HBASE-20172 > URL: https://issues.apache.org/jira/browse/HBASE-20172 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-20172.patch > > > Current Impact: > Metric registries will not be able to load their implementations through the > service loader, etc. > We are not observing this with Java 8 because ServiceLoader uses the system class > loader if the provided class loader is null, but it gets exposed with Java 7 > easily (TEPHRA-285) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
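The fix discussed in this thread can be sketched roughly as follows (an illustrative sketch, not the actual HBASE-20172 patch; the class and method names are invented): swap the thread's context class loader only when the coprocessor really came from a custom loader, and always restore the previous loader afterwards, so ServiceLoader-based lookups such as metric registries keep working.

```java
import java.util.function.Supplier;

/**
 * Illustrative sketch (not the actual HBase patch) of "switch classloader
 * only if it's a custom CP": leave the context class loader alone for
 * system-loaded coprocessors instead of setting it to null.
 */
public class ContextSwitchSketch {
    static <T> T callWithProperLoader(ClassLoader cpLoader, Supplier<T> body) {
        Thread t = Thread.currentThread();
        ClassLoader previous = t.getContextClassLoader();
        // Only a genuinely custom loader warrants the switch.
        boolean custom = cpLoader != null
            && cpLoader != ContextSwitchSketch.class.getClassLoader();
        if (custom) {
            t.setContextClassLoader(cpLoader);
        }
        try {
            return body.get();
        } finally {
            if (custom) {
                // Always restore, even if the coprocessor call threw.
                t.setContextClassLoader(previous);
            }
        }
    }

    public static void main(String[] args) {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        // No custom loader: the context class loader is left untouched.
        System.out.println(callWithProperLoader(null, () -> "ran"));
        System.out.println(Thread.currentThread().getContextClassLoader() == before);
    }
}
```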
[jira] [Commented] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.
[ https://issues.apache.org/jira/browse/HBASE-20172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394522#comment-16394522 ] Sean Busbey commented on HBASE-20172: - We only do Java 7 on 1.y.z releases. Could you make a patch for branch-1? Is there a reason to do this in Java 8? > During coprocessor load, switch classloader only if it's a custom CP. > - > > Key: HBASE-20172 > URL: https://issues.apache.org/jira/browse/HBASE-20172 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-20172.patch > > > Current Impact: > Metric registries will not be able to load their implementations through the > service loader, etc. > We are not observing this with Java 8 because ServiceLoader uses the system class > loader if the provided class loader is null, but it gets exposed with Java 7 > easily (TEPHRA-285) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20171) Remove o.a.h.h.ProcedureState
[ https://issues.apache.org/jira/browse/HBASE-20171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai updated HBASE-20171: --- Description: It was introduced by HBASE-15609, and HBASE-18106 made it an orphan > Remove o.a.h.h.ProcedureState > - > > Key: HBASE-20171 > URL: https://issues.apache.org/jira/browse/HBASE-20171 > Project: HBase > Issue Type: Task >Reporter: Chia-Ping Tsai >Priority: Minor > Labels: beginner, beginners > Fix For: 2.0.0, 3.0.0, 2.1.0 > > > It was introduced by HBASE-15609, and HBASE-18106 made it an orphan -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20171) Remove o.a.h.h.ProcedureState
[ https://issues.apache.org/jira/browse/HBASE-20171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai updated HBASE-20171: --- Fix Version/s: 2.0.0 > Remove o.a.h.h.ProcedureState > - > > Key: HBASE-20171 > URL: https://issues.apache.org/jira/browse/HBASE-20171 > Project: HBase > Issue Type: Task > Environment: It was introduced by HBASE-15609, and HBASE-18106 made > it an orphan >Reporter: Chia-Ping Tsai >Priority: Minor > Labels: beginner, beginners > Fix For: 2.0.0, 3.0.0, 2.1.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20171) Remove o.a.h.h.ProcedureState
[ https://issues.apache.org/jira/browse/HBASE-20171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai updated HBASE-20171: --- Environment: (was: It was introduced by HBASE-15609, and HBASE-18106 make it be a orphan) > Remove o.a.h.h.ProcedureState > - > > Key: HBASE-20171 > URL: https://issues.apache.org/jira/browse/HBASE-20171 > Project: HBase > Issue Type: Task >Reporter: Chia-Ping Tsai >Priority: Minor > Labels: beginner, beginners > Fix For: 2.0.0, 3.0.0, 2.1.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20171) Remove o.a.h.h.ProcedureState
[ https://issues.apache.org/jira/browse/HBASE-20171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394509#comment-16394509 ] Chia-Ping Tsai commented on HBASE-20171: The class is annotated as Public. If we want to remove it from 2.x, we must apply the patch to branch-2.0 as well. > Remove o.a.h.h.ProcedureState > - > > Key: HBASE-20171 > URL: https://issues.apache.org/jira/browse/HBASE-20171 > Project: HBase > Issue Type: Task > Environment: It was introduced by HBASE-15609, and HBASE-18106 made > it an orphan >Reporter: Chia-Ping Tsai >Priority: Minor > Labels: beginner, beginners > Fix For: 2.0.0, 3.0.0, 2.1.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.
[ https://issues.apache.org/jira/browse/HBASE-20172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394506#comment-16394506 ] Hadoop QA commented on HBASE-20172: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 23s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 53s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 38s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 18m 51s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}110m 27s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}152m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-20172 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913929/HBASE-20172.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux bb2b73614e32 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / d5aaeee88b | | maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC3 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/11902/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11902/testReport/ | | Max. process+thread count | 4925 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server
[jira] [Commented] (HBASE-20133) Calculate correct assignment and build region movement plans for mis-placed regions in one pass
[ https://issues.apache.org/jira/browse/HBASE-20133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394493#comment-16394493 ] Hudson commented on HBASE-20133: Results for branch master [build #259 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/259/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Calculate correct assignment and build region movement plans for mis-placed > regions in one pass > --- > > Key: HBASE-20133 > URL: https://issues.apache.org/jira/browse/HBASE-20133 > Project: HBase > Issue Type: Improvement > Components: rsgroup >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Fix For: 3.0.0 > > Attachments: HBASE-20133.master.000.patch, > HBASE-20133.master.001.patch, HBASE-20133.master.002.patch > > > In RSGroupBasedLoadBalancer#balanceCluster(clusterState), the logic could be > improved: > correctAssignment() builds a map for mis-placed and placed regions. For > mis-placed regions, the key (ServerName) is BOGUS_SERVER_NAME. Then the logic > gets those mis-placed regions out and calls findServerForRegion() several > times to find out the current host server, in order to build RegionPlan for > movement. 
> Some logic in correctAssignment() and findServerForRegion() could be merged > so as to build both corrected assignment and RegionPlan for mis-placed region > in one pass. As a result, findServerForRegion() could be removed if I get it > correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
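The one-pass merge the issue asks for can be sketched roughly like this (illustrative only; this is not the actual RSGroupBasedLoadBalancer code, and the "region:from->to" plan encoding is invented for the example): while checking each region's current server against its target placement, emit both the corrected assignment and the movement plan in the same loop, while the current host server is still at hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative sketch of building corrected assignment and movement
 * plans for mis-placed regions in one pass (not actual HBase code).
 */
public class OnePassPlan {
    static Map<String, List<String>> correctInOnePass(Map<String, String> current,
                                                      Map<String, String> target,
                                                      List<String> plans) {
        Map<String, List<String>> corrected = new TreeMap<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String region = e.getKey();
            String server = e.getValue();
            // Regions without an explicit target are already well placed.
            String wanted = target.getOrDefault(region, server);
            corrected.computeIfAbsent(wanted, k -> new ArrayList<>()).add(region);
            if (!wanted.equals(server)) {
                // We still know the current host here, so the movement plan
                // can be built now instead of re-finding the server later.
                plans.add(region + ":" + server + "->" + wanted);
            }
        }
        return corrected;
    }

    public static void main(String[] args) {
        Map<String, String> current = new TreeMap<>();
        current.put("r1", "s1");
        current.put("r2", "s1");
        Map<String, String> target = new TreeMap<>();
        target.put("r2", "s2"); // r2 is mis-placed
        List<String> plans = new ArrayList<>();
        System.out.println(correctInOnePass(current, target, plans));
        System.out.println(plans);
    }
}
```

With this shape, the equivalent of findServerForRegion() disappears, which matches the issue's observation.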
[jira] [Commented] (HBASE-19969) Improve fault tolerance in backup merge operation
[ https://issues.apache.org/jira/browse/HBASE-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394494#comment-16394494 ] Hudson commented on HBASE-19969: Results for branch master [build #259 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/259/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Improve fault tolerance in backup merge operation > - > > Key: HBASE-19969 > URL: https://issues.apache.org/jira/browse/HBASE-19969 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 3.0.0 > > Attachments: 19969-v4.patch, HBASE-19969-v1.patch, > HBASE-19969-v2.patch, HBASE-19969-v3.patch > > > Some file system operations are not fault tolerant during merge. We delete > backup data in a backup file system, then copy new data over to the backup > destination. Deletes can be partial, and the copy can fail as well -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted
[ https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394492#comment-16394492 ] Hadoop QA commented on HBASE-19389: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 59s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 48s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 54s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 34s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 18s{color} | {color:red} hbase-server: The patch generated 5 new + 388 unchanged - 1 fixed = 393 total (was 389) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 49s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 21m 1s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 46s{color} | {color:red} hbase-server generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 35s{color} | {color:green} hbase-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 28s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hbase-server | | | org.apache.hadoop.hbase.regionserver.HStore.add(Iterable, MemStoreSizing) does not release lock on all exception paths At HStore.java:lock on all exception paths At HStore.java:[line 728] | | | org.apache.hadoop.hbase.regionserver.HStore.add(Cell, MemStoreSizing) does not release lock on all exception paths At HStore.java:lock on all exception paths At HStore.java:[line 709] | | Failed junit tests | hadoop.hbase.regionserver.throttle.TestStoreHotnessProtector | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19389 | | JIRA Patch URL |
[jira] [Commented] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted
[ https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394484#comment-16394484 ] Chance Li commented on HBASE-19389: --- Updated patch HBASE-19389-branch-2-V8.patch and added HBASE-19389.master.patch for master. Fixed an issue when handling the STORE_TOO_BUSY code or an atomic request. > Limit concurrency of put with dense (hundreds) columns to prevent write > handler exhausted > - > > Key: HBASE-19389 > URL: https://issues.apache.org/jira/browse/HBASE-19389 > Project: HBase > Issue Type: Improvement > Components: Performance >Affects Versions: 2.0.0 > Environment: 2000+ Region Servers > PCI-E ssd >Reporter: Chance Li >Assignee: Chance Li >Priority: Critical > Fix For: 2.0.0 > > Attachments: CSLM-concurrent-write.png, > HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2-V3.patch, > HBASE-19389-branch-2-V4.patch, HBASE-19389-branch-2-V5.patch, > HBASE-19389-branch-2-V6.patch, HBASE-19389-branch-2-V7.patch, > HBASE-19389-branch-2-V8.patch, HBASE-19389-branch-2-V9.patch, > HBASE-19389-branch-2.patch, HBASE-19389.master.patch, metrics-1.png, > ycsb-result.png > > > In a large cluster, with a large number of clients, we found that the RS's > handlers were all busy sometimes. After investigation we found the root > cause to be the CSLM, e.g. compare-function heavy load. We reviewed the > related WALs and found that many columns (more than 1000 columns) > were being written at that time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
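The throttling idea behind this issue can be sketched as follows (an illustrative sketch under invented names, not the actual StoreHotnessProtector from the attached patches): cap how many handler threads may concurrently write to one store, and fail fast with a "store too busy" signal instead of letting puts with hundreds of columns exhaust every write handler.

```java
import java.util.concurrent.Semaphore;

/**
 * Illustrative sketch (not actual HBase code) of limiting per-store
 * put concurrency so that a hot store cannot pin down all handlers.
 */
public class StoreConcurrencySketch {
    private final Semaphore permits;

    public StoreConcurrencySketch(int maxConcurrentPutsPerStore) {
        this.permits = new Semaphore(maxConcurrentPutsPerStore);
    }

    /** Try to admit one put for this store; false means "store too busy". */
    public boolean startPut() {
        // tryAcquire fails fast instead of parking the handler thread.
        return permits.tryAcquire();
    }

    /** Must be called once for every successful startPut(). */
    public void finishPut() {
        permits.release();
    }

    public static void main(String[] args) {
        StoreConcurrencySketch store = new StoreConcurrencySketch(1);
        System.out.println(store.startPut());  // admitted
        System.out.println(store.startPut());  // rejected: store too busy
        store.finishPut();
        System.out.println(store.startPut());  // admitted again
    }
}
```

A rejected put would be answered with a retryable "too busy" error to the client, freeing the handler for other regions; the actual limit and error handling in the patches may differ.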
[jira] [Updated] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted
[ https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chance Li updated HBASE-19389: -- Attachment: HBASE-19389.master.patch > Limit concurrency of put with dense (hundreds) columns to prevent write > handler exhausted > - > > Key: HBASE-19389 > URL: https://issues.apache.org/jira/browse/HBASE-19389 > Project: HBase > Issue Type: Improvement > Components: Performance >Affects Versions: 2.0.0 > Environment: 2000+ Region Servers > PCI-E ssd >Reporter: Chance Li >Assignee: Chance Li >Priority: Critical > Fix For: 2.0.0 > > Attachments: CSLM-concurrent-write.png, > HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2-V3.patch, > HBASE-19389-branch-2-V4.patch, HBASE-19389-branch-2-V5.patch, > HBASE-19389-branch-2-V6.patch, HBASE-19389-branch-2-V7.patch, > HBASE-19389-branch-2-V8.patch, HBASE-19389-branch-2-V9.patch, > HBASE-19389-branch-2.patch, HBASE-19389.master.patch, metrics-1.png, > ycsb-result.png > > > In a large cluster, with a large number of clients, we found that the RS's > handlers were all busy sometimes. After investigation we found the root > cause to be the CSLM, e.g. compare-function heavy load. We reviewed the > related WALs and found that many columns (more than 1000 columns) > were being written at that time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted
[ https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chance Li updated HBASE-19389: -- Attachment: HBASE-19389-branch-2-V9.patch > Limit concurrency of put with dense (hundreds) columns to prevent write > handler exhausted > - > > Key: HBASE-19389 > URL: https://issues.apache.org/jira/browse/HBASE-19389 > Project: HBase > Issue Type: Improvement > Components: Performance >Affects Versions: 2.0.0 > Environment: 2000+ Region Servers > PCI-E ssd >Reporter: Chance Li >Assignee: Chance Li >Priority: Critical > Fix For: 2.0.0 > > Attachments: CSLM-concurrent-write.png, > HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2-V3.patch, > HBASE-19389-branch-2-V4.patch, HBASE-19389-branch-2-V5.patch, > HBASE-19389-branch-2-V6.patch, HBASE-19389-branch-2-V7.patch, > HBASE-19389-branch-2-V8.patch, HBASE-19389-branch-2-V9.patch, > HBASE-19389-branch-2.patch, HBASE-19389.master.patch, metrics-1.png, > ycsb-result.png > > > In a large cluster, with a large number of clients, we found that the RS's > handlers were all busy sometimes. After investigation we found the root > cause to be the CSLM, e.g. compare-function heavy load. We reviewed the > related WALs and found that many columns (more than 1000 columns) > were being written at that time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.
[ https://issues.apache.org/jira/browse/HBASE-20172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-20172: --- Status: Patch Available (was: Open) > During coprocessor load, switch classloader only if it's a custom CP. > - > > Key: HBASE-20172 > URL: https://issues.apache.org/jira/browse/HBASE-20172 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-20172.patch > > > Current Impact: > Metric registries will not be able to load their implementations through the > service loader, etc. > We are not observing this with Java 8 because ServiceLoader uses the system class > loader if the provided class loader is null, but it gets exposed with Java 7 > easily (TEPHRA-285) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.
[ https://issues.apache.org/jira/browse/HBASE-20172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Singhal updated HBASE-20172:
----------------------------------
    Attachment: HBASE-20172.patch

> During coprocessor load, switch classloader only if it's a custom CP.
> ---------------------------------------------------------------------
>
> Key: HBASE-20172
> URL: https://issues.apache.org/jira/browse/HBASE-20172
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.4.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 2.0.0
> Attachments: HBASE-20172.patch
>
> Current impact: metric registries will not be able to load their implementations through the service loader, etc. We do not observe this with Java 8, because ServiceLoader falls back to the system class loader when the provided class loader is null, but it is exposed easily with Java 7 (TEPHRA-285).
[jira] [Created] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.
Ankit Singhal created HBASE-20172:
----------------------------------
    Summary: During coprocessor load, switch classloader only if it's a custom CP.
    Key: HBASE-20172
    URL: https://issues.apache.org/jira/browse/HBASE-20172
    Project: HBase
    Issue Type: Bug
    Affects Versions: 1.4.0
    Reporter: Ankit Singhal
    Assignee: Ankit Singhal
    Fix For: 2.0.0

Current impact: metric registries will not be able to load their implementations through the service loader, etc. We do not observe this with Java 8, because ServiceLoader falls back to the system class loader when the provided class loader is null, but it is exposed easily with Java 7 (TEPHRA-285).
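The direction the issue title suggests can be sketched as follows. All names here are illustrative, not HBase's actual API: the idea is to swap the thread context classloader only when invoking a coprocessor that came from a custom jar, so system coprocessors keep the caller's loader and ServiceLoader-based lookups (such as metric registry implementations) keep resolving.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

// Hypothetical sketch of the fix direction (names are illustrative, not
// the real HBase API): only switch the context classloader for custom
// coprocessors, and always restore the previous loader afterwards.
public class CpClassLoaderSwitch {

    static <T> T callWithLoader(ClassLoader cpLoader, boolean customCp,
                                Supplier<T> body) {
        if (!customCp) {
            return body.get();                  // no switch for system CPs
        }
        Thread t = Thread.currentThread();
        ClassLoader saved = t.getContextClassLoader();
        t.setContextClassLoader(cpLoader);      // only for custom CPs
        try {
            return body.get();
        } finally {
            t.setContextClassLoader(saved);     // always restore
        }
    }

    public static void main(String[] args) {
        ClassLoader custom = new URLClassLoader(new URL[0]);
        // A system CP sees the original context loader, not `custom`.
        ClassLoader seenBySystemCp = callWithLoader(custom, false,
                () -> Thread.currentThread().getContextClassLoader());
        // A custom CP runs with its own jar's loader.
        ClassLoader seenByCustomCp = callWithLoader(custom, true,
                () -> Thread.currentThread().getContextClassLoader());
        System.out.println(seenBySystemCp != custom); // true
        System.out.println(seenByCustomCp == custom); // true
    }
}
```

The try/finally restore matters: leaking the coprocessor loader into the handler thread is exactly what breaks later service-loader lookups.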
[jira] [Commented] (HBASE-20165) Shell command to make a normal peer to be a serial replication peer
[ https://issues.apache.org/jira/browse/HBASE-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394457#comment-16394457 ]

Duo Zhang commented on HBASE-20165:
-----------------------------------
+1. To be honest, I have never understood the rubocop and ruby-lint violations...

> Shell command to make a normal peer to be a serial replication peer
> -------------------------------------------------------------------
>
> Key: HBASE-20165
> URL: https://issues.apache.org/jira/browse/HBASE-20165
> Project: HBase
> Issue Type: Sub-task
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Attachments: HBASE-20165.v1.patch
>
[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader
[ https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394447#comment-16394447 ]

Duo Zhang commented on HBASE-20167:
-----------------------------------
OK, this is a refactoring, so there are no new tests; everything else is green. Any concerns? [~zghaobac] [~openinx]? I think this patch makes our code much cleaner. Thanks.

> Optimize the implementation of ReplicationSourceWALReader
> ---------------------------------------------------------
>
> Key: HBASE-20167
> URL: https://issues.apache.org/jira/browse/HBASE-20167
> Project: HBase
> Issue Type: Sub-task
> Components: Replication
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Fix For: 3.0.0
> Attachments: HBASE-20167-v1.patch, HBASE-20167.patch
>
> After HBASE-20148, serial replication will be an option for a peer. Since an instance of ReplicationSourceWALReader can only belong to one peer, we do not need so many 'if' checks in the implementation of readWALEntries to decide whether to consider serial replication. We can make a subclass or something similar for serial replication to keep the code clean.
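The subclass-instead-of-'if's refactoring described in the issue can be sketched in miniature. Class and method names below are illustrative, not the real HBase ones: the base reader's loop calls a single hook, the normal reader's hook always allows pushing, and the serial reader's hook enforces a (here greatly simplified) ordering barrier.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the refactoring direction (illustrative names, not the actual
// HBase classes): serial-replication checks move out of readWALEntries
// into a hook that a serial subclass overrides.
abstract class WalReaderSketch {
    // Reads entries until the subclass says an entry cannot be pushed yet.
    final List<Long> readWALEntries(List<Long> walSeqIds) {
        List<Long> batch = new ArrayList<>();
        for (long seqId : walSeqIds) {
            if (!canPush(seqId)) {
                break;                 // serial logic lives in the hook
            }
            batch.add(seqId);
        }
        return batch;
    }

    abstract boolean canPush(long seqId);
}

// Normal peer: every entry can be pushed immediately.
class ReaderSketch extends WalReaderSketch {
    @Override boolean canPush(long seqId) { return true; }
}

// Serial peer: entries past the replication barrier must wait
// (the real barrier logic is per-region and far more involved).
class SerialReaderSketch extends WalReaderSketch {
    private final long barrier;
    SerialReaderSketch(long barrier) { this.barrier = barrier; }
    @Override boolean canPush(long seqId) { return seqId <= barrier; }
}
```

With this shape, `new SerialReaderSketch(2).readWALEntries(List.of(1L, 2L, 3L))` stops after the barrier while the normal reader drains everything, and the shared loop stays free of serial-specific branches.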
[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader
[ https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394445#comment-16394445 ]

Hadoop QA commented on HBASE-20167:
-----------------------------------
| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 4m 19s | master passed |
| +1 | compile | 0m 41s | master passed |
| +1 | checkstyle | 1m 11s | master passed |
| +1 | shadedjars | 5m 57s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 1m 47s | master passed |
| +1 | javadoc | 0m 28s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 4m 25s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| +1 | checkstyle | 1m 10s | hbase-server: The patch generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 46s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 18m 37s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 0s | the patch passed |
| +1 | javadoc | 0m 28s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 106m 10s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 147m 44s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20167 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913922/HBASE-20167-v1.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 7c2b35ec7db0 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / d5aaeee88b |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11901/testReport/ |
| Max. process+thread count | 4542 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output |