[jira] [Updated] (HBASE-13336) Consistent rules for security meta table protections
[ https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srikanth Srungarapu updated HBASE-13336:
    Attachment: HBASE-13336_v2.patch

Consistent rules for security meta table protections
                Key: HBASE-13336
                URL: https://issues.apache.org/jira/browse/HBASE-13336
            Project: HBase
         Issue Type: Improvement
           Reporter: Andrew Purtell
           Assignee: Srikanth Srungarapu
            Fix For: 2.0.0, 0.98.14, 1.3.0
        Attachments: HBASE-13336.patch, HBASE-13336_v2.patch

The AccessController and VisibilityController do different things to protect their meta tables. The AC allows schema changes and disable/enable if the user has permission; the VC unconditionally disallows all admin actions. Generally, bad things happen if these meta tables are damaged, disabled, or dropped: the likely outcome is frequent (or constant) server-side op failures with nasty stack traces. On the other hand, some operations, such as column family and table attribute changes, can have valid use cases. We should have consistent and sensible rules for protecting security meta tables.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13336) Consistent rules for security meta table protections
[ https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srikanth Srungarapu updated HBASE-13336:
    Attachment: (was: HBASE-13336_v2.patch)
[jira] [Work started] (HBASE-13336) Consistent rules for security meta table protections
[ https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-13336 started by Srikanth Srungarapu.
[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601785#comment-14601785 ]

Lars Hofhansl commented on HBASE-13959:
---
Specifically, we can set the default maximum to 1/2 of #blockingStoreFiles. That way we have a good default, and folks can override it and (a) decrease it if they set blockingStoreFiles to a large value, or (b) increase it if they have many column families.

Region splitting takes too long because it uses a single thread in most common cases
                Key: HBASE-13959
                URL: https://issues.apache.org/jira/browse/HBASE-13959
            Project: HBase
         Issue Type: Bug
         Components: regionserver
   Affects Versions: 0.98.12
           Reporter: Hari Krishna Dara
           Assignee: Hari Krishna Dara
            Fix For: 0.98.14
        Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png

When storefiles need to be split as part of a region split, the current logic uses a threadpool sized to the number of stores. Since the most common table setup involves only a single column family, this translates to a single store, so the threadpool runs with a single thread. However, in a write-heavy workload there can be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one these files end up getting split sequentially. With a bit of tracing, I noticed that it takes an average of 350 ms to create a single reference file, and splitting each storefile involves creating two of these, so with a storefile count of 20 it takes about 14 s just to get through this phase alone (2 reference files per storefile), pushing the total time the region is offline to 18 s or more. For environments that are set up to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException. The fix should increase the concurrency of this operation.
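The sequential reference-file creation described above could be parallelized along these lines. This is an illustrative sketch only: the class, method names, and string "reference files" stand in for HBase's actual SplitTransaction machinery, which they do not reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: size the pool by storefile count (capped by a configurable maximum)
// instead of by store count, so the two reference files per storefile are
// created concurrently even for a single-column-family table.
class ParallelSplitSketch {
  static List<String> splitStoreFiles(List<String> storeFiles, int maxThreads) {
    int nThreads = Math.max(1, Math.min(storeFiles.size(), maxThreads));
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    try {
      List<Future<String[]>> futures = new ArrayList<>();
      for (String sf : storeFiles) {
        // Each storefile split produces two reference files (top and bottom).
        futures.add(pool.submit(() -> new String[] { sf + ".top", sf + ".bottom" }));
      }
      List<String> refs = new ArrayList<>();
      for (Future<String[]> f : futures) {
        for (String ref : f.get()) {
          refs.add(ref);
        }
      }
      return refs;
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

With 20 storefiles and, say, 10 threads, the ~14 s reference-file phase measured in the report would shrink roughly by the parallelism factor, at the cost of more simultaneous filesystem calls.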
[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV
[ https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601867#comment-14601867 ]

Hudson commented on HBASE-13835:
FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #991 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/991/])
HBASE-13835 KeyValueHeap.current might be in heap when exception happens in pollRealKV. (zhouyingchao) (anoopsamjohn: rev 46e9a8ea0a276cf23b33fbcafba8f00611c3c885)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java

KeyValueHeap.current might be in heap when exception happens in pollRealKV
                Key: HBASE-13835
                URL: https://issues.apache.org/jira/browse/HBASE-13835
            Project: HBase
         Issue Type: Bug
         Components: Scanners
           Reporter: zhouyingchao
           Assignee: zhouyingchao
            Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0
        Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, HBASE-13835_branch-1.patch

In a 0.94 hbase cluster, we found an NPE with the following stack:
{code}
Exception in thread regionserver21600.leaseChecker java.lang.NullPointerException
        at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191)
        at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
        at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
        at java.util.PriorityQueue.poll(PriorityQueue.java:523)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033)
        at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119)
        at java.lang.Thread.run(Thread.java:662)
{code}
Before this NPE, an exception happened in pollRealKV, which we think is the culprit:
{code}
ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for reader reader=
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815)
        at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583)
{code}
Simply put, if an exception happens in
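The failure mode above can be modeled in miniature: a scanner that fails mid-seek must not be left referenced as "current" while it (or a scanner with a null key) still participates in heap comparisons during close. This sketch is an invented stand-in, not the patch's actual KeyValueHeap change; FakeScanner and safeClose are hypothetical names.

```java
import java.util.PriorityQueue;

// Minimal model: close "current" separately, without re-inserting it into the
// heap, then drain and close the remaining heap entries. This avoids the NPE
// path where PriorityQueue.poll() compares an element whose key is null.
class HeapCloseSketch {
  static class FakeScanner {
    final String key;
    boolean closed;
    FakeScanner(String key) { this.key = key; }
    void close() { closed = true; }
  }

  // Closes "current" plus everything left in the heap; returns the count.
  static int safeClose(FakeScanner current, PriorityQueue<FakeScanner> heap) {
    int n = 0;
    if (current != null) {
      current.close();  // never pushed back into the heap, so never compared
      n++;
    }
    FakeScanner s;
    while ((s = heap.poll()) != null) {
      s.close();
      n++;
    }
    return n;
  }
}
```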
[jira] [Resolved] (HBASE-13972) Hanging test finder should report killed test
[ https://issues.apache.org/jira/browse/HBASE-13972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu resolved HBASE-13972.
    Resolution: Cannot Reproduce

A subsequent run of findHangingTests.py reported TestProcedureStoreTracker as a hanging test.

Hanging test finder should report killed test
                Key: HBASE-13972
                URL: https://issues.apache.org/jira/browse/HBASE-13972
            Project: HBase
         Issue Type: Bug
           Reporter: Ted Yu
           Assignee: Ted Yu
           Priority: Minor

I was looking at https://builds.apache.org/job/PreCommit-HBASE-Build/14576/console and found that findHangingTests.py didn't report any hanging / failing test:
{code}
Running org.apache.hadoop.hbase.procedure2.store.TestProcedureStoreTracker
Killed
{code}
It turns out that findHangingTests.py didn't distinguish the state of tests that were killed. Patch coming shortly which allows printing of killed test(s).
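The missing state could be detected by pairing a "Running &lt;test&gt;" line with a following "Killed" line. Sketched here in Java for consistency with the rest of this digest (the real findHangingTests.py is a Python script, and its internals are not shown here); the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: scan console output; a test announced by "Running <name>" that is
// followed by a bare "Killed" line was killed by the build, which is a
// distinct state from a test that hung or one that printed a result line.
class KilledTestSketch {
  static List<String> findKilled(List<String> consoleLines) {
    List<String> killed = new ArrayList<>();
    String lastRunning = null;
    for (String line : consoleLines) {
      if (line.startsWith("Running ")) {
        lastRunning = line.substring("Running ".length());
      } else if (line.equals("Killed") && lastRunning != null) {
        killed.add(lastRunning);
        lastRunning = null;  // consume so one "Killed" maps to one test
      }
    }
    return killed;
  }
}
```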
[jira] [Comment Edited] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601785#comment-14601785 ]

Lars Hofhansl edited comment on HBASE-13959 at 6/25/15 7:30 PM:
Specifically, we can set the default maximum to 1/2 of #blockingStoreFiles (or maybe just #blockingStoreFiles). That way we have a good default, and folks can override it and (a) decrease it if they set blockingStoreFiles to a large value, or (b) increase it if they have many column families.

was (Author: lhofhansl):
Specifically, we can set the default maximum to 1/2 of #blockingStoreFiles. That way we have a good default, and folks can override it and (a) decrease it if they set blockingStoreFiles to a large value, or (b) increase it if they have many column families.
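The default Lars suggests (half of blockingStoreFiles, clamped to at least one thread) is a one-liner; this tiny sketch is illustrative only and does not claim to be the patch's actual code:

```java
// Sketch: derive the split-thread default from the blocking-storefile limit
// (hbase.hstore.blockingStoreFiles), never dropping below one thread.
class SplitThreadDefault {
  static int defaultSplitThreads(int blockingStoreFiles) {
    return Math.max(1, blockingStoreFiles / 2);
  }
}
```

This keeps the split pool proportional to the worst-case storefile count a store can accumulate, which is why raising blockingStoreFiles is the case where an operator might want to cap the pool lower.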
[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV
[ https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601842#comment-14601842 ]

Hudson commented on HBASE-13835:
FAILURE: Integrated in HBase-1.1 #557 (See [https://builds.apache.org/job/HBase-1.1/557/])
HBASE-13835 KeyValueHeap.current might be in heap when exception happens in pollRealKV. (zhouyingchao) (anoopsamjohn: rev a7f31ce357d8d90922fd7530bfc008839c2fa72d)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk updated HBASE-13832:
    Fix Version/s: 1.2.1
                   1.3.0
                   1.1.2
                   2.0.0

Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
                Key: HBASE-13832
                URL: https://issues.apache.org/jira/browse/HBASE-13832
            Project: HBase
         Issue Type: Sub-task
         Components: master, proc-v2
   Affects Versions: 2.0.0, 1.1.0, 1.2.0
           Reporter: Stephen Yuan Jiang
           Assignee: Matteo Bertozzi
           Priority: Critical
            Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1
        Attachments: HBASE-13832-v0.patch, HDFSPipeline.java

When the data node count is low (fewer than needed for the pipeline), we got a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from starting.
{noformat}
2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort.
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
{noformat}
One proposal is to implement logic similar to FSHLog's: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately, we could try to roll the log and see whether that resolves the issue; if the new log cannot be created, or rolling the log throws more exceptions, we then abort.
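The roll-before-abort proposal could look roughly like the following. The Log interface and every name here are hypothetical, invented for illustration; they are not WALProcedureStore's or FSHLog's real API.

```java
import java.io.IOException;

// Sketch of the proposal: on a sync failure, attempt a bounded number of log
// rolls (hoping a healthy datanode pipeline can be built for the new file)
// before giving up and aborting as the current code does.
class SyncWithRollSketch {
  interface Log {
    void sync() throws IOException;
    boolean roll();  // true if a fresh log file was created
  }

  /** Returns true if sync eventually succeeded, false if we should abort. */
  static boolean syncWithRoll(Log log, int maxRolls) {
    for (int attempt = 0; ; attempt++) {
      try {
        log.sync();
        return true;
      } catch (IOException e) {
        if (attempt >= maxRolls || !log.roll()) {
          return false;  // could not roll a fresh log: abort as before
        }
      }
    }
  }
}
```

As Matteo notes in the discussion, this only buys time: if no additional datanode ever becomes available, the retries exhaust and the master still fails to start.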
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601868#comment-14601868 ]

Matteo Bertozzi commented on HBASE-13832:
---
Even with the patch the master will not start until the 3rd data node is back. In theory you should ping-pong between the backup masters until a DN is available. What the patch does is just retry for some time, hoping that a 3rd data node comes online, before giving up.
[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601878#comment-14601878 ]

Hadoop QA commented on HBASE-13702:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12741902/HBASE-13702-v5.patch
against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a.
ATTACHMENT ID: 12741902

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 7 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
    org.apache.hadoop.hbase.util.TestHBaseFsck
    org.apache.hadoop.hbase.TestRegionRebalancing

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14574//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14574//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14574//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14574//console

This message is automatically generated.

ImportTsv: Add dry-run functionality and log bad rows
                Key: HBASE-13702
                URL: https://issues.apache.org/jira/browse/HBASE-13702
            Project: HBase
         Issue Type: New Feature
           Reporter: Apekshit Sharma
           Assignee: Apekshit Sharma
            Fix For: 2.0.0, 1.3.0
        Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch

The ImportTsv job skips bad records by default (though it keeps a count). -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered. Being easily able to determine which rows in an input are corrupted, rather than failing on one row at a time, seems like a good feature to have. Moreover, such tools should have 'dry-run' functionality, which does a quick run of the tool without making any changes, reporting any errors/warnings and success/failure. To identify corrupted rows, simply logging them should be enough. In the worst case, all rows will be logged and the size of the logs will be the same as the input size, which seems fine. However, the user might have to do some work figuring out where the logs are. Is there some link we can show to the user when the tool starts which can help them with that? For the dry run, we can simply use if-else to skip writing out KVs, and any other mutations, if present.
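The if-else dry-run idea described above can be sketched as follows. Everything here is invented for illustration (the three-field tab-separated parse rule, the class, and the method names); it is not ImportTsv's actual mapper code.

```java
import java.util.List;

// Sketch: validate every input line and log bad rows in both modes; only emit
// output rows (standing in for KVs / mutations) when not in dry-run mode.
class DryRunSketch {
  static int process(List<String> lines, boolean dryRun,
                     List<String> out, List<String> badRows) {
    int wouldWrite = 0;
    for (String line : lines) {
      if (line.split("\t", -1).length != 3) {
        badRows.add(line);  // always recorded, so users can find corrupt rows
        continue;
      }
      if (!dryRun) {        // dry run: validate and count, but write nothing
        out.add(line);
      }
      wouldWrite++;
    }
    return wouldWrite;      // rows that were (or would have been) written
  }
}
```

A dry run then reports the same counts and bad-row log as a real run while leaving the table untouched.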
[jira] [Updated] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-13959:
    Attachment: 13959-suggest.txt

Something like this, for example. (This does leak HStore references into SplitTransactionImpl, though.) I'm just trying to get a good default, rather than something else that everyone will have to configure.
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602383#comment-14602383 ] Hadoop QA commented on HBASE-13832: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12742004/HBASE-13832-v1.patch against master branch at commit 2ed058554c0b6d6da0388497562e254107f13d67. ATTACHMENT ID: 12742004 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1902 checkstyle errors (more than the master's current 1901 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14582//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14582//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14582//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14582//console This message is automatically generated. Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HDFSPipeline.java when the data node 3, we got failure in WALProcedureStore#syncLoop() during master start. The failure prevents master to get started. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983- 490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. 
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement logic similar to FSHLog's: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately, we could try to roll the log and see whether that resolves the issue; if the new log cannot be created, or rolling the log throws further exceptions, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
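The roll-instead-of-abort proposal above can be sketched in plain Java. This is a minimal model of the control flow only, not HBase's actual WALProcedureStore API; `Wal`, `syncWithRoll`, and `rollWriter` are hypothetical names:

```java
import java.io.IOException;

// Minimal sketch of the proposed behavior: on a sync failure, try rolling the
// log onto a fresh pipeline (as FSHLog does) before aborting the master.
public class SyncRetrySketch {
    public interface Wal {
        void sync() throws IOException;
        boolean rollWriter();   // true if a fresh log file was created
    }

    /** Returns true if the sync eventually succeeded, false if we must abort. */
    public static boolean syncWithRoll(Wal wal, int maxRolls) {
        for (int rolls = 0; ; rolls++) {
            try {
                wal.sync();
                return true;                      // sync succeeded
            } catch (IOException e) {
                // Current behavior aborts here; instead, roll and retry.
                if (rolls >= maxRolls || !wal.rollWriter()) {
                    return false;                 // rolling failed too: abort
                }
            }
        }
    }

    public static void main(String[] args) {
        final int[] failures = {2};               // first two syncs fail
        Wal flaky = new Wal() {
            public void sync() throws IOException {
                if (failures[0]-- > 0) throw new IOException("bad pipeline");
            }
            public boolean rollWriter() { return true; }
        };
        System.out.println(syncWithRoll(flaky, 3)); // true: recovered by rolling
    }
}
```

A real fix would also have to deal with the slots already written to the bad log; this sketch only shows the retry shape.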
[jira] [Commented] (HBASE-13750) set up jenkins builds that run branch-1 ITs with java 8
[ https://issues.apache.org/jira/browse/HBASE-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602390#comment-14602390 ] Sean Busbey commented on HBASE-13750: - But yes, the matrix is part of the jenkins config. Sorry for leaving that out. set up jenkins builds that run branch-1 ITs with java 8 --- Key: HBASE-13750 URL: https://issues.apache.org/jira/browse/HBASE-13750 Project: HBase Issue Type: Sub-task Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13750) set up jenkins builds that run branch-1 ITs with java 8
[ https://issues.apache.org/jira/browse/HBASE-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602389#comment-14602389 ] Sean Busbey commented on HBASE-13750: - A given execution is invoked with a line like this: {noformat} $ mvn -Dit.test=IntegrationTestBigLinkedList -Dtest=NoUnitTests clean package verify {noformat} Instead of using an exclusion, the matrix expressly lists each test that should be run. set up jenkins builds that run branch-1 ITs with java 8 --- Key: HBASE-13750 URL: https://issues.apache.org/jira/browse/HBASE-13750 Project: HBase Issue Type: Sub-task Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13939) Make HFileReaderImpl.getFirstKeyInBlock() to return a Cell
[ https://issues.apache.org/jira/browse/HBASE-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602408#comment-14602408 ] stack commented on HBASE-13939: --- Failures are: kalashnikov:hbase.git stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/job/PreCommit-HBASE-Build/14548/consoleText Fetching the console output from the URL Printing hanging tests Printing Failing tests Failing test : org.apache.hadoop.hbase.client.TestFastFail Failing test : org.apache.hadoop.hbase.master.TestDistributedLogSplitting These look unrelated and are common failures currently. Looked at the patch again. Nice. +1 to commit [~ram_krish] Make HFileReaderImpl.getFirstKeyInBlock() to return a Cell -- Key: HBASE-13939 URL: https://issues.apache.org/jira/browse/HBASE-13939 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Fix For: 2.0.0, 1.1.2 Attachments: HBASE-13939.patch, HBASE-13939_1.patch, HBASE-13939_2.patch, HBASE-13939_3.patch, HBASE-13939_3.patch, HBASE-13939_branch-1.1.patch The getFirstKeyInBlock() in HFileReaderImpl is returning a ByteBuffer. It is used in the seekBefore cases. Because we return a ByteBuffer, we create a KeyOnlyKeyValue once for comparison {code} if (reader.getComparator() .compareKeyIgnoresMvcc( new KeyValue.KeyOnlyKeyValue(firstKey.array(), firstKey.arrayOffset(), firstKey.limit()), key) >= 0) { long previousBlockOffset = seekToBlock.getPrevBlockOffset(); // The key we are interested in if (previousBlockOffset == -1) { // we have a 'problem', the key we want is the first of the file. return false; } {code} And if the comparison fails, we again create another KeyOnlyKeyValue {code} Cell firstKeyInCurrentBlock = new KeyValue.KeyOnlyKeyValue(Bytes.getBytes(firstKey)); loadBlockAndSeekToKey(seekToBlock, firstKeyInCurrentBlock, true, key, true); {code} So a single object is enough, and that object can be returned by getFirstKeyInBlock(). 
It will also be useful when we move to ByteBuffer-backed server cells, since the change will then be in one place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
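The allocation pattern under discussion can be shown with a plain-Java stand-in. `KeyOnly` below is an illustrative substitute for KeyValue.KeyOnlyKeyValue, not HBase code: wrapping the first key once inside the reader lets both the comparison and the subsequent seek reuse the same object.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FirstKeySketch {
    /** Illustrative stand-in for KeyValue.KeyOnlyKeyValue. */
    public static final class KeyOnly {
        public final byte[] buf;
        public final int off, len;
        KeyOnly(byte[] buf, int off, int len) { this.buf = buf; this.off = off; this.len = len; }
    }

    // Before: the reader hands out a raw ByteBuffer, and the seekBefore path
    // wraps it into a key object twice (once to compare, once to seek).
    public static ByteBuffer firstKeyAsBuffer(byte[] key) { return ByteBuffer.wrap(key); }

    // After: wrap once inside the reader and return the Cell-like object,
    // which both the comparison and the seek call can reuse.
    public static KeyOnly firstKeyAsCell(byte[] key) { return new KeyOnly(key, 0, key.length); }

    public static void main(String[] args) {
        byte[] key = "row1".getBytes(StandardCharsets.UTF_8);
        KeyOnly k = firstKeyAsCell(key);   // single allocation, reused by both callers
        System.out.println(k.len);         // 4
    }
}
```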
[jira] [Commented] (HBASE-13947) Use MasterServices instead of Server in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-13947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602423#comment-14602423 ] Hudson commented on HBASE-13947: FAILURE: Integrated in HBase-1.2 #34 (See [https://builds.apache.org/job/HBase-1.2/34/]) HBASE-13947 Use MasterServices instead of Server in AssignmentManager (matteo.bertozzi: rev d476b56c4b3a0f203dcfe8e4d0c652795ac9d50d) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * hbase-server/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java Use MasterServices instead of Server in AssignmentManager - Key: HBASE-13947 URL: https://issues.apache.org/jira/browse/HBASE-13947 Project: HBase Issue Type: Improvement Components: master Affects Versions: 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 1.2.0 Attachments: HBASE-13947-v0-branch-1.patch While working on a patch for branch-1, I noticed the AM uses Server instead of MasterServices and casts to MasterServices when needed. We should take MasterServices as the argument, as we do in master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer
[ https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602422#comment-14602422 ] Hudson commented on HBASE-13969: FAILURE: Integrated in HBase-1.2 #34 (See [https://builds.apache.org/job/HBase-1.2/34/]) HBASE-13969 AuthenticationTokenSecretManager is never stopped in RPCServer (Pankaj Kumar) (tedyu: rev 139cb4e979d2b7f19072bfd0873cb9f206a2038e) * hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java AuthenticationTokenSecretManager is never stopped in RPCServer -- Key: HBASE-13969 URL: https://issues.apache.org/jira/browse/HBASE-13969 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Pankaj Kumar Assignee: Pankaj Kumar Priority: Minor Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: 13969-V2.patch, HBASE-13969-V2.patch, HBASE-13969.patch AuthenticationTokenSecretManager is never stopped in RPCServer. {code} AuthenticationTokenSecretManager mgr = createSecretManager(); if (mgr != null) { setSecretManager(mgr); mgr.start(); } {code} It should be stopped during exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
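The shape of the fix is a symmetric lifecycle: whatever start() launches, stop() must shut down. A minimal stand-alone model of that symmetry (field and method names are illustrative, not RpcServer's exact internals):

```java
// Minimal lifecycle model: the secret manager started in start() must be
// stopped in stop(), which is the call HBASE-13969 adds to RpcServer.
public class SecretManagerLifecycle {
    public static class SecretManager {
        private volatile boolean running;
        public void start() { running = true; }
        public void stop()  { running = false; }
        public boolean isRunning() { return running; }
    }

    private SecretManager secretManager;

    public void start() {
        SecretManager mgr = new SecretManager(); // stands in for createSecretManager()
        if (mgr != null) {                       // mirrors the quoted RpcServer snippet
            secretManager = mgr;
            mgr.start();
        }
    }

    public void stop() {
        if (secretManager != null) {
            secretManager.stop();                // the previously missing call
        }
    }

    public boolean isManagerRunning() {
        return secretManager != null && secretManager.isRunning();
    }
}
```

Without the stop() call, the manager's background state outlives the server, which is exactly the leak the patch closes.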
[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer
[ https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602424#comment-14602424 ] Hudson commented on HBASE-13969: FAILURE: Integrated in HBase-TRUNK #6605 (See [https://builds.apache.org/job/HBase-TRUNK/6605/]) HBASE-13969 AuthenticationTokenSecretManager is never stopped in RPCServer (Pankaj Kumar) (tedyu: rev 2ed058554c0b6d6da0388497562e254107f13d67) * hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer
[ https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602438#comment-14602438 ] Hudson commented on HBASE-13969: SUCCESS: Integrated in HBase-1.2-IT #24 (See [https://builds.apache.org/job/HBase-1.2-IT/24/]) HBASE-13969 AuthenticationTokenSecretManager is never stopped in RPCServer (Pankaj Kumar) (tedyu: rev 139cb4e979d2b7f19072bfd0873cb9f206a2038e) * hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer
[ https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602448#comment-14602448 ] Hudson commented on HBASE-13969: SUCCESS: Integrated in HBase-1.3-IT #9 (See [https://builds.apache.org/job/HBase-1.3-IT/9/]) HBASE-13969 AuthenticationTokenSecretManager is never stopped in RPCServer (Pankaj Kumar) (tedyu: rev 6e9a30280871987c35dbb67c5d3217915f105d01) * hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13970) NPE during compaction in trunk
ramkrishna.s.vasudevan created HBASE-13970: -- Summary: NPE during compaction in trunk Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Updated the trunk and loaded the table with the PE tool. Triggered a flush to ensure all data is flushed out to disk. When the first compaction is triggered we get an NPE, and this is very easy to reproduce: {code} 2015-06-25 21:33:46,041 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired 2015-06-25 21:33:46,051 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB 2015-06-25 21:33:46,159 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7536968291719985 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79) at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792) at 
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-06-25 21:33:46,745 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, hasBloomFilter=true, into tmp file hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c 2015-06-25 21:33:46,772 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HStore: Added hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c, entries=68116, sequenceid=1534, filesize=68.7 M 2015-06-25 21:33:46,773 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, currentsize=0 B/0 for region TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4. 
in 723ms, sequenceid=1534, compaction requested=true 2015-06-25 21:33:46,780 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/reached/TestTable 2015-06-25 21:33:46,790 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/abort/TestTable 2015-06-25 21:33:46,791 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort 2015-06-25 21:33:46,803 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired 2015-06-25 21:33:46,818 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort {code} Will investigate the reason behind it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13963) avoid leaking jdk.tools
[ https://issues.apache.org/jira/browse/HBASE-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601034#comment-14601034 ] Gabor Liptak commented on HBASE-13963: -- I will look at HADOOP-9406 to see their changes. I decided to exclude test-jar dependencies (and wasn't sure about hbase-testing). I will continue the review this evening. avoid leaking jdk.tools --- Key: HBASE-13963 URL: https://issues.apache.org/jira/browse/HBASE-13963 Project: HBase Issue Type: Sub-task Components: build, documentation Reporter: Sean Busbey Assignee: Gabor Liptak Priority: Critical Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: HBASE-13963.1.patch Right now hbase-annotations uses jdk7 jdk.tools and exposes that to downstream via hbase-client. We need it for building and using our custom doclet, but can improve a couple of things: -1) We should be using a jdk.tools version based on our java version (use jdk activated profiles to set it)- 2) We should not be including any jdk.tools version in our hbase-client transitive dependencies (or other downstream-facing artifacts). Unfortunately, system dependencies are included in transitive resolution, so we'll need to exclude it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
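For a downstream consumer, the exclusion described above would take roughly this shape in a pom.xml. The coordinates shown are the commonly used jdk.tools:jdk.tools system artifact; treat this as a sketch of the idea, not the committed patch:

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>${hbase.version}</version>
  <exclusions>
    <!-- Keep the jdk7 system-scoped tools jar out of transitive resolution -->
    <exclusion>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

The actual fix discussed in the issue would place the exclusion inside HBase's own poms so downstream users never see the artifact at all.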
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601201#comment-14601201 ] Hudson commented on HBASE-13964: FAILURE: Integrated in HBase-1.3 #17 (See [https://builds.apache.org/job/HBase-1.3/17/]) HBASE-13964 Skip region normalization for tables under namespace quota (tedyu: rev ed72fa212875814f7e44eebaf7789710ec670c6a) * hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600976#comment-14600976 ] ramkrishna.s.vasudevan commented on HBASE-13970: Note that when I tried to load some more data and a compaction was triggered, no NPE happened. NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601035#comment-14601035 ] Hari Krishna Dara commented on HBASE-13959: --- I just attached region-split-durations-compared.png. I have done a basic comparison of split times with one thread vs 8 threads on a table. The table had no presplits and had a single column family. Starting from an empty table, I loaded 400M rows (about 570 bytes/row). The run with 1 thread encountered NSREs a few times, coinciding with long-running splits. The run with 8 threads had no NSREs. Here are some numbers: Thread pool size = 1 Number of splits: 27 Average split duration: 8.44s Min split duration: 3s Max split duration: 16s p99 split duration: 16s Thread pool size = 8 Number of splits: 25 Average split duration: 3.4s Min split duration: 2s Max split duration: 6s p99 split duration: 5.76s I will attach a histogram showing the durations side by side. Region splitting takes too long because it uses a single thread in most common cases Key: HBASE-13959 URL: https://issues.apache.org/jira/browse/HBASE-13959 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.12 Reporter: Hari Krishna Dara Assignee: Hari Krishna Dara Fix For: 0.98.14 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png When storefiles need to be split as part of a region split, the current logic uses a threadpool with the size set to the size of the number of stores. Since most common table setup involves only a single column family, this translates to having a single store and so the threadpool is run with a single thread. However, in a write heavy workload, there could be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one, these files end up getting split sequentially. 
With a bit of tracing, I noticed that it takes on average 350ms to create a single reference file, and splitting each storefile involves creating two of these, so with a storefile count of 20 it takes about 14s just to get through this phase alone (2 reference files for each storefile), pushing the total time the region is offline to 18s or more. For environments that are set up to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException. The fix should increase the concurrency of this operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
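The proposed fix can be sketched in plain Java (a hypothetical helper, not the HBase patch itself): size the pool by the number of store files rather than the number of stores, so the two reference files per storefile are created concurrently instead of sequentially.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSplitSketch {
    /** Creates the (mock) daughter reference names for each store file in parallel. */
    public static List<String> splitStoreFiles(List<String> storeFiles, int maxThreads) {
        // Previously the pool size was the store count, i.e. 1 for the common
        // single-column-family case; cap by the file count instead.
        int threads = Math.min(maxThreads, Math.max(1, storeFiles.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> refs = new ArrayList<>();
            for (String f : storeFiles) {
                refs.add(pool.submit(() -> f + ".refA"));  // daughter region A
                refs.add(pool.submit(() -> f + ".refB"));  // daughter region B
            }
            List<String> out = new ArrayList<>();
            for (Future<String> r : refs) {
                out.add(r.get());                          // two refs per store file
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 20 store files -> 40 reference files, created concurrently.
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 20; i++) files.add("storefile-" + i);
        System.out.println(splitStoreFiles(files, 8).size()); // 40
    }
}
```

With 350ms per reference file, 8 threads would cut the ~14s sequential phase to roughly 2s for the 20-file case described above.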
[jira] [Updated] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Krishna Dara updated HBASE-13959: -- Attachment: region-split-durations-compared.png Region splitting takes too long because it uses a single thread in most common cases Key: HBASE-13959 URL: https://issues.apache.org/jira/browse/HBASE-13959 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.12 Reporter: Hari Krishna Dara Assignee: Hari Krishna Dara Fix For: 0.98.14 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600979#comment-14600979 ] Mikhail Antonov commented on HBASE-13964: - Yeah, speaking of the savings.. I didn't do any profiling, but I'm fairly certain that cost of extra calls here is negligible compared to time of actual split/merge for any real-world regions :) so I wouldn't worry about it now. Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600980#comment-14600980 ] Mikhail Antonov commented on HBASE-13964: - Thanks [~te...@apache.org]! Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13970: --- Affects Version/s: 2.0.0 Fix Version/s: 2.0.0 NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13964: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the reviews, Mikhail and Ashish. Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13214) Remove deprecated and unused methods from HTable class
[ https://issues.apache.org/jira/browse/HBASE-13214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600993#comment-14600993 ] Ashish Singhi commented on HBASE-13214: --- Any more comments? If not, can we commit this? I am worried about the patch going stale. Remove deprecated and unused methods from HTable class -- Key: HBASE-13214 URL: https://issues.apache.org/jira/browse/HBASE-13214 Project: HBase Issue Type: Sub-task Components: API Affects Versions: 2.0.0 Reporter: Mikhail Antonov Assignee: Ashish Singhi Fix For: 2.0.0 Attachments: HBASE-13214-v1.patch, HBASE-13214-v2-again-v1.patch, HBASE-13214-v2-again.patch, HBASE-13214-v2.patch, HBASE-13214-v3.patch, HBASE-13214-v3.patch, HBASE-13214.patch Methods like #getRegionLocation(), #isTableEnabled() etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601010#comment-14601010 ] ramkrishna.s.vasudevan edited comment on HBASE-13970 at 6/25/15 11:11 AM: -- The reason here is that two compactions get triggered from the CompactSplitThread on a newly split region. One may be due to the split that happened? When both compactions run in parallel, the PressureAwareCompactionThroughputController is started by the compactor thread {code} @Override public void start(String compactionName) { activeCompactions.put(compactionName, new ActiveCompaction()); } {code} While the second compaction is still in progress, the first compaction completes and calls finish {code} @Override public void finish(String compactionName) { ActiveCompaction compaction = activeCompactions.remove(compactionName); long elapsedTime = Math.max(1, EnvironmentEdgeManager.currentTime() - compaction.startTime); ... {code} The compactionName is the same for both because {code} String compactionName = store.getRegionInfo().getRegionNameAsString() + "#" + store.getFamily().getNameAsString(); {code} When the second compaction comes to completion, its entry has already been removed, and hence we get an NPE. Logs: One compaction started with 4 files {code} 2015-06-25 22:07:49,135 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] regionserver.HRegion: Starting compaction on info in region TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84. 2015-06-25 22:07:49,135 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] regionserver.HStore: Starting compaction of 4 file(s) in info of TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84. 
into tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp, totalSize=285.6 M 2015-06-25 22:07:49,165 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] hfile.CacheConfig: blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false 2015-06-25 22:07:49,954 INFO [regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250268365 with entries=90, filesize=124.14 MB; new WAL /hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250269933 20 {code} Another compaction has been started with 3 files {code} 2015-06-25 22:07:53,405 INFO [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] regionserver.HRegion: Starting compaction on info in region TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84. 2015-06-25 22:07:53,406 INFO [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] regionserver.HStore: Starting compaction of 3 file(s) in info of TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84. 
into tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp, totalSize=343.4 M 2015-06-25 22:07:53,411 INFO [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] hfile.CacheConfig: blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false 2015-06-25 22:07:54,211 INFO [MemStoreFlusher.1] regionserver.HRegion: Flushing 1/1 column families, memstore=128.23 MB 2015-06-25 22:07:54,639 INFO [regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250273034 with entries=90, filesize= {code} {code} 2015-06-25 22:08:09,446 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] compactions.PressureAwareCompactionThroughputController: TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.#info average throughput is 19.80 MB/sec, slept 30 time(s) and total slept time is 27694 ms. 0 active compactions remaining, total limit is 10.00 MB/sec 2015-06-25 22:08:09,520 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] regionserver.HStore: Completed compaction of 4 (all) file(s) in info of
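The race described in this comment — two compactions keyed by the same region/family string, where the first finish() removes the shared map entry before the second finish() runs — can be reproduced with a minimal sketch. The class and method names below are simplified stand-ins, not the actual PressureAwareCompactionThroughputController code:

```java
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the throughput controller's bookkeeping map:
// start() registers a compaction under its name; finish() removes the entry
// and then dereferences it without a null check.
public class SharedKeyNpeDemo {
    static final ConcurrentHashMap<String, Long> activeCompactions = new ConcurrentHashMap<>();

    static void start(String compactionName) {
        activeCompactions.put(compactionName, System.currentTimeMillis());
    }

    static long finish(String compactionName) {
        Long startTime = activeCompactions.remove(compactionName);
        // Unboxing a null Long throws NullPointerException when the entry
        // was already removed by an earlier finish() with the same key.
        return Math.max(1, System.currentTimeMillis() - startTime);
    }

    public static void main(String[] args) {
        String name = "region1#info"; // both compactions share the same region#family key
        start(name);  // first compaction registers
        start(name);  // second compaction overwrites the same entry
        finish(name); // first compaction finishes, removing the shared entry
        try {
            finish(name); // second finish(): entry is gone
        } catch (NullPointerException e) {
            System.out.println("NPE on second finish, as in the logs");
        }
    }
}
```

Keying the map by something unique per compaction attempt (rather than region#family) would avoid the collision; this sketch only demonstrates why the shared key fails.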
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601113#comment-14601113 ] Hudson commented on HBASE-13964: SUCCESS: Integrated in HBase-1.2 #31 (See [https://builds.apache.org/job/HBase-1.2/31/]) HBASE-13964 Skip region normalization for tables under namespace quota (tedyu: rev c680e2f40927407a0699b0b1ce687867bc2bb398) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600971#comment-14600971 ] Ashish Singhi commented on HBASE-13964: --- bq. I thought about this. Even if we do that, the null check must be made. So there is not much saving. Yes, the null check must be made; but when the result is not null, we currently call quotaManager.getNamespaceQuotaManager() again, so caching it would save one extra call for each enabled table in the cluster. But again, as you said, there is not much saving, so I am OK with it as it is. Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601087#comment-14601087 ] Hadoop QA commented on HBASE-13959: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12741822/HBASE-13959-4.patch against master branch at commit 2df3236a4eee48bf723213a7c4ff3d29c832c8cf. ATTACHMENT ID: 12741822 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.util.TestHBaseFsck {color:red}-1 core zombie tests{color}. 
There are 3 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14567//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14567//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14567//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14567//console This message is automatically generated. Region splitting takes too long because it uses a single thread in most common cases Key: HBASE-13959 URL: https://issues.apache.org/jira/browse/HBASE-13959 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.12 Reporter: Hari Krishna Dara Assignee: Hari Krishna Dara Fix For: 0.98.14 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png When storefiles need to be split as part of a region split, the current logic uses a threadpool with the size set to the size of the number of stores. Since most common table setup involves only a single column family, this translates to having a single store and so the threadpool is run with a single thread. However, in a write heavy workload, there could be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one, these files end up getting split sequentially. With a bit of tracing, I noticed that it takes on an average of 350ms to create a single reference file, and splitting each storefile involves creating two of these, so with a storefile count of 20, it takes about 14s just to get through this phase alone (2 reference files for each storefile), pushing the total time the region is offline to 18s or more. For environments that are setup to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException. 
The fix should increase the concurrency of this operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
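The arithmetic in the description (two reference files per storefile, about 350 ms each, all created by a single thread) can be checked with a small sketch. The pool sizes below are illustrative assumptions, not values from the patch:

```java
// Sketch of the timing math in the description: each storefile split creates
// two reference files (one per daughter region) at roughly 350 ms apiece, so
// a single-threaded pool does them strictly in sequence.
public class SplitTimingSketch {
    static final long MS_PER_REFERENCE_FILE = 350; // average observed in the tracing above

    // Estimated wall time to create all reference files with a given pool size,
    // assuming each task takes MS_PER_REFERENCE_FILE and tasks are independent.
    static long estimatedSplitMillis(int storefileCount, int poolSize) {
        long tasks = 2L * storefileCount;                // 2 reference files per storefile
        long rounds = (tasks + poolSize - 1) / poolSize; // ceiling division
        return rounds * MS_PER_REFERENCE_FILE;
    }

    public static void main(String[] args) {
        // 20 storefiles on one thread: 40 * 350 ms = 14 s, matching the report.
        System.out.println(estimatedSplitMillis(20, 1)); // 14000
        // The same work on a hypothetical 8-thread pool:
        System.out.println(estimatedSplitMillis(20, 8)); // 1750
    }
}
```

The model explains why widening the pool shrinks the region's offline window roughly in proportion to the thread count, up to the number of reference files to create.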
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600969#comment-14600969 ] Ted Yu commented on HBASE-13964: bq. that we are skipping region normalizing The log I added is consistent with existing log w.r.t. system table. bq. Can we extract it to a local variable ? I thought about this. Even if we do that, the null check must be made. So there is not much saving. Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
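The trade-off discussed in these comments — extracting quotaManager.getNamespaceQuotaManager() to a local variable still requires the null check — can be sketched generically. The interfaces and method names below are illustrative, not the actual HMaster/NamespaceAuditor API:

```java
// Generic sketch of the null-check-plus-skip logic discussed above.
// All names are illustrative; this is not the actual HMaster code.
public class NormalizerSkipSketch {
    interface QuotaManager {
        NamespaceQuotaManager getNamespaceQuotaManager();
    }
    interface NamespaceQuotaManager {
        boolean hasQuota(String namespace);
    }

    static boolean shouldSkipNormalization(String namespace, QuotaManager quotaManager) {
        // Cache the getter result in a local so it is fetched once per table;
        // the null check is still required because quotas may be disabled entirely.
        NamespaceQuotaManager nsQuotas = quotaManager.getNamespaceQuotaManager();
        return nsQuotas != null && nsQuotas.hasQuota(namespace);
    }

    public static void main(String[] args) {
        QuotaManager withQuota = () -> ns -> ns.equals("quota_ns");
        QuotaManager disabled = () -> null;
        System.out.println(shouldSkipNormalization("quota_ns", withQuota)); // skip normalization
        System.out.println(shouldSkipNormalization("default", withQuota));  // normalize
        System.out.println(shouldSkipNormalization("default", disabled));   // normalize, no NPE
    }
}
```

As both commenters note, the local variable saves only one extra getter call per enabled table; the null check dominates either way.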
[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600973#comment-14600973 ] Ted Yu commented on HBASE-13959: Is it possible to measure performance gain with your patch ? Thanks Region splitting takes too long because it uses a single thread in most common cases Key: HBASE-13959 URL: https://issues.apache.org/jira/browse/HBASE-13959 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.12 Reporter: Hari Krishna Dara Assignee: Hari Krishna Dara Fix For: 0.98.14 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch When storefiles need to be split as part of a region split, the current logic uses a threadpool with the size set to the size of the number of stores. Since most common table setup involves only a single column family, this translates to having a single store and so the threadpool is run with a single thread. However, in a write heavy workload, there could be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one, these files end up getting split sequentially. With a bit of tracing, I noticed that it takes on an average of 350ms to create a single reference file, and splitting each storefile involves creating two of these, so with a storefile count of 20, it takes about 14s just to get through this phase alone (2 reference files for each storefile), pushing the total time the region is offline to 18s or more. For environments that are setup to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException. The fix should increase the concurrency of this operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601096#comment-14601096 ] Hadoop QA commented on HBASE-8642: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12741823/HBASE-8642-v1.patch against master branch at commit 2df3236a4eee48bf723213a7c4ff3d29c832c8cf. ATTACHMENT ID: 12741823 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +puts "\nDelete the above #{count} snapshots (y/n)? \nNOTE: Snapshot(s) matching the given regular expressions and taken after the above list is displayed will be also deleted." unless count == 0 +puts "No snapshots matched the table name regular expression #{tableNameregex.to_s} and the snapshot name regular expression #{snapshotNameRegex.to_s}" if count == 0 +puts "#{successfullyDeleted} snapshots successfully deleted." unless successfullyDeleted == 0 {color:green}+1 site{color}. 
The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.TestRegionRebalancing Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14566//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14566//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14566//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14566//console This message is automatically generated. [Snapshot] List and delete snapshot by table Key: HBASE-8642 URL: https://issues.apache.org/jira/browse/HBASE-8642 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2 Reporter: Julian Zhou Assignee: Ashish Singhi Fix For: 2.0.0 Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch, HBASE-8642-v1.patch, HBASE-8642.patch Support list and delete snapshots by table names. User scenario: A user wants to delete all the snapshots which were taken in January month for a table 't' where snapshot names starts with 'Jan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
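The feature described here — listing and deleting snapshots whose table name matches one regex and whose snapshot name matches another — boils down to a two-pattern filter. A plain-Java sketch of that filtering logic, independent of the HBase Admin API (Snapshot below is a stand-in holder, not HBase's SnapshotDescription):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of the two-regex filter behind "list/delete snapshot by table":
// keep a snapshot only when its table name and snapshot name both match.
public class SnapshotFilterSketch {
    static class Snapshot {
        final String table;
        final String name;
        Snapshot(String table, String name) { this.table = table; this.name = name; }
    }

    static List<Snapshot> matching(List<Snapshot> all, String tableRegex, String snapshotRegex) {
        Pattern tablePattern = Pattern.compile(tableRegex);
        Pattern namePattern = Pattern.compile(snapshotRegex);
        List<Snapshot> out = new ArrayList<>();
        for (Snapshot s : all) {
            if (tablePattern.matcher(s.table).matches() && namePattern.matcher(s.name).matches()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Snapshot> all = Arrays.asList(
            new Snapshot("t", "Jan_backup_1"),
            new Snapshot("t", "Feb_backup_1"),
            new Snapshot("other", "Jan_backup_2"));
        // The user scenario above: snapshots of table 't' whose names start with 'Jan'.
        System.out.println(matching(all, "t", "Jan.*").size()); // 1
    }
}
```

Deletion would then iterate over the matched list; the shell additionally confirms with the user before deleting, as the quoted prompt lines show.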
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601076#comment-14601076 ] Hudson commented on HBASE-13964: FAILURE: Integrated in HBase-TRUNK #6601 (See [https://builds.apache.org/job/HBase-TRUNK/6601/]) HBASE-13964 Skip region normalization for tables under namespace quota (tedyu: rev edef3d64bce41fffbc5649ffa19b2cf80ce28d7a) * hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601212#comment-14601212 ] Hudson commented on HBASE-13964: SUCCESS: Integrated in HBase-1.2-IT #21 (See [https://builds.apache.org/job/HBase-1.2-IT/21/]) HBASE-13964 Skip region normalization for tables under namespace quota (tedyu: rev c680e2f40927407a0699b0b1ce687867bc2bb398) * hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601685#comment-14601685 ] Lars Hofhansl edited comment on HBASE-13959 at 6/25/15 7:25 PM: Nice find and patch. The 8 seems to come out of nowhere. Do you have numbers for different numbers of threads? Maybe default it to 1/2 of blocking store file count...? was (Author: lhofhansl): Nice find and patch. The 8 seems to come out of nowhere. Do you have numbers for different numbers of threads? Maybe default it to 1/2 of block store file count...? Region splitting takes too long because it uses a single thread in most common cases Key: HBASE-13959 URL: https://issues.apache.org/jira/browse/HBASE-13959 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.12 Reporter: Hari Krishna Dara Assignee: Hari Krishna Dara Fix For: 0.98.14 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png When storefiles need to be split as part of a region split, the current logic uses a threadpool with the size set to the size of the number of stores. Since most common table setup involves only a single column family, this translates to having a single store and so the threadpool is run with a single thread. However, in a write heavy workload, there could be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one, these files end up getting split sequentially. With a bit of tracing, I noticed that it takes on an average of 350ms to create a single reference file, and splitting each storefile involves creating two of these, so with a storefile count of 20, it takes about 14s just to get through this phase alone (2 reference files for each storefile), pushing the total time the region is offline to 18s or more. 
For environments that are setup to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException. The fix should increase the concurrency of this operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13950) Add a NoopProcedureStore for testing
[ https://issues.apache.org/jira/browse/HBASE-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601801#comment-14601801 ] Hadoop QA commented on HBASE-13950: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12741896/HBASE-13950-v1.patch against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a. ATTACHMENT ID: 12741896 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14573//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14573//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14573//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14573//console This message is automatically generated. Add a NoopProcedureStore for testing Key: HBASE-13950 URL: https://issues.apache.org/jira/browse/HBASE-13950 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13950-v0-branch-1.patch, HBASE-13950-v1-branch-1.patch, HBASE-13950-v1.patch Add a NoopProcedureStore and an helper in ProcedureTestingUtil to submitAndWait() a procedure without having to do anything else. This is useful to avoid extra code like in case of TestAssignmentManager.processServerShutdownHandler() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
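The idea behind a no-op store for tests — an implementation whose methods persist nothing, so a procedure can be submitted and waited on without wiring up real storage — is a general pattern that can be sketched as follows. The Store interface and submitAndWait helper below are illustrative stand-ins, not HBase's actual ProcedureStore or ProcedureTestingUtil API:

```java
// Generic sketch of the "noop store" testing pattern described above:
// a store interface plus an implementation that does nothing, letting
// tests exercise the calling code without real persistence.
public class NoopStoreSketch {
    interface Store {
        void insert(long procId, byte[] state);
        void update(long procId, byte[] state);
        void delete(long procId);
    }

    // Every method is intentionally empty: durability is irrelevant in the test.
    static class NoopStore implements Store {
        @Override public void insert(long procId, byte[] state) {}
        @Override public void update(long procId, byte[] state) {}
        @Override public void delete(long procId) {}
    }

    // Hypothetical helper in the spirit of submitAndWait(): run a procedure
    // body against the store and return its result synchronously.
    static int submitAndWait(Store store, int input) {
        store.insert(1L, new byte[0]);
        int result = input * 2; // stand-in for the procedure's actual work
        store.delete(1L);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(submitAndWait(new NoopStore(), 21)); // 42
    }
}
```

This is what makes tests like the TestAssignmentManager case mentioned above shorter: the code under test runs end to end while the store contributes no setup, teardown, or I/O.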
[jira] [Commented] (HBASE-13336) Consistent rules for security meta table protections
[ https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601886#comment-14601886 ] Hadoop QA commented on HBASE-13336: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12741807/HBASE-13336_v2.patch against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a. ATTACHMENT ID: 12741807 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14575//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14575//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14575//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14575//console This message is automatically generated. Consistent rules for security meta table protections Key: HBASE-13336 URL: https://issues.apache.org/jira/browse/HBASE-13336 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Srikanth Srungarapu Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: HBASE-13336.patch, HBASE-13336_v2.patch The AccessController and VisibilityController do different things regarding protecting their meta tables. The AC allows schema changes and disable/enable if the user has permission. The VC unconditionally disallows all admin actions. Generally, bad things will happen if these meta tables are damaged, disabled, or dropped. The likely outcome is random frequent (or constant) server side op failures with nasty stack traces. On the other hand some things like column family and table attribute changes can have valid use cases. We should have consistent and sensible rules for protecting security meta tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
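The HBASE-13336 description argues for one consistent policy across the AccessController and VisibilityController. A minimal sketch of such a rule (illustrative names, not the actual coprocessor hook API): allow schema alterations on a security meta table when the caller holds admin permission, but always refuse disable/delete, since a missing table leads to server-side op failures.

```java
public class SecurityTableGuard {
  enum Action { ALTER, ENABLE, DISABLE, DELETE }

  // hbase:acl and hbase:labels are the AC and VC meta tables.
  static boolean isSecurityTable(String table) {
    return table.equals("hbase:acl") || table.equals("hbase:labels");
  }

  // Returns true if the admin action should be allowed to proceed.
  static boolean allow(String table, Action action, boolean callerIsAdmin) {
    if (!isSecurityTable(table)) {
      return true; // ordinary tables: defer to normal permission checks
    }
    switch (action) {
      case ALTER:
        return callerIsAdmin; // valid use cases: CF and table attributes
      case DISABLE:
      case DELETE:
        return false;         // never: damaged/dropped meta table breaks enforcement
      default:
        return callerIsAdmin;
    }
  }

  public static void main(String[] args) {
    if (allow("hbase:acl", Action.DELETE, true)) throw new AssertionError();
    if (!allow("hbase:acl", Action.ALTER, true)) throw new AssertionError();
    if (!allow("t1", Action.DISABLE, false)) throw new AssertionError();
    System.out.println("guard behaves consistently");
  }
}
```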
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600955#comment-14600955 ] Ashish Singhi commented on HBASE-13964: --- +1 (non-binding) Minor nits (your call whether to fix on commit or leave as-is): bq. Skipping normalizing Can we say explicitly in the log that we are skipping region normalization? bq. quotaManager.getNamespaceQuotaManager() Can we extract it to a local variable? Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
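A sketch of the skip check with both review nits applied. All names here are illustrative, not the actual MasterQuotaManager API: the namespace quota manager is extracted to a local variable, and the log line states explicitly that region normalization is being skipped.

```java
import java.util.Set;

public class NormalizerQuotaCheck {
  // Hypothetical stand-ins for the quota manager interfaces.
  interface NamespaceQuotaManager { Set<String> getQuotaedNamespaces(); }
  interface QuotaManager { NamespaceQuotaManager getNamespaceQuotaManager(); }

  static String namespaceOf(String table) {
    int i = table.indexOf(':');
    return i < 0 ? "default" : table.substring(0, i);
  }

  // Returns true when the table should be normalized.
  static boolean shouldNormalize(String table, QuotaManager quotaManager) {
    // nit applied: extract to a local variable instead of calling repeatedly
    NamespaceQuotaManager nsQuota = quotaManager.getNamespaceQuotaManager();
    if (nsQuota != null
        && nsQuota.getQuotaedNamespaces().contains(namespaceOf(table))) {
      // nit applied: say explicitly what is being skipped and why
      System.out.println("Skipping region normalization for table " + table
          + " since its namespace has a quota");
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    QuotaManager qm = () -> () -> Set.of("nsq");
    if (shouldNormalize("nsq:t1", qm)) throw new AssertionError();
    if (!shouldNormalize("free:t1", qm)) throw new AssertionError();
    System.out.println("ok");
  }
}
```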
[jira] [Updated] (HBASE-13942) HBase client stalls during region split when client threads > hbase.hconnection.threads.max
[ https://issues.apache.org/jira/browse/HBASE-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Murrali updated HBASE-13942: --- Component/s: regionserver HBase client stalls during region split when client threads > hbase.hconnection.threads.max --- Key: HBASE-13942 URL: https://issues.apache.org/jira/browse/HBASE-13942 Project: HBase Issue Type: Bug Components: Client, regionserver Reporter: Mukund Murrali Performing any operation using a single hconnection with client threads > hbase.hconnection.threads.max causes the client to stall indefinitely during the first region split. All the hconnection threads on the client side are waiting with the following stack. hconnection-0x648a83fd-shared--pool1-t8 daemon prio=10 tid=0x7f447c003800 nid=0x62ff waiting on condition [0x7f44c72f] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007d768bdf0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374) at org.apache.hadoop.hbase.util.BoundedCompletionService.take(BoundedCompletionService.java:74) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:174) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:145) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200) at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1109) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.findAllLocationsOrFail(AsyncProcess.java:916) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:833) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1156) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveMultiAction(AsyncProcess.java:1296) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:574) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:716) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
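The stack above shows the classic self-starvation pattern: every thread of a bounded pool is busy with a task that is itself waiting on work queued behind it in the same pool (here, a meta lookup during split recovery). A minimal, self-contained sketch of that pattern, unrelated to the HBase classes themselves:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationDemo {
  // Runs the starvation scenario and reports what happened.
  static String run() {
    ExecutorService pool = Executors.newFixedThreadPool(1); // one worker only
    try {
      Future<String> outer = pool.submit(() -> {
        // The outer task holds the only worker thread, then submits a
        // subtask to the SAME pool and waits for it - it can never run.
        Future<String> inner = pool.submit(() -> "meta lookup result");
        try {
          return inner.get(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
          return "starved"; // a real client without the timeout waits forever
        }
      });
      return outer.get();
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    if (!"starved".equals(run())) throw new AssertionError();
    System.out.println("all pool threads blocked on work queued behind them");
  }
}
```

With hbase.hconnection.threads.max worker threads all parked in the outer role, the inner meta-lookup tasks never get a thread, which matches the indefinite stall described in the report.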
[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer
[ https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601830#comment-14601830 ] Ted Yu commented on HBASE-13969: +1 if tests pass. AuthenticationTokenSecretManager is never stopped in RPCServer -- Key: HBASE-13969 URL: https://issues.apache.org/jira/browse/HBASE-13969 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Pankaj Kumar Assignee: Pankaj Kumar Priority: Minor Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13969-V2.patch, HBASE-13969.patch AuthenticationTokenSecretManager is never stopped in RPCServer. {code} AuthenticationTokenSecretManager mgr = createSecretManager(); if (mgr != null) { setSecretManager(mgr); mgr.start(); } {code} It should be stopped during exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
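The quoted snippet starts the secret manager but nothing ever stops it. A minimal sketch of the intended lifecycle fix, using an illustrative interface rather than the real RpcServer/AuthenticationTokenSecretManager code: pair start() with a stop() on the shutdown path so the manager's background key-roller thread can exit.

```java
public class SecretManagerLifecycle {
  // Hypothetical stand-in for the secret manager's lifecycle surface.
  interface SecretManager { void start(); void stop(); boolean isRunning(); }

  static class DemoSecretManager implements SecretManager {
    private boolean running;
    public void start() { running = true; }  // real one spawns a key-update thread
    public void stop()  { running = false; } // must be invoked on server exit
    public boolean isRunning() { return running; }
  }

  public static void main(String[] args) {
    DemoSecretManager mgr = new DemoSecretManager();
    mgr.start();
    try {
      // ... serve RPCs for the lifetime of the server ...
    } finally {
      mgr.stop(); // the fix: stop the manager during server shutdown
    }
    if (mgr.isRunning()) throw new AssertionError("manager still running");
    System.out.println("stopped cleanly");
  }
}
```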
[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-13832: - Status: Patch Available (was: Open) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 1.1.0, 2.0.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HDFSPipeline.java When the data node count is <= 3, we got a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from starting. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. 
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement logic similar to FSHLog: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately we could try to roll the log and see whether that resolves the issue; if the new log cannot be created, or rolling the log throws further exceptions, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
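The proposed recovery can be sketched as follows. The Wal interface and method names are illustrative, not the actual WALProcedureStore API: on a sync IOException, try rolling to a fresh log once before giving up and aborting.

```java
import java.io.IOException;

public class SyncWithRollDemo {
  // Hypothetical stand-in for the procedure WAL.
  interface Wal { void sync() throws IOException; boolean roll(); }

  // Returns true if the sync eventually succeeded, false if we must abort.
  static boolean syncOrRoll(Wal wal) {
    try {
      wal.sync();
      return true;
    } catch (IOException first) {
      if (!wal.roll()) {
        return false;   // cannot create a new log: abort
      }
      try {
        wal.sync();     // retry once on the fresh log
        return true;
      } catch (IOException second) {
        return false;   // still failing after the roll: abort
      }
    }
  }

  public static void main(String[] args) {
    // A WAL whose first sync fails (bad pipeline) but succeeds after a roll.
    Wal flaky = new Wal() {
      int syncs;
      public void sync() throws IOException {
        if (syncs++ == 0) throw new IOException("bad datanode pipeline");
      }
      public boolean roll() { return true; }
    };
    if (!syncOrRoll(flaky)) throw new AssertionError("expected recovery");
    System.out.println("recovered via log roll");
  }
}
```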
[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-13832: - Priority: Critical (was: Major) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV
[ https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601819#comment-14601819 ] Hudson commented on HBASE-13835: FAILURE: Integrated in HBase-1.0 #975 (See [https://builds.apache.org/job/HBase-1.0/975/]) HBASE-13835 KeyValueHeap.current might be in heap when exception happens in pollRealKV. (zhouyingchao) (anoopsamjohn: rev 59357ced27e5ac43c654500479502bd19f1b99ae) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java KeyValueHeap.current might be in heap when exception happens in pollRealKV -- Key: HBASE-13835 URL: https://issues.apache.org/jira/browse/HBASE-13835 Project: HBase Issue Type: Bug Components: Scanners Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, HBASE-13835_branch-1.patch In a 0.94 hbase cluster, we found an NPE with the following stack: {code} Exception in thread regionserver21600.leaseChecker java.lang.NullPointerException at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191) at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641) at java.util.PriorityQueue.siftDown(PriorityQueue.java:612) at java.util.PriorityQueue.poll(PriorityQueue.java:523) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241) at org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302) at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033) at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119) at java.lang.Thread.run(Thread.java:662) {code} Before this NPE, an exception happened in pollRealKV, which we think is the culprit of the NPE. {code} ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for reader reader= at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057) at org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583) {code} Simply put, if an exception happens in pollRealKV(), the
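The (truncated) description points at a heap-consistency hazard: a scanner that failed mid-seek must not also remain in the priority queue, or a later close() will poll a half-closed scanner and NPE. A minimal, self-contained sketch of that invariant, with illustrative classes rather than the real KeyValueHeap code:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class HeapConsistencyDemo {
  static class Scanner {
    final int key;
    boolean closed;
    Scanner(int key) { this.key = key; }
    // Simulates StoreFileScanner.enforceSeek() throwing mid-operation.
    void seek() throws Exception {
      if (key < 0) throw new Exception("reseek failed");
    }
    void close() { closed = true; }
  }

  // The invariant: on a seek failure, close the scanner and make sure it
  // is NOT left in (or returned to) the heap before rethrowing.
  static Scanner pollReal(PriorityQueue<Scanner> heap) throws Exception {
    Scanner s = heap.poll();  // s is now out of the heap
    try {
      s.seek();
      return s;
    } catch (Exception e) {
      s.close();              // do not re-add s to the heap
      throw e;
    }
  }

  public static void main(String[] args) {
    PriorityQueue<Scanner> heap =
        new PriorityQueue<>(Comparator.comparingInt((Scanner s) -> s.key));
    Scanner bad = new Scanner(-1);
    heap.add(bad);
    heap.add(new Scanner(1));
    try {
      pollReal(heap);
      throw new AssertionError("expected seek failure");
    } catch (Exception expected) { /* seek failed as arranged */ }
    // The failed scanner is closed and gone from the heap, so a later
    // close()-style drain of the heap cannot touch it again.
    if (!bad.closed || heap.contains(bad)) throw new AssertionError();
    System.out.println("heap consistent after failure");
  }
}
```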
[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV
[ https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601856#comment-14601856 ] Hudson commented on HBASE-13835: FAILURE: Integrated in HBase-1.3 #18 (See [https://builds.apache.org/job/HBase-1.3/18/]) HBASE-13835 KeyValueHeap.current might be in heap when exception happens in pollRealKV. (zhouyingchao) (anoopsamjohn: rev 92b6622d97d21700a92a4061a7b05dfc7cf5a3df) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601855#comment-14601855 ] Nick Dimiduk commented on HBASE-13832: -- Failure of master to start is a problem. Bumping priority and setting some fix-version targets. [~jinghe] are you able to reproduce? Can you take the attached patch for a spin? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer
[ https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601885#comment-14601885 ] Hadoop QA commented on HBASE-13969: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12741921/HBASE-13969-V2.patch against master branch at commit d9ba4d5bb513624fef8787f04b18a57ac5eb5203. ATTACHMENT ID: 12741921 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14576//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14576//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14576//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14576//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13972) Hanging test finder should report killed test
Ted Yu created HBASE-13972: -- Summary: Hanging test finder should report killed test Key: HBASE-13972 URL: https://issues.apache.org/jira/browse/HBASE-13972 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor I was looking at https://builds.apache.org/job/PreCommit-HBASE-Build/14576/console and found that findHangingTests.py didn't report any hanging / failing test. {code} Running org.apache.hadoop.hbase.procedure2.store.TestProcedureStoreTracker Killed {code} It turns out that findHangingTests.py didn't distinguish the state of tests that were killed. A patch which allows printing of killed test(s) is coming shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
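The missing detection is simple state tracking over the console log: a "Running X" line followed by a bare "Killed" line means the surefire fork died under test X, and that test should be reported. A sketch of the idea (in Java rather than the script's Python, purely for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class KilledTestFinder {
  static List<String> findKilled(List<String> consoleLines) {
    List<String> killed = new ArrayList<>();
    String current = null;
    for (String line : consoleLines) {
      if (line.startsWith("Running ")) {
        current = line.substring("Running ".length());
      } else if (line.equals("Killed") && current != null) {
        killed.add(current); // the test fork died under this test
        current = null;
      }
    }
    return killed;
  }

  public static void main(String[] args) {
    List<String> log = List.of(
        "Running org.apache.hadoop.hbase.procedure2.store.TestProcedureStoreTracker",
        "Killed",
        "Running org.apache.hadoop.hbase.SomeOtherTest");
    System.out.println(findKilled(log));
    // -> [org.apache.hadoop.hbase.procedure2.store.TestProcedureStoreTracker]
  }
}
```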
[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV
[ https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601938#comment-14601938 ] Hudson commented on HBASE-13835: SUCCESS: Integrated in HBase-1.2-IT #22 (See [https://builds.apache.org/job/HBase-1.2-IT/22/]) HBASE-13835 KeyValueHeap.current might be in heap when exception happens in pollRealKV. (zhouyingchao) (anoopsamjohn: rev 27fd3441f5a69a6bb795e28da37f5039545a41e7) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV
[ https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601952#comment-14601952 ] Hudson commented on HBASE-13835: SUCCESS: Integrated in HBase-1.3-IT #7 (See [https://builds.apache.org/job/HBase-1.3-IT/7/]) HBASE-13835 KeyValueHeap.current might be in heap when exception happens in pollRealKV. (zhouyingchao) (anoopsamjohn: rev 92b6622d97d21700a92a4061a7b05dfc7cf5a3df) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java KeyValueHeap.current might be in heap when exception happens in pollRealKV -- Key: HBASE-13835 URL: https://issues.apache.org/jira/browse/HBASE-13835 Project: HBase Issue Type: Bug Components: Scanners Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, HBASE-13835_branch-1.patch In a 0.94 hbase cluster, we found an NPE with the following stack:
{code}
Exception in thread regionserver21600.leaseChecker java.lang.NullPointerException
    at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191)
    at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
    at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
    at java.util.PriorityQueue.poll(PriorityQueue.java:523)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302)
    at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033)
    at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119)
    at java.lang.Thread.run(Thread.java:662)
{code}
Before this NPE, an exception happened in pollRealKV, which we think is the culprit of the NPE.
{code}
ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for reader reader=
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815)
    at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583)
{code}
Simply put, if an exception happens in pollRealKV(), the
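The invariant the fix restores can be illustrated with a small, self-contained sketch (plain Java with made-up names, not the actual HBase classes): the source held in `current` is always removed from the heap before it is seeked, so an exception during the seek can never leave the same scanner referenced both by `current` and by the priority queue — which is what made `close()` trip over a half-closed scanner in the stack above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Toy merge-heap (hypothetical names, not the real KeyValueHeap) showing the
 * invariant: the source held in `current` is never also inside the heap, so
 * close() cannot poll or compare it twice.
 */
class MergeHeapSketch implements AutoCloseable {
  /** A sorted source with one-element lookahead, like a StoreFileScanner. */
  static class Source {
    private final Iterator<Integer> it;
    private Integer peeked;
    Source(List<Integer> values) { this.it = values.iterator(); advance(); }
    void advance() { peeked = it.hasNext() ? it.next() : null; }
    Integer peek() { return peeked; }
    /** Stand-in for enforceSeek(); in the real system this can throw. */
    void realSeek() { }
  }

  private final PriorityQueue<Source> heap =
      new PriorityQueue<>(Comparator.comparing(Source::peek));
  private Source current; // invariant: `current` is never inside `heap`

  MergeHeapSketch(List<List<Integer>> inputs) {
    for (List<Integer> in : inputs) {
      Source s = new Source(in);
      if (s.peek() != null) heap.add(s);
    }
    current = pollRealSource();
  }

  /**
   * Analogue of pollRealKV(): the top source is removed from the heap BEFORE
   * the (possibly failing) seek, so an exception cannot leave the returned
   * source sitting in the heap as well.
   */
  private Source pollRealSource() {
    Source top = heap.poll();
    if (top != null) {
      top.realSeek(); // if this throws, `top` is already out of the heap
    }
    return top;
  }

  /** Next value of the merged sequence, or null when exhausted. */
  Integer next() {
    if (current == null) return null;
    Integer out = current.peek();
    current.advance();
    if (current.peek() != null) heap.add(current);
    current = pollRealSource();
    return out;
  }

  @Override public void close() { heap.clear(); current = null; }
}
```

With the invariant kept, `close()` only ever sees sources that are still valid heap members, regardless of whether the last `realSeek()` threw.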
[jira] [Commented] (HBASE-13670) [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired
[ https://issues.apache.org/jira/browse/HBASE-13670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601259#comment-14601259 ] Gururaj Shetty commented on HBASE-13670: Hi [~anoop.hbase], incorporated your comments and attached the patch. Thanks [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired -- Key: HBASE-13670 URL: https://issues.apache.org/jira/browse/HBASE-13670 Project: HBase Issue Type: Improvement Components: documentation, mob Affects Versions: hbase-11339 Reporter: Y. SREENIVASULU REDDY Assignee: Gururaj Shetty Fix For: hbase-11339 Attachments: HBASE-13670.patch, HBASE-13670_01.patch Currently the ExpiredMobFileCleaner cleans the expired mob file according to the date in the mob file name. The minimum unit of that date is the day, so the ExpiredMobFileCleaner may delete expired mob files up to one day after they actually expire. We need to document this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
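The day-granularity behaviour being documented can be sketched with a hypothetical helper (assumed simplification, not the actual ExpiredMobFileCleaner code): since the mob file name only carries a date, expiry can only be decided per whole day, so a file may outlive its cells' TTL by up to one day.

```java
import java.time.LocalDate;

// Hypothetical sketch of day-granularity expiry (not the real cleaner code):
// the mob file name encodes only a DATE, so the cleaner compares whole days.
// A file written early on `fileDate` is kept until the entire day has aged
// out, i.e. up to one extra day beyond the cell TTL.
class MobExpirySketch {
  static boolean isFileExpired(LocalDate fileDate, LocalDate today, int ttlDays) {
    // the file becomes deletable only once its whole day is older than the TTL
    return fileDate.isBefore(today.minusDays(ttlDays));
  }
}
```

For example, with a 3-day TTL and today = 2015-06-05, a file dated 2015-06-02 is not deletable even though cells written at the start of that day are already past TTL; it becomes deletable on 2015-06-06.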
[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601382#comment-14601382 ] Ted Yu commented on HBASE-13959: Nice results in performance improvement. Should the new constants be defined in SplitTransactionImpl.java? They're only referenced by SplitTransactionImpl.java Region splitting takes too long because it uses a single thread in most common cases Key: HBASE-13959 URL: https://issues.apache.org/jira/browse/HBASE-13959 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.12 Reporter: Hari Krishna Dara Assignee: Hari Krishna Dara Fix For: 0.98.14 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png When storefiles need to be split as part of a region split, the current logic uses a threadpool whose size is set to the number of stores. Since the most common table setup involves only a single column family, this translates to a single store, so the threadpool runs with a single thread. However, in a write-heavy workload, there could be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one, these files end up getting split sequentially. With a bit of tracing, I noticed that it takes on average 350ms to create a single reference file, and splitting each storefile involves creating two of these, so with a storefile count of 20, it takes about 14s just to get through this phase alone (2 reference files for each storefile), pushing the total time the region is offline to 18s or more. For environments that are set up to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException. The fix should increase the concurrency of this operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
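The arithmetic in the report above is easy to reproduce (the 350 ms/reference-file and 20-storefile figures come from the description; the 8-thread pool size is purely an illustrative choice, not what the patch uses):

```java
// Back-of-envelope estimate from the numbers above: ~350 ms per reference
// file, 2 reference files per storefile. With a pool of 1 thread (one store,
// one column family) all tasks run sequentially; with more threads the
// reference-file creations overlap.
class SplitEstimateSketch {
  static long splitMillis(int storefiles, int threads) {
    final int refFilesPerStorefile = 2;
    final long msPerRefFile = 350;
    long tasks = (long) storefiles * refFilesPerStorefile;
    // ceil(tasks / threads) sequential "rounds" of reference-file creation
    long rounds = (tasks + threads - 1) / threads;
    return rounds * msPerRefFile;
  }
}
```

Twenty storefiles on one thread give the ~14 s reported above; the same work across 8 threads drops under 2 s, which is why widening the pool shrinks the offline window so sharply.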
[jira] [Commented] (HBASE-13967) add jdk profiles for jdk.tools dependency
[ https://issues.apache.org/jira/browse/HBASE-13967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601386#comment-14601386 ] stack commented on HBASE-13967: --- +1 add jdk profiles for jdk.tools dependency - Key: HBASE-13967 URL: https://issues.apache.org/jira/browse/HBASE-13967 Project: HBase Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: HBASE-13967.1.patch Right now hbase-annotations uses jdk7 jdk.tools and exposes that to downstream via hbase-client. We need it for building and using our custom doclet, but we should be using a jdk.tools version based on our java version (use jdk activated profiles to set it) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
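A JDK-activated profile for this typically looks like the sketch below (profile ids, versions, and the system path are illustrative assumptions, not the committed patch):

```xml
<!-- Illustrative sketch, not the committed patch: select the jdk.tools
     version matching the JDK actually running the build. -->
<profiles>
  <profile>
    <id>jdk7-tools</id>
    <activation><jdk>1.7</jdk></activation>
    <dependencies>
      <dependency>
        <groupId>jdk.tools</groupId>
        <artifactId>jdk.tools</artifactId>
        <version>1.7</version>
        <scope>system</scope>
        <systemPath>${java.home}/../lib/tools.jar</systemPath>
      </dependency>
    </dependencies>
  </profile>
  <profile>
    <id>jdk8-tools</id>
    <activation><jdk>1.8</jdk></activation>
    <dependencies>
      <dependency>
        <groupId>jdk.tools</groupId>
        <artifactId>jdk.tools</artifactId>
        <version>1.8</version>
        <scope>system</scope>
        <systemPath>${java.home}/../lib/tools.jar</systemPath>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

Because activation keys off the running JDK, downstream consumers of hbase-client no longer get a hard-coded jdk7 tools.jar forced on them.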
[jira] [Updated] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-13970: -- Attachment: HBASE-13970.patch A simple patch that adds an incrementing AtomicInteger as the suffix of the compaction name. [~ram_krish] Could you please test whether this patch works? Thanks. And also, thanks for the great digging. NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-13970.patch Updated the trunk. Loaded the table with the PE tool. Triggered a flush to ensure all data is flushed out to disk. When the first compaction is triggered we get an NPE, and this is very easy to reproduce:
{code}
2015-06-25 21:33:46,041 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired
2015-06-25 21:33:46,051 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
2015-06-25 21:33:46,159 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7536968291719985
java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
    at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2015-06-25 21:33:46,745 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, hasBloomFilter=true, into tmp file hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
2015-06-25 21:33:46,772 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HStore: Added hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c, entries=68116, sequenceid=1534, filesize=68.7 M
2015-06-25 21:33:46,773 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, currentsize=0 B/0 for region TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4. in 723ms, sequenceid=1534, compaction requested=true
2015-06-25 21:33:46,780 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/reached/TestTable
2015-06-25 21:33:46,790 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/abort/TestTable
2015-06-25 21:33:46,791 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort
2015-06-25 21:33:46,803 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired
2015-06-25 21:33:46,818 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort
{code}
Will look into the reason behind it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
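The proposed fix is easy to picture with a hedged sketch (made-up class and method names, not the actual patch): because two compactions can run on the same store concurrently, a name built only from region and store collides, so a process-wide counter is appended to keep names unique.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch (assumed names, not the actual patch) of making compaction names
// unique: regionName + storeName alone collide when two compactions run on
// the same store concurrently, so an ever-increasing counter is appended.
class CompactionNameSketch {
  private static final AtomicInteger COUNTER = new AtomicInteger(0);

  static String name(String regionName, String storeName) {
    return regionName + "#" + storeName + "#" + COUNTER.getAndIncrement();
  }
}
```

With unique names, one compaction finishing (or failing) can no longer unregister the bookkeeping entry of another compaction on the same store, which is the collision behind the NPE above.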
[jira] [Commented] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601376#comment-14601376 ] Anoop Sam John commented on HBASE-13970: Ya, this can work. NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2 Attachments: HBASE-13970.patch Updated the trunk. Loaded the table with the PE tool. Triggered a flush to ensure all data is flushed out to disk. When the first compaction is triggered we get an NPE, and this is very easy to reproduce:
{code}
2015-06-25 21:33:46,041 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired
2015-06-25 21:33:46,051 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
2015-06-25 21:33:46,159 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7536968291719985
java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
    at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2015-06-25 21:33:46,745 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, hasBloomFilter=true, into tmp file hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
2015-06-25 21:33:46,772 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HStore: Added hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c, entries=68116, sequenceid=1534, filesize=68.7 M
2015-06-25 21:33:46,773 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, currentsize=0 B/0 for region TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4. in 723ms, sequenceid=1534, compaction requested=true
2015-06-25 21:33:46,780 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/reached/TestTable
2015-06-25 21:33:46,790 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/abort/TestTable
2015-06-25 21:33:46,791 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort
2015-06-25 21:33:46,803 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired
2015-06-25 21:33:46,818 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort
{code}
Will look into the reason behind it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13670) [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired
[ https://issues.apache.org/jira/browse/HBASE-13670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601281#comment-14601281 ] Hadoop QA commented on HBASE-13670: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12741858/HBASE-13670_01.patch against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a. ATTACHMENT ID: 12741858 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14569//console This message is automatically generated. [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired -- Key: HBASE-13670 URL: https://issues.apache.org/jira/browse/HBASE-13670 Project: HBase Issue Type: Improvement Components: documentation, mob Affects Versions: hbase-11339 Reporter: Y. SREENIVASULU REDDY Assignee: Gururaj Shetty Fix For: hbase-11339 Attachments: HBASE-13670.patch, HBASE-13670_01.patch Currently the ExpiredMobFileCleaner cleans the expired mob file according to the date in the mob file name. The minimum unit of the date is day. So ExpiredMobFileCleaner only cleans the expired mob files later for one more day after they are expired. We need to document this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601295#comment-14601295 ] Duo Zhang commented on HBASE-13970: --- Oh, this should be a bug on all branches which contain HBASE-8329. I used to assume that compactions cannot be executed in parallel on the same store, so the compactionName only contains regionName and storeName. I think we could add a counter in the name to avoid the conflict. NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Updated the trunk. Loaded the table with the PE tool. Triggered a flush to ensure all data is flushed out to disk. When the first compaction is triggered we get an NPE, and this is very easy to reproduce:
{code}
2015-06-25 21:33:46,041 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired
2015-06-25 21:33:46,051 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
2015-06-25 21:33:46,159 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7536968291719985
java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
    at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2015-06-25 21:33:46,745 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, hasBloomFilter=true, into tmp file hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
2015-06-25 21:33:46,772 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HStore: Added hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c, entries=68116, sequenceid=1534, filesize=68.7 M
2015-06-25 21:33:46,773 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, currentsize=0 B/0 for region TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4. in 723ms, sequenceid=1534, compaction requested=true
2015-06-25 21:33:46,780 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/reached/TestTable
2015-06-25 21:33:46,790 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/abort/TestTable
2015-06-25 21:33:46,791 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort
2015-06-25 21:33:46,803 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired
2015-06-25 21:33:46,818 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort
{code}
Will look into the reason behind it. -- This message was sent by
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601239#comment-14601239 ] Hudson commented on HBASE-13964: SUCCESS: Integrated in HBase-1.3-IT #6 (See [https://builds.apache.org/job/HBase-1.3-IT/6/]) HBASE-13964 Skip region normalization for tables under namespace quota (tedyu: rev ed72fa212875814f7e44eebaf7789710ec670c6a) * hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Skip region normalization for tables under namespace quota -- Key: HBASE-13964 URL: https://issues.apache.org/jira/browse/HBASE-13964 Project: HBase Issue Type: Task Components: Balancer, Usability Reporter: Mikhail Antonov Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
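The guard being added amounts to a filter like the following hedged sketch (all names here are hypothetical; in the actual patch the check lives in HMaster with help from NamespaceAuditor):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the idea: drop tables whose namespace is under
// quota control before handing candidates to the region normalizer.
class NormalizerFilterSketch {
  /** tableName is "namespace:qualifier"; no colon means the default namespace. */
  static List<String> normalizable(List<String> tables, Set<String> quotaNamespaces) {
    List<String> out = new ArrayList<>();
    for (String t : tables) {
      int i = t.indexOf(':');
      String ns = i < 0 ? "default" : t.substring(0, i);
      if (!quotaNamespaces.contains(ns)) {
        out.add(t); // only tables outside quota'd namespaces get normalized
      }
    }
    return out;
  }
}
```

Skipping these tables avoids the normalizer splitting or merging regions in ways that would fight the namespace region-count quota.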
[jira] [Updated] (HBASE-13670) [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired
[ https://issues.apache.org/jira/browse/HBASE-13670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gururaj Shetty updated HBASE-13670: --- Attachment: HBASE-13670_01.patch [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired -- Key: HBASE-13670 URL: https://issues.apache.org/jira/browse/HBASE-13670 Project: HBase Issue Type: Improvement Components: documentation, mob Affects Versions: hbase-11339 Reporter: Y. SREENIVASULU REDDY Assignee: Gururaj Shetty Fix For: hbase-11339 Attachments: HBASE-13670.patch, HBASE-13670_01.patch Currently the ExpiredMobFileCleaner cleans the expired mob file according to the date in the mob file name. The minimum unit of the date is day. So ExpiredMobFileCleaner only cleans the expired mob files later for one more day after they are expired. We need to document this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-13970: -- Fix Version/s: 1.1.2 1.2.0 0.98.14 Affects Version/s: 1.1.1 1.2.0 0.98.13 Status: Patch Available (was: Open) NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 0.98.13, 2.0.0, 1.2.0, 1.1.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2 Attachments: HBASE-13970.patch Updated the trunk. Loaded the table with the PE tool. Triggered a flush to ensure all data is flushed out to disk. When the first compaction is triggered we get an NPE, and this is very easy to reproduce:
{code}
2015-06-25 21:33:46,041 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired
2015-06-25 21:33:46,051 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
2015-06-25 21:33:46,159 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7536968291719985
java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
    at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2015-06-25 21:33:46,745 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, hasBloomFilter=true, into tmp file hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
2015-06-25 21:33:46,772 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HStore: Added hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c, entries=68116, sequenceid=1534, filesize=68.7 M
2015-06-25 21:33:46,773 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, currentsize=0 B/0 for region TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4. in 723ms, sequenceid=1534, compaction requested=true
2015-06-25 21:33:46,780 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/reached/TestTable
2015-06-25 21:33:46,790 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/abort/TestTable
2015-06-25 21:33:46,791 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort
2015-06-25 21:33:46,803 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired
2015-06-25 21:33:46,818 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort
{code}
Will look into the reason behind it. -- This message was sent by Atlassian JIRA
[jira] [Commented] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601451#comment-14601451 ] ramkrishna.s.vasudevan commented on HBASE-13970: I will test this out. But this should work. Can we create this per counter per store? NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2 Attachments: HBASE-13970.patch Updated the trunk.. Loaded the table with PE tool. Trigger a flush to ensure all data is flushed out to disk. When the first compaction is triggered we get an NPE and this is very easy to reproduce {code} 015-06-25 21:33:46,041 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired 2015-06-25 21:33:46,051 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB 2015-06-25 21:33:46,159 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7536968291719985 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79) at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306) at 
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792) at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-06-25 21:33:46,745 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, hasBloomFilter=true, into tmp file hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c 2015-06-25 21:33:46,772 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HStore: Added hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c, entries=68116, sequenceid=1534, filesize=68.7 M 2015-06-25 21:33:46,773 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, currentsize=0 B/0 for region TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4. 
in 723ms, sequenceid=1534, compaction requested=true 2015-06-25 21:33:46,780 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/reached/TestTable 2015-06-25 21:33:46,790 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/abort/TestTable 2015-06-25 21:33:46,791 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort 2015-06-25 21:33:46,803 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired 2015-06-25 21:33:46,818 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort {code} Will check what the reason behind this is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-13897) OOM occurs when Import importing a row that including too much KeyValue
[ https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-13897: -- Assignee: Liu Junhong (was: Ted Yu) OOM occurs when Import importing a row that including too much KeyValue Key: HBASE-13897 URL: https://issues.apache.org/jira/browse/HBASE-13897 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Liu Junhong Assignee: Liu Junhong Fix For: 0.98.14 Attachments: HBASE-13897-0.98.patch When importing a row with too many KeyValues (it may have too many columns or versions), KeyValueReducer will incur an OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
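A common way to avoid this class of OOM is to emit a wide row's cells in bounded batches instead of materializing the whole row in memory at once. The sketch below is a hypothetical, pure-Java illustration of that pattern; `BatchingReducerSketch`, `reduceRow`, and the `sink` list are made-up stand-ins, not the Import tool's actual API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: instead of collecting every KeyValue of a wide row into
// one in-memory list (which can OOM for rows with millions of cells), emit them
// in bounded batches as the iterator is drained.
class BatchingReducerSketch {
    static final int BATCH_SIZE = 4;

    /** Drains one row's cells into the sink in batches; returns the number of flushes. */
    static int reduceRow(Iterator<byte[]> cellsOfOneRow, List<List<byte[]>> sink) {
        List<byte[]> batch = new ArrayList<>(BATCH_SIZE);
        int flushes = 0;
        while (cellsOfOneRow.hasNext()) {
            batch.add(cellsOfOneRow.next());
            if (batch.size() >= BATCH_SIZE) {      // bound memory used per row
                sink.add(new ArrayList<>(batch));  // stand-in for context.write(...)
                batch.clear();
                flushes++;
            }
        }
        if (!batch.isEmpty()) {                    // flush the final partial batch
            sink.add(new ArrayList<>(batch));
            flushes++;
        }
        return flushes;
    }
}
```

With a batch size of 4, a row of 10 cells is written in three flushes, so peak memory is bounded by the batch size rather than the row width.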
[jira] [Commented] (HBASE-13939) Make HFileReaderImpl.getFirstKeyInBlock() to return a Cell
[ https://issues.apache.org/jira/browse/HBASE-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601562#comment-14601562 ] ramkrishna.s.vasudevan commented on HBASE-13939: Ping!!! Make HFileReaderImpl.getFirstKeyInBlock() to return a Cell -- Key: HBASE-13939 URL: https://issues.apache.org/jira/browse/HBASE-13939 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Fix For: 2.0.0, 1.1.2 Attachments: HBASE-13939.patch, HBASE-13939_1.patch, HBASE-13939_2.patch, HBASE-13939_3.patch, HBASE-13939_3.patch, HBASE-13939_branch-1.1.patch The getFirstKeyInBlock() in HFileReaderImpl returns a ByteBuffer. It is used in the seekBefore cases. Because we return a ByteBuffer, we create a KeyOnlyKeyValue once for the comparison {code} if (reader.getComparator() .compareKeyIgnoresMvcc( new KeyValue.KeyOnlyKeyValue(firstKey.array(), firstKey.arrayOffset(), firstKey.limit()), key) >= 0) { long previousBlockOffset = seekToBlock.getPrevBlockOffset(); // The key we are interested in if (previousBlockOffset == -1) { // we have a 'problem', the key we want is the first of the file. return false; } {code} And if the compare fails, we create yet another KeyOnlyKeyValue {code} Cell firstKeyInCurrentBlock = new KeyValue.KeyOnlyKeyValue(Bytes.getBytes(firstKey)); loadBlockAndSeekToKey(seekToBlock, firstKeyInCurrentBlock, true, key, true); {code} So one object will be enough, and that one can be returned by getFirstKeyInBlock(). This will also be useful when we move to ByteBuffer-backed server cells, since the change will then be needed in only one place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-13702: Attachment: HBASE-13702-v5.patch [~tedyu] you're right. Fixed the issues. ImportTsv: Add dry-run functionality and log bad rows - Key: HBASE-13702 URL: https://issues.apache.org/jira/browse/HBASE-13702 Project: HBase Issue Type: New Feature Reporter: Apekshit Sharma Assignee: Apekshit Sharma Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch The ImportTSV job skips bad records by default (though it keeps a count). -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered. Being able to easily determine which rows in an input are corrupted, rather than failing on one row at a time, seems like a good feature to have. Moreover, there should be 'dry-run' functionality in such tools, which essentially does a quick run of the tool without making any changes, reporting any errors/warnings and success/failure. To identify corrupted rows, simply logging them should be enough. In the worst case, all rows will be logged and the size of the logs will be the same as the input size, which seems fine. However, the user might have to do some work figuring out where the logs are. Is there some link we can show to the user when the tool starts which can help them with that? For the dry run, we can simply use an if-else to skip writing out KVs, and any other mutations, if present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
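The if-else dry-run approach proposed above could look roughly like this. The class, method, and flag names here are illustrative stand-ins, not ImportTsv's actual code: rows are parsed and counted as usual, bad rows are logged, and every write is guarded behind a dry-run flag.

```java
import java.util.List;

// Hypothetical sketch of the proposed dry-run switch for a TSV import:
// parse and count good/bad rows as usual, log bad rows, but skip all
// mutations when dryRun is set.
class DryRunSketch {
    /** Returns {goodCount, badCount}; written/badLog are stand-ins for real outputs. */
    static int[] run(List<String> lines, boolean dryRun,
                     List<String> written, List<String> badLog) {
        int good = 0, bad = 0;
        for (String line : lines) {
            if (line.split("\t").length < 2) { // stand-in for a TSV parse failure
                bad++;
                badLog.add(line);              // log bad rows, not just count them
                continue;
            }
            good++;
            if (!dryRun) {                     // the proposed if-else: skip writes in dry-run mode
                written.add(line);             // stand-in for context.write(put)
            }
        }
        return new int[] { good, bad };
    }
}
```

In dry-run mode the tool still reports errors/warnings and success/failure, while the output path stays untouched.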
[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-13702: Status: Patch Available (was: Open) ImportTsv: Add dry-run functionality and log bad rows - Key: HBASE-13702 URL: https://issues.apache.org/jira/browse/HBASE-13702 Project: HBase Issue Type: New Feature Reporter: Apekshit Sharma Assignee: Apekshit Sharma Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13950) Add a NoopProcedureStore for testing
[ https://issues.apache.org/jira/browse/HBASE-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-13950: Attachment: HBASE-13950-v1.patch Add a NoopProcedureStore for testing Key: HBASE-13950 URL: https://issues.apache.org/jira/browse/HBASE-13950 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13950-v0-branch-1.patch, HBASE-13950-v1-branch-1.patch, HBASE-13950-v1.patch Add a NoopProcedureStore and a helper in ProcedureTestingUtil to submitAndWait() a procedure without having to do anything else. This is useful to avoid extra code, as in the case of TestAssignmentManager.processServerShutdownHandler() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13950) Add a NoopProcedureStore for testing
[ https://issues.apache.org/jira/browse/HBASE-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-13950: Attachment: (was: HBASE-13950-v1.patch) Add a NoopProcedureStore for testing Key: HBASE-13950 URL: https://issues.apache.org/jira/browse/HBASE-13950 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13950-v0-branch-1.patch, HBASE-13950-v1-branch-1.patch, HBASE-13950-v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
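A minimal sketch of the no-op store idea: a store that accepts every call and persists nothing, so procedure tests do not need a real WAL-backed store. The interface shown is a simplified stand-in, not HBase's actual ProcedureStore API.

```java
// Simplified stand-in for the store interface a procedure executor talks to.
interface MiniProcedureStore {
    void start(int numThreads);
    void stop(boolean abort);
    void insert(long procId);
    void update(long procId);
    void delete(long procId);
}

// No-op implementation: every mutation is accepted and discarded, which is
// all a unit test needs to drive a procedure to completion.
class NoopStoreSketch implements MiniProcedureStore {
    private boolean running;
    public void start(int numThreads) { running = true; }
    public void stop(boolean abort) { running = false; }
    public void insert(long procId) { /* intentionally a no-op */ }
    public void update(long procId) { /* intentionally a no-op */ }
    public void delete(long procId) { /* intentionally a no-op */ }
    boolean isRunning() { return running; }
}
```

A submitAndWait()-style test helper can then run a procedure against this store with no extra setup or teardown code.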
[jira] [Commented] (HBASE-13897) OOM occurs when Import importing a row that including too much KeyValue
[ https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601542#comment-14601542 ] Ted Yu commented on HBASE-13897: Any chance of an update, Junhong ? OOM occurs when Import importing a row that including too much KeyValue Key: HBASE-13897 URL: https://issues.apache.org/jira/browse/HBASE-13897 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Liu Junhong Assignee: Ted Yu Fix For: 0.98.14 Attachments: HBASE-13897-0.98.patch When importing a row with too many KeyValues (may have too many columns or versions),KeyValueReducer will incur OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13670) [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired
[ https://issues.apache.org/jira/browse/HBASE-13670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601556#comment-14601556 ] Anoop Sam John commented on HBASE-13670: It seems this patch was made incorrectly: it was generated on top of the older patch. It needs to be regenerated freshly. [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired -- Key: HBASE-13670 URL: https://issues.apache.org/jira/browse/HBASE-13670 Project: HBase Issue Type: Improvement Components: documentation, mob Affects Versions: hbase-11339 Reporter: Y. SREENIVASULU REDDY Assignee: Gururaj Shetty Fix For: hbase-11339 Attachments: HBASE-13670.patch, HBASE-13670_01.patch Currently the ExpiredMobFileCleaner cleans expired mob files according to the date in the mob file name. The minimum unit of that date is a day, so the ExpiredMobFileCleaner may clean expired mob files up to one day after they expire. We need to document this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13702: --- Status: Open (was: Patch Available) ImportTsv: Add dry-run functionality and log bad rows - Key: HBASE-13702 URL: https://issues.apache.org/jira/browse/HBASE-13702 Project: HBase Issue Type: New Feature Reporter: Apekshit Sharma Assignee: Apekshit Sharma Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, HBASE-13702.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601491#comment-14601491 ] Anoop Sam John commented on HBASE-13970: A per-store counter means we would have to keep it in a Map or similar, which adds overhead. Even if the integer overflows over the run, that is fine(?): it will just go to negative values. We need only unique Strings, so this should be OK IMO. NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2 Attachments: HBASE-13970.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601533#comment-14601533 ] ramkrishna.s.vasudevan commented on HBASE-13970: What I had in mind was to simply do store.incrementAndGetCounter(), which would return that atomic counter; I was not thinking of a Map. My main concern was only the overflow part (if negative values are fine, then OK). NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2 Attachments: HBASE-13970.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
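The single shared counter discussed in the two comments above can be illustrated in plain Java. On overflow, getAndIncrement wraps from Integer.MAX_VALUE to Integer.MIN_VALUE, so values go negative but remain distinct until the counter completes a full 2^32 cycle, which is all a unique-name scheme needs. The class and method names below are illustrative, not HBase's actual code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the single-counter option: one shared AtomicInteger produces
// distinct compaction names without a per-store Map.
class CompactionNameSketch {
    // Seeded near MAX_VALUE here only to demonstrate the wraparound quickly.
    private static final AtomicInteger NAME_COUNTER =
        new AtomicInteger(Integer.MAX_VALUE - 1);

    static String nextName(String storeName) {
        // getAndIncrement wraps from Integer.MAX_VALUE to Integer.MIN_VALUE;
        // the resulting names are still unique Strings.
        return storeName + "#" + NAME_COUNTER.getAndIncrement();
    }
}
```

Three successive calls starting near the maximum yield a negative suffix on the third name, yet all three names differ.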
[jira] [Commented] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601661#comment-14601661 ] Hadoop QA commented on HBASE-13970: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12741873/HBASE-13970.patch against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a. ATTACHMENT ID: 12741873 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14571//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14571//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14571//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14571//console This message is automatically generated. NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2 Attachments: HBASE-13970.patch
[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer
[ https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601712#comment-14601712 ] Ted Yu commented on HBASE-13969: lgtm Minor comment: {code} if (authTokenSecretMgr != null) { authTokenSecretMgr.stop(); } {code} Set authTokenSecretMgr to null in the above if block after calling stop(). AuthenticationTokenSecretManager is never stopped in RPCServer -- Key: HBASE-13969 URL: https://issues.apache.org/jira/browse/HBASE-13969 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Pankaj Kumar Assignee: Pankaj Kumar Priority: Minor Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13969.patch AuthenticationTokenSecretManager is never stopped in RPCServer. {code} AuthenticationTokenSecretManager mgr = createSecretManager(); if (mgr != null) { setSecretManager(mgr); mgr.start(); } {code} It should be stopped during exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
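The review suggestion amounts to the following pattern: stop the secret manager on server exit and null the field so a repeated stop path cannot invoke it twice. This is a simplified sketch with a stand-in SecretManager interface, not the actual RpcServer code.

```java
// Hedged sketch of the stop-and-null pattern from the review comment above.
// SecretManager and RpcServerSketch are illustrative stand-ins.
class RpcServerSketch {
    interface SecretManager { void start(); void stop(); }

    private SecretManager authTokenSecretMgr;

    void start(SecretManager mgr) {
        if (mgr != null) {
            authTokenSecretMgr = mgr;
            mgr.start();
        }
    }

    void stop() {
        if (authTokenSecretMgr != null) {
            authTokenSecretMgr.stop();
            authTokenSecretMgr = null; // per review: clear after stopping, so stop() is idempotent
        }
    }
}
```

Calling stop() a second time is then a harmless no-op instead of a double shutdown of the manager.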
[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
[ https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13971: - Description: One region server is stuck while flushing (possible deadlock). It has been trying to flush two regions for the last 6 hours (see the screenshot). This occurred while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. There have been ~37 million writes on each regionserver so far, but no writes have happened on any other regionserver for the past 6 hours and their memstore size is zero (I don't know if this is related). But this particular regionserver has had a memstore size of 9 GB for the past 6 hours. Relevant snaps from the debug dump: Tasks: === Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd. Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd Running for 22034s Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390. Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390 Running for 22033s Executors: === ... 
Thread 139 (MemStoreFlusher.1): State: WAITING Blocked count: 139711 Waited count: 239212 Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305) org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422) org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168) org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047) org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011) org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902) org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828) org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510) org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75) org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) java.lang.Thread.run(Thread.java:745) Thread 137 (MemStoreFlusher.0): State: WAITING Blocked count: 138931 Waited count: 237448 Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305) org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422) org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168) org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047) org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011) org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902) org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828) org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510) org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75) org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) java.lang.Thread.run(Thread.java:745) was: One region server stuck while flushing(possible deadlock). Its trying to flush two regions since last 6 hours (see the screenshot). Caused while running
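Both flusher threads in the dump above are parked in WALKey.getSequenceId() on a CountDownLatch that only the WAL's sequence-id assignment releases; if that assignment never happens, the flush threads wait indefinitely. The simplified sketch below illustrates that blocking point, with a timeout added for demonstration. It is not HBase's actual WALKey implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simplified illustration of the latch-based handoff the stack traces show:
// the flusher waits on a latch that the WAL side releases when it assigns
// the sequence id for the key.
class SeqIdLatchSketch {
    private final CountDownLatch assigned = new CountDownLatch(1);
    private volatile long seqId = -1;

    /** WAL side: assign the id and release any waiting flusher. */
    void assign(long id) {
        seqId = id;
        assigned.countDown();
    }

    /** Flusher side: returns -1 on timeout instead of parking forever. */
    long getSequenceId(long timeoutMs) {
        try {
            return assigned.await(timeoutMs, TimeUnit.MILLISECONDS) ? seqId : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

If assign() is never called, the await expires (or, with an unbounded await as in the dump, the thread stays parked), which matches the two MemStoreFlusher threads waiting for hours.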
[jira] [Updated] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer
[ https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankaj Kumar updated HBASE-13969: - Attachment: HBASE-13969-V2.patch Thanks [~tedyu] for reviewing the patch. Added the modified patch. AuthenticationTokenSecretManager is never stopped in RPCServer -- Key: HBASE-13969 URL: https://issues.apache.org/jira/browse/HBASE-13969 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Pankaj Kumar Assignee: Pankaj Kumar Priority: Minor Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13969-V2.patch, HBASE-13969.patch AuthenticationTokenSecretManager is never stopped in RPCServer. {code} AuthenticationTokenSecretManager mgr = createSecretManager(); if (mgr != null) { setSecretManager(mgr); mgr.start(); } {code} It should be stopped during exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
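The fix under review pairs the `mgr.start()` shown in the snippet with a symmetric stop on server exit. A minimal JDK-only sketch of that lifecycle pattern (the `KeyRoller` thread is a hypothetical stand-in for the secret manager's background key-rolling thread, not HBase's actual class):

```java
// Sketch of the lifecycle fix: whatever is start()ed when the RPC server
// comes up must be stop()ped on exit, or its thread outlives the server.
public class SecretManagerLifecycle {
    // Hypothetical stand-in for AuthenticationTokenSecretManager's roller thread.
    static final class KeyRoller extends Thread {
        volatile boolean running = true;
        @Override public void run() {
            while (running) {
                try { Thread.sleep(50); }                 // roll keys periodically
                catch (InterruptedException e) { break; } // stop() interrupts us
            }
        }
    }

    private final KeyRoller roller = new KeyRoller();

    public void start() { roller.start(); }

    // The missing piece in the report: a stop() called from the shutdown path.
    public void stop() {
        roller.running = false;
        roller.interrupt();
        try { roller.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public boolean isRunning() { return roller.isAlive(); }
}
```
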
[jira] [Created] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
Abhilash created HBASE-13971: Summary: Flushes stuck since 6 hours on a regionserver. Key: HBASE-13971 URL: https://issues.apache.org/jira/browse/HBASE-13971 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.3.0 Environment: Caused while running IntegrationTestLoadAndVerify for 20 M rows on cluster with 32 region servers each with max heap size of 24GBs. Reporter: Abhilash One region server is stuck while flushing (possible deadlock). It has been trying to flush two regions for the last 6 hours (see the screenshot). Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. There have been ~37 million writes on each regionserver so far, but no writes are happening on any other regionserver for the past 6 hours and their memstore size is zero (I don't know if this is related). This particular regionserver, however, has had a memstore size of 9 GB for the past 6 hours. Relevant snippets from the debug dump: Tasks: === Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd. Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd Running for 22034s Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390. 
Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390 Running for 22033s
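The WAITING dumps in this thread show both flusher threads parked in `CountDownLatch.await()` inside `WALKey.getSequenceId`: if the WAL never counts the latch down, `await()` parks forever, which matches the 6-hour stall. A small illustration of the mechanism, with a bounded wait (hypothetical here, not HBase's actual code) that would turn the silent hang into a diagnosable failure:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrates the stall: the flusher awaits a latch that is only counted
// down once the WAL assigns a sequence id. A timed await surfaces the hang.
public class LatchStallDemo {
    public static boolean waitForSequenceId(CountDownLatch assigned, long timeoutMs) {
        try {
            // Returns false on timeout instead of parking indefinitely.
            return assigned.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch neverAssigned = new CountDownLatch(1); // nobody counts down
        boolean ok = waitForSequenceId(neverAssigned, 100);
        System.out.println(ok ? "sequence id assigned" : "stalled: no sequence id");
    }
}
```
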
[jira] [Commented] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
[ https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601728#comment-14601728 ] Ted Yu commented on HBASE-13971: Can you attach the complete jstack for the region server ? Region server log would also be helpful. Flushes stuck since 6 hours on a regionserver. -- Key: HBASE-13971 URL: https://issues.apache.org/jira/browse/HBASE-13971 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.3.0 Environment: Caused while running IntegrationTestLoadAndVerify for 20 M rows on cluster with 32 region servers each with max heap size of 24GBs. Reporter: Abhilash Priority: Critical Attachments: screenshot-1.png One region server stuck while flushing(possible deadlock). Its trying to flush two regions since last 6 hours (see the screenshot). Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. ~37 Million writes on each regionserver till now but no writes happening on any regionserver from past 6 hours and their memstore size is zero(I dont know if this is related). But this particular regionserver has memstore size of 9GBs from past 6 hours. Relevant snaps from debug dump: Tasks: === Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd. Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd Running for 22034s Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390. Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390 Running for 22033s Executors: === ... 
[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
[ https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13971: - Description: One region server stuck while flushing(possible deadlock). Its trying to flush two regions since last 6 hours (see the screenshot). Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. ~37 Million writes on each regionserver till now but no writes happening on any regionserver from past 6 hours and their memstore size is zero(I dont know if this is related). But this particular regionserver has memstore size of 9GBs from past 6 hours. Relevant snaps from debug dump: Tasks: === Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd. Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd Running for 22034s Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390. Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390 Running for 22033s Executors: === ... 
was: One region server stuck while flushing(possible deadlock). Its trying to flush two regions since last 6 hours (see the screenshot). Caused while running IntegrationTestLoadAndVerify
[jira] [Updated] (HBASE-13336) Consistent rules for security meta table protections
[ https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Srungarapu updated HBASE-13336: Status: Patch Available (was: In Progress) Consistent rules for security meta table protections Key: HBASE-13336 URL: https://issues.apache.org/jira/browse/HBASE-13336 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Srikanth Srungarapu Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: HBASE-13336.patch, HBASE-13336_v2.patch The AccessController and VisibilityController do different things regarding protecting their meta tables. The AC allows schema changes and disable/enable if the user has permission. The VC unconditionally disallows all admin actions. Generally, bad things will happen if these meta tables are damaged, disabled, or dropped. The likely outcome is random frequent (or constant) server side op failures with nasty stack traces. On the other hand some things like column family and table attribute changes can have valid use cases. We should have consistent and sensible rules for protecting security meta tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
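One possible reconciliation of the two behaviors described above, sketched as a plain policy function (names and the specific rule split are hypothetical, not the patch's actual design): permission-gate reversible schema changes as the AC does, but unconditionally refuse disable/drop as the VC does, since a missing security meta table causes constant server-side failures.

```java
// Hedged sketch of "consistent rules" for security meta tables.
public class SecurityMetaPolicy {
    public enum Action { ALTER_SCHEMA, ENABLE, DISABLE, DROP }

    public static boolean isAllowed(Action action, boolean userHasAdminPermission) {
        switch (action) {
            case DISABLE:
            case DROP:
                return false; // unconditionally refused, as the VC does today
            default:
                return userHasAdminPermission; // permission-gated, as the AC does
        }
    }
}
```
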
[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601647#comment-14601647 ] Ted Yu commented on HBASE-13702: +1 if tests pass. ImportTsv: Add dry-run functionality and log bad rows - Key: HBASE-13702 URL: https://issues.apache.org/jira/browse/HBASE-13702 Project: HBase Issue Type: New Feature Reporter: Apekshit Sharma Assignee: Apekshit Sharma Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch ImportTSV job skips bad records by default (keeps a count though). -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered. To be easily able to determine which rows are corrupted in an input, rather than failing on one row at a time seems like a good feature to have. Moreover, there should be 'dry-run' functionality in such kinds of tools, which can essentially does a quick run of tool without making any changes but reporting any errors/warnings and success/failure. To identify corrupted rows, simply logging them should be enough. In worst case, all rows will be logged and size of logs will be same as input size, which seems fine. However, user might have to do some work figuring out where the logs. Is there some link we can show to the user when the tool starts which can help them with that? For the dry run, we can simply use if-else to skip over writing out KVs, and any other mutations, if present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601685#comment-14601685 ] Lars Hofhansl commented on HBASE-13959: --- Nice find and patch. The 8 seems to come out of nowhere. Do you have numbers for different numbers of threads? Maybe default it to 1/2 of block store file count...? Region splitting takes too long because it uses a single thread in most common cases Key: HBASE-13959 URL: https://issues.apache.org/jira/browse/HBASE-13959 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.12 Reporter: Hari Krishna Dara Assignee: Hari Krishna Dara Fix For: 0.98.14 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png When storefiles need to be split as part of a region split, the current logic uses a threadpool with the size set to the size of the number of stores. Since most common table setup involves only a single column family, this translates to having a single store and so the threadpool is run with a single thread. However, in a write heavy workload, there could be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one, these files end up getting split sequentially. With a bit of tracing, I noticed that it takes on an average of 350ms to create a single reference file, and splitting each storefile involves creating two of these, so with a storefile count of 20, it takes about 14s just to get through this phase alone (2 reference files for each storefile), pushing the total time the region is offline to 18s or more. For environments that are setup to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException. The fix should increase the concurrency of this operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
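The concurrency fix being discussed can be sketched as follows: size the split pool by the number of storefiles rather than the number of stores, so a single-family table with twenty storefiles no longer serializes its ~40 reference-file creations. The pool-sizing rule and the cap below are illustrative stand-ins, not the patch's actual choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: parallelize reference-file creation across storefiles.
public class ParallelSplitSketch {
    public static int splitStoreFiles(int storeFileCount, int maxThreads) {
        // Pool sized by work items, capped, never zero.
        int threads = Math.max(1, Math.min(storeFileCount, maxThreads));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> refs = new ArrayList<>();
            for (int i = 0; i < storeFileCount; i++) {
                // Each storefile yields two reference files (top and bottom half).
                refs.add(pool.submit(() -> 2));
            }
            int created = 0;
            for (Future<Integer> f : refs) created += f.get();
            return created;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With ~350 ms per reference file, 20 storefiles on one thread cost about 14 s in this phase alone; spreading the same work over 8 threads cuts that roughly eightfold.
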
[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
[ https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13971: - Priority: Critical (was: Major) Flushes stuck since 6 hours on a regionserver. -- Key: HBASE-13971 URL: https://issues.apache.org/jira/browse/HBASE-13971 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.3.0 Environment: Caused while running IntegrationTestLoadAndVerify for 20 M rows on cluster with 32 region servers each with max heap size of 24GBs. Reporter: Abhilash Priority: Critical One region server stuck while flushing(possible deadlock). Its trying to flush two regions since last 6 hours (see the screenshot). Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. ~37 Million writes on each regionserver till now but no writes happening on any other regionserver from past 6 hours and their memstore size is zero(I dont know if this is related). But this particular regionserver has memstore size of 9GBs from past 6 hours. Relevant snaps from debug dump: Tasks: === Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd. Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd Running for 22034s Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390. Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390 Running for 22033s Executors: === ... 
[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
[ https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13971: - Attachment: rsDebugDump.txt Flushes stuck since 6 hours on a regionserver. -- Key: HBASE-13971 URL: https://issues.apache.org/jira/browse/HBASE-13971 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.3.0 Environment: Caused while running IntegrationTestLoadAndVerify for 20 M rows on cluster with 32 region servers each with max heap size of 24GBs. Reporter: Abhilash Priority: Critical Attachments: rsDebugDump.txt, screenshot-1.png One region server stuck while flushing(possible deadlock). Its trying to flush two regions since last 6 hours (see the screenshot). Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. ~37 Million writes on each regionserver till now but no writes happening on any regionserver from past 6 hours and their memstore size is zero(I dont know if this is related). But this particular regionserver has memstore size of 9GBs from past 6 hours. Relevant snaps from debug dump: Tasks: === Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd. Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd Running for 22034s Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390. Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390 Running for 22033s Executors: === ... 
[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13702: --- Fix Version/s: 1.3.0 2.0.0 ImportTsv: Add dry-run functionality and log bad rows - Key: HBASE-13702 URL: https://issues.apache.org/jira/browse/HBASE-13702 Project: HBase Issue Type: New Feature Reporter: Apekshit Sharma Assignee: Apekshit Sharma Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch ImportTSV job skips bad records by default (keeps a count though). -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered. To be easily able to determine which rows are corrupted in an input, rather than failing on one row at a time seems like a good feature to have. Moreover, there should be 'dry-run' functionality in such kinds of tools, which can essentially does a quick run of tool without making any changes but reporting any errors/warnings and success/failure. To identify corrupted rows, simply logging them should be enough. In worst case, all rows will be logged and size of logs will be same as input size, which seems fine. However, user might have to do some work figuring out where the logs. Is there some link we can show to the user when the tool starts which can help them with that? For the dry run, we can simply use if-else to skip over writing out KVs, and any other mutations, if present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
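The "use if-else to skip over writing out KVs" idea from the description can be sketched minimally: parse and validate every line, log bad rows either way, and only emit output when not in dry-run mode. Names and the tab-count validation are illustrative, not ImportTsv's actual classes.

```java
import java.util.List;

// Minimal dry-run sketch: validation and bad-row logging always run;
// the write is the only step gated by the dry-run flag.
public class DryRunImportSketch {
    public static int importTsv(List<String> lines, int expectedColumns,
                                boolean dryRun, List<String> badRows,
                                List<String> written) {
        int good = 0;
        for (String line : lines) {
            if (line.split("\t", -1).length != expectedColumns) {
                badRows.add(line);     // log the corrupted row in both modes
                continue;
            }
            good++;
            if (!dryRun) {
                written.add(line);     // skipped entirely under dry-run
            }
        }
        return good;
    }
}
```
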
[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601667#comment-14601667 ] Ted Yu commented on HBASE-13702: There're several hunks in hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java which don't apply on branch-1 Mind providing patch for branch-1 ? Thanks ImportTsv: Add dry-run functionality and log bad rows - Key: HBASE-13702 URL: https://issues.apache.org/jira/browse/HBASE-13702 Project: HBase Issue Type: New Feature Reporter: Apekshit Sharma Assignee: Apekshit Sharma Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch ImportTSV job skips bad records by default (keeps a count though). -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered. To be easily able to determine which rows are corrupted in an input, rather than failing on one row at a time seems like a good feature to have. Moreover, there should be 'dry-run' functionality in such kinds of tools, which can essentially does a quick run of tool without making any changes but reporting any errors/warnings and success/failure. To identify corrupted rows, simply logging them should be enough. In worst case, all rows will be logged and size of logs will be same as input size, which seems fine. However, user might have to do some work figuring out where the logs. Is there some link we can show to the user when the tool starts which can help them with that? For the dry run, we can simply use if-else to skip over writing out KVs, and any other mutations, if present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer
[ https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankaj Kumar updated HBASE-13969: - Fix Version/s: 1.3.0 1.1.2 1.2.0 1.0.2 0.98.14 2.0.0 Affects Version/s: 0.98.13 Status: Patch Available (was: Open) AuthenticationTokenSecretManager is never stopped in RPCServer -- Key: HBASE-13969 URL: https://issues.apache.org/jira/browse/HBASE-13969 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Pankaj Kumar Assignee: Pankaj Kumar Priority: Minor Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13969.patch AuthenticationTokenSecretManager is never stopped in RPCServer. {code} AuthenticationTokenSecretManager mgr = createSecretManager(); if (mgr != null) { setSecretManager(mgr); mgr.start(); } {code} It should be stopped during exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
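The symmetric start/stop pairing the report asks for can be sketched as follows. The interface and class names here are illustrative, not the real RpcServer internals; only the shape of the fix is the point.

```java
// Hedged sketch: keep a reference to the started manager so it can be
// stopped symmetrically when the server exits.
public class SecretManagerLifecycle {
    interface SecretManager {
        void start();
        void stop();
    }

    private SecretManager mgr;

    // Mirrors the quoted snippet: create, set, start.
    void startServer(SecretManager created) {
        if (created != null) {
            mgr = created;
            mgr.start();
        }
    }

    // The missing counterpart: stop the manager during exit.
    void stopServer() {
        if (mgr != null) {
            mgr.stop();
            mgr = null;
        }
    }

    public static void main(String[] args) {
        SecretManagerLifecycle lifecycle = new SecretManagerLifecycle();
        lifecycle.startServer(new SecretManager() {
            public void start() { System.out.println("manager started"); }
            public void stop() { System.out.println("manager stopped"); }
        });
        lifecycle.stopServer();
    }
}
```

Nulling the field after stop makes stopServer() idempotent, so a repeated shutdown call cannot stop the manager twice.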
[jira] [Updated] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV
[ https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13835: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 1.3.0 1.1.2 1.2.0 1.0.2 0.98.14 2.0.0 Status: Resolved (was: Patch Available) Pushed to 0.98+ branches. Thanks for the patch [~sinago] KeyValueHeap.current might be in heap when exception happens in pollRealKV -- Key: HBASE-13835 URL: https://issues.apache.org/jira/browse/HBASE-13835 Project: HBase Issue Type: Bug Components: Scanners Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, HBASE-13835_branch-1.patch In a 0.94 HBase cluster, we found an NPE with the following stack: {code} Exception in thread regionserver21600.leaseChecker java.lang.NullPointerException at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191) at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641) at java.util.PriorityQueue.siftDown(PriorityQueue.java:612) at java.util.PriorityQueue.poll(PriorityQueue.java:523) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241) at org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302) at 
org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033) at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119) at java.lang.Thread.run(Thread.java:662) {code} Before this NPE, an exception happened in pollRealKV, which we think is the culprit of the NPE. {code} ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for reader reader= at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057) at org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337) at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583) {code} Simply put, if an exception happens in pollRealKV(), KeyValueHeap.current might still be in the heap. Later on, when KeyValueHeap.close() is called, current is closed first. However, since it might still be in the heap, it would either be closed again or its peek() (which is null after it is closed) would be passed to the comparator, triggering the NPE above.
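The hazard can be reproduced in miniature with a plain java.util.PriorityQueue. The Scanner class below is a stand-in for a KeyValueScanner, and removing `current` from the heap before closing is one defensive direction under that assumption, not necessarily the committed fix.

```java
import java.util.PriorityQueue;

// Illustrative sketch: if the scanner held in `current` is also left inside
// the priority queue (the bad state the report describes), a naive close()
// would visit the same scanner twice.
public class HeapCloseSketch {
    static class Scanner implements Comparable<Scanner> {
        final int key;
        int closeCount = 0;  // lets us observe double closes
        Scanner(int key) { this.key = key; }
        void close() { closeCount++; }
        public int compareTo(Scanner o) { return Integer.compare(key, o.key); }
    }

    // Close `current` first, then drain the heap -- but defensively remove
    // `current` from the heap in case an earlier exception left it there.
    static void safeClose(Scanner current, PriorityQueue<Scanner> heap) {
        if (current != null) {
            heap.remove(current);  // guard: prevents closing it a second time
            current.close();
        }
        Scanner s;
        while ((s = heap.poll()) != null) {
            s.close();
        }
    }
}
```

Draining with poll() also avoids comparator calls against an already-closed scanner, which is what produced the NullPointerException in the quoted stack.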
[jira] [Issue Comment Deleted] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-13959: -- Comment: was deleted (was: Can't there be other region splits that could go on in parallel? [~apurtell] has suggestions on using shared executor pools with larger scope, which would scale and perform better than sizing this thread pool proportional to some metric related to the current region.) Region splitting takes too long because it uses a single thread in most common cases Key: HBASE-13959 URL: https://issues.apache.org/jira/browse/HBASE-13959 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.12 Reporter: Hari Krishna Dara Assignee: Hari Krishna Dara Fix For: 0.98.14 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png When storefiles need to be split as part of a region split, the current logic uses a threadpool with its size set to the number of stores. Since the most common table setup involves only a single column family, this translates to a single store, and so the threadpool runs with a single thread. However, in a write-heavy workload, there could be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one, these files end up getting split sequentially. With a bit of tracing, I noticed that it takes on average 350ms to create a single reference file, and splitting each storefile involves creating two of these, so with a storefile count of 20, it takes about 14s just to get through this phase alone (2 reference files for each storefile), pushing the total time the region is offline to 18s or more. For environments that are set up to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException. The fix should increase the concurrency of this operation. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-13959: -- Comment: was deleted (was: I am able to apply the patch (created with `git format-patch`) with -p1; not sure what is wrong. I will attach it again, generated with `git diff`.) Region splitting takes too long because it uses a single thread in most common cases Key: HBASE-13959 URL: https://issues.apache.org/jira/browse/HBASE-13959 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.12 Reporter: Hari Krishna Dara Assignee: Hari Krishna Dara Fix For: 0.98.14 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png When storefiles need to be split as part of a region split, the current logic uses a threadpool with its size set to the number of stores. Since the most common table setup involves only a single column family, this translates to a single store, and so the threadpool runs with a single thread. However, in a write-heavy workload, there could be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one, these files end up getting split sequentially. With a bit of tracing, I noticed that it takes on average 350ms to create a single reference file, and splitting each storefile involves creating two of these, so with a storefile count of 20, it takes about 14s just to get through this phase alone (2 reference files for each storefile), pushing the total time the region is offline to 18s or more. For environments that are set up to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException. The fix should increase the concurrency of this operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
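One possible direction for the fix described above, sketched under the assumption that sizing the pool by total storefiles (capped by a configurable thread limit) is acceptable. The method name and formula are illustrative, not the committed patch.

```java
// Hypothetical sketch: size the reference-file threadpool by the work to do
// (total storefiles across stores) rather than by the number of stores, so a
// single-column-family region with many storefiles still splits in parallel.
public class SplitPoolSizing {
    static int poolSize(int numStores, int totalStorefiles, int maxThreads) {
        // At least one thread per store, scaled up to the number of
        // storefiles, but never beyond the configured cap.
        int byWork = Math.max(numStores, totalStorefiles);
        return Math.max(1, Math.min(byWork, maxThreads));
    }

    public static void main(String[] args) {
        // The report's scenario: one store, 20 storefiles, cap of 8.
        // The old scheme would use 1 thread; this sketch uses 8.
        System.out.println(SplitPoolSizing.poolSize(1, 20, 8));
    }
}
```

With 20 storefiles and 8 threads, the roughly 40 reference-file creations at ~350ms each overlap instead of running back to back, which is the concurrency increase the report asks for. A shared, larger-scope executor pool (as suggested in the deleted comment) is the alternative design.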
[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
[ https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13971: - Attachment: screenshot-1.png Flushes stuck since 6 hours on a regionserver. -- Key: HBASE-13971 URL: https://issues.apache.org/jira/browse/HBASE-13971 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.3.0 Environment: Caused while running IntegrationTestLoadAndVerify for 20 M rows on a cluster with 32 region servers, each with a max heap size of 24GB. Reporter: Abhilash Priority: Critical Attachments: screenshot-1.png One region server is stuck while flushing (possible deadlock). It has been trying to flush two regions for the last 6 hours (see the screenshot). Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. ~37 million writes on each regionserver so far, but no writes happening on any other regionserver for the past 6 hours, and their memstore size is zero (I don't know if this is related). But this particular regionserver has had a memstore size of 9GB for the past 6 hours. Relevant snaps from debug dump: Tasks: === Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd. Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd Running for 22034s Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390. Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390 Running for 22033s Executors: === ... 
Thread 139 (MemStoreFlusher.1): State: WAITING Blocked count: 139711 Waited count: 239212 Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305) org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422) org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168) org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047) org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011) org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902) org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828) org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510) org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75) org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) java.lang.Thread.run(Thread.java:745) Thread 137 (MemStoreFlusher.0): State: WAITING Blocked count: 138931 Waited count: 237448 Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305) org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422) org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168) org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047) org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011) org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
[jira] [Commented] (HBASE-13864) HColumnDescriptor should parse the output from master and from describe for ttl
[ https://issues.apache.org/jira/browse/HBASE-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601739#comment-14601739 ] Ashu Pachauri commented on HBASE-13864: --- The test failures don't seem related to the change. HColumnDescriptor should parse the output from master and from describe for ttl --- Key: HBASE-13864 URL: https://issues.apache.org/jira/browse/HBASE-13864 Project: HBase Issue Type: Bug Components: shell Reporter: Elliott Clark Assignee: Ashu Pachauri Attachments: HBASE-13864-1.patch, HBASE-13864-2.patch, HBASE-13864-3.patch, HBASE-13864.patch The TTL printing on HColumnDescriptor adds a human-readable time. When using that string for the create command it throws an error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
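A minimal sketch of the kind of parsing the issue asks for: recover the leading number of seconds from the human-readable TTL string so the value can round-trip back into a create command. Both the class and the example string format (e.g. "604800 SECONDS (7 DAYS)") are assumptions for illustration, not the actual HColumnDescriptor output contract.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: accept either a bare number or a number followed by a
// human-readable suffix, and return the TTL in seconds.
public class TtlParseSketch {
    private static final Pattern LEADING_DIGITS = Pattern.compile("^\\s*(\\d+)");

    static long parseTtlSeconds(String value) {
        Matcher m = LEADING_DIGITS.matcher(value);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a parseable TTL: " + value);
        }
        return Long.parseLong(m.group(1));  // ignore any trailing annotation
    }

    public static void main(String[] args) {
        // Assumed describe-style output; a bare "123" also parses.
        System.out.println(TtlParseSketch.parseTtlSeconds("604800 SECONDS (7 DAYS)"));
    }
}
```

Accepting the annotated form makes describe output safe to paste back into the shell, which is the round-trip the bug report is about.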
[jira] [Updated] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-13959: -- Priority: Critical (was: Major) Region splitting takes too long because it uses a single thread in most common cases Key: HBASE-13959 URL: https://issues.apache.org/jira/browse/HBASE-13959 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.12 Reporter: Hari Krishna Dara Assignee: Hari Krishna Dara Priority: Critical Fix For: 0.98.14 Attachments: 13959-suggest.txt, HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png When storefiles need to be split as part of a region split, the current logic uses a threadpool with its size set to the number of stores. Since the most common table setup involves only a single column family, this translates to a single store, and so the threadpool runs with a single thread. However, in a write-heavy workload, there could be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one, these files end up getting split sequentially. With a bit of tracing, I noticed that it takes on average 350ms to create a single reference file, and splitting each storefile involves creating two of these, so with a storefile count of 20, it takes about 14s just to get through this phase alone (2 reference files for each storefile), pushing the total time the region is offline to 18s or more. For environments that are set up to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException. The fix should increase the concurrency of this operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)