[jira] [Updated] (HBASE-8558) When a regionserver dies, a client performing put operations hangs
[ https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HBASE-8558:
-----------------------------
    Attachment: HBASE-8558-0.94.txt

When a regionserver dies, a client performing put operations hangs
------------------------------------------------------------------
                Key: HBASE-8558
                URL: https://issues.apache.org/jira/browse/HBASE-8558
            Project: HBase
         Issue Type: Bug
         Components: Client
   Affects Versions: 0.94.5, 0.94.14
           Reporter: wanbin
        Attachments: HBASE-8558-0.94.txt

I ran jstack on the client host. The result is below:

{code}
"hbase-tablepool-60-thread-34" daemon prio=10 tid=0x7f1e65a48000 nid=0x5173 runnable [0x579cc000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x000758cb0780> (a sun.nio.ch.Util$2)
	- locked <0x000758cb0770> (a java.util.Collections$UnmodifiableSet)
	- locked <0x000758cb0548> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	- locked <0x000754e978a0> (a java.io.BufferedOutputStream)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
	- locked <0x000754e97880> (a java.io.DataOutputStream)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
	at $Proxy13.multi(Unknown Source)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
	at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}

This thread has hung for an hour. Meanwhile, another thread is trying to close the connection:

{code}
"IPC Client (1983049639) connection to dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin" daemon prio=10 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry [0x4bc0f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	- waiting to lock <0x000754e978a0> (a java.io.BufferedOutputStream)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
	at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
	at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.close(HBaseClient.java:715)
	- locked <0x000754e7b818> (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:587)
{code}

dump002030.cm6.tbsite.net is a dead regionserver. Reading the HBase source code, I discovered that connection.out is created without a timeout:

{code}
this.out = new DataOutputStream(new BufferedOutputStream(NetUtils.getOutputStream(socket)));
{code}

As far as I can see, this means epoll_wait will block indefinitely.

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
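The mechanism behind the hang is that Hadoop's SocketOutputStream waits for writability with a selector, and a timeout of 0 means "wait forever". As a hedged plain-Java sketch of that selector-based timed write (the helper below is illustrative and self-contained, not Hadoop's SocketIOWithTimeout, though it follows the same idea):

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

class TimedWrite {
    /**
     * Write buf to ch, waiting at most timeoutMs for the socket to become
     * writable. Throws SocketTimeoutException instead of hanging forever.
     */
    static void writeWithTimeout(SocketChannel ch, ByteBuffer buf, long timeoutMs)
            throws IOException {
        ch.configureBlocking(false);
        try (Selector sel = Selector.open()) {
            SelectionKey key = ch.register(sel, SelectionKey.OP_WRITE);
            while (buf.hasRemaining()) {
                // Note: select(0) would block indefinitely -- that is exactly
                // the behavior the report describes for a zero timeout.
                if (sel.select(timeoutMs) == 0) {
                    throw new SocketTimeoutException(
                        "write not ready within " + timeoutMs + " ms");
                }
                sel.selectedKeys().clear();
                ch.write(buf);
            }
            key.cancel();
        }
    }
}
```

With a healthy peer the helper completes immediately; only when the peer stops draining the socket does the deadline matter.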
[jira] [Assigned] (HBASE-8558) When a regionserver dies, a client performing put operations hangs
[ https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie reassigned HBASE-8558:
--------------------------------
    Assignee: Liang Xie
[jira] [Updated] (HBASE-8558) When a regionserver dies, a client performing put operations hangs
[ https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HBASE-8558:
-----------------------------
    Affects Version/s: 0.94.14
               Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-8558) Add timeout limit for HBaseClient dataOutputStream
[ https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HBASE-8558:
-----------------------------
    Summary: Add timeout limit for HBaseClient dataOutputStream (was: When a regionserver dies, a client performing put operations hangs)
[jira] [Commented] (HBASE-8558) Add timeout limit for HBaseClient dataOutputStream
[ https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853801#comment-13853801 ]

Liang Xie commented on HBASE-8558:
----------------------------------
Thanks [~wanbin] for your detailed report! The current implementation uses a default timeout of 0, so it really needs an explicit setting :) Right now this is only a 0.94 branch issue; all 0.96+ branches already use the equivalent pattern:

{code}
NetUtils.getOutputStream(socket, pingInterval);
{code}

[~lhofhansl], I didn't add or run any test case, but the change is small, so I think it's OK :)
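Passing a positive timeout is exactly what prevents the original stall. As a hedged plain-Java illustration (the helper and its names are hypothetical, not the HBaseClient code), a writer with a selector deadline fails fast when its peer stops reading, instead of hanging in epoll_wait:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class StalledWriteDemo {
    /** Returns true if the whole buffer was written, false on timeout. */
    static boolean tryWrite(SocketChannel ch, ByteBuffer buf, long timeoutMs)
            throws IOException {
        ch.configureBlocking(false);
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_WRITE);
            while (buf.hasRemaining()) {
                if (sel.select(timeoutMs) == 0) {
                    return false;   // peer stalled: give up instead of hanging
                }
                sel.selectedKeys().clear();
                ch.write(buf);
            }
            return true;
        }
    }

    /** Simulate a dead peer: it accepts the connection but never reads. */
    static boolean writeToDeadPeer() throws IOException {
        try (ServerSocketChannel ss = ServerSocketChannel.open()) {
            ss.setOption(StandardSocketOptions.SO_RCVBUF, 4096);
            ss.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open()) {
                client.setOption(StandardSocketOptions.SO_SNDBUF, 4096);
                client.connect(ss.getLocalAddress());
                try (SocketChannel peer = ss.accept()) {
                    // Far more data than the small socket buffers can absorb,
                    // so the write must eventually stall.
                    ByteBuffer big = ByteBuffer.allocate(2 * 1024 * 1024);
                    return tryWrite(client, big, 200);
                }
            }
        }
    }
}
```

With a zero timeout the same writer would sit in the selector forever, which is the hour-long hang seen in the jstack output above.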
[jira] [Created] (HBASE-10213) Add read log size per second metrics for replication source
cuijianwei created HBASE-10213:
-------------------------------
            Summary: Add read log size per second metrics for replication source
                Key: HBASE-10213
                URL: https://issues.apache.org/jira/browse/HBASE-10213
            Project: HBase
         Issue Type: Improvement
         Components: metrics, Replication
   Affects Versions: 0.94.14
           Reporter: cuijianwei
           Priority: Minor

The current replication source metrics include logEditsReadRate, shippedBatchesRate, etc., which indicate to some extent how fast data is replicated to the peer cluster. However, these metrics do not make clear how many bytes are being replicated. In a production environment it may be important to know the size of the data replicated per second, because services may be affected if the network becomes busy.
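A bytes-per-second gauge of the kind proposed here can be sketched as follows (class and method names are illustrative, not the actual HBase metrics API):

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Minimal sketch of a bytes-per-second rate metric: readers accumulate a
 * byte count, and the metrics system periodically snapshots and resets it.
 */
class ByteRate {
    private final AtomicLong bytes = new AtomicLong();
    private long lastSnapshotNanos = System.nanoTime();

    /** Called by the log reader for every batch of entries read. */
    void inc(long n) {
        bytes.addAndGet(n);
    }

    /** Called once per reporting period; resets the measurement window. */
    synchronized double snapshotBytesPerSecond() {
        long now = System.nanoTime();
        double seconds = (now - lastSnapshotNanos) / 1e9;
        lastSnapshotNanos = now;
        long n = bytes.getAndSet(0);           // start a fresh window
        return seconds > 0 ? n / seconds : 0.0;
    }
}
```

The reset-on-snapshot design means each reading reflects only the traffic since the previous report, which is what an operator watching network load actually wants.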
[jira] [Updated] (HBASE-10213) Add read log size per second metrics for replication source
[ https://issues.apache.org/jira/browse/HBASE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cuijianwei updated HBASE-10213:
-------------------------------
    Attachment: HBASE-10213-0.94-v1.patch

This patch adds a metric 'logReadRateInByte' to show how many bytes are read by the source per second.
[jira] [Updated] (HBASE-10213) Add read log size per second metrics for replication source
[ https://issues.apache.org/jira/browse/HBASE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HBASE-10213:
------------------------------
    Assignee: cuijianwei
      Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-7781) Update security unit tests to use a KDC if available
[ https://issues.apache.org/jira/browse/HBASE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853857#comment-13853857 ]

ramkrishna.s.vasudevan commented on HBASE-7781:
-----------------------------------------------
Before proceeding with this JIRA, I went through what is given in all the JIRAs mentioned here. HADOOP-8078 also tries to start ApacheDS, but it seems to be an older version. HADOOP-9848 introduces MiniKDC into the Hadoop project itself as a module. So would we also be introducing MiniKDC on the HBase side, with all security test cases running it along with the cluster? And will the MiniKDC available in HBase be a separate module (like in Hadoop), or a class that just allows starting a MiniKDC?

Update security unit tests to use a KDC if available
----------------------------------------------------
                Key: HBASE-7781
                URL: https://issues.apache.org/jira/browse/HBASE-7781
            Project: HBase
         Issue Type: Test
         Components: security, test
           Reporter: Gary Helmling
           Assignee: ramkrishna.s.vasudevan
           Priority: Blocker
            Fix For: 0.98.0

We currently have large holes in the test coverage of HBase with security enabled. Two recent examples of bugs which really should have been caught with testing are HBASE-7771 and HBASE-7772. The long-standing problem with testing with security enabled has been the requirement for supporting Kerberos infrastructure. We need to close this gap and provide some automated testing with security enabled, if necessary standing up and provisioning a temporary KDC as an option for running integration tests; see HADOOP-8078 and HADOOP-9004, where a similar approach was taken.
[jira] [Created] (HBASE-10214) Regionserver shuts down improperly and leaves the dir in .old undeleted
binlijin created HBASE-10214:
-----------------------------
            Summary: Regionserver shuts down improperly and leaves the dir in .old undeleted
                Key: HBASE-10214
                URL: https://issues.apache.org/jira/browse/HBASE-10214
            Project: HBase
         Issue Type: Bug
           Reporter: binlijin

RegionServer log:

{code}
2013-12-18 15:17:45,771 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 51b27391410efdca841db264df46085f
2013-12-18 15:17:45,776 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at null
2013-12-18 15:17:48,776 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster shutdown set and not carrying any regions
2013-12-18 15:17:48,776 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server node,60020,1384410974572: Unhandled exception: null
java.lang.NullPointerException
	at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
	at java.lang.Thread.run(Thread.java:662)
{code}
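The log ("Connected to master at null" followed by an NPE in tryRegionServerReport) suggests the periodic report loop dereferences a master connection that is null during cluster shutdown. A minimal sketch of the guard such a loop needs (the names here are illustrative, not the actual HRegionServer code):

```java
/**
 * Sketch of the failure mode: a report loop that dereferences its master
 * handle without a null check throws NPE when no master is reachable at
 * shutdown. Reading the volatile field once and checking it avoids that.
 */
class ReportLoop {
    interface Master { void report(String load); }

    private volatile Master master;   // null while (re)connecting or at shutdown

    void setMaster(Master m) { this.master = m; }

    /** Returns true if a report was actually sent. */
    boolean tryReport(String load) {
        Master m = master;            // read once: field may be nulled concurrently
        if (m == null) {
            return false;             // skip this cycle instead of throwing NPE
        }
        m.report(load);
        return true;
    }
}
```

Skipping a report cycle is harmless; aborting the regionserver on an unhandled NPE is what leaves the .old directory behind.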
[jira] [Updated] (HBASE-10214) Regionserver shuts down improperly and leaves the dir in .old undeleted
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

binlijin updated HBASE-10214:
-----------------------------
    Attachment: HBASE-10214.patch
[jira] [Updated] (HBASE-10214) Regionserver shuts down improperly and leaves the dir in .old undeleted
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

binlijin updated HBASE-10214:
-----------------------------
    Attachment: HBASE-10214-94.patch
[jira] [Commented] (HBASE-10214) Regionserver shuts down improperly and leaves the dir in .old undeleted
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853876#comment-13853876 ]

binlijin commented on HBASE-10214:
----------------------------------
It looks like trunk doesn't have this problem; the patch is based on the 0.94 branch.
[jira] [Updated] (HBASE-10214) RegionServer shuts down improperly and leaves the dir in .old undeleted.
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-10214: - Attachment: (was: HBASE-10214.patch)
[jira] [Updated] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10161: --- Status: Open (was: Patch Available) [AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch AccessController fixes for the issue also affecting VisibilityController described in HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post-recovery upcalls and modify existing CPs to defer initialization to this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
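The deferral described above can be sketched as a small state machine: initialize from postOpen only when the region is not recovering, and otherwise wait for the proposed post-recovery hook. The class, flag, and hook names below are illustrative, not the actual coprocessor API or the attached patch.

```java
// Illustrative sketch of a coprocessor that tolerates regions in recovery.
public class RecoveryAwareCoprocessor {
    private boolean initialized = false;

    /** Called from the postOpen upcall. */
    public void postOpen(boolean regionRecovering) {
        if (regionRecovering) {
            // Reads against the region would be inconsistent during log
            // replay; wait for the post-recovery hook instead.
            return;
        }
        initialize();
    }

    /** Called from the (proposed) post-recovery hook once replay completes. */
    public void postRecoveryComplete() {
        if (!initialized) {
            initialize();
        }
    }

    private void initialize() { initialized = true; }

    public boolean isInitialized() { return initialized; }
}
```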
[jira] [Updated] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10161: --- Attachment: (was: HBASE-10161_V2.patch)
[jira] [Updated] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10161: --- Attachment: HBASE-10161_V2.patch
[jira] [Updated] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10161: --- Status: Patch Available (was: Open)
[jira] [Created] (HBASE-10215) TableNotFoundException should be thrown after removing stale znode in ETH
rajeshbabu created HBASE-10215: -- Summary: TableNotFoundException should be thrown after removing stale znode in ETH Key: HBASE-10215 URL: https://issues.apache.org/jira/browse/HBASE-10215 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.14, 0.96.1 Reporter: rajeshbabu Assignee: rajeshbabu Priority: Minor Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Suppose the master goes down while creating a table: the table's znode is left in the ENABLING state, and the master recovers such tables on restart even when there are no META entries for the table. While recovering the table we check whether it exists in META, and if not we remove the znode. After removing the znode we need to throw TableNotFoundException. Presently the exception is not thrown, so the znode is recreated and stays stale forever: we cannot delete it even on master restart, and we cannot create a table with the same name either.
{code}
// Check if table exists
if (!MetaReader.tableExists(catalogTracker, tableName)) {
  // retainAssignment is true only during recovery. In normal case it is false
  if (!this.skipTableStateCheck) {
    throw new TableNotFoundException(tableName);
  }
  try {
    this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
  } catch (KeeperException e) {
    // TODO : Use HBCK to clear such nodes
    LOG.warn("Failed to delete the ENABLING node for the table " + tableName
        + ". The table will remain unusable. Run HBCK to manually fix the problem.");
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
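The control flow being proposed — remove the stale znode, then still throw — can be sketched with ZooKeeper mocked out as a plain set. Everything here (the class name, the simplified exception, the set standing in for znodes) is an illustrative stand-in for the EnableTableHandler code quoted above, not the actual patch.

```java
import java.util.HashSet;
import java.util.Set;

public class EnableTableHandlerSketch {
    static class TableNotFoundException extends Exception {
        TableNotFoundException(String table) { super(table); }
    }

    // Stand-in for the ENABLING znodes kept under ZooKeeper.
    static final Set<String> enablingZnodes = new HashSet<>();

    static void prepare(String tableName, boolean existsInMeta,
                        boolean skipTableStateCheck) throws TableNotFoundException {
        if (!existsInMeta) {
            if (!skipTableStateCheck) {
                // Normal (non-recovery) path: fail immediately.
                throw new TableNotFoundException(tableName);
            }
            // Recovery path: clear the stale ENABLING znode first...
            enablingZnodes.remove(tableName);
            // ...and, per the proposed fix, throw anyway so the caller stops
            // and never recreates the znode for a table absent from META.
            throw new TableNotFoundException(tableName);
        }
        // Table exists in META: proceed with enabling as usual.
    }
}
```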
[jira] [Commented] (HBASE-10175) 2-thread ChaosMonkey steps on its own toes
[ https://issues.apache.org/jira/browse/HBASE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853931#comment-13853931 ] Hadoop QA commented on HBASE-10175: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619725/HBASE-10175.patch against trunk revision . ATTACHMENT ID: 12619725 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 21 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestAccessController Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8239//console This message is automatically generated. 
2-thread ChaosMonkey steps on its own toes -- Key: HBASE-10175 URL: https://issues.apache.org/jira/browse/HBASE-10175 Project: HBase Issue Type: Improvement Components: test Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HBASE-10175.patch A ChaosMonkey with one destructive thread and one volatility (flush-compact-split, etc.) thread steps on its own toes and logs a lot of exceptions. A simple solution would be to catch most (or all) of them, like NotServingRegionException, and log less (not a full call stack, for example; it's not very useful anyway). A more complicated, complementary one would be to keep track of which regions the destructive thread affects and use other regions for the volatile one. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
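The "catch and log less" idea can be sketched as a tiny classifier for action failures. NotServingRegionException is stubbed locally here rather than taken from HBase, so this is only an illustration of the proposed logging policy, not the ChaosMonkey code itself.

```java
// Sketch of quieting expected exceptions in a two-thread ChaosMonkey.
public class ChaosLogging {
    // Local stub; the real class is org.apache.hadoop.hbase.NotServingRegionException.
    static class NotServingRegionException extends RuntimeException {
        NotServingRegionException(String m) { super(m); }
    }

    /** Returns the log line to emit for an action failure. */
    static String describeFailure(Throwable t) {
        if (t instanceof NotServingRegionException) {
            // Expected when the destructive thread just killed or moved the
            // region this action targeted: one line, no stack trace.
            return "Action skipped (expected): " + t.getMessage();
        }
        // Unexpected failures keep the full class name for debugging.
        return "Action failed: " + t.getClass().getName() + ": " + t.getMessage();
    }
}
```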
[jira] [Updated] (HBASE-10215) TableNotFoundException should be thrown after removing stale znode in ETH
[ https://issues.apache.org/jira/browse/HBASE-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-10215: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-10215) TableNotFoundException should be thrown after removing stale znode in ETH
[ https://issues.apache.org/jira/browse/HBASE-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-10215: --- Attachment: HBASE-10215.patch Patch for trunk. Please review.
[jira] [Commented] (HBASE-10213) Add read log size per second metrics for replication source
[ https://issues.apache.org/jira/browse/HBASE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853941#comment-13853941 ] Hadoop QA commented on HBASE-10213: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619783/HBASE-10213-0.94-v1.patch against trunk revision . ATTACHMENT ID: 12619783 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8242//console This message is automatically generated. Add read log size per second metrics for replication source --- Key: HBASE-10213 URL: https://issues.apache.org/jira/browse/HBASE-10213 Project: HBase Issue Type: Improvement Components: metrics, Replication Affects Versions: 0.94.14 Reporter: cuijianwei Assignee: cuijianwei Priority: Minor Attachments: HBASE-10213-0.94-v1.patch The current metrics of the replication source include logEditsReadRate, shippedBatchesRate, etc., which indicate to some extent how fast data is replicated to the peer cluster. However, these metrics do not make clear how many bytes are being replicated. In a production environment it may be important to know the size of the data replicated per second, because services may be affected if the network becomes busy. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
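A bytes-per-second replication metric of the kind requested can be sketched as a simple accumulate-and-snapshot counter. The class below is an illustrative stand-in for the 0.94 metrics helpers (it is not the MetricsRate class from the patch), with the clock passed in explicitly so the math is visible.

```java
// Sketch of a "log read size per second" metric for a replication source.
public class ReplicationSourceMetricsSketch {
    private long bytesAccumulated = 0;
    private long lastSnapshotMs;

    public ReplicationSourceMetricsSketch(long nowMs) {
        this.lastSnapshotMs = nowMs;
    }

    /** Call with the serialized size of each WALEdit read from the log. */
    public void incrLogReadSize(long bytes) {
        bytesAccumulated += bytes;
    }

    /** Periodic snapshot: returns bytes/sec since the last snapshot, then resets. */
    public double snapshotRate(long nowMs) {
        long elapsedMs = Math.max(1, nowMs - lastSnapshotMs);
        double rate = bytesAccumulated * 1000.0 / elapsedMs;
        bytesAccumulated = 0;
        lastSnapshotMs = nowMs;
        return rate;
    }
}
```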
[jira] [Commented] (HBASE-8859) truncate_preserve should get table split keys as it is instead of converting them to string type and then again to bytes
[ https://issues.apache.org/jira/browse/HBASE-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853945#comment-13853945 ] Hadoop QA commented on HBASE-8859: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619748/HBASE-8859_trunk_4.patch against trunk revision . ATTACHMENT ID: 12619748 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestAccessController Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//console This message is automatically generated. 
truncate_preserve should get table split keys as it is instead of converting them to string type and then again to bytes Key: HBASE-8859 URL: https://issues.apache.org/jira/browse/HBASE-8859 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.95.1 Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.0, 0.99.0 Attachments: HBASE-8859-Test_to_reproduce.patch, HBASE-8859_trunk.patch, HBASE-8859_trunk_2.patch, HBASE-8859_trunk_3.patch, HBASE-8859_trunk_4.patch If we use int, long, or double bytes as split keys, we do not recreate the table with the same split keys, because converting them directly to strings and then back to bytes yields different split keys; sometimes we get an IllegalArgumentException because the (converted) split keys collide. Instead, we can get the split keys directly from HTable and pass them while creating the table.
{code}
h_table = org.apache.hadoop.hbase.client.HTable.new(conf, table_name)
splits = h_table.getRegionLocations().keys().map{|i| i.getStartKey}
splits = org.apache.hadoop.hbase.util.Bytes.toByteArrays(splits)
{code}
{code}
Truncating 'emp3' table (it may take a while):
- Disabling table...
- Dropping table...
- Creating table with region boundaries...
ERROR: java.lang.IllegalArgumentException: All split keys must be unique, found duplicate:
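The core of the bug is that a String round trip is lossy for arbitrary key bytes: invalid UTF-8 sequences collapse to the replacement character, so two distinct split keys can become identical after the conversion, which is exactly the "All split keys must be unique" failure shown in the error output above. A minimal, self-contained demonstration:

```java
import java.nio.charset.StandardCharsets;

// Demonstrates why converting arbitrary split-key bytes to a String and
// back is lossy: invalid UTF-8 bytes decode to U+FFFD (the replacement
// character), erasing the difference between keys.
public class SplitKeyRoundTrip {
    static byte[] roundTrip(byte[] key) {
        return new String(key, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }
}
```

Passing the raw byte arrays straight through (as the patch does via Bytes.toByteArrays on the keys read from HTable) avoids the character decoding entirely.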
[jira] [Commented] (HBASE-8558) Add timeout limit for HBaseClient dataOutputStream
[ https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853950#comment-13853950 ] Hadoop QA commented on HBASE-8558: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619777/HBASE-8558-0.94.txt against trunk revision . ATTACHMENT ID: 12619777 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8244//console This message is automatically generated. Add timeout limit for HBaseClient dataOutputStream -- Key: HBASE-8558 URL: https://issues.apache.org/jira/browse/HBASE-8558 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.5, 0.94.14 Reporter: wanbin Assignee: Liang Xie Attachments: HBASE-8558-0.94.txt I ran jstack on the client host. The result is below.
hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 nid=0x5173 runnable [0x579cc000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked 0x000758cb0780 (a sun.nio.ch.Util$2)
	- locked 0x000758cb0770 (a java.util.Collections$UnmodifiableSet)
	- locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	- locked 0x000754e978a0 (a java.io.BufferedOutputStream)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
	- locked 0x000754e97880 (a java.io.DataOutputStream)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
	at $Proxy13.multi(Unknown Source)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
	at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
	at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
This thread has hung for one hour. Meanwhile, another thread is trying to close the connection:
IPC Client (1983049639) connection to dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry [0x4bc0f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	- waiting to lock 0x000754e978a0 (a java.io.BufferedOutputStream)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
	at
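The hang is possible because Socket.setSoTimeout only bounds reads; a write blocked on an unresponsive peer can wait indefinitely unless the output stream is built with a write timeout (in Hadoop, SocketIOWithTimeout / NetUtils.getOutputStream(socket, timeout) serve this purpose, and the issue title suggests wiring that into HBaseClient's dataOutputStream). Below is a standalone sketch of the underlying technique — a non-blocking write bounded by a selector deadline — not the attached patch:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Sketch of a bounded-time socket write using NIO. Returns false if the
// deadline passes before the whole buffer is written, instead of blocking
// forever the way a plain SocketOutputStream write can.
public class TimedWrite {
    public static boolean writeWithTimeout(SocketChannel ch, ByteBuffer buf, long timeoutMs)
            throws IOException {
        ch.configureBlocking(false);
        long deadline = System.currentTimeMillis() + timeoutMs;
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_WRITE);
            while (buf.hasRemaining()) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // timed out mid-write; caller can tear down the connection
                }
                if (sel.select(remaining) > 0) {
                    sel.selectedKeys().clear();
                    ch.write(buf); // may be a partial write; loop until drained
                }
            }
        }
        return true;
    }
}
```

A small write to a healthy peer completes immediately; against a peer whose TCP window is full (as in the jstack above), the method returns false once the deadline passes rather than holding the stream lock for an hour.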
[jira] [Commented] (HBASE-10214) RegionServer shuts down improperly and leaves the dir in .old undeleted.
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853952#comment-13853952 ] Jean-Marc Spaggiari commented on HBASE-10214: - Hi [~aoxiang], which HBase version did you try with? The trace doesn't seem to align with a recent one.
[jira] [Commented] (HBASE-10214) RegionServer shuts down improperly and leaves the dir in .old undeleted.
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853959#comment-13853959 ] binlijin commented on HBASE-10214: -- [~jmspaggi], I use 0.94.10; this patch is for the 0.94 branch.
[jira] [Commented] (HBASE-9346) HBCK should provide an option to check if regions boundaries are the same in META and in stores.
[ https://issues.apache.org/jira/browse/HBASE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853960#comment-13853960 ] Jean-Marc Spaggiari commented on HBASE-9346: For the >= 0 vs > 0, I think we should keep >= 0. The storesLastKey should almost never be equal to metaLastKey, but there is nothing to prevent that, so it still can be. If it's never equal, then >= will not hurt. If it is, then >= will be good to have. I might be wrong ;) But that seems to be correct. HBCK should provide an option to check if regions boundaries are the same in META and in stores. Key: HBASE-9346 URL: https://issues.apache.org/jira/browse/HBASE-9346 Project: HBase Issue Type: Bug Components: hbck, Operability Affects Versions: 0.94.14, 0.98.1, 0.99.0, 0.96.1.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Attachments: HBASE-9346-v0-0.94.patch, HBASE-9346-v1-trunk.patch, HBASE-9346-v2-trunk.patch, HBASE-9346-v3-trunk.patch, HBASE-9346-v4-trunk.patch, HBASE-9346-v5-trunk.patch, HBASE-9346-v6-trunk.patch, HBASE-9346-v7-trunk.patch, HBASE-9346-v8-trunk.patch If META doesn't have the same region boundaries as the store files, writes and reads might go to the wrong place. We need to provide a way to check that within HBCK. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
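The comparison being debated can be made concrete. Using an unsigned lexicographic byte comparison (mimicking org.apache.hadoop.hbase.util.Bytes.compareTo), keeping >= 0 means the rare storesLastKey == metaLastKey case is still flagged, which is the behavior argued for in the comment above. The class and method names below are illustrative, not the HBCK patch:

```java
// Sketch of the region-boundary check: flag a problem when the last key
// found in the store files sorts at or beyond META's end key.
public class BoundaryCheck {
    // Unsigned lexicographic compare, like Bytes.compareTo in HBase.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    /** ">= 0" keeps the rare-but-possible equal case flagged as well. */
    static boolean lastKeyOutOfBounds(byte[] storesLastKey, byte[] metaLastKey) {
        return compareUnsigned(storesLastKey, metaLastKey) >= 0;
    }
}
```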
[jira] [Commented] (HBASE-10214) RegionServer shuts down improperly and leaves the dir in .old undeleted.
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853967#comment-13853967 ] Jean-Marc Spaggiari commented on HBASE-10214: - I'm not able to find the same lines in 0.94.10 either. Line 880 of HRegionServer is: {code} closeWAL(abortRequested ? false : true); {code} Line 753 is: {code} registerMBean(); {code} Code for 0.94.10 and 0.94.15 is the same for tryRegionServerReport(), so that should not be an issue. But it might be interesting to see what was throwing this NPE... Was it hbaseMaster, like in your patch?
[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853979#comment-13853979 ] Hudson commented on HBASE-10173: FAILURE: Integrated in HBase-0.98 #26 (See [https://builds.apache.org/job/HBase-0.98/26/]) HBASE-10173. Need HFile version check in security coprocessors (apurtell: rev 1552504) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/TestScannersWithLabels.java * /hbase/branches/0.98/hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandlerWithLabels.java Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_partial.patch Cell-level visibility labels are stored as cell tags, so HFile V3 is the minimum version that can support this feature. Better to have a version check in VisibilityController: if someone uses this CP with an older HFile version such as V2, we should throw an error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
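The check being described is a fail-fast guard at coprocessor startup. A hedged sketch of the idea — the class, constant, and method names here are illustrative, not the actual HFile or VisibilityController API:

```java
// Sketch of refusing to start a security coprocessor when the configured
// HFile format version cannot carry the cell tags that labels rely on.
public class HFileVersionGuard {
    // HFile V3 is the first version that supports cell tags (per the issue).
    static final int MIN_VERSION_WITH_TAGS = 3;

    static void checkVersion(int configuredVersion) {
        if (configuredVersion < MIN_VERSION_WITH_TAGS) {
            throw new IllegalStateException(
                "HFile format version " + configuredVersion
                + " does not support cell tags; version "
                + MIN_VERSION_WITH_TAGS + " or later is required");
        }
    }
}
```

Failing here, at coprocessor start, surfaces the misconfiguration immediately instead of letting label data be written without its tags.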
[jira] [Commented] (HBASE-10138) incorrect or confusing test value is used in block caches
[ https://issues.apache.org/jira/browse/HBASE-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853978#comment-13853978 ] Hudson commented on HBASE-10138: FAILURE: Integrated in HBase-0.98 #26 (See [https://builds.apache.org/job/HBase-0.98/26/]) HBASE-10138. Incorrect or confusing test value is used in block caches (Sergey Shelukhin) (apurtell: rev 1552505) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/bucket/TestBucketCache.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java incorrect or confusing test value is used in block caches - Key: HBASE-10138 URL: https://issues.apache.org/jira/browse/HBASE-10138 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10138.patch DEFAULT_BLOCKSIZE_SMALL is described as: {code} // Make default block size for StoreFiles 8k while testing. TODO: FIX! // Need to make it 8k for testing. public static final int DEFAULT_BLOCKSIZE_SMALL = 8 * 1024; {code} This value is used on production path in CacheConfig thru HStore/HRegion, and passed to various cache object. We should change it to actual block size, or if it is somehow by design at least we should clarify it and remove the comment. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10207) ZKVisibilityLabelWatcher : Populate the labels cache on startup
[ https://issues.apache.org/jira/browse/HBASE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853980#comment-13853980 ] Hudson commented on HBASE-10207: FAILURE: Integrated in HBase-0.98 #26 (See [https://builds.apache.org/job/HBase-0.98/26/]) HBASE-10207 ZKVisibilityLabelWatcher : Populate the labels cache on startup (anoopsamjohn: rev 1552489) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/ZKVisibilityLabelWatcher.java ZKVisibilityLabelWatcher : Populate the labels cache on startup --- Key: HBASE-10207 URL: https://issues.apache.org/jira/browse/HBASE-10207 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10207.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853984#comment-13853984 ] binlijin commented on HBASE-10214: -- Oh, sorry, the line numbers do not match the Apache HBase 0.94.10 version; this is our own internal version, which is based on Apache HBase 0.94.10. {code} long now = System.currentTimeMillis(); if ((now - lastMsg) >= msgInterval) { doMetrics(); tryRegionServerReport(); // 753 lastMsg = System.currentTimeMillis(); } if (!this.stopped) this.sleeper.sleep(); void tryRegionServerReport() throws IOException { HServerLoad hsl = buildServerLoad(); // Why we do this? this.requestCount.set(0); try { this.hbaseMaster.regionServerReport(this.serverNameFromMasterPOV.getVersionedBytes(), hsl); // line 880 } catch (IOException ioe) { if (ioe instanceof RemoteException) { ioe = ((RemoteException)ioe).unwrapRemoteException(); } if (ioe instanceof YouAreDeadException) { // This will be caught and handled as a fatal error in run() throw ioe; } // Couldn't connect to the master, get location from zk and reconnect // Method blocks until new master is found or we are stopped getMaster(); } } {code} Regionserver shutdown impropery and leave the dir in .old not delete. 
- Key: HBASE-10214 URL: https://issues.apache.org/jira/browse/HBASE-10214 Project: HBase Issue Type: Bug Reporter: binlijin Attachments: HBASE-10214-94.patch RegionServer log {code} 2013-12-18 15:17:45,771 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 51b27391410efdca841db264df46085f 2013-12-18 15:17:45,776 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at null 2013-12-18 15:17:48,776 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster shutdown set and not carrying any regions 2013-12-18 15:17:48,776 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server node,60020,1384410974572: Unhandled exception: null java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753) at java.lang.Thread.run(Thread.java:662) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
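The failure mode under discussion — hbaseMaster becoming null during cluster shutdown while the run loop still calls tryRegionServerReport() — can be reduced to a small sketch. MasterProtocol and the field names below are simplified stand-ins, not the real HBase 0.94 types, and the null guard shown is one possible shape of a fix, not the attached patch.

```java
import java.io.IOException;

// Simplified stand-in for the NPE scenario; MasterProtocol and these fields
// are invented for the sketch and are not the real HBase 0.94 types.
public class ReportGuardSketch {
    interface MasterProtocol {
        void regionServerReport(byte[] versionedServerName, Object serverLoad) throws IOException;
    }

    private volatile MasterProtocol hbaseMaster; // nulled once the master connection is torn down

    // Defensive variant: read the field once and skip the report while no
    // master connection exists, instead of dereferencing null (the NPE at :880).
    boolean tryRegionServerReport() throws IOException {
        MasterProtocol master = this.hbaseMaster;
        if (master == null) {
            return false; // master gone (e.g. cluster shutdown); nothing to report to
        }
        master.regionServerReport(new byte[0], new Object());
        return true;
    }

    public static void main(String[] args) throws IOException {
        ReportGuardSketch rs = new ReportGuardSketch();
        System.out.println(rs.tryRegionServerReport()); // prints false, no NPE
    }
}
```

Reading the field into a local first also narrows the race window where another thread nulls the field between the check and the call.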
[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853990#comment-13853990 ] Hadoop QA commented on HBASE-10161: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619800/HBASE-10161_V2.patch against trunk revision . ATTACHMENT ID: 12619800 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestAccessController Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8243//console This message is automatically generated. 
[AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch AccessController fixes for the issue also affecting VisibilityController described on HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post recovery upcalls and modify existing CPs to defer initialization until this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10207) ZKVisibilityLabelWatcher : Populate the labels cache on startup
[ https://issues.apache.org/jira/browse/HBASE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854006#comment-13854006 ] Hudson commented on HBASE-10207: FAILURE: Integrated in HBase-TRUNK #4741 (See [https://builds.apache.org/job/HBase-TRUNK/4741/]) HBASE-10207 ZKVisibilityLabelWatcher : Populate the labels cache on startup (anoopsamjohn: rev 1552488) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/ZKVisibilityLabelWatcher.java ZKVisibilityLabelWatcher : Populate the labels cache on startup --- Key: HBASE-10207 URL: https://issues.apache.org/jira/browse/HBASE-10207 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10207.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854005#comment-13854005 ] Hudson commented on HBASE-10173: FAILURE: Integrated in HBase-TRUNK #4741 (See [https://builds.apache.org/job/HBase-TRUNK/4741/]) HBASE-10173. Need HFile version check in security coprocessors (apurtell: rev 1552503) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/TestScannersWithLabels.java * /hbase/trunk/hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandlerWithLabels.java Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_partial.patch Cell level visibility labels are stored as cell tags. So HFile V3 is the minimum version which can support this feature. Better to have a version check in VisibilityController. Some one using this CP but with any HFile version as V2, we can better throw error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854013#comment-13854013 ] Jean-Marc Spaggiari commented on HBASE-10214: - Ok. Make sense now ;) Thanks for the clarification. Is there any risk for isClusterUp() to return true but for hbaseMaster to be null? If so, we will still get a NPE. no? Regionserver shutdown impropery and leave the dir in .old not delete. - Key: HBASE-10214 URL: https://issues.apache.org/jira/browse/HBASE-10214 Project: HBase Issue Type: Bug Reporter: binlijin Attachments: HBASE-10214-94.patch RegionServer log {code} 2013-12-18 15:17:45,771 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 51b27391410efdca841db264df46085f 2013-12-18 15:17:45,776 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at null 2013-12-18 15:17:48,776 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster shutdown set and not carrying any regions 2013-12-18 15:17:48,776 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server node,60020,1384410974572: Unhandled exception: null java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753) at java.lang.Thread.run(Thread.java:662) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-9151) HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split.
[ https://issues.apache.org/jira/browse/HBASE-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854023#comment-13854023 ] Hadoop QA commented on HBASE-9151: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619761/HBASE-9151.patch against trunk revision . ATTACHMENT ID: 12619761 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestRSKilledWhenInitializing org.apache.hadoop.hbase.security.access.TestAccessController {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hbase.TestAcidGuarantees.testMixedAtomicity(TestAcidGuarantees.java:351) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8245//console This message is automatically generated. HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split. 
- Key: HBASE-9151 URL: https://issues.apache.org/jira/browse/HBASE-9151 Project: HBase Issue Type: Bug Components: hbck Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.0, 0.99.0 Attachments: HBASE-9151.patch When the meta server znode is deleted and meta is in FAILED_OPEN state, hbck cannot fix it. This scenario can occur when all region servers are stopped by the stop command and no RS is started within 10 secs (with default configurations). {code} public void assignMeta() throws KeeperException { MetaRegionTracker.deleteMetaLocation(this.watcher); assign(HRegionInfo.FIRST_META_REGIONINFO, true); } {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10138) incorrect or confusing test value is used in block caches
[ https://issues.apache.org/jira/browse/HBASE-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854025#comment-13854025 ] Hudson commented on HBASE-10138: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #23 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/23/]) HBASE-10138. Incorrect or confusing test value is used in block caches (Sergey Shelukhin) (apurtell: rev 1552505) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/bucket/TestBucketCache.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java incorrect or confusing test value is used in block caches - Key: HBASE-10138 URL: https://issues.apache.org/jira/browse/HBASE-10138 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10138.patch DEFAULT_BLOCKSIZE_SMALL is described as: {code} // Make default block size for StoreFiles 8k while testing. TODO: FIX! // Need to make it 8k for testing. public static final int DEFAULT_BLOCKSIZE_SMALL = 8 * 1024; {code} This value is used on production path in CacheConfig thru HStore/HRegion, and passed to various cache object. We should change it to actual block size, or if it is somehow by design at least we should clarify it and remove the comment. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10207) ZKVisibilityLabelWatcher : Populate the labels cache on startup
[ https://issues.apache.org/jira/browse/HBASE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854027#comment-13854027 ] Hudson commented on HBASE-10207: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #23 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/23/]) HBASE-10207 ZKVisibilityLabelWatcher : Populate the labels cache on startup (anoopsamjohn: rev 1552489) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/ZKVisibilityLabelWatcher.java ZKVisibilityLabelWatcher : Populate the labels cache on startup --- Key: HBASE-10207 URL: https://issues.apache.org/jira/browse/HBASE-10207 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10207.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854026#comment-13854026 ] Hudson commented on HBASE-10173: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #23 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/23/]) HBASE-10173. Need HFile version check in security coprocessors (apurtell: rev 1552504) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/TestScannersWithLabels.java * /hbase/branches/0.98/hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandlerWithLabels.java Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_partial.patch Cell level visibility labels are stored as cell tags. So HFile V3 is the minimum version which can support this feature. Better to have a version check in VisibilityController. Some one using this CP but with any HFile version as V2, we can better throw error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854055#comment-13854055 ] binlijin commented on HBASE-10214: -- No, it looks impossible. Regionserver shutdown impropery and leave the dir in .old not delete. - Key: HBASE-10214 URL: https://issues.apache.org/jira/browse/HBASE-10214 Project: HBase Issue Type: Bug Reporter: binlijin Attachments: HBASE-10214-94.patch RegionServer log {code} 2013-12-18 15:17:45,771 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 51b27391410efdca841db264df46085f 2013-12-18 15:17:45,776 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at null 2013-12-18 15:17:48,776 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster shutdown set and not carrying any regions 2013-12-18 15:17:48,776 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server node,60020,1384410974572: Unhandled exception: null java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753) at java.lang.Thread.run(Thread.java:662) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-9151) HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split.
[ https://issues.apache.org/jira/browse/HBASE-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854057#comment-13854057 ] rajeshbabu commented on HBASE-9151: --- TestRSKilledWhenInitializing test case failure is related to the patch. I will fix and upload new patch. HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split. - Key: HBASE-9151 URL: https://issues.apache.org/jira/browse/HBASE-9151 Project: HBase Issue Type: Bug Components: hbck Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.0, 0.99.0 Attachments: HBASE-9151.patch When meta server znode deleted and meta in FAILED_OPEN state, then hbck cannot fix it. This scenario can come when all region servers stopped by stop command and didnt start any RS within 10 secs(with default configurations). {code} public void assignMeta() throws KeeperException { MetaRegionTracker.deleteMetaLocation(this.watcher); assign(HRegionInfo.FIRST_META_REGIONINFO, true); } {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.
[ https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-10214: - Attachment: HBASE-10214-94-V2.patch Regionserver shutdown impropery and leave the dir in .old not delete. - Key: HBASE-10214 URL: https://issues.apache.org/jira/browse/HBASE-10214 Project: HBase Issue Type: Bug Reporter: binlijin Attachments: HBASE-10214-94-V2.patch, HBASE-10214-94.patch RegionServer log {code} 2013-12-18 15:17:45,771 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 51b27391410efdca841db264df46085f 2013-12-18 15:17:45,776 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at null 2013-12-18 15:17:48,776 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster shutdown set and not carrying any regions 2013-12-18 15:17:48,776 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server node,60020,1384410974572: Unhandled exception: null java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753) at java.lang.Thread.run(Thread.java:662) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854072#comment-13854072 ] Anoop Sam John commented on HBASE-10173: https://builds.apache.org/job/PreCommit-HBASE-Build/8241//testReport/org.apache.hadoop.hbase.security.access/TestAccessController/testCellPermissions/ Failure is related to commit.. I can give a small addendum here. Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_partial.patch Cell level visibility labels are stored as cell tags. So HFile V3 is the minimum version which can support this feature. Better to have a version check in VisibilityController. Some one using this CP but with any HFile version as V2, we can better throw error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HBASE-10216) Change HBase to support local compactions
David Witten created HBASE-10216: Summary: Change HBase to support local compactions Key: HBASE-10216 URL: https://issues.apache.org/jira/browse/HBASE-10216 Project: HBase Issue Type: New Feature Components: Compaction Environment: All Reporter: David Witten As I understand it, compactions read data from DFS and write to DFS. This means that even when the reading occurs on the local host (because the region server has a local copy), all the writing must go over the network to the other replicas. This proposal suggests that HBase would perform much better if all the reading and writing occurred locally and did not go over the network. I propose that the DFS interface be extended to provide a method that would merge files, so that the merging and deleting can be performed on local data nodes with no file contents moving over the network. The method would take a list of paths to be merged and deleted, the merged-file path, and an indication of a file-format-aware class that would be run on each data node to perform the merge. The merge method provided by this merging class would be passed files open for reading for all the files to be merged and one file open for writing. It would read all the input files and append to the output file using some standard API that would work across all DFS implementations. The DFS would ensure that the merge had happened properly on all replicas before returning to the caller. It could be that greater resiliency could be achieved by implementing the deletion as a separate phase that is only done after enough of the replicas had completed the merge. HBase would be changed to use the new merge method for compactions, and would provide an implementation of the merging class that works with HFiles. This proposal would require custom code that understands the file format to be runnable by the data nodes to manage the merge. 
So there would need to be a facility to load classes into DFS if there isn't such a facility already. Or, less generally, HDFS could build in support for HFile merging. The merge method might be optional. If the DFS implementation did not provide it, a generic version that performed the merge on top of the regular DFS interfaces would be used. It may be that this method needs to be tweaked or ignored when the region server does not have a local copy of the data so that, as happens currently, one copy of the data moves to the region server. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
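As a rough illustration of the API shape the proposal describes: a format-aware merger interface plus a DFS extension point that runs it locally on each replica. All names here (FileMerger, MergeableFileSystem, ConcatMerger) are invented for this sketch, and a trivial concatenating merger stands in for the HFile-aware implementation the proposal envisions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

// Hypothetical API sketch for the proposed local-compaction DFS extension.
public class LocalCompactionSketch {
    /** Format-aware merge logic, shipped to each data node holding a replica. */
    interface FileMerger {
        void merge(List<InputStream> inputs, OutputStream output) throws IOException;
    }

    /** The extension the proposal asks of the DFS: merge + delete executed
     *  locally on every replica, with no file contents crossing the network. */
    interface MergeableFileSystem {
        void mergeFiles(List<String> inputPaths, String outputPath, FileMerger merger)
            throws IOException;
    }

    /** Trivial merger that concatenates inputs — a stand-in for an HFile-aware one. */
    static class ConcatMerger implements FileMerger {
        public void merge(List<InputStream> inputs, OutputStream output) throws IOException {
            for (InputStream in : inputs) {
                in.transferTo(output); // stream each input file into the merged output
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new ConcatMerger().merge(
            List.of(new ByteArrayInputStream("a".getBytes()),
                    new ByteArrayInputStream("b".getBytes())),
            out);
        System.out.println(out); // prints "ab"
    }
}
```

A real HFile-aware merger would have to rewrite block indexes and bloom filters rather than concatenate, which is part of why the proposal needs format-specific code running on the data nodes.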
[jira] [Updated] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10173: --- Attachment: HBASE-10173_Addendum.patch Addendum to fix test failure . [~apurtell] , [~ram_krish] what do you guys say? Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_Addendum.patch, HBASE-10173_partial.patch Cell level visibility labels are stored as cell tags. So HFile V3 is the minimum version which can support this feature. Better to have a version check in VisibilityController. Some one using this CP but with any HFile version as V2, we can better throw error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10216) Change HBase to support local compactions
[ https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854117#comment-13854117 ] Liang Xie commented on HBASE-10216: --- Sounds crazy on first read, but yep, it seems reasonable. It would need a lot of work on the HDFS side: you need the corresponding data blocks to always be allocated to the same data nodes, and then the proposed merge could probably bypass most of the network operations. The current HDFS code, however, gives no guarantee that all of an HFile's underlying data blocks land on the same nodes :) Change HBase to support local compactions - Key: HBASE-10216 URL: https://issues.apache.org/jira/browse/HBASE-10216 Project: HBase Issue Type: New Feature Components: Compaction Environment: All Reporter: David Witten As I understand it, compactions read data from DFS and write to DFS. This means that even when the reading occurs on the local host (because the region server has a local copy), all the writing must go over the network to the other replicas. This proposal suggests that HBase would perform much better if all the reading and writing occurred locally and did not go over the network. I propose that the DFS interface be extended to provide a method that would merge files, so that the merging and deleting can be performed on local data nodes with no file contents moving over the network. The method would take a list of paths to be merged and deleted, the merged-file path, and an indication of a file-format-aware class that would be run on each data node to perform the merge. The merge method provided by this merging class would be passed files open for reading for all the files to be merged and one file open for writing. It would read all the input files and append to the output file using some standard API that would work across all DFS implementations. The DFS would ensure that the merge had happened properly on all replicas before returning to the caller. 
It could be that greater resiliency could be achieved by implementing the deletion as a separate phase that is only done after enough of the replicas have completed the merge. HBase would be changed to use the new merge method for compactions, and would provide an implementation of the merging class that works with HFiles. This proposal would require custom code that understands the file format to be runnable by the data nodes to manage the merge, so there would need to be a facility to load classes into the DFS, if such a facility doesn't already exist. Or, less generally, HDFS could build in support for HFile merging. The merge method might be optional: if the DFS implementation did not provide it, a generic version that performed the merge on top of the regular DFS interfaces would be used. It may be that this method needs to be tweaked or ignored when the region server does not have a local copy of the data so that, as happens currently, one copy of the data moves to the region server. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
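The proposal leaves the merge API unspecified; as a rough illustration only, here is a self-contained Java sketch of what such a format-aware merge hook might look like (all names are hypothetical, with plain byte concatenation standing in for the generic fallback merge):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical format-aware merger the DFS would run on each data node.
interface FileMerger {
    void merge(List<InputStream> inputs, OutputStream output) throws IOException;
}

public class LocalMergeSketch {
    // Generic fallback merger: plain byte concatenation. An HFile-aware
    // implementation would rewrite blocks and indexes instead.
    static final FileMerger CONCAT = (inputs, output) -> {
        byte[] buf = new byte[8192];
        for (InputStream in : inputs) {
            int n;
            while ((n = in.read(buf)) != -1) {
                output.write(buf, 0, n);
            }
        }
    };

    // Drives the merger over in-memory "files" to keep the sketch runnable.
    static byte[] mergeAll(FileMerger merger, byte[]... files) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<InputStream> inputs = Arrays.stream(files)
            .map(f -> (InputStream) new ByteArrayInputStream(f))
            .collect(Collectors.toList());
        merger.merge(inputs, out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] merged = mergeAll(CONCAT, "abc".getBytes(), "def".getBytes());
        System.out.println(new String(merged)); // prints "abcdef"
    }
}
```

A real HFile-aware merger would rewrite data blocks and indexes rather than concatenate bytes; the sketch only shows the shape of the interface the DFS would invoke on each data node.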
[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854118#comment-13854118 ] Andrew Purtell commented on HBASE-10173: Yep, annoying this didn't show up locally. Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_Addendum.patch, HBASE-10173_partial.patch Cell level visibility labels are stored as cell tags, so HFile V3 is the minimum version that can support this feature. Better to have a version check in VisibilityController: if someone uses this CP with HFile version V2, we should throw an error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
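The guard being asked for is a simple version comparison; a self-contained sketch (class and constant names hypothetical, modeled on the hfile.format.version setting, not the actual patch) might look like:

```java
// Sketch of the requested guard: refuse to start the visibility coprocessor
// unless the configured HFile format version is at least 3, since cell tags
// only exist from HFile V3 onward.
public class HFileVersionGuard {
    static final int MIN_VERSION_FOR_TAGS = 3; // HFile V3 introduced cell tags

    static void checkVersion(int configuredVersion) {
        if (configuredVersion < MIN_VERSION_FOR_TAGS) {
            throw new IllegalStateException(
                "Visibility labels require HFile version >= " + MIN_VERSION_FOR_TAGS
                + ", but hfile.format.version = " + configuredVersion);
        }
    }

    public static void main(String[] args) {
        checkVersion(3); // OK, V3 supports tags
        try {
            checkVersion(2);
            throw new AssertionError("expected rejection of HFile V2");
        } catch (IllegalStateException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```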
[jira] [Commented] (HBASE-10215) TableNotFoundException should be thrown after removing stale znode in ETH
[ https://issues.apache.org/jira/browse/HBASE-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854120#comment-13854120 ] Hadoop QA commented on HBASE-10215: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619812/HBASE-10215.patch against trunk revision . ATTACHMENT ID: 12619812 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestAccessController Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8246//console This message is automatically generated. 
TableNotFoundException should be thrown after removing stale znode in ETH - Key: HBASE-10215 URL: https://issues.apache.org/jira/browse/HBASE-10215 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.1, 0.94.14 Reporter: rajeshbabu Assignee: rajeshbabu Priority: Minor Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10215.patch Let's suppose the master went down while creating a table; then the znode will be left in ENABLING state, and the master is to recover it on restart if there are no meta entries for the table. While recovering the table we check whether the table exists in meta or not; if not, we remove the znode. After removing the znode we need to throw TableNotFoundException. Presently the exception is not thrown, so the znode will be recreated and will be stale forever. Even on master restart we cannot delete it, and we cannot create a table with the same name either. {code}
// Check if table exists
if (!MetaReader.tableExists(catalogTracker, tableName)) {
  // retainAssignment is true only during recovery. In normal case it is false
  if (!this.skipTableStateCheck) {
    throw new TableNotFoundException(tableName);
  }
  try {
    this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
  } catch (KeeperException e) {
    // TODO : Use HBCK to clear such nodes
    LOG.warn("Failed to delete the ENABLING node for the table " + tableName
        + ". The table will remain unusable. Run HBCK to manually fix the problem.");
  }
}
{code}
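Stripped of HBase internals, the intended fix is to clean up the stale znode and then throw in the recovery path as well; a toy model of that control flow (all names hypothetical, not the actual patch) is:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the EnableTableHandler fix: when the table is missing
// from meta, the recovery path must remove the stale ENABLING znode AND then
// throw TableNotFoundException, so the znode is never recreated.
public class EnableTableModel {
    static class TableNotFoundException extends RuntimeException {
        TableNotFoundException(String table) { super(table); }
    }

    final Set<String> metaTables = new HashSet<>();
    final Set<String> enablingZnodes = new HashSet<>();

    void prepare(String tableName, boolean skipTableStateCheck) {
        if (!metaTables.contains(tableName)) {
            // skipTableStateCheck is true only during recovery.
            if (skipTableStateCheck) {
                // Clean up the stale znode first ...
                enablingZnodes.remove(tableName);
            }
            // ... then fail in BOTH paths (the bug was returning silently on
            // the recovery path, which let the znode be recreated).
            throw new TableNotFoundException(tableName);
        }
    }

    public static void main(String[] args) {
        EnableTableModel m = new EnableTableModel();
        m.enablingZnodes.add("t1"); // stale ENABLING znode, table not in meta
        boolean threw = false;
        try {
            m.prepare("t1", true);  // recovery path
        } catch (TableNotFoundException e) {
            threw = true;
        }
        System.out.println(threw + " " + m.enablingZnodes.isEmpty()); // true true
    }
}
```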
[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854121#comment-13854121 ] Andrew Purtell commented on HBASE-10173: +1 on addendum Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_Addendum.patch, HBASE-10173_partial.patch Cell level visibility labels are stored as cell tags, so HFile V3 is the minimum version that can support this feature. Better to have a version check in VisibilityController: if someone uses this CP with HFile version V2, we should throw an error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization
[ https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-10193: --- Fix Version/s: 0.99.0 0.96.2 0.94.15 0.98.0 Cleanup HRegion if one of the store fails to open at region initialization -- Key: HBASE-10193 URL: https://issues.apache.org/jira/browse/HBASE-10193 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1, 0.94.14 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.98.0, 0.94.15, 0.96.2, 0.99.0 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, HBASE-10193_v4.patch While investigating a different issue, I realized that the fix for HBASE-9737 is not sufficient to prevent resource leak if a region fails to open for some reason, say a corrupt HFile. The region may have, by then, opened other good HFiles in that store or other stores if it has more than one column family and their streams may leak if not closed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
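The cleanup pattern this fix calls for, closing whatever opened successfully before propagating the failure, can be sketched without HBase classes (all names hypothetical; HBase's actual code deals with Store and StoreFile objects):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

// Sketch of the cleanup pattern: if opening any store fails during region
// initialization, close the stores that already opened before rethrowing,
// so their underlying streams do not leak.
public class RegionOpenSketch {
    interface Store extends Closeable {}

    static List<Store> openAll(List<Callable<Store>> openers) throws Exception {
        List<Store> opened = new ArrayList<>();
        try {
            for (Callable<Store> o : openers) {
                opened.add(o.call());
            }
            return opened;
        } catch (Exception e) {
            // Roll back: release everything that opened before the failure.
            for (Store s : opened) {
                try { s.close(); } catch (IOException ignored) { }
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        List<Store> closed = new ArrayList<>();
        Callable<Store> ok = () -> new Store() {
            @Override public void close() { closed.add(this); }
        };
        Callable<Store> bad = () -> { throw new IOException("corrupt HFile"); };
        try {
            openAll(Arrays.asList(ok, ok, bad));
        } catch (Exception expected) {
            System.out.println("closed " + closed.size() + " stores"); // closed 2 stores
        }
    }
}
```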
[jira] [Commented] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization
[ https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854125#comment-13854125 ] ramkrishna.s.vasudevan commented on HBASE-10193: Committed to 0.96, trunk and 0.94. Not able to commit to 0.98. Says access denied. [~anoop.hbase], [~apurtell] - could you pls commit to 0.98. Cleanup HRegion if one of the store fails to open at region initialization -- Key: HBASE-10193 URL: https://issues.apache.org/jira/browse/HBASE-10193 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1, 0.94.14 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.98.0, 0.94.15, 0.96.2, 0.99.0 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, HBASE-10193_v4.patch While investigating a different issue, I realized that the fix for HBASE-9737 is not sufficient to prevent resource leak if a region fails to open for some reason, say a corrupt HFile. The region may have, by then, opened other good HFiles in that store or other stores if it has more than one column family and their streams may leak if not closed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10206) Explain tags in the hbase book
[ https://issues.apache.org/jira/browse/HBASE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854128#comment-13854128 ] ramkrishna.s.vasudevan commented on HBASE-10206: Committed to trunk. Need to commit to 0.98. So leaving it open. [~anoop.hbase],[~apurtell] - Could you pls commit this to 0.98? Explain tags in the hbase book -- Key: HBASE-10206 URL: https://issues.apache.org/jira/browse/HBASE-10206 Project: HBase Issue Type: Task Components: documentation Affects Versions: 0.98.0, 0.99.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-10206.patch, HBASE-10206.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server
[ https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854135#comment-13854135 ] Jimmy Xiang commented on HBASE-9721: bq. should the RS check both the znode version and data before open the region? I think I prefer to put the sn (or just the startcode?) in the RPC as this patch does since we may do assignment without ZK later on. RegionServer should not accept regionOpen RPC intended for another(previous) server --- Key: HBASE-9721 URL: https://issues.apache.org/jira/browse/HBASE-9721 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, hbase-9721_v2.patch On a test cluster, this following events happened with ITBLL and CM leading to meta being unavailable until master is restarted. An RS carrying meta died, and master assigned the region to one of the RSs. {code} 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.RegionStates: Transitioned {1588230740 state=OFFLINE, ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} 2013-10-03 23:30:06,611 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.ServerManager: New admin connection to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} At the same time, the RS that meta recently got assigned also died (due to CM), and restarted: {code} 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] master.ServerManager: REPORT: Server 
gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came back up, removed it from the dead servers list 2013-10-03 23:30:08,769 INFO [RpcServer.handler=18,port=6] master.ServerManager: Triggering server recovery; existingServer gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks stale, new server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.ServerManager: Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 to dead servers, submitted shutdown handler to be executed meta=true 2013-10-03 23:30:08,771 INFO [RpcServer.handler=18,port=6] master.ServerManager: Registering server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,772 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Splitting hbase:meta logs for gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} AM/SSH sees that the RS that died was carrying meta, but the assignment RPC request was still not sent: {code} 2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Server 
gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 was carrying META. Trying to assign. 2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.RegionStates: Offline 1588230740 with current state=PENDING_OPEN, expected state=OFFLINE/SPLITTING/MERGING 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.RegionStates: Transitioned {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} to {1588230740 state=OFFLINE, ts=1380843008791, server=null}
[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854138#comment-13854138 ] ramkrishna.s.vasudevan commented on HBASE-10173: Sorry, took some time to understand. So the ACL region came in after some other region had come up first, and that is why we are moving the check to start(). Correct? Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_Addendum.patch, HBASE-10173_partial.patch Cell level visibility labels are stored as cell tags, so HFile V3 is the minimum version that can support this feature. Better to have a version check in VisibilityController: if someone uses this CP with HFile version V2, we should throw an error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Reopened] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John reopened HBASE-10173: Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_Addendum.patch, HBASE-10173_partial.patch Cell level visibility labels are stored as cell tags, so HFile V3 is the minimum version that can support this feature. Better to have a version check in VisibilityController: if someone uses this CP with HFile version V2, we should throw an error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10206) Explain tags in the hbase book
[ https://issues.apache.org/jira/browse/HBASE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854145#comment-13854145 ] Andrew Purtell commented on HBASE-10206: On another issue [~stack] mentioned that he copies the current trunk docs to branch and commits that just before a RC - did I remember that correctly? Seems a fine way to do it for now because the doc for branch is the same as trunk. Explain tags in the hbase book -- Key: HBASE-10206 URL: https://issues.apache.org/jira/browse/HBASE-10206 Project: HBase Issue Type: Task Components: documentation Affects Versions: 0.98.0, 0.99.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-10206.patch, HBASE-10206.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854144#comment-13854144 ] Anoop Sam John commented on HBASE-10173: Not exactly, Ram... initialize() will be called on the CP object created for the _acl_ region. For the other regions it is not called, so the boolean is always false :( Moving this to start() will make sure that the check is done for all regions. Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_Addendum.patch, HBASE-10173_partial.patch Cell level visibility labels are stored as cell tags, so HFile V3 is the minimum version that can support this feature. Better to have a version check in VisibilityController: if someone uses this CP with HFile version V2, we should throw an error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
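The lifecycle difference can be modeled without HBase classes: a hook that fires only for the _acl_ region leaves the flag false everywhere else, while one that fires per coprocessor instance sets it uniformly (a toy model, all names hypothetical):

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the lifecycle bug: the initialize() hook ran only for the
// _acl_ region's coprocessor instance, so the "version checked" flag stayed
// false on every other region. Doing the check in start(), which runs for
// every coprocessor instance, covers all regions.
public class CpLifecycleModel {
    boolean versionChecked = false;

    void start() {                    // fires for every region's CP instance
        versionChecked = true;
    }

    void initialize(String region) {  // fired only for the _acl_ region
        if (region.equals("_acl_")) {
            versionChecked = true;
        }
    }

    static long checkedCount(List<String> regions, boolean checkInStart) {
        return regions.stream().map(r -> {
            CpLifecycleModel cp = new CpLifecycleModel();
            if (checkInStart) {
                cp.start();
            } else {
                cp.initialize(r);
            }
            return cp;
        }).filter(cp -> cp.versionChecked).count();
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("_acl_", "t1,r1", "t2,r1");
        System.out.println(checkedCount(regions, false)); // buggy path: 1
        System.out.println(checkedCount(regions, true));  // fixed path: 3
    }
}
```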
[jira] [Resolved] (HBASE-10173) Need HFile version check in security coprocessors
[ https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John resolved HBASE-10173. Resolution: Fixed Committed the addendum to 0.98 and trunk. Need HFile version check in security coprocessors - Key: HBASE-10173 URL: https://issues.apache.org/jira/browse/HBASE-10173 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.98.0, 0.99.0 Reporter: Anoop Sam John Assignee: Andrew Purtell Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, HBASE-10173_Addendum.patch, HBASE-10173_partial.patch Cell level visibility labels are stored as cell tags, so HFile V3 is the minimum version that can support this feature. Better to have a version check in VisibilityController: if someone uses this CP with HFile version V2, we should throw an error. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854146#comment-13854146 ] Anoop Sam John commented on HBASE-10161: Test failure is addressed by the addendum committed to HBASE-10173. [AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch AccessController fixes for the issue also affecting VisibilityController described on HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post recovery upcalls and modify existing CPs to defer initialization until this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization
[ https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854151#comment-13854151 ] Anoop Sam John commented on HBASE-10193: Committed to 0.98 branch as well.. Cleanup HRegion if one of the store fails to open at region initialization -- Key: HBASE-10193 URL: https://issues.apache.org/jira/browse/HBASE-10193 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1, 0.94.14 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.98.0, 0.94.15, 0.96.2, 0.99.0 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, HBASE-10193_v4.patch While investigating a different issue, I realized that the fix for HBASE-9737 is not sufficient to prevent resource leak if a region fails to open for some reason, say a corrupt HFile. The region may have, by then, opened other good HFiles in that store or other stores if it has more than one column family and their streams may leak if not closed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Comment Edited] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization
[ https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854151#comment-13854151 ] Anoop Sam John edited comment on HBASE-10193 at 12/20/13 4:55 PM: -- Committed to 0.98 branch as well.. Thanks for the patch Aditya. Thanks Ted and Ram for the reviews was (Author: anoop.hbase): Committed to 0.98 branch as well.. Cleanup HRegion if one of the store fails to open at region initialization -- Key: HBASE-10193 URL: https://issues.apache.org/jira/browse/HBASE-10193 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1, 0.94.14 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.98.0, 0.94.15, 0.96.2, 0.99.0 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, HBASE-10193_v4.patch While investigating a different issue, I realized that the fix for HBASE-9737 is not sufficient to prevent resource leak if a region fails to open for some reason, say a corrupt HFile. The region may have, by then, opened other good HFiles in that store or other stores if it has more than one column family and their streams may leak if not closed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-7781) Update security unit tests to use a KDC if available
[ https://issues.apache.org/jira/browse/HBASE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854153#comment-13854153 ] Andrew Purtell commented on HBASE-7781: --- Have a look at the utility classes (and tests) under hbase-server src/test in org.apache.hadoop.hbase.security. We want helpers that allow a test writer to start a mini KDC. I don't think we can depend on Hadoop's mini KDC module yet until it is in a release. Then it would be nice to have an integration test that starts up the mini KDC and uses it if running in a minicluster configuration. Update security unit tests to use a KDC if available Key: HBASE-7781 URL: https://issues.apache.org/jira/browse/HBASE-7781 Project: HBase Issue Type: Test Components: security, test Reporter: Gary Helmling Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.98.0 We currently have large holes in the test coverage of HBase with security enabled. Two recent examples of bugs which really should have been caught with testing are HBASE-7771 and HBASE-7772. The long standing problem with testing with security enabled has been the requirement for supporting kerberos infrastructure. We need to close this gap and provide some automated testing with security enabled, if necessary standing up and provisioning a temporary KDC as an option for running integration tests, see HADOOP-8078 and HADOOP-9004 where a similar approach was taken. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization
[ https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John resolved HBASE-10193. Resolution: Fixed Fix Version/s: (was: 0.94.15) 0.94.16 Hadoop Flags: Reviewed Cleanup HRegion if one of the store fails to open at region initialization -- Key: HBASE-10193 URL: https://issues.apache.org/jira/browse/HBASE-10193 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1, 0.94.14 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, HBASE-10193_v4.patch While investigating a different issue, I realized that the fix for HBASE-9737 is not sufficient to prevent resource leak if a region fails to open for some reason, say a corrupt HFile. The region may have, by then, opened other good HFiles in that store or other stores if it has more than one column family and their streams may leak if not closed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization
[ https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854154#comment-13854154 ] Andrew Purtell commented on HBASE-10193: Thanks Anoop. Local permissions problem Ram? Your access in the repo should be fine. I just did a svn copy from trunk to create the branch, nothing unusual there. Cleanup HRegion if one of the store fails to open at region initialization -- Key: HBASE-10193 URL: https://issues.apache.org/jira/browse/HBASE-10193 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1, 0.94.14 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, HBASE-10193_v4.patch While investigating a different issue, I realized that the fix for HBASE-9737 is not sufficient to prevent resource leak if a region fails to open for some reason, say a corrupt HFile. The region may have, by then, opened other good HFiles in that store or other stores if it has more than one column family and their streams may leak if not closed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854156#comment-13854156 ] Andrew Purtell commented on HBASE-10161: +1 [AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch AccessController fixes for the issue also affecting VisibilityController described on HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post recovery upcalls and modify existing CPs to defer initialization until this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854160#comment-13854160 ] Andrew Purtell commented on HBASE-10161: One question. This part: {code}
Index: hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java
===================================================================
--- hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java (revision 1552489)
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java (working copy)
@@ -116,6 +116,8 @@
         Compression.Algorithm.NONE.getName(), true, true, 8 * 1024,
         HConstants.FOREVER, BloomType.NONE.toString(),
         HConstants.REPLICATION_SCOPE_LOCAL));
+    ACL_TABLEDESC.setValue(Bytes.toBytes(HConstants.DISALLOW_WRITES_IN_RECOVERING),
+      Bytes.toBytes(true));
   }
{code} For future use? [AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch AccessController fixes for the issue also affecting VisibilityController described on HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post recovery upcalls and modify existing CPs to defer initialization until this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10206) Explain tags in the hbase book
[ https://issues.apache.org/jira/browse/HBASE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854158#comment-13854158 ] ramkrishna.s.vasudevan commented on HBASE-10206: Yes, but this doc should come only in 0.98, right? So better we commit there too. But as I said, I am not able to commit to 0.98 due to permission reasons. Explain tags in the hbase book -- Key: HBASE-10206 URL: https://issues.apache.org/jira/browse/HBASE-10206 Project: HBase Issue Type: Task Components: documentation Affects Versions: 0.98.0, 0.99.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-10206.patch, HBASE-10206.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9151) HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split.
[ https://issues.apache.org/jira/browse/HBASE-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-9151: -- Attachment: HBASE-9151_v2.patch Fixed TestRSKilledWhenInitializing in the current patch. HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split. - Key: HBASE-9151 URL: https://issues.apache.org/jira/browse/HBASE-9151 Project: HBase Issue Type: Bug Components: hbck Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.0, 0.99.0 Attachments: HBASE-9151.patch, HBASE-9151_v2.patch When the meta server znode is deleted and meta is in FAILED_OPEN state, hbck cannot fix it. This scenario can come when all region servers are stopped by the stop command and no RS is started within 10 secs (with default configurations).
{code}
public void assignMeta() throws KeeperException {
  MetaRegionTracker.deleteMetaLocation(this.watcher);
  assign(HRegionInfo.FIRST_META_REGIONINFO, true);
}
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-7781) Update security unit tests to use a KDC if available
[ https://issues.apache.org/jira/browse/HBASE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854166#comment-13854166 ] ramkrishna.s.vasudevan commented on HBASE-7781: --- bq. Have a look at the utility classes (and tests) under hbase-server src/test in org.apache.hadoop.hbase.security Yes Andy. Exactly doing that. I understand that using MiniKDC we could define our own principals and add proper KDC configurations for the NN, DN, RS and master. Using that, the security tests should be running. I hope in test cases too, once security is enabled through Kerberos, a secure DN and secure NN start running. Let me see that. Update security unit tests to use a KDC if available Key: HBASE-7781 URL: https://issues.apache.org/jira/browse/HBASE-7781 Project: HBase Issue Type: Test Components: security, test Reporter: Gary Helmling Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.98.0 We currently have large holes in the test coverage of HBase with security enabled. Two recent examples of bugs which really should have been caught with testing are HBASE-7771 and HBASE-7772. The long standing problem with testing with security enabled has been the requirement for supporting kerberos infrastructure. We need to close this gap and provide some automated testing with security enabled, if necessary standing up and provisioning a temporary KDC as an option for running integration tests, see HADOOP-8078 and HADOOP-9004 where a similar approach was taken. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-9346) HBCK should provide an option to check if regions boundaries are the same in META and in stores.
[ https://issues.apache.org/jira/browse/HBASE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854177#comment-13854177 ] Andrew Purtell commented on HBASE-9346: --- +1 for 0.98 HBCK should provide an option to check if regions boundaries are the same in META and in stores. Key: HBASE-9346 URL: https://issues.apache.org/jira/browse/HBASE-9346 Project: HBase Issue Type: Bug Components: hbck, Operability Affects Versions: 0.94.14, 0.98.1, 0.99.0, 0.96.1.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Attachments: HBASE-9346-v0-0.94.patch, HBASE-9346-v1-trunk.patch, HBASE-9346-v2-trunk.patch, HBASE-9346-v3-trunk.patch, HBASE-9346-v4-trunk.patch, HBASE-9346-v5-trunk.patch, HBASE-9346-v6-trunk.patch, HBASE-9346-v7-trunk.patch, HBASE-9346-v8-trunk.patch If META doesn't have the same region boundaries as the store files, writes and reads might go to the wrong place. We need to provide a way to check that within HBCK. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10216) Change HBase to support local compactions
[ https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854179#comment-13854179 ] David Witten commented on HBASE-10216: -- I'm no HDFS expert. But I had imagined that a data node, D, performing a merge would just do the merge with local files, then tell the name node that D has a replica for all the data blocks for the merged file. Change HBase to support local compactions - Key: HBASE-10216 URL: https://issues.apache.org/jira/browse/HBASE-10216 Project: HBase Issue Type: New Feature Components: Compaction Environment: All Reporter: David Witten As I understand it compactions will read data from DFS and write to DFS. This means that even when the reading occurs on the local host (because region server has a local copy) all the writing must go over the network to the other replicas. This proposal suggests that HBase would perform much better if all the reading and writing occurred locally and did not go over the network. I propose that the DFS interface be extended to provide method that would merge files so that the merging and deleting can be performed on local data nodes with no file contents moving over the network. The method would take a list of paths to be merged and deleted and the merged file path and an indication of a file-format-aware class that would be run on each data node to perform the merge. The merge method provided by this merging class would be passed files open for reading for all the files to be merged and one file open for writing. The custom class provided merge method would read all the input files and append to the output file using some standard API that would work across all DFS implementations. The DFS would ensure that the merge had happened properly on all replicas before returning to the caller. 
It could be that greater resiliency could be achieved by implementing the deletion as a separate phase that is only done after enough of the replicas had completed the merge. HBase would be changed to use the new merge method for compactions, and would provide an implementation of the merging class that works with HFiles. This proposal would require a custom code that understands the file format to be runnable by the data nodes to manage the merge. So there would need to be a facility to load classes into DFS if there isn't such a facility already. Or, less generally, HDFS could build in support for HFile merging. The merge method might be optional. If the DFS implementation did not provide it a generic version that performed the merge on top of the regular DFS interfaces would be used. It may be that this method needs to be tweaked or ignored when the region server does not have a local copy data so that, as happens currently, one copy of the data moves to the region server. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
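The merge call the proposal describes can be illustrated with plain local files. Everything below is hypothetical — LocalMergeSketch, FormatAwareMerger, and mergeLocally are invented names, and the concatenating merger stands in for a real HFile-aware implementation — it only sketches the shape of the API: a list of input paths, an output path, and a pluggable format-aware merge step, with deletion of the inputs afterward.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of the proposed DFS merge API: inputs, output, and a
// format-aware merger class, all running against local files.
public class LocalMergeSketch {

    // The pluggable, format-aware piece; an HFile-aware implementation would
    // go here. This toy version just concatenates bytes.
    public interface FormatAwareMerger {
        void merge(List<Path> inputs, OutputStream out) throws IOException;
    }

    public static final FormatAwareMerger CONCAT = (inputs, out) -> {
        for (Path p : inputs) {
            Files.copy(p, out); // read each input locally, append to the output
        }
    };

    // Stand-in for the proposed DFS call: merge inputs into output, then
    // delete the inputs (the proposal notes deletion could be a later phase).
    public static void mergeLocally(List<Path> inputs, Path output, FormatAwareMerger merger)
            throws IOException {
        try (OutputStream out = Files.newOutputStream(output)) {
            merger.merge(inputs, out);
        }
        for (Path p : inputs) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("hfile-a", ".bin");
        Path b = Files.createTempFile("hfile-b", ".bin");
        Files.write(a, "row1\n".getBytes());
        Files.write(b, "row2\n".getBytes());
        Path merged = Files.createTempFile("merged", ".bin");
        mergeLocally(List.of(a, b), merged, CONCAT);
        String content = new String(Files.readAllBytes(merged));
        if (!content.equals("row1\nrow2\n")) throw new AssertionError(content);
        if (Files.exists(a) || Files.exists(b)) throw new AssertionError("inputs not deleted");
        System.out.println("merged " + content.length() + " bytes locally");
    }
}
```

In the real proposal the DFS, not the caller, would run the merger on every replica's datanode and confirm all replicas before returning; deleting inputs in a separate, later phase (as the description suggests) would give the resiliency margin mentioned above.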
[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854185#comment-13854185 ] Anoop Sam John commented on HBASE-10161: We have initialized boolean checks now. I think I can remove this. Fine on that Andy? [AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch AccessController fixes for the issue also affecting VisibilityController described on HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post recovery upcalls and modify existing CPs to defer initialization until this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10213) Add read log size per second metrics for replication source
[ https://issues.apache.org/jira/browse/HBASE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854189#comment-13854189 ] Andrew Purtell commented on HBASE-10213: To get a good HadoopQA result, it will need a patch against trunk.
{code}
index 3831bba..8315c3a 100644
--- src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
+++ src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
@@ -458,6 +458,7 @@ public class ReplicationSource extends Thread
       throws IOException {
     long seenEntries = 0;
     this.repLogReader.seek();
+    long persitionBeforeRead = this.repLogReader.getPosition();
     HLog.Entry entry = this.repLogReader.readNextAndSetPosition();
     while (entry != null) {
{code}
persitionBeforeRead should be positionBeforeRead.
{code}
index da0905c..e32a3bc 100644
--- src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceMetrics.java
+++ src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceMetrics.java
@@ -66,6 +66,9 @@ public class ReplicationSourceMetrics implements Updater {
    */
   public final MetricsIntValue sizeOfLogQueue = new MetricsIntValue("sizeOfLogQueue", registry);
+
+  /** Rate of log entries read by the source */
+  public MetricsRate logReadRateInByte = new MetricsRate("logReadRateInByte", registry);
{code}
The usual convention for names with units is to pluralize the unit, so logReadRateInBytes. Add read log size per second metrics for replication source --- Key: HBASE-10213 URL: https://issues.apache.org/jira/browse/HBASE-10213 Project: HBase Issue Type: Improvement Components: metrics, Replication Affects Versions: 0.94.14 Reporter: cuijianwei Assignee: cuijianwei Priority: Minor Attachments: HBASE-10213-0.94-v1.patch The current metrics of replication source contain logEditsReadRate, shippedBatchesRate, etc, which could indicate how fast the data is replicated to the peer cluster to some extent.
However, it is not clear enough to know how many bytes replicating to peer cluster from these metrics. In production environment, it may be important to know the size of replicating data per second because the services may be affected if the network become busy. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
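The position-delta accounting the patch above introduces (positionBeforeRead) can be sketched in isolation. This is a toy stand-in, not HBase code: CountingReader imitates the getPosition()/readNextAndSetPosition() shape of repLogReader, and the delta is what would feed a bytes-rate metric such as the proposed logReadRateInBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Toy illustration of the patch's accounting: record the reader's position
// before consuming entries, then report (positionAfter - positionBefore)
// as the number of log bytes read in that batch.
public class PositionDeltaMetric {

    public static class CountingReader {
        private final DataInputStream in;
        private long position = 0;

        public CountingReader(byte[] log) {
            this.in = new DataInputStream(new ByteArrayInputStream(log));
        }

        public long getPosition() { return position; }

        // read one fixed-size "entry" (8 bytes), advancing the position
        public Long readNextAndSetPosition() throws IOException {
            if (in.available() < 8) return null;
            long v = in.readLong();
            position += 8;
            return v;
        }
    }

    // returns the number of bytes consumed while draining the log
    public static long bytesReadDraining(CountingReader reader) throws IOException {
        long positionBeforeRead = reader.getPosition();   // as in the patch
        while (reader.readNextAndSetPosition() != null) { /* ship entries */ }
        return reader.getPosition() - positionBeforeRead; // feed the bytes-rate metric
    }

    public static void main(String[] args) throws IOException {
        byte[] log = new byte[8 * 3]; // three 8-byte entries
        long read = bytesReadDraining(new CountingReader(log));
        if (read != 24) throw new AssertionError(read);
        System.out.println("bytes read this batch: " + read);
    }
}
```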
[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854191#comment-13854191 ] Andrew Purtell commented on HBASE-10161: bq. I think I can remove this. Fine on that Andy? Sure [AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch AccessController fixes for the issue also affecting VisibilityController described on HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post recovery upcalls and modify existing CPs to defer initialization until this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854198#comment-13854198 ] Anoop Sam John commented on HBASE-10161: V3 which avoids the change in AccessControlLists. Going to commit now [AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch, HBASE-10161_V3.patch AccessController fixes for the issue also affecting VisibilityController described on HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post recovery upcalls and modify existing CPs to defer initialization until this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10161: --- Attachment: HBASE-10161_V3.patch [AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch, HBASE-10161_V3.patch AccessController fixes for the issue also affecting VisibilityController described on HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post recovery upcalls and modify existing CPs to defer initialization until this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854206#comment-13854206 ] Anoop Sam John commented on HBASE-10161: Ping [~stack]. This is required in 96 branch also. Pls +1 [AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch, HBASE-10161_V3.patch AccessController fixes for the issue also affecting VisibilityController described on HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post recovery upcalls and modify existing CPs to defer initialization until this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10047) postScannerFilterRow consumes a lot of CPU in tall table scans
[ https://issues.apache.org/jira/browse/HBASE-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854211#comment-13854211 ] Andrew Purtell commented on HBASE-10047: The set of installed coprocessors can change at runtime concurrent with iteration of the list. postScannerFilterRow consumes a lot of CPU in tall table scans -- Key: HBASE-10047 URL: https://issues.apache.org/jira/browse/HBASE-10047 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10047-0.94-sample-v2.txt, 10047-0.94-sample.txt, postScannerFilterRow.png Continuing my profiling quest, I find that in scanning tall table (and filtering everything on the server) a quarter of the time is now spent in the postScannerFilterRow coprocessor hook. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
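The concern above — the installed coprocessor set mutating at runtime while it is being iterated — is what copy-on-write collections address. A minimal stdlib sketch (java.util.concurrent.CopyOnWriteArrayList standing in for HBase's SortedCopyOnWriteSet) shows that an iterator keeps working over its snapshot even when an element is removed concurrently:

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

// Copy-on-write collections iterate over a snapshot taken when the iterator
// is created, so a concurrent removal (e.g. unloading an errant coprocessor)
// never throws ConcurrentModificationException mid-scan.
public class CowIterationDemo {
    public static void main(String[] args) {
        CopyOnWriteArrayList<String> coprocs = new CopyOnWriteArrayList<>();
        coprocs.add("AccessController");
        coprocs.add("VisibilityController");

        Iterator<String> it = coprocs.iterator(); // snapshot taken here
        coprocs.remove("AccessController");       // concurrent removal, no CME

        int seen = 0;
        while (it.hasNext()) { it.next(); seen++; }
        if (seen != 2) throw new AssertionError(seen);       // snapshot still has both
        if (coprocs.size() != 1) throw new AssertionError(); // live list shrank
        System.out.println("iterated snapshot of " + seen + ", live size " + coprocs.size());
    }
}
```

The trade-off is that every mutation copies the backing array, which is why this structure fits rarely-changing, frequently-iterated lists like installed coprocessors — and also why iteration itself stays cheap enough to matter in a hot path like postScannerFilterRow.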
[jira] [Commented] (HBASE-10206) Explain tags in the hbase book
[ https://issues.apache.org/jira/browse/HBASE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854221#comment-13854221 ] Andrew Purtell commented on HBASE-10206: Right, so when prepping the RC I can copy the entire manual over from trunk, we don't have to bring commits to the manual to the branch piece by piece. Explain tags in the hbase book -- Key: HBASE-10206 URL: https://issues.apache.org/jira/browse/HBASE-10206 Project: HBase Issue Type: Task Components: documentation Affects Versions: 0.98.0, 0.99.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-10206.patch, HBASE-10206.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10216) Change HBase to support local compactions
[ https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854261#comment-13854261 ] haosdent commented on HBASE-10216: -- I don't think local compaction is feasible. HDFS stores HFiles as many blocks, and these blocks have a fixed size. Providing a method to merge files in HDFS may not bring an outstanding improvement. In other words, HDFS local reads may be enough for this. Change HBase to support local compactions - Key: HBASE-10216 URL: https://issues.apache.org/jira/browse/HBASE-10216 Project: HBase Issue Type: New Feature Components: Compaction Environment: All Reporter: David Witten As I understand it compactions will read data from DFS and write to DFS. This means that even when the reading occurs on the local host (because region server has a local copy) all the writing must go over the network to the other replicas. This proposal suggests that HBase would perform much better if all the reading and writing occurred locally and did not go over the network. I propose that the DFS interface be extended to provide method that would merge files so that the merging and deleting can be performed on local data nodes with no file contents moving over the network. The method would take a list of paths to be merged and deleted and the merged file path and an indication of a file-format-aware class that would be run on each data node to perform the merge. The merge method provided by this merging class would be passed files open for reading for all the files to be merged and one file open for writing. The custom class provided merge method would read all the input files and append to the output file using some standard API that would work across all DFS implementations. The DFS would ensure that the merge had happened properly on all replicas before returning to the caller. 
It could be that greater resiliency could be achieved by implementing the deletion as a separate phase that is only done after enough of the replicas had completed the merge. HBase would be changed to use the new merge method for compactions, and would provide an implementation of the merging class that works with HFiles. This proposal would require a custom code that understands the file format to be runnable by the data nodes to manage the merge. So there would need to be a facility to load classes into DFS if there isn't such a facility already. Or, less generally, HDFS could build in support for HFile merging. The merge method might be optional. If the DFS implementation did not provide it a generic version that performed the merge on top of the regular DFS interfaces would be used. It may be that this method needs to be tweaked or ignored when the region server does not have a local copy data so that, as happens currently, one copy of the data moves to the region server. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery
[ https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854311#comment-13854311 ] Anoop Sam John commented on HBASE-10161: Committed to 0.98 and Trunk. Will add to 0.96 as well once Stack gives a go. [AccessController] Tolerate regions in recovery --- Key: HBASE-10161 URL: https://issues.apache.org/jira/browse/HBASE-10161 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch, HBASE-10161_V3.patch AccessController fixes for the issue also affecting VisibilityController described on HBASE-10148. Coprocessors that initialize in postOpen upcalls must check if the region is still in recovery and defer initialization until recovery is complete. We need to add a new CP hook for post recovery upcalls and modify existing CPs to defer initialization until this new hook as needed. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-9648) collection one expired storefile causes it to be replaced by another expired storefile
[ https://issues.apache.org/jira/browse/HBASE-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854374#comment-13854374 ] Sergey Shelukhin commented on HBASE-9648: - Just clarifying, why is it hard to create the writer when it is needed, i.e. for the case when there are seemingly no KVs when you were creating the writer? I think coprocs cannot screw up the seqIds, because the set of files is already chosen, so that should be ok. collection one expired storefile causes it to be replaced by another expired storefile -- Key: HBASE-9648 URL: https://issues.apache.org/jira/browse/HBASE-9648 Project: HBase Issue Type: Bug Components: Compaction Reporter: Sergey Shelukhin Assignee: Jean-Marc Spaggiari Attachments: HBASE-9648-v0-0.94.patch, HBASE-9648-v0-trunk.patch, HBASE-9648-v1-trunk.patch, HBASE-9648-v2-trunk.patch, HBASE-9648-v3-trunk.patch, HBASE-9648.patch There's a shortcut in compaction selection that causes expired store files to be selected for quick deletion. However, there's also the code that ensures we write at least one file to preserve seqnum. This new empty file is expired, because it has no data, presumably. So it's collected again, etc. This affects 94, probably also 96. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
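The feedback loop in the description can be modeled in a few lines. This is a toy simulation, not HBase's compaction code — ToyStoreFile, isExpired, and compactExpired are invented names, and treating an empty file as expired mirrors the "presumably" in the description: each pass selects the expired file, compaction emits an empty placeholder to preserve the seqnum, and the placeholder is immediately eligible again.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the expired-storefile loop: selection picks files past TTL,
// compaction keeps one empty output to preserve seqnum, and that empty
// output looks expired on the very next selection pass.
public class ExpiredSelectionLoop {

    public static class ToyStoreFile {
        final long maxTimestamp; // newest cell ts; irrelevant when empty
        final boolean empty;
        public ToyStoreFile(long maxTimestamp, boolean empty) {
            this.maxTimestamp = maxTimestamp;
            this.empty = empty;
        }
    }

    // the "expired" shortcut: a file with no live data is trivially expired
    public static boolean isExpired(ToyStoreFile f, long now, long ttl) {
        return f.empty || f.maxTimestamp < now - ttl;
    }

    // compacting only expired files: drop their cells but keep one (empty)
    // output file so the store's max seqnum is preserved
    public static ToyStoreFile compactExpired(List<ToyStoreFile> expired) {
        return new ToyStoreFile(Long.MIN_VALUE, true);
    }

    public static void main(String[] args) {
        long now = 1000, ttl = 100;
        List<ToyStoreFile> store = new ArrayList<>();
        store.add(new ToyStoreFile(800, false)); // data older than now - ttl
        int selections = 0;
        for (int pass = 0; pass < 3; pass++) {
            ToyStoreFile f = store.get(0);
            if (isExpired(f, now, ttl)) {
                selections++;                       // selected again...
                store.set(0, compactExpired(store)); // ...and replaced by another expired file
            }
        }
        if (selections != 3) throw new AssertionError(selections);
        System.out.println("selected for compaction " + selections + " times in 3 passes");
    }
}
```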
[jira] [Commented] (HBASE-10175) 2-thread ChaosMonkey steps on its own toes
[ https://issues.apache.org/jira/browse/HBASE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854375#comment-13854375 ] Sergey Shelukhin commented on HBASE-10175: -- I don't think test failure can be related. [~enis] you want to review? 2-thread ChaosMonkey steps on its own toes -- Key: HBASE-10175 URL: https://issues.apache.org/jira/browse/HBASE-10175 Project: HBase Issue Type: Improvement Components: test Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HBASE-10175.patch ChaosMonkey with one destructive and one volatility (flush-compact-split-etc.) threads steps on its own toes and logs a lot of exceptions. A simple solution would be to catch most (or all), like NotServingRegionException, and log less (not a full callstack for example, it's not very useful anyway). A more complicated/complementary one would be to keep track which regions the destructive thread affects and use other regions for volatile one. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-8558) Add timeout limit for HBaseClient dataOutputStream
[ https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854397#comment-13854397 ] Lars Hofhansl commented on HBASE-8558: -- So the issue is: While we were writing something a RegionServer went down and we sit there forever waiting? Add timeout limit for HBaseClient dataOutputStream -- Key: HBASE-8558 URL: https://issues.apache.org/jira/browse/HBASE-8558 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.5, 0.94.14 Reporter: wanbin Assignee: Liang Xie Attachments: HBASE-8558-0.94.txt I run jstack at the client host. The result is below:
{code}
hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 nid=0x5173 runnable [0x579cc000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked 0x000758cb0780 (a sun.nio.ch.Util$2)
	- locked 0x000758cb0770 (a java.util.Collections$UnmodifiableSet)
	- locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	- locked 0x000754e978a0 (a java.io.BufferedOutputStream)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
	- locked 0x000754e97880 (a java.io.DataOutputStream)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
	at $Proxy13.multi(Unknown Source)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
	at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}
This thread has hung for one hour. Meanwhile another thread tries to close the connection:
{code}
IPC Client (1983049639) connection to dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry [0x4bc0f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	- waiting to lock 0x000754e978a0 (a java.io.BufferedOutputStream)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
	at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
	at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.close(HBaseClient.java:715)
	- locked 0x000754e7b818 (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:587)
{code}
dump002030.cm6.tbsite.net is the dead regionserver. I read the HBase source code and discovered that connection.out doesn't set a timeout:
{code}
this.out = new DataOutputStream(new BufferedOutputStream(NetUtils.getOutputStream(socket)));
{code}
I see this means epoll_wait will block indefinitely. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
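The general fix direction — bounding a socket write with a select timeout, which is the same pattern Hadoop's SocketIOWithTimeout implements — can be sketched with plain NIO. This is a stand-alone illustration, not the attached patch; writeWithTimeout fails with an IOException after the deadline instead of parking in epoll_wait forever.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Minimal selector-based timed write: instead of blocking indefinitely for
// writability (the hang in the jstack above), wait at most timeoutMs per
// select and surface a timeout as an IOException the caller can handle.
public class TimedWriteSketch {

    public static void writeWithTimeout(SocketChannel ch, ByteBuffer buf, long timeoutMs)
            throws IOException {
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_WRITE); // channel must be non-blocking
            while (buf.hasRemaining()) {
                if (sel.select(timeoutMs) == 0) {
                    throw new IOException("write timed out after " + timeoutMs + " ms");
                }
                sel.selectedKeys().clear();
                ch.write(buf); // drain as much as the send buffer accepts
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // loopback demo: a healthy peer, so the write completes within the timeout
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.configureBlocking(false);
                ByteBuffer buf = ByteBuffer.wrap("put".getBytes());
                writeWithTimeout(client, buf, 1000);
                if (buf.hasRemaining()) throw new AssertionError("not fully written");
                System.out.println("write completed within timeout");
            }
        }
    }
}
```

Note that Socket.setSoTimeout only bounds reads; writes need the selector approach above (or Hadoop's NetUtils/SocketOutputStream write-timeout variants), which is exactly why the unbounded connection.out in the report could hang forever.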
[jira] [Commented] (HBASE-10047) postScannerFilterRow consumes a lot of CPU in tall table scans
[ https://issues.apache.org/jira/browse/HBASE-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854409#comment-13854409 ] Lars Hofhansl commented on HBASE-10047: --- Interesting, didn't realize that can happen. After the region is loaded? Ohh, when we detect an error we remove the coprocessor. SortedCopyOnWriteSet should have been a hint too :) The first patch is still valid, since we're only removing after the region was loaded. I didn't measure any perf improvement with v2 anyway, it seems instanceof is not the issue. postScannerFilterRow consumes a lot of CPU in tall table scans -- Key: HBASE-10047 URL: https://issues.apache.org/jira/browse/HBASE-10047 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10047-0.94-sample-v2.txt, 10047-0.94-sample.txt, postScannerFilterRow.png Continuing my profiling quest, I find that in scanning tall table (and filtering everything on the server) a quarter of the time is now spent in the postScannerFilterRow coprocessor hook. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10216) Change HBase to support local compactions
[ https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854414#comment-13854414 ] Vladimir Rodionov commented on HBASE-10216: --- This should be opened as an HDFS ticket. Provide an API to *register* a new file with a given path and block locations. This may benefit HDFS copy a lot as well: blocks can be copied locally and the new file created with just one HDFS API call, registerFile(Path path, BlockLocation[] locations). Compaction will be performed locally (mostly) and the coordinator of the compaction will call *registerFile(Path path, BlockLocation[] locations)* when all involved nodes are finished. Change HBase to support local compactions - Key: HBASE-10216 URL: https://issues.apache.org/jira/browse/HBASE-10216 Project: HBase Issue Type: New Feature Components: Compaction Environment: All Reporter: David Witten As I understand it compactions will read data from DFS and write to DFS. This means that even when the reading occurs on the local host (because region server has a local copy) all the writing must go over the network to the other replicas. This proposal suggests that HBase would perform much better if all the reading and writing occurred locally and did not go over the network. I propose that the DFS interface be extended to provide method that would merge files so that the merging and deleting can be performed on local data nodes with no file contents moving over the network. The method would take a list of paths to be merged and deleted and the merged file path and an indication of a file-format-aware class that would be run on each data node to perform the merge. The merge method provided by this merging class would be passed files open for reading for all the files to be merged and one file open for writing. 
The custom class provided merge method would read all the input files and append to the output file using some standard API that would work across all DFS implementations. The DFS would ensure that the merge had happened properly on all replicas before returning to the caller. It could be that greater resiliency could be achieved by implementing the deletion as a separate phase that is only done after enough of the replicas had completed the merge. HBase would be changed to use the new merge method for compactions, and would provide an implementation of the merging class that works with HFiles. This proposal would require a custom code that understands the file format to be runnable by the data nodes to manage the merge. So there would need to be a facility to load classes into DFS if there isn't such a facility already. Or, less generally, HDFS could build in support for HFile merging. The merge method might be optional. If the DFS implementation did not provide it a generic version that performed the merge on top of the regular DFS interfaces would be used. It may be that this method needs to be tweaked or ignored when the region server does not have a local copy data so that, as happens currently, one copy of the data moves to the region server. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
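The registerFile(Path, BlockLocation[]) idea above can be shown with a toy in-memory namenode. All names here are invented for illustration (ToyNameNode and BlockLocation below are not real HDFS classes); the point being sketched is that once the datanodes hold the merged block replicas locally, registration is a single metadata-only call, with no block data crossing the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the suggested registerFile(path, locations) call: the
// compaction coordinator registers a finished file whose blocks already
// exist on the listed datanodes.
public class RegisterFileSketch {

    public static class BlockLocation {
        final long blockId;
        final List<String> hosts; // datanodes already holding this block locally
        public BlockLocation(long blockId, List<String> hosts) {
            this.blockId = blockId;
            this.hosts = hosts;
        }
    }

    public static class ToyNameNode {
        private final Map<String, List<BlockLocation>> namespace = new HashMap<>();

        // metadata-only registration: no block data moves over the network here
        public void registerFile(String path, List<BlockLocation> locations) {
            if (namespace.containsKey(path)) {
                throw new IllegalStateException("already exists: " + path);
            }
            namespace.put(path, new ArrayList<>(locations));
        }

        public List<BlockLocation> getBlockLocations(String path) {
            return namespace.get(path);
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        // hypothetical path and datanode names, for illustration only
        nn.registerFile("/hbase/data/t1/cf/merged-hfile",
            List.of(new BlockLocation(1L, List.of("dn1", "dn2", "dn3"))));
        int blocks = nn.getBlockLocations("/hbase/data/t1/cf/merged-hfile").size();
        if (blocks != 1) throw new AssertionError(blocks);
        System.out.println("registered merged file with " + blocks + " block(s)");
    }
}
```

A real version would also have to validate replica consistency and lease semantics on the namenode side — which is part of why the thread suggests this belongs in an HDFS ticket.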
[jira] [Updated] (HBASE-10095) Selective WALEdit encryption
[ https://issues.apache.org/jira/browse/HBASE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10095: --- Affects Version/s: (was: 0.98.0) 0.99.0 Fix Version/s: (was: 0.98.0) I've spent some time looking at how to accomplish this. We have implemented WALEdit encryption using a WALCellCodec, which is necessary because WALEdits are stratified by rows, not columns, so some cells in a WALEdit will be encrypted and some not if we are selectively doing this. In the WALCellCodec context, we only have information about the cell, we can't get a reference to anything that will lead to family information. Replication provides an existing example of how to do family-specific WALEdit modification. Replication modifies WALEdits by adding a WALActionsListener at a high level where it has access to the server. The WALEdit type already has fields for carrying scope information. We could do something similar here: We could add a field to WALEdit indicating if it should be encrypted or not and register a listener (up in HStore?) that sets it accordingly, but this is not enough because WALCellCodecs only see Cells, not the WALEdit that contains them. I have experimented with a few interface changes and am not happy with any of the results so far. So I am going to move this out. Selective WALEdit encryption Key: HBASE-10095 URL: https://issues.apache.org/jira/browse/HBASE-10095 Project: HBase Issue Type: Improvement Affects Versions: 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell The SecureWALProtobufWriter currently will encrypt every WAL entry if WAL encryption is enabled. However, SecureWALProtobufReader can distinguish between encrypted and unencrypted entries, and we encrypt every entry individually in part because the reader can skip and seek around during split and recovery, but also in part to enable selective encryption of WALedits. 
We should consider encrypting only the WALedits of column families for which HBASE-7544 features are configured. If few column families are encrypted relative to all CFs on the cluster, the performance difference will be significant. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
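If a codec could be handed the set of encrypted families - one possible shape of the interface changes discussed above - the per-cell decision itself is simple. A minimal sketch under that assumption; the `Cell` class, the XOR placeholder "cipher", and the constructor-injected family set are all illustrative, not the real WALCellCodec API:

```java
import java.util.Set;

public class SelectiveWalEncryptSketch {

    /** Minimal stand-in for a WAL cell: just the family name and the value payload. */
    static final class Cell {
        final String family;
        final byte[] value;
        Cell(String family, byte[] value) { this.family = family; this.value = value; }
    }

    private final Set<String> encryptedFamilies;

    SelectiveWalEncryptSketch(Set<String> encryptedFamilies) {
        this.encryptedFamilies = encryptedFamilies;
    }

    /** Decide per cell, as a codec would have to: transform only cells of configured families. */
    byte[] encode(Cell cell) {
        if (!encryptedFamilies.contains(cell.family)) {
            return cell.value;               // pass through in the clear
        }
        byte[] out = cell.value.clone();     // placeholder "cipher": XOR, not real crypto
        for (int i = 0; i < out.length; i++) {
            out[i] ^= 0x5A;
        }
        return out;
    }

    public static void main(String[] args) {
        SelectiveWalEncryptSketch codec = new SelectiveWalEncryptSketch(Set.of("secret"));
        byte[] plain = codec.encode(new Cell("public", new byte[] { 42 }));
        byte[] enc = codec.encode(new Cell("secret", new byte[] { 42 }));
        System.out.println("plain unchanged: " + (plain[0] == 42)
            + ", secret changed: " + (enc[0] != 42));
    }
}
```

The hard part the comment identifies is not this decision but plumbing: in HBase the codec is constructed without access to the table schema, so it cannot know which families are configured for encryption.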
[jira] [Comment Edited] (HBASE-10095) Selective WALEdit encryption
[ https://issues.apache.org/jira/browse/HBASE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854422#comment-13854422 ] Andrew Purtell edited comment on HBASE-10095 at 12/20/13 7:02 PM: -- I've spent some time looking at how to accomplish this. We have implemented WALEdit encryption using a WALCellCodec, which is necessary because WALEdits are stratified by rows, not columns, so some cells in a WALEdit will be encrypted and some not if we are selectively doing this. In the WALCellCodec context we only have information about the cell; we can't get a reference to anything that will lead to family information. Replication provides an existing example of how to do family-specific WALEdit modification. Replication modifies WALEdits by adding a WALActionsListener at a high level where it has access to the server. The WALEdit type already has fields for carrying scope information. We could do something similar here: we could add a field to WALEdit indicating which cells, for which families, within it should be encrypted, and register a listener (up in HStore?) that sets it accordingly, but this is not enough because WALCellCodecs only see Cells, not the WALEdit that contains them. I have experimented with a few interface changes and am not happy with any of the results so far. So I am going to move this out. was (Author: apurtell): I've spent some time looking at how to accomplish this. We have implemented WALEdit encryption using a WALCellCodec, which is necessary because WALEdits are stratified by rows, not columns, so some cells in a WALEdit will be encrypted and some not if we are selectively doing this. In the WALCellCodec context we only have information about the cell; we can't get a reference to anything that will lead to family information. Replication provides an existing example of how to do family-specific WALEdit modification. Replication modifies WALEdits by adding a WALActionsListener at a high level where it has access to the server. The WALEdit type already has fields for carrying scope information. We could do something similar here: we could add a field to WALEdit indicating if it should be encrypted or not and register a listener (up in HStore?) that sets it accordingly, but this is not enough because WALCellCodecs only see Cells, not the WALEdit that contains them. I have experimented with a few interface changes and am not happy with any of the results so far. So I am going to move this out. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Comment Edited] (HBASE-10095) Selective WALEdit encryption
[ https://issues.apache.org/jira/browse/HBASE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854422#comment-13854422 ] Andrew Purtell edited comment on HBASE-10095 at 12/20/13 7:04 PM: -- I've spent some time looking at how to accomplish this. We have implemented WALEdit encryption using a WALCellCodec, which is necessary because WALEdits are stratified by rows, not columns, so some cells in a WALEdit will be encrypted and some not if we are selectively doing this. In the WALCellCodec context we only have information about the cell; we can't get a reference to anything that will lead to family information. Replication provides an existing example of how to do family-specific WALEdit modification. Replication modifies WALEdits by adding a WALActionsListener at a high level where it has access to the server. The WALEdit type already has fields for carrying scope information. We could do something similar here: we could add a field to WALEdit indicating which cells, for which families, within it should be encrypted, and register a listener (up in HStore?) that sets it accordingly, but that would still not be quite enough because WALCellCodecs only see Cells, not the WALEdit that contains them. I have experimented with a few interface changes and am not happy with any of the results so far. So I am going to move this out. was (Author: apurtell): I've spent some time looking at how to accomplish this. We have implemented WALEdit encryption using a WALCellCodec, which is necessary because WALEdits are stratified by rows, not columns, so some cells in a WALEdit will be encrypted and some not if we are selectively doing this. In the WALCellCodec context we only have information about the cell; we can't get a reference to anything that will lead to family information. Replication provides an existing example of how to do family-specific WALEdit modification. Replication modifies WALEdits by adding a WALActionsListener at a high level where it has access to the server. The WALEdit type already has fields for carrying scope information. We could do something similar here: we could add a field to WALEdit indicating which cells, for which families, within it should be encrypted, and register a listener (up in HStore?) that sets it accordingly, but this is not enough because WALCellCodecs only see Cells, not the WALEdit that contains them. I have experimented with a few interface changes and am not happy with any of the results so far. So I am going to move this out. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854434#comment-13854434 ] Jimmy Xiang commented on HBASE-10210: - Looks like when the master starts up, we don't put those regionservers in ZK into the online server list. Check RegionServerTracker#start. Will fixing this fix the issue? during master startup, RS can be you-are-dead-ed by master in error --- Key: HBASE-10210 URL: https://issues.apache.org/jira/browse/HBASE-10210 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10210.patch Not sure of the root cause yet; I am at the "how did this ever work" stage. We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. It looks like RS information arriving from two sources - ZK and the server itself - can conflict. The master doesn't handle such cases (timestamp match), and technically timestamps can collide for two separate servers anyway. So, the master YouAreDead-s the already-recorded reporting RS, and adds it too. Then it discovers that the new server has died with a fatal error! Note the threads: addition is called from master initialization and from RPC. {noformat} 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Finished waiting for region servers count to settle; checked in 2, slept for 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. 
2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Registering server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered server found up in zk but who has not yet reported in: h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Triggering server recovery; existingServer h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 looks stale, new server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Master doesn't enable ServerShutdownHandler during initialization, delay expiring server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 ... 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] master.HMaster: Region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 reported a fatal error: ABORTING region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as dead server {noformat} Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854445#comment-13854445 ] Sergey Shelukhin commented on HBASE-10210: -- You mean the online servers in the tracker? It does add them to its internal list. Can you elaborate a bit? If they are put into the other online-server list, wouldn't it make the issue worse? As far as I see in the check...AndAdd method and around it, there's no provision for one server to be added twice; if it was already there, the same issue will happen: report rejected. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Comment Edited] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854445#comment-13854445 ] Sergey Shelukhin edited comment on HBASE-10210 at 12/20/13 7:15 PM: You mean the online servers in the tracker? It does add them to its internal list. Can you elaborate a bit? If they are put into the other online-server list, wouldn't it make the issue worse? As far as I see in the check...AndAdd method and around it, there's no provision for one server to be added twice; if it was already there, the same issue will happen: it will expire the old one (from ZK), then get the report rejected. was (Author: sershe): You mean the online servers in the tracker? It does add them to its internal list. Can you elaborate a bit? If they are put into the other online-server list, wouldn't it make the issue worse? As far as I see in the check...AndAdd method and around it, there's no provision for one server to be added twice; if it was already there, the same issue will happen: report rejected. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854452#comment-13854452 ] Jimmy Xiang commented on HBASE-10210: - I have not thought through the issue yet. For now, as far as I know, ServerManager has a list, and RegionServerTracker has a list too. The start call only adds the RSes from ZK to the list in RegionServerTracker, which is right. However, for the first run, should we also add them to the list in ServerManager? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
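The race under discussion can be mimicked with a toy registration map: the same ServerName (host, port, startcode) arrives once from the ZK scan at master startup and once from the server's own RPC report, and a second arrival of an identical name has no clean path. This is a hypothetical sketch of the conflict, not the actual ServerManager code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ServerRegistrationSketch {
    // toy stand-in for an online-server map, keyed by "host,port,startcode"
    private final Map<String, Long> onlineServers = new ConcurrentHashMap<>();

    /**
     * Two registration paths can race: the ZK scan at master startup and the
     * server's own report. When the identical name arrives a second time, it is
     * not a restarted server, yet a naive check treats the recorded entry as stale.
     */
    String checkAndRecord(String serverName, long startcode) {
        Long existing = onlineServers.putIfAbsent(serverName, startcode);
        if (existing == null) {
            return "registered";
        }
        // same name, same startcode: the second arrival is a duplicate, not a restart
        return existing == startcode ? "rejected-duplicate" : "expired-old";
    }

    public static void main(String[] args) {
        ServerRegistrationSketch m = new ServerRegistrationSketch();
        String fromZk = m.checkAndRecord("hbase-8,60020,1387451803800", 1387451803800L);
        String fromRpc = m.checkAndRecord("hbase-8,60020,1387451803800", 1387451803800L);
        System.out.println(fromZk + " then " + fromRpc); // prints "registered then rejected-duplicate"
    }
}
```

In the log above the real master takes the "expired-old" branch even for the duplicate case, which is what leads to the YouAreDeadException for a live server.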
[jira] [Commented] (HBASE-10183) Need enforce a reserved range of system tag types
[ https://issues.apache.org/jira/browse/HBASE-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854454#comment-13854454 ] Jeffrey Zhong commented on HBASE-10183: --- Sounds good. Please close it as a dup. Thanks. Need enforce a reserved range of system tag types - Key: HBASE-10183 URL: https://issues.apache.org/jira/browse/HBASE-10183 Project: HBase Issue Type: Task Components: HFile Affects Versions: 0.98.0 Reporter: Jeffrey Zhong Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.98.0 If we don't reserve a range of system tag types now, let's say 0-64 (the total tag type range is 0-255), we'll have a hard time when introducing a new system tag type in the future, because the new tag type may collide with an existing user tag type, as tags are open to users as well. [~ram_krish], [~anoop.hbase] What do you guys think? Thanks! -- This message was sent by Atlassian JIRA (v6.1.4#6159)
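The reservation itself amounts to a range check at tag-creation time. A sketch assuming the 0-64 split suggested above; the constant and method names here are made up for illustration:

```java
public class TagTypeRangeSketch {
    // assumed split of the one-byte tag-type space (0-255), per the proposal:
    // types up to 64 reserved for system use, the rest open to users
    static final int MAX_RESERVED_SYSTEM_TAG_TYPE = 64;

    /** Reject user-supplied tag types that fall into the reserved system range. */
    static void checkUserTagType(int type) {
        if (type < 0 || type > 255) {
            throw new IllegalArgumentException("tag type must fit in one byte: " + type);
        }
        if (type <= MAX_RESERVED_SYSTEM_TAG_TYPE) {
            throw new IllegalArgumentException("tag types 0-"
                + MAX_RESERVED_SYSTEM_TAG_TYPE + " are reserved for system use: " + type);
        }
    }

    public static void main(String[] args) {
        checkUserTagType(200);              // user range: accepted
        boolean rejected = false;
        try {
            checkUserTagType(10);           // system range: rejected
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        System.out.println("reserved type rejected: " + rejected); // prints "reserved type rejected: true"
    }
}
```

The point of enforcing this early is exactly the one made in the issue: without a reserved range, a future system tag type could collide with a user tag type already written to HFiles.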
[jira] [Commented] (HBASE-8558) Add timeout limit for HBaseClient dataOutputStream
[ https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854459#comment-13854459 ] Lars Hofhansl commented on HBASE-8558: -- +1 Add timeout limit for HBaseClient dataOutputStream -- Key: HBASE-8558 URL: https://issues.apache.org/jira/browse/HBASE-8558 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.5, 0.94.14 Reporter: wanbin Assignee: Liang Xie Attachments: HBASE-8558-0.94.txt I run jstack at client host. The result is below. hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 nid=0x5173 runnable [0x579cc000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked 0x000758cb0780 (a sun.nio.ch.Util$2) - locked 0x000758cb0770 (a java.util.Collections$UnmodifiableSet) - locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) - locked 0x000754e978a0 (a java.io.BufferedOutputStream) at java.io.DataOutputStream.flush(DataOutputStream.java:106) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620) - locked 0x000754e97880 (a java.io.DataOutputStream) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975) at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy13.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) This thread has hung for one hour. Meanwhile, another thread tries to close the connection: IPC Client (1983049639) connection to dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry [0x4bc0f000] java.lang.Thread.State: BLOCKED (on object monitor) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) - waiting to lock 0x000754e978a0 (a java.io.BufferedOutputStream) at java.io.DataOutputStream.flush(DataOutputStream.java:106) at java.io.FilterOutputStream.close(FilterOutputStream.java:140) at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237) at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.close(HBaseClient.java:715) - locked 0x000754e7b818 (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:587) dump002030.cm6.tbsite.net is dead 
regionserver. I read the HBase source code and discovered that connection.out doesn't set a timeout: this.out = new DataOutputStream(new BufferedOutputStream(NetUtils.getOutputStream(socket))); I believe this means epoll_wait will block indefinitely. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
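Hadoop's SocketIOWithTimeout (visible in the stack trace above) implements timeouts by doing non-blocking I/O under a selector with a bounded select, and NetUtils.getOutputStream has a variant that takes a timeout; presumably the attached patch constructs the stream with such a write timeout instead of the no-timeout form. The mechanism can be sketched with plain NIO - this is an illustration of the technique, not the patch itself:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.StandardSocketOptions;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class WriteTimeoutSketch {

    /** Write with a deadline: when the send buffer is full and the peer never drains it,
     *  give up after timeoutMs instead of blocking in epoll_wait forever. */
    static void writeWithTimeout(SocketChannel ch, ByteBuffer buf, long timeoutMs)
            throws IOException {
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_WRITE);
            while (buf.hasRemaining()) {
                if (ch.write(buf) > 0) continue;     // made progress, keep writing
                if (sel.select(timeoutMs) == 0) {    // no space freed within the deadline
                    throw new SocketTimeoutException("write stalled for " + timeoutMs + " ms");
                }
                sel.selectedKeys().clear();
            }
        }
    }

    /** Demo: write to a peer that never reads; returns true if the write timed out. */
    static boolean demo() throws Exception {
        ServerSocket server = new ServerSocket();
        server.setReceiveBufferSize(8 * 1024);       // keep the peer's window small
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        SocketChannel ch = SocketChannel.open();
        ch.setOption(StandardSocketOptions.SO_SNDBUF, 8 * 1024);
        ch.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()));
        Socket peer = server.accept();               // accept but never read: a "dead" peer
        ch.configureBlocking(false);
        boolean timedOut = false;
        try {
            writeWithTimeout(ch, ByteBuffer.allocate(4 * 1024 * 1024), 200);
        } catch (SocketTimeoutException e) {
            timedOut = true;
        }
        ch.close(); peer.close(); server.close();
        return timedOut;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("write timed out: " + demo());
    }
}
```

With a blocking stream and no timeout, the same stalled write parks the thread inside epoll_wait indefinitely, which matches the one-hour hang in the jstack output.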
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854467#comment-13854467 ] Sergey Shelukhin commented on HBASE-10210: -- That's what it does in the loop after waiting for reporting servers (only for non-reported ones), as far as I see. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-8529) checkOpen is missing from multi, mutate, get and multiGet etc.
[ https://issues.apache.org/jira/browse/HBASE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854472#comment-13854472 ] Jeffrey Zhong commented on HBASE-8529: -- Thanks [~anoop.hbase], [~ram_krish] for the reviews! I've integrated it into the 0.98 and trunk branches. checkOpen is missing from multi, mutate, get and multiGet etc. -- Key: HBASE-8529 URL: https://issues.apache.org/jira/browse/HBASE-8529 Project: HBase Issue Type: Bug Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: hbase-8529.patch I saw we have checkOpen in all those functions in 0.94 while they're missing from trunk. Does anyone know why? For multi and mutate, if we don't call checkOpen we could flood our logs with a bunch of "DFSOutputStream is closed" errors when we sync the WAL. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
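The guard being restored is a fail-fast check at the top of each RPC handler. A toy sketch of the pattern - not the actual RegionServer code; the class, field, and message here are illustrative:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class CheckOpenSketch {
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    /** Fail fast instead of attempting work (and flooding the logs with WAL-sync
     *  errors) once the server is stopping. Loosely modeled on checkOpen. */
    void checkOpen() throws IOException {
        if (stopped.get()) {
            throw new IOException("server is not running yet or is stopping");
        }
    }

    /** Stand-in for an RPC handler such as multi/mutate: guard first, then do the work. */
    String multi(String request) throws IOException {
        checkOpen();
        return "ok:" + request;
    }

    void stop() { stopped.set(true); }

    public static void main(String[] args) throws IOException {
        CheckOpenSketch rs = new CheckOpenSketch();
        System.out.println(rs.multi("put"));   // served while open; prints "ok:put"
        rs.stop();
        try {
            rs.multi("put");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The payoff is the one described in the issue: a request that arrives while the server is shutting down is rejected with one clean exception rather than failing deep inside the WAL sync path.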
[jira] [Updated] (HBASE-8529) checkOpen is missing from multi, mutate, get and multiGet etc.
[ https://issues.apache.org/jira/browse/HBASE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-8529: - Resolution: Fixed Status: Resolved (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.1.4#6159)