[jira] [Updated] (HBASE-8558) When a regionserver dies, a client performing a put operation hangs.

2013-12-20 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HBASE-8558:
-

Attachment: HBASE-8558-0.94.txt

 When a regionserver dies, a client performing a put operation hangs.
 -

 Key: HBASE-8558
 URL: https://issues.apache.org/jira/browse/HBASE-8558
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.5, 0.94.14
Reporter: wanbin
 Attachments: HBASE-8558-0.94.txt


 I ran jstack on the client host. The result is below.
 hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 
 nid=0x5173 runnable [0x579cc000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 0x000758cb0780 (a sun.nio.ch.Util$2)
 - locked 0x000758cb0770 (a 
 java.util.Collections$UnmodifiableSet)
 - locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - locked 0x000754e978a0 (a java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
 - locked 0x000754e97880 (a java.io.DataOutputStream)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
 at $Proxy13.multi(Unknown Source)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
 at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 This thread has hung for one hour.
 Meanwhile, another thread is trying to close the connection:
 IPC Client (1983049639) connection to 
 dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 
 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry 
 [0x4bc0f000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - waiting to lock 0x000754e978a0 (a 
 java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
 at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
 at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.close(HBaseClient.java:715)
 - locked 0x000754e7b818 (a 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:587)
 dump002030.cm6.tbsite.net is the dead regionserver.
 Reading the HBase source code, I found that connection.out does not set a timeout:
 this.out = new DataOutputStream
 (new BufferedOutputStream(NetUtils.getOutputStream(socket)));
 I believe this means epoll_wait will block indefinitely.
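 The plain NetUtils.getOutputStream(socket) overload uses a timeout of 0, i.e. wait forever, which matches the stack above parked in epoll_wait. A minimal sketch of the difference; the 60-second value is illustrative only, not from the attached patch:
 {code}
 import java.io.BufferedOutputStream;
 import java.io.DataOutputStream;
 import java.io.IOException;
 import java.io.OutputStream;
 import java.net.Socket;

 import org.apache.hadoop.net.NetUtils;

 public class OutputStreamTimeoutSketch {
   // How 0.94 builds the RPC output stream today: timeout = 0, so a flush()
   // against a dead peer can sit in epoll_wait indefinitely.
   static DataOutputStream withoutTimeout(Socket socket) throws IOException {
     OutputStream raw = NetUtils.getOutputStream(socket);
     return new DataOutputStream(new BufferedOutputStream(raw));
   }

   // Same stream with a write timeout: a stuck write now fails with
   // SocketTimeoutException instead of hanging the caller forever.
   static DataOutputStream withTimeout(Socket socket) throws IOException {
     OutputStream raw = NetUtils.getOutputStream(socket, 60000L); // illustrative 60s
     return new DataOutputStream(new BufferedOutputStream(raw));
   }
 }
 {code}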



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

[jira] [Assigned] (HBASE-8558) When a regionserver dies, a client performing a put operation hangs.

2013-12-20 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie reassigned HBASE-8558:


Assignee: Liang Xie

 When a regionserver dies, a client performing a put operation hangs.
 -

 Key: HBASE-8558
 URL: https://issues.apache.org/jira/browse/HBASE-8558
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.5, 0.94.14
Reporter: wanbin
Assignee: Liang Xie
 Attachments: HBASE-8558-0.94.txt


 I ran jstack on the client host. The result is below.
 hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 
 nid=0x5173 runnable [0x579cc000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 0x000758cb0780 (a sun.nio.ch.Util$2)
 - locked 0x000758cb0770 (a 
 java.util.Collections$UnmodifiableSet)
 - locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - locked 0x000754e978a0 (a java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
 - locked 0x000754e97880 (a java.io.DataOutputStream)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
 at $Proxy13.multi(Unknown Source)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
 at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 This thread has hung for one hour.
 Meanwhile, another thread is trying to close the connection:
 IPC Client (1983049639) connection to 
 dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 
 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry 
 [0x4bc0f000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - waiting to lock 0x000754e978a0 (a 
 java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
 at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
 at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.close(HBaseClient.java:715)
 - locked 0x000754e7b818 (a 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:587)
 dump002030.cm6.tbsite.net is the dead regionserver.
 Reading the HBase source code, I found that connection.out does not set a timeout:
 this.out = new DataOutputStream
 (new BufferedOutputStream(NetUtils.getOutputStream(socket)));
 I believe this means epoll_wait will block indefinitely.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

[jira] [Updated] (HBASE-8558) When a regionserver dies, a client performing a put operation hangs.

2013-12-20 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HBASE-8558:
-

Affects Version/s: 0.94.14
   Status: Patch Available  (was: Open)

 When a regionserver dies, a client performing a put operation hangs.
 -

 Key: HBASE-8558
 URL: https://issues.apache.org/jira/browse/HBASE-8558
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.14, 0.94.5
Reporter: wanbin
Assignee: Liang Xie
 Attachments: HBASE-8558-0.94.txt


 I ran jstack on the client host. The result is below.
 hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 
 nid=0x5173 runnable [0x579cc000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 0x000758cb0780 (a sun.nio.ch.Util$2)
 - locked 0x000758cb0770 (a 
 java.util.Collections$UnmodifiableSet)
 - locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - locked 0x000754e978a0 (a java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
 - locked 0x000754e97880 (a java.io.DataOutputStream)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
 at $Proxy13.multi(Unknown Source)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
 at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 This thread has hung for one hour.
 Meanwhile, another thread is trying to close the connection:
 IPC Client (1983049639) connection to 
 dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 
 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry 
 [0x4bc0f000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - waiting to lock 0x000754e978a0 (a 
 java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
 at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
 at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.close(HBaseClient.java:715)
 - locked 0x000754e7b818 (a 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:587)
 dump002030.cm6.tbsite.net is the dead regionserver.
 Reading the HBase source code, I found that connection.out does not set a timeout:
 this.out = new DataOutputStream
 (new BufferedOutputStream(NetUtils.getOutputStream(socket)));
 I believe this means epoll_wait will block indefinitely.

[jira] [Updated] (HBASE-8558) Add timeout limit for HBaseClient dataOutputStream

2013-12-20 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HBASE-8558:
-

Summary: Add timeout limit for HBaseClient dataOutputStream  (was: When a 
regionserver dies, a client performing a put operation hangs.)

 Add timeout limit for HBaseClient dataOutputStream
 --

 Key: HBASE-8558
 URL: https://issues.apache.org/jira/browse/HBASE-8558
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.5, 0.94.14
Reporter: wanbin
Assignee: Liang Xie
 Attachments: HBASE-8558-0.94.txt


 I ran jstack on the client host. The result is below.
 hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 
 nid=0x5173 runnable [0x579cc000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 0x000758cb0780 (a sun.nio.ch.Util$2)
 - locked 0x000758cb0770 (a 
 java.util.Collections$UnmodifiableSet)
 - locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - locked 0x000754e978a0 (a java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
 - locked 0x000754e97880 (a java.io.DataOutputStream)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
 at $Proxy13.multi(Unknown Source)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
 at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 This thread has hung for one hour.
 Meanwhile, another thread is trying to close the connection:
 IPC Client (1983049639) connection to 
 dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 
 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry 
 [0x4bc0f000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - waiting to lock 0x000754e978a0 (a 
 java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
 at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
 at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.close(HBaseClient.java:715)
 - locked 0x000754e7b818 (a 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:587)
 dump002030.cm6.tbsite.net is the dead regionserver.
 Reading the HBase source code, I found that connection.out does not set a timeout:
 this.out = new DataOutputStream
 (new BufferedOutputStream(NetUtils.getOutputStream(socket)));
 I believe this means epoll_wait will block indefinitely.




[jira] [Commented] (HBASE-8558) Add timeout limit for HBaseClient dataOutputStream

2013-12-20 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853801#comment-13853801
 ] 

Liang Xie commented on HBASE-8558:
--

Thanks [~wanbin] for your detailed report! The current impl uses a default 
timeout of 0, so it really needs an explicit setting :)
Right now this is only a 0.94-branch issue; I found that all 0.96+ branches 
already have code in this style:
{code}
NetUtils.getOutputStream(socket, pingInterval);
{code}

[~lhofhansl], I didn't add or run any test case, but the change is small, so I 
guess it's OK :)
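The attached HBASE-8558-0.94.txt is not reproduced in this thread, but going by the comment above, the change presumably mirrors the 0.96+ construction and passes the ping interval as a write timeout when the connection's streams are set up. A hedged sketch of that shape (the surrounding field and variable names are assumptions, not quotes from the patch):
{code}
// Sketch only, inside HBaseClient$Connection stream setup (names assumed).
// Before: no write timeout, so flush() can block forever on a dead regionserver:
//   this.out = new DataOutputStream(
//       new BufferedOutputStream(NetUtils.getOutputStream(socket)));
// After: bound each write by the ping interval, as the 0.96+ code already does.
this.out = new DataOutputStream(
    new BufferedOutputStream(NetUtils.getOutputStream(socket, pingInterval)));
{code}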

 Add timeout limit for HBaseClient dataOutputStream
 --

 Key: HBASE-8558
 URL: https://issues.apache.org/jira/browse/HBASE-8558
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.5, 0.94.14
Reporter: wanbin
Assignee: Liang Xie
 Attachments: HBASE-8558-0.94.txt


 I ran jstack on the client host. The result is below.
 hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 
 nid=0x5173 runnable [0x579cc000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 0x000758cb0780 (a sun.nio.ch.Util$2)
 - locked 0x000758cb0770 (a 
 java.util.Collections$UnmodifiableSet)
 - locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - locked 0x000754e978a0 (a java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
 - locked 0x000754e97880 (a java.io.DataOutputStream)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
 at $Proxy13.multi(Unknown Source)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
 at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 This thread has hung for one hour.
 Meanwhile, another thread is trying to close the connection:
 IPC Client (1983049639) connection to 
 dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 
 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry 
 [0x4bc0f000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - waiting to lock 0x000754e978a0 (a 
 java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
 at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
 at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.close(HBaseClient.java:715)
 - locked 0x000754e7b818 (a 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:587)
 dump002030.cm6.tbsite.net is the dead regionserver.

[jira] [Created] (HBASE-10213) Add read log size per second metrics for replication source

2013-12-20 Thread cuijianwei (JIRA)
cuijianwei created HBASE-10213:
--

 Summary: Add read log size per second metrics for replication 
source
 Key: HBASE-10213
 URL: https://issues.apache.org/jira/browse/HBASE-10213
 Project: HBase
  Issue Type: Improvement
  Components: metrics, Replication
Affects Versions: 0.94.14
Reporter: cuijianwei
Priority: Minor


The current replication source metrics include logEditsReadRate, 
shippedBatchesRate, etc., which indicate to some extent how fast data is 
replicated to the peer cluster. However, these metrics do not make clear how 
many bytes are being replicated to the peer cluster. In a production 
environment it can be important to know the size of the data replicated per 
second, because services may be affected if the network becomes busy.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10213) Add read log size per second metrics for replication source

2013-12-20 Thread cuijianwei (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cuijianwei updated HBASE-10213:
---

Attachment: HBASE-10213-0.94-v1.patch

This patch adds a metric, 'logReadRateInByte', showing how many bytes are read 
by the source per second.
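The patch itself is not inlined in this mail; as a rough sketch of where such a byte counter would be fed in the 0.94 replication source read loop (incrLogReadInBytes and the reader/metrics fields here are hypothetical stand-ins, not names taken from the patch):
{code}
// Sketch only, not the attached patch: measure how many WAL bytes each read
// pulls in and feed them to a rate metric exported as 'logReadRateInByte'.
long positionBefore = this.reader.getPosition();
HLog.Entry entry = this.reader.next();            // read the next WAL edit
if (entry != null) {
  long bytesRead = this.reader.getPosition() - positionBefore;
  this.metrics.incrLogReadInBytes(bytesRead);     // hypothetical metrics helper
}
{code}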

 Add read log size per second metrics for replication source
 ---

 Key: HBASE-10213
 URL: https://issues.apache.org/jira/browse/HBASE-10213
 Project: HBase
  Issue Type: Improvement
  Components: metrics, Replication
Affects Versions: 0.94.14
Reporter: cuijianwei
Priority: Minor
 Attachments: HBASE-10213-0.94-v1.patch


 The current replication source metrics include logEditsReadRate, 
 shippedBatchesRate, etc., which indicate to some extent how fast data is 
 replicated to the peer cluster. However, these metrics do not make clear how 
 many bytes are being replicated to the peer cluster. In a production 
 environment it can be important to know the size of the data replicated per 
 second, because services may be affected if the network becomes busy.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10213) Add read log size per second metrics for replication source

2013-12-20 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HBASE-10213:
--

Assignee: cuijianwei
  Status: Patch Available  (was: Open)

 Add read log size per second metrics for replication source
 ---

 Key: HBASE-10213
 URL: https://issues.apache.org/jira/browse/HBASE-10213
 Project: HBase
  Issue Type: Improvement
  Components: metrics, Replication
Affects Versions: 0.94.14
Reporter: cuijianwei
Assignee: cuijianwei
Priority: Minor
 Attachments: HBASE-10213-0.94-v1.patch


 The current replication source metrics include logEditsReadRate, 
 shippedBatchesRate, etc., which indicate to some extent how fast data is 
 replicated to the peer cluster. However, these metrics do not make clear how 
 many bytes are being replicated to the peer cluster. In a production 
 environment it can be important to know the size of the data replicated per 
 second, because services may be affected if the network becomes busy.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-7781) Update security unit tests to use a KDC if available

2013-12-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853857#comment-13853857
 ] 

ramkrishna.s.vasudevan commented on HBASE-7781:
---

Before proceeding with this JIRA, I went through what is described in all the 
JIRAs mentioned here.  HADOOP-8078 also tries to start ApacheDS, but it seems 
to be an older version.
HADOOP-9848 introduces a miniKDC in the Hadoop project itself as a module.
So we would also introduce the miniKDC on the HBase side, and all security 
test cases would run it along with the cluster?
And would the miniKDC available in HBase be a separate module (like in Hadoop), 
or a class that just allows starting a minikdc?
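For reference, the HADOOP-9848 MiniKdc mentioned above can be driven from a test roughly like this; a sketch against the hadoop-minikdc public API, not HBase code, with illustrative paths and principal:
{code}
import java.io.File;
import java.util.Properties;

import org.apache.hadoop.minikdc.MiniKdc;

public class MiniKdcSketch {
  public static void main(String[] args) throws Exception {
    // Stand up a throwaway KDC in a work directory, mint a keytab, then stop it.
    Properties conf = MiniKdc.createConf();
    MiniKdc kdc = new MiniKdc(conf, new File("target/kdc"));
    kdc.start();

    File keytab = new File("target/hbase.keytab");
    kdc.createPrincipal(keytab, "hbase/localhost");  // illustrative principal
    System.out.println("KDC realm: " + kdc.getRealm());

    kdc.stop();
  }
}
{code}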

 Update security unit tests to use a KDC if available
 

 Key: HBASE-7781
 URL: https://issues.apache.org/jira/browse/HBASE-7781
 Project: HBase
  Issue Type: Test
  Components: security, test
Reporter: Gary Helmling
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.98.0


 We currently have large holes in the test coverage of HBase with security 
 enabled.  Two recent examples of bugs which really should have been caught 
 with testing are HBASE-7771 and HBASE-7772.  The long standing problem with 
 testing with security enabled has been the requirement for supporting 
 kerberos infrastructure.
 We need to close this gap and provide some automated testing with security 
 enabled, if necessary standing up and provisioning a temporary KDC as an 
 option for running integration tests, see HADOOP-8078 and HADOOP-9004 where a 
 similar approach was taken.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HBASE-10214) Regionserver shuts down improperly and leaves the dir in .old undeleted.

2013-12-20 Thread binlijin (JIRA)
binlijin created HBASE-10214:


 Summary: Regionserver shuts down improperly and leaves the dir in .old 
undeleted.
 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin


RegionServer log
{code}
2013-12-18 15:17:45,771 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
51b27391410efdca841db264df46085f
2013-12-18 15:17:45,776 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at null

2013-12-18 15:17:48,776 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
shutdown set and not carrying any regions
2013-12-18 15:17:48,776 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
node,60020,1384410974572: Unhandled exception: null
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
at java.lang.Thread.run(Thread.java:662)
{code}
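The attached patches are not shown in this thread, but the trace points at tryRegionServerReport dereferencing a master stub that is already null once the cluster is shutting down ("Connected to master at null" above). A guard of roughly this shape would avoid the abort; the hbaseMaster field and the report call are assumptions about the 0.94 code, not quotes from the patch:
{code}
// Sketch only, not the attached patch: skip the report instead of aborting
// with a NullPointerException when the master connection is already gone.
if (this.hbaseMaster == null) {
  LOG.info("No master to report to (cluster shutdown in progress); skipping report");
  return;
}
this.hbaseMaster.regionServerReport(serverName, serverLoad);  // call shape assumed
{code}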



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10214) Regionserver shuts down improperly and leaves the dir in .old undeleted.

2013-12-20 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-10214:
-

Attachment: HBASE-10214.patch

 Regionserver shuts down improperly and leaves the dir in .old undeleted.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10214) Regionserver shuts down improperly and leaves the dir in .old undeleted.

2013-12-20 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-10214:
-

Attachment: HBASE-10214-94.patch

 Regionserver shuts down improperly and leaves the dir in .old undeleted.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214-94.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10214) Regionserver shuts down improperly and leaves the dir in .old undeleted.

2013-12-20 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853876#comment-13853876
 ] 

binlijin commented on HBASE-10214:
--

Looks like trunk doesn't have this problem; the patch is based on the 
0.94 branch.

 Regionserver shuts down improperly and leaves the dir in .old undeleted.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214-94.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10214) Regionserver shuts down improperly and leaves the dir in .old undeleted.

2013-12-20 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-10214:
-

Attachment: (was: HBASE-10214.patch)

 Regionserver shuts down improperly and leaves the dir in .old undeleted.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214-94.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10161:
---

Status: Open  (was: Patch Available)

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch


 AccessController fixes for the issue, also affecting VisibilityController, 
 described in HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post-recovery upcalls 
 and modify existing CPs to defer initialization to this new hook as needed.
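 A rough sketch of what "defer initialization until recovery is complete" looks like from a coprocessor's point of view; the post-recovery hook name below reflects what this family of JIRAs added around 0.98 and should be read as an assumption, not the attached patch:
 {code}
 // Sketch only: skip heavyweight setup while the region is still replaying
 // edits, and finish it from the post-recovery upcall instead.
 @Override
 public void postOpen(ObserverContext<RegionCoprocessorEnvironment> ctx) {
   if (ctx.getEnvironment().getRegion().isRecovering()) {
     initialized = false;               // region still in recovery; defer
     return;
   }
   initialize(ctx.getEnvironment());
 }

 @Override
 public void postLogReplay(ObserverContext<RegionCoprocessorEnvironment> ctx) {
   if (!initialized) {
     initialize(ctx.getEnvironment());  // recovery complete; finish setup now
   }
 }
 {code}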



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10161:
---

Attachment: (was: HBASE-10161_V2.patch)

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch


 AccessController fixes for the issue, also affecting VisibilityController, 
 described in HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post-recovery upcalls 
 and modify existing CPs to defer initialization to this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10161:
---

Attachment: HBASE-10161_V2.patch

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch


 AccessController fixes for the issue, also affecting VisibilityController, 
 described in HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post-recovery upcalls 
 and modify existing CPs to defer initialization to this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10161:
---

Status: Patch Available  (was: Open)

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch


 AccessController fixes for the issue, also affecting VisibilityController, 
 described in HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post-recovery upcalls 
 and modify existing CPs to defer initialization to this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HBASE-10215) TableNotFoundException should be thrown after removing stale znode in ETH

2013-12-20 Thread rajeshbabu (JIRA)
rajeshbabu created HBASE-10215:
--

 Summary: TableNotFoundException should be thrown after removing 
stale znode in ETH
 Key: HBASE-10215
 URL: https://issues.apache.org/jira/browse/HBASE-10215
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.14, 0.96.1
Reporter: rajeshbabu
Assignee: rajeshbabu
Priority: Minor
 Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0


Let's suppose the master goes down while creating a table: the znode will be 
left in the ENABLING state, and the master tries to recover such tables on 
restart even if there are no meta entries for the table.
While recovering the table we check whether the table exists in meta; if not, 
we remove the znode. After removing the znode we need to throw 
TableNotFoundException. Presently the exception is not thrown, so the znode is 
recreated and stays stale forever: we cannot delete it even on master restart, 
and we cannot create a table with the same name either.

{code}
  // Check if table exists
  if (!MetaReader.tableExists(catalogTracker, tableName)) {
    // retainAssignment is true only during recovery.  In normal case it is false
    if (!this.skipTableStateCheck) {
      throw new TableNotFoundException(tableName);
    }
    try {
      this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
    } catch (KeeperException e) {
      // TODO : Use HBCK to clear such nodes
      LOG.warn("Failed to delete the ENABLING node for the table " + tableName
          + ". The table will remain unusable. Run HBCK to manually fix the problem.");
    }
  }
{code}
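In other words, the znode removal and the throw need to swap places, so the stale ENABLING node is cleaned up before TableNotFoundException reaches the caller. A sketch of that reordering, derived from the description above rather than from the attached patch:
{code}
// Check if table exists
if (!MetaReader.tableExists(catalogTracker, tableName)) {
  // Clean up the stale ENABLING znode first so it is never recreated...
  try {
    this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
  } catch (KeeperException e) {
    // TODO : Use HBCK to clear such nodes
    LOG.warn("Failed to delete the ENABLING node for the table " + tableName
        + ". The table will remain unusable. Run HBCK to manually fix the problem.");
  }
  // ...then surface the missing table to the caller in both the normal and the
  // recovery path, so the handler stops and the znode is not recreated.
  throw new TableNotFoundException(tableName);
}
{code}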




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10175) 2-thread ChaosMonkey steps on its own toes

2013-12-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853931#comment-13853931
 ] 

Hadoop QA commented on HBASE-10175:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619725/HBASE-10175.patch
  against trunk revision .
  ATTACHMENT ID: 12619725

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 21 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.security.access.TestAccessController

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8239//console

This message is automatically generated.

 2-thread ChaosMonkey steps on its own toes
 --

 Key: HBASE-10175
 URL: https://issues.apache.org/jira/browse/HBASE-10175
 Project: HBase
  Issue Type: Improvement
  Components: test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HBASE-10175.patch


 ChaosMonkey with one destructive and one volatility (flush/compact/split/etc.) 
 thread steps on its own toes and logs a lot of exceptions.
 A simple solution would be to catch most (or all) of them, like 
 NotServingRegionException, and log less (not a full callstack, for example; 
 it's not very useful anyway).
 A more complicated/complementary one would be to keep track of which regions 
 the destructive thread affects and use other regions for the volatile one.
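 A minimal sketch of the simpler option, wrapping each chaos action so expected collisions are logged in one line; action, LOG, and the exception choice are illustrative, not taken from the attached patch:
 {code}
 // Sketch only: swallow the races the two ChaosMonkey threads cause each other
 // and log one line instead of a full stack trace; keep the trace for anything
 // genuinely unexpected. 'action' stands in for the flush/compact/split call.
 try {
   action.perform();
 } catch (NotServingRegionException e) {
   LOG.info("Region moved or closed under us; ignoring: " + e.getMessage());
 } catch (IOException e) {
   LOG.warn("Chaos action failed unexpectedly", e);
 }
 {code}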



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10215) TableNotFoundException should be thrown after removing stale znode in ETH

2013-12-20 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-10215:
---

Status: Patch Available  (was: Open)

 TableNotFoundException should be thrown after removing stale znode in ETH
 -

 Key: HBASE-10215
 URL: https://issues.apache.org/jira/browse/HBASE-10215
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.14, 0.96.1
Reporter: rajeshbabu
Assignee: rajeshbabu
Priority: Minor
 Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0

 Attachments: HBASE-10215.patch


 Let's suppose the master goes down while creating a table: the znode will be 
 left in the ENABLING state, and the master tries to recover such tables on 
 restart even if there are no meta entries for the table.
 While recovering the table we check whether the table exists in meta; if not, 
 we remove the znode. After removing the znode we need to throw 
 TableNotFoundException. Presently the exception is not thrown, so the znode is 
 recreated and stays stale forever: we cannot delete it even on master restart, 
 and we cannot create a table with the same name either.
 {code}
   // Check if table exists
   if (!MetaReader.tableExists(catalogTracker, tableName)) {
     // retainAssignment is true only during recovery.  In normal case it is false
     if (!this.skipTableStateCheck) {
       throw new TableNotFoundException(tableName);
     }
     try {
       this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
     } catch (KeeperException e) {
       // TODO : Use HBCK to clear such nodes
       LOG.warn("Failed to delete the ENABLING node for the table " + tableName
           + ". The table will remain unusable. Run HBCK to manually fix the problem.");
     }
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10215) TableNotFoundException should be thrown after removing stale znode in ETH

2013-12-20 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-10215:
---

Attachment: HBASE-10215.patch

Patch for trunk. Please review.

 TableNotFoundException should be thrown after removing stale znode in ETH
 -

 Key: HBASE-10215
 URL: https://issues.apache.org/jira/browse/HBASE-10215
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.96.1, 0.94.14
Reporter: rajeshbabu
Assignee: rajeshbabu
Priority: Minor
 Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0

 Attachments: HBASE-10215.patch


 Let's suppose the master goes down while creating a table: the znode will be 
 left in the ENABLING state, and the master tries to recover such tables on 
 restart even if there are no meta entries for the table.
 While recovering the table we check whether the table exists in meta; if not, 
 we remove the znode. After removing the znode we need to throw 
 TableNotFoundException. Presently the exception is not thrown, so the znode is 
 recreated and stays stale forever: we cannot delete it even on master restart, 
 and we cannot create a table with the same name either.
 {code}
   // Check if table exists
   if (!MetaReader.tableExists(catalogTracker, tableName)) {
     // retainAssignment is true only during recovery.  In normal case it is false
     if (!this.skipTableStateCheck) {
       throw new TableNotFoundException(tableName);
     }
     try {
       this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
     } catch (KeeperException e) {
       // TODO : Use HBCK to clear such nodes
       LOG.warn("Failed to delete the ENABLING node for the table " + tableName
           + ". The table will remain unusable. Run HBCK to manually fix the problem.");
     }
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10213) Add read log size per second metrics for replication source

2013-12-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853941#comment-13853941
 ] 

Hadoop QA commented on HBASE-10213:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12619783/HBASE-10213-0.94-v1.patch
  against trunk revision .
  ATTACHMENT ID: 12619783

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8242//console

This message is automatically generated.

 Add read log size per second metrics for replication source
 ---

 Key: HBASE-10213
 URL: https://issues.apache.org/jira/browse/HBASE-10213
 Project: HBase
  Issue Type: Improvement
  Components: metrics, Replication
Affects Versions: 0.94.14
Reporter: cuijianwei
Assignee: cuijianwei
Priority: Minor
 Attachments: HBASE-10213-0.94-v1.patch


 The current replication source metrics include logEditsReadRate, 
 shippedBatchesRate, etc., which indicate to some extent how fast data is 
 replicated to the peer cluster. However, these metrics do not make clear how 
 many bytes are being replicated to the peer cluster. In a production 
 environment it can be important to know the size of the data replicated per 
 second, because services may be affected if the network becomes busy.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8859) truncate_preserve should get table split keys as it is instead of converting them to string type and then again to bytes

2013-12-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853945#comment-13853945
 ] 

Hadoop QA commented on HBASE-8859:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12619748/HBASE-8859_trunk_4.patch
  against trunk revision .
  ATTACHMENT ID: 12619748

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.security.access.TestAccessController

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8241//console

This message is automatically generated.

 truncate_preserve should get table split keys as it is instead of converting 
 them to string type and then again to bytes
 

 Key: HBASE-8859
 URL: https://issues.apache.org/jira/browse/HBASE-8859
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.95.1
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-8859-Test_to_reproduce.patch, 
 HBASE-8859_trunk.patch, HBASE-8859_trunk_2.patch, HBASE-8859_trunk_3.patch, 
 HBASE-8859_trunk_4.patch


 If we take int, long, or double bytes as split keys, we do not recreate the 
 table with the same split keys, because converting them to strings and then 
 back to bytes yields different split keys, sometimes with an 
 IllegalArgumentException due to duplicate (converted) split keys. Instead we 
 can get the split keys directly from HTable and pass them while creating the 
 table.
 {code}
   h_table = org.apache.hadoop.hbase.client.HTable.new(conf, table_name)
   splits = h_table.getRegionLocations().keys().map{|i| i.getStartKey} 
 :byte
   splits = org.apache.hadoop.hbase.util.Bytes.toByteArrays(splits)
 {code}
 {code}
 Truncating 'emp3' table (it may take a while):
  - Disabling table...
  - Dropping table...
  - Creating table with region boundaries...
 ERROR: java.lang.IllegalArgumentException: All split keys must be unique, 
 found duplicate: 

[jira] [Commented] (HBASE-8558) Add timeout limit for HBaseClient dataOutputStream

2013-12-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853950#comment-13853950
 ] 

Hadoop QA commented on HBASE-8558:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619777/HBASE-8558-0.94.txt
  against trunk revision .
  ATTACHMENT ID: 12619777

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8244//console

This message is automatically generated.

 Add timeout limit for HBaseClient dataOutputStream
 --

 Key: HBASE-8558
 URL: https://issues.apache.org/jira/browse/HBASE-8558
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.5, 0.94.14
Reporter: wanbin
Assignee: Liang Xie
 Attachments: HBASE-8558-0.94.txt


 I run jstack at client host. The result is below.
 hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 
 nid=0x5173 runnable [0x579cc000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 0x000758cb0780 (a sun.nio.ch.Util$2)
 - locked 0x000758cb0770 (a 
 java.util.Collections$UnmodifiableSet)
 - locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - locked 0x000754e978a0 (a java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
 - locked 0x000754e97880 (a java.io.DataOutputStream)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
 at $Proxy13.multi(Unknown Source)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
 at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 This thread has hung for one hour.
 Meanwhile another thread is trying to close the connection:
 IPC Client (1983049639) connection to 
 dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 
 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry 
 [0x4bc0f000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - waiting to lock 0x000754e978a0 (a 
 java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
 at 

[jira] [Commented] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.

2013-12-20 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853952#comment-13853952
 ] 

Jean-Marc Spaggiari commented on HBASE-10214:
-

Hi [~aoxiang], which HBase version did you try with? The trace doesn't seem to 
align with a recent one.

 Regionserver shutdown impropery and leave the dir in .old not delete.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214-94.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.

2013-12-20 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853959#comment-13853959
 ] 

binlijin commented on HBASE-10214:
--

[~jmspaggi], I use 0.94.10; this patch is for the 0.94 version.

 Regionserver shutdown impropery and leave the dir in .old not delete.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214-94.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9346) HBCK should provide an option to check if regions boundaries are the same in META and in stores.

2013-12-20 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853960#comment-13853960
 ] 

Jean-Marc Spaggiari commented on HBASE-9346:


For the >= 0 vs > 0 question, I think we should keep >= 0.

The storesLastKey should almost never be equal to metaLastKey, but there is 
nothing preventing that, so it still can be.

If they are never equal, then >= will not hurt. If they are, then >= will be 
good to have.

I might be wrong ;) But that seems to be correct.

 HBCK should provide an option to check if regions boundaries are the same in 
 META and in stores.
 

 Key: HBASE-9346
 URL: https://issues.apache.org/jira/browse/HBASE-9346
 Project: HBase
  Issue Type: Bug
  Components: hbck, Operability
Affects Versions: 0.94.14, 0.98.1, 0.99.0, 0.96.1.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-9346-v0-0.94.patch, HBASE-9346-v1-trunk.patch, 
 HBASE-9346-v2-trunk.patch, HBASE-9346-v3-trunk.patch, 
 HBASE-9346-v4-trunk.patch, HBASE-9346-v5-trunk.patch, 
 HBASE-9346-v6-trunk.patch, HBASE-9346-v7-trunk.patch, 
 HBASE-9346-v8-trunk.patch


 If META doesn't have the same region boundaries as the store files, writes and 
 reads might go to the wrong place. We need to provide a way to check that 
 within HBCK.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.

2013-12-20 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853967#comment-13853967
 ] 

Jean-Marc Spaggiari commented on HBASE-10214:
-

I'm not able to find the same lines in 0.94.10 either.

Line 880 of HRegionServer is:
{code}
closeWAL(abortRequested ? false : true);
{code}
Line 753 is:
{code}
registerMBean();
{code}

The code for 0.94.10 and 0.94.15 is the same for tryRegionServerReport(), so that 
should not be an issue. But it might be interesting to see what was throwing this 
NPE... Was it hbaseMaster, like in your patch?


 Regionserver shutdown impropery and leave the dir in .old not delete.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214-94.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853979#comment-13853979
 ] 

Hudson commented on HBASE-10173:


FAILURE: Integrated in HBase-0.98 #26 (See 
[https://builds.apache.org/job/HBase-0.98/26/])
HBASE-10173. Need HFile version check in security coprocessors (apurtell: rev 
1552504)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/TestScannersWithLabels.java
* 
/hbase/branches/0.98/hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandlerWithLabels.java


 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. Better to have a version 
 check in VisibilityController: if someone uses this CP with the HFile 
 version still at V2, we should throw an error.
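
 As a rough illustration of the kind of check being proposed (a sketch under
 assumptions, not the committed patch), the coprocessor could read the configured
 HFile format version at startup and refuse to load below V3. "hfile.format.version"
 is the standard configuration key; the class and constant names here are invented.
 {code}
 import org.apache.hadoop.conf.Configuration;

 public class HFileVersionCheckSketch {
   // V3 is the first HFile format version that can carry cell tags.
   private static final int MIN_VERSION_WITH_TAGS = 3;

   /** Hypothetical helper a coprocessor could call from its start() hook. */
   static void checkTagsSupported(Configuration conf) {
     int version = conf.getInt("hfile.format.version", 2);
     if (version < MIN_VERSION_WITH_TAGS) {
       throw new IllegalStateException("Visibility labels need hfile.format.version >= "
           + MIN_VERSION_WITH_TAGS + ", found " + version);
     }
   }
 }
 {code}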



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10138) incorrect or confusing test value is used in block caches

2013-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853978#comment-13853978
 ] 

Hudson commented on HBASE-10138:


FAILURE: Integrated in HBase-0.98 #26 (See 
[https://builds.apache.org/job/HBase-0.98/26/])
HBASE-10138. Incorrect or confusing test value is used in block caches (Sergey 
Shelukhin) (apurtell: rev 1552505)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/bucket/TestBucketCache.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java


 incorrect or confusing test value is used in block caches
 -

 Key: HBASE-10138
 URL: https://issues.apache.org/jira/browse/HBASE-10138
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10138.patch


 DEFAULT_BLOCKSIZE_SMALL is described as:
 {code}
   // Make default block size for StoreFiles 8k while testing.  TODO: FIX!
   // Need to make it 8k for testing.
   public static final int DEFAULT_BLOCKSIZE_SMALL = 8 * 1024;
 {code}
 This value is used on the production path in CacheConfig through HStore/HRegion, 
 and passed to various cache objects.
 We should change it to the actual block size, or, if it is somehow by design, at 
 least clarify it and remove the comment.
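
 For illustration only (not the attached patch), the intended block size is already
 available on the column family descriptor, so the store/cache setup could take it
 from there rather than from the 8 KB test constant:
 {code}
 import org.apache.hadoop.hbase.HColumnDescriptor;

 public class BlockSizeExample {
   public static void main(String[] args) {
     HColumnDescriptor family = new HColumnDescriptor("cf");
     // Defaults to HColumnDescriptor.DEFAULT_BLOCKSIZE (64 KB),
     // not the 8 KB DEFAULT_BLOCKSIZE_SMALL test value.
     int blockSize = family.getBlocksize();
     System.out.println("block size for cf: " + blockSize);
   }
 }
 {code}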



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10207) ZKVisibilityLabelWatcher : Populate the labels cache on startup

2013-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853980#comment-13853980
 ] 

Hudson commented on HBASE-10207:


FAILURE: Integrated in HBase-0.98 #26 (See 
[https://builds.apache.org/job/HBase-0.98/26/])
HBASE-10207 ZKVisibilityLabelWatcher : Populate the labels cache on startup 
(anoopsamjohn: rev 1552489)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/ZKVisibilityLabelWatcher.java


 ZKVisibilityLabelWatcher : Populate the labels cache on startup
 ---

 Key: HBASE-10207
 URL: https://issues.apache.org/jira/browse/HBASE-10207
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10207.patch






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.

2013-12-20 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853984#comment-13853984
 ] 

binlijin commented on HBASE-10214:
--

Oh, sorry, the line numbers don't match the Apache HBase 0.94.10 version; this is 
our own internal version, which is based on Apache HBase 0.94.10.
{code}

long now = System.currentTimeMillis();
if ((now - lastMsg) >= msgInterval) {
  doMetrics();
  tryRegionServerReport();  // 753
  lastMsg = System.currentTimeMillis();
}
if (!this.stopped) this.sleeper.sleep();


  void tryRegionServerReport()
  throws IOException {
HServerLoad hsl = buildServerLoad();
// Why we do this?
this.requestCount.set(0);
try {
  
this.hbaseMaster.regionServerReport(this.serverNameFromMasterPOV.getVersionedBytes(),
 hsl); // line 880
} catch (IOException ioe) {
  if (ioe instanceof RemoteException) {
ioe = ((RemoteException)ioe).unwrapRemoteException();
  }
  if (ioe instanceof YouAreDeadException) {
// This will be caught and handled as a fatal error in run()
throw ioe;
  }
  // Couldn't connect to the master, get location from zk and reconnect
  // Method blocks until new master is found or we are stopped
  getMaster();
}
  }
{code}
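
For discussion, a hypothetical sketch of the kind of guard that would avoid the NPE
(this is not necessarily what HBASE-10214-94.patch does): skip the report when the
master proxy has already been cleared during cluster shutdown. Checking a cluster-up
flag instead is another option, touched on further down in this thread.
{code}
  void tryRegionServerReport() throws IOException {
    HServerLoad hsl = buildServerLoad();
    this.requestCount.set(0);
    // Hypothetical guard: during cluster shutdown the master proxy can be
    // null, so there is nobody to report to and we simply skip the report.
    if (this.hbaseMaster == null) {
      return;
    }
    try {
      this.hbaseMaster.regionServerReport(
          this.serverNameFromMasterPOV.getVersionedBytes(), hsl);
    } catch (IOException ioe) {
      if (ioe instanceof RemoteException) {
        ioe = ((RemoteException) ioe).unwrapRemoteException();
      }
      if (ioe instanceof YouAreDeadException) {
        // This will be caught and handled as a fatal error in run()
        throw ioe;
      }
      // Couldn't connect to the master; block until a new master is found
      getMaster();
    }
  }
{code}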

 Regionserver shutdown impropery and leave the dir in .old not delete.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214-94.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853990#comment-13853990
 ] 

Hadoop QA commented on HBASE-10161:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619800/HBASE-10161_V2.patch
  against trunk revision .
  ATTACHMENT ID: 12619800

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.security.access.TestAccessController

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8243//console

This message is automatically generated.

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch


 AccessController fixes for the issue also affecting VisibilityController 
 described on HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post-recovery upcalls 
 and modify existing CPs to defer initialization to this new hook as needed.
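
 A rough sketch of the deferral pattern the description calls for (assumed shape,
 not the actual HBASE-10161 patch): the postOpen hook bails out while the region is
 still replaying edits, leaving initialization to a later post-recovery upcall. The
 initialize() helper below is hypothetical.
 {code}
 import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
 import org.apache.hadoop.hbase.coprocessor.ObserverContext;
 import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
 import org.apache.hadoop.hbase.regionserver.HRegion;

 public class RecoveryTolerantObserver extends BaseRegionObserver {
   private volatile boolean initialized = false;

   @Override
   public void postOpen(ObserverContext<RegionCoprocessorEnvironment> ctx) {
     HRegion region = ctx.getEnvironment().getRegion();
     if (region.isRecovering()) {
       // Still replaying edits; defer the work to a post-recovery upcall.
       return;
     }
     initialize(region);
     initialized = true;
   }

   // Hypothetical helper standing in for loading ACL or label state.
   private void initialize(HRegion region) {
   }
 }
 {code}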



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10207) ZKVisibilityLabelWatcher : Populate the labels cache on startup

2013-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854006#comment-13854006
 ] 

Hudson commented on HBASE-10207:


FAILURE: Integrated in HBase-TRUNK #4741 (See 
[https://builds.apache.org/job/HBase-TRUNK/4741/])
HBASE-10207 ZKVisibilityLabelWatcher : Populate the labels cache on startup 
(anoopsamjohn: rev 1552488)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/ZKVisibilityLabelWatcher.java


 ZKVisibilityLabelWatcher : Populate the labels cache on startup
 ---

 Key: HBASE-10207
 URL: https://issues.apache.org/jira/browse/HBASE-10207
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10207.patch






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854005#comment-13854005
 ] 

Hudson commented on HBASE-10173:


FAILURE: Integrated in HBase-TRUNK #4741 (See 
[https://builds.apache.org/job/HBase-TRUNK/4741/])
HBASE-10173. Need HFile version check in security coprocessors (apurtell: rev 
1552503)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/TestScannersWithLabels.java
* 
/hbase/trunk/hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandlerWithLabels.java


 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. Better to have a version 
 check in VisibilityController: if someone uses this CP with the HFile 
 version still at V2, we should throw an error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.

2013-12-20 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854013#comment-13854013
 ] 

Jean-Marc Spaggiari commented on HBASE-10214:
-

Ok, makes sense now ;) Thanks for the clarification.

Is there any risk of isClusterUp() returning true while hbaseMaster is null? If 
so, we will still get an NPE, no?


 Regionserver shutdown impropery and leave the dir in .old not delete.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214-94.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9151) HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split.

2013-12-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854023#comment-13854023
 ] 

Hadoop QA commented on HBASE-9151:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619761/HBASE-9151.patch
  against trunk revision .
  ATTACHMENT ID: 12619761

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestRSKilledWhenInitializing
  org.apache.hadoop.hbase.security.access.TestAccessController

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.TestAcidGuarantees.testMixedAtomicity(TestAcidGuarantees.java:351)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8245//console

This message is automatically generated.

 HBCK cannot fix when meta server znode deleted, this can happen if all region 
 servers stopped and there are no logs to split.
 -

 Key: HBASE-9151
 URL: https://issues.apache.org/jira/browse/HBASE-9151
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-9151.patch


 When the meta server znode is deleted and meta is in FAILED_OPEN state, hbck 
 cannot fix it. This scenario can occur when all region servers are stopped by 
 the stop command and no RS is started within 10 secs (with default configurations). 
 {code}
   public void assignMeta() throws KeeperException {
 MetaRegionTracker.deleteMetaLocation(this.watcher);
 assign(HRegionInfo.FIRST_META_REGIONINFO, true);
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10138) incorrect or confusing test value is used in block caches

2013-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854025#comment-13854025
 ] 

Hudson commented on HBASE-10138:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #23 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/23/])
HBASE-10138. Incorrect or confusing test value is used in block caches (Sergey 
Shelukhin) (apurtell: rev 1552505)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/bucket/TestBucketCache.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java


 incorrect or confusing test value is used in block caches
 -

 Key: HBASE-10138
 URL: https://issues.apache.org/jira/browse/HBASE-10138
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10138.patch


 DEFAULT_BLOCKSIZE_SMALL is described as:
 {code}
   // Make default block size for StoreFiles 8k while testing.  TODO: FIX!
   // Need to make it 8k for testing.
   public static final int DEFAULT_BLOCKSIZE_SMALL = 8 * 1024;
 {code}
 This value is used on the production path in CacheConfig through HStore/HRegion, 
 and passed to various cache objects.
 We should change it to the actual block size, or, if it is somehow by design, at 
 least clarify it and remove the comment.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10207) ZKVisibilityLabelWatcher : Populate the labels cache on startup

2013-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854027#comment-13854027
 ] 

Hudson commented on HBASE-10207:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #23 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/23/])
HBASE-10207 ZKVisibilityLabelWatcher : Populate the labels cache on startup 
(anoopsamjohn: rev 1552489)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/ZKVisibilityLabelWatcher.java


 ZKVisibilityLabelWatcher : Populate the labels cache on startup
 ---

 Key: HBASE-10207
 URL: https://issues.apache.org/jira/browse/HBASE-10207
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10207.patch






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854026#comment-13854026
 ] 

Hudson commented on HBASE-10173:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #23 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/23/])
HBASE-10173. Need HFile version check in security coprocessors (apurtell: rev 
1552504)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/TestScannersWithLabels.java
* 
/hbase/branches/0.98/hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandlerWithLabels.java


 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. Better to have a version 
 check in VisibilityController: if someone uses this CP with the HFile 
 version still at V2, we should throw an error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.

2013-12-20 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854055#comment-13854055
 ] 

binlijin commented on HBASE-10214:
--

No, it looks impossible.

 Regionserver shutdown impropery and leave the dir in .old not delete.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214-94.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9151) HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split.

2013-12-20 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854057#comment-13854057
 ] 

rajeshbabu commented on HBASE-9151:
---

The TestRSKilledWhenInitializing test case failure is related to the patch. I will 
fix it and upload a new patch.

 HBCK cannot fix when meta server znode deleted, this can happen if all region 
 servers stopped and there are no logs to split.
 -

 Key: HBASE-9151
 URL: https://issues.apache.org/jira/browse/HBASE-9151
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-9151.patch


 When the meta server znode is deleted and meta is in FAILED_OPEN state, hbck 
 cannot fix it. This scenario can occur when all region servers are stopped by 
 the stop command and no RS is started within 10 secs (with default configurations). 
 {code}
   public void assignMeta() throws KeeperException {
 MetaRegionTracker.deleteMetaLocation(this.watcher);
 assign(HRegionInfo.FIRST_META_REGIONINFO, true);
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10214) Regionserver shutdown impropery and leave the dir in .old not delete.

2013-12-20 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-10214:
-

Attachment: HBASE-10214-94-V2.patch

 Regionserver shutdown impropery and leave the dir in .old not delete.
 -

 Key: HBASE-10214
 URL: https://issues.apache.org/jira/browse/HBASE-10214
 Project: HBase
  Issue Type: Bug
Reporter: binlijin
 Attachments: HBASE-10214-94-V2.patch, HBASE-10214-94.patch


 RegionServer log
 {code}
 2013-12-18 15:17:45,771 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 
 51b27391410efdca841db264df46085f
 2013-12-18 15:17:45,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 null
 2013-12-18 15:17:48,776 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
 shutdown set and not carrying any regions
 2013-12-18 15:17:48,776 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 node,60020,1384410974572: Unhandled exception: null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:753)
 at java.lang.Thread.run(Thread.java:662)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854072#comment-13854072
 ] 

Anoop Sam John commented on HBASE-10173:


https://builds.apache.org/job/PreCommit-HBASE-Build/8241//testReport/org.apache.hadoop.hbase.security.access/TestAccessController/testCellPermissions/
The failure is related to the commit. I can give a small addendum here.

 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. Better to have a version 
 check in VisibilityController: if someone uses this CP with the HFile 
 version still at V2, we should throw an error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HBASE-10216) Change HBase to support local compactions

2013-12-20 Thread David Witten (JIRA)
David Witten created HBASE-10216:


 Summary: Change HBase to support local compactions
 Key: HBASE-10216
 URL: https://issues.apache.org/jira/browse/HBASE-10216
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
 Environment: All
Reporter: David Witten


As I understand it, compactions read data from DFS and write back to DFS.  This 
means that even when the reading occurs on the local host (because the region 
server has a local copy), all the writing must go over the network to the other 
replicas.  This proposal suggests that HBase would perform much better if all 
the reading and writing occurred locally and did not go over the network. 

I propose that the DFS interface be extended to provide a method that would merge 
files so that the merging and deleting can be performed on local data nodes 
with no file contents moving over the network.  The method would take a list of 
paths to be merged and deleted, the merged file path, and an indication of a 
file-format-aware class that would be run on each data node to perform the 
merge.  The merge method provided by this merging class would be passed files 
open for reading for all the files to be merged and one file open for writing.  
The custom class's merge method would read all the input files and 
append to the output file using some standard API that would work across all 
DFS implementations.  The DFS would ensure that the merge had happened properly 
on all replicas before returning to the caller.  Greater resiliency might be 
achieved by implementing the deletion as a separate phase that is only done 
after enough of the replicas had completed the merge. 

HBase would be changed to use the new merge method for compactions, and would 
provide an implementation of the merging class that works with HFiles.

This proposal would require custom code that understands the file format to 
be runnable by the data nodes to manage the merge.  So there would need to be a 
facility to load classes into DFS if there isn't such a facility already.  Or, 
less generally, HDFS could build in support for HFile merging.

The merge method might be optional.  If the DFS implementation did not provide 
it, a generic version that performed the merge on top of the regular DFS 
interfaces would be used.

It may be that this method needs to be tweaked or ignored when the region 
server does not have a local copy of the data so that, as happens currently, one 
copy of the data moves to the region server.
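
To make the proposed contract a bit more concrete, here is a hypothetical Java 
sketch of the extended DFS-side API; every name in it is invented for illustration.
{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

/** Hypothetical file-format-aware merger that each data node would run locally. */
interface LocalFileMerger {
  /** Reads every input stream and appends the merged result to the output. */
  void merge(List<InputStream> inputs, OutputStream output) throws IOException;
}

/** Hypothetical extension to the DFS client interface. */
interface MergingFileSystem {
  /**
   * Merges the given paths into mergedPath on each replica's local data node
   * and then deletes the inputs. Returns only after all replicas have merged;
   * the deletion could be a separate, later phase for extra resiliency.
   */
  void mergeAndDelete(List<String> inputPaths, String mergedPath,
      Class<? extends LocalFileMerger> mergerClass) throws IOException;
}
{code}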




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10173:
---

Attachment: HBASE-10173_Addendum.patch

Addendum to fix the test failure.
[~apurtell], [~ram_krish], what do you guys say?

 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_Addendum.patch, HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. Better to have a version 
 check in VisibilityController: if someone uses this CP with the HFile 
 version still at V2, we should throw an error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10216) Change HBase to support local compactions

2013-12-20 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854117#comment-13854117
 ] 

Liang Xie commented on HBASE-10216:
---

It sounded crazy when I first read it, but yep, it seems reasonable.
It seems to need a lot of work on the HDFS side: you need the corresponding data 
blocks to always be allocated to the same data nodes, and then the proposed merge 
could probably bypass most of the network traffic. The current HDFS code, 
however, gives no guarantee that all of an HFile's underlying data blocks land on 
the same nodes :)

 Change HBase to support local compactions
 -

 Key: HBASE-10216
 URL: https://issues.apache.org/jira/browse/HBASE-10216
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
 Environment: All
Reporter: David Witten

 As I understand it, compactions read data from DFS and write back to DFS.  
 This means that even when the reading occurs on the local host (because the 
 region server has a local copy), all the writing must go over the network to 
 the other replicas.  This proposal suggests that HBase would perform much 
 better if all the reading and writing occurred locally and did not go over 
 the network. 
 I propose that the DFS interface be extended to provide a method that would 
 merge files so that the merging and deleting can be performed on local data 
 nodes with no file contents moving over the network.  The method would take a 
 list of paths to be merged and deleted, the merged file path, and an 
 indication of a file-format-aware class that would be run on each data node 
 to perform the merge.  The merge method provided by this merging class would 
 be passed files open for reading for all the files to be merged and one file 
 open for writing.  The custom class's merge method would read all the 
 input files and append to the output file using some standard API that would 
 work across all DFS implementations.  The DFS would ensure that the merge had 
 happened properly on all replicas before returning to the caller.  Greater 
 resiliency might be achieved by implementing the deletion as a separate phase 
 that is only done after enough of the replicas had completed the merge. 
 HBase would be changed to use the new merge method for compactions, and would 
 provide an implementation of the merging class that works with HFiles.
 This proposal would require custom code that understands the file format to 
 be runnable by the data nodes to manage the merge.  So there would need to be 
 a facility to load classes into DFS if there isn't such a facility already.  
 Or, less generally, HDFS could build in support for HFile merging.
 The merge method might be optional.  If the DFS implementation did not 
 provide it, a generic version that performed the merge on top of the regular 
 DFS interfaces would be used.
 It may be that this method needs to be tweaked or ignored when the region 
 server does not have a local copy of the data so that, as happens currently, 
 one copy of the data moves to the region server.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854118#comment-13854118
 ] 

Andrew Purtell commented on HBASE-10173:


Yep, annoying that this didn't show up locally. 

 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_Addendum.patch, HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. Better to have a version 
 check in VisibilityController: if someone uses this CP with the HFile 
 version still at V2, we should throw an error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10215) TableNotFoundException should be thrown after removing stale znode in ETH

2013-12-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854120#comment-13854120
 ] 

Hadoop QA commented on HBASE-10215:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619812/HBASE-10215.patch
  against trunk revision .
  ATTACHMENT ID: 12619812

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.security.access.TestAccessController

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8246//console

This message is automatically generated.

 TableNotFoundException should be thrown after removing stale znode in ETH
 -

 Key: HBASE-10215
 URL: https://issues.apache.org/jira/browse/HBASE-10215
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.96.1, 0.94.14
Reporter: rajeshbabu
Assignee: rajeshbabu
Priority: Minor
 Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0

 Attachments: HBASE-10215.patch


 Let's suppose the master went down while creating a table; the znode will then 
 be left in ENABLING state, and the master has to recover it on restart, even 
 if there are no meta entries for the table.
 While recovering the table we check whether the table exists in meta; if not, 
 we remove the znode. After removing the znode we need to throw 
 TableNotFoundException. Presently the exception is not thrown, so the znode 
 is recreated and stays stale forever. Even on master restart we cannot delete 
 it, and we cannot create a table with the same name either.
 {code}
   // Check if table exists
   if (!MetaReader.tableExists(catalogTracker, tableName)) {
     // retainAssignment is true only during recovery.  In normal case it is false
     if (!this.skipTableStateCheck) {
       throw new TableNotFoundException(tableName);
     }
     try {
       this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
     } catch (KeeperException e) {
       // TODO : Use HBCK to clear such nodes
       LOG.warn("Failed to delete the ENABLING node for the table " + tableName
           + ". The table will remain unusable. Run HBCK to manually fix the problem.");
     }
   }
 {code}
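
 For illustration, a sketch of the change the description asks for (assumed shape, 
 not necessarily the attached HBASE-10215.patch): throw once the stale znode has 
 been removed, so the caller stops and the znode is not recreated.
 {code}
   try {
     this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
   } catch (KeeperException e) {
     // TODO : Use HBCK to clear such nodes
     LOG.warn("Failed to delete the ENABLING node for the table " + tableName
         + ". The table will remain unusable. Run HBCK to manually fix the problem.");
   }
   // New: fail the handler here so the ENABLING znode is not recreated.
   throw new TableNotFoundException(tableName);
 {code}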
 

[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854121#comment-13854121
 ] 

Andrew Purtell commented on HBASE-10173:


+1 on addendum

 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_Addendum.patch, HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. Better to have a version 
 check in VisibilityController: if someone uses this CP with the HFile 
 version still at V2, we should throw an error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization

2013-12-20 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-10193:
---

Fix Version/s: 0.99.0
   0.96.2
   0.94.15
   0.98.0

 Cleanup HRegion if one of the store fails to open at region initialization
 --

 Key: HBASE-10193
 URL: https://issues.apache.org/jira/browse/HBASE-10193
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.1, 0.94.14
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.98.0, 0.94.15, 0.96.2, 0.99.0

 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, 
 HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, 
 HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, 
 HBASE-10193_v4.patch


 While investigating a different issue, I realized that the fix for HBASE-9737 
 is not sufficient to prevent a resource leak if a region fails to open for some 
 reason, say a corrupt HFile.
 The region may have, by then, opened other good HFiles in that store or in other 
 stores (if it has more than one column family), and their streams may leak if 
 not closed.
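
 A hedged sketch of the cleanup idea (assumed shape, not the attached patches): if 
 any store fails to open during region initialization, close the stores that did 
 open before rethrowing, so their HFile readers and streams are released. 
 instantiateHStore() and the fields used here stand in for however the region 
 actually opens its stores.
 {code}
 List<Store> openedStores = new ArrayList<Store>();
 try {
   for (HColumnDescriptor family : this.htableDescriptor.getFamilies()) {
     openedStores.add(instantiateHStore(family));  // may throw on a corrupt HFile
   }
 } catch (IOException openFailure) {
   for (Store store : openedStores) {
     try {
       store.close();  // release the HFile readers already opened
     } catch (IOException closeFailure) {
       LOG.warn("Failed to close store during cleanup", closeFailure);
     }
   }
   throw openFailure;
 }
 {code}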



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization

2013-12-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854125#comment-13854125
 ] 

ramkrishna.s.vasudevan commented on HBASE-10193:


Committed to 0.96, trunk and 0.94.
Not able to commit to 0.98; it says access denied. [~anoop.hbase], [~apurtell], 
could you please commit to 0.98?

 Cleanup HRegion if one of the store fails to open at region initialization
 --

 Key: HBASE-10193
 URL: https://issues.apache.org/jira/browse/HBASE-10193
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.1, 0.94.14
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.98.0, 0.94.15, 0.96.2, 0.99.0

 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, 
 HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, 
 HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, 
 HBASE-10193_v4.patch


 While investigating a different issue, I realized that the fix for HBASE-9737 
 is not sufficient to prevent resource leak if a region fails to open for some 
 reason, say a corrupt HFile.
 The region may have, by then, opened other good HFiles in that store or other 
 stores if it has more than one column family and their streams may leak if 
 not closed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10206) Explain tags in the hbase book

2013-12-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854128#comment-13854128
 ] 

ramkrishna.s.vasudevan commented on HBASE-10206:


Committed to trunk. Still needs to be committed to 0.98, so leaving this open.
[~anoop.hbase], [~apurtell] - could you please commit this to 0.98?

 Explain tags in the hbase book
 --

 Key: HBASE-10206
 URL: https://issues.apache.org/jira/browse/HBASE-10206
 Project: HBase
  Issue Type: Task
  Components: documentation
Affects Versions: 0.98.0, 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-10206.patch, HBASE-10206.patch






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server

2013-12-20 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854135#comment-13854135
 ] 

Jimmy Xiang commented on HBASE-9721:


bq.  should the RS check both the znode version and data before open the region?
I think I prefer to put the sn (or just the startcode?) in the RPC as this 
patch does since we may do assignment without ZK later on.
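As a rough illustration of that idea (a hedged sketch, not the attached patch): the open-region RPC would carry the ServerName the master intends, and the receiving regionserver would reject a request addressed to a previous incarnation of itself:
{code}
// Hedged sketch: reject an OpenRegion request meant for an older incarnation
// of this server (for example, one with a different startcode).
void checkOpenRegionTarget(ServerName intendedServer, ServerName thisServer) throws IOException {
  if (intendedServer != null && !intendedServer.equals(thisServer)) {
    throw new IOException("OpenRegion request intended for " + intendedServer
        + " but this server is " + thisServer + "; rejecting stale request");
  }
}
{code}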

 RegionServer should not accept regionOpen RPC intended for another(previous) 
 server
 ---

 Key: HBASE-9721
 URL: https://issues.apache.org/jira/browse/HBASE-9721
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, 
 hbase-9721_v2.patch


 On a test cluster, the following events happened with ITBLL and CM, leading 
 to meta being unavailable until the master was restarted. 
 An RS carrying meta died, and master assigned the region to one of the RSs. 
 {code}
 2013-10-03 23:30:06,611 INFO  
 [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] 
 master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to 
 gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
 2013-10-03 23:30:06,611 INFO  
 [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] 
 master.RegionStates: Transitioned {1588230740 state=OFFLINE, 
 ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, 
 ts=1380843006611, 
 server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820}
 2013-10-03 23:30:06,611 DEBUG 
 [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] 
 master.ServerManager: New admin connection to 
 gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
 {code}
 At the same time, the RS that meta had recently been assigned to also died 
 (due to CM) and restarted: 
 {code}
 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] 
 master.ServerManager: REPORT: Server 
 gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came 
 back up, removed it from the dead servers list
 2013-10-03 23:30:08,769 INFO  [RpcServer.handler=18,port=6] 
 master.ServerManager: Triggering server recovery; existingServer 
 gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks 
 stale, new 
 server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362
 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] 
 master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk 
 server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
  
 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820,
  matches=true
 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] 
 master.ServerManager: 
 Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 
 to dead servers, submitted shutdown handler to be executed meta=true
 2013-10-03 23:30:08,771 INFO  [RpcServer.handler=18,port=6] 
 master.ServerManager: Registering 
 server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362
 2013-10-03 23:30:08,772 INFO  
 [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
 handler.MetaServerShutdownHandler: Splitting hbase:meta logs for 
 gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
 {code}
 AM/SSH sees that the RS that died was carrying meta, but the assignment RPC 
 request was still not sent:
 {code}
 2013-10-03 23:30:08,791 DEBUG 
 [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
 master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk 
 server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
  
 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820,
  matches=true
 2013-10-03 23:30:08,791 INFO  
 [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
 handler.MetaServerShutdownHandler: Server 
 gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 was 
 carrying META. Trying to assign.
 2013-10-03 23:30:08,791 DEBUG 
 [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
 master.RegionStates: Offline 1588230740 with current state=PENDING_OPEN, 
 expected state=OFFLINE/SPLITTING/MERGING
 2013-10-03 23:30:08,791 INFO  
 [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] 
 master.RegionStates: Transitioned {1588230740 state=PENDING_OPEN, 
 ts=1380843006611, 
 server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820}
  to {1588230740 state=OFFLINE, ts=1380843008791, server=null}
 

[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854138#comment-13854138
 ] 

ramkrishna.s.vasudevan commented on HBASE-10173:


Sorry, it took me some time to understand. So the ACL region came in after some 
other region had already come up, and so we are moving the check to start(). Correct?


 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_Addendum.patch, HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. It is better to have a version 
 check in VisibilityController: if someone uses this CP with HFile version V2, 
 we should throw an error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Reopened] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John reopened HBASE-10173:



 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_Addendum.patch, HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. It is better to have a version 
 check in VisibilityController: if someone uses this CP with HFile version V2, 
 we should throw an error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10206) Explain tags in the hbase book

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854145#comment-13854145
 ] 

Andrew Purtell commented on HBASE-10206:


On another issue [~stack] mentioned that he copies the current trunk docs to 
branch and commits that just before a RC - did I remember that correctly? Seems 
a fine way to do it for now because the doc for branch is the same as trunk.

 Explain tags in the hbase book
 --

 Key: HBASE-10206
 URL: https://issues.apache.org/jira/browse/HBASE-10206
 Project: HBase
  Issue Type: Task
  Components: documentation
Affects Versions: 0.98.0, 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-10206.patch, HBASE-10206.patch






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854144#comment-13854144
 ] 

Anoop Sam John commented on HBASE-10173:


Not exactly, Ram...
initialize() will be called on the CP object created for the _acl_ region. For 
other regions it is not called, so the boolean is always false :(
Moving this to start() will make sure that the check is done for all regions.
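For illustration, a minimal sketch of doing the check in start() (assuming the HFile.getFormatVersion(conf) helper and that tags need HFile format version 3; the committed patch may differ):
{code}
@Override
public void start(CoprocessorEnvironment env) throws IOException {
  Configuration conf = env.getConfiguration();
  // start() runs for every region's coprocessor instance, unlike the
  // initialize() path discussed above, which only ran for the _acl_ region.
  if (HFile.getFormatVersion(conf) < 3) {
    throw new RuntimeException("VisibilityController requires HFile format version 3 or newer;"
        + " check " + HFile.FORMAT_VERSION_KEY);
  }
  // ... remaining start() initialization ...
}
{code}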

 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_Addendum.patch, HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. It is better to have a version 
 check in VisibilityController: if someone uses this CP with HFile version V2, 
 we should throw an error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HBASE-10173) Need HFile version check in security coprocessors

2013-12-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-10173.


Resolution: Fixed

Committed the addendum to 0.98 and trunk.

 Need HFile version check in security coprocessors
 -

 Key: HBASE-10173
 URL: https://issues.apache.org/jira/browse/HBASE-10173
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.98.0, 0.99.0
Reporter: Anoop Sam John
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10173.patch, 10173.patch, HBASE-10173.patch, 
 HBASE-10173_Addendum.patch, HBASE-10173_partial.patch


 Cell level visibility labels are stored as cell tags, so HFile V3 is the 
 minimum version that can support this feature. It is better to have a version 
 check in VisibilityController: if someone uses this CP with HFile version V2, 
 we should throw an error.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854146#comment-13854146
 ] 

Anoop Sam John commented on HBASE-10161:


Test failure is addressed by the addendum committed to HBASE-10173. 

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch


 AccessController fixes for the issue also affecting VisibilityController 
 described on HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post recovery upcalls 
 and modify existing CPs to defer initialization until this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization

2013-12-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854151#comment-13854151
 ] 

Anoop Sam John commented on HBASE-10193:


Committed to 0.98 branch as well..

 Cleanup HRegion if one of the store fails to open at region initialization
 --

 Key: HBASE-10193
 URL: https://issues.apache.org/jira/browse/HBASE-10193
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.1, 0.94.14
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.98.0, 0.94.15, 0.96.2, 0.99.0

 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, 
 HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, 
 HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, 
 HBASE-10193_v4.patch


 While investigating a different issue, I realized that the fix for HBASE-9737 
 is not sufficient to prevent resource leak if a region fails to open for some 
 reason, say a corrupt HFile.
 The region may have, by then, opened other good HFiles in that store or other 
 stores if it has more than one column family and their streams may leak if 
 not closed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Comment Edited] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization

2013-12-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854151#comment-13854151
 ] 

Anoop Sam John edited comment on HBASE-10193 at 12/20/13 4:55 PM:
--

Committed to 0.98 branch as well..
Thanks for the patch Aditya. Thanks Ted and Ram for the reviews


was (Author: anoop.hbase):
Committed to 0.98 branch as well..

 Cleanup HRegion if one of the store fails to open at region initialization
 --

 Key: HBASE-10193
 URL: https://issues.apache.org/jira/browse/HBASE-10193
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.1, 0.94.14
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.98.0, 0.94.15, 0.96.2, 0.99.0

 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, 
 HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, 
 HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, 
 HBASE-10193_v4.patch


 While investigating a different issue, I realized that the fix for HBASE-9737 
 is not sufficient to prevent resource leak if a region fails to open for some 
 reason, say a corrupt HFile.
 The region may have, by then, opened other good HFiles in that store or other 
 stores if it has more than one column family and their streams may leak if 
 not closed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-7781) Update security unit tests to use a KDC if available

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854153#comment-13854153
 ] 

Andrew Purtell commented on HBASE-7781:
---

Have a look at the utility classes (and tests) under hbase-server src/test in 
org.apache.hadoop.hbase.security. We want helpers that allow a test writer to 
start a mini KDC. I don't think we can depend on Hadoop's mini KDC module 
until it is in a release. Then it would be nice to have an integration test 
that starts up the mini KDC and uses it if running in a minicluster 
configuration.

 Update security unit tests to use a KDC if available
 

 Key: HBASE-7781
 URL: https://issues.apache.org/jira/browse/HBASE-7781
 Project: HBase
  Issue Type: Test
  Components: security, test
Reporter: Gary Helmling
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.98.0


 We currently have large holes in the test coverage of HBase with security 
 enabled.  Two recent examples of bugs which really should have been caught 
 with testing are HBASE-7771 and HBASE-7772.  The long standing problem with 
 testing with security enabled has been the requirement for supporting 
 kerberos infrastructure.
 We need to close this gap and provide some automated testing with security 
 enabled, if necessary standing up and provisioning a temporary KDC as an 
 option for running integration tests, see HADOOP-8078 and HADOOP-9004 where a 
 similar approach was taken.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization

2013-12-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-10193.


   Resolution: Fixed
Fix Version/s: (was: 0.94.15)
   0.94.16
 Hadoop Flags: Reviewed

 Cleanup HRegion if one of the store fails to open at region initialization
 --

 Key: HBASE-10193
 URL: https://issues.apache.org/jira/browse/HBASE-10193
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.1, 0.94.14
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0

 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, 
 HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, 
 HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, 
 HBASE-10193_v4.patch


 While investigating a different issue, I realized that the fix for HBASE-9737 
 is not sufficient to prevent resource leak if a region fails to open for some 
 reason, say a corrupt HFile.
 The region may have, by then, opened other good HFiles in that store or other 
 stores if it has more than one column family and their streams may leak if 
 not closed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10193) Cleanup HRegion if one of the store fails to open at region initialization

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854154#comment-13854154
 ] 

Andrew Purtell commented on HBASE-10193:


Thanks Anoop.

Local permissions problem Ram? Your access in the repo should be fine. I just 
did a svn copy from trunk to create the branch, nothing unusual there.

 Cleanup HRegion if one of the store fails to open at region initialization
 --

 Key: HBASE-10193
 URL: https://issues.apache.org/jira/browse/HBASE-10193
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.1, 0.94.14
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0

 Attachments: HBASE-10193.patch, HBASE-10193_0.94.patch, 
 HBASE-10193_0.94_v2.patch, HBASE-10193_0.94_v3.patch, 
 HBASE-10193_0.94_v4.patch, HBASE-10193_v2.patch, HBASE-10193_v3.patch, 
 HBASE-10193_v4.patch


 While investigating a different issue, I realized that the fix for HBASE-9737 
 is not sufficient to prevent resource leak if a region fails to open for some 
 reason, say a corrupt HFile.
 The region may have, by then, opened other good HFiles in that store or other 
 stores if it has more than one column family and their streams may leak if 
 not closed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854156#comment-13854156
 ] 

Andrew Purtell commented on HBASE-10161:


+1 

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch


 AccessController fixes for the issue also affecting VisibilityController 
 described on HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post recovery upcalls 
 and modify existing CPs to defer initialization until this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854160#comment-13854160
 ] 

Andrew Purtell commented on HBASE-10161:


One question. This part:
{code}
Index: 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java
===
--- 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java
  (revision 1552489)
+++ 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java
  (working copy)
@@ -116,6 +116,8 @@
 Compression.Algorithm.NONE.getName(), true, true, 8 * 1024,
 HConstants.FOREVER, BloomType.NONE.toString(),
 HConstants.REPLICATION_SCOPE_LOCAL));
+
ACL_TABLEDESC.setValue(Bytes.toBytes(HConstants.DISALLOW_WRITES_IN_RECOVERING),
+Bytes.toBytes(true));
   }
{code}

For future use? 

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch


 AccessController fixes for the issue also affecting VisibilityController 
 described on HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post recovery upcalls 
 and modify existing CPs to defer initialization until this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10206) Explain tags in the hbase book

2013-12-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854158#comment-13854158
 ] 

ramkrishna.s.vasudevan commented on HBASE-10206:


Yes, but this doc should come only in 0.98, right? So better we commit it there 
too. But as I said, I am not able to commit to 0.98 due to permission reasons.

 Explain tags in the hbase book
 --

 Key: HBASE-10206
 URL: https://issues.apache.org/jira/browse/HBASE-10206
 Project: HBase
  Issue Type: Task
  Components: documentation
Affects Versions: 0.98.0, 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-10206.patch, HBASE-10206.patch






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9151) HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split.

2013-12-20 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-9151:
--

Attachment: HBASE-9151_v2.patch

Fixed TestRSKilledWhenInitializing in the current patch.

 HBCK cannot fix when meta server znode deleted, this can happen if all region 
 servers stopped and there are no logs to split.
 -

 Key: HBASE-9151
 URL: https://issues.apache.org/jira/browse/HBASE-9151
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-9151.patch, HBASE-9151_v2.patch


 When the meta server znode is deleted and meta is in FAILED_OPEN state, hbck 
 cannot fix it. This scenario can occur when all region servers are stopped by 
 the stop command and no RS is started within 10 secs (with default configurations). 
 {code}
   public void assignMeta() throws KeeperException {
 MetaRegionTracker.deleteMetaLocation(this.watcher);
 assign(HRegionInfo.FIRST_META_REGIONINFO, true);
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-7781) Update security unit tests to use a KDC if available

2013-12-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854166#comment-13854166
 ] 

ramkrishna.s.vasudevan commented on HBASE-7781:
---

bq.Have a look at the utility classes (and tests) under hbase-server src/test 
in org.apache.hadoop.hbase.security
Yes Andy, I am doing exactly that.
I understand that using MiniKDC we can define our own principals and add proper 
KDC configurations for the NN, DN, RS and master; using that, the security tests 
should run.
I hope that in the test cases too, once security is enabled through Kerberos, the 
secure DN and secure NN start running. Let me see.
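For reference, a minimal sketch of standing up Hadoop's MiniKdc in a test (assuming the hadoop-minikdc artifact is available on the test classpath; wiring it into the HBase minicluster is a separate step):
{code}
import java.io.File;
import java.util.Properties;
import org.apache.hadoop.minikdc.MiniKdc;

public class MiniKdcExample {
  public static void main(String[] args) throws Exception {
    Properties conf = MiniKdc.createConf();
    File workDir = new File("target/kdc");
    workDir.mkdirs();
    MiniKdc kdc = new MiniKdc(conf, workDir);
    kdc.start();
    try {
      // Create a keytab for a test principal; tests would point their
      // Kerberos configuration at kdc.getRealm() and this keytab.
      File keytab = new File(workDir, "test.keytab");
      kdc.createPrincipal(keytab, "hbase/localhost");
      System.out.println("KDC up, realm=" + kdc.getRealm());
    } finally {
      kdc.stop();
    }
  }
}
{code}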

 Update security unit tests to use a KDC if available
 

 Key: HBASE-7781
 URL: https://issues.apache.org/jira/browse/HBASE-7781
 Project: HBase
  Issue Type: Test
  Components: security, test
Reporter: Gary Helmling
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.98.0


 We currently have large holes in the test coverage of HBase with security 
 enabled.  Two recent examples of bugs which really should have been caught 
 with testing are HBASE-7771 and HBASE-7772.  The long standing problem with 
 testing with security enabled has been the requirement for supporting 
 kerberos infrastructure.
 We need to close this gap and provide some automated testing with security 
 enabled, if necessary standing up and provisioning a temporary KDC as an 
 option for running integration tests, see HADOOP-8078 and HADOOP-9004 where a 
 similar approach was taken.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9346) HBCK should provide an option to check if regions boundaries are the same in META and in stores.

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854177#comment-13854177
 ] 

Andrew Purtell commented on HBASE-9346:
---

+1 for 0.98

 HBCK should provide an option to check if regions boundaries are the same in 
 META and in stores.
 

 Key: HBASE-9346
 URL: https://issues.apache.org/jira/browse/HBASE-9346
 Project: HBase
  Issue Type: Bug
  Components: hbck, Operability
Affects Versions: 0.94.14, 0.98.1, 0.99.0, 0.96.1.1
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-9346-v0-0.94.patch, HBASE-9346-v1-trunk.patch, 
 HBASE-9346-v2-trunk.patch, HBASE-9346-v3-trunk.patch, 
 HBASE-9346-v4-trunk.patch, HBASE-9346-v5-trunk.patch, 
 HBASE-9346-v6-trunk.patch, HBASE-9346-v7-trunk.patch, 
 HBASE-9346-v8-trunk.patch


 If META doesn't have the same region boundaries as the store files, writes and 
 reads might go to the wrong place. We need to provide a way to check that 
 within HBCK.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10216) Change HBase to support local compactions

2013-12-20 Thread David Witten (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854179#comment-13854179
 ] 

David Witten commented on HBASE-10216:
--

I'm no HDFS expert.  But I had imagined that a data node, D, performing a merge 
would just do the merge with local files, then tell the name node that D has a 
replica for all the data blocks for the merged file.

 Change HBase to support local compactions
 -

 Key: HBASE-10216
 URL: https://issues.apache.org/jira/browse/HBASE-10216
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
 Environment: All
Reporter: David Witten

 As I understand it compactions will read data from DFS and write to DFS.  
 This means that even when the reading occurs on the local host (because 
 region server has a local copy) all the writing must go over the network to 
 the other replicas.  This proposal suggests that HBase would perform much 
 better if all the reading and writing occurred locally and did not go over 
 the network. 
 I propose that the DFS interface be extended to provide a method that would 
 merge files so that the merging and deleting can be performed on local data 
 nodes with no file contents moving over the network.  The method would take a 
 list of paths to be merged and deleted and the merged file path and an 
 indication of a file-format-aware class that would be run on each data node 
 to perform the merge.  The merge method provided by this merging class would 
 be passed files open for reading for all the files to be merged and one file 
 open for writing.  The custom class provided merge method would read all the 
 input files and append to the output file using some standard API that would 
 work across all DFS implementations.  The DFS would ensure that the merge had 
 happened properly on all replicas before returning to the caller.  It could 
 be that greater resiliency could be achieved by implementing the deletion as 
 a separate phase that is only done after enough of the replicas had completed 
 the merge. 
 HBase would be changed to use the new merge method for compactions, and would 
 provide an implementation of the merging class that works with HFiles.
 This proposal would require custom code that understands the file format to 
 be runnable by the data nodes to manage the merge.  So there would need to be 
 a facility to load classes into DFS if there isn't such a facility already.  
 Or, less generally, HDFS could build in support for HFile merging.
 The merge method might be optional.  If the DFS implementation did not 
 provide it a generic version that performed the merge on top of the regular 
 DFS interfaces would be used.
 It may be that this method needs to be tweaked or ignored when the region 
 server does not have a local copy of the data so that, as happens currently, one 
 copy of the data moves to the region server.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854185#comment-13854185
 ] 

Anoop Sam John commented on HBASE-10161:


We have the initialized boolean checks now, so I think I can remove this. Fine 
with that, Andy?

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch


 AccessController fixes for the issue also affecting VisibilityController 
 described on HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post recovery upcalls 
 and modify existing CPs to defer initialization until this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10213) Add read log size per second metrics for replication source

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854189#comment-13854189
 ] 

Andrew Purtell commented on HBASE-10213:


To get a good HadoopQA result, it will need a patch against trunk. 

{code}
index 3831bba..8315c3a 100644
--- 
src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
+++ 
src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
@@ -458,6 +458,7 @@ public class ReplicationSource extends Thread
   throws IOException{
 long seenEntries = 0;
 this.repLogReader.seek();
+long persitionBeforeRead = this.repLogReader.getPosition();
 HLog.Entry entry =
 this.repLogReader.readNextAndSetPosition();
 while (entry != null) {
{code}

persitionBeforeRead should be positionBeforeRead.

{code}
index da0905c..e32a3bc 100644
--- 
src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceMetrics.java
+++ 
src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceMetrics.java
@@ -66,6 +66,9 @@ public class ReplicationSourceMetrics implements Updater {
*/
   public final MetricsIntValue sizeOfLogQueue =
     new MetricsIntValue("sizeOfLogQueue", registry);
+
+  /** Rate of log entries read by the source */
+  public MetricsRate logReadRateInByte =
+    new MetricsRate("logReadRateInByte", registry);
{code}

The usual convention for names with units is to pluralize the unit, so 
logReadRateInBytes
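
With both comments applied, the relevant lines might look roughly like this (a sketch only; the increment call on the rate metric is assumed from the quoted patch, not verified):

{code}
// Hedged sketch using the suggested names.
long positionBeforeRead = this.repLogReader.getPosition();
// ... read entries via this.repLogReader.readNextAndSetPosition() ...
long bytesRead = this.repLogReader.getPosition() - positionBeforeRead;
this.metrics.logReadRateInBytes.inc((int) bytesRead);  // metric renamed to pluralize the unit
{code}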

 Add read log size per second metrics for replication source
 ---

 Key: HBASE-10213
 URL: https://issues.apache.org/jira/browse/HBASE-10213
 Project: HBase
  Issue Type: Improvement
  Components: metrics, Replication
Affects Versions: 0.94.14
Reporter: cuijianwei
Assignee: cuijianwei
Priority: Minor
 Attachments: HBASE-10213-0.94-v1.patch


 The current metrics of replication source contain logEditsReadRate, 
 shippedBatchesRate, etc, which could indicate how fast the data replicated to 
 peer cluster to some extent. However, it is not clear enough to know how many 
 bytes replicating to peer cluster from these metrics. In production 
 environment, it may be important to know the size of replicating data per 
 second because the services may be affected if the network become busy.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854191#comment-13854191
 ] 

Andrew Purtell commented on HBASE-10161:


bq.  I think I can remove this. Fine on that Andy?

Sure

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch


 AccessController fixes for the issue also affecting VisibilityController 
 described on HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post recovery upcalls 
 and modify existing CPs to defer initialization until this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854198#comment-13854198
 ] 

Anoop Sam John commented on HBASE-10161:


V3 avoids the change in AccessControlLists. Going to commit now.

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch, 
 HBASE-10161_V3.patch


 AccessController fixes for the issue also affecting VisibilityController 
 described on HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post recovery upcalls 
 and modify existing CPs to defer initialization until this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10161:
---

Attachment: HBASE-10161_V3.patch

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch, 
 HBASE-10161_V3.patch


 AccessController fixes for the issue also affecting VisibilityController 
 described on HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post recovery upcalls 
 and modify existing CPs to defer initialization until this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854206#comment-13854206
 ] 

Anoop Sam John commented on HBASE-10161:


Ping [~stack]. This is required in the 0.96 branch also. Please +1.

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch, 
 HBASE-10161_V3.patch


 AccessController fixes for the issue also affecting VisibilityController 
 described on HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post recovery upcalls 
 and modify existing CPs to defer initialization until this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10047) postScannerFilterRow consumes a lot of CPU in tall table scans

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854211#comment-13854211
 ] 

Andrew Purtell commented on HBASE-10047:


The set of installed coprocessors can change at runtime, concurrently with 
iteration of the list.

 postScannerFilterRow consumes a lot of CPU in tall table scans
 --

 Key: HBASE-10047
 URL: https://issues.apache.org/jira/browse/HBASE-10047
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
 Attachments: 10047-0.94-sample-v2.txt, 10047-0.94-sample.txt, 
 postScannerFilterRow.png


 Continuing my profiling quest, I find that in scanning tall table (and 
 filtering everything on the server) a quarter of the time is now spent in the 
 postScannerFilterRow coprocessor hook.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10206) Explain tags in the hbase book

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854221#comment-13854221
 ] 

Andrew Purtell commented on HBASE-10206:


Right, so when prepping the RC I can copy the entire manual over from trunk, we 
don't have to bring commits to the manual to the branch piece by piece.

 Explain tags in the hbase book
 --

 Key: HBASE-10206
 URL: https://issues.apache.org/jira/browse/HBASE-10206
 Project: HBase
  Issue Type: Task
  Components: documentation
Affects Versions: 0.98.0, 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-10206.patch, HBASE-10206.patch






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10216) Change HBase to support local compactions

2013-12-20 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854261#comment-13854261
 ] 

haosdent commented on HBASE-10216:
--

I don't think local compaction is feasible. HDFS stores HFiles as many blocks, 
and these blocks have a fixed size. Providing a method to merge files in HDFS 
may not bring an outstanding improvement. In other words, HDFS local reads may 
be enough for this.

 Change HBase to support local compactions
 -

 Key: HBASE-10216
 URL: https://issues.apache.org/jira/browse/HBASE-10216
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
 Environment: All
Reporter: David Witten

 As I understand it compactions will read data from DFS and write to DFS.  
 This means that even when the reading occurs on the local host (because 
 region server has a local copy) all the writing must go over the network to 
 the other replicas.  This proposal suggests that HBase would perform much 
 better if all the reading and writing occurred locally and did not go over 
 the network. 
 I propose that the DFS interface be extended to provide a method that would 
 merge files so that the merging and deleting can be performed on local data 
 nodes with no file contents moving over the network.  The method would take a 
 list of paths to be merged and deleted and the merged file path and an 
 indication of a file-format-aware class that would be run on each data node 
 to perform the merge.  The merge method provided by this merging class would 
 be passed files open for reading for all the files to be merged and one file 
 open for writing.  The custom class provided merge method would read all the 
 input files and append to the output file using some standard API that would 
 work across all DFS implementations.  The DFS would ensure that the merge had 
 happened properly on all replicas before returning to the caller.  It could 
 be that greater resiliency could be achieved by implementing the deletion as 
 a separate phase that is only done after enough of the replicas had completed 
 the merge. 
 HBase would be changed to use the new merge method for compactions, and would 
 provide an implementation of the merging class that works with HFiles.
 This proposal would require custom code that understands the file format to 
 be runnable by the data nodes to manage the merge.  So there would need to be 
 a facility to load classes into DFS if there isn't such a facility already.  
 Or, less generally, HDFS could build in support for HFile merging.
 The merge method might be optional.  If the DFS implementation did not 
 provide it a generic version that performed the merge on top of the regular 
 DFS interfaces would be used.
 It may be that this method needs to be tweaked or ignored when the region 
 server does not have a local copy of the data so that, as happens currently, one 
 copy of the data moves to the region server.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10161) [AccessController] Tolerate regions in recovery

2013-12-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854311#comment-13854311
 ] 

Anoop Sam John commented on HBASE-10161:


Committed to 0.98 and Trunk. Will add to 0.96 as well once Stack gives a go.

 [AccessController] Tolerate regions in recovery
 ---

 Key: HBASE-10161
 URL: https://issues.apache.org/jira/browse/HBASE-10161
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Andrew Purtell
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10161.patch, HBASE-10161_V2.patch, 
 HBASE-10161_V3.patch


 AccessController fixes for the issue also affecting VisibilityController 
 described on HBASE-10148. Coprocessors that initialize in postOpen upcalls 
 must check if the region is still in recovery and defer initialization until 
 recovery is complete. We need to add a new CP hook for post recovery upcalls 
 and modify existing CPs to defer initialization until this new hook as needed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-9648) collection one expired storefile causes it to be replaced by another expired storefile

2013-12-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854374#comment-13854374
 ] 

Sergey Shelukhin commented on HBASE-9648:
-

Just clarifying: why is it hard to create the writer when it is needed, i.e. for 
the case when there are seemingly no KVs when you were creating the writer? I 
think coprocs cannot screw up the seqIds, because the set of files is already 
chosen, so that should be ok.

 collection one expired storefile causes it to be replaced by another expired 
 storefile
 --

 Key: HBASE-9648
 URL: https://issues.apache.org/jira/browse/HBASE-9648
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-9648-v0-0.94.patch, HBASE-9648-v0-trunk.patch, 
 HBASE-9648-v1-trunk.patch, HBASE-9648-v2-trunk.patch, 
 HBASE-9648-v3-trunk.patch, HBASE-9648.patch


 There's a shortcut in compaction selection that selects expired store files so 
 they can be quickly deleted.
 However, there's also code that ensures we write at least one file to preserve 
 the seqnum. This new empty file is itself expired, presumably because it has no 
 data.
 So it's collected again, and so on.
 This affects 0.94, and probably also 0.96.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10175) 2-thread ChaosMonkey steps on its own toes

2013-12-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854375#comment-13854375
 ] 

Sergey Shelukhin commented on HBASE-10175:
--

I don't think the test failure can be related. [~enis], do you want to review?

 2-thread ChaosMonkey steps on its own toes
 --

 Key: HBASE-10175
 URL: https://issues.apache.org/jira/browse/HBASE-10175
 Project: HBase
  Issue Type: Improvement
  Components: test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HBASE-10175.patch


 ChaosMonkey with one destructive thread and one volatility 
 (flush-compact-split-etc.) thread steps on its own toes and logs a lot of 
 exceptions.
 A simple solution would be to catch most (or all) of these, like 
 NotServingRegionException, and log less (not a full callstack, for example; 
 it's not very useful anyway).
 A more complicated/complementary one would be to keep track of which regions 
 the destructive thread affects and use other regions for the volatile one.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8558) Add timeout limit for HBaseClient dataOutputStream

2013-12-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854397#comment-13854397
 ] 

Lars Hofhansl commented on HBASE-8558:
--

So the issue is: while we were writing something, a RegionServer went down and 
we sit there forever waiting?
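
If so, the write has no bound today. One minimal sketch of bounding it (assuming Hadoop's NetUtils.getOutputStream(Socket, long) overload; the attached HBASE-8558-0.94.txt may take a different approach) would be to give the output stream a write timeout when the connection is set up:

{code}
// Hedged sketch: a timed output stream makes the flush() in sendParam() fail
// instead of blocking forever when the regionserver is dead.
this.out = new DataOutputStream(new BufferedOutputStream(
    NetUtils.getOutputStream(socket, this.pingInterval)));  // reusing pingInterval as the write timeout is an assumption
{code}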

 Add timeout limit for HBaseClient dataOutputStream
 --

 Key: HBASE-8558
 URL: https://issues.apache.org/jira/browse/HBASE-8558
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.5, 0.94.14
Reporter: wanbin
Assignee: Liang Xie
 Attachments: HBASE-8558-0.94.txt


 I run jstack at client host. The result is below.
 hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 
 nid=0x5173 runnable [0x579cc000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 0x000758cb0780 (a sun.nio.ch.Util$2)
 - locked 0x000758cb0770 (a 
 java.util.Collections$UnmodifiableSet)
 - locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - locked 0x000754e978a0 (a java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
 - locked 0x000754e97880 (a java.io.DataOutputStream)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
 at $Proxy13.multi(Unknown Source)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
 at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 This thread has hung for one hour.
 Meanwhile another thread tries to close the connection:
 IPC Client (1983049639) connection to 
 dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 
 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry 
 [0x4bc0f000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - waiting to lock 0x000754e978a0 (a 
 java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
 at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
 at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.close(HBaseClient.java:715)
 - locked 0x000754e7b818 (a 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:587)
 dump002030.cm6.tbsite.net is the dead regionserver.
 Reading the HBase source code, I discovered that connection.out does not set a timeout:
 this.out = new DataOutputStream
 (new BufferedOutputStream(NetUtils.getOutputStream(socket)));
 I see this means epoll_wait will block indefinitely. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

[jira] [Commented] (HBASE-10047) postScannerFilterRow consumes a lot of CPU in tall table scans

2013-12-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854409#comment-13854409
 ] 

Lars Hofhansl commented on HBASE-10047:
---

Interesting, didn't realize that can happen. After the region is loaded? Ohh, 
when we detect an error we remove the coprocessor.
SortedCopyOnWriteSet should have been a hint too :)

The first patch is still valid, since we're only removing after the region was 
loaded.
I didn't measure any perf improvement with v2 anyway, it seems instanceof is 
not the issue.


 postScannerFilterRow consumes a lot of CPU in tall table scans
 --

 Key: HBASE-10047
 URL: https://issues.apache.org/jira/browse/HBASE-10047
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
 Attachments: 10047-0.94-sample-v2.txt, 10047-0.94-sample.txt, 
 postScannerFilterRow.png


 Continuing my profiling quest, I find that in scanning tall table (and 
 filtering everything on the server) a quarter of the time is now spent in the 
 postScannerFilterRow coprocessor hook.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10216) Change HBase to support local compactions

2013-12-20 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854414#comment-13854414
 ] 

Vladimir Rodionov commented on HBASE-10216:
---

This should be opened as an HDFS ticket: provide an API to *register* a new 
file with a given path and block locations. This would benefit HDFS copy a lot 
as well - blocks could be copied locally, and the new file would be created 
with just one HDFS API call, registerFile(Path path, BlockLocation[] locations). 
Compaction would be performed (mostly) locally, and the coordinator of the 
compaction would call *registerFile(Path path, BlockLocation[] locations)* when 
all involved nodes have finished.
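
For concreteness, the shape of the proposed API might look roughly like this 
(entirely hypothetical; no such method exists in HDFS today):
{code:java}
// Hypothetical HDFS-side API sketched from the comment above; not an existing interface.
import java.io.IOException;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.Path;

interface BlockRegistrationApi {
  /**
   * Register blocks that were already written locally on each datanode as a new
   * file at the given path, so no file contents have to cross the network.
   */
  void registerFile(Path path, BlockLocation[] locations) throws IOException;
}
{code}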

 

 Change HBase to support local compactions
 -

 Key: HBASE-10216
 URL: https://issues.apache.org/jira/browse/HBASE-10216
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
 Environment: All
Reporter: David Witten

 As I understand it, compactions read data from DFS and write to DFS. This 
 means that even when the reading occurs on the local host (because the 
 region server has a local copy), all the writing must go over the network to 
 the other replicas. This proposal suggests that HBase would perform much 
 better if all the reading and writing occurred locally and did not go over 
 the network. 
 I propose that the DFS interface be extended to provide a method that would 
 merge files, so that the merging and deleting can be performed on local data 
 nodes with no file contents moving over the network. The method would take a 
 list of paths to be merged and deleted, the merged file path, and an 
 indication of a file-format-aware class that would be run on each data node 
 to perform the merge. The merge method provided by this merging class would 
 be passed files open for reading (for all the files to be merged) and one 
 file open for writing. The merge method of the custom class would read all 
 the input files and append to the output file using some standard API that 
 would work across all DFS implementations. The DFS would ensure that the 
 merge had happened properly on all replicas before returning to the caller. 
 Greater resiliency might be achieved by implementing the deletion as a 
 separate phase that is only done after enough of the replicas have completed 
 the merge. 
 HBase would be changed to use the new merge method for compactions, and would 
 provide an implementation of the merging class that works with HFiles.
 This proposal would require custom code that understands the file format to 
 be runnable by the data nodes to manage the merge, so there would need to be 
 a facility to load classes into DFS if there isn't such a facility already. 
 Or, less generally, HDFS could build in support for HFile merging.
 The merge method might be optional. If the DFS implementation did not 
 provide it, a generic version that performed the merge on top of the regular 
 DFS interfaces would be used.
 It may be that this method needs to be tweaked or ignored when the region 
 server does not have a local copy of the data, so that, as happens currently, 
 one copy of the data moves to the region server.
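 To make the shape of this proposal concrete, a hypothetical sketch of the 
 merge hook described above (all names are invented for illustration; nothing 
 like this exists in HDFS):
 {code:java}
 // Hypothetical interfaces sketched from the proposal above; not an existing DFS API.
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
 import java.util.List;
 import org.apache.hadoop.fs.Path;

 /** File-format-aware merger that would run on each data node holding replicas. */
 interface FileMerger {
   /** Read all inputs and append the merged result to the single output stream. */
   void merge(List<InputStream> inputs, OutputStream output) throws IOException;
 }

 /** DFS extension: merge and delete files without moving their contents off-node. */
 interface MergingFileSystem {
   /**
    * Merge the input paths into the output path using mergerClass on the data
    * nodes, then delete the inputs once enough replicas have completed the merge.
    */
   void mergeFiles(List<Path> inputs, Path output,
                   Class<? extends FileMerger> mergerClass) throws IOException;
 }
 {code}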



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-10095) Selective WALEdit encryption

2013-12-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10095:
---

Affects Version/s: (was: 0.98.0)
   0.99.0
Fix Version/s: (was: 0.98.0)

I've spent some time looking at how to accomplish this. We have implemented 
WALEdit encryption using a WALCellCodec, which is necessary because WALEdits 
are stratified by rows, not columns, so if we do this selectively, some cells 
in a WALEdit will be encrypted and some not. In the WALCellCodec context, we 
only have information about the cell; we can't get a reference to anything 
that will lead to family information.

Replication provides an existing example of how to do family-specific WALEdit 
modification: it modifies WALEdits by adding a WALActionsListener at a high 
level where it has access to the server, and the WALEdit type already has 
fields for carrying scope information. We could do something similar here: add 
a field to WALEdit indicating whether it should be encrypted and register a 
listener (up in HStore?) that sets it accordingly, but this is not enough 
because WALCellCodecs only see Cells, not the WALEdit that contains them.

I have experimented with a few interface changes and am not happy with any of 
the results so far, so I am going to move this out.
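
A toy sketch of the listener approach described above (the types and the 
encrypt flag are invented purely for illustration and are not actual HBase 
classes): a listener that knows the table schema marks an edit touching an 
encrypted family, which is exactly the information a WALCellCodec cannot see 
today.
{code:java}
// Illustrative-only types; not actual HBase classes.
import java.util.Set;

class HypotheticalWalEdit {
  final Set<String> families;   // column families touched by this edit
  boolean encrypt;              // marker a cell codec would need to consult
  HypotheticalWalEdit(Set<String> families) { this.families = families; }
}

class EncryptionScopeListener {
  private final Set<String> encryptedFamilies; // families with HBASE-7544 encryption configured

  EncryptionScopeListener(Set<String> encryptedFamilies) {
    this.encryptedFamilies = encryptedFamilies;
  }

  /** Hypothetical hook, analogous to a WALActionsListener callback. */
  void beforeWalAppend(HypotheticalWalEdit edit) {
    for (String family : edit.families) {
      if (encryptedFamilies.contains(family)) {
        edit.encrypt = true;  // the codec would still need a way to see this flag
        return;
      }
    }
  }
}
{code}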

 Selective WALEdit encryption
 

 Key: HBASE-10095
 URL: https://issues.apache.org/jira/browse/HBASE-10095
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell

 The SecureWALProtobufWriter currently will encrypt every WAL entry if WAL 
 encryption is enabled. However, SecureWALProtobufReader can distinguish 
 between encrypted and unencrypted entries, and we encrypt every entry 
 individually in part because the reader can skip and seek around during split 
 and recovery, but also in part to enable selective encryption of WALedits. We 
 should consider encrypting only the WALedits of column families for which 
 HBASE-7544 features are configured. If few column families are encrypted 
 relative to all CFs on the cluster, the performance difference will be 
 significant.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Comment Edited] (HBASE-10095) Selective WALEdit encryption

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854422#comment-13854422
 ] 

Andrew Purtell edited comment on HBASE-10095 at 12/20/13 7:02 PM:
--

I've spent some time looking at how to accomplish this. We have implemented 
WALEdit encryption using a WALCellCodec, which is necessary because WALEdits 
are stratified by rows, not columns, so some cells in a WALEdit will be 
encrypted and some not if we are selectively doing this. In the WALCellCodec 
context, we only have information about the cell, we can't get a reference to 
anything that will lead to family information.

Replication provides an existing example of how to do family-specific WALEdit 
modification. Replication modifies WALEdits by adding a WALActionsListener at a 
high level where it has access to the server. The WALEdit type already has 
fields for carrying scope information. We could do something similar here: We 
could add a field to WALEdit indicating which cells, for which families within 
it, should be encrypted, and register a listener (up in HStore?) that sets it 
accordingly, but this is not enough because WALCellCodecs only see Cells, not 
the WALEdit that contains them.

I have experimented with a few interface changes and am not happy with any of 
the results so far. So I am going to move this out.


was (Author: apurtell):
I've spent some time looking at how to accomplish this. We have implemented 
WALEdit encryption using a WALCellCodec, which is necessary because WALEdits 
are stratified by rows, not columns, so some cells in a WALEdit will be 
encrypted and some not if we are selectively doing this. In the WALCellCodec 
context, we only have information about the cell, we can't get a reference to 
anything that will lead to family information.

Replication provides an existing example of how to do family-specific WALEdit 
modification. Replication modifies WALEdits by adding a WALActionsListener at a 
high level where it has access to the server. The WALEdit type already has 
fields for carrying scope information. We could do something similar here: We 
could add a field to WALEdit indicating if it should be encrypted or not and 
register a listener (up in HStore?) that sets it accordingly, but this is not 
enough because WALCellCodecs only see Cells, not the WALEdit that contains them.

I have experimented with a few interface changes and am not happy with any of 
the results so far. So I am going to move this out.

 Selective WALEdit encryption
 

 Key: HBASE-10095
 URL: https://issues.apache.org/jira/browse/HBASE-10095
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell

 The SecureWALProtobufWriter currently will encrypt every WAL entry if WAL 
 encryption is enabled. However, SecureWALProtobufReader can distinguish 
 between encrypted and unencrypted entries, and we encrypt every entry 
 individually in part because the reader can skip and seek around during split 
 and recovery, but also in part to enable selective encryption of WALedits. We 
 should consider encrypting only the WALedits of column families for which 
 HBASE-7544 features are configured. If few column families are encrypted 
 relative to all CFs on the cluster, the performance difference will be 
 significant.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Comment Edited] (HBASE-10095) Selective WALEdit encryption

2013-12-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854422#comment-13854422
 ] 

Andrew Purtell edited comment on HBASE-10095 at 12/20/13 7:04 PM:
--

I've spent some time looking at how to accomplish this. We have implemented 
WALEdit encryption using a WALCellCodec, which is necessary because WALEdits 
are stratified by rows, not columns, so some cells in a WALEdit will be 
encrypted and some not if we are selectively doing this. In the WALCellCodec 
context, we only have information about the cell, we can't get a reference to 
anything that will lead to family information.

Replication provides an existing example of how to do family-specific WALEdit 
modification. Replication modifies WALEdits by adding a WALActionsListener at a 
high level where it has access to the server. The WALEdit type already has 
fields for carrying scope information. We could do something similar here: We 
could add a field to WALEdit indicating which cells, for which families within 
it, should be encrypted, and register a listener (up in HStore?) that sets it 
accordingly, but that would still not be quite enough because WALCellCodecs 
only see Cells, not the WALEdit that contains them.

I have experimented with a few interface changes and am not happy with any of 
the results so far. So I am going to move this out.


was (Author: apurtell):
I've spent some time looking at how to accomplish this. We have implemented 
WALEdit encryption using a WALCellCodec, which is necessary because WALEdits 
are stratified by rows, not columns, so some cells in a WALEdit will be 
encrypted and some not if we are selectively doing this. In the WALCellCodec 
context, we only have information about the cell, we can't get a reference to 
anything that will lead to family information.

Replication provides an existing example of how to do family-specific WALEdit 
modification. Replication modifies WALEdits by adding a WALActionsListener at a 
high level where it has access to the server. The WALEdit type already has 
fields for carrying scope information. We could do something similar here: We 
could add a field to WALEdit indicating which cells, for which families within 
it, should be encrypted, and register a listener (up in HStore?) that sets it 
accordingly, but this is not enough because WALCellCodecs only see Cells, not 
the WALEdit that contains them.

I have experimented with a few interface changes and am not happy with any of 
the results so far. So I am going to move this out.

 Selective WALEdit encryption
 

 Key: HBASE-10095
 URL: https://issues.apache.org/jira/browse/HBASE-10095
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell

 The SecureWALProtobufWriter currently will encrypt every WAL entry if WAL 
 encryption is enabled. However, SecureWALProtobufReader can distinguish 
 between encrypted and unencrypted entries, and we encrypt every entry 
 individually in part because the reader can skip and seek around during split 
 and recovery, but also in part to enable selective encryption of WALedits. We 
 should consider encrypting only the WALedits of column families for which 
 HBASE-7544 features are configured. If few column families are encrypted 
 relative to all CFs on the cluster, the performance difference will be 
 significant.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error

2013-12-20 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854434#comment-13854434
 ] 

Jimmy Xiang commented on HBASE-10210:
-

Looks like when the master starts up, we don't put the regionservers found in 
ZK into the online server list. Check RegionServerTracker#start. Will fixing 
this fix the issue?

 during master startup, RS can be you-are-dead-ed by master in error
 ---

 Key: HBASE-10210
 URL: https://issues.apache.org/jira/browse/HBASE-10210
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-10210.patch


 Not sure of the root cause yet; I am at the how-did-this-ever-work stage.
 We see this problem in 0.96.1, but didn't in 0.96.0 + some patches.
 It looks like RS information arriving from two sources - ZK and the server 
 itself - can conflict. The master doesn't handle such cases (timestamp match), 
 and anyway, technically timestamps can collide for two separate servers.
 So the master YouAreDead-s the already-recorded reporting RS, and adds it too. 
 Then it discovers that the new server has died with a fatal error!
 Note the threads.
 Addition is called from master initialization and from RPC.
 {noformat}
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: 
 Finished waiting for region servers count to settle; checked in 2, slept for 
 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running.
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: 
 Registering 
 server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered 
 server found up in zk but who has not yet reported in: 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=6] 
 master.ServerManager: Triggering server recovery; existingServer 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
 looks stale, new 
 server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=6] 
 master.ServerManager: Master doesn't enable ServerShutdownHandler during 
 initialization, delay expiring server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 ...
 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] 
 master.HMaster: Region server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
 reported a fatal error:
 ABORTING region server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: 
 org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; 
 currently processing 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as 
 dead server
 {noformat}
 Presumably some of the recent ZK listener related changes b



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error

2013-12-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854445#comment-13854445
 ] 

Sergey Shelukhin commented on HBASE-10210:
--

You mean the online servers in the tracker? It does add them to its internal 
list. Can you elaborate a bit?
If they are put into the other online-servers list, wouldn't that make the 
issue worse? As far as I can see in the check...AndAdd method and around it, 
there's no provision for one server to be added twice; if it was already there, 
the same issue will happen: report rejected.

 during master startup, RS can be you-are-dead-ed by master in error
 ---

 Key: HBASE-10210
 URL: https://issues.apache.org/jira/browse/HBASE-10210
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-10210.patch


 Not sure of the root cause yet; I am at the how-did-this-ever-work stage.
 We see this problem in 0.96.1, but didn't in 0.96.0 + some patches.
 It looks like RS information arriving from two sources - ZK and the server 
 itself - can conflict. The master doesn't handle such cases (timestamp match), 
 and anyway, technically timestamps can collide for two separate servers.
 So the master YouAreDead-s the already-recorded reporting RS, and adds it too. 
 Then it discovers that the new server has died with a fatal error!
 Note the threads.
 Addition is called from master initialization and from RPC.
 {noformat}
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: 
 Finished waiting for region servers count to settle; checked in 2, slept for 
 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running.
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: 
 Registering 
 server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered 
 server found up in zk but who has not yet reported in: 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=6] 
 master.ServerManager: Triggering server recovery; existingServer 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
 looks stale, new 
 server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=6] 
 master.ServerManager: Master doesn't enable ServerShutdownHandler during 
 initialization, delay expiring server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 ...
 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] 
 master.HMaster: Region server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
 reported a fatal error:
 ABORTING region server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: 
 org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; 
 currently processing 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as 
 dead server
 {noformat}
 Presumably some of the recent ZK listener related changes b



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Comment Edited] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error

2013-12-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854445#comment-13854445
 ] 

Sergey Shelukhin edited comment on HBASE-10210 at 12/20/13 7:15 PM:


You mean the online servers in the tracker? It does add them to its internal 
list. Can you elaborate a bit?
If they are put into the other online-servers list, wouldn't that make the 
issue worse? As far as I can see in the check...AndAdd method and around it, 
there's no provision for one server to be added twice; if it was already there, 
the same issue will happen: it will expire the old one (from ZK), then get the 
report rejected.


was (Author: sershe):
You mean the online servers in the tracker? It does add them to its internal 
list. Can you elaborate a bit.
If they are put into other online servers, wouldn't it make the issue worse - 
as far as I see in the check...AndAdd method and around ,there's no provision 
for one server to be added twice, if it was already there the same issue will 
happen, report rejected.

 during master startup, RS can be you-are-dead-ed by master in error
 ---

 Key: HBASE-10210
 URL: https://issues.apache.org/jira/browse/HBASE-10210
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-10210.patch


 Not sure of the root cause yet; I am at the how-did-this-ever-work stage.
 We see this problem in 0.96.1, but didn't in 0.96.0 + some patches.
 It looks like RS information arriving from two sources - ZK and the server 
 itself - can conflict. The master doesn't handle such cases (timestamp match), 
 and anyway, technically timestamps can collide for two separate servers.
 So the master YouAreDead-s the already-recorded reporting RS, and adds it too. 
 Then it discovers that the new server has died with a fatal error!
 Note the threads.
 Addition is called from master initialization and from RPC.
 {noformat}
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: 
 Finished waiting for region servers count to settle; checked in 2, slept for 
 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running.
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: 
 Registering 
 server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered 
 server found up in zk but who has not yet reported in: 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=6] 
 master.ServerManager: Triggering server recovery; existingServer 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
 looks stale, new 
 server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=6] 
 master.ServerManager: Master doesn't enable ServerShutdownHandler during 
 initialization, delay expiring server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 ...
 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] 
 master.HMaster: Region server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
 reported a fatal error:
 ABORTING region server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: 
 org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; 
 currently processing 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as 
 dead server
 {noformat}
 Presumably some of the recent ZK listener related changes b



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error

2013-12-20 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854452#comment-13854452
 ] 

Jimmy Xiang commented on HBASE-10210:
-

I have not thought through the issue yet. For now, as I understand it, 
ServerManager has a list, and RegionServerTracker has a list too. The start 
call only adds the regionservers from ZK to the list in RegionServerTracker, 
which is right. However, for the first run, should we also add them to the 
list in ServerManager?
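
A toy sketch of the idea in this comment (types and names are invented and do 
not correspond to the real ServerManager or RegionServerTracker): seed the 
master's online-server map from the ZK regionserver znodes when the tracker 
starts, so a later report from the same server is recognized instead of being 
treated as a stale duplicate.
{code:java}
// Illustrative only; not the real HBase master classes.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class OnlineServerRegistry {
  private final Map<String, Long> onlineServers = new ConcurrentHashMap<String, Long>();

  void recordOnline(String serverName) {           // host,port,startcode encoded in the name
    onlineServers.put(serverName, System.currentTimeMillis());
  }

  boolean isOnline(String serverName) {
    return onlineServers.containsKey(serverName);
  }
}

class TrackerStartSketch {
  /** On master startup, register every regionserver already present under the rs znode. */
  static void seedFromZk(List<String> rsZnodeChildren, OnlineServerRegistry registry) {
    for (String serverName : rsZnodeChildren) {
      registry.recordOnline(serverName);
    }
  }
}
{code}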

 during master startup, RS can be you-are-dead-ed by master in error
 ---

 Key: HBASE-10210
 URL: https://issues.apache.org/jira/browse/HBASE-10210
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-10210.patch


 Not sure of the root cause yet; I am at the how-did-this-ever-work stage.
 We see this problem in 0.96.1, but didn't in 0.96.0 + some patches.
 It looks like RS information arriving from two sources - ZK and the server 
 itself - can conflict. The master doesn't handle such cases (timestamp match), 
 and anyway, technically timestamps can collide for two separate servers.
 So the master YouAreDead-s the already-recorded reporting RS, and adds it too. 
 Then it discovers that the new server has died with a fatal error!
 Note the threads.
 Addition is called from master initialization and from RPC.
 {noformat}
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: 
 Finished waiting for region servers count to settle; checked in 2, slept for 
 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running.
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: 
 Registering 
 server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered 
 server found up in zk but who has not yet reported in: 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=6] 
 master.ServerManager: Triggering server recovery; existingServer 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
 looks stale, new 
 server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=6] 
 master.ServerManager: Master doesn't enable ServerShutdownHandler during 
 initialization, delay expiring server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 ...
 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] 
 master.HMaster: Region server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
 reported a fatal error:
 ABORTING region server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: 
 org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; 
 currently processing 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as 
 dead server
 {noformat}
 Presumably some of the recent ZK listener related changes b



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10183) Need enforce a reserved range of system tag types

2013-12-20 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854454#comment-13854454
 ] 

Jeffrey Zhong commented on HBASE-10183:
---

Sounds good. Please close it as dup. Thanks.

 Need enforce a reserved range of system tag types
 -

 Key: HBASE-10183
 URL: https://issues.apache.org/jira/browse/HBASE-10183
 Project: HBase
  Issue Type: Task
  Components: HFile
Affects Versions: 0.98.0
Reporter: Jeffrey Zhong
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.98.0


 If we don't reserve a range of system tag types now, let's say 0-64 (the 
 total tag type range is 0-255), we'll have a hard time introducing a new 
 system tag type in the future, because the new tag type may collide with an 
 existing user tag type, as tags are open to users as well.
 [~ram_krish], [~anoop.hbase] What do you guys think?
 Thanks!
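 A minimal sketch of what such a reservation could look like (the constant 
 value and method are illustrative, based only on the numbers in the 
 description above):
 {code:java}
 // Illustrative only; tag types are a single byte (0-255), with 0-64 proposed as system-reserved.
 final class TagTypeRanges {
   static final int SYSTEM_TAG_TYPE_MAX = 64;

   /** Reject user-supplied tag types that fall inside the reserved system range. */
   static void validateUserTagType(int tagType) {
     if (tagType >= 0 && tagType <= SYSTEM_TAG_TYPE_MAX) {
       throw new IllegalArgumentException("Tag type " + tagType
           + " is reserved for system use (0-" + SYSTEM_TAG_TYPE_MAX + ")");
     }
   }
 }
 {code}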



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8558) Add timeout limit for HBaseClient dataOutputStream

2013-12-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854459#comment-13854459
 ] 

Lars Hofhansl commented on HBASE-8558:
--

+1

 Add timeout limit for HBaseClient dataOutputStream
 --

 Key: HBASE-8558
 URL: https://issues.apache.org/jira/browse/HBASE-8558
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.5, 0.94.14
Reporter: wanbin
Assignee: Liang Xie
 Attachments: HBASE-8558-0.94.txt


 I run jstack at client host. The result is below.
 hbase-tablepool-60-thread-34 daemon prio=10 tid=0x7f1e65a48000 
 nid=0x5173 runnable [0x579cc000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 0x000758cb0780 (a sun.nio.ch.Util$2)
 - locked 0x000758cb0770 (a 
 java.util.Collections$UnmodifiableSet)
 - locked 0x000758cb0548 (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - locked 0x000754e978a0 (a java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:620)
 - locked 0x000754e97880 (a java.io.DataOutputStream)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
 at $Proxy13.multi(Unknown Source)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1395)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1393)
 at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1402)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1390)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 This thread has hung for one hour.
 Meanwhile, another thread is trying to close the connection:
 IPC Client (1983049639) connection to 
 dump002030.cm6.tbsite.net/10.246.2.30:30020 from admin daemon prio=10 
 tid=0x7f1e70674800 nid=0x3d76 waiting for monitor entry 
 [0x4bc0f000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 - waiting to lock 0x000754e978a0 (a 
 java.io.BufferedOutputStream)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
 at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
 at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.close(HBaseClient.java:715)
 - locked 0x000754e7b818 (a 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:587)
 dump002030.cm6.tbsite.net is the dead regionserver.
 Reading the HBase source code, I discovered that connection.out is created 
 without a timeout:
 this.out = new DataOutputStream
 (new BufferedOutputStream(NetUtils.getOutputStream(socket)));
 As I see it, this means epoll_wait will block indefinitely. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error

2013-12-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854467#comment-13854467
 ] 

Sergey Shelukhin commented on HBASE-10210:
--

That's what it does in the loop after waiting for reporting servers (only for 
the non-reported ones), as far as I see.

 during master startup, RS can be you-are-dead-ed by master in error
 ---

 Key: HBASE-10210
 URL: https://issues.apache.org/jira/browse/HBASE-10210
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-10210.patch


 Not sure of the root cause yet; I am at the how-did-this-ever-work stage.
 We see this problem in 0.96.1, but didn't in 0.96.0 + some patches.
 It looks like RS information arriving from two sources - ZK and the server 
 itself - can conflict. The master doesn't handle such cases (timestamp match), 
 and anyway, technically timestamps can collide for two separate servers.
 So the master YouAreDead-s the already-recorded reporting RS, and adds it too. 
 Then it discovers that the new server has died with a fatal error!
 Note the threads.
 Addition is called from master initialization and from RPC.
 {noformat}
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: 
 Finished waiting for region servers count to settle; checked in 2, slept for 
 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running.
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: 
 Registering 
 server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,290 INFO  
 [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered 
 server found up in zk but who has not yet reported in: 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=6] 
 master.ServerManager: Triggering server recovery; existingServer 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
 looks stale, new 
 server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 2013-12-19 11:16:45,380 INFO  [RpcServer.handler=4,port=6] 
 master.ServerManager: Master doesn't enable ServerShutdownHandler during 
 initialization, delay expiring server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800
 ...
 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] 
 master.HMaster: Region server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 
 reported a fatal error:
 ABORTING region server 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: 
 org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; 
 currently processing 
 h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as 
 dead server
 {noformat}
 Presumably some of the recent ZK listener related changes b



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8529) checkOpen is missing from multi, mutate, get and multiGet etc.

2013-12-20 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854472#comment-13854472
 ] 

Jeffrey Zhong commented on HBASE-8529:
--

Thanks [~anoop.hbase], [~ram_krish] for the reviews! I've integrated it into 
the 0.98 and trunk branches.

 checkOpen is missing from multi, mutate, get and multiGet etc.
 --

 Key: HBASE-8529
 URL: https://issues.apache.org/jira/browse/HBASE-8529
 Project: HBase
  Issue Type: Bug
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Priority: Minor
 Fix For: 0.98.0, 0.99.0

 Attachments: hbase-8529.patch


 I saw that we have checkOpen in all those functions in 0.94, while they're 
 missing from trunk. Does anyone know why?
 For multi and mutate, if we don't call checkOpen we could flood our logs with 
 a bunch of "DFSOutputStream is closed" errors when we sync the WAL.
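 A minimal sketch of the kind of guard being discussed (the exception type and 
 surrounding method are illustrative, not the actual HRegionServer code):
 {code:java}
 // Illustrative only; not the actual regionserver RPC code.
 import java.io.IOException;

 class RpcGuardSketch {
   private volatile boolean stopped;
   private volatile boolean aborting;

   /** Fail fast if the server is going down, rather than attempting the operation
    *  and flooding the logs with WAL sync failures. */
   void checkOpen() throws IOException {
     if (stopped || aborting) {
       throw new IOException("Server is stopping, rejecting request");
     }
   }

   void multi() throws IOException {
     checkOpen();  // guard every RPC entry point: multi, mutate, get, multiGet, ...
     // ... perform the batched operations ...
   }
 }
 {code}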



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-8529) checkOpen is missing from multi, mutate, get and multiGet etc.

2013-12-20 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8529:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 checkOpen is missing from multi, mutate, get and multiGet etc.
 --

 Key: HBASE-8529
 URL: https://issues.apache.org/jira/browse/HBASE-8529
 Project: HBase
  Issue Type: Bug
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Priority: Minor
 Fix For: 0.98.0, 0.99.0

 Attachments: hbase-8529.patch


 I saw that we have checkOpen in all those functions in 0.94, while they're 
 missing from trunk. Does anyone know why?
 For multi and mutate, if we don't call checkOpen we could flood our logs with 
 a bunch of "DFSOutputStream is closed" errors when we sync the WAL.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

