[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625918#comment-14625918 ] Heng Chen commented on HBASE-14062: --- You are welcome.

RpcServer.Listener.doAccept get blocked by LinkedList.remove
Key: HBASE-14062
URL: https://issues.apache.org/jira/browse/HBASE-14062
Project: HBase
Issue Type: Bug
Components: IPC/RPC
Affects Versions: 0.98.12
Reporter: Victor Xu
Attachments: hbase.log, jstack.log

We saw this blocked thread in our jstack output:
{noformat}
RpcServer.listener,port=60020 daemon prio=10 tid=0x7f158097b800 nid=0x2cd05 waiting for monitor entry [0x46374000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:833)
	- waiting to lock 0x0002bb094ac8 (a java.util.Collections$SynchronizedList)
	at org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:748)
{noformat}
And the owner of the lock is LinkedList.remove:
{noformat}
RpcServer.reader=9,port=60020 daemon prio=10 tid=0x7f1580394000 nid=0x2cc19 runnable [0x43b4c000]
   java.lang.Thread.State: RUNNABLE
	at java.util.LinkedList.remove(LinkedList.java:363)
	at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
	- locked 0x0002bb094ac8 (a java.util.Collections$SynchronizedList)
	at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
	- locked 0x0002bb094ac8 (a java.util.Collections$SynchronizedList)
	at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:867)
	at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:645)
	- locked 0x0002bae09a30 (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
	at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:620)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{noformat}
This issue blocks the RS once in a while and I have to restart it whenever it happens. It seems like a bug. Any suggestions?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
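The contention in the trace comes from removing connections from a Collections.synchronizedList over a LinkedList: remove(Object) scans the whole list while holding the monitor that doAccept also needs. A minimal, self-contained sketch of that pattern follows; the class and field names are illustrative stand-ins, not HBase's actual code, and the ConcurrentHashMap-backed set is one hypothetical way to make connection close O(1) and lock-free, not the actual HBase fix:

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ConnectionListDemo {
    // Mirrors the pattern in the stack trace: a synchronized LinkedList shared
    // by the accept path (add) and the reader path (remove in closeConnection).
    // LinkedList.remove(Object) is O(n) and runs while holding the list's
    // monitor, so a burst of connection closes serializes doAccept behind it.
    static final List<Integer> connectionList =
        Collections.synchronizedList(new LinkedList<>());

    // Hypothetical alternative: a ConcurrentHashMap-backed set gives
    // O(1) add/remove with no single shared monitor.
    static final Set<Integer> connectionSet = ConcurrentHashMap.newKeySet();

    public static void main(String[] args) {
        int n = 10_000;
        for (int i = 0; i < n; i++) {
            connectionList.add(i);
            connectionSet.add(i);
        }
        // Each remove scans the list under the monitor; with many readers
        // closing connections at once, the listener blocks on this monitor.
        for (int i = 0; i < n; i++) {
            connectionList.remove(Integer.valueOf(i));
        }
        // The set removes never contend on one global lock.
        for (int i = 0; i < n; i++) {
            connectionSet.remove(i);
        }
        System.out.println(connectionList.size() + " " + connectionSet.size());
    }
}
```

Under a burst of failing clients, every closeConnection in the real server pays that O(n) scan inside the shared monitor, which matches the BLOCKED listener thread seen in the jstack output.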
[jira] [Commented] (HBASE-14072) Nullpointerexception in running mapreduce with toolrun and exteded class TableInputFormat
[ https://issues.apache.org/jira/browse/HBASE-14072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625981#comment-14625981 ] aslam commented on HBASE-14072: ---
{noformat}
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:585)
at com.flytxt.ostrich.engine.hadoop.MultiLineTableInputFormat.getSplits(MultiLineTableInputFormat.java:41)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at com.flytxt.ostrich.engine.hadoop.AbstractMRJob.run(AbstractMRJob.java:96)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.flytxt.ostrich.engine.executor.SplitExecutorimpl.start(SplitExecutorimpl.java:72)
at com.flytxt.ostrich.core.main.ConveyorBelt.startConveyor(ConveyorBelt.java:158)
at com.flytxt.ostrich.core.main.ConveyorBelt.execute(ConveyorBelt.java:243)
at com.flytxt.ostrich.core.message.JMSConveyorBeltReceiver.onMessage(JMSConveyorBeltReceiver.java:44)
at org.springframework.jms.listener.adapter.MessageListenerAdapter.onMessage(MessageListenerAdapter.java:339)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:537)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:497)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:468)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:325)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:263)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1096)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1088)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:985)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Nullpointerexception in running mapreduce with toolrun and exteded class TableInputFormat
Key: HBASE-14072
URL: https://issues.apache.org/jira/browse/HBASE-14072
Project: HBase
Issue Type: Bug
Components: hbase
Affects Versions: 1.1.0.1
Reporter: aslam

Running MapReduce with an extended class of org.apache.hadoop.hbase.mapreduce.TableInputFormat, and using ToolRunner, a NullPointerException comes from getTable().getName(). The code was working in the previous version of HBase. It works if we call initialize(context) inside the getSplits method.
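The failure mode and workaround described above can be mirrored in a self-contained sketch. The classes below are simplified stand-ins, not the real TableInputFormatBase API: the base class only sets its table field in initialize(context), so a subclass whose getSplits() runs before the framework calls initialize() sees a null table unless it calls initialize(context) itself:

```java
// Self-contained mirror of the reported pattern (hypothetical stand-in
// classes, not the real HBase/Hadoop interfaces).
class InitializeOrderDemo {
    static class Context {}
    static class Table {
        String getName() { return "demo"; }
    }

    static class BaseInputFormat {
        private Table table;  // stays null until initialize() runs

        void initialize(Context context) { table = new Table(); }

        Table getTable() {
            if (table == null) {
                // Models the NPE from getTable().getName() in the report.
                throw new NullPointerException("initialize(context) was never called");
            }
            return table;
        }
    }

    // The workaround from the report: call initialize(context) at the top of
    // the subclass's getSplits() so getTable() is safe afterwards.
    static class MultiLineInputFormat extends BaseInputFormat {
        String getSplits(Context context) {
            initialize(context);  // without this line: NullPointerException
            return getTable().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(new MultiLineInputFormat().getSplits(new Context()));
    }
}
```

This matches the reporter's observation that the job succeeds once initialize(context) is invoked inside getSplits.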
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625896#comment-14625896 ] Heng Chen commented on HBASE-14062: --- I think so. What is your HBase client logic?
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625914#comment-14625914 ] Victor Xu commented on HBASE-14062: --- A variety of applications are using this HBase cluster, and they do not share the same client configurations and retry logic. I'll use tcpdump to find the guilty application the next time I come across this issue. Thanks for your help, Heng Chen!
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625877#comment-14625877 ] Victor Xu commented on HBASE-14062: --- There might be lots of requests coming in together, with only 10 readers to handle them. Whenever a reader starts to read the data, the client quits. All readers are busy repeating this read/fail loop, so the lock seems to be held all the time, and other normal requests are blocked (or served slowly). Am I right?
[jira] [Created] (HBASE-14072) Nullpointerexception in running mapreduce with toolrun and exteded class TableInputFormat
aslam created HBASE-14072: - Summary: Nullpointerexception in running mapreduce with toolrun and exteded class TableInputFormat Key: HBASE-14072 URL: https://issues.apache.org/jira/browse/HBASE-14072 Project: HBase Issue Type: Bug Components: hbase Affects Versions: 1.1.0.1 Reporter: aslam

Running MapReduce with an extended class of org.apache.hadoop.hbase.mapreduce.TableInputFormat, and using ToolRunner, a NullPointerException comes from getTable().getName(). The code was working in the previous version of HBase. It works if we call initialize(context) inside the getSplits method.
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625902#comment-14625902 ] Victor Xu commented on HBASE-14062: --- We can see from the RS log that the META table is located on that RS. I guess some applications may use a very short client RPC timeout, or cache requests locally before actually sending them to this RS, so that by the time the requests reach the RS they exceed the timeout almost immediately. When the clients retry, this request-and-fail loop continues. This could happen when some big job (tens of thousands of maps using TableInputFormat) starts.
[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove
[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625882#comment-14625882 ] Heng Chen commented on HBASE-14062: --- So I think the lock is held because of the many exceptions thrown by doRead. When an exception is thrown, doRead calls closeConnection, and closeConnection holds the lock. With too many exceptions, the lock is almost always acquired by closeConnection, so doAccept is always waiting on it. Why are the exceptions thrown?
[jira] [Commented] (HBASE-12296) Filters should work with ByteBufferedCell
[ https://issues.apache.org/jira/browse/HBASE-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625885#comment-14625885 ] Anoop Sam John commented on HBASE-12296: --- Thanks for the review, Ram. We have all the compare methods in CellComparator; that is why I thought to keep it there. In fact in my first version (internal) I kept it in CellUtil and later changed my mind. Here the compare methods are a bit different as they take Comparators. I agree, but still they are comparing the passed cells' fields. wdyt? Regarding the new API which takes a BB in ByteArrayComparable: we have ByteArrayComparable implements Comparable<byte[]>. For the Comparable as such, the type is byte[]. But we are not using that standard compareTo(byte[]) method; we have compareTo(byte[], int, int) as a special variant to avoid byte copies. So along similar lines I thought a method which takes a BB instead of byte[] is OK. Agreed, if the name were something like BytesComparable it would have been less confusing. So what you say is: add this new abstract class and deprecate ByteArrayComparable in favor of the new one? I am OK with both ways.

Filters should work with ByteBufferedCell
Key: HBASE-12296
URL: https://issues.apache.org/jira/browse/HBASE-12296
Project: HBase
Issue Type: Sub-task
Components: regionserver, Scanners
Reporter: ramkrishna.s.vasudevan
Assignee: Anoop Sam John
Fix For: 2.0.0
Attachments: HBASE-12296_v1.patch

Now we have added an extension of Cell on the server side, ByteBufferedCell, where Cells are backed by a BB (on heap or off heap). When the Cell is backed by an off-heap buffer, the getXXXArray() APIs have to create a temp byte[], do a data copy, and return that. This is a bit costly. We have avoided this in areas like CellComparator/SQM etc. The Filter area was not touched in that patch; this Jira aims at doing it in the Filter area. E.g. SCVF checks the cell value against the given value condition. It uses getValueArray() to get the cell value bytes. When the cell is BB-backed, it has to use the getValueByteBuffer() API instead.
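The copy that this issue wants filters to avoid can be illustrated with a self-contained sketch. The interfaces below are simplified stand-ins for the Cell/ByteBufferedCell idea, not the real HBase API: a BB-backed cell's getValueArray() must allocate and copy on every call, while a filter-style check can compare against the ByteBuffer directly:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class BBCellDemo {
    // Hypothetical stand-ins for the Cell / ByteBufferedCell split in the issue.
    interface Cell {
        byte[] getValueArray();
    }

    interface ByteBufferedCell extends Cell {
        ByteBuffer getValueByteBuffer();
    }

    // A cell whose value lives in a direct (off-heap) buffer.
    static class OffHeapishCell implements ByteBufferedCell {
        private final ByteBuffer value;

        OffHeapishCell(byte[] v) {
            ByteBuffer b = ByteBuffer.allocateDirect(v.length);
            b.put(v);
            b.flip();
            value = b;
        }

        // The costly path: a fresh byte[] copy on every call.
        public byte[] getValueArray() {
            byte[] copy = new byte[value.remaining()];
            value.duplicate().get(copy);
            return copy;
        }

        // The zero-copy path: hand out a view of the buffer.
        public ByteBuffer getValueByteBuffer() {
            return value.duplicate();
        }
    }

    // Filter-style value check that takes the zero-copy path when it can,
    // in the spirit of what the patch wants SCVF to do.
    static boolean valueEquals(Cell cell, byte[] expected) {
        if (cell instanceof ByteBufferedCell) {
            // ByteBuffer.equals compares remaining contents; no byte[] copy.
            return ((ByteBufferedCell) cell).getValueByteBuffer()
                .equals(ByteBuffer.wrap(expected));
        }
        return Arrays.equals(cell.getValueArray(), expected);
    }

    public static void main(String[] args) {
        Cell c = new OffHeapishCell("v1".getBytes());
        System.out.println(valueEquals(c, "v1".getBytes()));
    }
}
```

The instanceof dispatch keeps the byte[]-backed path unchanged while sparing BB-backed cells the per-check allocation and copy.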
[jira] [Updated] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-14073: - Attachment: HBASE-14073.patch Upload the patch. TestRemoteTable.testDelete failed in the latest trunk code -- Key: HBASE-14073 URL: https://issues.apache.org/jira/browse/HBASE-14073 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Attachments: HBASE-14073.patch TestRemoteTable.testDelete failed in the latest trunk code. {code} expected null, but was: [B@615c4156 {code}
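As an aside on the failure message: the [B@615c4156 is just the default toString() of a Java byte[] (the [B array type descriptor plus an identity hash code), which is what an assertion prints when it receives a raw array instead of a readable value. A quick illustration:

```java
import java.util.Arrays;

class ArrayToStringDemo {
    public static void main(String[] args) {
        byte[] v = "value".getBytes();
        // Default toString() of an array: "[B@<identity-hash>" -- unreadable
        // in assertion messages, as in the testDelete failure above.
        System.out.println(v.toString().startsWith("[B@"));
        // Arrays.toString() renders the contents instead.
        System.out.println(Arrays.toString(new byte[] {1, 2}));
    }
}
```

So the assertion is reporting a non-null byte[] where null was expected; the hash itself carries no information.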
[jira] [Updated] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-14073: - Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626198#comment-14626198 ] Hadoop QA commented on HBASE-11339: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12745213/11339-master-v8.patch against master branch at commit a3d30892b41f604ab5a62d4f612fa7c230267dfe. ATTACHMENT ID: 12745213 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 102 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1889 checkstyle errors (more than the master's current 1873 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: + family.setMobEnabled(JBoolean.valueOf(arg.delete(org.apache.hadoop.hbase.HColumnDescriptor::IS_MOB))) if arg.include?(org.apache.hadoop.hbase.HColumnDescriptor::IS_MOB) + family.setMobThreshold(JLong.valueOf(arg.delete(org.apache.hadoop.hbase.HColumnDescriptor::MOB_THRESHOLD))) if arg.include?(org.apache.hadoop.hbase.HColumnDescriptor::MOB_THRESHOLD) + @admin.compactMob(org.apache.hadoop.hbase.TableName.valueOf(table_name), family.to_java_bytes) + @admin.majorCompactMob(org.apache.hadoop.hbase.TableName.valueOf(table_name), family.to_java_bytes) {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.rest.client.TestRemoteTable Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14764//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14764//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14764//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14764//console This message is automatically generated.
HBase MOB - Key: HBASE-11339 URL: https://issues.apache.org/jira/browse/HBASE-11339 Project: HBase Issue Type: Umbrella Components: regionserver, Scanners Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: 11339-master-v3.txt, 11339-master-v4.txt, 11339-master-v5.txt, 11339-master-v6.txt, 11339-master-v7.txt, 11339-master-v8.patch, 11339-master-v9.patch, HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase MOB Design-v4.pdf, HBase MOB Design-v5.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user guide_v2.docx, MOB user guide_v3.docx, MOB user guide_v4.docx, MOB user guide_v5.docx, hbase-11339-150519.patch, hbase-11339-in-dev.patch, hbase-11339.150417.patch, merge-150212.patch, merge.150212b.patch, merge.150212c.patch, merge.150710.patch

It's quite useful to save medium-sized binary data like images and documents into Apache HBase. Unfortunately, directly saving binary MOBs (medium objects) to HBase leads to worse performance because of frequent splits and compactions. In this design, the MOB data are stored in a more efficient way, which keeps high write/read performance and guarantees data consistency in Apache HBase.
[jira] [Created] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
Jingcheng Du created HBASE-14073: Summary: TestRemoteTable.testDelete failed in the latest trunk code Key: HBASE-14073 URL: https://issues.apache.org/jira/browse/HBASE-14073 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0.0 Reporter: Jingcheng Du TestRemoteTable.testDelete failed in the latest trunk code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14074) HBase cluster crashed on-the-hour
JoneZhang created HBASE-14074: - Summary: HBase cluster crashed on-the-hour Key: HBASE-14074 URL: https://issues.apache.org/jira/browse/HBASE-14074 Project: HBase Issue Type: Bug Components: Admin Affects Versions: 0.96.2 Environment: Hadoop 2.5.1 HBase 0.96.2 Reporter: JoneZhang I found the HBase cluster crashed on the hour. The HBase master running log is as follows: 2015-07-14 14:41:49,832 DEBUG [master:10.240.131.18:6.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: 10-241-125-46%2C60020%2C1436841063572.1436851865226 2015-07-14 14:45:49,822 DEBUG [master:10.240.131.18:6.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: 10-241-85-137%2C60020%2C1436841341086.1436852143141 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: HBase 0.96.2-hadoop2 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Subversion https://svn.apache.org/repos/asf/hbase/tags/0.96.2RC2 -r 1581096 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Compiled by stack on Mon Mar 24 16:03:18 PDT 2014 2015-07-14 15:00:03,729 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT 2015-07-14 15:00:03,730 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=10-240-131-18 2015-07-14 15:00:03,730 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.7.0_72 ... 2015-07-14 15:00:03,749 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=10.240.131.17:2200,10.240.131.16:2200,10.240.131.15:2200,10.240.131.14:2200,10.240.131.18:2200 2015-07-14 15:00:03,751 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Opening socket connection to server 10-240-131-18/10.240.131.18:2200. 
Will not attempt to authenticate using SASL (unknown error) 2015-07-14 15:00:03,757 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Socket connection established to 10-240-131-18/10.240.131.18:2200, initiating session 2015-07-14 15:00:03,764 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Session establishment complete on server 10-240-131-18/10.240.131.18:2200, sessionid = 0x34e8a64b453024a, negotiated timeout = 4 2015-07-14 15:00:04,835 INFO [main] zookeeper.ZooKeeper: Session: 0x34e8a64b453024a closed 2015-07-14 15:00:04,835 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down After printing "Didn't find this log in ZK..." every hour, at one such time the master died. The ZooKeeper running log is as follows: 2015-07-14 15:00:03,756 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:NIOServerCnxnFactory@197] - Accepted socket connection from /10.240.131.18:52733 2015-07-14 15:00:03,761 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:ZooKeeperServer@868] - Client attempting to establish new session at /10.240.131.18:52733 2015-07-14 15:00:03,762 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x34e8a64b453024a with negotiated timeout 4 for client /10.240.131.18:52733 2015-07-14 15:00:04,836 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:NIOServerCnxn@1007] - Closed socket connection for client /10.240.131.18:52733 which had sessionid 0x34e8a64b453024a -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14058) Stabilizing default heap memory tuner
[ https://issues.apache.org/jira/browse/HBASE-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626032#comment-14626032 ] Anoop Sam John commented on HBASE-14058: Let us have the eviction-usage related discussion in another Jira. Let us get this improvement in first. Stabilizing default heap memory tuner - Key: HBASE-14058 URL: https://issues.apache.org/jira/browse/HBASE-14058 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 2.0.0, 1.2.0, 1.3.0 Reporter: Abhilash Assignee: Abhilash Attachments: HBASE-14058-v1.patch, HBASE-14058.patch, after_modifications.png, before_modifications.png The memory tuner works well in general cases, but when we have a workload that is both read heavy and write heavy, the tuner performs too many tuning operations. We should try to control the number of tuner operations and stabilize it. The main problem was that the tuner thinks it is in steady state even if it sees just one neutral tuner period, thus doing too many tuning operations and too many reverts, with large step sizes (the step size was set to maximum even after one neutral period). So to stop this I have thought of these steps: 1) The band created by μ + δ/2 and μ - δ/2 is too narrow. Statistically, ~62% of periods will lie outside this range, which means 62% of the data points are considered either high or low, which is too much. Use μ + δ*0.8 and μ - δ*0.8 instead. In expectation this will decrease the number of tuner operations per 100 periods from 19 to just 10. If we use δ/2 then 31% of data values will be considered high and 31% will be considered low (2*0.31*0.31 = 0.19); on the other hand, if we use δ*0.8 then 22% will be low and 22% will be high (2*0.22*0.22 ~ 0.10). 2) Define a proper steady state by looking at the past few periods (the count equals hbase.regionserver.heapmemory.autotuner.lookup.periods) rather than just the last tuner operation. 
We say the tuner is in steady state when the last few tuner periods were NEUTRAL. We keep decreasing the step size until it is extremely low, then leave the system in that state for some time. 3) Rather than decreasing the step size only while reverting, decrease the magnitude of the step size whenever we are trying to revert tuning done in the last few periods (sum the changes of the last few periods and compare to the current step) rather than just looking at the last period. When its magnitude gets too low, make the tuner step NEUTRAL (no operation). This will cause the step size to continuously decrease until we reach steady state. After that the tuning process will restart (the tuner step size resets when we reach steady state). 4) The tuning done in the last few periods will be a decaying sum of past tuner steps with sign. This parameter will be positive for an increase in memstore and negative for an increase in block cache. Rather than using an arithmetic mean, we use this to give more priority to recent tuner steps. Please see the attachments. One represents the size of the memstore (green) and the size of the block cache (blue) adjusted by the tuner without these modifications, and the other with the above modifications. The x-axis is time and the y-axis is the fraction of heap memory available to the memstore and block cache at that time (it always sums up to 80%). I configured the min/max ranges for both components to 0.1 and 0.7 respectively (so in the plots the y-axis min and max are 0.1 and 0.7). In both cases the tuner tries to distribute memory by giving ~15% to the memstore and ~65% to the block cache, but the modified one does it much more smoothly. I got these results from a YCSB test. The test was doing approximately 5000 inserts and 500 reads per second (for one region server). The results can be further fine tuned and the number of tuner operations can be reduced with these changes in configuration. 
For more fine tuning: a) lower the max step size (suggested = 4%) b) lower the min step size (the default is also fine) To further decrease the frequency of tuning operations: c) increase the number of lookup periods (in the tests it was just 10; the default is 60) d) increase the tuner period (in the tests it was just 20 secs; the default is 60 secs) I used a smaller tuner period / number of lookup periods to get more data points. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
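The percentages in the proposal above follow from treating the per-period signal as roughly normally distributed with mean μ and deviation δ. A quick sanity check of those numbers (an illustrative sketch, not HBase code; the 2·p·p operations model mirrors the reporter's own 2*0.31*0.31 estimate):

```python
import math

def two_tail(k):
    """Fraction of a normal distribution falling outside mu +/- k*sigma."""
    return 1.0 - math.erf(k / math.sqrt(2.0))

for k in (0.5, 0.8):
    outside = two_tail(k)     # periods classified as high or low
    per_side = outside / 2.0  # high (or low) alone
    # Expected tuner operations per 100 periods, per the 2*p*p estimate above.
    ops = 2 * per_side * per_side * 100
    # k=0.5 gives ~62% outside and ~19 ops; k=0.8 gives ~42% outside and ~9-10 ops.
    print(f"k={k}: outside={outside:.2f}, per side={per_side:.2f}, ops/100={ops:.1f}")
```

This reproduces the claimed drop from ~19 to ~10 tuning operations per 100 periods when widening the neutral band from δ/2 to δ*0.8.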
[jira] [Assigned] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du reassigned HBASE-14073: Assignee: Jingcheng Du TestRemoteTable.testDelete failed in the latest trunk code -- Key: HBASE-14073 URL: https://issues.apache.org/jira/browse/HBASE-14073 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du TestRemoteTable.testDelete failed in the latest trunk code. {code} excepted null, but was: B@615c4156 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626130#comment-14626130 ] Hadoop QA commented on HBASE-11339: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12745215/11339-master-v9.patch against master branch at commit a3d30892b41f604ab5a62d4f612fa7c230267dfe. ATTACHMENT ID: 12745215 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 102 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1889 checkstyle errors (more than the master's current 1873 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: + family.setMobEnabled(JBoolean.valueOf(arg.delete(org.apache.hadoop.hbase.HColumnDescriptor::IS_MOB))) if arg.include?(org.apache.hadoop.hbase.HColumnDescriptor::IS_MOB) + family.setMobThreshold(JLong.valueOf(arg.delete(org.apache.hadoop.hbase.HColumnDescriptor::MOB_THRESHOLD))) if arg.include?(org.apache.hadoop.hbase.HColumnDescriptor::MOB_THRESHOLD) + @admin.compactMob(org.apache.hadoop.hbase.TableName.valueOf(table_name), family.to_java_bytes) + @admin.majorCompactMob(org.apache.hadoop.hbase.TableName.valueOf(table_name), family.to_java_bytes) {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.TestChoreService.testForceTrigger(TestChoreService.java:398) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14765//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14765//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14765//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14765//console This message is automatically generated. 
HBase MOB - Key: HBASE-11339 URL: https://issues.apache.org/jira/browse/HBASE-11339 Project: HBase Issue Type: Umbrella Components: regionserver, Scanners Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: 11339-master-v3.txt, 11339-master-v4.txt, 11339-master-v5.txt, 11339-master-v6.txt, 11339-master-v7.txt, 11339-master-v8.patch, 11339-master-v9.patch, HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase MOB Design-v4.pdf, HBase MOB Design-v5.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user guide_v2.docx, MOB user guide_v3.docx, MOB user guide_v4.docx, MOB user guide_v5.docx, hbase-11339-150519.patch, hbase-11339-in-dev.patch, hbase-11339.150417.patch, merge-150212.patch, merge.150212b.patch, merge.150212c.patch, merge.150710.patch It's quite useful to save medium-sized binary data such as images and documents into Apache HBase. Unfortunately, directly saving binary MOBs (medium objects) to HBase leads to worse performance because of frequent splits and compactions. In this design, the MOB data are stored in a more efficient way, which keeps high write/read performance and guarantees data consistency in Apache HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-11339: - Attachment: 11339-master-v9.patch Uploaded a new patch V9 with a few minor changes in code style. This patch is uploaded to RB; you can find it at https://reviews.apache.org/r/36391/. Thanks. HBase MOB - Key: HBASE-11339 URL: https://issues.apache.org/jira/browse/HBASE-11339 Project: HBase Issue Type: Umbrella Components: regionserver, Scanners Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: 11339-master-v3.txt, 11339-master-v4.txt, 11339-master-v5.txt, 11339-master-v6.txt, 11339-master-v7.txt, 11339-master-v8.patch, 11339-master-v9.patch, HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase MOB Design-v4.pdf, HBase MOB Design-v5.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user guide_v2.docx, MOB user guide_v3.docx, MOB user guide_v4.docx, MOB user guide_v5.docx, hbase-11339-150519.patch, hbase-11339-in-dev.patch, hbase-11339.150417.patch, merge-150212.patch, merge.150212b.patch, merge.150212c.patch, merge.150710.patch It's quite useful to save medium-sized binary data such as images and documents into Apache HBase. Unfortunately, directly saving binary MOBs (medium objects) to HBase leads to worse performance because of frequent splits and compactions. In this design, the MOB data are stored in a more efficient way, which keeps high write/read performance and guarantees data consistency in Apache HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-14073: - Description: TestRemoteTable.testDelete failed in the latest trunk code. {code} excepted null, but was: B@615c4156 {code} was:TestRemoteTable.testDelete failed in the latest trunk code. TestRemoteTable.testDelete failed in the latest trunk code -- Key: HBASE-14073 URL: https://issues.apache.org/jira/browse/HBASE-14073 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0.0 Reporter: Jingcheng Du TestRemoteTable.testDelete failed in the latest trunk code. {code} excepted null, but was: B@615c4156 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626068#comment-14626068 ] ramkrishna.s.vasudevan commented on HBASE-14073: +1. Caused by HBASE-14047 checkin. Thanks for catching it. TestRemoteTable.testDelete failed in the latest trunk code -- Key: HBASE-14073 URL: https://issues.apache.org/jira/browse/HBASE-14073 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Attachments: HBASE-14073.patch TestRemoteTable.testDelete failed in the latest trunk code. {code} excepted null, but was: B@615c4156 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626077#comment-14626077 ] Anoop Sam John commented on HBASE-14073: Caused by HBASE-14047. If you have not analyzed this yet, I got the reason why it failed. Maybe it can be corrected by a patch here, or else I will reopen the other issue and give an addendum. TestRemoteTable.testDelete failed in the latest trunk code -- Key: HBASE-14073 URL: https://issues.apache.org/jira/browse/HBASE-14073 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Attachments: HBASE-14073.patch TestRemoteTable.testDelete failed in the latest trunk code. {code} excepted null, but was: B@615c4156 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14066) clean out old docbook docs from branch-1
[ https://issues.apache.org/jira/browse/HBASE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626078#comment-14626078 ] Hudson commented on HBASE-14066: FAILURE: Integrated in HBase-1.3-IT #39 (See [https://builds.apache.org/job/HBase-1.3-IT/39/]) HBASE-14066 clean out old docbook docs from branch-1. (busbey: rev fdd2692f340d9822171a6cf640e6dfe4b839a9fc) * src/main/docbkx/appendix_hfile_format.xml * src/main/docbkx/unit_testing.xml * src/main/docbkx/security.xml * src/main/docbkx/tracing.xml * src/main/docbkx/appendix_contributing_to_documentation.xml * src/main/docbkx/configuration.xml * src/main/docbkx/ops_mgt.xml * src/main/docbkx/zookeeper.xml * src/main/docbkx/schema_design.xml * src/main/docbkx/troubleshooting.xml * src/main/docbkx/shell.xml * src/main/docbkx/external_apis.xml * src/main/docbkx/cp.xml * src/main/docbkx/developer.xml * src/main/docbkx/appendix_acl_matrix.xml * src/main/docbkx/book.xml * src/main/docbkx/upgrading.xml * src/main/docbkx/case_studies.xml * src/main/docbkx/performance.xml * src/main/docbkx/getting_started.xml * src/main/docbkx/community.xml * src/main/docbkx/customization.xsl * src/main/docbkx/hbase_apis.xml * src/main/docbkx/thrift_filter_language.xml * src/main/docbkx/rpc.xml * src/main/docbkx/preface.xml clean out old docbook docs from branch-1 Key: HBASE-14066 URL: https://issues.apache.org/jira/browse/HBASE-14066 Project: HBase Issue Type: Task Components: documentation Affects Versions: 1.1.0 Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 1.2.0, 1.3.0 Attachments: HBASE-14066-branch-1.v1.patch branch-1 has the old docbook docs and a placeholder for the new asciidoc docs. Since we make all documentation changes to asciidoc and copy it over the documentation for a release line, excise the docbook directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-14073: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.0.0 Status: Resolved (was: Patch Available) Thanks Jingcheng. TestRemoteTable.testDelete failed in the latest trunk code -- Key: HBASE-14073 URL: https://issues.apache.org/jira/browse/HBASE-14073 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: 2.0.0 Attachments: HBASE-14073.patch TestRemoteTable.testDelete failed in the latest trunk code. {code} excepted null, but was: B@615c4156 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14075) HBaseClusterManager should use port(if given) to find pid
Yu Li created HBASE-14075: - Summary: HBaseClusterManager should use port(if given) to find pid Key: HBASE-14075 URL: https://issues.apache.org/jira/browse/HBASE-14075 Project: HBase Issue Type: Bug Reporter: Yu Li Assignee: Yu Li Priority: Minor This issue was found while running ITBLL in a distributed cluster. Our testing env is kind of special in that we run multiple regionserver instances on a single physical machine, so {noformat}ps -ef | grep proc_regionserver{noformat} will return more than one line, which could cause the tool to check/kill the wrong process. Actually, in HBaseClusterManager we already introduced port as a parameter for methods like isRunning, kill, etc. So the only thing to do here is to get the pid through the port if the port is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
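The fix described above, resolving the pid from the listen port instead of grepping the process name, can be sketched like this (illustrative only: the sample lsof output, pids, and helper name are made up, and the real HBaseClusterManager is Java code that runs commands on the remote host):

```python
import re

# Hypothetical output of `lsof -iTCP:<port> -sTCP:LISTEN` on a host running
# several regionserver instances; names and pids here are invented.
SAMPLE_LSOF = """\
COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    12345 hbase  212u  IPv4 987654      0t0  TCP *:60020 (LISTEN)
java    23456 hbase  213u  IPv4 987655      0t0  TCP *:60021 (LISTEN)
"""

def pid_for_port(lsof_output, port):
    """Return the pid of the process listening on `port`, or None.

    Grepping `ps -ef | grep proc_regionserver` matches every regionserver
    instance on the box; keying on the listen port disambiguates them.
    """
    pattern = re.compile(rf"^\S+\s+(\d+)\s+.*TCP \S*:{port} \(LISTEN\)", re.M)
    m = pattern.search(lsof_output)
    return int(m.group(1)) if m else None

print(pid_for_port(SAMPLE_LSOF, 60020))  # → 12345
```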
[jira] [Commented] (HBASE-14045) Bumping thrift version to 0.9.2.
[ https://issues.apache.org/jira/browse/HBASE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626132#comment-14626132 ] Hadoop QA commented on HBASE-14045: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12745184/HBASE-14045-branch-1.patch against branch-1 branch at commit a3d30892b41f604ab5a62d4f612fa7c230267dfe. ATTACHMENT ID: 12745184 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation, build, or dev-support patch that doesn't require tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14766//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14766//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14766//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14766//console This message is automatically generated. Bumping thrift version to 0.9.2. Key: HBASE-14045 URL: https://issues.apache.org/jira/browse/HBASE-14045 Project: HBase Issue Type: Improvement Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 1.3.0 Attachments: HBASE-14045-branch-1.patch, HBASE-14045.patch, compat_report.html From mailing list conversation: {quote} Currently, HBase is using Thrift version 0.9.0, with the latest version being 0.9.2. Currently, the HBase Thrift gateway is vulnerable to crashes due to THRIFT-2660 when used with the default transport, and the workaround for this problem is switching to framed transport. Unfortunately, the recently added impersonation support \[1\] doesn't work with framed transport, leaving thrift gateways using this feature susceptible to crashes. Updating the thrift version to 0.9.2 will help us mitigate this problem. Given that security is one of the key requirements for production clusters, it would be good to assure our users that the security features in the thrift gateway can be used without any major concerns. Aside from this, there are also some nice fixes pertaining to leaky resources in 0.9.2, like \[2\] and \[3\]. As far as compatibility guarantees are concerned, thrift assures 100% wire compatibility. However, it is my understanding that there were some minor additions (new API) in 0.9.2 \[4\] which won't work in 0.9.0, but that won't affect us since we are not using those features. 
And I tried running the test suite and did manual testing with the thrift version set to 0.9.2, and things ran smoothly. If there are no objections to this change, I would be more than happy to file a jira and follow this up. \[1\] https://issues.apache.org/jira/browse/HBASE-11349 \[2\] https://issues.apache.org/jira/browse/THRIFT-2274 \[3\] https://issues.apache.org/jira/browse/THRIFT-2359 \[4\] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310800version=12324954 {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
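For context on the framed-transport workaround discussed in the quote above: framed transport length-prefixes every message, so the server reads exactly one bounded frame at a time instead of parsing an unbounded byte stream. In the Python Thrift client the switch amounts to wrapping the socket in TTransport.TFramedTransport; the wire format itself can be sketched as follows (an illustrative sketch of the framing, not Thrift library code):

```python
import struct

# TFramedTransport-style framing: each message is preceded by a 4-byte
# big-endian length, so the receiver knows exactly how many bytes to read.

def frame(payload: bytes) -> bytes:
    """Prefix a payload with its 4-byte big-endian length."""
    return struct.pack(">I", len(payload)) + payload

def unframe(data: bytes):
    """Split one framed message off the front of `data`: (payload, rest)."""
    (length,) = struct.unpack(">I", data[:4])
    return data[4 : 4 + length], data[4 + length :]

buf = frame(b"call-1") + frame(b"call-2")
first, rest = unframe(buf)
second, _ = unframe(rest)
print(first, second)  # → b'call-1' b'call-2'
```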
[jira] [Updated] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-11339: - Attachment: 11339-master-v8.patch Patch V8 is uploaded. # Refine the class imports. # Shorten the long lines. # Some minor code changes. This patch is uploaded to RB too; you can review it through the link https://reviews.apache.org/r/36391/. Thanks. HBase MOB - Key: HBASE-11339 URL: https://issues.apache.org/jira/browse/HBASE-11339 Project: HBase Issue Type: Umbrella Components: regionserver, Scanners Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: 11339-master-v3.txt, 11339-master-v4.txt, 11339-master-v5.txt, 11339-master-v6.txt, 11339-master-v7.txt, 11339-master-v8.patch, HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase MOB Design-v4.pdf, HBase MOB Design-v5.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user guide_v2.docx, MOB user guide_v3.docx, MOB user guide_v4.docx, MOB user guide_v5.docx, hbase-11339-150519.patch, hbase-11339-in-dev.patch, hbase-11339.150417.patch, merge-150212.patch, merge.150212b.patch, merge.150212c.patch, merge.150710.patch It's quite useful to save medium-sized binary data such as images and documents into Apache HBase. Unfortunately, directly saving binary MOBs (medium objects) to HBase leads to worse performance because of frequent splits and compactions. In this design, the MOB data are stored in a more efficient way, which keeps high write/read performance and guarantees data consistency in Apache HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14045) Bumping thrift version to 0.9.2.
[ https://issues.apache.org/jira/browse/HBASE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Srungarapu updated HBASE-14045: Attachment: compat_report.html Bumping thrift version to 0.9.2. Key: HBASE-14045 URL: https://issues.apache.org/jira/browse/HBASE-14045 Project: HBase Issue Type: Improvement Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 1.3.0 Attachments: HBASE-14045-branch-1.patch, HBASE-14045.patch, compat_report.html From mailing list conversation: {quote} Currently, HBase is using Thrift version 0.9.0, with the latest version being 0.9.2. Currently, the HBase Thrift gateway is vulnerable to crashes due to THRIFT-2660 when used with the default transport, and the workaround for this problem is switching to framed transport. Unfortunately, the recently added impersonation support \[1\] doesn't work with framed transport, leaving thrift gateways using this feature susceptible to crashes. Updating the thrift version to 0.9.2 will help us mitigate this problem. Given that security is one of the key requirements for production clusters, it would be good to assure our users that the security features in the thrift gateway can be used without any major concerns. Aside from this, there are also some nice fixes pertaining to leaky resources in 0.9.2, like \[2\] and \[3\]. As far as compatibility guarantees are concerned, thrift assures 100% wire compatibility. However, it is my understanding that there were some minor additions (new API) in 0.9.2 \[4\] which won't work in 0.9.0, but that won't affect us since we are not using those features. And I tried running the test suite and did manual testing with the thrift version set to 0.9.2, and things ran smoothly. If there are no objections to this change, I would be more than happy to file a jira and follow this up. 
\[1\] https://issues.apache.org/jira/browse/HBASE-11349 \[2\] https://issues.apache.org/jira/browse/THRIFT-2274 \[3\] https://issues.apache.org/jira/browse/THRIFT-2359 \[4\] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310800version=12324954 {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626079#comment-14626079 ] Anoop Sam John commented on HBASE-14073: ok.. seeing the comments late.. :-) TestRemoteTable.testDelete failed in the latest trunk code -- Key: HBASE-14073 URL: https://issues.apache.org/jira/browse/HBASE-14073 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Attachments: HBASE-14073.patch TestRemoteTable.testDelete failed in the latest trunk code. {code} excepted null, but was: B@615c4156 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14045) Bumping thrift version to 0.9.2.
[ https://issues.apache.org/jira/browse/HBASE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Srungarapu updated HBASE-14045: Attachment: HBASE-14045.patch Bumping thrift version to 0.9.2. Key: HBASE-14045 URL: https://issues.apache.org/jira/browse/HBASE-14045 Project: HBase Issue Type: Improvement Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 1.3.0 Attachments: HBASE-14045-branch-1.patch, HBASE-14045.patch, compat_report.html From mailing list conversation: {quote} Currently, HBase is using Thrift version 0.9.0, with the latest version being 0.9.2. Currently, the HBase Thrift gateway is vulnerable to crashes due to THRIFT-2660 when used with the default transport, and the workaround for this problem is switching to framed transport. Unfortunately, the recently added impersonation support \[1\] doesn't work with framed transport, leaving thrift gateways using this feature susceptible to crashes. Updating the thrift version to 0.9.2 will help us mitigate this problem. Given that security is one of the key requirements for production clusters, it would be good to assure our users that the security features in the thrift gateway can be used without any major concerns. Aside from this, there are also some nice fixes pertaining to leaky resources in 0.9.2, like \[2\] and \[3\]. As far as compatibility guarantees are concerned, thrift assures 100% wire compatibility. However, it is my understanding that there were some minor additions (new API) in 0.9.2 \[4\] which won't work in 0.9.0, but that won't affect us since we are not using those features. And I tried running the test suite and did manual testing with the thrift version set to 0.9.2, and things ran smoothly. If there are no objections to this change, I would be more than happy to file a jira and follow this up. 
\[1\] https://issues.apache.org/jira/browse/HBASE-11349 \[2\] https://issues.apache.org/jira/browse/THRIFT-2274 \[3\] https://issues.apache.org/jira/browse/THRIFT-2359 \[4\] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310800version=12324954 {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14045) Bumping thrift version to 0.9.2.
[ https://issues.apache.org/jira/browse/HBASE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Srungarapu updated HBASE-14045: Attachment: (was: HBASE-14045.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14075) HBaseClusterManager should use port(if given) to find pid
[ https://issues.apache.org/jira/browse/HBASE-14075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-14075: -- Attachment: HBASE-14075.patch The fix is straightforward; attaching the patch. HBaseClusterManager should use port (if given) to find pid - Key: HBASE-14075 URL: https://issues.apache.org/jira/browse/HBASE-14075 Project: HBase Issue Type: Bug Reporter: Yu Li Assignee: Yu Li Priority: Minor Attachments: HBASE-14075.patch This issue was found while running ITBLL in a distributed cluster. Our testing environment is special in that we run multiple regionserver instances on a single physical machine, so {noformat}ps -ef | grep proc_regionserver{noformat} will return more than one line, which may cause the tool to check/kill the wrong process. Actually, in HBaseClusterManager we already introduced port as a parameter for methods like isRunning, kill, etc. So the only thing to do here is to get the pid through the port, if the port is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
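The ambiguity described above, and the port-based disambiguation the patch proposes, can be illustrated with a small sketch. The process table and ports below are hypothetical sample data for illustration only, not HBaseClusterManager's actual implementation:

```python
# Sketch: when several regionserver instances run on one host, filtering
# a `ps`-style listing by command name alone is ambiguous; keying on the
# listening port pins down exactly one process. All pids/ports here are
# made-up sample data.

def pid_for_port(listen_map, port):
    """listen_map: {pid: listening_port}, as a netstat-style tool would report."""
    matches = [pid for pid, p in listen_map.items() if p == port]
    if len(matches) != 1:
        raise RuntimeError(f"expected exactly one pid on port {port}, got {matches}")
    return matches[0]

# Three regionserver instances on one machine: grep by name finds all three...
procs = {4242: "proc_regionserver", 4243: "proc_regionserver", 4244: "proc_regionserver"}
by_name = [pid for pid, cmd in procs.items() if "proc_regionserver" in cmd]

# ...but the port identifies exactly one.
listening = {4242: 60020, 4243: 60021, 4244: 60022}
pid = pid_for_port(listening, 60021)
```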
[jira] [Commented] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626126#comment-14626126 ] Ashish Singhi commented on HBASE-14073: --- Thanks for the patch Jingcheng. TestRemoteTable.testDelete failed in the latest trunk code -- Key: HBASE-14073 URL: https://issues.apache.org/jira/browse/HBASE-14073 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: 2.0.0 Attachments: HBASE-14073.patch TestRemoteTable.testDelete failed in the latest trunk code. {code} expected null, but was: B@615c4156 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14045) Bumping thrift version to 0.9.2.
[ https://issues.apache.org/jira/browse/HBASE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626280#comment-14626280 ] Hadoop QA commented on HBASE-14045: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12745228/HBASE-14045.patch against master branch at commit 2f327c911056d02813f642503db9a4383e8b4a2f. ATTACHMENT ID: 12745228 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation, build, or dev-support patch that doesn't require tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14768//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14768//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14768//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14768//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-14074) HBase cluster crashed on-the-hour
[ https://issues.apache.org/jira/browse/HBASE-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JoneZhang reassigned HBASE-14074: - Assignee: Andrew Purtell HBase cluster crashed on-the-hour -- Key: HBASE-14074 URL: https://issues.apache.org/jira/browse/HBASE-14074 Project: HBase Issue Type: Bug Components: Admin Affects Versions: 0.96.2 Environment: Hadoop 2.5.1 HBase 0.96.2 Reporter: JoneZhang Assignee: Andrew Purtell I found the HBase cluster crashed on the hour. The HBase master log is as follows: 2015-07-14 14:41:49,832 DEBUG [master:10.240.131.18:6.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: 10-241-125-46%2C60020%2C1436841063572.1436851865226 2015-07-14 14:45:49,822 DEBUG [master:10.240.131.18:6.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: 10-241-85-137%2C60020%2C1436841341086.1436852143141 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: HBase 0.96.2-hadoop2 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Subversion https://svn.apache.org/repos/asf/hbase/tags/0.96.2RC2 -r 1581096 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Compiled by stack on Mon Mar 24 16:03:18 PDT 2014 2015-07-14 15:00:03,729 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT 2015-07-14 15:00:03,730 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=10-240-131-18 2015-07-14 15:00:03,730 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.7.0_72 ... 2015-07-14 15:00:03,749 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=10.240.131.17:2200,10.240.131.16:2200,10.240.131.15:2200,10.240.131.14:2200,10.240.131.18:2200 2015-07-14 15:00:03,751 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Opening socket connection to server 10-240-131-18/10.240.131.18:2200. 
Will not attempt to authenticate using SASL (unknown error) 2015-07-14 15:00:03,757 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Socket connection established to 10-240-131-18/10.240.131.18:2200, initiating session 2015-07-14 15:00:03,764 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Session establishment complete on server 10-240-131-18/10.240.131.18:2200, sessionid = 0x34e8a64b453024a, negotiated timeout = 4 2015-07-14 15:00:04,835 INFO [main] zookeeper.ZooKeeper: Session: 0x34e8a64b453024a closed 2015-07-14 15:00:04,835 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down After printing "Didn't find this log in ZK..." every hour, at one point the master died. The ZooKeeper log is as follows: 2015-07-14 15:00:03,756 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:NIOServerCnxnFactory@197] - Accepted socket connection from /10.240.131.18:52733 2015-07-14 15:00:03,761 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:ZooKeeperServer@868] - Client attempting to establish new session at /10.240.131.18:52733 2015-07-14 15:00:03,762 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x34e8a64b453024a with negotiated timeout 4 for client /10.240.131.18:52733 2015-07-14 15:00:04,836 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:NIOServerCnxn@1007] - Closed socket connection for client /10.240.131.18:52733 which had sessionid 0x34e8a64b453024a -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626301#comment-14626301 ] Eshcar Hillel commented on HBASE-13408: --- Hi - we are back with an implementation of the basic feature (see link to the review board for the HBASE-13408-098 code), and some experimental results. We were able to show a 30-65% performance gain for read accesses in high-churn workloads (comprising 50% reads and 50% writes), and mainly to maintain a predictable latency SLA (see the performance evaluation document for full results). We've also adapted the design document to reflect the code, specifically renaming some classes and describing the changes we made in the region flushing policy (see design document ver02). HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf A store unit holds a column family in a region, where the memstore is its in-memory component. The memstore absorbs all updates to the store; from time to time these updates are flushed to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted until it is written to the filesystem and optionally to the block cache. This may result in underutilization of the memory due to duplicate entries per row, for example, when hot data is continuously updated. Generally, the faster the data is accumulated in memory, the more flushes are triggered and the data sinks to disk more frequently, slowing down retrieval of data, even if very recent. In high-churn workloads, compacting the memstore can help maintain the data in memory, and thereby speed up data retrieval. 
We suggest a new compacted memstore with the following principles: 1. The data is kept in memory for as long as possible. 2. Memstore data is either compacted or in the process of being compacted. 3. Allow a panic mode, which may interrupt an in-progress compaction and force a flush of part of the memstore. We suggest applying this optimization only to in-memory column families. A design document is attached. This feature was previously discussed in HBASE-5311. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
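The effect described in the principles above can be modeled in a few lines. This is an illustrative sketch of the idea only (collapse stale per-cell versions while data stays in memory), not HBase's actual implementation; the `compact` helper and its tuple layout are hypothetical:

```python
# Simplified model of in-memory memstore compaction: repeated updates to
# the same row/qualifier collapse to the newest version(s), so memory is
# not wasted on stale duplicate entries per row.

def compact(entries, max_versions=1):
    """entries: list of (row, qualifier, timestamp, value) tuples."""
    by_cell = {}
    for row, qual, ts, val in entries:
        by_cell.setdefault((row, qual), []).append((ts, val))
    out = []
    for (row, qual), versions in by_cell.items():
        versions.sort(reverse=True)            # newest timestamp first
        for ts, val in versions[:max_versions]:
            out.append((row, qual, ts, val))
    return out

# A hot row updated three times occupies one slot after compaction,
# while other rows are untouched.
raw = [("r1", "q", 1, "a"), ("r1", "q", 2, "b"), ("r1", "q", 3, "c"),
       ("r2", "q", 1, "x")]
compacted = compact(raw)
```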
[jira] [Commented] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626237#comment-14626237 ] Hadoop QA commented on HBASE-14073: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12745221/HBASE-14073.patch against master branch at commit a3d30892b41f604ab5a62d4f612fa7c230267dfe. ATTACHMENT ID: 12745221 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14767//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14767//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14767//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14767//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408: -- Attachment: InMemoryMemstoreCompactionEvaluationResults.pdf HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626309#comment-14626309 ] Eshcar Hillel commented on HBASE-13408: --- A comment and a request: we've yet to address the WAL truncation issue. The problem is twofold: (1) if the region comprises only an in-memory column (store), then a flush may not occur for a long time, resulting in a big log, which in turn may significantly increase MTTR. This is bad. (2) if the in-memory column (store) is part of a region with default stores, then flushes do occur, and the WAL truncates even entries it should not. Specifically, it truncates entries of the in-memory store that are still present in the memstore, that is, not eliminated by compaction and not flushed to disk. This is a real threat to HBase's durability guarantees. The same solution can help avoid both problems. Currently the WAL uses a region counter to mark the entries as well as to decide which entries are truncatable. However, the memstore is unaware of these sequence numbers and therefore cannot indicate which WAL entries should not be truncated. We would like to come up with a mechanism that allows the memstore and WAL to share the minimal required information in order to ensure data durability. We'd appreciate suggestions/insights. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
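One possible shape of the memstore/WAL information-sharing mechanism the comment asks about can be sketched as follows. This is purely an illustration under the assumption that each store can track its lowest un-flushed WAL sequence id; it is not taken from the design document, and all names are hypothetical:

```python
# Idea: each store reports the lowest WAL sequence id whose edit still
# lives only in memory; the WAL may only truncate entries strictly below
# the minimum over all stores in the region. An in-memory store that
# never flushes thereby pins its WAL entries.

class StoreSeqTracker:
    def __init__(self):
        self.unflushed = set()          # seq ids present only in memory

    def append(self, seq_id):
        self.unflushed.add(seq_id)

    def flushed_up_to(self, seq_id):
        self.unflushed = {s for s in self.unflushed if s > seq_id}

    def lowest_unflushed(self):
        return min(self.unflushed) if self.unflushed else None

def safe_truncation_point(stores):
    lows = [s.lowest_unflushed() for s in stores]
    lows = [l for l in lows if l is not None]
    return min(lows) if lows else None   # truncate strictly below this id

default_store, inmem_store = StoreSeqTracker(), StoreSeqTracker()
for seq in (1, 2, 5):
    inmem_store.append(seq)
for seq in (3, 4):
    default_store.append(seq)
default_store.flushed_up_to(4)           # default store flushed everything
point = safe_truncation_point([default_store, inmem_store])
```

Even though the default store has flushed past sequence id 4, the in-memory store still holds edit 1, so the WAL cannot truncate past it.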
[jira] [Commented] (HBASE-14073) TestRemoteTable.testDelete failed in the latest trunk code
[ https://issues.apache.org/jira/browse/HBASE-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626296#comment-14626296 ] Hudson commented on HBASE-14073: SUCCESS: Integrated in HBase-TRUNK #6648 (See [https://builds.apache.org/job/HBase-TRUNK/6648/]) HBASE-14073 TestRemoteTable.testDelete failed in the latest trunk code. (Jingcheng) (anoopsamjohn: rev 2f327c911056d02813f642503db9a4383e8b4a2f) * hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13867) Add endpoint coprocessor guide to HBase book
[ https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626323#comment-14626323 ] Gabor Liptak commented on HBASE-13867: -- [~gbhardwaj] Yes, please reformat to the max 100-character limit and resubmit as HBASE-13867.2.patch (if there are lines which need to stay longer, like long URLs, please comment on them when you upload the reformatted patch). Thanks. Add endpoint coprocessor guide to HBase book Key: HBASE-13867 URL: https://issues.apache.org/jira/browse/HBASE-13867 Project: HBase Issue Type: Task Components: Coprocessors, documentation Reporter: Vladimir Rodionov Assignee: Gaurav Bhardwaj Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2 Attachments: HBASE-13867.1.patch Endpoint coprocessors are very poorly documented. The coprocessor section of the HBase book must be updated either with its own endpoint coprocessor HOW-TO guide or, at least, with link(s) to some other guides. There is a good description here: http://www.3pillarglobal.com/insights/hbase-coprocessors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14058) Stabilizing default heap memory tuner
[ https://issues.apache.org/jira/browse/HBASE-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626568#comment-14626568 ] Abhilash commented on HBASE-14058: -- Then let's get this patch in, as there are no other reviews for this patch. Stabilizing default heap memory tuner - Key: HBASE-14058 URL: https://issues.apache.org/jira/browse/HBASE-14058 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 2.0.0, 1.2.0, 1.3.0 Reporter: Abhilash Assignee: Abhilash Attachments: HBASE-14058-v1.patch, HBASE-14058.patch, after_modifications.png, before_modifications.png The memory tuner works well in general cases, but when we have a workload that is both read heavy and write heavy, the tuner does too many tuning operations. We should try to control the number of tuner operations and stabilize it. The main problem was that the tuner thinks it is in steady state even if it sees just one neutral tuner period, and thus does too many tuning operations and too many reverts, with large step sizes (the step size was set to maximum even after one neutral period). So to stop this I have thought of these steps: 1) The division created by μ + δ/2 and μ - δ/2 is too small. Statistically, ~62% of periods will lie outside this range, which means 62% of the data points are considered either high or low, which is too much. Use μ + δ*0.8 and μ - δ*0.8 instead. In expectation this will decrease the number of tuner operations per 100 periods from 19 to just 10. If we use δ/2 then 31% of data values will be considered high and 31% will be considered low (2*0.31*0.31 = 0.19); on the other hand, if we use δ*0.8 then 22% will be low and 22% will be high (2*0.22*0.22 ~ 0.10). 2) Defining a proper steady state by looking at the past few periods (equal to hbase.regionserver.heapmemory.autotuner.lookup.periods) rather than just the last tuner operation. We say the tuner is in steady state when the last few tuner periods were NEUTRAL. 
We keep decreasing the step size unless it is extremely low, then leave the system in that state for some time. 3) Rather than decreasing the step size only while reverting, decrease the magnitude of the step size whenever we are trying to revert tuning done in the last few periods (sum the changes of the last few periods and compare to the current step) rather than just looking at the last period. When its magnitude gets too low, make the tuner steps NEUTRAL (no operation). This will cause the step size to continuously decrease until we reach steady state. After that the tuning process will restart (the tuner step size resets again when we reach steady state). 4) The tuning done in the last few periods will be a decaying sum of past tuner steps, with sign. This parameter will be positive for an increase in memstore and negative for an increase in block cache. Rather than using an arithmetic mean, we use this to give more priority to recent tuner steps. Please see the attachments. One represents the size of the memstore (green) and the size of the block cache (blue) as adjusted by the tuner without these modifications, and the other with the above modifications. The x-axis is the time axis and the y-axis is the fraction of heap memory available to the memstore and block cache at that time (it always sums up to 80%). I configured the min/max ranges for both components to 0.1 and 0.7 respectively (so in the plots the y-axis min and max are 0.1 and 0.7). In both cases the tuner tries to distribute memory by giving ~15% to the memstore and ~65% to the block cache, but the modified one does it much more smoothly. I got these results from a YCSB test. The test was doing approximately 5000 inserts and 500 reads per second (for one region server). The results can be further fine-tuned and the number of tuner operations can be reduced with these changes in configuration. 
For more fine tuning: a) lower the max step size (suggested = 4%) b) lower the min step size (the default is also fine). To further decrease the frequency of tuning operations: c) increase the number of lookup periods (in the tests it was just 10, default is 60) d) increase the tuner period (in the tests it was just 20 secs, default is 60 secs). I used a smaller tuner period / number of lookup periods to get more data points. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
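The percentages quoted in point (1) of the description can be checked numerically, under the implicit assumption that the tracked statistic is normally distributed with mean μ and standard deviation δ:

```python
# Fraction of periods outside mu +/- k*delta for a normal distribution,
# and the expected tuner operations per 100 periods using the 2*p*p rule
# from the issue description.
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def outside_fraction(k):
    """Fraction of values falling outside mu +/- k*delta."""
    return 2.0 * (1.0 - phi(k))

frac_half = outside_fraction(0.5)   # ~0.62: ~31% low + ~31% high
frac_08 = outside_fraction(0.8)     # ~0.42: ~21% low + ~21% high

# Expected tuner operations per 100 periods, per the comment's 2*p*p rule:
ops_half = 2 * 0.31 * 0.31 * 100    # ~19
ops_08 = 2 * 0.22 * 0.22 * 100      # ~10
```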
[jira] [Commented] (HBASE-14070) Hybrid Logical Clocks for HBase
[ https://issues.apache.org/jira/browse/HBASE-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626566#comment-14626566 ] Dave Latham commented on HBASE-14070: - This looks awesome. Thanks, Enis. I also left some comments and questions in the GDoc. Hybrid Logical Clocks for HBase --- Key: HBASE-14070 URL: https://issues.apache.org/jira/browse/HBASE-14070 Project: HBase Issue Type: New Feature Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HybridLogicalClocksforHBaseandPhoenix.pdf HBase and Phoenix use the system's physical clock (PT) to give timestamps to events (reads and writes). This works mostly when the system clock is strictly monotonically increasing and there is no cross-dependency between servers' clocks. However, we know that leap seconds, general clock skew and clock drift are in fact real. This jira proposes using Hybrid Logical Clocks (HLC), an implementation of a hybrid physical clock + a logical clock. HLC is the best of both worlds: it keeps a causality relationship similar to logical clocks, but is still compatible with an NTP-based physical system clock. HLC can be represented in 64 bits. A design document is attached and also can be found here: https://docs.google.com/document/d/1LL2GAodiYi0waBz5ODGL4LDT4e_bXy8P9h6kWC05Bhw/edit# -- This message was sent by Atlassian JIRA (v6.3.4#6332)
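The HLC update rules mentioned above can be sketched compactly. This follows the published HLC algorithm (Kulkarni et al.); the 48-bit-millisecond / 16-bit-counter packing into 64 bits is an assumption for illustration, not necessarily the layout the attached design document chooses:

```python
# Minimal Hybrid Logical Clock sketch: l tracks the max physical time
# seen, c is a logical counter that breaks ties and carries causality
# when the physical clock stalls or runs behind.

class HLC:
    def __init__(self, physical_clock):
        self.pt = physical_clock     # callable returning physical time (e.g. millis)
        self.l = 0                   # max physical time observed so far
        self.c = 0                   # logical counter within one value of l

    def now(self):                   # local or send event
        wall = self.pt()
        if wall > self.l:
            self.l, self.c = wall, 0
        else:
            self.c += 1
        return self.timestamp()

    def update(self, l_msg, c_msg):  # receive event with peer's (l, c)
        wall = self.pt()
        new_l = max(self.l, l_msg, wall)
        if new_l == self.l == l_msg:
            self.c = max(self.c, c_msg) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == l_msg:
            self.c = c_msg + 1
        else:
            self.c = 0
        self.l = new_l
        return self.timestamp()

    def timestamp(self):             # pack into 64 bits: 48 physical | 16 logical
        return (self.l << 16) | (self.c & 0xFFFF)

# A frozen physical clock forces the logical counter to carry causality.
clock = HLC(lambda: 100)
t1 = clock.now()
t2 = clock.now()
t3 = clock.update(100, 7)            # receive from a peer at the same l
```

With the physical clock stuck at 100, the timestamps still strictly increase, which is exactly the monotonicity property the physical clock alone cannot guarantee.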
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626370#comment-14626370 ] Ted Yu commented on HBASE-13408: In the review request, please add 'hbase' to Groups field so that people can receive review comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14072) NullPointerException when running MapReduce with ToolRunner and an extended TableInputFormat class
[ https://issues.apache.org/jira/browse/HBASE-14072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625947#comment-14625947 ] Dima Spivak commented on HBASE-14072: - Mind posting the exact command run as well as the stacktrace you see? NullPointerException when running MapReduce with ToolRunner and an extended TableInputFormat class -- Key: HBASE-14072 URL: https://issues.apache.org/jira/browse/HBASE-14072 Project: HBase Issue Type: Bug Components: hbase Affects Versions: 1.1.0.1 Reporter: aslam When running a MapReduce job through ToolRunner with a class extending org.apache.hadoop.hbase.mapreduce.TableInputFormat, a NullPointerException is thrown from getTable().getName(). The code worked in the previous version of HBase. It works if we call initialize(context) inside the getSplits method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
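The workaround the reporter mentions (calling initialize(context) from getSplits) would look roughly like this. This is an untested sketch against the HBase 1.1 mapreduce API; the subclass name is made up:

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class MyTableInputFormat extends TableInputFormat {
    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
        // Make sure the underlying Table/connection is set up before anything
        // touches getTable().getName(); otherwise getTable() may return null.
        initialize(context);
        return super.getSplits(context);
    }
}
```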
[jira] [Commented] (HBASE-13738) Scan with RAW type for increment data insertions is displaying only latest two KV's
[ https://issues.apache.org/jira/browse/HBASE-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626801#comment-14626801 ] Pankaj Kumar commented on HBASE-13738: -- Sorry for the late reply, I will provide another patch version soon addressing Anoop's feedback. Scan with RAW type for increment data insertions is displaying only latest two KV's Key: HBASE-13738 URL: https://issues.apache.org/jira/browse/HBASE-13738 Project: HBase Issue Type: Bug Components: Scanners Environment: Suse 11 SP3 Reporter: neha Assignee: Pankaj Kumar Priority: Minor Attachments: HBASE-13738.patch [Scenario for Reproducing]: 1. Create an HBase table with a single column family, keeping VERSIONS=1 (the default). 2. Run an increment on the same row and same qualifier more than 2 times. 3. Scan the table with RAW => true and VERSIONS => 10: {code} scan 'tbl', {RAW => true, VERSIONS => 10} {code} Expected Result: === A raw scan should return all the versions until the table is flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14041) Client MetaCache is cleared if a ThrottlingException is thrown
[ https://issues.apache.org/jira/browse/HBASE-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14041: --- Fix Version/s: 1.3.0 1.2.0 2.0.0 Client MetaCache is cleared if a ThrottlingException is thrown -- Key: HBASE-14041 URL: https://issues.apache.org/jira/browse/HBASE-14041 Project: HBase Issue Type: Bug Components: Client Affects Versions: 1.1.0 Reporter: Eungsop Yoo Assignee: Eungsop Yoo Priority: Minor Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 0001-Do-not-clear-MetaCache-if-a-ThrottlingException-is-t-v2.patch, 0001-Do-not-clear-MetaCache-if-a-ThrottlingException-is-t-v3.patch, 0001-Do-not-clear-MetaCache-if-a-ThrottlingException-is-t.patch During a performance test with request throttling, I saw that the hbase:meta table was being read a lot. Currently the client's MetaCache is cleared if a ThrottlingException is thrown. That does not seem to be necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13329) ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray
[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626820#comment-14626820 ] Lars Hofhansl commented on HBASE-13329: --- [~busbey], did you remove the 1.3.0 target on purpose? ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray Key: HBASE-13329 URL: https://issues.apache.org/jira/browse/HBASE-13329 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.0.1 Environment: linux-debian-jessie ec2 - t2.micro instances Reporter: Ruben Aguiar Assignee: Lars Hofhansl Priority: Critical Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2 Attachments: 13329-asserts.patch, 13329-v1.patch, 13329.txt, HBASE-13329.test.00.branch-1.1.patch While trying to benchmark my opentsdb cluster, I created a script that always sends the same value (in this case 1) to HBase. After a few minutes, the whole region server crashes and the region itself becomes impossible to open again (cannot assign or unassign). After some investigation, what I saw in the logs is that when a memstore flush is called on a large region (128 MB), the process errors out, killing the regionserver. On restart, replaying the edits generates the same error, making the region unavailable. I tried to manually unassign, assign or close_region. That didn't work because the code that reads/replays the edits crashes. From my investigation this seems to be an overflow issue. The logs show that the function getMinimumMidpointArray tried to access index -32743 of an array, extremely close to the minimum short value in Java. Upon investigation of the source code, it seems a short index is used, incremented as long as the two vectors are the same, probably making it overflow on large vectors with equal data. Changing it to int should solve the problem. Below are the hadoop logs from when the regionserver went down. Any help is appreciated. 
Any other information you need please do tell me: 2015-03-24 18:00:56,187 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140 2015-03-24 18:00:56,188 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 2015-03-24 18:04:35,722 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size 128.04 MB 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at
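The suspected overflow is easy to reproduce in isolation: a short loop index walking two long, identical byte arrays wraps past Short.MAX_VALUE to a negative value, which then blows up when used as an array subscript. A self-contained demonstration, not the actual CellComparator code (the exact negative value differs from the -32743 in the report, which came from HBase's own arithmetic):

```java
public class ShortIndexOverflow {
    // Find the first index where two byte arrays differ, using a short index
    // as the pre-fix getMinimumMidpointArray effectively did.
    static int firstDiff(byte[] a, byte[] b) {
        for (short i = 0; i < Math.min(a.length, b.length); i++) {
            if (i < 0) {
                // The short wrapped past 32767; subscripting a[i] here would
                // throw ArrayIndexOutOfBoundsException with a negative index.
                return i;
            }
            if (a[i] != b[i]) return i;
        }
        return -1; // arrays are equal over the compared range
    }

    public static void main(String[] args) {
        // Two identical arrays longer than Short.MAX_VALUE trigger the wrap.
        byte[] big = new byte[40_000];
        System.out.println("index reached: " + firstDiff(big, big.clone()));
        // prints "index reached: -32768"
    }
}
```

Declaring the index as int removes the wrap, which is exactly the one-type fix the reporter proposes.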
[jira] [Commented] (HBASE-13329) ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray
[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627056#comment-14627056 ] Benoit Sigoure commented on HBASE-13329: I'm kinda late to the party, but yeah, OpenTSDB compactions might cause long column qualifiers. OpenTSDB doesn't generally use long row keys though, so that makes total sense. Thanks for getting to the bottom of this one!
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627088#comment-14627088 ] Elliott Clark commented on HBASE-13965: --- bq. Do you mean removing all the code for the per-table balancing, as well as documents if any? Yep, that greatly simplifies lots of things, and cleans up some code that was put in as a stop-gap measure. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the stochastic load balancer) is cost-function based. The cost-function weights are tunable, but no direct visibility into the cost-function results is provided. A driving example is a cluster we have been tuning which has skewed rack sizes (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers, with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRackCost and RegionCountSkewCost is difficult without a way to attribute each cost function’s contribution to the overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14077) Add package to hbase-protocol protobuf files.
Elliott Clark created HBASE-14077: - Summary: Add package to hbase-protocol protobuf files. Key: HBASE-14077 URL: https://issues.apache.org/jira/browse/HBASE-14077 Project: HBase Issue Type: Bug Components: Protobufs Reporter: Elliott Clark Assignee: Elliott Clark C++ generated code is currently in the default namespace. That's bad practice, so let's fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
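Declaring a package in a .proto file is what keeps the generated C++ out of the global namespace. A minimal illustration; the file name and field are invented, and the package/option values shown here are an assumption about what the attached patch might use, not a statement of it:

```proto
// example.proto -- illustrative only
syntax = "proto2";

// Generated C++ lands in namespace hbase::pb instead of the global namespace.
package hbase.pb;

// The Java output keeps its own package, independent of the proto package.
option java_package = "org.apache.hadoop.hbase.protobuf.generated";

message Example {
  optional string name = 1;
}
```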
[jira] [Commented] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626991#comment-14626991 ] Ted Yu commented on HBASE-11339: The long line warnings all come from a .rb file where there are pre-existing long lines. HBase MOB - Key: HBASE-11339 URL: https://issues.apache.org/jira/browse/HBASE-11339 Project: HBase Issue Type: Umbrella Components: regionserver, Scanners Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: 11339-master-v10.patch, 11339-master-v3.txt, 11339-master-v4.txt, 11339-master-v5.txt, 11339-master-v6.txt, 11339-master-v7.txt, 11339-master-v8.patch, 11339-master-v9.patch, HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase MOB Design-v4.pdf, HBase MOB Design-v5.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user guide_v2.docx, MOB user guide_v3.docx, MOB user guide_v4.docx, MOB user guide_v5.docx, hbase-11339-150519.patch, hbase-11339-in-dev.patch, hbase-11339.150417.patch, merge-150212.patch, merge.150212b.patch, merge.150212c.patch, merge.150710.patch It's quite useful to save medium-sized binary data, such as images and documents, into Apache HBase. Unfortunately, directly saving binary MOBs (medium objects) to HBase leads to worse performance because of frequent splits and compactions. In this design, the MOB data is stored in a more efficient way, which keeps write/read performance high and guarantees data consistency in Apache HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14078) improve error message when HMaster can't bind to port
[ https://issues.apache.org/jira/browse/HBASE-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-14078: Labels: beginner (was: ) improve error message when HMaster can't bind to port - Key: HBASE-14078 URL: https://issues.apache.org/jira/browse/HBASE-14078 Project: HBase Issue Type: Improvement Components: master Affects Versions: 2.0.0 Reporter: Sean Busbey Labels: beginner Fix For: 2.0.0 When the master fails to start because hbase.master.port is already taken, the log messages could make it easier to tell. {quote} 2015-07-14 13:10:02,667 INFO [main] regionserver.RSRpcServices: master/a1221.halxg.cloudera.com/10.20.188.121:16000 server-side HConnection retries=350 2015-07-14 13:10:02,879 INFO [main] ipc.SimpleRpcScheduler: Using deadline as user call queue, count=3 2015-07-14 13:10:02,895 ERROR [main] master.HMasterCommandLine: Master exiting java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2258) at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:234) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2272) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.apache.hadoop.hbase.ipc.RpcServer.bind(RpcServer.java:2513) at org.apache.hadoop.hbase.ipc.RpcServer$Listener.init(RpcServer.java:599) at 
org.apache.hadoop.hbase.ipc.RpcServer.init(RpcServer.java:2000) at org.apache.hadoop.hbase.regionserver.RSRpcServices.init(RSRpcServices.java:919) at org.apache.hadoop.hbase.master.MasterRpcServices.init(MasterRpcServices.java:211) at org.apache.hadoop.hbase.master.HMaster.createRpcServices(HMaster.java:509) at org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:535) at org.apache.hadoop.hbase.master.HMaster.init(HMaster.java:351) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2253) ... 5 more {quote} I recognize that the RSRpcServices log message shows port 16000, but I don't know why a new operator would make that connection. Additionally, it'd be nice to tell them that the port is controlled by {{hbase.master.port}}. Maybe also give a hint on how to see what's using the port, though that could be too OS/distro specific. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
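The friendlier failure the ticket asks for can be sketched in plain Java: catch the BindException and re-throw it with a message naming the port and the configuration key that controls it. The message wording and class name here are invented; only the hbase.master.port key comes from the ticket:

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindErrorMessageSketch {
    // Bind a listener, translating the bare "Address already in use" into a
    // message that names the port and the configuration key controlling it.
    static ServerSocket bind(int port) throws IOException {
        ServerSocket ss = new ServerSocket();
        try {
            ss.bind(new InetSocketAddress("127.0.0.1", port));
            return ss;
        } catch (BindException e) {
            ss.close();
            BindException wrapped = new BindException(
                    "Port " + port + " is already in use; the master port is "
                    + "configured by hbase.master.port. Is another HMaster running?");
            wrapped.initCause(e); // keep the original trace for debugging
            throw wrapped;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket first = bind(0)) {      // grab an ephemeral port
            try {
                bind(first.getLocalPort());       // second bind must fail
            } catch (BindException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

Wrapping at the bind site means every caller, including the master startup path, reports the actionable message instead of a bare stack trace.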
[jira] [Commented] (HBASE-14078) improve error message when HMaster can't bind to port
[ https://issues.apache.org/jira/browse/HBASE-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627071#comment-14627071 ] Sean Busbey commented on HBASE-14078: - versions set to 2.0.0 because that's where I hit it. I'm guessing it also applies to all other versions. If someone has a chance to check, feel free to add in any patch level as a target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14030) HBase Backup/Restore Phase 1
[ https://issues.apache.org/jira/browse/HBASE-14030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14030: -- Attachment: HBASE-14030-v3.patch Patch v3 contains HBASE-14031, HBASE-14032, HBASE-14033. HBase Backup/Restore Phase 1 Key: HBASE-14030 URL: https://issues.apache.org/jira/browse/HBASE-14030 Project: HBase Issue Type: Umbrella Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-14030-v0.patch, HBASE-14030-v1.patch, HBASE-14030-v2.patch, HBASE-14030-v3.patch This is the umbrella ticket for Backup/Restore Phase 1. See HBase-7912 design doc for the phase description. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14077) Add package to hbase-protocol protobuf files.
[ https://issues.apache.org/jira/browse/HBASE-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14077: -- Affects Version/s: 1.3.0 1.2.0 2.0.0 Fix Version/s: 1.3.0 2.0.0 Add package to hbase-protocol protobuf files. - Key: HBASE-14077 URL: https://issues.apache.org/jira/browse/HBASE-14077 Project: HBase Issue Type: Bug Components: Protobufs Affects Versions: 2.0.0, 1.2.0, 1.3.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.3.0 Attachments: HBASE-14077.patch C++ generated code is currently in the default namespace. That's bad practice, so let's fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14077) Add package to hbase-protocol protobuf files.
[ https://issues.apache.org/jira/browse/HBASE-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14077: -- Attachment: HBASE-14077.patch Add package to hbase-protocol protobuf files. - Key: HBASE-14077 URL: https://issues.apache.org/jira/browse/HBASE-14077 Project: HBase Issue Type: Bug Components: Protobufs Affects Versions: 2.0.0, 1.2.0, 1.3.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.3.0 Attachments: HBASE-14077.patch C++ generated code is currently in the default namespace. That's bad practice, so let's fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14078) improve error message when HMaster can't bind to port
Sean Busbey created HBASE-14078: --- Summary: improve error message when HMaster can't bind to port Key: HBASE-14078 URL: https://issues.apache.org/jira/browse/HBASE-14078 Project: HBase Issue Type: Improvement Components: master Affects Versions: 2.0.0 Reporter: Sean Busbey Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14031) HBase Backup/Restore Phase 1: Abstract DistCp in incremental backup
[ https://issues.apache.org/jira/browse/HBASE-14031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-14031. --- Resolution: Implemented HBASE-14030 v3 contains the patch for the feature. HBase Backup/Restore Phase 1: Abstract DistCp in incremental backup --- Key: HBASE-14031 URL: https://issues.apache.org/jira/browse/HBASE-14031 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 Abstract DistCp (incremental backup) to support non-M/R based implementations. Provide M/R implementation. DistCp is used to copy WAL files during incremental backup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14030) HBase Backup/Restore Phase 1
[ https://issues.apache.org/jira/browse/HBASE-14030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627151#comment-14627151 ] Vladimir Rodionov commented on HBASE-14030: --- [~tedyu]: {quote} Mind uploading patch v2 on reviewboard? {quote} Will do this after I finish Phase 1. HBase Backup/Restore Phase 1 Key: HBASE-14030 URL: https://issues.apache.org/jira/browse/HBASE-14030 Project: HBase Issue Type: Umbrella Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-14030-v0.patch, HBASE-14030-v1.patch, HBASE-14030-v2.patch, HBASE-14030-v3.patch This is the umbrella ticket for Backup/Restore Phase 1. See the HBASE-7912 design doc for the phase description. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14077) Add package to hbase-protocol protobuf files.
[ https://issues.apache.org/jira/browse/HBASE-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627147#comment-14627147 ] stack commented on HBASE-14077: --- LGTM. You don't want to put it in an org.hbase package or an org.apache.hbase package? Add package to hbase-protocol protobuf files. - Key: HBASE-14077 URL: https://issues.apache.org/jira/browse/HBASE-14077 Project: HBase Issue Type: Bug Components: Protobufs Affects Versions: 2.0.0, 1.2.0, 1.3.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.3.0 Attachments: HBASE-14077.patch C++ generated code is currently in the default namespace. That's bad practice, so let's fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13329) ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray
[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626891#comment-14626891 ] Sean Busbey commented on HBASE-13329: - yes. I had accounted for it in my staged 1.2.0 release notes. see the discussion on HBASE-14025. ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray Key: HBASE-13329 URL: https://issues.apache.org/jira/browse/HBASE-13329 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.0.1 Environment: linux-debian-jessie ec2 - t2.micro instances Reporter: Ruben Aguiar Assignee: Lars Hofhansl Priority: Critical Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2 Attachments: 13329-asserts.patch, 13329-v1.patch, 13329.txt, HBASE-13329.test.00.branch-1.1.patch While trying to benchmark my opentsdb cluster, I've created a script that sends to hbase always the same value (in this case 1). After a few minutes, the whole region server crashes and the region itself becomes impossible to open again (cannot assign or unassign). After some investigation, what I saw on the logs is that when a Memstore flush is called on a large region (128mb) the process errors, killing the regionserver. On restart, replaying the edits generates the same error, making the region unavailable. Tried to manually unassign, assign or close_region. That didn't work because the code that reads/replays it crashes. From my investigation this seems to be an overflow issue. The logs show that the function getMinimumMidpointArray tried to access index -32743 of an array, extremely close to the minimum short value in Java. Upon investigation of the source code, it seems an index short is used, being incremented as long as the two vectors are the same, probably making it overflow on large vectors with equal data. Changing it to int should solve the problem. Here follows the hadoop logs of when the regionserver went down. Any help is appreciated. 
Any other information you need please do tell me: 2015-03-24 18:00:56,187 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140 2015-03-24 18:00:56,188 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 2015-03-24 18:04:35,722 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size 128.04 MB 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at
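The overflow described above is easy to reproduce in isolation. The sketch below is a hypothetical minimal reproduction of the same class of bug, not the actual CellComparator code: a `short` loop index comparing two identical large arrays wraps past `Short.MAX_VALUE` into negative territory and triggers the same ArrayIndexOutOfBoundsException.

```java
// Hypothetical reproduction of a short loop index overflowing on large,
// identical arrays (illustrative only; not the actual HBase code).
public class ShortIndexOverflow {
    public static void main(String[] args) {
        byte[] left = new byte[40000];   // longer than Short.MAX_VALUE (32767)
        byte[] right = new byte[40000];  // identical contents, so the loop never breaks
        try {
            for (short i = 0; i < left.length; i++) {  // i wraps to -32768 after 32767
                if (left[i] != right[i]) {
                    break;
                }
            }
            System.out.println("no overflow");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

Declaring the index as `int`, as the proposed fix does, lets the loop run to `left.length` and terminate normally.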
[jira] [Updated] (HBASE-14076) ResultSerialization and MutationSerialization can throw InvalidProtocolBufferException when serializing a cell larger than 64MB
[ https://issues.apache.org/jira/browse/HBASE-14076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez updated HBASE-14076: -- Affects Version/s: 1.2.0 hbase-11339 2.0.0 ResultSerialization and MutationSerialization can throw InvalidProtocolBufferException when serializing a cell larger than 64MB --- Key: HBASE-14076 URL: https://issues.apache.org/jira/browse/HBASE-14076 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, hbase-11339, 1.2.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez This was reported in CRUNCH-534, but the problem lies in how we handle deserialization of large cells (> 64 MB) in ResultSerialization and MutationSerialization. The fix is simply to re-use what was done in HBASE-13230. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626989#comment-14626989 ] Hadoop QA commented on HBASE-11339: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12745292/11339-master-v10.patch against master branch at commit 2f327c911056d02813f642503db9a4383e8b4a2f. ATTACHMENT ID: 12745292 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 102 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: + family.setMobEnabled(JBoolean.valueOf(arg.delete(org.apache.hadoop.hbase.HColumnDescriptor::IS_MOB))) if arg.include?(org.apache.hadoop.hbase.HColumnDescriptor::IS_MOB) + family.setMobThreshold(JLong.valueOf(arg.delete(org.apache.hadoop.hbase.HColumnDescriptor::MOB_THRESHOLD))) if arg.include?(org.apache.hadoop.hbase.HColumnDescriptor::MOB_THRESHOLD) + @admin.compactMob(org.apache.hadoop.hbase.TableName.valueOf(table_name), family.to_java_bytes) + @admin.majorCompactMob(org.apache.hadoop.hbase.TableName.valueOf(table_name), family.to_java_bytes) {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14769//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14769//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14769//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14769//console This message is automatically generated. 
HBase MOB - Key: HBASE-11339 URL: https://issues.apache.org/jira/browse/HBASE-11339 Project: HBase Issue Type: Umbrella Components: regionserver, Scanners Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: 11339-master-v10.patch, 11339-master-v3.txt, 11339-master-v4.txt, 11339-master-v5.txt, 11339-master-v6.txt, 11339-master-v7.txt, 11339-master-v8.patch, 11339-master-v9.patch, HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase MOB Design-v4.pdf, HBase MOB Design-v5.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user guide_v2.docx, MOB user guide_v3.docx, MOB user guide_v4.docx, MOB user guide_v5.docx, hbase-11339-150519.patch, hbase-11339-in-dev.patch, hbase-11339.150417.patch, merge-150212.patch, merge.150212b.patch, merge.150212c.patch, merge.150710.patch It's quite useful to save medium-sized binary data such as images and documents into Apache HBase. Unfortunately, directly saving binary MOBs (medium objects) to HBase leads to worse performance because of the frequent splits and compactions. In this design, the MOB data are stored in a more efficient way, which keeps high write/read performance and guarantees data consistency in Apache HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14075) HBaseClusterManager should use port(if given) to find pid
[ https://issues.apache.org/jira/browse/HBASE-14075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-14075: -- Attachment: HBASE-14075-master_v3.patch Minor fix to avoid the shell execution returning a non-zero value when checking whether the service is running HBaseClusterManager should use port(if given) to find pid - Key: HBASE-14075 URL: https://issues.apache.org/jira/browse/HBASE-14075 Project: HBase Issue Type: Bug Reporter: Yu Li Assignee: Yu Li Priority: Minor Attachments: HBASE-14075-master_v2.patch, HBASE-14075-master_v3.patch, HBASE-14075.patch This issue was found while running ITBLL in a distributed cluster. Our testing env is special in that we run multiple regionserver instances on a single physical machine, so {noformat}ps -ef | grep proc_regionserver{noformat} will return more than one line, which could cause the tool to check/kill the wrong process. Actually in HBaseClusterManager we already introduce port as a parameter for methods like isRunning, kill, etc. So the only thing to do here is to get the pid through the port, if the port is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14078) improve error message when HMaster can't bind to port
[ https://issues.apache.org/jira/browse/HBASE-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-14078: Description: When the master fails to start because hbase.master.port is already taken, the log messages could make it easier to tell. {quote} 2015-07-14 13:10:02,667 INFO [main] regionserver.RSRpcServices: master/master01.example.com/10.20.188.121:16000 server-side HConnection retries=350 2015-07-14 13:10:02,879 INFO [main] ipc.SimpleRpcScheduler: Using deadline as user call queue, count=3 2015-07-14 13:10:02,895 ERROR [main] master.HMasterCommandLine: Master exiting java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2258) at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:234) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2272) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.apache.hadoop.hbase.ipc.RpcServer.bind(RpcServer.java:2513) at org.apache.hadoop.hbase.ipc.RpcServer$Listener.init(RpcServer.java:599) at org.apache.hadoop.hbase.ipc.RpcServer.init(RpcServer.java:2000) at org.apache.hadoop.hbase.regionserver.RSRpcServices.init(RSRpcServices.java:919) at org.apache.hadoop.hbase.master.MasterRpcServices.init(MasterRpcServices.java:211) at org.apache.hadoop.hbase.master.HMaster.createRpcServices(HMaster.java:509) at 
org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:535) at org.apache.hadoop.hbase.master.HMaster.init(HMaster.java:351) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2253) ... 5 more {quote} I recognize that the RSRpcServices log message shows port 16000, but I don't know why a new operator would make that connection. Additionally, it'd be nice to tell them that the port is controlled by {{hbase.master.port}}. Maybe give a hint on how to see what's using the port. Could be too OS/distro-specific? was: When the master fails to start because hbase.master.port is already taken, the log messages could make it easier to tell. {quote} 2015-07-14 13:10:02,667 INFO [main] regionserver.RSRpcServices: master/a1221.halxg.cloudera.com/10.20.188.121:16000 server-side HConnection retries=350 2015-07-14 13:10:02,879 INFO [main] ipc.SimpleRpcScheduler: Using deadline as user call queue, count=3 2015-07-14 13:10:02,895 ERROR [main] master.HMasterCommandLine: Master exiting java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2258) at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:234) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2272) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native 
Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.apache.hadoop.hbase.ipc.RpcServer.bind(RpcServer.java:2513) at org.apache.hadoop.hbase.ipc.RpcServer$Listener.init(RpcServer.java:599) at org.apache.hadoop.hbase.ipc.RpcServer.init(RpcServer.java:2000) at org.apache.hadoop.hbase.regionserver.RSRpcServices.init(RSRpcServices.java:919) at org.apache.hadoop.hbase.master.MasterRpcServices.init(MasterRpcServices.java:211)
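The improvement asked for here is essentially to catch the bare BindException near the bind site and rethrow it with a message that names the responsible config key. The sketch below is a hypothetical, self-contained illustration of that idea, not the actual HBase patch; the `bind` helper is made up, while `hbase.master.port` is the real config key.

```java
// Hypothetical sketch: rethrow a BindException with a message that names the
// config key and hints at how to find the process holding the port.
import java.net.BindException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BindWithHint {
    static ServerSocketChannel bind(int port, String configKey) throws Exception {
        ServerSocketChannel ch = ServerSocketChannel.open();
        try {
            ch.socket().bind(new InetSocketAddress(port));
            return ch;
        } catch (BindException e) {
            ch.close();
            throw new BindException("Port " + port + " (configured via " + configKey
                + ") is already in use; try `lsof -i :" + port + "` to find the owner");
        }
    }

    public static void main(String[] args) throws Exception {
        // Bind an ephemeral port, then bind it again to force the failure.
        ServerSocketChannel first = bind(0, "hbase.master.port");
        int port = first.socket().getLocalPort();
        try {
            bind(port, "hbase.master.port");
        } catch (BindException e) {
            System.out.println(e.getMessage());
        } finally {
            first.close();
        }
    }
}
```

With this shape, the top of the stack trace already tells a new operator which setting to change, instead of a bare "Address already in use".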
[jira] [Updated] (HBASE-14075) HBaseClusterManager should use port(if given) to find pid
[ https://issues.apache.org/jira/browse/HBASE-14075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-14075: -- Attachment: HBASE-14075-master_v2.patch I did some testing before uploading the patch; here it is HBaseClusterManager should use port(if given) to find pid - Key: HBASE-14075 URL: https://issues.apache.org/jira/browse/HBASE-14075 Project: HBase Issue Type: Bug Reporter: Yu Li Assignee: Yu Li Priority: Minor Attachments: HBASE-14075-master_v2.patch, HBASE-14075.patch This issue was found while running ITBLL in a distributed cluster. Our testing env is special in that we run multiple regionserver instances on a single physical machine, so {noformat}ps -ef | grep proc_regionserver{noformat} will return more than one line, which could cause the tool to check/kill the wrong process. Actually in HBaseClusterManager we already introduce port as a parameter for methods like isRunning, kill, etc. So the only thing to do here is to get the pid through the port, if the port is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627012#comment-14627012 ] Ted Yu commented on HBASE-13965: w.r.t. the test failure of TestWALProcedureStoreOnHDFS: it is not related to your patch. For #3, it should be done in a separate JIRA. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14077) Add package to hbase-protocol protobuf files.
[ https://issues.apache.org/jira/browse/HBASE-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14077: -- Status: Patch Available (was: Open) Add package to hbase-protocol protobuf files. - Key: HBASE-14077 URL: https://issues.apache.org/jira/browse/HBASE-14077 Project: HBase Issue Type: Bug Components: Protobufs Affects Versions: 2.0.0, 1.2.0, 1.3.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.3.0 Attachments: HBASE-14077.patch The C++ generated code is currently in the default namespace. That's bad practice, so let's fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v8.patch Updates:
1. Use the number of all tables (including system tables) to calculate the size of the MRU map. This should be fine since we are trying to avoid OOM, not necessarily calculate the exact number of metrics needed.
2. Formatting and spelling improvements.
TODO:
1. The unit test uses 61120 as the JMX registry port. I noticed that in one of the recent QA test results, it reports a "Port already in use" error. Should I change the port?
2. The last two patches failed the core tests. However, I'm not sure that the failed test, TestWALProcedureStoreOnHDFS.testWalRollOnLowReplication, is related to this patch.
3. About removing the per-table mode entirely, I'm not sure it should be included in this JIRA.
Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. 
What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
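As a concrete illustration of the kind of JMX surface being proposed, the sketch below registers a standard MBean exposing a single cost attribute on the platform MBean server. The MBean object name, attribute, and value are illustrative assumptions, not the names the patch actually uses; the real patch wires into HBase's metrics system.

```java
// Hypothetical sketch: expose a balancer-style cost metric through JMX.
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class BalancerCostJmx {
    // Standard MBean pattern: the interface must be named <ImplClass>MBean.
    public interface BalancerCostMBean {
        double getOverallCost();
    }

    public static class BalancerCost implements BalancerCostMBean {
        private final double cost;
        BalancerCost(double cost) { this.cost = cost; }
        public double getOverallCost() { return cost; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("hbase.balancer:type=StochasticCosts"); // illustrative name
        server.registerMBean(new BalancerCost(42.5), name);
        // Any JMX client (jconsole, jmxterm) can now read the attribute:
        System.out.println(server.getAttribute(name, "OverallCost"));
    }
}
```

Per-cost-function attributes (LocalityCost, RegionCountSkewCost, etc.) would follow the same pattern, one getter each.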
[jira] [Resolved] (HBASE-14032) HBase Backup/Restore Phase 1: Abstract SnapshotCopy (full backup)
[ https://issues.apache.org/jira/browse/HBASE-14032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-14032. --- Resolution: Implemented HBASE-14030 v3 contains the patch for the feature. HBase Backup/Restore Phase 1: Abstract SnapshotCopy (full backup) - Key: HBASE-14032 URL: https://issues.apache.org/jira/browse/HBASE-14032 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Abstract SnapshotCopy (full backup) to support non-M/R based implementations. Provide M/R implementation. SnapshotCopy is used to copy snapshot’s data during full backup operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14075) HBaseClusterManager should use port(if given) to find pid
[ https://issues.apache.org/jira/browse/HBASE-14075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14075: --- Status: Patch Available (was: Open) HBaseClusterManager should use port(if given) to find pid - Key: HBASE-14075 URL: https://issues.apache.org/jira/browse/HBASE-14075 Project: HBase Issue Type: Bug Reporter: Yu Li Assignee: Yu Li Priority: Minor Attachments: HBASE-14075-master_v2.patch, HBASE-14075.patch This issue was found while running ITBLL in a distributed cluster. Our testing env is special in that we run multiple regionserver instances on a single physical machine, so {noformat}ps -ef | grep proc_regionserver{noformat} will return more than one line, which could cause the tool to check/kill the wrong process. Actually in HBaseClusterManager we already introduce port as a parameter for methods like isRunning, kill, etc. So the only thing to do here is to get the pid through the port, if the port is given. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14079) improve error message when Master fails to connect to Hadoop-auth
[ https://issues.apache.org/jira/browse/HBASE-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-14079: Issue Type: Bug (was: Improvement) improve error message when Master fails to connect to Hadoop-auth - Key: HBASE-14079 URL: https://issues.apache.org/jira/browse/HBASE-14079 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0 Reporter: Sean Busbey Fix For: 2.0.0 Current error message at INFO level doesn't give any hint about what keytab and principal are in use {quote} 2015-07-14 13:32:48,514 INFO [main] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties 2015-07-14 13:32:48,640 INFO [main] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2015-07-14 13:32:48,640 INFO [main] impl.MetricsSystemImpl: HBase metrics system started 2015-07-14 13:32:48,776 ERROR [main] master.HMasterCommandLine: Master exiting java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2258) at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:234) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2272) Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:856) at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:719) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687) at javax.security.auth.login.LoginContext.login(LoginContext.java:595) at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:912) at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:242) at org.apache.hadoop.hbase.security.User$SecureHadoopUser.login(User.java:385) at org.apache.hadoop.hbase.security.User.login(User.java:252) at org.apache.hadoop.hbase.security.UserProvider.login(UserProvider.java:115) at org.apache.hadoop.hbase.master.HMaster.login(HMaster.java:464) at org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:553) at org.apache.hadoop.hbase.master.HMaster.init(HMaster.java:351) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2253) ... 5 more {quote} increasing to DEBUG also doesn't help. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14079) improve error message when Master fails to connect to Hadoop-auth
[ https://issues.apache.org/jira/browse/HBASE-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627251#comment-14627251 ] Sean Busbey commented on HBASE-14079: - digging in, it looks like the Hadoop exception gives us the info we want {code}
try {
  login = newLoginContext(HadoopConfiguration.KEYTAB_KERBEROS_CONFIG_NAME, subject,
      new HadoopConfiguration());
  start = Time.now();
  login.login();
  metrics.loginSuccess.add(Time.now() - start);
  loginUser = new UserGroupInformation(subject);
  loginUser.setLogin(login);
  loginUser.setAuthenticationMethod(AuthenticationMethod.KERBEROS);
} catch (LoginException le) {
  if (start > 0) {
    metrics.loginFailure.add(Time.now() - start);
  }
  throw new IOException("Login failure for " + user + " from keytab " + path, le);
}
{code} Unfortunately, when that IOException gets all the way back up to us we strip off the outer IOException and discard it. {code}
public static HMaster constructMaster(Class<? extends HMaster> masterClass,
    final Configuration conf, final CoordinatedStateManager cp) {
  try {
    Constructor<? extends HMaster> c =
        masterClass.getConstructor(Configuration.class, CoordinatedStateManager.class);
    return c.newInstance(conf, cp);
  } catch (InvocationTargetException ite) {
    Throwable target = ite.getTargetException() != null ? ite.getTargetException() : ite;
    if (target.getCause() != null) target = target.getCause();
    throw new RuntimeException("Failed construction of Master: " + masterClass.toString(), target);
  } catch (Exception e) {
    throw new RuntimeException("Failed construction of Master: " + masterClass.toString() +
        ((e.getCause() != null) ? e.getCause().getMessage() : ""), e);
  }
}
{code} This looks like a bug. We should at least log the original exception. 
improve error message when Master fails to connect to Hadoop-auth - Key: HBASE-14079 URL: https://issues.apache.org/jira/browse/HBASE-14079 Project: HBase Issue Type: Improvement Components: master Affects Versions: 2.0.0 Reporter: Sean Busbey Fix For: 2.0.0 Current error message at INFO level doesn't give any hint about what keytab and principal are in use {quote} 2015-07-14 13:32:48,514 INFO [main] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties 2015-07-14 13:32:48,640 INFO [main] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2015-07-14 13:32:48,640 INFO [main] impl.MetricsSystemImpl: HBase metrics system started 2015-07-14 13:32:48,776 ERROR [main] master.HMasterCommandLine: Master exiting java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2258) at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:234) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2272) Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:856) at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:719) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
javax.security.auth.login.LoginContext.invoke(LoginContext.java:762) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687) at javax.security.auth.login.LoginContext.login(LoginContext.java:595) at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:912)
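The bug described above, unwrapping the informative IOException and discarding it, can be contrasted with a small sketch that keeps the full cause chain intact. This is a hypothetical illustration, not the committed patch; the exception messages mimic the logs above and the keytab path is made up.

```java
// Hypothetical sketch: wrap without discarding intermediate causes, so the
// "Login failure for ... from keytab ..." detail survives to the top-level log.
public class CauseChain {
    static void construct() {
        try {
            // Simulate the Hadoop-side failure (messages mimic the JIRA logs;
            // the keytab path is made up for illustration).
            throw new java.io.IOException(
                "Login failure for hbase from keytab /etc/hbase.keytab",
                new javax.security.auth.login.LoginException("Unable to obtain password from user"));
        } catch (Exception e) {
            // Keep e as the cause instead of replacing it with e.getCause():
            throw new RuntimeException("Failed construction of Master", e);
        }
    }

    public static void main(String[] args) {
        try {
            construct();
        } catch (RuntimeException e) {
            System.out.println(e.getCause().getMessage());            // keytab detail survives
            System.out.println(e.getCause().getCause().getMessage()); // original LoginException
        }
    }
}
```

Logging (or chaining) the intermediate IOException is all it takes for the keytab and principal information to reach the operator.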
[jira] [Resolved] (HBASE-14080) Cherry-pick addendums to HBASE-13084
[ https://issues.apache.org/jira/browse/HBASE-14080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar resolved HBASE-14080. --- Resolution: Fixed Pushed the addendums to branch-1.0. Ran TestShell 20 times. Cherry-pick addendums to HBASE-13084 - Key: HBASE-14080 URL: https://issues.apache.org/jira/browse/HBASE-14080 Project: HBase Issue Type: Sub-task Components: test Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 1.0.2 Parent jira is closed, so had to create a subtask. TestShell seems failing flakily in jenkins runs. We'll cherry-pick the addendum patches to the parent jira in branch-1.0. The main patch is already in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-14065) ref guide section on release candidate generation refers to old doc files
[ https://issues.apache.org/jira/browse/HBASE-14065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Liptak reassigned HBASE-14065: Assignee: Gabor Liptak ref guide section on release candidate generation refers to old doc files - Key: HBASE-14065 URL: https://issues.apache.org/jira/browse/HBASE-14065 Project: HBase Issue Type: Bug Components: documentation Reporter: Sean Busbey Assignee: Gabor Liptak currently it says to copy files from the master version of {{src/main/docbkx}} which is incorrect since the move to asciidoc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13706) CoprocessorClassLoader should not exempt Hive classes
[ https://issues.apache.org/jira/browse/HBASE-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627314#comment-14627314 ] Andrew Purtell commented on HBASE-13706: Oh, I see, 'org.apache.*'. Of course, sorry. CoprocessorClassLoader should not exempt Hive classes - Key: HBASE-13706 URL: https://issues.apache.org/jira/browse/HBASE-13706 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 2.0.0, 1.0.1, 1.1.0, 0.98.12 Reporter: Jerry He Priority: Minor CoprocessorClassLoader is used to load classes from the coprocessor jar. Certain classes are exempt from being loaded by this ClassLoader, which means they will be ignored in the coprocessor jar, but loaded from the parent classpath instead. One problem is that we categorically exempt org.apache.hadoop. But it happens that Hive packages start with org.apache.hadoop. There is no reason to exclude Hive classes from the CoprocessorClassLoader. HBase does not even include Hive jars. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
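The problem is simple to see with a prefix check. The sketch below is a hypothetical illustration of why a blanket "org.apache.hadoop" exemption also swallows Hive classes; it is not the actual CoprocessorClassLoader code.

```java
// Hypothetical prefix-exemption check showing that Hive classes match a
// blanket "org.apache.hadoop." exemption (illustrative only).
import java.util.Arrays;
import java.util.List;

public class ExemptCheck {
    static final List<String> EXEMPT = Arrays.asList("org.apache.hadoop.");

    static boolean isExempt(String className) {
        return EXEMPT.stream().anyMatch(className::startsWith);
    }

    public static void main(String[] args) {
        // Intended match: a Hadoop/HBase class delegated to the parent classpath.
        System.out.println(isExempt("org.apache.hadoop.hbase.CellComparator"));
        // Unintended match: Hive lives under org.apache.hadoop too, so a Hive
        // class shipped in the coprocessor jar would be silently ignored.
        System.out.println(isExempt("org.apache.hadoop.hive.ql.exec.UDF"));
    }
}
```

Both calls print `true`, which is exactly the bug: the exemption needs to be narrower than the whole `org.apache.hadoop` prefix.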
[jira] [Reopened] (HBASE-13706) CoprocessorClassLoader should not exempt Hive classes
[ https://issues.apache.org/jira/browse/HBASE-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-13706: CoprocessorClassLoader should not exempt Hive classes - Key: HBASE-13706 URL: https://issues.apache.org/jira/browse/HBASE-13706 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 2.0.0, 1.0.1, 1.1.0, 0.98.12 Reporter: Jerry He Priority: Minor CoprocessorClassLoader is used to load classes from the coprocessor jar. Certain classes are exempt from being loaded by this ClassLoader, which means they will be ignored in the coprocessor jar, but loaded from the parent classpath instead. One problem is that we categorically exempt org.apache.hadoop. But it happens that Hive packages start with org.apache.hadoop. There is no reason to exclude Hive classes from the CoprocessorClassLoader. HBase does not even include Hive jars. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
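The prefix-exemption problem described in this issue can be illustrated with a minimal sketch. The class and field names below are assumptions for illustration only, not HBase's actual CoprocessorClassLoader code:

```java
// Minimal sketch of prefix-based class exemption. A categorical exemption
// of "org.apache.hadoop." also matches Hive, whose packages live under
// org.apache.hadoop.hive -- which is exactly the bug reported here.
class ExemptionSketch {
    static final String[] CLASS_PREFIX_EXEMPTIONS = { "org.apache.hadoop." };

    static boolean isClassExempt(String className) {
        for (String prefix : CLASS_PREFIX_EXEMPTIONS) {
            if (className.startsWith(prefix)) {
                // exempt: loaded from the parent classpath, not the coprocessor jar
                return true;
            }
        }
        return false;
    }
}
```

With this rule, a class such as `org.apache.hadoop.hive.ql.exec.UDF` embedded in a coprocessor jar is skipped even though HBase itself ships no Hive jars, so the copy in the jar is silently ignored.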
[jira] [Commented] (HBASE-13981) Fix ImportTsv spelling and usage issues
[ https://issues.apache.org/jira/browse/HBASE-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627377#comment-14627377 ] Gabor Liptak commented on HBASE-13981: -- [~appy] I reformatted the patch. Thanks Fix ImportTsv spelling and usage issues --- Key: HBASE-13981 URL: https://issues.apache.org/jira/browse/HBASE-13981 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 1.1.0.1 Reporter: Lars George Assignee: Gabor Liptak Labels: beginner Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13981.1.patch, HBASE-13981.2.patch, HBASE-13981.3.patch, HBASE-13981.4.patch The {{ImportTsv}} tool has various spelling and formatting issues. Fix those. In code: {noformat} public final static String ATTRIBUTE_SEPERATOR_CONF_KEY = attributes.seperator; {noformat} It is separator. In usage text: {noformat} input data. Another special columnHBASE_TS_KEY designates that this column should be {noformat} Space missing. {noformat} Record with invalid timestamps (blank, non-numeric) will be treated as bad record. {noformat} Records ... as bad records - plural missing twice. {noformat} HBASE_ATTRIBUTES_KEY can be used to specify Operation Attributes per record. Should be specified as key=value where -1 is used as the seperator. Note that more than one OperationAttributes can be specified. {noformat} - Remove line wraps and indentation. - Fix separator. - Fix wrong separator being output, it is not -1 (wrong constant use in code) - General wording/style could be better (eg. last sentence now uses OperationAttributes without a space). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14082) Add replica id to JMX metrics names
Lei Chen created HBASE-14082: Summary: Add replica id to JMX metrics names Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on the primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627396#comment-14627396 ] Lei Chen commented on HBASE-14082: -- If the changes are: # adding a getter, getReplicaId(), to MetricsRegionWrapper.java # inserting a string, _replicaid_ + regionWrapper.getReplicaId(), to MetricsRegionSourceImpl.java Should I include a test case? I'm not sure if it is preferred that every change should be covered by unit tests. Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on the primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
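The proposed name format can be sketched as a small formatting helper. This is a sketch of the proposal in the issue description, not committed HBase code; the method name is an assumption:

```java
// Sketch of the proposed JMX metric name format with the replica id
// spliced in between the region name and the metric name.
class ReplicaMetricName {
    static String metricName(String namespace, String table, String region,
                             int replicaId, String metric) {
        return "namespace_" + namespace
             + "_table_" + table
             + "_region_" + region
             + "_replicaid_" + replicaId
             + "_metric_" + metric;
    }
}
```

A primary region would carry replica id 0, so its metrics remain distinguishable from those of replica 1, 2, and so on under the same region name.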
[jira] [Commented] (HBASE-14065) ref guide section on release candidate generation refers to old doc files
[ https://issues.apache.org/jira/browse/HBASE-14065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627423#comment-14627423 ] Hadoop QA commented on HBASE-14065: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12745352/HBASE-14065.1.patch against master branch at commit 2f327c911056d02813f642503db9a4383e8b4a2f. ATTACHMENT ID: 12745352 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn post-site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.master.TestDistributedLogSplitting Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14774//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14774//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14774//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14774//console This message is automatically generated. ref guide section on release candidate generation refers to old doc files - Key: HBASE-14065 URL: https://issues.apache.org/jira/browse/HBASE-14065 Project: HBase Issue Type: Bug Components: documentation Reporter: Sean Busbey Assignee: Gabor Liptak Attachments: HBASE-14065.1.patch currently it says to copy files from the master version of {{src/main/docbkx}} which is incorrect since the move to asciidoc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13743) Backport HBASE-13709 (Updates to meta table server columns may be eclipsed) to 0.98
[ https://issues.apache.org/jira/browse/HBASE-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-13743: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks [~enis]. Pushed to 0.98 Backport HBASE-13709 (Updates to meta table server columns may be eclipsed) to 0.98 --- Key: HBASE-13743 URL: https://issues.apache.org/jira/browse/HBASE-13743 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.14 Attachments: HBASE-13743-0.98.patch The problem addressed with HBASE-13709 is more likely on branch-1 and later but still an issue with the 0.98 code. Backport doesn't look too difficult but nontrivial due to the number of fix ups needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13706) CoprocessorClassLoader should not exempt Hive classes
[ https://issues.apache.org/jira/browse/HBASE-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627276#comment-14627276 ] Jerry He commented on HBASE-13706: -- Hi, [~apurtell] bq. We shouldn't be exempting foreign classes in our classloader. HBase doesn't know anything about Hive, nor should it. Exactly. Here is an example. In my coprocessor implementation jar, I have org.apache.hadoop.hive classes embedded. But the CoprocessorClassLoader will exempt/skip these classes when loading the coprocessor implementation jar, which is not right. I will get a patch ... simple patch. CoprocessorClassLoader should not exempt Hive classes - Key: HBASE-13706 URL: https://issues.apache.org/jira/browse/HBASE-13706 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 2.0.0, 1.0.1, 1.1.0, 0.98.12 Reporter: Jerry He Priority: Minor CoprocessorClassLoader is used to load classes from the coprocessor jar. Certain classes are exempt from being loaded by this ClassLoader, which means they will be ignored in the coprocessor jar, but loaded from the parent classpath instead. One problem is that we categorically exempt org.apache.hadoop. But it happens that Hive packages start with org.apache.hadoop. There is no reason to exclude Hive classes from the CoprocessorClassLoader. HBase does not even include Hive jars. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13981) Fix ImportTsv spelling and usage issues
[ https://issues.apache.org/jira/browse/HBASE-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627391#comment-14627391 ] Apekshit Sharma commented on HBASE-13981: - Great, comments are fixed and existing wrong configuration has been deprecated. However, a new configuration that'll eventually replace the deprecated should be added too. (from last comment) {code} + @Deprecated public final static String ATTRIBUTE_SEPERATOR_CONF_KEY = attributes.seperator; + public final static String ATTRIBUTE_SEPERATOR_CONF_KEY = attributes.separator; {code} [~larsgeorge] this patch should be ready soon. Can you please commit it when it's ready. (i don't have permissions) Fix ImportTsv spelling and usage issues --- Key: HBASE-13981 URL: https://issues.apache.org/jira/browse/HBASE-13981 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 1.1.0.1 Reporter: Lars George Assignee: Gabor Liptak Labels: beginner Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13981.1.patch, HBASE-13981.2.patch, HBASE-13981.3.patch, HBASE-13981.4.patch The {{ImportTsv}} tool has various spelling and formatting issues. Fix those. In code: {noformat} public final static String ATTRIBUTE_SEPERATOR_CONF_KEY = attributes.seperator; {noformat} It is separator. In usage text: {noformat} input data. Another special columnHBASE_TS_KEY designates that this column should be {noformat} Space missing. {noformat} Record with invalid timestamps (blank, non-numeric) will be treated as bad record. {noformat} Records ... as bad records - plural missing twice. {noformat} HBASE_ATTRIBUTES_KEY can be used to specify Operation Attributes per record. Should be specified as key=value where -1 is used as the seperator. Note that more than one OperationAttributes can be specified. {noformat} - Remove line wraps and indentation. - Fix separator. - Fix wrong separator being output, it is not -1 (wrong constant use in code) - General wording/style could be better (eg. 
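The deprecation pattern suggested in the comment above, keeping the misspelled key working while introducing the corrected spelling, can be sketched as follows. The lookup helper and its fallback logic are illustrative assumptions, not the actual ImportTsv patch:

```java
import java.util.Map;

// Sketch of deprecating a misspelled config key while adding the corrected
// one; existing configs that still set the old key keep working.
class SeparatorConf {
    /** @deprecated misspelled; retained only for backward compatibility */
    @Deprecated
    public static final String ATTRIBUTE_SEPERATOR_CONF_KEY = "attributes.seperator";
    public static final String ATTRIBUTE_SEPARATOR_CONF_KEY = "attributes.separator";

    static String getAttributeSeparator(Map<String, String> conf, String defaultValue) {
        String v = conf.get(ATTRIBUTE_SEPARATOR_CONF_KEY);
        if (v == null) {
            v = conf.get(ATTRIBUTE_SEPERATOR_CONF_KEY); // honor the deprecated key
        }
        return v != null ? v : defaultValue;
    }
}
```

The corrected key wins when both are set, and the deprecated constant can be dropped in a later major release.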
last sentence now uses OperationAttributes without a space). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13981) Fix ImportTsv spelling and usage issues
[ https://issues.apache.org/jira/browse/HBASE-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627454#comment-14627454 ] Apekshit Sharma commented on HBASE-13981: - [~gliptak] Yeah, you're right. +1. Fix ImportTsv spelling and usage issues --- Key: HBASE-13981 URL: https://issues.apache.org/jira/browse/HBASE-13981 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 1.1.0.1 Reporter: Lars George Assignee: Gabor Liptak Labels: beginner Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13981.1.patch, HBASE-13981.2.patch, HBASE-13981.3.patch, HBASE-13981.4.patch The {{ImportTsv}} tool has various spelling and formatting issues. Fix those. In code: {noformat} public final static String ATTRIBUTE_SEPERATOR_CONF_KEY = attributes.seperator; {noformat} It is separator. In usage text: {noformat} input data. Another special columnHBASE_TS_KEY designates that this column should be {noformat} Space missing. {noformat} Record with invalid timestamps (blank, non-numeric) will be treated as bad record. {noformat} Records ... as bad records - plural missing twice. {noformat} HBASE_ATTRIBUTES_KEY can be used to specify Operation Attributes per record. Should be specified as key=value where -1 is used as the seperator. Note that more than one OperationAttributes can be specified. {noformat} - Remove line wraps and indentation. - Fix separator. - Fix wrong separator being output, it is not -1 (wrong constant use in code) - General wording/style could be better (eg. last sentence now uses OperationAttributes without a space). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14030) HBase Backup/Restore Phase 1
[ https://issues.apache.org/jira/browse/HBASE-14030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627162#comment-14627162 ] Vladimir Rodionov commented on HBASE-14030: --- Patch v3 = Patch v2 + HBASE-14031 + HBASE-14032 + HBASE-14033 HBase Backup/Restore Phase 1 Key: HBASE-14030 URL: https://issues.apache.org/jira/browse/HBASE-14030 Project: HBase Issue Type: Umbrella Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-14030-v0.patch, HBASE-14030-v1.patch, HBASE-14030-v2.patch, HBASE-14030-v3.patch This is the umbrella ticket for Backup/Restore Phase 1. See HBASE-7912 design doc for the phase description. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14079) improve error message when Master fails to connect to Hadoop-auth
Sean Busbey created HBASE-14079: --- Summary: improve error message when Master fails to connect to Hadoop-auth Key: HBASE-14079 URL: https://issues.apache.org/jira/browse/HBASE-14079 Project: HBase Issue Type: Improvement Components: master Affects Versions: 2.0.0 Reporter: Sean Busbey Fix For: 2.0.0 Current error message at INFO level doesn't give any hint about what keytab and principal are in use {quote} 2015-07-14 13:32:48,514 INFO [main] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties 2015-07-14 13:32:48,640 INFO [main] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2015-07-14 13:32:48,640 INFO [main] impl.MetricsSystemImpl: HBase metrics system started 2015-07-14 13:32:48,776 ERROR [main] master.HMasterCommandLine: Master exiting java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2258) at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:234) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2272) Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:856) at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:719) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at
java.lang.reflect.Method.invoke(Method.java:606) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687) at javax.security.auth.login.LoginContext.login(LoginContext.java:595) at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:912) at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:242) at org.apache.hadoop.hbase.security.User$SecureHadoopUser.login(User.java:385) at org.apache.hadoop.hbase.security.User.login(User.java:252) at org.apache.hadoop.hbase.security.UserProvider.login(UserProvider.java:115) at org.apache.hadoop.hbase.master.HMaster.login(HMaster.java:464) at org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:553) at org.apache.hadoop.hbase.master.HMaster.init(HMaster.java:351) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2253) ... 5 more {quote} increasing to DEBUG also doesn't help. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
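One way to make the failure above actionable is to attach the keytab and principal in use to the exception raised at login time, so the operator sees them at the default log level. This is a hedged sketch of that idea; the helper name and where it would be called from are assumptions, not HBase's actual code:

```java
import java.io.IOException;

// Sketch: wrap a Kerberos login failure with the keytab file and principal
// that were used, so the cause is diagnosable without raising the log level.
class LoginErrors {
    static IOException annotateLoginFailure(Throwable cause, String keytabFile,
                                            String principal) {
        return new IOException("Kerberos login failed using keytab file '" + keytabFile
            + "' and principal '" + principal + "'", cause);
    }
}
```

The underlying `LoginException` ("Unable to obtain password from user") is preserved as the cause, so the stack trace stays intact while the new message carries the configuration that produced it.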
[jira] [Commented] (HBASE-14077) Add package to hbase-protocol protobuf files.
[ https://issues.apache.org/jira/browse/HBASE-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627258#comment-14627258 ] Elliott Clark commented on HBASE-14077: --- Cpp projects are usually pretty flat, so I would prefer something with a little less typing. Add package to hbase-protocol protobuf files. - Key: HBASE-14077 URL: https://issues.apache.org/jira/browse/HBASE-14077 Project: HBase Issue Type: Bug Components: Protobufs Affects Versions: 2.0.0, 1.2.0, 1.3.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.3.0 Attachments: HBASE-14077.patch C++ generated code is currently in the default namespace. That's bad practice, so let's fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13981) Fix ImportTsv spelling and usage issues
[ https://issues.apache.org/jira/browse/HBASE-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627408#comment-14627408 ] Gabor Liptak commented on HBASE-13981: -- [~appy] As I mentioned above, I do not see references to ATTRIBUTE_SEPERATOR_CONF_KEY in the codebase, hence I didn't create a replacement define. Can you double check? Thanks Fix ImportTsv spelling and usage issues --- Key: HBASE-13981 URL: https://issues.apache.org/jira/browse/HBASE-13981 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 1.1.0.1 Reporter: Lars George Assignee: Gabor Liptak Labels: beginner Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13981.1.patch, HBASE-13981.2.patch, HBASE-13981.3.patch, HBASE-13981.4.patch The {{ImportTsv}} tool has various spelling and formatting issues. Fix those. In code: {noformat} public final static String ATTRIBUTE_SEPERATOR_CONF_KEY = attributes.seperator; {noformat} It is separator. In usage text: {noformat} input data. Another special columnHBASE_TS_KEY designates that this column should be {noformat} Space missing. {noformat} Record with invalid timestamps (blank, non-numeric) will be treated as bad record. {noformat} Records ... as bad records - plural missing twice. {noformat} HBASE_ATTRIBUTES_KEY can be used to specify Operation Attributes per record. Should be specified as key=value where -1 is used as the seperator. Note that more than one OperationAttributes can be specified. {noformat} - Remove line wraps and indentation. - Fix separator. - Fix wrong separator being output, it is not -1 (wrong constant use in code) - General wording/style could be better (eg. last sentence now uses OperationAttributes without a space). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627443#comment-14627443 ] Jingcheng Du commented on HBASE-11339: -- Thanks Ted! I've uploaded patch v10 to RB. The hbase group members can read it by the link https://reviews.apache.org/r/36391/. Thanks. HBase MOB - Key: HBASE-11339 URL: https://issues.apache.org/jira/browse/HBASE-11339 Project: HBase Issue Type: Umbrella Components: regionserver, Scanners Affects Versions: 2.0.0 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: 11339-master-v10.patch, 11339-master-v3.txt, 11339-master-v4.txt, 11339-master-v5.txt, 11339-master-v6.txt, 11339-master-v7.txt, 11339-master-v8.patch, 11339-master-v9.patch, HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase MOB Design-v4.pdf, HBase MOB Design-v5.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user guide_v2.docx, MOB user guide_v3.docx, MOB user guide_v4.docx, MOB user guide_v5.docx, hbase-11339-150519.patch, hbase-11339-in-dev.patch, hbase-11339.150417.patch, merge-150212.patch, merge.150212b.patch, merge.150212c.patch, merge.150710.patch It's quite useful to save medium-sized binary data like images and documents into Apache HBase. Unfortunately, directly saving binary MOBs (medium objects) to HBase leads to worse performance because of the frequent splits and compactions. In this design, the MOB data are stored in a more efficient way, which keeps high write/read performance and guarantees data consistency in Apache HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13706) CoprocessorClassLoader should not exempt Hive classes
[ https://issues.apache.org/jira/browse/HBASE-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-13706: - Fix Version/s: 1.1.2 1.0.2 0.98.14 2.0.0 Status: Patch Available (was: Reopened) CoprocessorClassLoader should not exempt Hive classes - Key: HBASE-13706 URL: https://issues.apache.org/jira/browse/HBASE-13706 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 0.98.12, 1.1.0, 1.0.1, 2.0.0 Reporter: Jerry He Assignee: Jerry He Priority: Minor Fix For: 2.0.0, 0.98.14, 1.0.2, 1.1.2 Attachments: HBASE-13706.patch CoprocessorClassLoader is used to load classes from the coprocessor jar. Certain classes are exempt from being loaded by this ClassLoader, which means they will be ignored in the coprocessor jar, but loaded from the parent classpath instead. One problem is that we categorically exempt org.apache.hadoop. But it happens that Hive packages start with org.apache.hadoop. There is no reason to exclude Hive classes from the CoprocessorClassLoader. HBase does not even include Hive jars. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14080) Cherry-pick addendums to HBASE-13084
Enis Soztutar created HBASE-14080: - Summary: Cherry-pick addendums to HBASE-13084 Key: HBASE-14080 URL: https://issues.apache.org/jira/browse/HBASE-14080 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 1.0.2 Parent jira is closed, so had to create a subtask. TestShell seems to be failing flakily in Jenkins runs. We'll cherry-pick the addendum patches to the parent jira in branch-1.0. The main patch is already in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14075) HBaseClusterManager should use port(if given) to find pid
[ https://issues.apache.org/jira/browse/HBASE-14075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627265#comment-14627265 ] Hadoop QA commented on HBASE-14075: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12745332/HBASE-14075-master_v3.patch against master branch at commit 2f327c911056d02813f642503db9a4383e8b4a2f. ATTACHMENT ID: 12745332 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 53 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +clusterManager.kill(ServiceType.HBASE_MASTER, serverName.getHostname(), serverName.getPort(), pid); +execWithRetries(hostname, getCommandProvider(service).signalCommand(service, signal, port, pid)); {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14772//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14772//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14772//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14772//console This message is automatically generated. HBaseClusterManager should use port(if given) to find pid - Key: HBASE-14075 URL: https://issues.apache.org/jira/browse/HBASE-14075 Project: HBase Issue Type: Bug Reporter: Yu Li Assignee: Yu Li Priority: Minor Attachments: HBASE-14075-master_v2.patch, HBASE-14075-master_v3.patch, HBASE-14075.patch This issue is found while we run ITBLL in distributed cluster. Our testing env is kind of special that we run multiple regionserver instance on a single physical machine, so {noformat}ps -ef | grep proc_regionserver{noformat} will return more than one line, thus cause the tool might check/kill the wrong process Actually in HBaseClusterManager we already introduce port as a parameter for methods like isRunning, kill, etc. So the only thing to do here is to get pid through port if port is given -- This message was sent by Atlassian JIRA (v6.3.4#6332)
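The fix direction described in this issue, preferring the port over a bare `ps | grep` when one is supplied, can be sketched as a command builder. The exact shell commands below are assumptions for illustration; HBaseClusterManager's real signal/pid commands may differ:

```java
// Sketch: build a pid-lookup command. When a port is known, resolve the
// pid of the process bound to that TCP port instead of grepping ps output,
// which is ambiguous when several region servers share one host.
class PidLookup {
    static String findPidCommand(String service, Integer port) {
        if (port == null) {
            // matches every instance of the service on the host -- the
            // ambiguity that motivated this issue
            return "ps -ef | grep proc_" + service + " | grep -v grep | awk '{print $2}'";
        }
        // fuser prints the pid(s) of the process listening on the TCP port
        return "fuser -n tcp " + port + " 2>/dev/null";
    }
}
```

With a port such as 60020 supplied, the command pins down exactly one process even when multiple `proc_regionserver` instances run on the same machine.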
[jira] [Reopened] (HBASE-12596) bulkload needs to follow locality
[ https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-12596: Getting a compile error on 0.98 when built with Java 6, which is still in the support matrix for that release: {noformat} [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java:[240,69] getHostString() is not public in java.net.InetSocketAddress; cannot be accessed from outside package {noformat} bulkload needs to follow locality - Key: HBASE-12596 URL: https://issues.apache.org/jira/browse/HBASE-12596 Project: HBase Issue Type: Improvement Components: HFile, regionserver Affects Versions: 0.98.8 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7 Reporter: Victor Xu Assignee: Victor Xu Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-0.98-v2.patch, HBASE-12596-0.98-v3.patch, HBASE-12596-0.98-v4.patch, HBASE-12596-0.98-v5.patch, HBASE-12596-0.98-v6.patch, HBASE-12596-branch-1-v1.patch, HBASE-12596-branch-1-v2.patch, HBASE-12596-master-v1.patch, HBASE-12596-master-v2.patch, HBASE-12596-master-v3.patch, HBASE-12596-master-v4.patch, HBASE-12596-master-v5.patch, HBASE-12596-master-v6.patch, HBASE-12596.patch Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, the locality could be loss during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). This feature is enabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled=false' to disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
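The Java 6 compile error quoted above stems from `InetSocketAddress.getHostString()` only becoming public in Java 7. A possible fallback that avoids the call is sketched below; this is one way to stay Java 6-compatible, not necessarily the fix that landed on 0.98, and it returns the literal IP for resolved addresses rather than the originally supplied host name:

```java
import java.net.InetSocketAddress;

// Sketch of a Java 6-compatible substitute for getHostString(): return the
// stored host name for unresolved addresses (no DNS lookup is needed), and
// the literal IP for resolved ones (avoiding a reverse DNS lookup).
class HostStrings {
    static String hostOf(InetSocketAddress isa) {
        return isa.isUnresolved()
            ? isa.getHostName()                  // name as given at construction
            : isa.getAddress().getHostAddress(); // literal IP, no reverse lookup
    }
}
```

For the favored-nodes use case above, either form is acceptable as long as it never blocks on DNS inside the HFile writer path.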
[jira] [Comment Edited] (HBASE-13897) OOM may occur when Import imports a row with too many KeyValues
[ https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627270#comment-14627270 ] Andrew Purtell edited comment on HBASE-13897 at 7/14/15 11:32 PM: -- Here are more Hadoop 1.x errors (hadoop.profile=1.1): {noformat} [ERROR] symbol : variable TaskCounter [ERROR] location: class org.apache.hadoop.hbase.mapreduce.Import [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java:[70,34] cannot find symbol [ERROR] symbol : class TaskCounter [ERROR] location: package org.apache.hadoop.mapreduce [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java:[758,54] cannot find symbol [ERROR] symbol : variable TaskCounter [ERROR] location: class org.apache.hadoop.hbase.mapreduce.Import [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java:[759,55] cannot find symbol [ERROR] - [Help 1] {noformat} Reverting from 0.98 for now (commit 372b71b) was (Author: apurtell): Getting a compile error on 0.98 when built with Java 6, which is still in the support matrix for that release: {noformat} [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java:[240,69] getHostString() is not public in java.net.InetSocketAddress; cannot be accessed from outside package {noformat} And here are more Hadoop 1.x errors (hadoop.profile=1.1): {noformat} [ERROR] symbol : variable TaskCounter [ERROR] location: class org.apache.hadoop.hbase.mapreduce.Import [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java:[70,34] cannot find symbol [ERROR] symbol : class TaskCounter [ERROR] location: package org.apache.hadoop.mapreduce [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java:[758,54] cannot find symbol [ERROR] symbol 
: variable TaskCounter [ERROR] location: class org.apache.hadoop.hbase.mapreduce.Import [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java:[759,55] cannot find symbol [ERROR] - [Help 1] {noformat} Reverting from 0.98 for now (commit 372b71b) OOM may occur when Import imports a row with too many KeyValues --- Key: HBASE-13897 URL: https://issues.apache.org/jira/browse/HBASE-13897 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Liu Junhong Assignee: Liu Junhong Fix For: 2.0.0, 1.3.0 Attachments: 13897-v2.txt, HBASE-13897-0.98-20150710-suitable_for_hadoop1.patch, HBASE-13897-0.98-20150710.patch, HBASE-13897-0.98.patch, HBASE-13897-branch_1-20150709.patch, HBASE-13897-master-20150629.patch, HBASE-13897-master-20150630.patch, HBASE-13897-master-20150707.patch, HBASE-13897-master.patch When importing a row with too many KeyValues (it may have too many columns or versions), KeyValueReducer will incur an OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
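The OOM described above comes from buffering every KeyValue of a single row in memory before writing. A minimal, HBase-free sketch of the batching idea behind the fix (all class and field names here are hypothetical; a real reducer would write each batch via context.write(...) and use HBase's KeyValue type instead of strings):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Sketch: a reducer that buffers every value for one row key can exhaust the
// heap on very wide rows. Flushing in bounded batches keeps memory constant.
public class BatchedRowWriter {
    // Hypothetical threshold; a real patch would make this configurable.
    static final int FLUSH_THRESHOLD = 4;

    // Collects flushed batches so the sketch is observable; a real reducer
    // would emit each batch to the MapReduce context instead.
    final List<List<String>> flushedBatches = new ArrayList<>();

    void reduceRow(Iterator<String> keyValuesForOneRow) {
        // Bounded, sorted buffer instead of an unbounded TreeSet of the whole row.
        TreeSet<String> buffer = new TreeSet<>();
        while (keyValuesForOneRow.hasNext()) {
            buffer.add(keyValuesForOneRow.next());
            if (buffer.size() >= FLUSH_THRESHOLD) {
                flushedBatches.add(new ArrayList<>(buffer));
                buffer.clear(); // memory released before reading more KVs
            }
        }
        if (!buffer.isEmpty()) {
            flushedBatches.add(new ArrayList<>(buffer));
        }
    }
}
```

The trade-off is that sorting happens per batch rather than per row, which is acceptable here because downstream HFile writing only requires sorted order within what is written together.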
[jira] [Updated] (HBASE-13352) Add hbase.import.version to Import usage.
[ https://issues.apache.org/jira/browse/HBASE-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-13352: --- Fix Version/s: (was: 0.98.14) Reverted from 0.98. There's a compilation issue with Hadoop 1.x: {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-server: Compilation failure: Compilation failure: [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java:[56,34] cannot find symbol [ERROR] symbol : class TaskCounter [ERROR] location: package org.apache.hadoop.mapreduce [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java:[543,54] cannot find symbol [ERROR] symbol : variable TaskCounter [ERROR] location: class org.apache.hadoop.hbase.mapreduce.Import [ERROR] /home/apurtell/src/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java:[544,55] cannot find symbol [ERROR] symbol : variable TaskCounter [ERROR] location: class org.apache.hadoop.hbase.mapreduce.Import {noformat} Add hbase.import.version to Import usage. - Key: HBASE-13352 URL: https://issues.apache.org/jira/browse/HBASE-13352 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: 13352-v2.txt, 13352.txt, hbase-13352_v3.patch We just tried to export some (small amount of) data out of an 0.94 cluster to 0.98 cluster. We used Export/Import for that. By default we found that the import M/R job correctly reports the number of records seen, but _silently_ does not import anything. After looking at the 0.98 it's obvious there's an hbase.import.version (-Dhbase.import.version=0.94) to make this work. 
Two issues: # -Dhbase.import.version=0.94 should be shown in the Import usage text # If not given, it should not just silently import nothing In this issue I'll just trivially add this option to the Import tool's usage.
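For context, an invocation with the flag the comment describes might look like the following (a sketch only: the table name and input directory are placeholders, and the exact argument order follows the Import tool's standard `<tablename> <inputdir>` usage):

```
hbase org.apache.hadoop.hbase.mapreduce.Import -Dhbase.import.version=0.94 mytable /path/to/exported/data
```

Without `-Dhbase.import.version=0.94`, data exported from a 0.94 cluster is read with the wrong serialization format, which is why the job reports records seen but imports nothing.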
[jira] [Updated] (HBASE-14070) Hybrid Logical Clocks for HBase
[ https://issues.apache.org/jira/browse/HBASE-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-14070: --- Attachment: HybridLogicalClocksforHBaseandPhoenix.docx Hybrid Logical Clocks for HBase --- Key: HBASE-14070 URL: https://issues.apache.org/jira/browse/HBASE-14070 Project: HBase Issue Type: New Feature Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HybridLogicalClocksforHBaseandPhoenix.docx, HybridLogicalClocksforHBaseandPhoenix.pdf HBase and Phoenix use the system's physical clock (PT) to assign timestamps to events (reads and writes). This works only when the system clock is strictly monotonically increasing and there are no cross-dependencies between servers' clocks. However, we know that leap seconds, general clock skew, and clock drift are in fact real. This jira proposes using Hybrid Logical Clocks (HLC), an implementation of a hybrid physical clock plus a logical clock. HLC offers the best of both worlds: it keeps a causality relationship similar to logical clocks, yet remains compatible with an NTP-based physical system clock. An HLC timestamp can be represented in 64 bits. A design document is attached and can also be found here: https://docs.google.com/document/d/1LL2GAodiYi0waBz5ODGL4LDT4e_bXy8P9h6kWC05Bhw/edit#
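The HLC update rules the proposal refers to can be sketched as follows. This is an illustration of the standard HLC scheme, not HBase's eventual implementation: the 48/16 bit split between physical and logical components, and the class and method names, are assumptions for the sketch.

```java
import java.util.function.LongSupplier;

// Sketch of Hybrid Logical Clock update rules: a physical component l
// (max physical time observed) plus a logical counter c that preserves
// causality when the physical clock stalls or skews.
public class HybridLogicalClock {
    private final LongSupplier physicalClock; // e.g. System::currentTimeMillis
    private long l; // physical component
    private long c; // logical component (breaks ties within one clock reading)

    public HybridLogicalClock(LongSupplier physicalClock) {
        this.physicalClock = physicalClock;
    }

    // Called for a local event or before sending a message.
    public synchronized long tick() {
        long pt = physicalClock.getAsLong();
        long oldL = l;
        l = Math.max(oldL, pt);
        c = (l == oldL) ? c + 1 : 0; // physical clock stalled: advance counter
        return pack();
    }

    // Called when receiving a message carrying a remote HLC timestamp.
    public synchronized long update(long remotePacked) {
        long remoteL = remotePacked >>> 16;
        long remoteC = remotePacked & 0xFFFF;
        long pt = physicalClock.getAsLong();
        long oldL = l;
        l = Math.max(Math.max(oldL, remoteL), pt);
        if (l == oldL && l == remoteL) c = Math.max(c, remoteC) + 1;
        else if (l == oldL)            c = c + 1;
        else if (l == remoteL)         c = remoteC + 1;
        else                           c = 0; // fresh physical time dominates
        return pack();
    }

    // 64-bit representation: high 48 bits physical ms, low 16 bits counter
    // (an illustrative split, not necessarily the one the document proposes).
    private long pack() {
        return (l << 16) | (c & 0xFFFF);
    }
}
```

The key property is that packed timestamps compare correctly as plain 64-bit longs: a receive always produces a timestamp greater than both the local and the remote one, so causality survives even when NTP nudges the physical clock backwards.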
[jira] [Updated] (HBASE-13981) Fix ImportTsv spelling and usage issues
[ https://issues.apache.org/jira/browse/HBASE-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Liptak updated HBASE-13981: - Attachment: HBASE-13981.4.patch Fix ImportTsv spelling and usage issues --- Key: HBASE-13981 URL: https://issues.apache.org/jira/browse/HBASE-13981 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 1.1.0.1 Reporter: Lars George Assignee: Gabor Liptak Labels: beginner Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13981.1.patch, HBASE-13981.2.patch, HBASE-13981.3.patch, HBASE-13981.4.patch The {{ImportTsv}} tool has various spelling and formatting issues. Fix those. In code: {noformat} public final static String ATTRIBUTE_SEPERATOR_CONF_KEY = attributes.seperator; {noformat} It should be "separator". In usage text: {noformat} input data. Another special columnHBASE_TS_KEY designates that this column should be {noformat} Space missing. {noformat} Record with invalid timestamps (blank, non-numeric) will be treated as bad record. {noformat} Records ... as bad records - plural missing twice. {noformat} HBASE_ATTRIBUTES_KEY can be used to specify Operation Attributes per record. Should be specified as key=value where -1 is used as the seperator. Note that more than one OperationAttributes can be specified. {noformat} - Remove line wraps and indentation. - Fix separator. - Fix the wrong separator being output; it is not -1 (wrong constant used in code). - General wording/style could be better (e.g. the last sentence uses OperationAttributes without a space).