[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448512#comment-13448512 ]

stack commented on HBASE-6649:
------------------------------

This patch makes sense to me. We replicate everything up to the exception, and then next time in we should pick up the IOE again. Want me to commit this, DD?

[0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

Key: HBASE-6649
URL: https://issues.apache.org/jira/browse/HBASE-6649
Project: HBase
Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Fix For: 0.92.3
Attachments: 6649-1.patch, 6649-2.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html

Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6514) unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
[ https://issues.apache.org/jira/browse/HBASE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448515#comment-13448515 ]

Elliott Clark commented on HBASE-6514:
--------------------------------------

Thanks Stack. Always nice to have a double check.

unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram

Key: HBASE-6514
URL: https://issues.apache.org/jira/browse/HBASE-6514
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.92.2, 0.94.0
Environment: MacOS 10.8, Oracle JDK 1.7
Reporter: Archimedes Trajano
Assignee: Elliott Clark
Fix For: 0.92.2, 0.96.0, 0.94.2
Attachments: FrameworkTest.java, FrameworkTest.java, HBASE-6514-94-0.patch, HBASE-6514-trunk-0.patch, out.txt

When trying to run a unit test that just starts up and shuts down the server, the following errors appear in System.out:

01:10:59,874 ERROR MetricsUtil:116 - unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
01:10:59,874 ERROR MetricsUtil:116 - unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
01:10:59,875 ERROR MetricsUtil:116 - unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
01:10:59,875 ERROR MetricsUtil:116 - unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
[jira] [Commented] (HBASE-3976) Disable Block Cache On Compactions
[ https://issues.apache.org/jira/browse/HBASE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448521#comment-13448521 ]

Mikhail Bautin commented on HBASE-3976:
---------------------------------------

Lars: I agree, cache-on-flush is definitely the most useful. This is what we are now using in production for some workloads.

Disable Block Cache On Compactions

Key: HBASE-3976
URL: https://issues.apache.org/jira/browse/HBASE-3976
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.90.3
Reporter: Karthick Sankarachary
Assignee: Mikhail Bautin
Priority: Minor
Attachments: HBASE-3976.patch, HBASE-3976-unconditional.patch, HBASE-3976-V3.patch

Is there a good reason to believe that caching blocks during compactions is beneficial? Currently, if the block cache is enabled on a certain family, then every time it is compacted we load all of its blocks into the (LRU) cache, at the expense of the legitimately hot ones. As a matter of fact, this concern was raised earlier in HBASE-1597, which rightly points out that we should not bog down the LRU with unnecessary blocks during compaction. Even though that issue has been marked as fixed, it looks like it ought to be reopened. Should we err on the side of caution and not cache blocks during compactions, period (as illustrated in the attached patch)? Or can we be selectively aggressive about which blocks get cached during compaction (e.g., only cache blocks from the recent files)?
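The cache-pollution argument above can be illustrated with a toy LRU cache. This is a minimal sketch: the `Lru` class and the key names are hypothetical stand-ins for illustration, not HBase's actual LruBlockCache API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of LRU pollution by a compaction: cold blocks streamed through the
// cache evict the legitimately hot ones.
public class CachePollutionSketch {

    // Minimal LRU: access-ordered LinkedHashMap that evicts the eldest entry.
    static class Lru<K, V> extends LinkedHashMap<K, V> {
        final int capacity;
        Lru(int capacity) { super(16, 0.75f, true); this.capacity = capacity; }
        @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    // Returns whether a previously-hot block is still cached after a compaction
    // inserts compactionBlocks cold blocks.
    static boolean hotBlockSurvives(int compactionBlocks, int capacity) {
        Lru<String, byte[]> cache = new Lru<>(capacity);
        cache.put("hot-1", new byte[0]);
        cache.put("hot-2", new byte[0]);
        for (int i = 0; i < compactionBlocks; i++) {
            cache.put("compaction-" + i, new byte[0]); // cached on compaction read
        }
        return cache.containsKey("hot-1");
    }

    public static void main(String[] args) {
        System.out.println(hotBlockSurvives(1, 4)); // small compaction: hot block survives
        System.out.println(hotBlockSurvives(8, 4)); // large compaction: hot block evicted
    }
}
```

A compaction that rewrites more blocks than the cache holds evicts every hot entry, which is why the issue proposes not caching (or caching selectively) during compactions.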
[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448525#comment-13448525 ]

Hudson commented on HBASE-4050:
-------------------------------

Integrated in HBase-TRUNK #3304 (See [https://builds.apache.org/job/HBase-TRUNK/3304/])
HBASE-4050 Clean up BaseMetricsSourceImpl (Revision 1381008)

Result = FAILURE
stack :
Files :
* /hbase/trunk/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetricsSourceImpl.java
* /hbase/trunk/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/metrics/BaseMetricsSourceImpl.java
* /hbase/trunk/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetricsSourceImpl.java
* /hbase/trunk/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/metrics/BaseMetricsSourceImpl.java

Update HBase metrics framework to metrics2 framework

Key: HBASE-4050
URL: https://issues.apache.org/jira/browse/HBASE-4050
Project: HBase
Issue Type: New Feature
Components: metrics
Affects Versions: 0.90.4
Environment: Java 6
Reporter: Eric Yang
Assignee: Elliott Clark
Priority: Critical
Fix For: 0.96.0
Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, HBASE-4050-7.patch, HBASE-4050-8_1.patch, HBASE-4050-8.patch, HBASE-4050.patch

The metrics framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it may be removed in a future Hadoop release. Hence, HBase needs to revise its MetricsContext dependency to use the metrics2 framework.
[jira] [Commented] (HBASE-6533) [replication] replication will be block if WAL compress set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448537#comment-13448537 ]

terry zhang commented on HBASE-6533:
------------------------------------

This is because the master sends the HLog entry in compressed form, but the slave does not know that. So when the slave's IPC HBaseServer deserializes the buffer and reads the HLog entry fields, the error occurs. We could have the master send the buffer uncompressed; then, whether or not the master uses HLog compression, the slave will work fine.

[replication] replication will be block if WAL compress set differently in master and slave configuration

Key: HBASE-6533
URL: https://issues.apache.org/jira/browse/HBASE-6533
Project: HBase
Issue Type: Bug
Components: replication
Affects Versions: 0.94.0
Reporter: terry zhang
Priority: Critical

As we know, in HBase 0.94.0 we have the configuration below:

<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>

If we enable it in the master cluster and disable it in the slave cluster, replication will not work. The master cluster will throw unwrapRemoteException again and again:

2012-08-09 12:49:55,892 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of an error on the remote cluster:
java.io.IOException: IPC server unable to read call parameters: Error in readFields
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:635)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365)
Caused by: org.apache.hadoop.ipc.RemoteException: IPC server unable to read call parameters: Error in readFields
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:921)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:151)
    at $Proxy13.replicateLogEntries(Unknown Source)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:616)
    ... 1 more

This is because the slave cluster cannot parse the HLog entry:

2012-08-09 14:46:05,891 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.232.98.89
java.io.IOException: Error in readFields
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:685)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:635)
    at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1292)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1207)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:735)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:524)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:499)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2254)
    at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:146)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1767)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:682)
    ... 11 more
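For reference, the mismatch described above corresponds to the two clusters' hbase-site.xml files differing on this one property. The snippet below is a hedged illustration of the standard Hadoop-style configuration syntax, not text taken from the report:

```xml
<!-- master cluster hbase-site.xml: WAL compression on -->
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>

<!-- slave cluster hbase-site.xml: WAL compression off (or the property unset) -->
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>false</value>
</property>
```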
[jira] [Updated] (HBASE-6533) [replication] replication will be block if WAL compress set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

terry zhang updated HBASE-6533:
-------------------------------

Priority: Critical (was: Major)
[jira] [Updated] (HBASE-6533) [replication] replication will be block if WAL compress set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

terry zhang updated HBASE-6533:
-------------------------------

Attachment: hbase-6533.patch
[jira] [Updated] (HBASE-6592) [shell] Add means of custom formatting output by column
[ https://issues.apache.org/jira/browse/HBASE-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Huang updated HBASE-6592:
-----------------------------

Attachment: hbase-6592.patch

[shell] Add means of custom formatting output by column

Key: HBASE-6592
URL: https://issues.apache.org/jira/browse/HBASE-6592
Project: HBase
Issue Type: New Feature
Components: shell
Reporter: stack
Priority: Minor
Labels: noob
Attachments: hbase-6592.patch

See Jacques' suggestion toward the end of this thread for how we should allow adding a custom formatter per column for outputting column content in the shell: http://search-hadoop.com/m/2WxUB1fuxL11/Printing+integers+in+the+Hbase+shell&subj=Printing+integers+in+the+Hbase+shell
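The per-column-formatter idea amounts to letting the user name a converter alongside the column in a shell command. A purely hypothetical sketch of what such a call might look like (the syntax here is invented for illustration; the attached patch defines the actual form):

```
hbase> scan 't1', {COLUMNS => ['cf:counter:toLong', 'cf:name:toString']}
```

Here the third `:`-separated token would name the formatter applied to that column's raw bytes before printing, instead of the default hex dump.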
[jira] [Updated] (HBASE-6592) [shell] Add means of custom formatting output by column
[ https://issues.apache.org/jira/browse/HBASE-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Huang updated HBASE-6592:
-----------------------------

Attachment: (was: hbase-6592.patch)
[jira] [Commented] (HBASE-6592) [shell] Add means of custom formatting output by column
[ https://issues.apache.org/jira/browse/HBASE-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448548#comment-13448548 ]

Jie Huang commented on HBASE-6592:
----------------------------------

Added a unit test for this new feature. Any ideas?
[jira] [Updated] (HBASE-6533) [replication] replication will be block if WAL compress set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

terry zhang updated HBASE-6533:
-------------------------------

Fix Version/s: 0.94.3
[jira] [Updated] (HBASE-6533) [replication] replication will be block if WAL compress set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

terry zhang updated HBASE-6533:
-------------------------------

Assignee: terry zhang
[jira] [Created] (HBASE-6719) [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier
terry zhang created HBASE-6719:
-------------------------------

Summary: [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier
Key: HBASE-6719
URL: https://issues.apache.org/jira/browse/HBASE-6719
Project: HBase
Issue Type: Bug
Components: replication
Affects Versions: 0.94.1
Reporter: terry zhang
Assignee: terry zhang
Priority: Critical
Fix For: 0.94.2

Please take a look at the code below:

{code:title=ReplicationSource.java|borderStyle=solid}
protected boolean openReader(int sleepMultiplier) {
  ...
  catch (IOException ioe) {
    LOG.warn(peerClusterZnode + " Got: ", ioe);
    // TODO Need a better way to determine if a file is really gone but
    // TODO without scanning all logs dir
    if (sleepMultiplier == this.maxRetriesMultiplier) {
      LOG.warn("Waited too long for this file, considering dumping");
      return !processEndOfFile(); // Opening the file failed more than maxRetriesMultiplier (default 10) times
    }
  }
  return true;
  ...
}

protected boolean processEndOfFile() {
  if (this.queue.size() != 0) { // This HLog is skipped: data loss
    this.currentPath = null;
    this.position = 0;
    return true;
  } else if (this.queueRecovered) { // The failover replication source thread terminates: data loss
    this.manager.closeRecoveredQueue(this);
    LOG.info("Finished recovering the queue");
    this.running = false;
    return true;
  }
  return false;
}
{code}

Sometimes HDFS runs into a problem while the HLog file itself is actually fine. After HDFS comes back, the skipped data is lost and cannot be recovered in the slave cluster.
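The control flow the report complains about can be reduced to a small model. This is a simplified sketch, not HBase's actual API: the method names and the `fileExists` guard are hypothetical illustrations of the direction the TODO in the code hints at (only give up when the file is really gone, not on a transient HDFS error).

```java
// Simplified model of the retry-exhaustion logic in ReplicationSource.openReader.
public class ReplicationRetrySketch {
    static final int MAX_RETRIES_MULTIPLIER = 10; // default cited in the report

    // Behavior described in the issue: once retries are exhausted, the HLog is
    // dumped (skipped) regardless of why the open failed -- so a transient HDFS
    // outage loses the log's edits.
    static boolean dumpsLogCurrent(int consecutiveFailures) {
        return consecutiveFailures >= MAX_RETRIES_MULTIPLIER;
    }

    // Hypothetical safer guard: only give up when the file is truly gone.
    static boolean dumpsLogGuarded(int consecutiveFailures, boolean fileExists) {
        return consecutiveFailures >= MAX_RETRIES_MULTIPLIER && !fileExists;
    }

    public static void main(String[] args) {
        // Transient HDFS problem: the file still exists once HDFS recovers.
        System.out.println(dumpsLogCurrent(10));        // log dumped, data lost
        System.out.println(dumpsLogGuarded(10, true));  // keep retrying instead
        System.out.println(dumpsLogGuarded(10, false)); // file really gone: dump
    }
}
```

The second predicate is one way to honor the existing TODO without scanning all log directories; an actual fix would need a reliable existence check against HDFS.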
[jira] [Updated] (HBASE-6719) [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier
[ https://issues.apache.org/jira/browse/HBASE-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] terry zhang updated HBASE-6719: --- Attachment: hbase-6719.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6719) [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier
[ https://issues.apache.org/jira/browse/HBASE-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448578#comment-13448578 ] terry zhang commented on HBASE-6719: I think we need to handle the IOException carefully and had better not skip the HLog unless it is really corrupted. We can log this failure as FATAL and skip the HLog (by deleting its hlog zk node manually) if we have to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6719) [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier
[ https://issues.apache.org/jira/browse/HBASE-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448584#comment-13448584 ] terry zhang commented on HBASE-6719: Now we can handle it like below:

* hlog size = 0, hlog queue = 0, recovery thread = yes: terminate the recovery thread (return !processEndOfFile())
* hlog size = 0, hlog queue = 0, recovery thread = no: continue the loop (return !processEndOfFile())
* hlog size = 0, hlog queue != 0, recovery thread = yes: skip the hlog (return !processEndOfFile())
* hlog size = 0, hlog queue != 0, recovery thread = no: skip the hlog (return !processEndOfFile())
* hlog size = 1, hlog queue = 0, recovery thread = yes: log a FATAL mistake in the regionserver's log
* hlog size = 1, hlog queue = 0, recovery thread = no: log a FATAL mistake in the regionserver's log
* hlog size = 1, hlog queue != 0, recovery thread = yes: log a FATAL mistake in the regionserver's log
* hlog size = 1, hlog queue != 0, recovery thread = no: log a FATAL mistake in the regionserver's log

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6719) [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier
[ https://issues.apache.org/jira/browse/HBASE-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448586#comment-13448586 ] terry zhang commented on HBASE-6719: hlog size = 1 means the hlog size is not 0 (hlog size != 0). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
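The proposed decision matrix condenses into a small pure function. This is a hypothetical sketch, not the attached patch: the class, enum, and decide() helper are invented for illustration, and "hlog size = 1" is read as "hlog size != 0" per the clarification above.

```java
public class HlogDumpPolicy {
    enum Action { TERMINATE_RECOVERY, CONTINUE_LOOP, SKIP_HLOG, LOG_FATAL }

    // Proposed rule: only an empty hlog may be skipped or may end the source;
    // a non-empty hlog that cannot be opened is fatal, so data is never
    // silently dropped on a transient HDFS failure.
    static Action decide(boolean hlogEmpty, boolean queueEmpty, boolean queueRecovered) {
        if (!hlogEmpty) {
            return Action.LOG_FATAL;   // hlog size != 0: never dump the data
        }
        if (!queueEmpty) {
            return Action.SKIP_HLOG;   // move on to the next queued hlog
        }
        return queueRecovered ? Action.TERMINATE_RECOVERY : Action.CONTINUE_LOOP;
    }

    public static void main(String[] args) {
        // Empty hlog, empty queue, recovered queue: safe to end the source.
        System.out.println(decide(true, true, true));   // TERMINATE_RECOVERY
        // Non-empty hlog that failed to open: flag it, never skip it.
        System.out.println(decide(false, false, false)); // LOG_FATAL
    }
}
```

Keeping the skip/terminate/fatal choice in one side-effect-free function also makes all eight cases trivially unit-testable.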
[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448616#comment-13448616 ] ramkrishna.s.vasudevan commented on HBASE-6299: --- [~maryannxue] Do you have an updated patch for this? Can we provide one updated patch for this issue? RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems. - Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Attachments: HBASE-6299.patch, HBASE-6299-v2.patch

1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open-region request and starts to proceed, eventually succeeding. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out.
4. HMaster attempts to assign a second time, choosing another RS.
5. But since HMaster's OpenedRegionHandler has been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster considers the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, regions=575, usedHeap=0, maxHeap=0),
[jira] [Commented] (HBASE-3866) Script to add regions gradually to a new regionserver.
[ https://issues.apache.org/jira/browse/HBASE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448647#comment-13448647 ] Aravind Gottipati commented on HBASE-3866: -- I will defer to you folks regarding including this script with the distribution. Stack's suggestion of closing the JIRA is a fine one; like he said, this would leave the script here for others to use. I would however like to note a few things.

1. The script attached here is outdated. A newer version of the script that worked with 0.92 is here (https://github.com/aravind/hbase-utils/blob/master/region_mover.rb). I haven't been keeping up with the latest, so there is a very good chance it might not work with versions after 0.92.
2. The script is pretty inefficient in how it moves and balances regions. It maintains an internal hashmap (two of them, even) of the servers' region counts, to keep the region count balanced.
3. It is as portable as the original region mover script, since it re-uses most of the same mechanisms.

Script to add regions gradually to a new regionserver. -- Key: HBASE-3866 URL: https://issues.apache.org/jira/browse/HBASE-3866 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 0.90.2 Reporter: Aravind Gottipati Priority: Minor Attachments: 3866-max-regions-per-iteration.patch, slow_balancer.rb, slow_balancer.rb When a new region server is brought online, the current balancer kicks off a whole bunch of region moves and causes a lot of regions to be un-available right away. A slower balancer that gradually balances the cluster is probably a good script to have. I have an initial version that mooches off the region_mover script to do this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448650#comment-13448650 ] Hudson commented on HBASE-4050: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #160 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/160/]) HBASE-4050 Clean up BaseMetricsSourceImpl (Revision 1381008) Result = FAILURE stack : Files :
* /hbase/trunk/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetricsSourceImpl.java
* /hbase/trunk/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/metrics/BaseMetricsSourceImpl.java
* /hbase/trunk/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetricsSourceImpl.java
* /hbase/trunk/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/metrics/BaseMetricsSourceImpl.java
Update HBase metrics framework to metrics2 framework Key: HBASE-4050 URL: https://issues.apache.org/jira/browse/HBASE-4050 Project: HBase Issue Type: New Feature Components: metrics Affects Versions: 0.90.4 Environment: Java 6 Reporter: Eric Yang Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0 Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, HBASE-4050-7.patch, HBASE-4050-8_1.patch, HBASE-4050-8.patch, HBASE-4050.patch Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it might get removed in future Hadoop release. Hence, HBase needs to revise the dependency of MetricsContext to use Metrics2 framework. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5631) hbck should handle case where .tableinfo file is missing.
[ https://issues.apache.org/jira/browse/HBASE-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Huang updated HBASE-5631: - Attachment: (was: hbase-5631-trunk.patch) hbck should handle case where .tableinfo file is missing. - Key: HBASE-5631 URL: https://issues.apache.org/jira/browse/HBASE-5631 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jie Huang 0.92+ branches have a .tableinfo file which could be missing from hdfs. hbck should be able to detect and repair this properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5631) hbck should handle case where .tableinfo file is missing.
[ https://issues.apache.org/jira/browse/HBASE-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Huang updated HBASE-5631: - Attachment: hbase-5631.patch Attached is the patch file for this feature. hbck should handle case where .tableinfo file is missing. - Key: HBASE-5631 URL: https://issues.apache.org/jira/browse/HBASE-5631 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jie Huang Attachments: hbase-5631.patch 0.92+ branches have a .tableinfo file which could be missing from hdfs. hbck should be able to detect and repair this properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Priyadarshini updated HBASE-6698: - Attachment: HBASE-6698_2.patch Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation -- Key: HBASE-6698 URL: https://issues.apache.org/jira/browse/HBASE-6698 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-6698_1.patch, HBASE-6698_2.patch, HBASE-6698.patch Currently the checkAndPut and checkAndDelete APIs internally call internalPut and internalDelete. Maybe we can just call doMiniBatchMutation only. This will help in the future: if we have some hooks and the CP handles certain cases in doMiniBatchMutation, the same can be done while doing a put through checkAndPut or a delete through checkAndDelete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
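A toy sketch of the refactoring idea, assuming nothing about the real HRegion internals (the Mutation class and in-memory map stand in for HBase types and are invented here): both conditional operations funnel into one batch write path, so a hook placed in that path sees every write, conditional or not.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class MiniBatchSketch {

    // Stand-in for a row mutation; a null value means delete.
    static final class Mutation {
        final String row;
        final String value;
        Mutation(String row, String value) { this.row = row; this.value = value; }
    }

    final Map<String, String> store = new HashMap<>();

    // Single write path: every mutation, batched or conditional, funnels
    // through here, so a coprocessor-style hook here sees all writes.
    void doMiniBatchMutation(List<Mutation> batch) {
        for (Mutation m : batch) {
            if (m.value == null) {
                store.remove(m.row);      // the "delete" case
            } else {
                store.put(m.row, m.value); // the "put" case
            }
        }
    }

    // checkAndPut and checkAndDelete collapse into one method: evaluate the
    // check against the current cell, then reuse the shared batch path.
    boolean checkAndMutate(String row, Predicate<String> check, Mutation m) {
        if (!check.test(store.get(row))) {
            return false; // check failed, nothing written
        }
        doMiniBatchMutation(List.of(m));
        return true;
    }

    public static void main(String[] args) {
        MiniBatchSketch region = new MiniBatchSketch();
        System.out.println(region.checkAndMutate("r1", v -> v == null, new Mutation("r1", "x"))); // true
        System.out.println(region.checkAndMutate("r1", v -> v == null, new Mutation("r1", "y"))); // false
    }
}
```

In the real code the check must be evaluated under the row lock that the batch path takes, which is exactly why routing everything through one method simplifies the locking story.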
[jira] [Updated] (HBASE-6592) [shell] Add means of custom formatting output by column
[ https://issues.apache.org/jira/browse/HBASE-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Huang updated HBASE-6592: - Status: Patch Available (was: Open) [shell] Add means of custom formatting output by column --- Key: HBASE-6592 URL: https://issues.apache.org/jira/browse/HBASE-6592 Project: HBase Issue Type: New Feature Components: shell Reporter: stack Priority: Minor Labels: noob Attachments: hbase-6592.patch See Jacques suggestion toward end of this thread for how we should allow adding a custom formatter per column to use outputting column content in shell: http://search-hadoop.com/m/2WxUB1fuxL11/Printing+integers+in+the+Hbase+shellsubj=Printing+integers+in+the+Hbase+shell -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-6698: -- Status: Open (was: Patch Available) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448697#comment-13448697 ] Priyadarshini commented on HBASE-6698: -- Refactored internalPut() and internalDelete(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Priyadarshini updated HBASE-6698: - Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6286) Upgrade maven-compiler-plugin to 2.5.1
[ https://issues.apache.org/jira/browse/HBASE-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448727#comment-13448727 ] Michael Drzal commented on HBASE-6286: -- +1 seems like a win to me Upgrade maven-compiler-plugin to 2.5.1 -- Key: HBASE-6286 URL: https://issues.apache.org/jira/browse/HBASE-6286 Project: HBase Issue Type: Improvement Components: build Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Attachments: HBASE-6286.patch

time mvn -PlocalTests clean install -DskipTests

With 2.5.1:
|user|1m35.634s|1m31.178s|1m31.366s|
|sys|0m06.540s|0m05.376s|0m05.488s|

With 2.0.2 (current):
|user|2m01.168s|1m54.027s|1m57.799s|
|sys|0m05.896s|0m05.912s|0m06.032s|

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
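The version bump being timed is a one-line pom change. A hedged sketch of the relevant fragment (the surrounding pom structure and the 1.6 source/target values are assumptions for illustration, not the attached patch itself): pinning the plugin version overrides the older default inherited from the parent.

```xml
<!-- pom.xml (fragment): pin maven-compiler-plugin instead of inheriting 2.0.2 -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.5.1</version>
      <configuration>
        <source>1.6</source>
        <target>1.6</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Re-running `time mvn -PlocalTests clean install -DskipTests` before and after, as in the table above, is enough to reproduce the comparison.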
[jira] [Commented] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong
[ https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448732#comment-13448732 ] Michael Drzal commented on HBASE-6288: -- +1 looks good [~benkimkimben] In hbase-daemons.sh, description of the default backup-master file path is wrong Key: HBASE-6288 URL: https://issues.apache.org/jira/browse/HBASE-6288 Project: HBase Issue Type: Task Components: master, scripts, shell Affects Versions: 0.92.0, 0.92.1, 0.94.0 Reporter: Benjamin Kim Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, HBASE-6288-94.patch, HBASE-6288-trunk.patch In hbase-daemons.sh, the description of the default backup-masters file path is wrong:

{code}
# HBASE_BACKUP_MASTERS File naming remote hosts.
# Default is ${HADOOP_CONF_DIR}/backup-masters
{code}

It says the default backup-masters file path is under a Hadoop conf dir, but shouldn't this be HBASE_CONF_DIR? Also, adding the following lines to conf/hbase-env.sh would be helpful:

{code}
# File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default.
export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448737#comment-13448737 ] Hadoop QA commented on HBASE-6698: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543846/HBASE-6698_2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. -1 javadoc. The javadoc tool appears to have generated 108 warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestRegionServerMetrics Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2787//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2787//console This message is automatically generated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6302) Document how to run integration tests
[ https://issues.apache.org/jira/browse/HBASE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448755#comment-13448755 ] Michael Drzal commented on HBASE-6302: -- Patch looks good, with the exception of the points that Andrew made. Document how to run integration tests - Key: HBASE-6302 URL: https://issues.apache.org/jira/browse/HBASE-6302 Project: HBase Issue Type: Sub-task Components: documentation Reporter: stack Assignee: Enis Soztutar Priority: Blocker Fix For: 0.96.0 Attachments: HBASE-6302_v1.patch HBASE-6203 has attached the old IT doc with some mods. When we figure how ITs are to be run, update it and apply the documentation under this issue. Making a blocker against 0.96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6651) Thread safety of HTablePool is doubtful
[ https://issues.apache.org/jira/browse/HBASE-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448770#comment-13448770 ] Hiroshi Ikeda commented on HBASE-6651: -- * I think ThreadLocalPool is useless and dangerous. You never access a content in ThreadLocal from other threads, and if you require information in the content to dispose its container object or something, you must collect the information by using all the thread that you ever used to access. * RoundRobinPool might give the same object to different threads. * It is bad to use conccurent collections. We should explictly lock larger sections to keep consistency, or remove synchronization concerns from PoolMap with using explicit locks from outside of PoolMap. * PoolMap breaks the contract of Map; The actual behaviors of the methods of PoolMap are vague. Also filling out the methods of Map causes the code dirty. We should simplify the code by removing the needless implementation at the start. Thread safety of HTablePool is doubtful --- Key: HBASE-6651 URL: https://issues.apache.org/jira/browse/HBASE-6651 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.94.1 Reporter: Hiroshi Ikeda Priority: Minor There are some operations in HTablePool to access to PoolMap in multiple times without any explict synchronization. For example HTablePool.closeTablePool() calles PoolMap.values(), and calles PoolMap.remove(). If other threads add new instances to the pool in the middle of the calls, the new added instances might be dropped. (HTablePool.closeTablePool() also has another problem that calling it by multple threads causes accessing HTable by multiple threads.) Moreover, PoolMap is not thread safe for the same reason. For example PoolMap.put() calles ConcurrentMap.get() and calles ConcurrentMap.put(). If other threads add a new instance to the concurent map in the middle of the calls, the new instance might be dropped. 
The implementations of Pool also have the same problems. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
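The check-then-act race described above (PoolMap.put() doing a ConcurrentMap.get() followed by a ConcurrentMap.put()) can be sketched in isolation. This is a hypothetical simplification, not HBase's actual PoolMap code; the safe variant uses ConcurrentMap.putIfAbsent to make the insertion atomic:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the PoolMap hazard, not the real HBase class.
class RacyPoolDemo {
    static final ConcurrentMap<String, Queue<String>> pools = new ConcurrentHashMap<>();

    // Racy: get() then put() is a check-then-act sequence. Two threads can
    // both observe null, create separate queues, and the loser's queue
    // (together with anything already added to it) is silently dropped.
    static Queue<String> racyGetPool(String key) {
        Queue<String> pool = pools.get(key);
        if (pool == null) {
            pool = new ConcurrentLinkedQueue<>();
            pools.put(key, pool); // second writer overwrites the first
        }
        return pool;
    }

    // Atomic alternative: putIfAbsent guarantees every thread ends up
    // sharing the single queue that won the insertion.
    static Queue<String> safeGetPool(String key) {
        Queue<String> pool = new ConcurrentLinkedQueue<>();
        Queue<String> existing = pools.putIfAbsent(key, pool);
        return existing != null ? existing : pool;
    }
}
```

Single-threaded both variants behave identically; only under concurrent callers does the racy form lose entries, which matches the report above.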
[jira] [Commented] (HBASE-5631) hbck should handle case where .tableinfo file is missing.
[ https://issues.apache.org/jira/browse/HBASE-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448773#comment-13448773 ] Hadoop QA commented on HBASE-5631: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543845/hbase-5631.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. -1 javadoc. The javadoc tool appears to have generated 108 warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2788//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2788//console This message is automatically generated. hbck should handle case where .tableinfo file is missing. - Key: HBASE-5631 URL: https://issues.apache.org/jira/browse/HBASE-5631 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jie Huang Attachments: hbase-5631.patch 0.92+ branches have a .tableinfo file which could be missing from hdfs. hbck should be able to detect and repair this properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong
[ https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-6288. -- Resolution: Fixed Fix Version/s: 0.94.2 0.92.3 Hadoop Flags: Reviewed Committed to 0.92, 0.94 and to trunk. In hbase-daemons.sh, description of the default backup-master file path is wrong Key: HBASE-6288 URL: https://issues.apache.org/jira/browse/HBASE-6288 Project: HBase Issue Type: Task Components: master, scripts, shell Affects Versions: 0.92.0, 0.92.1, 0.94.0 Reporter: Benjamin Kim Fix For: 0.92.3, 0.94.2 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, HBASE-6288-94.patch, HBASE-6288-trunk.patch In hbase-daemons.sh, description of the default backup-master file path is wrong {code} # HBASE_BACKUP_MASTERS File naming remote hosts. # Default is ${HADOOP_CONF_DIR}/backup-masters {code} it says the default backup-masters file path is at a hadoop-conf-dir, but shouldn't this be HBASE_CONF_DIR? also adding following lines to conf/hbase-env.sh would be helpful {code} # File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default. export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5631) hbck should handle case where .tableinfo file is missing.
[ https://issues.apache.org/jira/browse/HBASE-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448823#comment-13448823 ] Jonathan Hsieh commented on HBASE-5631: --- Have you tried shutting down the cluster and then restarting it? I have a suspicion that this may not work if the HTD isn't cached. Could you modify the test (add a few lines) from HBASE-6516 to verify that this patch fixes the table? {code} + HTableDescriptor[] htds = getHTableDescriptors(tmpList); // this goes to master which goes to the filesystem {code} Nits: instead of this: {code} + Path hbaseRoot = new Path(conf.get(HConstants.HBASE_DIR)); {code} use this: {code} FSUtils.getRootDir(conf); {code} Are we purposely updating the passed-in array? Could we just use tmpList? {code} + List<String> tmpList = new ArrayList<String>(); + tmpList.addAll(orphanTableDirs); + HTableDescriptor[] htds = getHTableDescriptors(tmpList); + Iterator iter = orphanTableDirs.iterator(); + int j = 0; + while (iter.hasNext()) { + String tableName = (String) iter.next(); + {code} I wasn't consistent with error.print vs log. I think I prefer log. Any reason you picked this vs the other? {code} + errors.print("Try to fix orphan table: " + tableName); .. + errors.print("fixing table: " + tableName); .. + errors.report("Failed to fix orphan table: " + tableName); {code} typo/reword: hfsck -> hbck, "It is strongly recommended that you re-run hbck manually since orphan table dirs have been fixed" {code} + LOG.warn("Strongly recommend to re-run manually hfsck after all orphanTableDirs being fixed"); {code} hbck should handle case where .tableinfo file is missing. - Key: HBASE-5631 URL: https://issues.apache.org/jira/browse/HBASE-5631 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jie Huang Attachments: hbase-5631.patch 0.92+ branches have a .tableinfo file which could be missing from hdfs. 
hbck should be able to detect and repair this properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
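The copy-then-iterate pattern under review (copying orphanTableDirs into tmpList before walking it) can be sketched as follows. This is a hypothetical simplification of the hbck loop, with a stand-in predicate for the actual "fetch the HTableDescriptor and rewrite .tableinfo" step:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified version of the loop under review: iterate a
// snapshot of the orphan set so entries can be removed from the original
// set as they are fixed, without a ConcurrentModificationException.
class OrphanFixDemo {
    static Set<String> fixOrphans(Set<String> orphanTableDirs) {
        List<String> tmpList = new ArrayList<String>(orphanTableDirs);
        for (String tableName : tmpList) {
            // stand-in for the real repair attempt on this table dir
            boolean fixed = !tableName.isEmpty();
            if (fixed) {
                orphanTableDirs.remove(tableName);
            }
        }
        return orphanTableDirs; // whatever remains could not be fixed
    }
}
```

Iterating the copy while mutating the original answers the "could we just use tmpList?" question: tmpList exists only so the original set can shrink safely during the walk.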
[jira] [Commented] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong
[ https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448824#comment-13448824 ] Hudson commented on HBASE-6288: --- Integrated in HBase-0.94 #449 (See [https://builds.apache.org/job/HBase-0.94/449/]) HBASE-6288 In hbase-daemons.sh, description of the default backup-master file path is wrong (Revision 1381219) Result = FAILURE stack : Files : * /hbase/branches/0.94/bin/master-backup.sh * /hbase/branches/0.94/conf/hbase-env.sh In hbase-daemons.sh, description of the default backup-master file path is wrong Key: HBASE-6288 URL: https://issues.apache.org/jira/browse/HBASE-6288 Project: HBase Issue Type: Task Components: master, scripts, shell Affects Versions: 0.92.0, 0.92.1, 0.94.0 Reporter: Benjamin Kim Fix For: 0.92.3, 0.94.2 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, HBASE-6288-94.patch, HBASE-6288-trunk.patch In hbase-daemons.sh, description of the default backup-master file path is wrong {code} # HBASE_BACKUP_MASTERS File naming remote hosts. # Default is ${HADOOP_CONF_DIR}/backup-masters {code} it says the default backup-masters file path is at a hadoop-conf-dir, but shouldn't this be HBASE_CONF_DIR? also adding following lines to conf/hbase-env.sh would be helpful {code} # File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default. export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448843#comment-13448843 ] ramkrishna.s.vasudevan commented on HBASE-6438: --- @Stack Sorry for missing this review comment all these days. Actually we would like to get HBASE-6299 in as well as this patch. As you mentioned, can we give a patch for 0.94 and 0.92 combining both? We faced HBASE-6299 recently in one of our tests. Both should be useful. RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies -- Key: HBASE-6438 URL: https://issues.apache.org/jira/browse/HBASE-6438 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Attachments: HBASE-6438_trunk.patch Seeing some of the recent issues in region assignment, RegionAlreadyInTransitionException is one reason after which the region assignment may or may not happen (in the sense that we need to wait for the TM to assign). In HBASE-6317 we hit one problem due to RegionAlreadyInTransitionException on master restart. Consider the following case: due to some reason like a master restart or an external assign call, we try to assign a region that is already being opened on an RS. Now the next call to assign has already changed the state of the znode, so the current open in progress on the RS is affected and fails. The second assignment that started also fails, getting a RAITE exception. In the end, neither assignment completes. The idea is to find whether any such RAITE exception can be retried or not. Here again we have the following cases: - The znode is yet to be transitioned from OFFLINE to OPENING on the RS. - The RS may be in the openRegion step. - The RS may be trying to transition OPENING to OPENED. - The region is yet to be added to the online regions on the RS side. In openRegion() and updateMeta(), on any failure we move the znode to FAILED_OPEN, so in these cases retrying after a RAITE should be ok. 
But in other cases the assignment is stopped. The idea is to add the current state of the region assignment to the RIT map on the RS side; using that info we can determine whether the assignment can be retried or not on getting a RAITE. Considering the current work going on in the AM, please do share whether this is needed at least in the 0.92/0.94 versions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong
[ https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6288: - Fix Version/s: 0.96.0 In hbase-daemons.sh, description of the default backup-master file path is wrong Key: HBASE-6288 URL: https://issues.apache.org/jira/browse/HBASE-6288 Project: HBase Issue Type: Task Components: master, scripts, shell Affects Versions: 0.92.0, 0.92.1, 0.94.0 Reporter: Benjamin Kim Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, HBASE-6288-94.patch, HBASE-6288-trunk.patch In hbase-daemons.sh, description of the default backup-master file path is wrong {code} # HBASE_BACKUP_MASTERS File naming remote hosts. # Default is ${HADOOP_CONF_DIR}/backup-masters {code} it says the default backup-masters file path is at a hadoop-conf-dir, but shouldn't this be HBASE_CONF_DIR? also adding following lines to conf/hbase-env.sh would be helpful {code} # File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default. export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3866) Script to add regions gradually to a new regionserver.
[ https://issues.apache.org/jira/browse/HBASE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448871#comment-13448871 ] Lars Hofhansl commented on HBASE-3866: -- In my comment above I was referring to Ted's patch to HMaster. I agree the scripts tend to rot (because we do not have a good test framework for them), but they are worth keeping here. So... what about Ted's attached patch? Script to add regions gradually to a new regionserver. -- Key: HBASE-3866 URL: https://issues.apache.org/jira/browse/HBASE-3866 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 0.90.2 Reporter: Aravind Gottipati Priority: Minor Attachments: 3866-max-regions-per-iteration.patch, slow_balancer.rb, slow_balancer.rb When a new region server is brought online, the current balancer kicks off a whole bunch of region moves and causes a lot of regions to be unavailable right away. A slower balancer that gradually balances the cluster is probably a good script to have. I have an initial version that mooches off the region_mover script to do this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6651) Thread safety of HTablePool is doubtful
[ https://issues.apache.org/jira/browse/HBASE-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448874#comment-13448874 ] stack commented on HBASE-6651: -- @Hiroshi Thank you for digging in here. ThreadLocalPool was added by HBASE-2938 a while back. On #1, what do you see as the implications? If it's a pool of threads and all are using thread-locals, why would they need to share info? Can you say more on points #2 and #3 above? What do you suggest we do? Purge ThreadLocalPool? Thanks. Thread safety of HTablePool is doubtful --- Key: HBASE-6651 URL: https://issues.apache.org/jira/browse/HBASE-6651 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.94.1 Reporter: Hiroshi Ikeda Priority: Minor There are some operations in HTablePool that access PoolMap multiple times without any explicit synchronization. For example, HTablePool.closeTablePool() calls PoolMap.values() and then calls PoolMap.remove(). If other threads add new instances to the pool in the middle of those calls, the newly added instances might be dropped. (HTablePool.closeTablePool() also has another problem: calling it from multiple threads causes HTable to be accessed by multiple threads.) Moreover, PoolMap itself is not thread safe for the same reason. For example, PoolMap.put() calls ConcurrentMap.get() and then calls ConcurrentMap.put(). If another thread adds a new instance to the concurrent map in the middle of those calls, the new instance might be dropped. And also implementations of Pool have the same problems. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
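Hiroshi's point #1 — that values parked in a ThreadLocal cannot be reached from other threads for cleanup — can be demonstrated with a minimal sketch. This is hypothetical illustration code, not the actual ThreadLocalPool from HBASE-2938:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the ThreadLocalPool hazard: each thread sees only
// its own pooled values, so a "close everything" call made from one thread
// cannot reach resources parked in other threads' locals.
class ThreadLocalPoolDemo {
    static final ThreadLocal<Deque<String>> POOL =
        ThreadLocal.withInitial(ArrayDeque::new);

    // A worker thread parks a resource in its own ThreadLocal deque; the
    // caller then inspects the pool from its own thread and sees nothing.
    static int visibleFromCaller() {
        Thread worker = new Thread(() -> POOL.get().push("worker-resource"));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return POOL.get().size(); // caller's thread sees an empty pool
    }
}
```

The worker's resource is stranded: no other thread can enumerate or dispose of it, which is exactly why disposing the container requires cooperation from every thread that ever used the pool.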
[jira] [Commented] (HBASE-3866) Script to add regions gradually to a new regionserver.
[ https://issues.apache.org/jira/browse/HBASE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448875#comment-13448875 ] stack commented on HBASE-3866: -- Patch looks good to me. Commit under a new issue named Add max-regions-per-balance-iteration (or some such) -- (Hey Aravind!) Script to add regions gradually to a new regionserver. -- Key: HBASE-3866 URL: https://issues.apache.org/jira/browse/HBASE-3866 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 0.90.2 Reporter: Aravind Gottipati Priority: Minor Attachments: 3866-max-regions-per-iteration.patch, slow_balancer.rb, slow_balancer.rb When a new region server is brought online, the current balancer kicks off a whole bunch of region moves and causes a lot of regions to be un-available right away. A slower balancer that gradually balances the cluster is probably a good script to have. I have an initial version that mooches off the region_mover script to do this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6302) Document how to run integration tests
[ https://issues.apache.org/jira/browse/HBASE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448878#comment-13448878 ] stack commented on HBASE-6302: -- @Enis Want to have a go at addressing Andrew comments? Or just paste a CLI example here and I'll take care of getting above committed. Document how to run integration tests - Key: HBASE-6302 URL: https://issues.apache.org/jira/browse/HBASE-6302 Project: HBase Issue Type: Sub-task Components: documentation Reporter: stack Assignee: Enis Soztutar Priority: Blocker Fix For: 0.96.0 Attachments: HBASE-6302_v1.patch HBASE-6203 has attached the old IT doc with some mods. When we figure how ITs are to be run, update it and apply the documentation under this issue. Making a blocker against 0.96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong
[ https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448901#comment-13448901 ] Hudson commented on HBASE-6288: --- Integrated in HBase-0.92 #556 (See [https://builds.apache.org/job/HBase-0.92/556/]) HBASE-6288 In hbase-daemons.sh, description of the default backup-master file path is wrong (Revision 1381220) Result = SUCCESS stack : Files : * /hbase/branches/0.92/bin/master-backup.sh * /hbase/branches/0.92/conf/hbase-env.sh In hbase-daemons.sh, description of the default backup-master file path is wrong Key: HBASE-6288 URL: https://issues.apache.org/jira/browse/HBASE-6288 Project: HBase Issue Type: Task Components: master, scripts, shell Affects Versions: 0.92.0, 0.92.1, 0.94.0 Reporter: Benjamin Kim Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, HBASE-6288-94.patch, HBASE-6288-trunk.patch In hbase-daemons.sh, description of the default backup-master file path is wrong {code} # HBASE_BACKUP_MASTERS File naming remote hosts. # Default is ${HADOOP_CONF_DIR}/backup-masters {code} it says the default backup-masters file path is at a hadoop-conf-dir, but shouldn't this be HBASE_CONF_DIR? also adding following lines to conf/hbase-env.sh would be helpful {code} # File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default. export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6398) Print a warning if there is no local datanode
[ https://issues.apache.org/jira/browse/HBASE-6398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448902#comment-13448902 ] Sameer Vaishampayan commented on HBASE-6398: Will work on this. Print a warning if there is no local datanode - Key: HBASE-6398 URL: https://issues.apache.org/jira/browse/HBASE-6398 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Labels: noob When starting up a RS HBase should print out a warning if there is no datanode locally. Lots of optimizations are only available if the data is machine local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong
[ https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-6288: - Assignee: Benjamin Kim In hbase-daemons.sh, description of the default backup-master file path is wrong Key: HBASE-6288 URL: https://issues.apache.org/jira/browse/HBASE-6288 Project: HBase Issue Type: Task Components: master, scripts, shell Affects Versions: 0.92.0, 0.92.1, 0.94.0 Reporter: Benjamin Kim Assignee: Benjamin Kim Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, HBASE-6288-94.patch, HBASE-6288-trunk.patch In hbase-daemons.sh, description of the default backup-master file path is wrong {code} # HBASE_BACKUP_MASTERS File naming remote hosts. # Default is ${HADOOP_CONF_DIR}/backup-masters {code} it says the default backup-masters file path is at a hadoop-conf-dir, but shouldn't this be HBASE_CONF_DIR? also adding following lines to conf/hbase-env.sh would be helpful {code} # File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default. export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6720) Optionally limit number of regions balanced in each balancer run
Lars Hofhansl created HBASE-6720: Summary: Optionally limit number of regions balanced in each balancer run Key: HBASE-6720 URL: https://issues.apache.org/jira/browse/HBASE-6720 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.3 See discussion on HBASE-3866 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3866) Script to add regions gradually to a new regionserver.
[ https://issues.apache.org/jira/browse/HBASE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-3866: - Resolution: Won't Fix Status: Resolved (was: Patch Available) Filed HBASE-6720 Script to add regions gradually to a new regionserver. -- Key: HBASE-3866 URL: https://issues.apache.org/jira/browse/HBASE-3866 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 0.90.2 Reporter: Aravind Gottipati Priority: Minor Attachments: 3866-max-regions-per-iteration.patch, slow_balancer.rb, slow_balancer.rb When a new region server is brought online, the current balancer kicks off a whole bunch of region moves and causes a lot of regions to be un-available right away. A slower balancer that gradually balances the cluster is probably a good script to have. I have an initial version that mooches off the region_mover script to do this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448907#comment-13448907 ] Devaraj Das commented on HBASE-6649: [~zhi...@ebaysf.com] This patch fixes a specific problem to do with replication missing rows, and in my observations, that leads to somewhat frequent TestReplication.queueFailover failures. On trunk, do you know which test hangs? There probably are more issues to fix in the replication area, and we should have follow-up jiras (and this jira is part-1 :)). [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6649-1.patch, 6649-2.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448914#comment-13448914 ] Ted Yu commented on HBASE-6649: --- target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplication.txt was 0 length. There was no JVM left from TestReplication by the time I got back to computer. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6721) RegionServer Group based Assignment
Francis Liu created HBASE-6721: -- Summary: RegionServer Group based Assignment Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Vandana Ayyalasomayajula Fix For: 0.96.0 In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-6721: Attachment: HBASE-6721-DesigDoc.pdf Design document for HBase region server grouping feature. RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Vandana Ayyalasomayajula Fix For: 0.96.0 Attachments: HBASE-6721-DesigDoc.pdf In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3976) Disable Block Cache On Compactions
[ https://issues.apache.org/jira/browse/HBASE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448920#comment-13448920 ] Lars Hofhansl commented on HBASE-3976: -- Hmm... Looking at the code in trunk, this is (mostly) what is currently happening anyway. HStore.createWriterInTmp uses the configured cacheOnWrite setting unless this is a compaction (in which case cacheOnWrite is set to false). There is also a test for this in TestCacheOnWrite. I think we can close this issue. Agreed? Disable Block Cache On Compactions -- Key: HBASE-3976 URL: https://issues.apache.org/jira/browse/HBASE-3976 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.90.3 Reporter: Karthick Sankarachary Assignee: Mikhail Bautin Priority: Minor Attachments: HBASE-3976.patch, HBASE-3976-unconditional.patch, HBASE-3976-V3.patch Is there a good reason to believe that caching blocks during compactions is beneficial? Currently, if block cache is enabled on a certain family, then every time it's compacted, we load all of its blocks into the (LRU) cache, at the expense of the legitimately hot ones. As a matter of fact, this concern was raised earlier in HBASE-1597, which rightly points out that we should not bog down the LRU with unnecessary blocks during compaction. Even though that issue has been marked as fixed, it looks like it ought to be reopened. Should we err on the side of caution and not cache blocks during compactions, period (as illustrated in the attached patch)? Or, can we be selectively aggressive about what blocks do get cached during compaction (e.g., only cache those blocks from the recent files)? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
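The behavior Lars describes can be condensed into a one-line predicate. This is a hypothetical simplification for illustration; the real HStore.createWriterInTmp logic involves more state than these two flags:

```java
// Hypothetical condensation of the trunk behavior described above:
// cache-on-write applies only to writers that are NOT serving a compaction,
// so compaction output never evicts legitimately hot blocks from the LRU.
class CacheOnWriteDemo {
    static boolean shouldCacheOnWrite(boolean configuredCacheOnWrite,
                                      boolean isCompaction) {
        return configuredCacheOnWrite && !isCompaction;
    }
}
```

Under this rule a flush with cache-on-write enabled still populates the cache, while a compaction with the same configuration does not — matching what TestCacheOnWrite is said to verify.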
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448924#comment-13448924 ] Lars Hofhansl commented on HBASE-6649: -- Patch looks good to me. (As Ted points out there might other issues as well) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6649: - Fix Version/s: 0.94.2 0.96.0 I'd also like this in 0.94. The 0.92 patch will probably just apply cleanly; if not, I'll make one. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-3976) Disable Block Cache On Compactions
[ https://issues.apache.org/jira/browse/HBASE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-3976. -- Resolution: Fixed Closing... Please reopen if this should be kept open. Disable Block Cache On Compactions -- Key: HBASE-3976 URL: https://issues.apache.org/jira/browse/HBASE-3976 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.90.3 Reporter: Karthick Sankarachary Assignee: Mikhail Bautin Priority: Minor Attachments: HBASE-3976.patch, HBASE-3976-unconditional.patch, HBASE-3976-V3.patch Is there a good reason to believe that caching blocks during compactions is beneficial? Currently, if block cache is enabled on a certain family, then every time it's compacted, we load all of its blocks into the (LRU) cache, at the expense of the legitimately hot ones. As a matter of fact, this concern was raised earlier in HBASE-1597, which rightly points out that, we should not bog down the LRU with unneccessary blocks during compaction. Even though that issue has been marked as fixed, it looks like it ought to be reopened. Should we err on the side of caution and not cache blocks during compactions period (as illustrated in the attached patch)? Or, can we be selectively aggressive about what blocks do get cached during compaction (e.g., only cache those blocks from the recent files)? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3861) MiniZooKeeperCluster.startup() should refer to hbase.zookeeper.property.maxClientCnxns
[ https://issues.apache.org/jira/browse/HBASE-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-3861: - Looking at MiniZooKeeperCluster in trunk, this is already done: {code} NIOServerCnxnFactory standaloneServerFactory; while (true) { try { standaloneServerFactory = new NIOServerCnxnFactory(); standaloneServerFactory.configure( new InetSocketAddress(tentativePort), configuration.getInt(HConstants.ZOOKEEPER_MAX_CLIENT_CNXNS, 1000)); } catch (BindException e) { {code} Closing. MiniZooKeeperCluster.startup() should refer to hbase.zookeeper.property.maxClientCnxns -- Key: HBASE-3861 URL: https://issues.apache.org/jira/browse/HBASE-3861 Project: HBase Issue Type: Improvement Affects Versions: 0.90.3 Reporter: Eugene Koontz Assignee: Eugene Koontz Attachments: HBASE-3861.patch, HBASE-3861.patch Original Estimate: 1h Remaining Estimate: 1h Currently the number of the client connections is hard-wired to 1000: {noformat} standaloneServerFactory = new NIOServerCnxnFactory(); standaloneServerFactory.configure(new InetSocketAddress(clientPort),1000); } catch (BindException e) { {noformat} This should be set according to the test environment's hbase configuration. The property in question is : hbase.zookeeper.property.maxClientCnxns. Currently some tests such as org.apache.hadoop.hbase.client.TestHCM fail because the number of connections used by the HBase client exceeds 1000. 
Recently MAX_CACHED_HBASE_INSTANCES increased from 31 to 2000 on 0.90 branch: http://svn.apache.org/viewvc/hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java?p2=%2Fhbase%2Fbranches%2F0.90%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhadoop%2Fhbase%2Fclient%2FHConnectionManager.javap1=%2Fhbase%2Fbranches%2F0.90%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhadoop%2Fhbase%2Fclient%2FHConnectionManager.javar1=1096818r2=1096817view=diffpathrev=1096818 and correspondingly the hbase config on the Zookeeper server-side also increased in hbase-default.xml: http://svn.apache.org/viewvc/hbase/branches/0.90/src/main/resources/hbase-default.xml?p2=%2Fhbase%2Fbranches%2F0.90%2Fsrc%2Fmain%2Fresources%2Fhbase-default.xmlp1=%2Fhbase%2Fbranches%2F0.90%2Fsrc%2Fmain%2Fresources%2Fhbase-default.xmlr1=1091594r2=1091593view=diffpathrev=1091594 So if MiniZKCluster looks at this setting, the test won't have this failure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
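The while/try/catch in the trunk snippet above is a bind-and-retry loop: attempt a tentative port, and on BindException advance to the next one. A minimal self-contained sketch of the same pattern, with plain java.net.ServerSocket standing in for NIOServerCnxnFactory (the helper name and starting port are illustrative, not HBase API):

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortRetrySketch {
    // Bind to the first free port at or above startPort, advancing on
    // BindException just as MiniZooKeeperCluster advances tentativePort.
    public static ServerSocket bindFirstFree(int startPort, int maxTries) throws IOException {
        for (int i = 0; i < maxTries; i++) {
            ServerSocket ss = new ServerSocket();
            try {
                ss.bind(new InetSocketAddress(startPort + i));
                return ss;
            } catch (BindException e) {
                ss.close(); // port taken; try the next one
            }
        }
        throw new BindException("no free port in [" + startPort + ", " + (startPort + maxTries) + ")");
    }

    public static void main(String[] args) throws IOException {
        ServerSocket first = bindFirstFree(21818, 50);
        // Asking again from the now-taken port skips past it to a higher one.
        ServerSocket second = bindFirstFree(first.getLocalPort(), 50);
        System.out.println(second.getLocalPort() > first.getLocalPort()); // prints true
        first.close();
        second.close();
    }
}
```

The same shape applies whether the thing being configured is a ServerSocket or a ZooKeeper connection factory; the fix in this issue is only about where the connection limit passed to configure() comes from.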
[jira] [Resolved] (HBASE-3861) MiniZooKeeperCluster.startup() should refer to hbase.zookeeper.property.maxClientCnxns
[ https://issues.apache.org/jira/browse/HBASE-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-3861. -- Resolution: Fixed MiniZooKeeperCluster.startup() should refer to hbase.zookeeper.property.maxClientCnxns -- Key: HBASE-3861 URL: https://issues.apache.org/jira/browse/HBASE-3861 Project: HBase Issue Type: Improvement Affects Versions: 0.90.3 Reporter: Eugene Koontz Assignee: Eugene Koontz Attachments: HBASE-3861.patch, HBASE-3861.patch Original Estimate: 1h Remaining Estimate: 1h Currently the number of the client connections is hard-wired to 1000: {noformat} standaloneServerFactory = new NIOServerCnxnFactory(); standaloneServerFactory.configure(new InetSocketAddress(clientPort),1000); } catch (BindException e) { {noformat} This should be set according to the test environment's hbase configuration. The property in question is : hbase.zookeeper.property.maxClientCnxns. Currently some tests such as org.apache.hadoop.hbase.client.TestHCM fail because the number of connections used by the HBase client exceeds 1000. 
Recently MAX_CACHED_HBASE_INSTANCES increased from 31 to 2000 on 0.90 branch: http://svn.apache.org/viewvc/hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java?p2=%2Fhbase%2Fbranches%2F0.90%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhadoop%2Fhbase%2Fclient%2FHConnectionManager.javap1=%2Fhbase%2Fbranches%2F0.90%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhadoop%2Fhbase%2Fclient%2FHConnectionManager.javar1=1096818r2=1096817view=diffpathrev=1096818 and correspondingly the hbase config on the Zookeeper server-side also increased in hbase-default.xml: http://svn.apache.org/viewvc/hbase/branches/0.90/src/main/resources/hbase-default.xml?p2=%2Fhbase%2Fbranches%2F0.90%2Fsrc%2Fmain%2Fresources%2Fhbase-default.xmlp1=%2Fhbase%2Fbranches%2F0.90%2Fsrc%2Fmain%2Fresources%2Fhbase-default.xmlr1=1091594r2=1091593view=diffpathrev=1091594 So if MiniZKCluster looks at this setting, the test won't have this failure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6302) Document how to run integration tests
[ https://issues.apache.org/jira/browse/HBASE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448933#comment-13448933 ] Enis Soztutar commented on HBASE-6302: -- Sorry guys, I was waiting for HBASE-6241 to be resolved first before updating the patch. Without HBASE-6241 finalized, if we commit the doc, it might be confusing. Document how to run integration tests - Key: HBASE-6302 URL: https://issues.apache.org/jira/browse/HBASE-6302 Project: HBase Issue Type: Sub-task Components: documentation Reporter: stack Assignee: Enis Soztutar Priority: Blocker Fix For: 0.96.0 Attachments: HBASE-6302_v1.patch HBASE-6203 has attached the old IT doc with some mods. When we figure how ITs are to be run, update it and apply the documentation under this issue. Making a blocker against 0.96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3859) Increment a counter when a Scanner lease expires
[ https://issues.apache.org/jira/browse/HBASE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448935#comment-13448935 ] Lars Hofhansl commented on HBASE-3859: -- Patch looks good. Should we commit? I'm a bit fuzzy on the current state of Metrics in HBase (v1 vs v2, etc) Increment a counter when a Scanner lease expires Key: HBASE-3859 URL: https://issues.apache.org/jira/browse/HBASE-3859 Project: HBase Issue Type: Improvement Components: metrics, regionserver Affects Versions: 0.90.2 Reporter: Benoit Sigoure Assignee: Mubarak Seyed Priority: Minor Attachments: HBASE-3859.trunk.v1.patch Whenever a Scanner lease expires, the RegionServer will close it automatically and log a message to complain. I would like the RegionServer to increment a counter whenever this happens and expose this counter through the metrics system, so we can plug this into our monitoring system (OpenTSDB) and keep track of how frequently this happens. It's not supposed to happen frequently so it's good to keep an eye on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3976) Disable Block Cache On Compactions
[ https://issues.apache.org/jira/browse/HBASE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448936#comment-13448936 ] Mikhail Bautin commented on HBASE-3976: --- Lars: thanks for double-checking this! Disable Block Cache On Compactions -- Key: HBASE-3976 URL: https://issues.apache.org/jira/browse/HBASE-3976 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.90.3 Reporter: Karthick Sankarachary Assignee: Mikhail Bautin Priority: Minor Attachments: HBASE-3976.patch, HBASE-3976-unconditional.patch, HBASE-3976-V3.patch Is there a good reason to believe that caching blocks during compactions is beneficial? Currently, if block cache is enabled on a certain family, then every time it's compacted, we load all of its blocks into the (LRU) cache, at the expense of the legitimately hot ones. As a matter of fact, this concern was raised earlier in HBASE-1597, which rightly points out that, we should not bog down the LRU with unneccessary blocks during compaction. Even though that issue has been marked as fixed, it looks like it ought to be reopened. Should we err on the side of caution and not cache blocks during compactions period (as illustrated in the attached patch)? Or, can we be selectively aggressive about what blocks do get cached during compaction (e.g., only cache those blocks from the recent files)? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3854) broken examples
[ https://issues.apache.org/jira/browse/HBASE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448938#comment-13448938 ] Lars Hofhansl commented on HBASE-3854: -- Is this still an issue? (I don't know anything about thrift, so I can't really tell) broken examples --- Key: HBASE-3854 URL: https://issues.apache.org/jira/browse/HBASE-3854 Project: HBase Issue Type: Bug Components: thrift Affects Versions: 0.20.0 Reporter: Alexey Diomin Priority: Minor We introduce NotFound exception in HBASE-1292, but we drop it in HBASE-1367. As a result: 1. incorrect doc in Hbase.thrift and, as a result, in generated java and java-doc 2. broken examples in src/examples/thrift/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448942#comment-13448942 ] Ted Yu commented on HBASE-6649: --- @J-D: What do you think? nit: {code} + } catch (IOException ie) { +break; {code} A log statement is desirable before break. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-3840) Add sanity checks on Configurations to make sure hbase confs have been loaded
[ https://issues.apache.org/jira/browse/HBASE-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-3840. -- Resolution: Won't Fix There appears to be no interest in this for over a year. Closing. Please reopen if you disagree. Add sanity checks on Configurations to make sure hbase confs have been loaded - Key: HBASE-3840 URL: https://issues.apache.org/jira/browse/HBASE-3840 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Todd Lipcon A common user error (and even hbase dev error) is to pass a vanilla Hadoop Configuration into HBase methods that expect to see all of the relevant hbase defaults from hbase-default.xml. This often results in NPE or issues locating ZK. We should add a method like HBaseConfiguration.verify(conf) which ensures that the conf has incorporated hbase-default.xml. We can do this by checking for existence of hbase.defaults.for.version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
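The check Todd proposed is mechanical: look for a marker key that only hbase-default.xml defines. A minimal sketch of the suggested HBaseConfiguration.verify(conf), with a plain Map standing in for Hadoop's Configuration (the method name comes from the issue text; everything else here is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class HBaseConfSanityCheck {
    // Marker key that hbase-default.xml defines (per the issue text);
    // its presence shows the HBase defaults were actually loaded.
    static final String MARKER = "hbase.defaults.for.version";

    // Reject a configuration that never loaded hbase-default.xml,
    // instead of failing later with an NPE or a ZK lookup error.
    public static void verify(Map<String, String> conf) {
        if (conf.get(MARKER) == null) {
            throw new IllegalArgumentException(
                "hbase-default.xml was not loaded; pass an HBase configuration, not a vanilla Hadoop one");
        }
    }

    public static void main(String[] args) {
        Map<String, String> hbaseConf = new HashMap<>();
        hbaseConf.put(MARKER, "0.92.0");
        verify(hbaseConf); // passes silently

        boolean rejected = false;
        try {
            verify(new HashMap<>()); // vanilla conf: no marker key
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        System.out.println(rejected); // prints true
    }
}
```

Calling such a check at the top of every HBase entry point that accepts a Configuration would turn the common user error into an immediate, descriptive failure.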
[jira] [Resolved] (HBASE-3851) A Random-Access Column Object Model
[ https://issues.apache.org/jira/browse/HBASE-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-3851. -- Resolution: Won't Fix Closing, as suggested. @Karthik: Do you want to attach the github link you mentioned? A Random-Access Column Object Model --- Key: HBASE-3851 URL: https://issues.apache.org/jira/browse/HBASE-3851 Project: HBase Issue Type: New Feature Components: client Affects Versions: 0.92.0 Reporter: Karthick Sankarachary Assignee: Karthick Sankarachary Priority: Minor Labels: HBase, Mapping, Object Attachments: HBASE-3851.patch By design, a value in HBase is an opaque and atomic byte array. In theory, any arbitrary type can potentially be represented in terms of such unstructured yet indivisible units. However, as the complexity of the type increases, so does the need to access it in parts rather than in whole. That way, one can update parts of a value without reading the whole first. This calls for transparency in the type of data being accessed. To that end, we introduce here a simple object model where each part maps to a {{HTable}} column and value thereof. Specifically, we define a {{ColumnObject}} interface that denotes an arbitrary type comprising properties, where each property is a {{name, value}} tuple of byte arrays. In essence, each property maps to a distinct HBase {{KeyValue}}. In particular, the property's name maps to a column, prefixed by the qualifier and the object's identifier (assumed to be unique within a column family), and the property's value maps to the {{KeyValue#getValue()}} of the corresponding column. Furthermore, the {{ColumnObject}} is marked as a {{RandomAccess}} type to underscore the fact that its properties can be accessed in and of themselves. For starters, we provide three concrete objects - a {{ColumnMap}}, {{ColumnList}} and {{ColumnSet}} that implement the {{Map}}, {{List}} and {{Set}} interfaces respectively. 
The {{ColumnMap}} treats each {{Map.Entry}} as an object property, the {{ColumnList}} stores each element against its ordinal position, and the {{ColumnSet}} considers each element as the property name (as well as its value). For the sake of convenience, we also define extensions to the {{Get}}, {{Put}}, {{Delete}} and {{Result}} classes that are aware of and know how to deal with such {{ColumnObject}} types. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
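As a sketch of the property-to-column mapping described above: each property flattens to one column whose name is prefixed by the qualifier and the object's identifier, so a single property can be read or written without touching the rest. The naming scheme and helper below are hypothetical, and Strings stand in for the byte[] values of real KeyValues:

```java
import java.util.Map;
import java.util.TreeMap;

public class ColumnMapSketch {
    // Flatten a ColumnMap-style object into distinct column names:
    // "<qualifier>.<objectId>.<propertyName>" -> value. Each entry would
    // map to one HBase KeyValue, giving random access to properties.
    public static Map<String, String> toColumns(String qualifier, String objectId,
                                                Map<String, String> properties) {
        Map<String, String> columns = new TreeMap<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            columns.put(qualifier + "." + objectId + "." + e.getKey(), e.getValue());
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, String> address = new TreeMap<>();
        address.put("city", "Athens");
        address.put("zip", "30606");
        // One object ("addr42") under qualifier "profile" becomes two columns.
        System.out.println(toColumns("profile", "addr42", address));
    }
}
```

Updating just the zip would then be a Put on the single column "profile.addr42.zip", which is the level of granularity the patch's Get/Put/Delete extensions aim for.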
[jira] [Commented] (HBASE-3834) Store ignores checksum errors when opening files
[ https://issues.apache.org/jira/browse/HBASE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448950#comment-13448950 ] Lars Hofhansl commented on HBASE-3834: -- Should we close this? There appears to be little interest in it. Store ignores checksum errors when opening files Key: HBASE-3834 URL: https://issues.apache.org/jira/browse/HBASE-3834 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.2 Reporter: Todd Lipcon Priority: Critical Fix For: 0.90.8 If you corrupt one of the storefiles in a region (eg using vim to muck up some bytes), the region will still open, but that storefile will just be ignored with a log message. We should probably not do this in general - better to keep that region unassigned and force an admin to make a decision to remove the bad storefile. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3828) region server stuck in waitOnAllRegionsToClose
[ https://issues.apache.org/jira/browse/HBASE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448964#comment-13448964 ] Lars Hofhansl commented on HBASE-3828: -- I assume with all the recent work this has been fixed. @Ram, @Stack: Would you agree with that? If so, we can just close this. region server stuck in waitOnAllRegionsToClose -- Key: HBASE-3828 URL: https://issues.apache.org/jira/browse/HBASE-3828 Project: HBase Issue Type: Bug Reporter: Prakash Khemani -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448970#comment-13448970 ] Ted Yu commented on HBASE-6721: --- More details should be added to the design. Have you considered introducing an interface for AssignmentManager so that existing and new managers can be easily swapped? Have you considered storing group information in zookeeper instead of on hdfs? Please explain more about RegionServerGroupProtocol. Thanks for the initiative. RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Vandana Ayyalasomayajula Fix For: 0.96.0 Attachments: HBASE-6721-DesigDoc.pdf In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3859) Increment a counter when a Scanner lease expires
[ https://issues.apache.org/jira/browse/HBASE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448971#comment-13448971 ] Elliott Clark commented on HBASE-3859: -- Wouldn't it be better to use MetricsTimeVaryingLong rather than a MetricsLongValue and an AtomicLong? [~lhofhansl] We're starting to get close to finishing the move to metrics2; however, the HRegionServer is the last part that needs to be moved over. My plan is to move over and clean up stuff in HBASE-4050 in the coming weeks. With that said I still think this can be a useful issue and having it in the Metrics1 version will make sure that it's ported over when the time comes. Increment a counter when a Scanner lease expires Key: HBASE-3859 URL: https://issues.apache.org/jira/browse/HBASE-3859 Project: HBase Issue Type: Improvement Components: metrics, regionserver Affects Versions: 0.90.2 Reporter: Benoit Sigoure Assignee: Mubarak Seyed Priority: Minor Attachments: HBASE-3859.trunk.v1.patch Whenever a Scanner lease expires, the RegionServer will close it automatically and log a message to complain. I would like the RegionServer to increment a counter whenever this happens and expose this counter through the metrics system, so we can plug this into our monitoring system (OpenTSDB) and keep track of how frequently this happens. It's not supposed to happen frequently so it's good to keep an eye on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
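Whichever metrics type wins, the underlying counter is trivial; a self-contained sketch of the AtomicLong variant under discussion (class and method names here are illustrative, not the RegionServer's actual API):

```java
import java.util.concurrent.atomic.AtomicLong;

public class ScannerLeaseMetricsSketch {
    // Counter bumped on every scanner lease expiry. A metrics layer
    // (Metrics1 MetricsLongValue, metrics2, or MetricsTimeVaryingLong)
    // would read this off periodically for export to e.g. OpenTSDB.
    private final AtomicLong expiredScannerLeases = new AtomicLong();

    // Called from the lease-expiry handler, alongside the existing log line.
    public void leaseExpired(long scannerId) {
        expiredScannerLeases.incrementAndGet();
    }

    public long getExpiredScannerLeases() {
        return expiredScannerLeases.get();
    }

    public static void main(String[] args) {
        ScannerLeaseMetricsSketch metrics = new ScannerLeaseMetricsSketch();
        metrics.leaseExpired(1L);
        metrics.leaseExpired(2L);
        System.out.println(metrics.getExpiredScannerLeases()); // prints 2
    }
}
```

The AtomicLong makes the increment safe from the lease-monitor thread without holding any RegionServer locks; the open question in the thread is only which metrics wrapper should publish it.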
[jira] [Resolved] (HBASE-3814) force regionserver to halt
[ https://issues.apache.org/jira/browse/HBASE-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-3814. -- Resolution: Won't Fix There appears to be no interest in this one. Please revive if you think we should do this. force regionserver to halt -- Key: HBASE-3814 URL: https://issues.apache.org/jira/browse/HBASE-3814 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Once abort() on a regionserver is called we should have a timeout thread that does Runtime.halt() if the rs gets stuck somewhere during abort processing. === Pumahbase132 has following the logs .. the dfsclient is not able to set up a write pipeline successfully ... it tries to abort ... but while aborting it gets stuck. I know there is a check that if we are aborting because filesystem is closed then we should not try to flush the logs while aborting. But in this case the fs is up and running, just that it is not functioning. 2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.133.33:50010 2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-8967376451767492285_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.134.59:50010 2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block 
blk_7172251852699100447_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.134.53:50010 2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-9153204772467623625_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.134.49:50010 2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2513098940934276625_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block. 
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3560) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2720) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2977) 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-2513098940934276625_6537229 bad datanode[1] nodes == null 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 - Aborting... 2011-04-21 23:48:07,216 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog And then the RS gets stuck trying to roll the logs ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6720) Optionally limit number of regions balanced in each balancer run
[ https://issues.apache.org/jira/browse/HBASE-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448975#comment-13448975 ] Elliott Clark commented on HBASE-6720: -- When this is put in we need to make sure to change the StochasticLoadBalancer as well. Right now it has a setting hbase.master.balancer.stochastic.maxMoveRegions that sets the maximum number of regions to move at a time. Optionally limit number of regions balanced in each balancer run Key: HBASE-6720 URL: https://issues.apache.org/jira/browse/HBASE-6720 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.3 See discussion on HBASE-3866 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6715) TestFromClientSide.testCacheOnWriteEvictOnClose is flaky
[ https://issues.apache.org/jira/browse/HBASE-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-6715: -- Assignee: Jimmy Xiang TestFromClientSide.testCacheOnWriteEvictOnClose is flaky Key: HBASE-6715 URL: https://issues.apache.org/jira/browse/HBASE-6715 Project: HBase Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Occasionally, this test fails: {noformat} expected:<2049> but was:<2069> Stacktrace java.lang.AssertionError: expected:<2049> but was:<2069> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.client.TestFromClientSide.testCacheOnWriteEvictOnClose(TestFromClientSide.java:4248) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) {noformat} It could be because another thread is still accessing the cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448987#comment-13448987 ] Himanshu Vashishtha commented on HBASE-6649: lgtm. The exception will be re-thrown in the next try, so +0 on adding a log statement before break. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448988#comment-13448988 ] stack commented on HBASE-6649: -- J-D on vacation. Let me commit this. Will add the log message Ted suggests though my sense is it's overkill, let's see. Would suggest new issue for other 'parts' DD. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.0, 0.92.3, 0.94.2 Attachments: 6649-1.patch, 6649-2.txt, 6649-trunk.patch, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6649: --- Attachment: 6649-0.92.patch, 6649-trunk.patch Don't mind adding a few comments around the exception handling.
[jira] [Updated] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6649: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk, 0.92, and 0.94. Thanks for the reviews, lads, and DD for the patch.
[jira] [Updated] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6649: - Attachment: 6649.txt Here is what I applied. It includes Ted's suggested logging. I applied this same patch to 0.94 and 0.92 w/ -p1.
[jira] [Commented] (HBASE-3834) Store ignores checksum errors when opening files
[ https://issues.apache.org/jira/browse/HBASE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448998#comment-13448998 ] Todd Lipcon commented on HBASE-3834: It's still a somewhat scary bug, if it still exists. It causes data to be silently missing from a table. So I hope someone will take interest in it :) Store ignores checksum errors when opening files Key: HBASE-3834 URL: https://issues.apache.org/jira/browse/HBASE-3834 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.2 Reporter: Todd Lipcon Priority: Critical Fix For: 0.90.8 If you corrupt one of the storefiles in a region (eg using vim to muck up some bytes), the region will still open, but that storefile will just be ignored with a log message. We should probably not do this in general - better to keep that region unassigned and force an admin to make a decision to remove the bad storefile. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449007#comment-13449007 ] Ted Yu commented on HBASE-6721: --- Another aspect is fault tolerance. Say the smallest group consists of 6 region servers, the impact of majority of the 6 servers going down at the same time is much higher than 6 servers out of whole cluster going down where there is only one group. RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Vandana Ayyalasomayajula Fix For: 0.96.0 Attachments: HBASE-6721-DesigDoc.pdf In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6722) fixHdfsOrphans won't work for first/end regions
Adrien Mogenet created HBASE-6722: - Summary: fixHdfsOrphans won't work for first/end regions Key: HBASE-6722 URL: https://issues.apache.org/jira/browse/HBASE-6722 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.1 Reporter: Adrien Mogenet When a .regioninfo is missing on the first (or final) region, it will try to determine the startKey (or endKey) based on what it has been seen on the HDFS. However, for these special cases an empty key should be considered instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
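The special-casing HBASE-6722 asks for can be sketched as a tiny helper. This is purely illustrative; the class and method names below are hypothetical and are not hbck's actual code. The point is only that the first region's start key and the last region's end key must be the empty byte array, regardless of what keys were observed in the HFiles on HDFS.

```java
import java.util.Arrays;

// Hypothetical sketch of the empty-key special case described in HBASE-6722.
public class OrphanKeyFix {
  static final byte[] EMPTY_KEY = new byte[0];

  // The first region of a table always starts at the empty key,
  // even if the HFiles on HDFS suggest a later start key.
  public static byte[] chooseStartKey(boolean isFirstRegion, byte[] observedStartKey) {
    return isFirstRegion ? EMPTY_KEY : observedStartKey;
  }

  // Symmetrically, the last region always ends at the empty key.
  public static byte[] chooseEndKey(boolean isLastRegion, byte[] observedEndKey) {
    return isLastRegion ? EMPTY_KEY : observedEndKey;
  }

  public static void main(String[] args) {
    byte[] observed = "row-0042".getBytes();
    System.out.println(Arrays.toString(chooseStartKey(true, observed)));
    System.out.println(Arrays.toString(chooseStartKey(false, observed)));
  }
}
```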
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449035#comment-13449035 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.94 #450 (See [https://builds.apache.org/job/HBase-0.94/450/]) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381289) Result = FAILURE stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
[jira] [Commented] (HBASE-6669) Add BigDecimalColumnInterpreter for doing aggregations using AggregationClient
[ https://issues.apache.org/jira/browse/HBASE-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449067#comment-13449067 ] Ted Yu commented on HBASE-6669: --- Since there're two BigDecimal fields in BigDecimalColumnInterpreter, you need to implement readFields() and write() for serialization. Add BigDecimalColumnInterpreter for doing aggregations using AggregationClient -- Key: HBASE-6669 URL: https://issues.apache.org/jira/browse/HBASE-6669 Project: HBase Issue Type: New Feature Components: client, coprocessors Reporter: Anil Gupta Priority: Minor Labels: client, coprocessors Attachments: BigDecimalColumnInterpreter.java, BigDecimalColumnInterpreter.patch, BigDecimalColumnInterpreter.patch I recently created a Class for doing aggregations(sum,min,max,std) on values stored as BigDecimal in HBase. I would like to commit the BigDecimalColumnInterpreter into HBase. In my opinion this class can be used by a wide variety of users. Please let me know if its not appropriate to add this class in HBase. Thanks, Anil Gupta Software Engineer II, Intuit, Inc -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
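Ted's suggestion above (implementing readFields()/write() so the two BigDecimal fields survive serialization) could look roughly like the following. This is an illustrative sketch only, using plain java.io streams rather than Hadoop's Writable interface, and the scale-plus-unscaled-bytes encoding is an assumption, not the actual patch code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch: a BigDecimal round-trips losslessly as its scale
// plus the bytes of its unscaled value.
public class BigDecimalSerde {

  public static void write(DataOutputStream out, BigDecimal value) throws IOException {
    out.writeInt(value.scale());
    byte[] unscaled = value.unscaledValue().toByteArray();
    out.writeInt(unscaled.length);
    out.write(unscaled);
  }

  public static BigDecimal readFields(DataInputStream in) throws IOException {
    int scale = in.readInt();
    byte[] unscaled = new byte[in.readInt()];
    in.readFully(unscaled);
    return new BigDecimal(new BigInteger(unscaled), scale);
  }

  public static void main(String[] args) throws IOException {
    BigDecimal original = new BigDecimal("12345.6789");
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    write(new DataOutputStream(buf), original);
    BigDecimal back =
        readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    if (!original.equals(back)) throw new AssertionError(back);
    System.out.println("round-trip ok: " + back);
  }
}
```

In the real ColumnInterpreter, the same logic would live inside the Writable write(DataOutput) and readFields(DataInput) overrides.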
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449069#comment-13449069 ] Hudson commented on HBASE-6649: --- Integrated in HBase-TRUNK #3307 (See [https://builds.apache.org/job/HBase-TRUNK/3307/]) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381287) Result = FAILURE stack : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
[jira] [Commented] (HBASE-3803) Make the load balancer run with a gentle hand
[ https://issues.apache.org/jira/browse/HBASE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449085#comment-13449085 ] Lars Hofhansl commented on HBASE-3803: -- Can we subsume this in HBASE-6720. Make the load balancer run with a gentle hand - Key: HBASE-3803 URL: https://issues.apache.org/jira/browse/HBASE-3803 Project: HBase Issue Type: Improvement Reporter: stack We need 'smoothing' of balancer region move Yesterday we brought a regionserver back online into a smallish cluster that was under load and the balance run unloaded a bunch of regions all in the one go which put a dent in the throughput when a bunch of regions went offline at the one time. It'd be sweet if the balancer ran at a context appropriate 'rate'; when under load, it should move regions 'gently' rather than all as a big bang (the decommission script will move a region at a time, verifying it deployed in its new location before moving another... this can take ages to complete but its proven minimally disruptive to loadings) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-879) When dfs restarts or moves blocks around, hbase regionservers don't notice
[ https://issues.apache.org/jira/browse/HBASE-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates resolved HBASE-879. --- Resolution: Fixed I think this is fixed in the current versions of HBase. Reopen if I'm mistaken. When dfs restarts or moves blocks around, hbase regionservers don't notice -- Key: HBASE-879 URL: https://issues.apache.org/jira/browse/HBASE-879 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.18.1, 0.19.0 Reporter: Michael Bieniosek Since the hbase regionservers use a DFSClient to keep handles open to the dfs, if the dfs blocks move around (typically because of a dfs restart, but can also happen if datanodes die or blocks get shuffled around), the regionserver will be unable to service the region. It would be nice if the DFSClient that the regionservers use could notice this case and refresh the block list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6017) TestReplication fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates resolved HBASE-6017. Resolution: Duplicate DUP of HBASE-6649 TestReplication fails occasionally -- Key: HBASE-6017 URL: https://issues.apache.org/jira/browse/HBASE-6017 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Reporter: Devaraj Das I see occasional failures in TestReplication on the 0.92 branch. Running org.apache.hadoop.hbase.replication.TestReplication Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 240.118 sec FAILURE! Results : Failed tests: queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too much time for queueFailover replication -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-917) filesystem intensive operations such as compaction should be load aware
[ https://issues.apache.org/jira/browse/HBASE-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates resolved HBASE-917. --- Resolution: Won't Fix I think this can be handled via coprocessor hooks - closing as won't fix. filesystem intensive operations such as compaction should be load aware --- Key: HBASE-917 URL: https://issues.apache.org/jira/browse/HBASE-917 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Andrew Purtell If the underlying filesystem is already severely stressed, running intensive operations such as compaction is asking for trouble. Ideally, such actions should be deferred until load is observed to lessen. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-1042) OOME but we don't abort
[ https://issues.apache.org/jira/browse/HBASE-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates resolved HBASE-1042. Resolution: Fixed This is fixed against trunk, according to the comments. Reopen if still an issue. OOME but we don't abort --- Key: HBASE-1042 URL: https://issues.apache.org/jira/browse/HBASE-1042 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Attachments: 1042-committed.patch, 1042.patch, 1042-v2.patch On streamy cluster saw case where graceful shutdown had been triggered rather than an abort on OOME. On graceful shutdown, we wait on leases to expire or be closed. Server wouldn't go down because it was waiting on leases to expire only an OOME in Leases had killed the thread so it wasn't ever going to expire anything. Node was stuck for four hours till someone noticed it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449147#comment-13449147 ] Hudson commented on HBASE-6649: --- Integrated in HBase-0.92 #557 (See [https://builds.apache.org/job/HBase-0.92/557/]) HBASE-6649 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] (Revision 1381291) Result = SUCCESS stack : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
[jira] [Updated] (HBASE-6610) HFileLink: Hardlink alternative for snapshot restore
[ https://issues.apache.org/jira/browse/HBASE-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-6610: --- Attachment: HBASE-6610-v3.patch HFileLink: Hardlink alternative for snapshot restore Key: HBASE-6610 URL: https://issues.apache.org/jira/browse/HBASE-6610 Project: HBase Issue Type: Sub-task Components: io Affects Versions: 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Labels: snapshot Fix For: 0.96.0 Attachments: HBASE-6610-v1.patch, HBASE-6610-v2.patch, HBASE-6610-v3.patch To avoid copying data during restore snapshot we need to introduce an HFile Link that allows to reference a file that can be in the original path (/hbase/table/region/cf/hfile) or, if the file is archived, in the archive directory (/hbase/.archive/table/region/cf/hfile). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HBASE-917) filesystem intensive operations such as compaction should be load aware
[ https://issues.apache.org/jira/browse/HBASE-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-917: - Reopening. The vehicle by which we achieve this issue may be a coprocessor, but the actual work still needs to be done. I'd say we should leave this issue open. You might argue the issue is without sufficient detail; you might get away w/ closing it with that justification.
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449185#comment-13449185 ] Francis Liu commented on HBASE-6721: Have you considered introducing an interface for AssignmentManager so that existing and new managers can be easily swapped? Yes, part of the proposal is to make AssignmentManager pluggable. I'll add that as a subtask for this. Have you considered storing group information in zookeeper instead of on hdfs? Correct me if I'm wrong, but it seems the approach HBase has taken for its usage of ZK is more towards storing temporal data for coordination, and the real source of truth is on HDFS or in tables. We decided to follow the same approach. Please explain more about RegionServerGroupProtocol. RegionServerGroupProtocol exposes APIs to manage grouping (see API in doc). The current plan is that these APIs will be used and exposed via the CLI commands. Another aspect is fault tolerance. Say the smallest group consists of 6 region servers; the impact of the majority of those 6 servers going down at the same time is much higher than 6 servers going down out of the whole cluster when there is only one group. This is similar to hbase cluster sizing for fault tolerance. Let's play around with it and later on document best practices.
[jira] [Created] (HBASE-6723) Make AssignmentManager pluggable
Francis Liu created HBASE-6723: -- Summary: Make AssignmentManager pluggable Key: HBASE-6723 URL: https://issues.apache.org/jira/browse/HBASE-6723 Project: HBase Issue Type: Sub-task Reporter: Francis Liu -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5065) wrong IllegalArgumentException thrown when creating an 'HServerAddress' with an un-reachable hostname
[ https://issues.apache.org/jira/browse/HBASE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo resolved HBASE-5065. - Resolution: Invalid This looks like it is now fixed in trunk and 0.90. checkBindAddressCanBeResolved() now has a null check and throws an IllegalArgumentException with an appropriate message. This class is also deprecated. Please reopen if you think differently. wrong IllegalArgumentException thrown when creating an 'HServerAddress' with an un-reachable hostname - Key: HBASE-5065 URL: https://issues.apache.org/jira/browse/HBASE-5065 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.90.4 Reporter: Eran Hirsch Priority: Trivial When trying to build an 'HServerAddress' object with an unresolvable hostname: e.g. new HServerAddress(www.IAMUNREACHABLE.com:80) a call to 'getResolvedAddress' would cause the 'InetSocketAddress' c'tor to throw an IllegalArgumentException because it is called with a null 'hostname' parameter. This happens because there is no null-check after the static 'getBindAddressInternal' method returns a null value when the hostname is unresolved. This is a trivial bug because HServerAddress is expected to throw this kind of exception when this error occurs, but it is thrown for the wrong reason. The method 'checkBindAddressCanBeResolved' should be the one throwing the exception (and give a slightly different reason). Because of this, the method call itself becomes redundant as it will always succeed in the current flow, because the case it checks is already checked for by the previous getResolvedAddress method. In short: an IllegalArgumentException is thrown with reason: hostname can't be null from the InetSocketAddress c'tor INSTEAD OF an IllegalArgumentException with reason: Could not resolve the DNS name of [BADHOSTNAME]:[PORT] from HServerAddress's checkBindAddressCanBeResolved method.
Stack trace:
java.lang.IllegalArgumentException: hostname can't be null
	at java.net.InetSocketAddress.<init>(InetSocketAddress.java:139) ~[na:1.7.0_02]
	at org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:64) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:579) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:688) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:688) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:173) ~[hbase-0.90.4.jar:0.90.4]
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147) ~[hbase-0.90.4.jar:0.90.4]
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
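The ordering fix the HBASE-5065 report asks for (resolve first, then fail with the descriptive message, rather than handing a null hostname to the InetSocketAddress constructor) could be sketched as below. The class and method names are illustrative only, not HBase's actual code.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of the resolve-then-check ordering from HBASE-5065.
public class ResolveFirst {

  // Mirrors getBindAddressInternal() returning null for an unresolvable host.
  static String resolveOrNull(String host) {
    try {
      return InetAddress.getByName(host).getHostAddress();
    } catch (UnknownHostException e) {
      return null;
    }
  }

  public static InetSocketAddress getResolvedAddress(String host, int port) {
    String resolved = resolveOrNull(host);
    if (resolved == null) {
      // Fail here, with the useful reason, instead of letting the
      // InetSocketAddress c'tor complain that "hostname can't be null".
      throw new IllegalArgumentException(
          "Could not resolve the DNS name of " + host + ":" + port);
    }
    return new InetSocketAddress(resolved, port);
  }

  public static void main(String[] args) {
    try {
      getResolvedAddress("no-such-host.invalid", 80);
      System.out.println("unexpectedly resolved");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The ".invalid" TLD is reserved and never resolves, so the example reliably takes the error path.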
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449238#comment-13449238 ] Ted Yu commented on HBASE-6721: --- Looking at the current doc, GroupInfo would be passed to the (new) AssignmentManager. Do you plan to reference GroupInfo in the AssignmentManager interface ?
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449245#comment-13449245 ] Jeff Whiting commented on HBASE-6165: - I may be a little late to the party, but why is replication using any kind of higher than normal priority handlers? It looks like we all agree that they shouldn't be using the high priority handlers, and it looks like they now have their own medium priority handlers. But I don't see an argument as to why they don't just use the normal priority handlers. Replication can overrun .META. scans on cluster re-start Key: HBASE-6165 URL: https://issues.apache.org/jira/browse/HBASE-6165 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.2 Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch When restarting a large set of regions on a reasonably small cluster, the replication from another cluster tied up every xceiver, meaning nothing could be onlined. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6659) Port HBASE-6508 Filter out edits at log split time
[ https://issues.apache.org/jira/browse/HBASE-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449251#comment-13449251 ] Ted Yu commented on HBASE-6659: --- For last flushed sequence Id, another option is to embed it in HRegionInfo. This way, there is no need to modify RegionLoad. Port HBASE-6508 Filter out edits at log split time -- Key: HBASE-6659 URL: https://issues.apache.org/jira/browse/HBASE-6659 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu Assignee: Zhihong Ted Yu Fix For: 0.96.0 Attachments: 6508-v2.txt, 6508-v3.txt, 6508-v4.txt, 6508-v5.txt, 6508-v7.txt, 6508-v7.txt HBASE-6508 is for 0.89-fb branch. This JIRA ports the feature to trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6723) Make AssignmentManager pluggable
[ https://issues.apache.org/jira/browse/HBASE-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449262#comment-13449262 ] stack commented on HBASE-6723: -- One thought is that AM as is should not be pluggable. It's way too fat, doing too many things, such as actual RPCs inside AM. My guess is you don't want your AM replacement doing RPCs and handling zk callbacks directly; that should be done by a wrapper class, and what you want to replace is some nugget core that makes the assignment decisions — something we don't yet have but badly need, if only to make AM decision making more testable. Go easy Francis. Make AssignmentManager pluggable Key: HBASE-6723 URL: https://issues.apache.org/jira/browse/HBASE-6723 Project: HBase Issue Type: Sub-task Reporter: Francis Liu
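stack's suggestion — replace only the decision-making "nugget" and leave RPC/ZooKeeper plumbing to a wrapper — could look roughly like the sketch below. None of these types exist in HBase; they only illustrate the shape of a pluggable, side-effect-free assignment policy.

```java
import java.util.*;

// Hypothetical sketch: isolate the pure assignment decision behind a small
// interface; a wrapper (standing in for AssignmentManager) would own the RPCs
// and ZK callbacks. The policy is a pure function, so it is trivially testable.
interface AssignmentPolicy {
    /** Decide a target server for each region. No RPCs, no ZK, no state. */
    Map<String, String> assign(List<String> regions, List<String> servers);
}

class RoundRobinPolicy implements AssignmentPolicy {
    public Map<String, String> assign(List<String> regions, List<String> servers) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (int i = 0; i < regions.size(); i++) {
            plan.put(regions.get(i), servers.get(i % servers.size()));
        }
        return plan;
    }
}

public class AssignmentManagerSketch {
    public static void main(String[] args) {
        AssignmentPolicy policy = new RoundRobinPolicy();
        Map<String, String> plan = policy.assign(
            Arrays.asList("region-a", "region-b", "region-c"),
            Arrays.asList("rs1", "rs2"));
        System.out.println(plan);
    }
}
```

Swapping `RoundRobinPolicy` for another implementation exercises only decision logic, which is the testability win stack is after.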
[jira] [Updated] (HBASE-6715) TestFromClientSide.testCacheOnWriteEvictOnClose is flaky
[ https://issues.apache.org/jira/browse/HBASE-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6715: --- Status: Patch Available (was: Open) TestFromClientSide.testCacheOnWriteEvictOnClose is flaky Key: HBASE-6715 URL: https://issues.apache.org/jira/browse/HBASE-6715 Project: HBase Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6715.patch Occasionally, this test fails:
{noformat}
expected:<2049> but was:<2069>
Stacktrace
java.lang.AssertionError: expected:<2049> but was:<2069>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.junit.Assert.assertEquals(Assert.java:456)
	at org.apache.hadoop.hbase.client.TestFromClientSide.testCacheOnWriteEvictOnClose(TestFromClientSide.java:4248)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
{noformat}
It could be because some other thread is still accessing the cache.
[jira] [Updated] (HBASE-6715) TestFromClientSide.testCacheOnWriteEvictOnClose is flaky
[ https://issues.apache.org/jira/browse/HBASE-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6715: --- Attachment: trunk-6715.patch
[jira] [Commented] (HBASE-6715) TestFromClientSide.testCacheOnWriteEvictOnClose is flaky
[ https://issues.apache.org/jira/browse/HBASE-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449266#comment-13449266 ] stack commented on HBASE-6715: -- Is this a fix, or more debug to find why it fails? I'm +1 on commit in either case.
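Asserting an exact cache-metric value (`expected: 2049 but was: 2069`) is inherently racy if another thread can touch the cache. One common de-flaking pattern — a generic sketch, not the actual contents of trunk-6715.patch — is to poll the metric until it stops changing before asserting:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Generic de-flaking sketch: rather than asserting a shared metric equals an
// expected value immediately, poll until it is unchanged across an interval.
// This is an illustration of the pattern, not HBase test code.
public class WaitForStable {
    static long waitUntilStable(LongSupplier metric, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        long last = metric.getAsLong();
        while (System.currentTimeMillis() < deadline) {
            Thread.sleep(20);
            long now = metric.getAsLong();
            if (now == last) return now;  // unchanged across an interval: treat as settled
            last = now;
        }
        return last;
    }

    public static void main(String[] args) throws Exception {
        AtomicLong cacheSize = new AtomicLong(2049);
        // A background thread briefly perturbs the counter, like a concurrent cache reader.
        Thread reader = new Thread(() -> {
            cacheSize.set(2069);
            try { Thread.sleep(5); } catch (InterruptedException ignored) {}
            cacheSize.set(2049);
        });
        reader.start();
        reader.join();
        long settled = waitUntilStable(cacheSize::get, 1000);
        System.out.println(settled);
    }
}
```

The trade-off is a small added latency per assertion in exchange for tolerating transient activity from other threads.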