[jira] [Created] (HBASE-8728) HBase table schema
JOB M THOMAS created HBASE-8728:
---

Summary: HBase table schema
Key: HBASE-8728
URL: https://issues.apache.org/jira/browse/HBASE-8728
Project: HBase
Issue Type: Task
Reporter: JOB M THOMAS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8728) HBase table schema
[ https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JOB M THOMAS updated HBASE-8728:
---
Description:
Hi friends,

This is my data in a file:

125829086 Llandovery 501
125829087 Tamil 461
125829088 throbless 736
125829089 pondside 195
125829090 oxyterpene 791
125829091 subofficer 416
125829092 paleornithology 734
125829093 kenno 80
125829094 oratorship 565
125829095 Cimmerianism 499
125829096 jharal 985
125829097 genii 330
125829098 qualminess 340
125829099 blurredness 57
125829100 topline 803

I have to create an HBase table for this. The first number can be used as the row key, and the second and third fields as two columns in HBase. Please help me create the table. I have searched a lot on Google but have not found any solution for creating a table with one column family and two columns under it. Please help me...
[jira] [Commented] (HBASE-8728) HBase table schema
[ https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680258#comment-13680258 ]

Anoop Sam John commented on HBASE-8728:
---
Please have a look at the ImportTsv tool, which supports your need. And please don't raise tickets in JIRA for this kind of help. You can send mail to the user@ mailing list and the people there can help you out.
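As a rough illustration of the ImportTsv suggestion: the tool expects tab-separated input by default, with the target columns named through `-Dimporttsv.columns` (for example `HBASE_ROW_KEY,cf:word,cf:count`, where the family `cf` and the qualifiers `word` and `count` are names assumed here, not taken from the ticket). The sketch below only reshapes the reporter's space-separated rows into that tab-separated form; the class `WordRowsToTsv` is hypothetical and is not part of HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of HBase: reshapes "rowkey word count" rows into TSV lines. */
class WordRowsToTsv {

    /** Converts whitespace-separated three-field lines into "rowkey<TAB>word<TAB>count". */
    static List<String> toTsv(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length != 3) {
                throw new IllegalArgumentException("expected 3 fields: " + line);
            }
            // Field 0 would become the row key, fields 1 and 2 the two columns of one family.
            out.add(String.join("\t", fields));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("125829086 Llandovery 501", "125829087 Tamil 461");
        for (String row : toTsv(sample)) {
            System.out.println(row);
        }
    }
}
```

The table itself would still need to exist, with its single column family, before the import runs.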
[jira] [Created] (HBASE-8729) distributedLogReplay may hang during chained region server failure
Jeffrey Zhong created HBASE-8729:
---

Summary: distributedLogReplay may hang during chained region server failure
Key: HBASE-8729
URL: https://issues.apache.org/jira/browse/HBASE-8729
Project: HBase
Issue Type: Bug
Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Fix For: 0.98.0, 0.95.2

In a test, half of the cluster (in terms of region servers) was down, and some log replays incurred chained RS failures (the RS receiving a log replay failed again). By default we allow only 3 concurrent SSH (ServerShutdownHandler) handlers, controlled by
{code}
this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS,
  conf.getInt("hbase.master.executor.serverops.threads", 3));
{code}
If all 3 SSH handlers are doing logReplay (a blocking call) and one of the receiving RSs fails again, logReplay hangs: the regions of the newly failed RS can't be re-assigned to another live RS (no SSH handler will be processed, because of the max threads setting), and the existing log replay keeps routing replay traffic to the dead RS.

The fix is to submit logReplay work to a separate type of executor queue so that it does not block SSH region assignment; logReplay can then route traffic to a live RS after retries and move forward.
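The starvation described in this report can be reproduced in miniature with plain `java.util.concurrent` pools. This is only a toy model under stated assumptions: the class `ReplayStarvationDemo` and its method are hypothetical, a latch stands in for "regions re-assigned to a live RS", and none of it is the master's actual executor code or the attached patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model only: fixed thread pools stand in for the master's executor queues. */
class ReplayStarvationDemo {

    /**
     * Starts `handlers` blocking "log replay" tasks, then one "region assignment" task
     * that those replays are waiting for. Returns true if all replays finish in time.
     */
    static boolean replaysFinish(boolean separateReplayPool, int handlers, long timeoutMs) {
        ExecutorService serverOps = Executors.newFixedThreadPool(handlers);
        ExecutorService replayOps =
                separateReplayPool ? Executors.newFixedThreadPool(handlers) : serverOps;
        CountDownLatch regionsReassigned = new CountDownLatch(1);
        CountDownLatch replaysDone = new CountDownLatch(handlers);
        try {
            for (int i = 0; i < handlers; i++) {
                replayOps.submit(() -> {
                    try {
                        regionsReassigned.await(); // replay blocks until regions are on a live RS
                        replaysDone.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Handling of the newly failed RS: with a shared pool this sits in the queue
            // behind the blocked replays and never gets a thread.
            serverOps.submit(regionsReassigned::countDown);
            return replaysDone.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            serverOps.shutdownNow();
            replayOps.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool finishes:   " + replaysFinish(false, 3, 500));
        System.out.println("separate pool finishes: " + replaysFinish(true, 3, 500));
    }
}
```

With a shared pool of 3 threads the replays time out, because the task they depend on is queued behind them; giving replay work its own pool lets the assignment task run and the replays complete.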
[jira] [Resolved] (HBASE-8728) HBase table schema
[ https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John resolved HBASE-8728.
---
Resolution: Invalid
[jira] [Commented] (HBASE-8728) HBase table schema
[ https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680262#comment-13680262 ]

JOB M THOMAS commented on HBASE-8728:
-
How do I join the user@ mailing list? How do I send my doubts?
[jira] [Updated] (HBASE-8729) distributedLogReplay may hang during chained region server failure
[ https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Zhong updated HBASE-8729:
-
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-8729) distributedLogReplay may hang during chained region server failure
[ https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Zhong updated HBASE-8729:
-
Attachment: hbase-8729.patch
[jira] [Commented] (HBASE-8728) HBase table schema
[ https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680264#comment-13680264 ]

Anoop Sam John commented on HBASE-8728:
---
You can send a mail to user-subscr...@hbase.apache.org to subscribe to this list. A blank mail is enough, and it will add your mail id to the users list. You will then get the list mails. For your doubts, you can send email to this id: u...@hbase.apache.org

Thanks
[jira] [Commented] (HBASE-8728) HBase table schema
[ https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680283#comment-13680283 ]

JOB M THOMAS commented on HBASE-8728:
-
Thanks, Anoop.
[jira] [Commented] (HBASE-8667) Master and Regionserver not able to communicate if both bound to different network interfaces on the same machine.
[ https://issues.apache.org/jira/browse/HBASE-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680316#comment-13680316 ]

rajeshbabu commented on HBASE-8667:
---
[~stack]
bq. Our workaround was having the regionserver take the name the master proffered after checkin. This seemed to get rid of an all-too-common problem seen in hbase deploys

Then we need to initialize the RPC server in the RS with the hostname received from the master after check-in, right? Otherwise we will still have this issue.

Master and Regionserver not able to communicate if both bound to different network interfaces on the same machine.
--
Key: HBASE-8667
URL: https://issues.apache.org/jira/browse/HBASE-8667
Project: HBase
Issue Type: Bug
Components: IPC/RPC
Reporter: rajeshbabu
Fix For: 0.98.0, 0.95.2, 0.94.9
Attachments: HBASE-8667_Trunk.patch, HBASE-8667_Trunk-V2.patch

While testing the HBASE-8640 fix, I found that a master and regionserver running on different interfaces do not communicate properly. My machine has two interfaces, 1) lo and 2) eth0, and the default hostname interface is lo. I configured the master IPC address to the IP of the eth0 interface and started the master and regionserver on the same machine.
1) The master RPC server is bound to eth0 and the RS RPC server is bound to lo.
2) Since the RPC client does not bind to any IP address, when the RS reports its startup it gets registered with the eth0 IP address (but it should actually register localhost).

Here are RS logs:
{code}
2013-05-31 06:05:28,608 WARN [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
2013-05-31 06:05:31,609 INFO [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at 192.168.0.100,6,1369960497008
2013-05-31 06:05:31,609 INFO [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 192.168.0.100,6,1369960497008 that we are up with port=60020, startcode=1369960502544
2013-05-31 06:05:31,618 DEBUG [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: hbase.rootdir=hdfs://localhost:2851/hbase
2013-05-31 06:05:31,618 DEBUG [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: fs.default.name=hdfs://localhost:2851
2013-05-31 06:05:31,618 INFO [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us a different hostname to use; was=localhost, but now=192.168.0.100
{code}
Here are master logs:
{code}
2013-05-31 06:05:31,615 INFO [IPC Server handler 9 on 6] org.apache.hadoop.hbase.master.ServerManager: Registering server=192.168.0.100,60020,1369960502544
{code}
Since the master has the wrong RPC server address for the RS, META is not getting assigned.
{code}
2013-05-31 06:05:34,362 DEBUG [master-192.168.0.100,6,1369960497008] org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for .META.,,1.1028785192 so generated a random one; hri=.META.,,1.1028785192, src=, dest=192.168.0.100,60020,1369960502544; 1 (online=1, available=1) available servers, forceNewPlan=false
- org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of .META.,,1.1028785192 to 192.168.0.100,60020,1369960502544, trying to assign elsewhere instead; try=1 of 10
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:549)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:813)
	at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1422)
	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1315)
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1532)
	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1587)
	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:15039)
	at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:627)
	at
[jira] [Commented] (HBASE-8721) fix for bug that delete can mask puts that happened after the delete was entered
[ https://issues.apache.org/jira/browse/HBASE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680323#comment-13680323 ]

Feng Honghua commented on HBASE-8721:
-
[~sershe] If we want to keep the behaviour that a delete can mask puts that happened after the delete, then to fix the inconsistency caused by major compaction the only alternative is to keep the delete markers forever, as you said.

But I think the root cause of the inconsistency is the arguable behaviour itself, that a delete can mask puts that happened after it. A more intuitive and reasonable behaviour is that a delete masks only puts that happened before it and has no impact on puts that happened after it. (This is independent of the other behaviour, that the timestamp determines which KV survives under version semantics.) If we choose this adjusted behaviour, we can fix the inconsistency with the help of MVCC alone and still collect the delete markers during major compaction as before (no need to keep them forever to fix that inconsistency).

An obvious and ridiculous drawback of letting a delete mask later puts is that an end user can put a KV, get a success response, and then find that he can't read that KV back, just because someone (maybe himself, without realizing it) once made a delete that masks it. This sounds really uncanny and weird.

Turning back to scenarios where the timestamp is used as just another ordinary dimension without time semantics: in those cases we declare max(int) for the versions, and in that scheme the timestamp isn't used to control version count but as an ordinary dimension to locate a cell, and each cell has a single version. So there is no problem.

I agree we can introduce a config knob to enable the new behaviour.

fix for bug that delete can mask puts that happened after the delete was entered
Key: HBASE-8721
URL: https://issues.apache.org/jira/browse/HBASE-8721
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Feng Honghua
Attachments: HBASE-8721-0.94-V0.patch

This fix targets the bug mentioned in http://hbase.apache.org/book.html 5.8.2.1:

Deletes mask puts, even puts that happened after the delete was entered. Remember that a delete writes a tombstone, which only disappears after the next major compaction has run. Suppose you do a delete of everything <= T. After this you do a new put with a timestamp <= T. This put, even if it happened after the delete, will be masked by the delete tombstone. Performing the put will not fail, but when you do a get you will notice the put did have no effect. It will start working again after the major compaction has run. These issues should not be a problem if you use always-increasing versions for new puts to a row. But they can occur even if you do not care about time: just do delete and put immediately after each other, and there is some chance they happen within the same millisecond.
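The masking behaviour quoted from the book can be made concrete with a toy single-cell model. This is a sketch under simplifying assumptions (one cell, one tombstone, only the newest version read); the class `TombstoneDemo` is hypothetical and does not reflect HBase's store implementation or the attached patch.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy single-cell model of tombstone masking; not HBase code. */
class TombstoneDemo {
    private final TreeMap<Long, String> versions = new TreeMap<>(); // timestamp -> value
    private Long tombstone = null; // delete marker covering every timestamp <= this value

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Records a delete of everything with timestamp <= ts. */
    void delete(long ts) {
        tombstone = (tombstone == null) ? ts : Math.max(tombstone, ts);
    }

    /** Returns the newest value not covered by the tombstone, or null if none is visible. */
    String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        if (newest == null) {
            return null;
        }
        if (tombstone != null && newest.getKey() <= tombstone) {
            return null; // masked, regardless of when the put arrived
        }
        return newest.getValue();
    }

    /** Major compaction drops the masked versions together with the delete marker. */
    void majorCompact() {
        if (tombstone != null) {
            versions.headMap(tombstone, true).clear();
            tombstone = null;
        }
    }

    public static void main(String[] args) {
        TombstoneDemo cell = new TombstoneDemo();
        cell.delete(100L);
        cell.put(100L, "written after the delete");
        System.out.println("before compaction: " + cell.get());
        cell.majorCompact();
        cell.put(100L, "written after compaction");
        System.out.println("after compaction:  " + cell.get());
    }
}
```

In this model a put at a timestamp at or below the tombstone succeeds but stays invisible until a major compaction removes the marker, which is the surprise the comment above objects to.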
[jira] [Commented] (HBASE-8729) distributedLogReplay may hang during chained region server failure
[ https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680420#comment-13680420 ]

Ted Yu commented on HBASE-8729:
---
{code}
+ this.executorService.startExecutorService(ExecutorType.MASTER_LOG_REPLAY_OPERATIONS,
+   conf.getInt("hbase.master.executor.serverops.threads", 15));
{code}
Did you intend to introduce a new config param for log replay operations?

There are several syntax errors in the class javadoc for EventHandler.
{code}
+sinkConf.setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, HConstants.DEFAULT_HBASE_RPC_TIMEOUT / 2);
{code}
Can you add a comment for the above change?
[jira] [Updated] (HBASE-8727) Adding a KijiCon notice in the news section of the site
[ https://issues.apache.org/jira/browse/HBASE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-8727:
-
Resolution: Fixed
Status: Resolved (was: Patch Available)

We don't add notices for other folks' meetups, but we are making an exception in this case. Good on you, J.

Adding a KijiCon notice in the news section of the site
---
Key: HBASE-8727
URL: https://issues.apache.org/jira/browse/HBASE-8727
Project: HBase
Issue Type: Bug
Components: site
Reporter: Jonathan Natkins
Attachments: HBASE-8727.diff
[jira] [Comment Edited] (HBASE-8729) distributedLogReplay may hang during chained region server failure
[ https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680420#comment-13680420 ]

Ted Yu edited comment on HBASE-8729 at 6/11/13 4:23 PM:
{code}
+ this.executorService.startExecutorService(ExecutorType.MASTER_LOG_REPLAY_OPERATIONS,
+   conf.getInt("hbase.master.executor.serverops.threads", 15));
{code}
Did you intend to introduce a new config param for log replay operations?

There are several syntax errors in the class javadoc for LogReplayHandler.
{code}
+sinkConf.setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, HConstants.DEFAULT_HBASE_RPC_TIMEOUT / 2);
{code}
Can you add a comment for the above change?

was (Author: yuzhih...@gmail.com):
{code}
+ this.executorService.startExecutorService(ExecutorType.MASTER_LOG_REPLAY_OPERATIONS,
+   conf.getInt("hbase.master.executor.serverops.threads", 15));
{code}
Did you intend to introduce a new config param for log replay operations?

There are several syntax errors in the class javadoc for EventHandler.
{code}
+sinkConf.setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, HConstants.DEFAULT_HBASE_RPC_TIMEOUT / 2);
{code}
Can you add a comment for the above change?
[jira] [Updated] (HBASE-8729) distributedLogReplay may hang during chained region server failure
[ https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-8729:
--
Attachment: 8729-v2.patch
[jira] [Commented] (HBASE-8687) When moving region with region_mover.rb, there is long stack trace for RegionMovedException
[ https://issues.apache.org/jira/browse/HBASE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680475#comment-13680475 ]

stack commented on HBASE-8687:
--
Did the script keep going? Select a new location and move the region there? Was it moving the region to where the region was already sitting and that was why the exception?

When moving region with region_mover.rb, there is long stack trace for RegionMovedException
---
Key: HBASE-8687
URL: https://issues.apache.org/jira/browse/HBASE-8687
Project: HBase
Issue Type: Bug
Components: Region Assignment
Reporter: Ted Yu
Priority: Minor

When gracefully rolling restart region servers, I saw the following in output:
{code}
2013-06-04 20:44:40,135 DEBUG [main] client.ClientScanner: Scan table=usertable, startRow=user8129671889902366092
2013-06-04 20:44:40,141 DEBUG [main] client.ClientScanner: Scan table=.META., startRow=usertable,user8129671889902366092,00
2013-06-04 20:44:40,158 INFO [main] region_mover: Moving region 13168d8b86f1ace9472f60555207a707 (2 of 2) to server=hor8n09.gq1.ygridcore.net,60020,1370378675859
2013-06-04 20:44:40,405 DEBUG [main] client.ClientScanner: Scan table=usertable, startRow=user8129671889902366092
2013-06-04 20:44:40,407 WARN [main] client.ServerCallable: Call exception, tries=0, numRetries=100
org.apache.hadoop.hbase.exceptions.RegionMovedException: Region moved to: hostname=hor8n09.gq1.ygridcore.net port=60020 startCode=1370378675859. As of locationSeqNum=194375.
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:230)
	at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:299)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:147)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:55)
	at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:174)
	at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:215)
	at org.apache.hadoop.hbase.client.ClientScanner.init(ClientScanner.java:130)
	at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:585)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:450)
	at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:311)
	at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:59)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:167)
	at homes.hortonzy.hbase_minus_0_dot_95_dot_1.bin.region_mover.method__6$RUBY$isSuccessfulScan(/homes/hortonzy/hbase-0.95.1/bin/region_mover.rb:121)
	at homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan.call(homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan:65535)
	at homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan.call(homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan:65535)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:201)
	at homes.hortonzy.hbase_minus_0_dot_95_dot_1.bin.region_mover.method__8$RUBY$move(/homes/hortonzy/hbase-0.95.1/bin/region_mover.rb:164)
	at homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__8$RUBY$move.call(homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__8$RUBY$move:65535)
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:181)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:69)
	at
[jira] [Commented] (HBASE-8687) When moving region with region_mover.rb, there is long stack trace for RegionMovedException
[ https://issues.apache.org/jira/browse/HBASE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680480#comment-13680480 ] Ted Yu commented on HBASE-8687: --- bq. Did the script keep going? Yes. bq. Was it moving the region to where the region was already sitting and that was why the exception? I checked cluster status afterwards: region servers came back up and cluster was balanced. So I think the exception was red herring.
[jira] [Commented] (HBASE-8687) When moving region with region_mover.rb, there is long stack trace for RegionMovedException
[ https://issues.apache.org/jira/browse/HBASE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680477#comment-13680477 ] stack commented on HBASE-8687: -- Looking in code, the RegionMovedException uses a cache of regions recently moved to point out where the region has gone too. The server above that threw the exception was or was not hor8n09? If it was, then that is odd. If region is still on this server, we should be fixing up the recently moved cache.
[jira] [Commented] (HBASE-8687) When moving region with region_mover.rb, there is long stack trace for RegionMovedException
[ https://issues.apache.org/jira/browse/HBASE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680495#comment-13680495 ] stack commented on HBASE-8687: -- The region_mover.rb script does spew a bunch which can disorientate an operator. No harm cleanup up some of it.
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680501#comment-13680501 ] Ted Yu commented on HBASE-6295: --- After putting the patch on a cluster, I saw a lot of the following in the log:
{code}
2013-06-11 16:51:19,806 INFO [HBaseWriterThread_11] client.AsyncProcess: won: Waiting for number of tasks to be equals or less than 0, currently it's 1
2013-06-11 16:51:19,807 INFO [HBaseWriterThread_18] client.AsyncProcess: won: Waiting for number of tasks to be equals or less than 0, currently it's 1
2013-06-11 16:51:19,807 INFO [HBaseWriterThread_15] client.AsyncProcess: won: Waiting for number of tasks to be equals or less than 0, currently it's 1
{code}
I think the above log should be at TRACE level.
Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Fix For: 0.98.0 Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
Today the batch algo is:
{noformat}
for Operation o: List<Op> {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create the final object immediately instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send when there is enough data for a single location
It would be:
{noformat}
for Operation o: List<Op> {
  get location
  add o to todo location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write once you add error management: the retried list must be shared with the operations added to the todolist. But it's doable. It's interesting mainly for 'big' writes.
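The second variant in the description, where a buffer is flushed as soon as one location has enough data, could be sketched roughly as below. This is only an illustration under stated assumptions and is not the code in the attached patches: `LocationBatcher`, `Sender` and `maxLocationSize` are hypothetical names, a location is a plain string, the buffer is flushed when it reaches the limit, the send is a synchronous callback rather than a background send, and the error management and retry handling mentioned above are left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: buffer operations per location, flush a location once it is full. */
class LocationBatcher<Op> {
    /** Hypothetical callback standing in for "send to region server". */
    interface Sender<T> {
        void send(String location, List<T> ops);
    }

    private final Map<String, List<Op>> todoByLocation = new HashMap<>();
    private final int maxLocationSize;
    private final Sender<Op> sender;

    LocationBatcher(int maxLocationSize, Sender<Op> sender) {
        this.maxLocationSize = maxLocationSize;
        this.sender = sender;
    }

    /** Adds one operation; only the buffer of the location that reached the limit is sent. */
    void add(String location, Op op) {
        List<Op> todo = todoByLocation.computeIfAbsent(location, k -> new ArrayList<>());
        todo.add(op);
        if (todo.size() >= maxLocationSize) {
            todoByLocation.remove(location); // "clear location.todolist"
            sender.send(location, todo);     // other locations keep buffering
        }
    }

    /** "send remaining": flushes every partially filled buffer. */
    void flush() {
        for (Map.Entry<String, List<Op>> e : todoByLocation.entrySet()) {
            sender.send(e.getKey(), e.getValue());
        }
        todoByLocation.clear();
    }

    public static void main(String[] args) {
        LocationBatcher<String> batcher =
            new LocationBatcher<>(2, (loc, ops) -> System.out.println(loc + " <- " + ops));
        batcher.add("rs1", "put-a");
        batcher.add("rs2", "put-b");
        batcher.add("rs1", "put-c"); // rs1 reaches the limit and is sent right away
        batcher.flush();             // rs2 is sent as part of the remainder
    }
}
```

The point of the design is that a slow or sparsely hit location never delays a location that already has a full buffer.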
[jira] [Updated] (HBASE-8705) RS holding META when restarted in a single node setup may hang infinitely without META assignment
[ https://issues.apache.org/jira/browse/HBASE-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-8705: -- Attachment: HBASE-8705.patch A simple patch that just retries in the case of META. What do you guys think about it? It does nothing more than reintroduce the logic where the assignment was attempted maxAttempts times. It does that only for META, when no region plan is available, but with a sleep.
RS holding META when restarted in a single node setup may hang infinitely without META assignment - Key: HBASE-8705 URL: https://issues.apache.org/jira/browse/HBASE-8705 Project: HBase Issue Type: Bug Affects Versions: 0.95.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.98.0 Attachments: HBASE-8705.patch
This bug may be minor, as it is likely to happen only in a single-node setup. I restarted the RS holding META. The master tried assigning META using MetaSSH, but it tried before the new RS came up. So, as no region plan is found,
{code}
if (plan == null) {
  LOG.warn("Unable to determine a plan to assign " + region);
  if (tomActivated) {
    this.timeoutMonitor.setAllRegionServersOffline(true);
  } else {
    regionStates.updateRegionState(region, RegionState.State.FAILED_OPEN);
  }
  return;
}
{code}
we just return without assignment. And since this is META, the small cluster just hangs.
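The idea of retrying up to maxAttempts times with a sleep when no plan is available might look something like the sketch below. It is not the attached HBASE-8705.patch: `RetryingAssigner`, `findPlanWithRetry`, `planSupplier` and `sleepMillis` are hypothetical names, and the plan is modelled as an arbitrary object so the example runs without HBase on the classpath.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of "retry with a sleep when no region plan is available". */
class RetryingAssigner {
    /**
     * Asks planSupplier for a plan up to maxAttempts times, sleeping between attempts.
     * Returns the plan, or null if none became available or the thread was interrupted.
     */
    static <P> P findPlanWithRetry(Supplier<P> planSupplier, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            P plan = planSupplier.get();
            if (plan != null) {
                return plan;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis); // give a region server time to check in
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return null;
                }
            }
        }
        return null; // caller falls back to its existing "no plan" handling
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated plan source: no server is available for the first two attempts.
        String plan = findPlanWithRetry(() -> ++calls[0] < 3 ? null : "assign META to rs1", 5, 10L);
        System.out.println(plan + " after " + calls[0] + " attempts");
    }
}
```

Bounding the loop with maxAttempts keeps the master from spinning forever if no region server ever checks in.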
[jira] [Comment Edited] (HBASE-8705) RS holding META when restarted in a single node setup may hang infinitely without META assignment
[ https://issues.apache.org/jira/browse/HBASE-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680508#comment-13680508 ] ramkrishna.s.vasudevan edited comment on HBASE-8705 at 6/11/13 6:41 PM: A simple patch that just retries incase of META. What you guys think about it. It is nothing but reintroducing the logic where the assignment was attempted for maxAttempts number of times. This just does that for META incase of no region plan available but with a sleep. was (Author: ram_krish): A simple patch that just retries incase of META. What you guys think about it. It is nothing but reintroducing the logic where the assignment was attempted for maxAttempts number of times. This just does that for META incase of not region plan available but with a sleep.
[jira] [Updated] (HBASE-8705) RS holding META when restarted in a single node setup may hang infinitely without META assignment
[ https://issues.apache.org/jira/browse/HBASE-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-8705: -- Status: Patch Available (was: Open)
[jira] [Created] (HBASE-8730) Update TestEnvironmentEdgeManager to fix error
Shane Hogan created HBASE-8730: -- Summary: Update TestEnvironmentEdgeManager to fix error Key: HBASE-8730 URL: https://issues.apache.org/jira/browse/HBASE-8730 Project: HBase Issue Type: Test Components: test Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Trivial Fix For: 0.89-fb Fixes a small issue with the test: the unit test falsely assumes that the delegate starts out being the default delegate. This assumption is violated if another part of the code calls injectEdge with something other than the defaultEnvironmentEdge.
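The kind of test fix described here, establishing the precondition instead of assuming it, can be illustrated with a small stand-alone sketch. `EdgeManagerSketch` and its nested `Edge` interface are hypothetical stand-ins written for this example; only the method name `injectEdge` comes from the issue text, and nothing below is the actual HBase class or the actual patch.

```java
/** Hypothetical stand-in for an injectable clock holder; not the HBase class itself. */
class EdgeManagerSketch {
    /** Minimal clock abstraction, assumed for this example. */
    interface Edge {
        long currentTimeMillis();
    }

    static final Edge DEFAULT_EDGE = System::currentTimeMillis;
    private static volatile Edge delegate = DEFAULT_EDGE;

    static void injectEdge(Edge edge) {
        delegate = (edge == null) ? DEFAULT_EDGE : edge;
    }

    static void reset() {
        delegate = DEFAULT_EDGE;
    }

    static Edge getDelegate() {
        return delegate;
    }

    /** Test body that does not assume the delegate starts out as the default. */
    static boolean defaultDelegateAfterReset() {
        reset(); // establish the precondition explicitly instead of relying on test order
        return getDelegate() == DEFAULT_EDGE;
    }

    public static void main(String[] args) {
        injectEdge(() -> 42L); // some earlier code left a custom edge behind
        System.out.println(defaultDelegateAfterReset());
    }
}
```

Because the delegate is static state shared across tests, a check that runs after another test has injected an edge only passes if it resets first.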
[jira] [Resolved] (HBASE-8664) Small fix ups for memory size outputs in UI
[ https://issues.apache.org/jira/browse/HBASE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-8664. -- Resolution: Fixed Fix Version/s: 0.98.0 Hadoop Flags: Reviewed Committed to trunk and 0.95. Thanks for review Enis. Small fix ups for memory size outputs in UI --- Key: HBASE-8664 URL: https://issues.apache.org/jira/browse/HBASE-8664 Project: HBase Issue Type: Bug Components: UI Reporter: stack Assignee: stack Fix For: 0.98.0, 0.95.1 Attachments: ui.txt This issue goes in the 'polish' category. On regionserver ui, we were listing raw bytes for heap size, memstore size, etc. I put in place StringUtils.humanReadableInt (looked to see if bootstrap could do it for us but doesn't seem so, not w/o plugin). I then made all the megabytes and kilobytes match StringUtils.humanReadableInt with its 'm' instead of 'MB' and 'k' instead of KB. Removed a stray KB that was in the wrong place too.
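The single-letter unit style mentioned above ('k' and 'm' rather than 'KB' and 'MB') can be shown with a small stand-alone formatter. This is a sketch only and is not the implementation of StringUtils.humanReadableInt: the class name `HumanReadableSize`, the one-decimal rounding and the exact suffix list are assumptions made for the example.

```java
import java.util.Locale;

/** Hypothetical formatter that prints byte counts with single-letter binary suffixes. */
class HumanReadableSize {
    private static final String[] SUFFIXES = {"", "k", "m", "g", "t", "p", "e"};

    /** Values below 1024 are printed as plain integers, larger ones with one decimal. */
    static String humanReadable(long bytes) {
        double value = bytes;
        if (Math.abs(value) < 1024) {
            return Long.toString(bytes);
        }
        int idx = 0;
        // Divide by 1024 until the value fits the current unit or units run out.
        while (Math.abs(value) >= 1024 && idx < SUFFIXES.length - 1) {
            value /= 1024;
            idx++;
        }
        return String.format(Locale.ROOT, "%.1f%s", value, SUFFIXES[idx]);
    }

    public static void main(String[] args) {
        long[] samples = {512L, 1536L, 5L * 1024 * 1024};
        for (long sample : samples) {
            System.out.println(sample + " bytes -> " + humanReadable(sample));
        }
    }
}
```

Using one formatter everywhere is what keeps the heap size, memstore size and similar columns consistent with each other.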
[jira] [Updated] (HBASE-8696) Fixup for logs that show when running hbase-it tests.
[ https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8696: - Attachment: 8696v2.txt Update to address Sergey's feedback, plus a bunch more changes. Here is the commit message:
{code}
Tighten up logs. Mostly shorten thread names, use encoded name for region in RegionStates logging rather than full toString of the HRI. Cleanup in the file archiving so we log less.

Add means of asking for more than one regionserver when running standalone. For example, below will start 5 regionservers in the standalone process (need to suppress startup of the info servers to avoid complaint that port already in use)

$ ./bin/start-hbase.sh -Dhbase.regionserver.info.port=-1 --localRegionServers=5

M bin/start-hbase.sh
  Allow passing extraneous args provided when in local mode. Useful when asking for more than one regionserver to be started in the local process.
M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
  Add a short name method used when logging region name in logs (just prints out the encoded name).
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
  Was printing table name as bytes...toString it.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  Record time at which an exception was thrown so that when we dump out all exceptions on failure, we can see the expanse during which retries were operating.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedWithDetailsException.java
  Print out time at which exception was thrown when doing summary of a list of exceptions.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ZooKeeperRegistry.java
  Small fixups.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
  Change the messages so they don't output the full HRI#toString, just the encoded region name, so lines are not unreadably long.
M hbase-it/src/test/java/org/apache/hadoop/hbase/HBaseClusterManager.java
  Only log if a change.
M hbase-it/src/test/java/org/apache/hadoop/hbase/IngestIntegrationTestBase.java
  Minor fixups.
M hbase-server/src/main/java/org/apache/hadoop/hbase/backup/HFileArchiver.java
  Make some logging trace, especially duplicated logging.
M hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
  Tighten up thread names; instead of 'IPC Server listener on PORT', do RpcServer.listener,port=PORT.
M hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
  Fix table name (was bytes).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
  Make stuff trace.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java
  Tighten thread name (make it like the others).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  Tighten thread names.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
  Add being able to set how many masters and regionservers in a process.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
  Print encoded name rather than full region name.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Fix hostname compare (was comparing hostname to servername, which never matched).
{code}
Fixup for logs that show when running hbase-it tests. - Key: HBASE-8696 URL: https://issues.apache.org/jira/browse/HBASE-8696 Project: HBase Issue Type: Improvement Reporter: stack Assignee: stack Fix For: 0.95.1 Attachments: 8696v2.txt, 8698.txt
I've been staring at logs trying to figure out why hbase-it tests fail. Here are some more log cleanups that come of my frustration trying to read our emissions.
[jira] [Commented] (HBASE-8696) Fixup for logs that show when running hbase-it tests.
[ https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680546#comment-13680546 ] stack commented on HBASE-8696: -- I put it up on rb here: https://reviews.apache.org/r/11805/
[jira] [Commented] (HBASE-8696) Fixup for logs that show when running hbase-it tests.
[ https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680552#comment-13680552 ] Hadoop QA commented on HBASE-8696: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587275/8696v2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6003//console This message is automatically generated.
[jira] [Commented] (HBASE-8705) RS holding META when restarted in a single node setup may hang infinitely without META assignment
[ https://issues.apache.org/jira/browse/HBASE-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680555#comment-13680555 ] stack commented on HBASE-8705: -- +1 Seems innocuous and could help...
[jira] [Updated] (HBASE-7679) implement store file management for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7679: - Attachment: 8696v3.txt Rebase implement store file management for stripe compactions -- Key: HBASE-7679 URL: https://issues.apache.org/jira/browse/HBASE-7679 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: 8696v3.txt, HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v2.patch, HBASE-7667-v2.patch, HBASE-7667-v3.patch, HBASE-7679-v10.patch, HBASE-7679-v11.patch, HBASE-7679-v12.patch, HBASE-7679-v12.patch, HBASE-7679-v13.patch, HBASE-7679-v13.patch, HBASE-7679-v14.patch, HBASE-7679-v15.patch, HBASE-7679-v16.patch, HBASE-7679-v4.patch, HBASE-7679-v5.patch, HBASE-7679-v6.patch, HBASE-7679-v7-.patch, HBASE-7679-v7.patch, HBASE-7679-v8.patch, HBASE-7679-v9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8729) distributedLogReplay may hang during chained region server failure
[ https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8729: -- Attachment: 8729-v2.patch distributedLogReplay may hang during chained region server failure -- Key: HBASE-8729 URL: https://issues.apache.org/jira/browse/HBASE-8729 Project: HBase Issue Type: Bug Components: MTTR Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.95.2 Attachments: 8729-v2.patch, 8729-v2.patch, hbase-8729.patch In a test, half of the cluster (in terms of region servers) was down, and some log replays incurred chained RS failures (the receiving RS of a log replay failed again). By default, we only allow 3 concurrent SSH handlers (controlled by {code}this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS, conf.getInt("hbase.master.executor.serverops.threads", 3));{code}). If all 3 SSH handlers are doing logReplay (a blocking call) and one of the receiving RSs fails again, then logReplay will hang, because regions of the newly failed RS can't be re-assigned to another live RS (no SSH handler will be processed due to the max threads setting) and the existing log replay will keep routing replay traffic to the dead RS. The fix is to submit logReplay work into a separate type of executor queue so that it does not block SSH region assignment; logReplay can then route traffic to a live RS after retries and move forward.
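The starvation described in HBASE-8729 can be reproduced outside HBase with plain java.util.concurrent. What follows is a minimal sketch, not the attached patch: the class name ExecutorStarvationSketch and its methods are hypothetical, the blocking "replay" tasks stand in for logReplay, and the "assign" task stands in for SSH region assignment. It assumes a pool of 3 threads, matching the default quoted in the description.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from HBase.
class ExecutorStarvationSketch {

  /**
   * Submits `threads` blocking "replay" tasks and then one "assign" task.
   * Every replay task waits until the assign task has run.
   * Returns true if all replay tasks finish within the timeout.
   */
  static boolean run(ExecutorService replayPool, ExecutorService assignPool,
                     int threads, long timeoutMs) {
    CountDownLatch assigned = new CountDownLatch(1);
    CountDownLatch replayed = new CountDownLatch(threads);
    for (int i = 0; i < threads; i++) {
      replayPool.submit(() -> {
        try {
          assigned.await();      // replay cannot progress until assignment ran
          replayed.countDown();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    assignPool.submit(assigned::countDown);
    try {
      return replayed.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** One shared pool: the assign task queues behind the blocked replay tasks. */
  static boolean sharedPoolCompletes() {
    ExecutorService shared = Executors.newFixedThreadPool(3);
    try {
      return run(shared, shared, 3, 300);
    } finally {
      shared.shutdownNow();
    }
  }

  /** Separate pools: the assign task always has a free thread. */
  static boolean separatePoolsComplete() {
    ExecutorService replay = Executors.newFixedThreadPool(3);
    ExecutorService assign = Executors.newFixedThreadPool(3);
    try {
      return run(replay, assign, 3, 2000);
    } finally {
      replay.shutdownNow();
      assign.shutdownNow();
    }
  }
}
```

With one shared pool the assign task never gets a thread, so the replay tasks time out; with a separate pool they complete. This is the same reasoning behind moving logReplay work to its own executor type.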
[jira] [Created] (HBASE-8731) Use the JDK 1.7 in the precommit env for trunk
Nicolas Liochon created HBASE-8731: -- Summary: Use the JDK 1.7 in the precommit env for trunk Key: HBASE-8731 URL: https://issues.apache.org/jira/browse/HBASE-8731 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.98.0 Reporter: Nicolas Liochon Assignee: Giridharan Kesavan Fix For: 0.98.0 HBase today uses JDK 1.6. In the past, this created issues when we tried to use 1.7 for the core build while the precommit was on 1.6. Having the precommit on 1.7 would solve this. The best is to start with trunk. Likely 0.95 will come next, and maybe, one day, 0.94.
[jira] [Updated] (HBASE-8729) distributedLogReplay may hang during chained region server failure
[ https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8729: -- Attachment: (was: 8729-v2.patch) distributedLogReplay may hang during chained region server failure -- Key: HBASE-8729 URL: https://issues.apache.org/jira/browse/HBASE-8729 Project: HBase Issue Type: Bug Components: MTTR Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.95.2 Attachments: 8729-v2.patch, hbase-8729.patch In a test, half of the cluster (in terms of region servers) was down, and some log replays incurred chained RS failures (the receiving RS of a log replay failed again). By default, we only allow 3 concurrent SSH handlers (controlled by {code}this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS, conf.getInt("hbase.master.executor.serverops.threads", 3));{code}). If all 3 SSH handlers are doing logReplay (a blocking call) and one of the receiving RSs fails again, then logReplay will hang, because regions of the newly failed RS can't be re-assigned to another live RS (no SSH handler will be processed due to the max threads setting) and the existing log replay will keep routing replay traffic to the dead RS. The fix is to submit logReplay work into a separate type of executor queue so that it does not block SSH region assignment; logReplay can then route traffic to a live RS after retries and move forward.
[jira] [Created] (HBASE-8732) Changing Encoding on Column Families errors out
Elliott Clark created HBASE-8732: Summary: Changing Encoding on Column Families errors out Key: HBASE-8732 URL: https://issues.apache.org/jira/browse/HBASE-8732 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8726) Create an Integration Test for online schema change
[ https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-8726: - Attachment: HBASE-8726-0.patch Here's a pretty simple test that uses ChaosMonkey to try and modify column families. Create an Integration Test for online schema change --- Key: HBASE-8726 URL: https://issues.apache.org/jira/browse/HBASE-8726 Project: HBase Issue Type: Bug Components: Admin Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-8726-0.patch With table locks in place it should be time to start really testing online table schema changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8732) Changing Encoding on Column Families errors out
[ https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-8732: - Description: Getting an error when opening a scanner on a file that has no encoding. Changing Encoding on Column Families errors out --- Key: HBASE-8732 URL: https://issues.apache.org/jira/browse/HBASE-8732 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Getting an error when opening a scanner on a file that has no encoding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8726) Create an Integration Test for online schema change
[ https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-8726: - Affects Version/s: 0.95.1 0.98.0 Status: Patch Available (was: Open) Create an Integration Test for online schema change --- Key: HBASE-8726 URL: https://issues.apache.org/jira/browse/HBASE-8726 Project: HBase Issue Type: Bug Components: Admin Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-8726-0.patch With table locks in place it should be time to start really testing online table schema changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out
[ https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680608#comment-13680608 ] Elliott Clark commented on HBASE-8732: -- Getting this error: {code} Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://localhost:57053/user/eclark/hbase/IntegrationTestModifyColumns/d2c63aa3399aaf7e40bf7d045c0bb1ca/test_cf/d020ed015d9b4c73b08b06192095e4be, compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=1115ec15a4637bb614390d16c81ea881-105445/test_cf:0/1370980134228/Put, lastKey=221cdbd49831660e254edeb0c4b51109-102317/test_cf:0/1370980122463/Put, avgKeyLen=59, avgValueLen=100, entries=6441, length=1089866, cur=null] to key 1a860448b5d2824f0a7163839fe04f6e-109693/test_cf:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/mvcc=0 at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:154) at org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:160) at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1623) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:3507) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1705) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1697) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1674) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4452) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4427) at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2743) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20926) at 
org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2122) at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1829) Caused by: java.io.IOException: Cached block under key d020ed015d9b4c73b08b06192095e4be_590914_FAST_DIFF has wrong encoding: null (expected: FAST_DIFF) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:319) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:469) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:490) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:222) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:142) ... 12 more at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1336) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1540) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1597) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:21331) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1233) ... 8 more {code} Changing Encoding on Column Families errors out --- Key: HBASE-8732 URL: https://issues.apache.org/jira/browse/HBASE-8732 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Getting an error when opening a scanner on a file that has no encoding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7679) implement store file management for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680615#comment-13680615 ] Sergey Shelukhin commented on HBASE-7679: - this appears to be the wrong JIRA implement store file management for stripe compactions -- Key: HBASE-7679 URL: https://issues.apache.org/jira/browse/HBASE-7679 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: 8696v3.txt, HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v2.patch, HBASE-7667-v2.patch, HBASE-7667-v3.patch, HBASE-7679-v10.patch, HBASE-7679-v11.patch, HBASE-7679-v12.patch, HBASE-7679-v12.patch, HBASE-7679-v13.patch, HBASE-7679-v13.patch, HBASE-7679-v14.patch, HBASE-7679-v15.patch, HBASE-7679-v16.patch, HBASE-7679-v4.patch, HBASE-7679-v5.patch, HBASE-7679-v6.patch, HBASE-7679-v7-.patch, HBASE-7679-v7.patch, HBASE-7679-v8.patch, HBASE-7679-v9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8726) Create an Integration Test for online schema change
[ https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680623#comment-13680623 ] Sergey Shelukhin commented on HBASE-8726: - Some comments are stale (e.g. the one mentioning kills for CHAOS_EVERY_MS). {code} new AddColumnPolicy(tableName, new HBaseAdmin(util.getConfiguration())), {code} Passing HBaseAdmin is not necessary; the Action class has a context that has an admin, as well as other random stuff. The Action is called a policy, which is kind of confusing. You are making online changes enabled by default; is this intended in this JIRA? Create an Integration Test for online schema change --- Key: HBASE-8726 URL: https://issues.apache.org/jira/browse/HBASE-8726 Project: HBase Issue Type: Bug Components: Admin Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-8726-0.patch With table locks in place it should be time to start really testing online table schema changes.
[jira] [Commented] (HBASE-8726) Create an Integration Test for online schema change
[ https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680640#comment-13680640 ] stack commented on HBASE-8726: -- Would be cool if we could enable it as default (if it passes these tests). Patch looks good to me (caveat the suggestions [~sershe] makes). Create an Integration Test for online schema change --- Key: HBASE-8726 URL: https://issues.apache.org/jira/browse/HBASE-8726 Project: HBase Issue Type: Bug Components: Admin Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-8726-0.patch With table locks in place it should be time to start really testing online table schema changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7679) implement store file management for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7679: - Attachment: (was: 8696v3.txt) implement store file management for stripe compactions -- Key: HBASE-7679 URL: https://issues.apache.org/jira/browse/HBASE-7679 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v2.patch, HBASE-7667-v2.patch, HBASE-7667-v3.patch, HBASE-7679-v10.patch, HBASE-7679-v11.patch, HBASE-7679-v12.patch, HBASE-7679-v12.patch, HBASE-7679-v13.patch, HBASE-7679-v13.patch, HBASE-7679-v14.patch, HBASE-7679-v15.patch, HBASE-7679-v16.patch, HBASE-7679-v4.patch, HBASE-7679-v5.patch, HBASE-7679-v6.patch, HBASE-7679-v7-.patch, HBASE-7679-v7.patch, HBASE-7679-v8.patch, HBASE-7679-v9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8696) Fixup for logs that show when running hbase-it tests.
[ https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8696: - Attachment: 8696v3.txt Rebase Fixup for logs that show when running hbase-it tests. - Key: HBASE-8696 URL: https://issues.apache.org/jira/browse/HBASE-8696 Project: HBase Issue Type: Improvement Reporter: stack Assignee: stack Fix For: 0.95.1 Attachments: 8696v2.txt, 8696v3.txt, 8698.txt I've been staring at logs trying to figure why hbase-it tests fail. Here are some more log cleanups that come of my frustration trying to read our emissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8726) Create an Integration Test for online schema change
[ https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680646#comment-13680646 ] Elliott Clark commented on HBASE-8726: --
bq. passing HBaseAdmin is not necessary, Action class has context that has admin, as well as other random stuff.
The context class is all private, with a comment about how whoever wrote the actions wanted the internals to be private, so I went the route of passing an admin.
bq. Action is called policy which is kind of confusing.
True. I'll rename those.
bq. You are making online changes enabled by default, is this intended in this JIRA?
Yes, when we can get this to run stably for hours I would like to make it the default. Until then I don't think committing this is right. This test exposed HBASE-8732 in the first 10 minutes, so I expect there are still more bugs before we can make it the default.
Create an Integration Test for online schema change --- Key: HBASE-8726 URL: https://issues.apache.org/jira/browse/HBASE-8726 Project: HBase Issue Type: Bug Components: Admin Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-8726-0.patch With table locks in place it should be time to start really testing online table schema changes.
[jira] [Commented] (HBASE-8729) distributedLogReplay may hang during chained region server failure
[ https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680659#comment-13680659 ] stack commented on HBASE-8729: -- Why is it M_MASTER_LOG_REPLAY rather than just M_LOG_REPLAY? (Doesn't M mean MASTER?) Make this name shorter:
+ MASTER_LOG_REPLAY_OPERATIONS(7).
M_LOG_REPLAY_OPS. It is the name of a thread and shows all over the logs, so terse is better. Should this be its own config?
+ this.executorService.startExecutorService(ExecutorType.MASTER_LOG_REPLAY_OPERATIONS,
+ conf.getInt("hbase.master.executor.serverops.threads", 15));
... rather than serverops? Rather than a log replay handler, should we instead have M_SERVER_SHUTDOWN be its own type... and then make N executor slots for server shutdown handling rather than for log replay? That would make the exit of the server shutdown handler nicer, in that when we leave it we have processed the server, rather than, as in this patch, going off to another executor for completion. distributedLogReplay may hang during chained region server failure -- Key: HBASE-8729 URL: https://issues.apache.org/jira/browse/HBASE-8729 Project: HBase Issue Type: Bug Components: MTTR Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.95.2 Attachments: 8729-v2.patch, hbase-8729.patch In a test, half of the cluster (in terms of region servers) was down, and some log replays incurred chained RS failures (the receiving RS of a log replay failed again). By default, we only allow 3 concurrent SSH handlers (controlled by {code}this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS, conf.getInt("hbase.master.executor.serverops.threads", 3));{code}). If all 3 SSH handlers are doing logReplay (a blocking call) and one of the receiving RSs fails again, then logReplay will hang, because regions of the newly failed RS can't be re-assigned to another live RS (no SSH handler will be processed due to the max threads setting) and the existing log replay will keep routing replay traffic to the dead RS. The fix is to submit logReplay work into a separate type of executor queue so that it does not block SSH region assignment; logReplay can then route traffic to a live RS after retries and move forward.
[jira] [Updated] (HBASE-8726) Create an Integration Test for online schema change
[ https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8726: - Fix Version/s: 0.95.2 Adding to 0.95.2. Create an Integration Test for online schema change --- Key: HBASE-8726 URL: https://issues.apache.org/jira/browse/HBASE-8726 Project: HBase Issue Type: Bug Components: Admin Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.95.2 Attachments: HBASE-8726-0.patch With table locks in place it should be time to start really testing online table schema changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8732) Changing Encoding on Column Families errors out
[ https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8732: - Priority: Critical (was: Major) Fix Version/s: 0.95.2 Changing Encoding on Column Families errors out --- Key: HBASE-8732 URL: https://issues.apache.org/jira/browse/HBASE-8732 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Elliott Clark Priority: Critical Fix For: 0.95.2 Getting an error when opening a scanner on a file that has no encoding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8706) Some improvement in snapshot
[ https://issues.apache.org/jira/browse/HBASE-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-8706: --- Attachment: HBASE-8706-v4.patch Added some fixes around the use of wakeTime/keepAlive/timeout. The patch looks good to me; any other comments? Some improvement in snapshot Key: HBASE-8706 URL: https://issues.apache.org/jira/browse/HBASE-8706 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8, 0.95.0 Reporter: binlijin Attachments: HBASE-8706-2.patch, HBASE-8706-3.patch, HBASE-8706.patch, HBASE-8706-v4.patch (1) The timeout for Procedure cannot be configured.
{code}
Procedure's timeout

ProcedureCoordinator:
final static long TIMEOUT_MILLIS_DEFAULT = 60000;

createProcedure(ForeignExceptionDispatcher fed, String procName, byte[] procArgs, List<String> expectedMembers) {
  // build the procedure
  return new Procedure(this, fed, WAKE_MILLIS_DEFAULT, TIMEOUT_MILLIS_DEFAULT, procName, procArgs, expectedMembers);
}

RegionServerSnapshotManager:
/** Conf key for max time to keep threads in snapshot request pool waiting */
public static final String SNAPSHOT_TIMEOUT_MILLIS_KEY = "hbase.snapshot.region.timeout";
/** Keep threads alive in request pool for max of 60 seconds */
public static final long SNAPSHOT_TIMEOUT_MILLIS_DEFAULT = 60000;

public Subprocedure buildSubprocedure(SnapshotDescription snapshot) {
  long timeoutMillis = conf.getLong(SNAPSHOT_TIMEOUT_MILLIS_KEY, SNAPSHOT_TIMEOUT_MILLIS_DEFAULT);
  case FLUSH:
    SnapshotSubprocedurePool taskManager = new SnapshotSubprocedurePool(rss.getServerName().toString(), conf);
}
{code}
(2) In TakeSnapshotHandler, after snapshotRegions we should call monitor.rethrowException() to check whether there is an exception; if there is, we can skip verifySnapshot. (3) Too many error messages are logged when an error happens in some places.
[jira] [Commented] (HBASE-8721) fix for bug that delete can mask puts that happened after the delete was entered
[ https://issues.apache.org/jira/browse/HBASE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680676#comment-13680676 ] stack commented on HBASE-8721: -- [~fenghh] On keeping deleted cells, it is an option, I believe. See http://hbase.apache.org/book.html#cf.keep.deleted [~fenghh] Agree that the way delete works is uncanny, in that a put after a delete can go unseen. Thank you for looking into this. You are using mvcc when it is sequenceid that you should be using? Is that so? mvcc is used for cloaking memstore state, doing a reveal only after all that makes up a transaction has been written across the row. sequenceid is assigned when we add something to the WAL, and it is used to ensure ordering when doing WAL replays. fix for bug that delete can mask puts that happened after the delete was entered Key: HBASE-8721 URL: https://issues.apache.org/jira/browse/HBASE-8721 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Attachments: HBASE-8721-0.94-V0.patch This fix addresses the bug mentioned in http://hbase.apache.org/book.html 5.8.2.1: Deletes mask puts, even puts that happened after the delete was entered. Remember that a delete writes a tombstone, which only disappears after the next major compaction has run. Suppose you do a delete of everything <= T. After this you do a new put with a timestamp <= T. This put, even if it happened after the delete, will be masked by the delete tombstone. Performing the put will not fail, but when you do a get you will notice the put had no effect. It will start working again after the major compaction has run. These issues should not be a problem if you use always-increasing versions for new puts to a row. But they can occur even if you do not care about time: just do a delete and a put immediately after each other, and there is some chance they happen within the same millisecond.
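The masking behaviour quoted from the book, and the idea of ordering by arrival rather than by timestamp alone, can be modelled in a few lines. This is a toy model under stated assumptions, not HBase's read path and not the attached patch: TombstoneSketch, Write, and the seq field are hypothetical names, and seq simply stands in for whatever ordering token (mvcc or sequenceid) the real fix would use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of tombstone masking; not HBase code.
class TombstoneSketch {

  /** One write: a put at ts, or a delete marker covering every timestamp <= ts. */
  static final class Write {
    final boolean isDelete;
    final long ts;   // cell timestamp
    final long seq;  // arrival order on the server (assumed ordering token)

    Write(boolean isDelete, long ts, long seq) {
      this.isDelete = isDelete;
      this.ts = ts;
      this.seq = seq;
    }
  }

  private final List<Write> writes = new ArrayList<>();
  private long nextSeq = 0;

  void put(long ts) { writes.add(new Write(false, ts, nextSeq++)); }

  void delete(long ts) { writes.add(new Write(true, ts, nextSeq++)); }

  /**
   * Returns true if at least one put survives the tombstones.
   * orderAware == false: any tombstone with ts >= put.ts hides the put.
   * orderAware == true: the tombstone must also have arrived after the put.
   */
  boolean anyPutVisible(boolean orderAware) {
    for (Write p : writes) {
      if (p.isDelete) {
        continue;
      }
      boolean masked = false;
      for (Write d : writes) {
        if (d.isDelete && d.ts >= p.ts && (!orderAware || d.seq > p.seq)) {
          masked = true;
          break;
        }
      }
      if (!masked) {
        return true;
      }
    }
    return false;
  }
}
```

A delete at T followed by a put at T is invisible under the timestamp-only rule and visible under the order-aware rule; a put followed by a delete is hidden under both.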
[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path
[ https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680692#comment-13680692 ] Ted Yu commented on HBASE-8699: --- [~stack]: What do you think of the patch? Thanks Parameter to DistributedFileSystem#isFileClosed should be of type Path -- Key: HBASE-8699 URL: https://issues.apache.org/jira/browse/HBASE-8699 Project: HBase Issue Type: Bug Components: wal Reporter: Ted Yu Assignee: Ted Yu Attachments: 8699-v1.txt Here is the current code of FSHDFSUtils#isFileClosed():
{code}
boolean isFileClosed(final DistributedFileSystem dfs, final Path p) {
  try {
    Method m = dfs.getClass().getMethod("isFileClosed", new Class<?>[] {String.class});
    return (Boolean) m.invoke(dfs, p.toString());
{code}
We look for an isFileClosed method with a parameter of type String. However, from hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java (branch-2):
{code}
public boolean isFileClosed(Path src) throws IOException {
{code}
The parameter is of type Path. This means we would get NoSuchMethodException.
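The lookup failure is a general property of Class.getMethod, which matches on exact parameter types. Here is a minimal sketch that shows it without a Hadoop dependency; FakePath and FakeFs are hypothetical stand-ins for org.apache.hadoop.fs.Path and DistributedFileSystem, and the helper names are made up for the illustration.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins; no Hadoop classes are used here.
class ReflectionLookupSketch {

  /** Stand-in for org.apache.hadoop.fs.Path. */
  static final class FakePath {
    private final String p;
    FakePath(String p) { this.p = p; }
    @Override public String toString() { return p; }
  }

  /** Stand-in for a file system whose isFileClosed takes a path object. */
  static final class FakeFs {
    public boolean isFileClosed(FakePath src) { return true; }
  }

  /** Returns true if a public method with exactly these parameter types exists. */
  static boolean hasMethod(Class<?> cls, String name, Class<?>... params) {
    try {
      cls.getMethod(name, params);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  /** Looks the method up with the declared parameter type and invokes it. */
  static boolean invokeIsFileClosed(Object fs, FakePath p) {
    try {
      Method m = fs.getClass().getMethod("isFileClosed", FakePath.class);
      return (Boolean) m.invoke(fs, p);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Looking the method up with String.class fails even though a path could be converted to a string; the lookup only succeeds with the declared parameter type, and the path object itself (not its string form) must be passed to invoke.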
[jira] [Commented] (HBASE-8664) Small fix ups for memory size outputs in UI
[ https://issues.apache.org/jira/browse/HBASE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680696#comment-13680696 ] Hudson commented on HBASE-8664: --- Integrated in hbase-0.95 #236 (See [https://builds.apache.org/job/hbase-0.95/236/]) HBASE-8664 Small fix ups for memory size outputs in UI (Revision 1491903) Result = FAILURE stack : Files : * /hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon * /hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RegionListTmpl.jamon * /hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon Small fix ups for memory size outputs in UI --- Key: HBASE-8664 URL: https://issues.apache.org/jira/browse/HBASE-8664 Project: HBase Issue Type: Bug Components: UI Reporter: stack Assignee: stack Fix For: 0.98.0, 0.95.1 Attachments: ui.txt This issue goes in the 'polish' category. On regionserver ui, we were listing raw bytes for heap size, memstore size, etc. I put in place StringUtils.humanReadableInt (looked to see if bootstrap could do it for us but doesn't seem so, not w/o plugin). I then made all the megabytes and kilobytes match StringUtils.humanReadableInt with its 'm' instead of 'MB' and 'k' instead of KB. Removed a stray KB that was in the wrong place too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path
[ https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680708#comment-13680708 ] Elliott Clark commented on HBASE-8699: --
bq. Is there a reliable way to detect hadoop version?
I am not aware of one. That's what the hadoop-compat modules are there for. Anything hadoop 2+ will have hbase-hadoop2-compat on the cp. That seems like a good solution.
Parameter to DistributedFileSystem#isFileClosed should be of type Path -- Key: HBASE-8699 URL: https://issues.apache.org/jira/browse/HBASE-8699 Project: HBase Issue Type: Bug Components: wal Reporter: Ted Yu Assignee: Ted Yu Attachments: 8699-v1.txt Here is the current code of FSHDFSUtils#isFileClosed():
{code}
boolean isFileClosed(final DistributedFileSystem dfs, final Path p) {
  try {
    Method m = dfs.getClass().getMethod("isFileClosed", new Class<?>[] {String.class});
    return (Boolean) m.invoke(dfs, p.toString());
{code}
We look for an isFileClosed method with a parameter of type String. However, from hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java (branch-2):
{code}
public boolean isFileClosed(Path src) throws IOException {
{code}
The parameter is of type Path. This means we would get NoSuchMethodException.
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680709#comment-13680709 ]

stack commented on HBASE-3787:

Yeah, can't cache KV. Can we have something for one server first?

Increment is non-idempotent but client retries RPC

Key: HBASE-3787
URL: https://issues.apache.org/jira/browse/HBASE-3787
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.94.4, 0.95.2
Reporter: dhruba borthakur
Assignee: Sergey Shelukhin
Priority: Critical
Fix For: 0.95.1
Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch, HBASE-3787-v1.patch, HBASE-3787-v2.patch, HBASE-3787-v3.patch, HBASE-3787-v4.patch, HBASE-3787-v5.patch, HBASE-3787-v5.patch

The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible for the same increment call to be applied twice at the server.

For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module so that the RPC server can correctly identify whether an RPC is a retry attempt and handle it accordingly.
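As a rough illustration of the second option in the description (the server recognising a retry), here is a minimal single-server sketch. It is not taken from the attached patches: NonceCounterServer, the nonce parameter, and both method names are hypothetical. The idea is that the client sends the same unique id on every retry of one logical increment, and the server replays the stored result instead of applying the increment again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory server; real code would need bounded, crash-safe bookkeeping.
class NonceCounterServer {
    private final Map<String, Long> counters = new HashMap<>();
    private final Map<Long, Long> resultsByNonce = new HashMap<>();

    // Naive increment: every delivery is applied, so a retried RPC counts twice.
    long increment(String row, long amount) {
        return counters.merge(row, amount, Long::sum);
    }

    // Retry-aware increment: a nonce that was already seen returns the earlier result.
    long incrementOnce(long nonce, String row, long amount) {
        Long previous = resultsByNonce.get(nonce);
        if (previous != null) {
            return previous;
        }
        long result = counters.merge(row, amount, Long::sum);
        resultsByNonce.put(nonce, result);
        return result;
    }

    long get(String row) {
        return counters.getOrDefault(row, 0L);
    }
}
```

This only covers retries that reach the same server process. The nonce map here grows without bound and is lost on restart or region move, which is the hard part the thread is discussing.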
[jira] [Updated] (HBASE-8696) Fixup for logs that show when running hbase-it tests.
[ https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-8696:

Attachment: 8696v4.txt

Update w/ Sergey comments addressed.

Fixup for logs that show when running hbase-it tests.

Key: HBASE-8696
URL: https://issues.apache.org/jira/browse/HBASE-8696
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: stack
Fix For: 0.95.1
Attachments: 8696v2.txt, 8696v3.txt, 8696v4.txt, 8698.txt

I've been staring at logs trying to figure why hbase-it tests fail. Here are some more log cleanups that come of my frustration trying to read our emissions.
[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path
[ https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680712#comment-13680712 ] Ted Yu commented on HBASE-8699: --- Currently hadoop 1.2.0 contains DistributedFileSystem#isFileClosed that HBase can use. Should 1.2.0 be covered ?
[jira] [Commented] (HBASE-8664) Small fix ups for memory size outputs in UI
[ https://issues.apache.org/jira/browse/HBASE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680721#comment-13680721 ] Hudson commented on HBASE-8664: --- Integrated in HBase-TRUNK #4173 (See [https://builds.apache.org/job/HBase-TRUNK/4173/]) HBASE-8664 Small fix ups for memory size outputs in UI (Revision 1491902) Result = SUCCESS stack : Files : * /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon * /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RegionListTmpl.jamon * /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon
[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path
[ https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680719#comment-13680719 ]

Ted Yu commented on HBASE-8699:

bq. Anything hadoop 2+ will have hbase-hadoop2-compat on the cp

lib/hbase-hadoop2-compat-0.95.1.jar would be on the classpath. Does it reveal the underlying hadoop version ?
[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path
[ https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680725#comment-13680725 ] stack commented on HBASE-8699: -- What Elliott said. org.apache.hadoop.util.VersionInfo.getVersion() will give you hadoop version... (over in hadoop-one-compat, if 1.2, change test result?)
[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out
[ https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680722#comment-13680722 ]

Elliott Clark commented on HBASE-8732:

It seems like FastDiff is the culprit here. If I change the test to not use fast diff then it passes.

Changing Encoding on Column Families errors out

Key: HBASE-8732
URL: https://issues.apache.org/jira/browse/HBASE-8732
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Priority: Critical
Fix For: 0.95.2

Getting an error when opening a scanner on a file that has no encoding.
[jira] [Commented] (HBASE-8706) Some improvement in snapshot
[ https://issues.apache.org/jira/browse/HBASE-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680728#comment-13680728 ]

stack commented on HBASE-8706:

Skimmed the patch. lgtm.

Some improvement in snapshot

Key: HBASE-8706
URL: https://issues.apache.org/jira/browse/HBASE-8706
Project: HBase
Issue Type: Bug
Components: snapshots
Affects Versions: 0.94.8, 0.95.0
Reporter: binlijin
Attachments: HBASE-8706-2.patch, HBASE-8706-3.patch, HBASE-8706.patch, HBASE-8706-v4.patch

(1) The timeout for Procedure cannot be configured.
{code}
Procedure's timeout
ProcedureCoordinator
  final static long TIMEOUT_MILLIS_DEFAULT = 60000;
  createProcedure(ForeignExceptionDispatcher fed, String procName, byte[] procArgs, List<String> expectedMembers) {
    // build the procedure
    return new Procedure(this, fed, WAKE_MILLIS_DEFAULT, TIMEOUT_MILLIS_DEFAULT, procName, procArgs, expectedMembers);
  }

RegionServerSnapshotManager:
  /** Conf key for max time to keep threads in snapshot request pool waiting */
  public static final String SNAPSHOT_TIMEOUT_MILLIS_KEY = "hbase.snapshot.region.timeout";
  /** Keep threads alive in request pool for max of 60 seconds */
  public static final long SNAPSHOT_TIMEOUT_MILLIS_DEFAULT = 60000;
  public Subprocedure buildSubprocedure(SnapshotDescription snapshot) {
    long timeoutMillis = conf.getLong(SNAPSHOT_TIMEOUT_MILLIS_KEY, SNAPSHOT_TIMEOUT_MILLIS_DEFAULT);
    case FLUSH:
      SnapshotSubprocedurePool taskManager = new SnapshotSubprocedurePool(rss.getServerName().toString(), conf);
  }
{code}
(2) In TakeSnapshotHandler, after snapshotRegions we should call monitor.rethrowException() to check whether there is an exception; if there is, we can skip verifySnapshot.
(3) Too many error messages are logged when an error happens in some places.
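For point (1), the usual shape of the fix is to read the timeout from configuration and fall back to the built-in default, the same way the description shows RegionServerSnapshotManager doing with conf.getLong. The sketch below is only an illustration under assumptions: it uses java.util.Properties as a stand-in for the Hadoop Configuration object, the key name is made up for the example, and the 60000 ms default is assumed from the "60 seconds" comment in the description.

```java
import java.util.Properties;

// Hypothetical helper; not the code from the attached patches.
class TimeoutConfig {
    // Made-up key for illustration only, not a real HBase setting.
    static final String TIMEOUT_KEY = "example.procedure.timeout.millis";
    // Assumed default of 60 seconds.
    static final long TIMEOUT_MILLIS_DEFAULT = 60000L;

    // Returns the configured timeout, or the default when the key is absent.
    static long timeoutMillis(Properties conf) {
        String raw = conf.getProperty(TIMEOUT_KEY);
        return raw == null ? TIMEOUT_MILLIS_DEFAULT : Long.parseLong(raw.trim());
    }
}
```

With this shape, the value passed to the Procedure constructor would come from timeoutMillis(conf) rather than from the constant directly.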
[jira] [Commented] (HBASE-8665) bad compaction priority behavior in queue can cause store to be blocked
[ https://issues.apache.org/jira/browse/HBASE-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680736#comment-13680736 ]

Sergey Shelukhin commented on HBASE-8665:

[~saint@gmail.com] ping?

bad compaction priority behavior in queue can cause store to be blocked

Key: HBASE-8665
URL: https://issues.apache.org/jira/browse/HBASE-8665
Project: HBase
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HBASE-8665-v0.patch

Note that this can be solved by bumping up the number of compaction threads but still it seems like this priority inversion should be dealt with.

There's a store with 1 big file and 3 flushes (1 2 3 4) sitting around and minding its own business when it decides to compact. Compaction (2 3 4) is created and put in queue, it's low priority, so it doesn't get out of the queue for some time - other stores are compacting. Meanwhile more files are flushed and at (1 2 3 4 5 6 7) it decides to compact (5 6 7). This compaction now has higher priority than the first one. After that if the load is high it enters a vicious cycle of compacting and compacting files as they arrive, with the store being blocked on and off, with the (2 3 4) compaction staying in queue for up to ~20 minutes (that I've seen).

I wonder why we do this thing where we queue compaction and compact separately. Perhaps we should take a snapshot of all store priorities, then do select in order and execute the first compaction we find. This will need a starvation safeguard too but should probably be better.

Btw, exploring compaction policy may be more prone to this, as it can select files from the middle, not just beginning, which, given the treatment of already selected files that was not changed from the old ratio-based one (all files with lower seqNums than the ones selected are also ineligible for further selection), will make more files ineligible (e.g. imagine with 10 blocking files, with 8 present (1-8), (6 7 8) being selected and getting stuck). Today I see the case that would also apply to old policy, but yesterday I saw file distribution something like this: 4,5g, 2,1g, 295,9m, 113,3m, 68,0m, 67,8m, 1,1g, 295,1m, 100,4m, unfortunately w/o enough logs to figure out how it resulted.
[jira] [Commented] (HBASE-8667) Master and Regionserver not able to communicate if both bound to different network interfaces on the same machine.
[ https://issues.apache.org/jira/browse/HBASE-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680733#comment-13680733 ]

stack commented on HBASE-8667:

bq. Then we need to initialize rpc server in RS with the hostname received from master after checkin right? Otherwise we will have this issue.

The regionserver just takes the name and uses it in subsequent communication w/ the master -- it does not change where it is bound based on the name the master gave it. Are you suggesting that regionserver only set up an rpcserver after it has gotten name from master? What if this disagrees w/ what the operator told us to use in the configuration? Isn't what we have here a setup problem; we have regionserver on localhost and master on an ip? Can you have regionserver bind to same ip?

Master and Regionserver not able to communicate if both bound to different network interfaces on the same machine.

Key: HBASE-8667
URL: https://issues.apache.org/jira/browse/HBASE-8667
Project: HBase
Issue Type: Bug
Components: IPC/RPC
Reporter: rajeshbabu
Fix For: 0.98.0, 0.95.2, 0.94.9
Attachments: HBASE-8667_Trunk.patch, HBASE-8667_Trunk-V2.patch

While testing the HBASE-8640 fix, found that master and regionserver running on different interfaces are not communicating properly. I have two interfaces 1) lo 2) eth0 in my machine and default hostname interface is lo. I have configured master ipc address to ip of eth0 interface. Started master and regionserver on the same machine.
1) master rpc server bound to eth0 and RS rpc server bound to lo
2) Since rpc client is not binding to any ip address, when RS is reporting RS startup it's getting registered with eth0 ip address (but actually it should register localhost)

Here are RS logs:
{code}
2013-05-31 06:05:28,608 WARN [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
2013-05-31 06:05:31,609 INFO [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at 192.168.0.100,60000,1369960497008
2013-05-31 06:05:31,609 INFO [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 192.168.0.100,60000,1369960497008 that we are up with port=60020, startcode=1369960502544
2013-05-31 06:05:31,618 DEBUG [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: hbase.rootdir=hdfs://localhost:2851/hbase
2013-05-31 06:05:31,618 DEBUG [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: fs.default.name=hdfs://localhost:2851
2013-05-31 06:05:31,618 INFO [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us a different hostname to use; was=localhost, but now=192.168.0.100
{code}
Here are master logs:
{code}
2013-05-31 06:05:31,615 INFO [IPC Server handler 9 on 60000] org.apache.hadoop.hbase.master.ServerManager: Registering server=192.168.0.100,60020,1369960502544
{code}
Since master has wrong rpc server address of RS, META is not getting assigned.
{code}
2013-05-31 06:05:34,362 DEBUG [master-192.168.0.100,60000,1369960497008] org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for .META.,,1.1028785192 so generated a random one; hri=.META.,,1.1028785192, src=, dest=192.168.0.100,60020,1369960502544; 1 (online=1, available=1) available servers, forceNewPlan=false
- org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of .META.,,1.1028785192 to 192.168.0.100,60020,1369960502544, trying to assign elsewhere instead; try=1 of 10
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:549)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:813)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1422)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1315)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1532)
at
[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path
[ https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680739#comment-13680739 ] Ted Yu commented on HBASE-8699: --- See if I understand correctly. We utilize this method: {code} public static String getVersion() { {code} and check the return String for certain releases we know DistributedFileSystem#isFileClosed(Path ) is present.
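The version check Ted describes could look roughly like the sketch below. HadoopVersionCheck and its methods are hypothetical helpers, not HBase or Hadoop code. In real use the version string would come from the getVersion() call quoted above; here it is passed in so the sketch runs without Hadoop. Which releases actually carry isFileClosed(Path) is left to the caller, since the thread only mentions 1.2.0 and branch-2.

```java
// Hypothetical helper for comparing a Hadoop-style version string against a minimum.
class HadoopVersionCheck {
    // Parses the first two numeric components of strings such as "1.2.0" or "2.0.5-alpha".
    static int[] majorMinor(String version) {
        String[] parts = version.split("[.\\-]");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    // True when the version is greater than or equal to major.minor.
    static boolean isAtLeast(String version, int major, int minor) {
        int[] v = majorMinor(version);
        return v[0] > major || (v[0] == major && v[1] >= minor);
    }

    public static void main(String[] args) {
        // The threshold 1.2 is only an example taken from the 1.2.0 release mentioned in the thread.
        System.out.println(isAtLeast("1.1.2", 1, 2));
        System.out.println(isAtLeast("1.2.0", 1, 2));
        System.out.println(isAtLeast("2.0.5-alpha", 1, 2));
    }
}
```

A string comparison like this needs a maintained list of releases, whereas probing for the method by reflection does not, so the two approaches trade maintenance for an extra lookup.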
[jira] [Commented] (HBASE-8729) distributedLogReplay may hang during chained region server failure
[ https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680741#comment-13680741 ]

Jeffrey Zhong commented on HBASE-8729:

[~saint@gmail.com] Thanks for the good comments! I'll address your first two comments in the next patch (Ted addressed the second one already in the v2 patch). The interesting point is your last comment:
{quote}
Rather than a log replay handler, should we instead have M_SERVER_SHUTDOWN be its own type... and then make N executor slots for server shutdown handling rather than for log replay? Would then make the exit of server shutdown handler nicer in that when we leave it, we have processed the server rather than as we have in this patch where we go off to another executor for completion?
{quote}
If we don't introduce the new log replay handler, setting N is tricky and its value has to be big enough so that we won't end up in the issue of this JIRA. The other alternative (not clean and error prone) is using one pool while limiting logReplay to at most MaxThreads - 3 slots in order not to block all threads in the pool. What do you think? Thanks.

distributedLogReplay may hang during chained region server failure

Key: HBASE-8729
URL: https://issues.apache.org/jira/browse/HBASE-8729
Project: HBase
Issue Type: Bug
Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Fix For: 0.98.0, 0.95.2
Attachments: 8729-v2.patch, hbase-8729.patch

In a test, half the cluster (in terms of region servers) was down and some log replay had incurred chained RS failures (the receiving RS of a log replay failed again). By default, we only allow 3 concurrent SSH handlers (controlled by {code}this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS, conf.getInt("hbase.master.executor.serverops.threads", 3));{code}). If all 3 SSH handlers are doing logReplay (a blocking call) and one of the receiving RSs fails again, then logReplay will hang because regions of the newly failed RS can't be re-assigned to another live RS (no SSH handler will be processed due to the max threads setting) and the existing log replay will keep routing replay traffic to the dead RS.

The fix is to submit logReplay work into a separate type of executor queue in order not to block SSH region assignment so that logReplay can route traffic to a live RS after retries and move forward.
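The hang and the proposed fix can be reproduced in miniature with plain java.util.concurrent pools. This is only a sketch under assumptions: ReplayPools and assignmentRuns are hypothetical names, the "replay" tasks simply block on a latch, and the "assignment" task is a trivial callable. It is not the executor service used by the HBase master.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ReplayPools {
    // Returns true if an "assignment" task completes while three blocking "replay" tasks are outstanding.
    static boolean assignmentRuns(boolean separateReplayPool) {
        final int handlers = 3; // mirrors the default of 3 server-operation threads quoted in the issue
        ExecutorService serverOps = Executors.newFixedThreadPool(handlers);
        ExecutorService replay = separateReplayPool ? Executors.newFixedThreadPool(handlers) : serverOps;
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy every replay slot with a task that blocks until released.
            for (int i = 0; i < handlers; i++) {
                replay.submit(() -> { release.await(); return null; });
            }
            // The assignment work needs a free thread in the server-operations pool.
            Future<String> assign = serverOps.submit(() -> "assigned");
            try {
                assign.get(500, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // no free handler: this is the hang in miniature
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            release.countDown();
            serverOps.shutdown();
            replay.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool, assignment ran: " + assignmentRuns(false));
        System.out.println("separate pool, assignment ran: " + assignmentRuns(true));
    }
}
```

With a shared pool the assignment task sits in the queue behind the blocked replay tasks. With a separate replay pool it gets a thread immediately, which is the behavior the fix is after.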
[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS
[ https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680745#comment-13680745 ]

Enis Soztutar commented on HBASE-8344:

Looks good to go.

Improve the assignment when node failures happen to choose the secondary RS as the new primary RS

Key: HBASE-8344
URL: https://issues.apache.org/jira/browse/HBASE-8344
Project: HBase
Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Critical
Fix For: 0.95.2
Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, hbase-8344-2.7.txt, hbase-8344-2.7.txt
[jira] [Commented] (HBASE-8721) fix for bug that delete can mask puts that happened after the delete was entered
[ https://issues.apache.org/jira/browse/HBASE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680744#comment-13680744 ]

Sergey Shelukhin commented on HBASE-8721:

bq. But I think the inconsistency issue's root cause is the arguable behaviour that delete can mask puts that happened after the delete. A more intuitive and more reasonable behaviour is that a delete can only mask puts happened before it, and has no impact on puts happened after it.

This would be inconsistent with puts happening after puts being masked by earlier puts, depending on timestamp; as in my example above. Timestamp's express purpose is the version; by default, if you don't set it, it will be taken from server time. If you are setting explicit timestamps, you are explicitly telling HBase that it should withhold judgement about versions because you know what happens logically before and after in your system. If you are using timestamp otherwise for some convenience, you are misusing it. If this version semantic is removed, timestamp becomes simply a long tucked onto a KeyValue and should be removed; after all, we don't have a string or a boolean also added to KeyValue so that people could use them for their purposes. HBase already has columns and column families to do that. Timestamp has very explicit semantics and purpose right now. If you want time-based behavior then don't set timestamps and HBase will use time-based behavior.

fix for bug that delete can mask puts that happened after the delete was entered

Key: HBASE-8721
URL: https://issues.apache.org/jira/browse/HBASE-8721
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Feng Honghua
Attachments: HBASE-8721-0.94-V0.patch

This fix aims at the bug mentioned in http://hbase.apache.org/book.html 5.8.2.1:

Deletes mask puts, even puts that happened after the delete was entered. Remember that a delete writes a tombstone, which only disappears after the next major compaction has run. Suppose you do a delete of everything <= T. After this you do a new put with a timestamp <= T. This put, even if it happened after the delete, will be masked by the delete tombstone. Performing the put will not fail, but when you do a get you will notice the put did have no effect. It will start working again after the major compaction has run. These issues should not be a problem if you use always-increasing versions for new puts to a row. But they can occur even if you do not care about time: just do delete and put immediately after each other, and there is some chance they happen within the same millisecond.
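The masking behavior quoted from the book can be modelled in a few lines. This is a toy model under assumptions, not how HBase stores data: TombstoneStore is a hypothetical single-cell store, a delete at T is taken to mask every version with a timestamp at or below T, and majorCompact() stands in for a major compaction that drops both the tombstone and the masked versions.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy single-cell model of timestamped versions plus one delete tombstone.
class TombstoneStore {
    private final TreeMap<Long, String> versions = new TreeMap<>(); // timestamp -> value
    private long tombstone = Long.MIN_VALUE; // masks versions with timestamp <= tombstone

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Writes a tombstone covering everything at or below the given timestamp.
    void delete(long timestamp) {
        tombstone = Math.max(tombstone, timestamp);
    }

    // Returns the newest visible value, or null when the newest version is masked or absent.
    String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        if (newest == null || newest.getKey() <= tombstone) {
            return null;
        }
        return newest.getValue();
    }

    // Drops masked versions together with the tombstone.
    void majorCompact() {
        versions.headMap(tombstone, true).clear();
        tombstone = Long.MIN_VALUE;
    }
}
```

In this model a put at timestamp 10 issued after a delete at 10 stays invisible until majorCompact() runs, while a put at timestamp 11 is visible right away. That matches the description's point that always-increasing versions avoid the problem.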
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680751#comment-13680751 ] Sergey Shelukhin commented on HBASE-3787: - refer to the attached patch ;) I can remove the WAL part
[jira] [Commented] (HBASE-8721) fix for bug that delete can mask puts that happened after the delete was entered
[ https://issues.apache.org/jira/browse/HBASE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680750#comment-13680750 ] Sergey Shelukhin commented on HBASE-8721: - (columns, or part of rowkey, as the case seems to be from your description)
[jira] [Commented] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision
[ https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680759#comment-13680759 ]

Enis Soztutar commented on HBASE-8700:

Can we make the # command line args change backwards compatible? I also wanted to pre-split the table at creation to reduce the runtime. It becomes a little bit easier with this change. Should we do a follow up?

IntegrationTestBigLinkedList can fail due to random number collision

Key: HBASE-8700
URL: https://issues.apache.org/jira/browse/HBASE-8700
Project: HBase
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HBASE-8700-v0.patch, HBASE-8700-v1.patch

The test can fail due to random number collision, claiming there are unreferenced elements for obvious reasons (we rewrite some link). Original Accumulo test has one-stage generation so it doesn't count unreferenced elements as failures, only undefined ones. With 200m longs out of half-long range the probability of collision is approx 0.2%. Moreover, without some way to debug, it's hard to debug what keys should be looked at in such cases.
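The "approx 0.2%" figure in the description is consistent with the standard birthday-problem approximation, p ≈ 1 - exp(-n(n-1)/(2N)). The sketch below checks the arithmetic, assuming that "200m longs" means n = 200,000,000 keys and that "half-long range" means N = 2^63 possible values. CollisionEstimate is a hypothetical helper, not part of the test.

```java
// Birthday-problem estimate of at least one collision among n uniform draws from a space of size N.
class CollisionEstimate {
    static double collisionProbability(double n, double space) {
        // -expm1(-x) equals 1 - exp(-x) but keeps precision when x is small.
        return -Math.expm1(-n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        double n = 200_000_000d;          // assumed number of generated keys
        double space = Math.pow(2, 63);   // assumed size of the half-long range
        // Prints a value of roughly 0.2 percent.
        System.out.printf("%.4f%%%n", 100 * collisionProbability(n, space));
    }
}
```

Because the probability grows roughly with the square of n, doubling the number of generated keys would about quadruple the chance of a spurious failure.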
[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS
[ https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680763#comment-13680763 ] Nick Dimiduk commented on HBASE-8344: - +1
[jira] [Updated] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4811: -- Attachment: 4811-trunk-v10.txt Support reverse Scan Key: HBASE-4811 URL: https://issues.apache.org/jira/browse/HBASE-4811 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 0.20.6, 0.94.7 Reporter: John Carrino Assignee: Liang Xie Attachments: 4811-trunk-v10.txt, 4811-trunk-v5.patch, HBase-4811-0.94.3modified.txt, HBase-4811-0.94-v2.txt, hbase-4811-trunkv1.patch, hbase-4811-trunkv4.patch, hbase-4811-trunkv6.patch, hbase-4811-trunkv7.patch, hbase-4811-trunkv8.patch, hbase-4811-trunkv9.patch All the documentation I find about HBase says that if you want forward and reverse scans you should just build 2 tables and one be ascending and one descending. Is there a fundamental reason that HBase only supports forward Scan? It seems like a lot of extra space overhead and coding overhead (to keep them in sync) to support 2 tables. I am assuming this has been discussed before, but I can't find the discussions anywhere about it or why it would be infeasible.
[jira] [Resolved] (HBASE-8724) [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs
[ https://issues.apache.org/jira/browse/HBASE-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar resolved HBASE-8724. -- Resolution: Fixed Hadoop Flags: Reviewed Thanks for the reviews. I've committed this to 0.94. [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs --- Key: HBASE-8724 URL: https://issues.apache.org/jira/browse/HBASE-8724 Project: HBase Issue Type: Bug Components: mapreduce, snapshots Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.94.9 Attachments: hbase-8724_v1.patch On 0.94, ExportSnapshot uses hbase.tmp.dir as the job's staging directory on hdfs. However, hbase.tmp.dir is by definition a local directory, thus should not be used as an hdfs directory for the job. Trunk uses JobUtil.getStagingDir() which gets the staging dir from JobSubmissionFiles class in Hadoop, so trunk is fine. We've discovered this since it fails the test on windows, but this is not windows-specific as per above (like specifying hbase.tmp.dir as /var/hbase/tmp/ etc)
[jira] [Updated] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS
[ https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-8344: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks for the reviews, folks. Improve the assignment when node failures happen to choose the secondary RS as the new primary RS - Key: HBASE-8344 URL: https://issues.apache.org/jira/browse/HBASE-8344 Project: HBase Issue Type: Sub-task Reporter: Devaraj Das Assignee: Devaraj Das Priority: Critical Fix For: 0.95.2 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, hbase-8344-2.7.txt, hbase-8344-2.7.txt
[jira] [Commented] (HBASE-8696) Fixup for logs that show when running hbase-it tests.
[ https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680776#comment-13680776 ] Sergey Shelukhin commented on HBASE-8696: - +1 Fixup for logs that show when running hbase-it tests. - Key: HBASE-8696 URL: https://issues.apache.org/jira/browse/HBASE-8696 Project: HBase Issue Type: Improvement Reporter: stack Assignee: stack Fix For: 0.95.1 Attachments: 8696v2.txt, 8696v3.txt, 8696v4.txt, 8698.txt I've been staring at logs trying to figure why hbase-it tests fail. Here are some more log cleanups that come of my frustration trying to read our emissions.
[jira] [Commented] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision
[ https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680796#comment-13680796 ] Enis Soztutar commented on HBASE-8700: -- After an offline discussion with Sergey, it seems that this is already backwards compatible with regard to the command line args. +1 on commit. IntegrationTestBigLinkedList can fail due to random number collision Key: HBASE-8700 URL: https://issues.apache.org/jira/browse/HBASE-8700 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-8700-v0.patch, HBASE-8700-v1.patch The test can fail due to random number collision, claiming there are unreferenced elements for obvious reasons (we rewrite some link). Original Accumulo test has one-stage generation so it doesn't count unreferenced elements as failures, only undefined ones. With 200m longs out of half-long range the probability of collision is approx 0.2%. Moreover, without some way to debug, it's hard to debug what keys should be looked at in such cases
[jira] [Commented] (HBASE-8702) Make WALEditCodec pluggable
[ https://issues.apache.org/jira/browse/HBASE-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680799#comment-13680799 ] Jesse Yates commented on HBASE-8702: Thanks Sergey! I'm planning on committing to trunk tomorrow, unless there are objections. Make WALEditCodec pluggable --- Key: HBASE-8702 URL: https://issues.apache.org/jira/browse/HBASE-8702 Project: HBase Issue Type: Improvement Components: Replication, wal Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.98.0, 0.95.2, 0.94.9 Attachments: hbase-8702-0.94-v0.patch, hbase-8702-trunk-v0.patch, hbase-8702-trunk-v1.patch WALEditCodec needs to be pluggable to support alternative serialization mechanisms. The open question here is whether to support the alternative codec when doing replication - both clusters would need the codec on the classpath, which has additional overhead and also will be a little bit complicated when making the WAL serialization backwards compatible in 0.94. This is the follow-up to HBASE-8636.
[jira] [Commented] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision
[ https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680803#comment-13680803 ] Sergey Shelukhin commented on HBASE-8700: - latter - maybe former - they are IntegrationTestBigLinkedList can fail due to random number collision Key: HBASE-8700 URL: https://issues.apache.org/jira/browse/HBASE-8700 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-8700-v0.patch, HBASE-8700-v1.patch The test can fail due to random number collision, claiming there are unreferenced elements for obvious reasons (we rewrite some link). Original Accumulo test has one-stage generation so it doesn't count unreferenced elements as failures, only undefined ones. With 200m longs out of half-long range the probability of collision is approx 0.2%. Moreover, without some way to debug, it's hard to debug what keys should be looked at in such cases
[jira] [Commented] (HBASE-8617) Introducing a new config to disable writes during recovering
[ https://issues.apache.org/jira/browse/HBASE-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680808#comment-13680808 ] Jeffrey Zhong commented on HBASE-8617: -- [~ted_yu] are you good with the v2 patch? Thanks. Introducing a new config to disable writes during recovering - Key: HBASE-8617 URL: https://issues.apache.org/jira/browse/HBASE-8617 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Attachments: HBASE-8617.patch, HBASE-8617-v2.patch In distributedLogReplay (HBASE-7006), we allow writes even while a region is recovering. This may cause undesired behavior when applications (or deployments) are already near their write capacity, because distributedLogReplay generates more write traffic to the remaining region servers. The new config hbase.regionserver.disallow.writes.when.recovering tries to address this situation so that recovery won't be affected by normal application write traffic. The default value of this config is false (meaning writes are allowed during recovery).
[jira] [Commented] (HBASE-8617) Introducing a new config to disable writes during recovering
[ https://issues.apache.org/jira/browse/HBASE-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680811#comment-13680811 ] Ted Yu commented on HBASE-8617: --- +1 Introducing a new config to disable writes during recovering - Key: HBASE-8617 URL: https://issues.apache.org/jira/browse/HBASE-8617 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.98.0, 0.95.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Attachments: HBASE-8617.patch, HBASE-8617-v2.patch In distributedLogReplay (HBASE-7006), we allow writes even while a region is recovering. This may cause undesired behavior when applications (or deployments) are already near their write capacity, because distributedLogReplay generates more write traffic to the remaining region servers. The new config hbase.regionserver.disallow.writes.when.recovering tries to address this situation so that recovery won't be affected by normal application write traffic. The default value of this config is false (meaning writes are allowed during recovery).
[jira] [Updated] (HBASE-8652) Number of compacting KVs is not reset at the end of compaction
[ https://issues.apache.org/jira/browse/HBASE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8652: -- Component/s: Compaction Number of compacting KVs is not reset at the end of compaction -- Key: HBASE-8652 URL: https://issues.apache.org/jira/browse/HBASE-8652 Project: HBase Issue Type: Bug Components: Compaction Reporter: Ted Yu Priority: Minor Looking at master:60010/master-status#compactStas , I noticed that 'Num. Compacting KVs' column stays unchanged at non-zero value(s). In DefaultCompactor#compact(), we have this at the beginning: {code} this.progress = new CompactionProgress(fd.maxKeyCount); {code} But progress.totalCompactingKVs is not reset at the end of compact().
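One way to picture the behaviour HBASE-8652 asks for is a reset in a finally block, so the counters are cleared whether or not the compaction loop throws. The following is a toy stand-in, not the real DefaultCompactor or CompactionProgress code and not the eventual fix; ToyProgress, reset() and the surrounding class are hypothetical names chosen for this sketch.

```java
// Toy stand-in, NOT the real HBase classes: shows a progress counter being
// reset once a compaction finishes so a status page stops showing stale values.
final class ProgressResetSketch {

    // Hypothetical simplified progress holder.
    static final class ToyProgress {
        long totalCompactingKVs;
        long currentCompactedKVs;

        ToyProgress(long total) {
            this.totalCompactingKVs = total;
        }

        void reset() {
            totalCompactingKVs = 0;
            currentCompactedKVs = 0;
        }
    }

    private ToyProgress progress = new ToyProgress(0);

    ToyProgress getProgress() {
        return progress;
    }

    // Pretends to compact maxKeyCount KVs, then clears the counters in a
    // finally block so they are reset even if the loop throws.
    void compact(long maxKeyCount) {
        progress = new ToyProgress(maxKeyCount);
        try {
            for (long i = 0; i < maxKeyCount; i++) {
                progress.currentCompactedKVs++;
            }
        } finally {
            progress.reset();
        }
    }

    public static void main(String[] args) {
        ProgressResetSketch compactor = new ProgressResetSketch();
        compactor.compact(1000);
        System.out.println("totalCompactingKVs after compact: "
                + compactor.getProgress().totalCompactingKVs);
    }
}
```

Whether the real counters should drop to zero or keep the last completed value is a design question for the issue itself; the sketch only shows where such a reset could sit.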
[jira] [Commented] (HBASE-8664) Small fix ups for memory size outputs in UI
[ https://issues.apache.org/jira/browse/HBASE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680832#comment-13680832 ] Hudson commented on HBASE-8664: --- Integrated in hbase-0.95-on-hadoop2 #129 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/129/]) HBASE-8664 Small fix ups for memory size outputs in UI (Revision 1491903) Result = FAILURE stack : Files : * /hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon * /hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RegionListTmpl.jamon * /hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon Small fix ups for memory size outputs in UI --- Key: HBASE-8664 URL: https://issues.apache.org/jira/browse/HBASE-8664 Project: HBase Issue Type: Bug Components: UI Reporter: stack Assignee: stack Fix For: 0.98.0, 0.95.1 Attachments: ui.txt This issue goes in the 'polish' category. On regionserver ui, we were listing raw bytes for heap size, memstore size, etc. I put in place StringUtils.humanReadableInt (looked to see if bootstrap could do it for us but doesn't seem so, not w/o plugin). I then made all the megabytes and kilobytes match StringUtils.humanReadableInt with its 'm' instead of 'MB' and 'k' instead of KB. Removed a stray KB that was in the wrong place too.
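For readers unfamiliar with the 'k' / 'm' style mentioned in HBASE-8664, here is a rough illustration of that kind of formatting. It is a hand-written helper and not Hadoop's StringUtils.humanReadableInt; the method name, the one-decimal rounding and the 1024-based thresholds are all assumptions made for this sketch, so the real utility may print slightly different strings.

```java
import java.util.Locale;

// Illustration only: a hand-rolled byte formatter in the 'k' / 'm' / 'g' style.
// This is NOT Hadoop's StringUtils.humanReadableInt; rounding and thresholds
// here are assumptions chosen for the sketch.
final class HumanReadableSketch {

    private static final long KB = 1024L;
    private static final long MB = KB * 1024L;
    private static final long GB = MB * 1024L;

    // Formats a byte count with a single-letter suffix (powers of 1024).
    static String humanReadable(long bytes) {
        long abs = Math.abs(bytes);
        if (abs < KB) {
            return Long.toString(bytes);
        }
        if (abs < MB) {
            return String.format(Locale.ROOT, "%.1fk", bytes / (double) KB);
        }
        if (abs < GB) {
            return String.format(Locale.ROOT, "%.1fm", bytes / (double) MB);
        }
        return String.format(Locale.ROOT, "%.1fg", bytes / (double) GB);
    }

    public static void main(String[] args) {
        System.out.println(humanReadable(512));
        System.out.println(humanReadable(1536));
        System.out.println(humanReadable(5L * 1024 * 1024));
    }
}
```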
[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS
[ https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680831#comment-13680831 ] Hudson commented on HBASE-8344: --- Integrated in hbase-0.95-on-hadoop2 #129 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/129/]) HBASE-8344. Improves the assignment when node failures happen to choose the secondary RS as the new primary RS (Revision 1491996) Result = FAILURE ddas : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeAssignmentHelper.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodes.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionPlacement.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java Improve the assignment when node failures happen to choose the secondary RS as the new primary RS - Key: HBASE-8344 URL: https://issues.apache.org/jira/browse/HBASE-8344 Project: HBase Issue Type: Sub-task Reporter: Devaraj Das Assignee: Devaraj Das Priority: Critical Fix For: 0.95.2 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, hbase-8344-2.7.txt, hbase-8344-2.7.txt
[jira] [Commented] (HBASE-8724) [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs
[ https://issues.apache.org/jira/browse/HBASE-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680843#comment-13680843 ] Hudson commented on HBASE-8724: --- Integrated in HBase-0.94-security #164 (See [https://builds.apache.org/job/HBase-0.94-security/164/]) HBASE-8724 [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs (Revision 1491993) Result = SUCCESS enis : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs --- Key: HBASE-8724 URL: https://issues.apache.org/jira/browse/HBASE-8724 Project: HBase Issue Type: Bug Components: mapreduce, snapshots Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.94.9 Attachments: hbase-8724_v1.patch On 0.94, ExportSnapshot uses hbase.tmp.dir as the job's staging directory on hdfs. However, hbase.tmp.dir is by definition a local directory, thus should not be used as an hdfs directory for the job. Trunk uses JobUtil.getStagingDir() which gets the staging dir from JobSubmissionFiles class in Hadoop, so trunk is fine. We've discovered this since it fails the test on windows, but this is not windows-specific as per above (like specifying hbase.tmp.dir as /var/hbase/tmp/ etc)
[jira] [Commented] (HBASE-8724) [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs
[ https://issues.apache.org/jira/browse/HBASE-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680849#comment-13680849 ] Hudson commented on HBASE-8724: --- Integrated in HBase-0.94 #1010 (See [https://builds.apache.org/job/HBase-0.94/1010/]) HBASE-8724 [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs (Revision 1491993) Result = SUCCESS enis : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs --- Key: HBASE-8724 URL: https://issues.apache.org/jira/browse/HBASE-8724 Project: HBase Issue Type: Bug Components: mapreduce, snapshots Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.94.9 Attachments: hbase-8724_v1.patch On 0.94, ExportSnapshot uses hbase.tmp.dir as the job's staging directory on hdfs. However, hbase.tmp.dir is by definition a local directory, thus should not be used as an hdfs directory for the job. Trunk uses JobUtil.getStagingDir() which gets the staging dir from JobSubmissionFiles class in Hadoop, so trunk is fine. We've discovered this since it fails the test on windows, but this is not windows-specific as per above (like specifying hbase.tmp.dir as /var/hbase/tmp/ etc)
[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS
[ https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680856#comment-13680856 ] Hudson commented on HBASE-8344: --- Integrated in HBase-TRUNK #4174 (See [https://builds.apache.org/job/HBase-TRUNK/4174/]) HBASE-8344. Improves the assignment when node failures happen to choose the secondary RS as the new primary RS (Revision 1491994) Result = FAILURE ddas : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeAssignmentHelper.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodes.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionPlacement.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java Improve the assignment when node failures happen to choose the secondary RS as the new primary RS - Key: HBASE-8344 URL: https://issues.apache.org/jira/browse/HBASE-8344 Project: HBase Issue Type: Sub-task Reporter: Devaraj Das Assignee: Devaraj Das Priority: Critical Fix For: 0.95.2 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, hbase-8344-2.7.txt, hbase-8344-2.7.txt
[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS
[ https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680864#comment-13680864 ] Hudson commented on HBASE-8344: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #564 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/564/]) HBASE-8344. Improves the assignment when node failures happen to choose the secondary RS as the new primary RS (Revision 1491994) Result = FAILURE ddas : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeAssignmentHelper.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodes.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionPlacement.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java Improve the assignment when node failures happen to choose the secondary RS as the new primary RS - Key: HBASE-8344 URL: https://issues.apache.org/jira/browse/HBASE-8344 Project: HBase Issue Type: Sub-task Reporter: Devaraj Das Assignee: Devaraj Das Priority: Critical Fix For: 0.95.2 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, hbase-8344-2.7.txt, hbase-8344-2.7.txt
[jira] [Commented] (HBASE-8664) Small fix ups for memory size outputs in UI
[ https://issues.apache.org/jira/browse/HBASE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680865#comment-13680865 ] Hudson commented on HBASE-8664: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #564 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/564/]) HBASE-8664 Small fix ups for memory size outputs in UI (Revision 1491902) Result = FAILURE stack : Files : * /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon * /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RegionListTmpl.jamon * /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon Small fix ups for memory size outputs in UI --- Key: HBASE-8664 URL: https://issues.apache.org/jira/browse/HBASE-8664 Project: HBase Issue Type: Bug Components: UI Reporter: stack Assignee: stack Fix For: 0.98.0, 0.95.1 Attachments: ui.txt This issue goes in the 'polish' category. On regionserver ui, we were listing raw bytes for heap size, memstore size, etc. I put in place StringUtils.humanReadableInt (looked to see if bootstrap could do it for us but doesn't seem so, not w/o plugin). I then made all the megabytes and kilobytes match StringUtils.humanReadableInt with its 'm' instead of 'MB' and 'k' instead of KB. Removed a stray KB that was in the wrong place too.
[jira] [Commented] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680867#comment-13680867 ] Lars Hofhansl commented on HBASE-4811: -- v9/v10 is much nicer. A few comments:
* Do we need NonReversedNonLazyKeyValueScanner? We could add unsupported implementations for these methods to NonLazyKeyValueScanner.
* Instead of leaking backwardSeek and seekToLastRow out of the Reversed* classes, should we have an initScan() (or maybe setup()) method on the scanners that does the right thing? I.e. a ReversedScanner would do the seekToLastRow/backwardSeek stuff, and a normal scanner would just seek.
* This:
{code}
+  @Override
+  public synchronized boolean reseek(KeyValue kv) throws IOException {
+    checkReseek();
+    return heap.backwardSeek(kv);
+  }
{code}
and this
{code}
+  @Override
+  public boolean backwardSeek(KeyValue key) throws IOException {
+    checkReseek();
+    return this.heap.backwardSeek(key);
+  }
{code}
is weird. It should either scan backwards or not? If we do what I suggested in the previous point, we would not need this, I think. That way only MemstoreScanner and StoreFileScanner would be special. And they have to be special, because they are opened ahead of time (well, at least StoreFileScanner is). Sorry for being a pain in the ***.
Support reverse Scan Key: HBASE-4811 URL: https://issues.apache.org/jira/browse/HBASE-4811 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 0.20.6, 0.94.7 Reporter: John Carrino Assignee: Liang Xie Attachments: 4811-trunk-v10.txt, 4811-trunk-v5.patch, HBase-4811-0.94.3modified.txt, HBase-4811-0.94-v2.txt, hbase-4811-trunkv1.patch, hbase-4811-trunkv4.patch, hbase-4811-trunkv6.patch, hbase-4811-trunkv7.patch, hbase-4811-trunkv8.patch, hbase-4811-trunkv9.patch All the documentation I find about HBase says that if you want forward and reverse scans you should just build 2 tables and one be ascending and one descending.
Is there a fundamental reason that HBase only supports forward Scan? It seems like a lot of extra space overhead and coding overhead (to keep them in sync) to support 2 tables. I am assuming this has been discussed before, but I can't find the discussions anywhere about it or why it would be infeasible.
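The initScan() suggestion in the comment on HBASE-4811 could look roughly like the toy model below. None of this is HBase code: ToyScanner, ForwardScanner and ReversedScanner are hypothetical classes over a sorted set of row keys, used only to show how each scanner could hide its own positioning logic (plain seek versus seek-to-last-row or backward seek) behind a single setup call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model, not HBase code: each scanner hides its positioning logic behind
// initScan(), so callers never invoke backwardSeek/seekToLastRow directly.
final class InitScanSketch {

    interface ToyScanner {
        void initScan(String startRow); // hypothetical name borrowed from the comment
        String next();                  // returns null when exhausted
    }

    static final class ForwardScanner implements ToyScanner {
        private final NavigableSet<String> rows;
        private Iterator<String> it;

        ForwardScanner(NavigableSet<String> rows) {
            this.rows = rows;
        }

        @Override
        public void initScan(String startRow) {
            // Plain seek: first row >= startRow, or the very first row if none is given.
            NavigableSet<String> view = (startRow == null) ? rows : rows.tailSet(startRow, true);
            it = view.iterator();
        }

        @Override
        public String next() {
            return it.hasNext() ? it.next() : null;
        }
    }

    static final class ReversedScanner implements ToyScanner {
        private final NavigableSet<String> rows;
        private Iterator<String> it;

        ReversedScanner(NavigableSet<String> rows) {
            this.rows = rows;
        }

        @Override
        public void initScan(String startRow) {
            // No start row: begin at the last row. Otherwise: backward seek to
            // the last row <= startRow.
            NavigableSet<String> view = (startRow == null) ? rows : rows.headSet(startRow, true);
            it = view.descendingIterator();
        }

        @Override
        public String next() {
            return it.hasNext() ? it.next() : null;
        }
    }

    // Caller code is identical for both directions.
    static List<String> drain(ToyScanner scanner, String startRow) {
        scanner.initScan(startRow);
        List<String> out = new ArrayList<>();
        for (String row = scanner.next(); row != null; row = scanner.next()) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>(Arrays.asList("a", "b", "c", "d"));
        System.out.println(drain(new ForwardScanner(rows), "b"));  // [b, c, d]
        System.out.println(drain(new ReversedScanner(rows), "c")); // [c, b, a]
    }
}
```

The point of the shape is that drain() never needs to know which direction it is scanning in, which is the property the review comment is after.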
[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS
[ https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680873#comment-13680873 ] Hudson commented on HBASE-8344: --- Integrated in hbase-0.95 #237 (See [https://builds.apache.org/job/hbase-0.95/237/]) HBASE-8344. Improves the assignment when node failures happen to choose the secondary RS as the new primary RS (Revision 1491996) Result = SUCCESS ddas : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeAssignmentHelper.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodes.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionPlacement.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java Improve the assignment when node failures happen to choose the secondary RS as the new primary RS - Key: HBASE-8344 URL: https://issues.apache.org/jira/browse/HBASE-8344 Project: HBase Issue Type: Sub-task Reporter: Devaraj Das Assignee: Devaraj Das Priority: Critical Fix For: 0.95.2 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, hbase-8344-2.7.txt, hbase-8344-2.7.txt
[jira] [Updated] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision
[ https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8700: Attachment: HBASE-8700-0.94.patch 94 patch IntegrationTestBigLinkedList can fail due to random number collision Key: HBASE-8700 URL: https://issues.apache.org/jira/browse/HBASE-8700 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-8700-0.94.patch, HBASE-8700-v0.patch, HBASE-8700-v1.patch The test can fail due to random number collision, claiming there are unreferenced elements for obvious reasons (we rewrite some link). Original Accumulo test has one-stage generation so it doesn't count unreferenced elements as failures, only undefined ones. With 200m longs out of half-long range the probability of collision is approx 0.2%. Moreover, without some way to debug, it's hard to debug what keys should be looked at in such cases
[jira] [Updated] (HBASE-8541) implement flush-into-stripes in stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8541: Status: Patch Available (was: Open) implement flush-into-stripes in stripe compactions -- Key: HBASE-8541 URL: https://issues.apache.org/jira/browse/HBASE-8541 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Attachments: HBASE-8541-latest-with-dependencies.patch, HBASE-8541-v0.patch Flush will be able to flush into multiple files under this design, avoiding L0 I/O amplification. I have the patch which is missing just one feature - support for concurrent flushes and stripe changes. This can be done via extensive try-locking of stripe changes and flushes, or advisory flags without blocking flushes, dumping conflicting flushes into L0 in case of (very rare) collisions. For file loading for the latter, a set-cover-like problem needs to be solved to determine optimal stripes. That will also address Jimmy's concern of getting rid of metadata, btw. However currently I don't have time for that. I plan to attach the try-locking patch first, but this won't happen for a couple weeks probably and should not block main reviews. Hopefully this will be added on top of main reviews.
[jira] [Updated] (HBASE-8541) implement flush-into-stripes in stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8541: Attachment: HBASE-8541-latest-with-dependencies.patch HBASE-8541-v0.patch First cut of the patch. This is what I used for perf testing, so it's verified on cluster. It's based on previous stripe compaction patches up to HBASE-8000 implement flush-into-stripes in stripe compactions -- Key: HBASE-8541 URL: https://issues.apache.org/jira/browse/HBASE-8541 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Attachments: HBASE-8541-latest-with-dependencies.patch, HBASE-8541-v0.patch Flush will be able to flush into multiple files under this design, avoiding L0 I/O amplification. I have the patch which is missing just one feature - support for concurrent flushes and stripe changes. This can be done via extensive try-locking of stripe changes and flushes, or advisory flags without blocking flushes, dumping conflicting flushes into L0 in case of (very rare) collisions. For file loading for the latter, a set-cover-like problem needs to be solved to determine optimal stripes. That will also address Jimmy's concern of getting rid of metadata, btw. However currently I don't have time for that. I plan to attach the try-locking patch first, but this won't happen for a couple weeks probably and should not block main reviews. Hopefully this will be added on top of main reviews.
[jira] [Commented] (HBASE-8715) HBase should support IO QOS
[ https://issues.apache.org/jira/browse/HBASE-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680895#comment-13680895 ] Sergey Shelukhin commented on HBASE-8715: - HBase uses HDFS as main backing storage, so this will have to go thru it to actual file system level; does there need to be an HDFS JIRA to plumb this thru? HBase should support IO QOS --- Key: HBASE-8715 URL: https://issues.apache.org/jira/browse/HBASE-8715 Project: HBase Issue Type: New Feature Reporter: Pritam Damania Priority: Minor The operating system exposes system calls like ioprio_set/get to set priorities for various threads doing IO. HBase can use this to accordingly prioritize operations like flushes/compactions/WAL write etc to use the disk bandwidth more efficiently.
[jira] [Commented] (HBASE-4955) Use the official versions of surefire junit
[ https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680897#comment-13680897 ] Nicolas Liochon commented on HBASE-4955: Surefire 2.15 is available. I will give it a try 'soon'. Use the official versions of surefire junit - Key: HBASE-4955 URL: https://issues.apache.org/jira/browse/HBASE-4955 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Environment: all Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Critical Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v5.patch, 8204.v4.patch
We have been using private versions of Surefire and JUnit since HBASE-4763. This JIRA tracks what we need in order to move to the official versions. Surefire 2.11 is just out but, after some tests, it does not contain everything we need.
JUnit: could be for JUnit 4.11. Issue to monitor: https://github.com/KentBeck/junit/issues/359: fixed in our version, no feedback for an integration on trunk
Surefire: could be for Surefire 2.12. Issues to monitor are:
329 (category support): fixed, we use the official implementation from the trunk
786 (@Category with forkMode=always): fixed, we use the official implementation from the trunk
791 (incorrect elapsed time on test failure): fixed, we use the official implementation from the trunk
793 (incorrect time in the XML report): not fixed (reopened) on trunk, fixed in our version
760 (does not take into account the test method): fixed in trunk, not fixed in our version
798 (print immediately the test class name): not fixed in trunk, not fixed in our version
799 (Allow test parallelization when forkMode=always): not fixed in trunk, not fixed in our version
800 (redirectTestOutputToFile not taken into account): not yet fixed on trunk, fixed in our version
800 and 793 are the most important to monitor; they are the only ones that are fixed in our version but not on trunk.
[jira] [Commented] (HBASE-8706) Some improvement in snapshot
[ https://issues.apache.org/jira/browse/HBASE-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680896#comment-13680896 ] binlijin commented on HBASE-8706: - Patch looks good to me too. Some improvement in snapshot Key: HBASE-8706 URL: https://issues.apache.org/jira/browse/HBASE-8706 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8, 0.95.0 Reporter: binlijin Attachments: HBASE-8706-2.patch, HBASE-8706-3.patch, HBASE-8706.patch, HBASE-8706-v4.patch
(1) The timeout for Procedure cannot be configured.
{code}
Procedure's timeout
ProcedureCoordinator
final static long TIMEOUT_MILLIS_DEFAULT = 60000;
createProcedure(ForeignExceptionDispatcher fed, String procName, byte[] procArgs, List<String> expectedMembers) {
  // build the procedure
  return new Procedure(this, fed, WAKE_MILLIS_DEFAULT, TIMEOUT_MILLIS_DEFAULT, procName, procArgs, expectedMembers);
}
RegionServerSnapshotManager:
/** Conf key for max time to keep threads in snapshot request pool waiting */
public static final String SNAPSHOT_TIMEOUT_MILLIS_KEY = "hbase.snapshot.region.timeout";
/** Keep threads alive in request pool for max of 60 seconds */
public static final long SNAPSHOT_TIMEOUT_MILLIS_DEFAULT = 60000;
public Subprocedure buildSubprocedure(SnapshotDescription snapshot) {
  long timeoutMillis = conf.getLong(SNAPSHOT_TIMEOUT_MILLIS_KEY, SNAPSHOT_TIMEOUT_MILLIS_DEFAULT);
  case FLUSH:
    SnapshotSubprocedurePool taskManager = new SnapshotSubprocedurePool(rss.getServerName().toString(), conf);
}
{code}
(2) In TakeSnapshotHandler, after snapshotRegions we should call monitor.rethrowException(); to check whether there is an exception, and if there is we can skip the verifySnapshot.
(3) Too many error messages are logged when an error happens in some places.
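For readers following point (1): the snippet quoted in the description shows the region server side reading its timeout through conf.getLong(key, default), while the coordinator side passes a hard-coded constant. A minimal sketch of the same read-with-fallback pattern is below. It is not taken from the attached patches: the class name, the key name and the helper are hypothetical, java.util.Properties stands in for the HBase Configuration object, and the 60000 ms default is assumed from the "60 seconds" comment in the description.

```java
import java.util.Properties;

/** Illustration only: names here are hypothetical, not HBase API. */
final class ProcedureTimeoutSketch {
    // Hypothetical configuration key; the real patch may use a different name.
    static final String TIMEOUT_MILLIS_KEY = "example.procedure.timeout.millis";
    // Fallback used when the key is absent, unparsable or not positive.
    static final long TIMEOUT_MILLIS_DEFAULT = 60000L;

    /** Returns the configured timeout, or the default if none is usable. */
    static long resolveTimeoutMillis(Properties conf) {
        String raw = conf.getProperty(TIMEOUT_MILLIS_KEY);
        if (raw == null) {
            return TIMEOUT_MILLIS_DEFAULT;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : TIMEOUT_MILLIS_DEFAULT;
        } catch (NumberFormatException e) {
            return TIMEOUT_MILLIS_DEFAULT;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolveTimeoutMillis(conf));   // falls back to the default
        conf.setProperty(TIMEOUT_MILLIS_KEY, "120000");
        System.out.println(resolveTimeoutMillis(conf));   // uses the configured value
    }
}
```

The resolved value would then be passed where the constant is passed today, so an operator can raise it for slow clusters without a rebuild.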
[jira] [Commented] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision
[ https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680899#comment-13680899 ] Hudson commented on HBASE-8700: --- Integrated in HBase-TRUNK #4175 (See [https://builds.apache.org/job/HBase-TRUNK/4175/]) HBASE-8700 IntegrationTestBigLinkedList can fail due to random number collision (Revision 1492034) Result = FAILURE sershe : Files : * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java IntegrationTestBigLinkedList can fail due to random number collision Key: HBASE-8700 URL: https://issues.apache.org/jira/browse/HBASE-8700 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-8700-0.94.patch, HBASE-8700-v0.patch, HBASE-8700-v1.patch The test can fail due to random number collision, claiming there are unreferenced elements for obvious reasons (we rewrite some link). Original Accumulo test has one-stage generation so it doesn't count unreferenced elements as failures, only undefined ones. With 200m longs out of half-long range the probability of collision is approx 0.2%. Moreover, without some way to debug, it's hard to debug what keys should be looked at in such cases
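The "approx 0.2%" figure in the description can be sanity-checked with the usual birthday approximation, P(collision) ~= 1 - exp(-n(n-1)/(2N)). The sketch below is not part of the patch. It assumes that "half-long range" means the 2^63 non-negative long values and that the 200m keys are drawn uniformly and independently; the class and method names are made up for the illustration.

```java
/** Illustration only: a back-of-the-envelope check, not code from the test. */
final class CollisionEstimate {

    /**
     * Birthday approximation for the probability of at least one collision
     * when `draws` values are picked uniformly from `space` possible values.
     */
    static double collisionProbability(double draws, double space) {
        // Number of pairs divided by the size of the space.
        double exponent = draws * (draws - 1.0) / (2.0 * space);
        // 1 - exp(-x), computed with expm1 to keep precision for small x.
        return -Math.expm1(-exponent);
    }

    public static void main(String[] args) {
        double draws = 200_000_000d;       // "200m longs"
        double space = Math.pow(2, 63);    // assumed meaning of "half-long range"
        System.out.printf("%.4f%%%n", 100.0 * collisionProbability(draws, space));
    }
}
```

Under these assumptions the estimate comes out at roughly a fifth of a percent, consistent with the number quoted in the issue.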
[jira] [Commented] (HBASE-8705) RS holding META when restarted in a single node setup may hang infinitely without META assignment
[ https://issues.apache.org/jira/browse/HBASE-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680901#comment-13680901 ] ramkrishna.s.vasudevan commented on HBASE-8705: --- Thanks Stack. Will wait for another day before committing this. RS holding META when restarted in a single node setup may hang infinitely without META assignment - Key: HBASE-8705 URL: https://issues.apache.org/jira/browse/HBASE-8705 Project: HBase Issue Type: Bug Affects Versions: 0.95.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.98.0 Attachments: HBASE-8705.patch
This bug may be minor as it is likely to happen in a single node setup. I restarted the RS holding META. The master tried assigning META using MetaSSH, but it tried this before the new RS came up. So, as no region plan is found,
{code}
if (plan == null) {
  LOG.warn("Unable to determine a plan to assign " + region);
  if (tomActivated) {
    this.timeoutMonitor.setAllRegionServersOffline(true);
  } else {
    regionStates.updateRegionState(region, RegionState.State.FAILED_OPEN);
  }
  return;
}
{code}
we just return without assignment. And since this is META, the small cluster just hangs.
[jira] [Commented] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision
[ https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680911#comment-13680911 ] Hadoop QA commented on HBASE-8700: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587355/HBASE-8700-0.94.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6011//console This message is automatically generated. IntegrationTestBigLinkedList can fail due to random number collision Key: HBASE-8700 URL: https://issues.apache.org/jira/browse/HBASE-8700 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-8700-0.94.patch, HBASE-8700-v0.patch, HBASE-8700-v1.patch The test can fail due to random number collision, claiming there are unreferenced elements for obvious reasons (we rewrite some link). Original Accumulo test has one-stage generation so it doesn't count unreferenced elements as failures, only undefined ones. With 200m longs out of half-long range the probability of collision is approx 0.2%. Moreover, without some way to debug, it's hard to debug what keys should be looked at in such cases
[jira] [Commented] (HBASE-8667) Master and Regionserver not able to communicate if both bound to different network interfaces on the same machine.
[ https://issues.apache.org/jira/browse/HBASE-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680921#comment-13680921 ] Anoop Sam John commented on HBASE-8667: --- bq. Are you suggesting that regionserver only set up an rpcserver after it has gotten name from master? What if this disagrees w/ what the operator told us use in the configuration? Correct. I don't think this is good. Here the issue was that the RS RPCServer was bound to one IP. When the RS reported to the Master, the client socket got bound to another network interface, so when the Master checked the hostname of the RS it saw another name. From then on the Master uses that name to communicate with the RS, but on the RS side there is no RPC server bound to this hostname/IP, so this RS is effectively not in the cluster at all. When the Master and RS are on separate nodes, the RS node has two network interfaces, and the operator wants to bind the RS to a specific network interface, may this issue also come up? Master and Regionserver not able to communicate if both bound to different network interfaces on the same machine. -- Key: HBASE-8667 URL: https://issues.apache.org/jira/browse/HBASE-8667 Project: HBase Issue Type: Bug Components: IPC/RPC Reporter: rajeshbabu Fix For: 0.98.0, 0.95.2, 0.94.9 Attachments: HBASE-8667_Trunk.patch, HBASE-8667_Trunk-V2.patch While testing the HBASE-8640 fix, I found that a master and regionserver running on different interfaces do not communicate properly. I have two interfaces, 1) lo and 2) eth0, on my machine, and the default hostname interface is lo. I configured the master ipc address to the IP of the eth0 interface and started master and regionserver on the same machine.
1) The master rpc server is bound to eth0 and the RS rpc server is bound to lo
2) Since the rpc client is not binding to any ip address, when the RS reports its startup it gets registered with the eth0 ip address (but it should actually register localhost)
Here are RS logs:
{code}
2013-05-31 06:05:28,608 WARN [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
2013-05-31 06:05:31,609 INFO [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at 192.168.0.100,6,1369960497008
2013-05-31 06:05:31,609 INFO [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 192.168.0.100,6,1369960497008 that we are up with port=60020, startcode=1369960502544
2013-05-31 06:05:31,618 DEBUG [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: hbase.rootdir=hdfs://localhost:2851/hbase
2013-05-31 06:05:31,618 DEBUG [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: fs.default.name=hdfs://localhost:2851
2013-05-31 06:05:31,618 INFO [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us a different hostname to use; was=localhost, but now=192.168.0.100
{code}
Here are master logs:
{code}
2013-05-31 06:05:31,615 INFO [IPC Server handler 9 on 6] org.apache.hadoop.hbase.master.ServerManager: Registering server=192.168.0.100,60020,1369960502544
{code}
Since the master has the wrong rpc server address for the RS, META is not getting assigned.
{code} 2013-05-31 06:05:34,362 DEBUG [master-192.168.0.100,6,1369960497008] org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for .META.,,1.1028785192 so generated a random one; hri=.META.,,1.1028785192, src=, dest=192.168.0.100,60020,1369960502544; 1 (online=1, available=1) available servers, forceNewPlan=false - org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of .META.,,1.1028785192 to 192.168.0.100,60020,1369960502544, trying to assign elsewhere instead; try=1 of 10 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481) at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:549) at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:813) at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1422) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1315) at