[jira] [Created] (HBASE-6515) Setting request size with protobuf
Himanshu Vashishtha created HBASE-6515:
--------------------------------------

Summary: Setting request size with protobuf
Key: HBASE-6515
URL: https://issues.apache.org/jira/browse/HBASE-6515
Project: HBase
Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Himanshu Vashishtha

While running replication on upstream code, I am hitting the size-limit exception while sending WALEdits to a different cluster.
{code}
com.google.protobuf.InvalidProtocolBufferException: IPC server unable to read call parameters: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
{code}
Do we have a property to set some max size or something?

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
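The limit the stack trace points at is protobuf's CodedInputStream size limit (64 MB by default in the Java implementation), which is raised via CodedInputStream.setSizeLimit(). As a concept sketch only - plain JDK code, not HBase's or protobuf's actual implementation, with illustrative names - the server-side guard behaves like a length-prefixed reader that rejects oversized frames:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative stand-in for the guard that CodedInputStream.setSizeLimit()
// controls: a reader that refuses any length-prefixed message larger than a
// configurable limit, instead of buffering it.
class SizeLimitedReader {
    private final int sizeLimit; // analogous to CodedInputStream's size limit

    SizeLimitedReader(int sizeLimit) {
        this.sizeLimit = sizeLimit;
    }

    byte[] readMessage(DataInputStream in) throws IOException {
        int len = in.readInt(); // 4-byte length prefix of the frame
        if (len > sizeLimit) {
            // Mirrors "Protocol message was too large. May be malicious."
            throw new IOException("Message of " + len
                + " bytes exceeds limit of " + sizeLimit);
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return buf;
    }
}
```

Raising the limit on the receiving side (rather than shrinking WALEdit batches on the sender) is the direction setSizeLimit() suggests, but whether HBase exposes it as a configuration property is exactly what this issue is asking.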
[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429012#comment-13429012 ]

nkeywal commented on HBASE-6364:
--------------------------------

Reanalyzing the fix, there is an issue with the v1: we could have a call added to a dying connection, and this call won't get cleaned up. This was not possible previously. Will write a v2.

Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
-----------------------------------------------------------------------------------------------------------------------------------------------------

Key: HBASE-6364
URL: https://issues.apache.org/jira/browse/HBASE-6364
Project: HBase
Issue Type: Bug
Components: client
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Suraj Varma
Assignee: nkeywal
Labels: client
Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 6364.v1.patch, stacktrace.txt

When a server host with a Region Server holding the .META. table is powered down on a live cluster, the HBase cluster itself detects and reassigns the .META. table, but connected HBase Clients take an excessively long time to detect this and re-discover the reassigned .META.

Workaround: Decrease ipc.socket.timeout on the HBase Client side to a low value (the default is 20s, leading to a 35-minute recovery time; we were able to get acceptable results with 100ms, getting a 3-minute recovery).

This was found during some hardware failure testing scenarios.

Test Case:
1) Apply load via client app on HBase cluster for several minutes
2) Power down the region server holding the .META. server (i.e. power off ... and keep it off)
3) Measure how long it takes for the cluster to reassign the META table and for client threads to re-lookup and re-orient to the lesser cluster (minus the RS and DN on that host).

Observation:
1) Client threads spike up to maxThreads size ... and take over 35 mins to recover (i.e. for the thread count to go back to normal) - no client calls are serviced - they just back up on a synchronized method (see #2 below)
2) All the client app threads queue up behind the oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj

After taking several thread dumps we found that the thread within this synchronized method was blocked on NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf)); The client thread that gets the synchronized lock would try to connect to the dead RS (till the socket times out after 20s), retry, and then the next thread gets in, and so forth in a serial manner.

Workaround:
---
The default ipc.socket.timeout is 20s. We dropped this to a low number (1000 ms, 100 ms, etc.) in the client-side hbase-site.xml. With this setting, the client threads recovered in a couple of minutes by failing fast and re-discovering the .META. table on a reassigned RS.

Assumption: This ipc.socket.timeout is only ever used during the initial HConnection setup via NetUtils.connect, and should only ever come into play when connectivity to a region server is lost and needs to be re-established; i.e. it does not affect normal RPC activity, as this is just the connect timeout. During RS GC periods, any _new_ clients trying to connect will fail and will require .META. table re-lookups.

The above timeout workaround is only for the HBase client side.
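The client-side workaround described above is a one-property change in hbase-site.xml; a minimal sketch, using the 1000 ms value mentioned in the report:

```xml
<!-- Client-side hbase-site.xml: fail fast when connecting to a dead RS.
     1000 ms is one of the values tried in this report; tune it against
     your own tolerance for spurious connect timeouts. -->
<property>
  <name>ipc.socket.timeout</name>
  <value>1000</value>
</property>
```

Because the connect happens inside a synchronized method, lowering this timeout shortens the serial wait each queued client thread spends on the dead Region Server before the next one can retry.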
[jira] [Commented] (HBASE-6372) Add scanner batching to Export job
[ https://issues.apache.org/jira/browse/HBASE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429014#comment-13429014 ]

Alexander Alten-Lorenz commented on HBASE-6372:
-----------------------------------------------

Hm, these failures can be ignored, right? I was testing with trunk and got no errors.

Add scanner batching to Export job
----------------------------------

Key: HBASE-6372
URL: https://issues.apache.org/jira/browse/HBASE-6372
Project: HBase
Issue Type: Improvement
Components: mapreduce
Affects Versions: 0.96.0, 0.94.2
Reporter: Lars George
Assignee: Shengsheng Huang
Priority: Minor
Labels: newbie
Attachments: HBASE-6372.2.patch, HBASE-6372.3.patch, HBASE-6372.4.patch, HBASE-6372.patch

When a single row is too large for the RS heap, an OOME can take out the entire RS. Setting scanner batching in custom scans helps avoid this scenario, but for the supplied Export job this is not set. Similar to HBASE-3421 we can set the batching to a low number - or, if needed, make it a command line option.
[jira] [Commented] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429017#comment-13429017 ]

Alexander Alten-Lorenz commented on HBASE-6444:
-----------------------------------------------

Here too, I was testing the patch and my local trunk compiled without errors.

Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
----------------------------------------------------------------------------------------------

Key: HBASE-6444
URL: https://issues.apache.org/jira/browse/HBASE-6444
Project: HBase
Issue Type: Improvement
Components: rest
Reporter: Erich Hochmuth
Assignee: Jimmy Xiang
Attachments: HBASE-6444-0.94.patch, HBASE-6444.patch, trunk-6444.patch
Original Estimate: 48h
Remaining Estimate: 48h

My corporate security office (ISO) requires that all http traffic be routed through a Web Access Management layer (http://en.wikipedia.org/wiki/Web_access_management). Our Hadoop cluster has been segmented by a virtual network, with all access to HBase from outside clients managed through the HBase Stargate rest server.

The corporate WAM system requires that all http clients authenticate with it first before making any http request to any http service in the corporate network. After the http client authenticates with the WAM system, the WAM system returns the client a set of values that must be inserted into an http cookie and a request header on all future http requests. This means that all requests through the RemoteHTable interface would require that this cookie and request header be set as part of the http request. org.apache.hadoop.hbase.rest.client.Client looks like the appropriate place for this functionality to be plugged in.
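The requested behavior amounts to decorating every outgoing request with a session cookie and an extra header. A stdlib-only sketch of that pattern, not the RemoteHTable/Client API itself; the WAM header and token names are hypothetical examples:

```java
import java.net.HttpURLConnection;

// Sketch of the header-injection pattern the issue asks for: every request
// carries the WAM session cookie plus a custom header obtained after
// authenticating with the WAM system. All names here are illustrative.
class WamRequestDecorator {
    private final String cookie;      // e.g. "WAMSESSION=abc123" (hypothetical)
    private final String headerName;  // e.g. "X-WAM-Token" (hypothetical)
    private final String headerValue;

    WamRequestDecorator(String cookie, String headerName, String headerValue) {
        this.cookie = cookie;
        this.headerName = headerName;
        this.headerValue = headerValue;
    }

    // Apply the WAM credentials to a request before it is sent.
    HttpURLConnection decorate(HttpURLConnection conn) {
        conn.setRequestProperty("Cookie", cookie);
        conn.setRequestProperty(headerName, headerValue);
        return conn;
    }
}
```

In the REST client, the equivalent hook would sit wherever org.apache.hadoop.hbase.rest.client.Client builds its HTTP requests, so that every RemoteHTable call passes through the same decoration step.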
[jira] [Commented] (HBASE-6372) Add scanner batching to Export job
[ https://issues.apache.org/jira/browse/HBASE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429020#comment-13429020 ]

Shengsheng Huang commented on HBASE-6372:
-----------------------------------------

It seems the current Scan#setBatch function may throw an exception. Details below:
{code}
public void setBatch(int batch) {
  if (this.hasFilter() && this.filter.hasFilterRow()) {
    throw new IncompatibleFilterException(
        "Cannot set batch on a scan using a filter" +
        " that returns true for filter.hasFilterRow");
  }
  this.batch = batch;
}
{code}
Don't we need to catch that exception in this patch?
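Rather than catching the exception, an Export-style job can test the same condition before calling setBatch. A minimal, self-contained stand-in for the Scan/Filter classes (simplified sketch, not HBase's actual API; IllegalStateException substitutes for IncompatibleFilterException):

```java
// Simplified stand-ins for HBase's Filter and Scan, to show how a job can
// enable batching only when no row-level filter is present, avoiding the
// exception thrown by setBatch.
class MiniFilter {
    private final boolean filtersRow;
    MiniFilter(boolean filtersRow) { this.filtersRow = filtersRow; }
    boolean hasFilterRow() { return filtersRow; }
}

class MiniScan {
    private MiniFilter filter;
    private int batch = -1; // -1 means "no batching", as in HBase's Scan

    void setFilter(MiniFilter f) { this.filter = f; }
    boolean hasFilter() { return filter != null; }
    int getBatch() { return batch; }

    // Mirrors the guard quoted in the comment above.
    void setBatch(int batch) {
        if (hasFilter() && filter.hasFilterRow()) {
            throw new IllegalStateException(
                "Cannot set batch on a scan using a filter"
                + " that returns true for filter.hasFilterRow");
        }
        this.batch = batch;
    }

    // The pre-check an Export-style job could apply: batch only when safe.
    static void applyBatchIfCompatible(MiniScan scan, int batch) {
        if (!(scan.hasFilter() && scan.filter.hasFilterRow())) {
            scan.setBatch(batch);
        }
    }
}
```

With this shape, a scan that carries a row-filtering Filter simply keeps its default (unbatched) behavior instead of failing the whole job.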
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6364:
---------------------------

Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6364:
---------------------------

Attachment: 6364.v2.patch
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6364:
---------------------------

Fix Version/s: 0.96.0
Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429052#comment-13429052 ]

nkeywal commented on HBASE-6364:
--------------------------------

v2, fixes the problem mentioned above, works locally.
[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429072#comment-13429072 ]

Hadoop QA commented on HBASE-6364:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12539258/6364.v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2516//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2516//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2516//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2516//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2516//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2516//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2516//console

This message is automatically generated.
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6364:
---------------------------

Attachment: 6364.v3.patch
[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429078#comment-13429078 ] nkeywal commented on HBASE-6364: v3 just changes a comment, not the code itself.
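The client-side workaround discussed in this issue — lowering ipc.socket.timeout in the client's hbase-site.xml — can be sketched as follows. The 100 ms value is one of the values the reporter found acceptable; tune it to your environment, and note that this property governs only connection establishment, not in-flight RPCs:

```xml
<!-- client-side hbase-site.xml; value is in milliseconds -->
<property>
  <name>ipc.socket.timeout</name>
  <!-- default is 20000 (20s); the reporter saw acceptable results at 100 -->
  <value>100</value>
</property>
```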
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6364: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6364: --- Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429098#comment-13429098 ] Hadoop QA commented on HBASE-6364: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539272/6364.v3.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSide org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks org.apache.hadoop.hbase.master.TestAssignmentManager
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2517//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2517//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2517//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2517//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2517//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2517//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2517//console
This message is automatically generated.
[jira] [Updated] (HBASE-6378) the javadoc of setEnabledTable maybe not describe accurately
[ https://issues.apache.org/jira/browse/HBASE-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhou wenjian updated HBASE-6378: Attachment: HBASE-6378-trunk.patch
the javadoc of setEnabledTable maybe not describe accurately -- Key: HBASE-6378 URL: https://issues.apache.org/jira/browse/HBASE-6378 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.94.2 Attachments: 6378.patch, HBASE-6378-trunk.patch, HBASE-6378.patch
The current javadoc reads:
{code}
/**
 * Sets the ENABLED state in the cache and deletes the zookeeper node. Fails
 * silently if the node is not in enabled in zookeeper
 *
 * @param tableName
 * @throws KeeperException
 */
public void setEnabledTable(final String tableName) throws KeeperException {
  setTableState(tableName, TableState.ENABLED);
}
{code}
When setEnabledTable is called, it updates the cache and the zookeeper node rather than deleting the zk node, so the javadoc is inaccurate.
[jira] [Updated] (HBASE-6378) the javadoc of setEnabledTable maybe not describe accurately
[ https://issues.apache.org/jira/browse/HBASE-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhou wenjian updated HBASE-6378: Status: Patch Available (was: Open)
[jira] [Created] (HBASE-6516) hbck cannot detect any IOException while .tableinfo file is missing
Jie Huang created HBASE-6516: Summary: hbck cannot detect any IOException while .tableinfo file is missing Key: HBASE-6516 URL: https://issues.apache.org/jira/browse/HBASE-6516 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang HBaseFsck checks for missing .tableinfo files in the loadHdfsRegionInfos() function. However, no IOException will be caught when .tableinfo is missing, since FSTableDescriptors.getTableDescriptor doesn't throw any IOException.
[jira] [Updated] (HBASE-6516) hbck cannot detect any IOException while .tableinfo file is missing
[ https://issues.apache.org/jira/browse/HBASE-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Huang updated HBASE-6516: - Attachment: hbase-6516.patch Here is a proposed fix. The basic idea is to check whether the HTableDescriptor *htd* is null. If it is null, we should print out an error message and throw an IOException accordingly. Any comments? Thanks.
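The proposed null check can be sketched as follows. This is an illustrative, self-contained model, not the actual HBaseFsck code: loadTableInfo is a stand-in for FSTableDescriptors.getTableDescriptor, which returns a descriptor (or null) rather than throwing when .tableinfo is missing.

```java
import java.io.IOException;

public class TableInfoCheck {
    // Stand-in for FSTableDescriptors.getTableDescriptor: returns null
    // (instead of throwing) when the .tableinfo file is missing.
    static String loadTableInfo(boolean tableInfoExists) {
        return tableInfoExists ? "descriptor" : null;
    }

    // The proposed fix: turn the silent null into an explicit IOException
    // so hbck can surface the missing .tableinfo instead of continuing.
    static String checkTableDescriptor(String tableName, boolean tableInfoExists)
            throws IOException {
        String htd = loadTableInfo(tableInfoExists);
        if (htd == null) {
            throw new IOException("Unable to read .tableinfo for table " + tableName);
        }
        return htd;
    }

    public static void main(String[] args) {
        try {
            checkTableDescriptor("t1", false);
        } catch (IOException e) {
            // prints: caught: Unable to read .tableinfo for table t1
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```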
[jira] [Commented] (HBASE-6378) the javadoc of setEnabledTable maybe not describe accurately
[ https://issues.apache.org/jira/browse/HBASE-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429133#comment-13429133 ] Hadoop QA commented on HBASE-6378: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539297/HBASE-6378-trunk.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2518//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2518//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2518//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2518//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2518//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2518//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2518//console
This message is automatically generated.
[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429139#comment-13429139 ] nkeywal commented on HBASE-6364: errors are unrelated imho. Retrying to see if the third execution says something different.
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6364: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6364: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6364: --- Attachment: 6364.v3.patch Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table - Key: HBASE-6364 URL: https://issues.apache.org/jira/browse/HBASE-6364 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Suraj Varma Assignee: nkeywal Labels: client Fix For: 0.96.0 Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, stacktrace.txt
[jira] [Commented] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429204#comment-13429204 ] Erich Hochmuth commented on HBASE-6444: --- I'm new to the process, so let me know if there is something that I can do to get this put into the trunk. Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable -- Key: HBASE-6444 URL: https://issues.apache.org/jira/browse/HBASE-6444 Project: HBase Issue Type: Improvement Components: rest Reporter: Erich Hochmuth Assignee: Jimmy Xiang Attachments: HBASE-6444-0.94.patch, HBASE-6444.patch, trunk-6444.patch Original Estimate: 48h Remaining Estimate: 48h My corporate security office (ISO) requires that all http traffic get routed through a Web Access Management layer (http://en.wikipedia.org/wiki/Web_access_management) Our Hadoop cluster has been segmented by a virtual network, with all access to HBase from outside clients being managed through the HBase Stargate rest server. The corporate WAM system requires that all http clients authenticate with it first before making any http request to any http service in the corporate network. After the http client authenticates with the WAM system, the WAM system returns the client a set of values that must be inserted into an http cookie and request header of all future http requests to other http services. This would mean that all requests through the RemoteHTable interface would require that this cookie and request header be set as part of the http request. org.apache.hadoop.hbase.rest.client.Client looks like the appropriate place that this functionality would need to be plugged into.
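For illustration, the kind of per-request decoration being asked for, shown here with the JDK's stock HttpURLConnection rather than the REST client's actual HTTP stack; the URL, header names, and values below are made up:

```java
// Hypothetical WAM header/cookie injection, sketched with HttpURLConnection.
// The actual patch plugs equivalent support into o.a.h.h.rest.client.Client.
import java.net.HttpURLConnection;
import java.net.URL;

public class HeaderDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical Stargate endpoint; no connection is actually opened here.
        URL url = new URL("http://stargate.example.com:8080/mytable/schema");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Values the WAM system returned after authentication (made up):
        conn.setRequestProperty("Cookie", "WAM_SESSION=abc123");
        conn.setRequestProperty("X-WAM-Token", "token-xyz");
        System.out.println(conn.getRequestProperty("X-WAM-Token"));
        // conn.connect() would now send both headers on the request;
        // omitted here since there is no live server.
    }
}
```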
[jira] [Commented] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429208#comment-13429208 ] stack commented on HBASE-6444: -- +1 on patch (especially if it works as Erich confirms) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable -- Key: HBASE-6444 URL: https://issues.apache.org/jira/browse/HBASE-6444 Project: HBase Issue Type: Improvement Components: rest Reporter: Erich Hochmuth Assignee: Jimmy Xiang Attachments: HBASE-6444-0.94.patch, HBASE-6444.patch, trunk-6444.patch Original Estimate: 48h Remaining Estimate: 48h
[jira] [Commented] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429209#comment-13429209 ] Andrew Purtell commented on HBASE-6444: --- +1 Jimmy, looks fine. On commit consider updating the manual and also add javadoc to the Client class that mentions what gets sent in the Cookie header will be auto updated from the last response, for any subsequent request originated by that Client instance. Otherwise we're going to hear about that on the list sooner or later. Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable -- Key: HBASE-6444 URL: https://issues.apache.org/jira/browse/HBASE-6444 Project: HBase Issue Type: Improvement Components: rest Reporter: Erich Hochmuth Assignee: Jimmy Xiang Attachments: HBASE-6444-0.94.patch, HBASE-6444.patch, trunk-6444.patch Original Estimate: 48h Remaining Estimate: 48h
[jira] [Created] (HBASE-6517) Print thread dump when a test times out
Andrew Purtell created HBASE-6517: - Summary: Print thread dump when a test times out Key: HBASE-6517 URL: https://issues.apache.org/jira/browse/HBASE-6517 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: Andrew Purtell Hadoop common is adding a JUnit run listener which prints a full thread dump to System.err when a test fails due to timeout. See HDFS-3762. Suggest pulling in their {{TestTimedOutListener}} once it is committed.
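A sketch of the thread-dump half of such a listener, using only java.lang.management; the actual Hadoop listener from HDFS-3762 may differ, and the class and method names below are illustrative:

```java
// Minimal thread-dump helper, the core of what a timed-out-test listener
// would print to System.err. Not the actual Hadoop HDFS-3762 code.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpDemo {
    static String buildThreadDump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // Dump every live thread with its current state; a real listener
        // would also include stack frames and lock information.
        for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dump = buildThreadDump();
        // The "main" thread is always present in the dump.
        System.out.println(dump.contains("main"));
    }
}
```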
[jira] [Updated] (HBASE-6517) Print thread dump when a test times out
[ https://issues.apache.org/jira/browse/HBASE-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-6517: -- Priority: Minor (was: Major) Print thread dump when a test times out --- Key: HBASE-6517 URL: https://issues.apache.org/jira/browse/HBASE-6517 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: Andrew Purtell Priority: Minor Labels: noob
[jira] [Updated] (HBASE-6515) Setting request size with protobuf
[ https://issues.apache.org/jira/browse/HBASE-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-6515: -- Component/s: replication ipc Priority: Critical (was: Major) Setting request size with protobuf -- Key: HBASE-6515 URL: https://issues.apache.org/jira/browse/HBASE-6515 Project: HBase Issue Type: Bug Components: ipc, replication Affects Versions: 0.96.0 Reporter: Himanshu Vashishtha Priority: Critical While running replication on upstream code, I am hitting the size-limit exception while sending WALEdits to a different cluster. {code} com.google.protobuf.InvalidProtocolBufferException: IPC server unable to read call parameters: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. {code} Do we have a property to set some max size or something?
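As an illustration of what the server-side limit does (not actual HBase code; 64 MB is protobuf's documented default for CodedInputStream's size limit), a length-prefixed reader with a configurable cap might look like:

```java
// Illustrative only: how an IPC reader might enforce a configurable
// request-size cap, analogous to CodedInputStream.setSizeLimit().
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class SizeLimitDemo {
    static final int DEFAULT_LIMIT = 64 * 1024 * 1024; // protobuf's default

    static byte[] readCall(DataInputStream in, int sizeLimit) throws IOException {
        int len = in.readInt();      // length-prefixed call parameters
        if (len > sizeLimit) {
            // Mirrors the "Protocol message was too large" rejection
            throw new IOException("Protocol message was too large: " + len);
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // A 3-byte payload behind a 4-byte big-endian length prefix.
        byte[] frame = new byte[] {0, 0, 0, 3, 'a', 'b', 'c'};
        byte[] call = readCall(
            new DataInputStream(new ByteArrayInputStream(frame)), DEFAULT_LIMIT);
        System.out.println(call.length);
        try {
            // A cap of 2 bytes rejects the same frame.
            readCall(new DataInputStream(new ByteArrayInputStream(frame)), 2);
        } catch (IOException e) {
            System.out.println("rejected");
        }
    }
}
```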
[jira] [Commented] (HBASE-6516) hbck cannot detect any IOException while .tableinfo file is missing
[ https://issues.apache.org/jira/browse/HBASE-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429226#comment-13429226 ] Andrew Purtell commented on HBASE-6516: --- From the null test, you should pass that same string that is logged to the constructor of the new IOException. Have you tried running the unit test suite with your patch applied? What is the result? hbck cannot detect any IOException while .tableinfo file is missing - Key: HBASE-6516 URL: https://issues.apache.org/jira/browse/HBASE-6516 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Attachments: hbase-6516.patch HBaseFsck checks for those missing .tableinfo files in the loadHdfsRegionInfos() function. However, no IOException will be caught while .tableinfo is missing, since FSTableDescriptors.getTableDescriptor doesn't throw any IOException.
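Andrew's suggestion, sketched in isolation: surface the logged message as the IOException so hbck can detect the missing file. The method body, path, and message text below are illustrative, not the actual FSTableDescriptors/HBaseFsck code:

```java
// Sketch: instead of logging and returning null when .tableinfo is missing,
// pass the same logged string to a new IOException's constructor.
import java.io.IOException;

public class TableInfoCheck {
    static String getTableDescriptor(String tableDir) throws IOException {
        String tableinfo = null; // stand-in for a failed .tableinfo lookup
        if (tableinfo == null) {
            String msg = "No .tableinfo file under " + tableDir;
            throw new IOException(msg); // the string that would have been logged
        }
        return tableinfo;
    }

    public static void main(String[] args) {
        try {
            getTableDescriptor("/hbase/mytable"); // hypothetical table dir
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```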
[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429233#comment-13429233 ] stack commented on HBASE-6364: -- Good one lads. Fix formatting before commit N. Make it same as surrounding code... Add spacings around brackets -- the 'else' -- and the '+' in String concatenations. I think I understand the notifying that is going on at the end of the addCall method. They line up w/ waits on Call and waits on the calls data member? Would it be hard making a test of this bit of code? What speed up around recovery are you seeing N? Should we change the default timeout too as Suraj does above? Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table - Key: HBASE-6364 URL: https://issues.apache.org/jira/browse/HBASE-6364 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Suraj Varma Assignee: nkeywal Labels: client Fix For: 0.96.0 Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, stacktrace.txt
[jira] [Commented] (HBASE-6509) Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b
[ https://issues.apache.org/jira/browse/HBASE-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429236#comment-13429236 ] Jonathan Hsieh commented on HBASE-6509: --- Hey Alex, Is there a reason why this is necessary, given that it seems that you can use a RegexStringComparator on a RowFilter?
{code}
// example from the title
Filter f = new RowFilter(CompareOp.EQUAL, new RegexStringComparator("..alex.b"));
// example in the javadoc
Filter f = new RowFilter(CompareOp.EQUAL, new RegexStringComparator("_99__01"));
{code}
Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b - Key: HBASE-6509 URL: https://issues.apache.org/jira/browse/HBASE-6509 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6509.patch, HBASE-6509_1.patch Implement a fuzzy row key filter to allow fetching records e.g. by this criteria: ???alex?b. This seems very useful as an alternative way to select records by a part of the row key which is not the prefix part. Due to the fast-forwarding nature of the filter, in many situations this helps to avoid heavy full-table scans. This is especially effective when you have a composite row key and (some of) its parts have fixed length. E.g. with a key of format userId_actionId_time, given that the userId and actionId lengths are fixed, one can select user actions of a specific type using a fuzzy row key with mask _myaction. Given the fast-forwarding nature of the filter, this will usually work much faster than doing a whole table scan with any of the existing server-side filters. In many cases this can work as a secondary-indexing alternative. Many times users implement it as a custom filter, and many times they just don't know this is possible. Let's add it to the common codebase.
[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429238#comment-13429238 ] Zhihong Ted Yu commented on HBASE-6364: --- In addCall():
{code}
+calls.put(call.id, call);
+notify();
{code}
Do we need a 'synchronized (call)' block for the above notification? Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table - Key: HBASE-6364 URL: https://issues.apache.org/jira/browse/HBASE-6364 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Suraj Varma Assignee: nkeywal Labels: client Fix For: 0.96.0 Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, stacktrace.txt
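Background for Ted's question: Object.notify() must be called while holding the monitor of the object it is invoked on, otherwise the JVM throws IllegalMonitorStateException. A minimal standalone demonstration (the calls map below is just a stand-in for HBaseClient's calls member, not the actual patch code):

```java
// Demonstrates the monitor-ownership rule behind notify()/wait().
import java.util.HashMap;
import java.util.Map;

public class NotifyDemo {
    public static void main(String[] args) {
        Map<Integer, String> calls = new HashMap<>();
        try {
            calls.notify(); // illegal: no synchronized (calls) block held
        } catch (IllegalMonitorStateException e) {
            System.out.println("not holding monitor");
        }
        synchronized (calls) {
            calls.put(1, "call");
            calls.notify(); // legal: the calls monitor is held here
        }
        System.out.println("ok");
    }
}
```

Note that if addCall() is itself a synchronized method, a bare notify() inside it targets the enclosing object's monitor (this), not the calls map or the Call, which is what the waiters must match.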
[jira] [Commented] (HBASE-6509) Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b
[ https://issues.apache.org/jira/browse/HBASE-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429243#comment-13429243 ] Zhihong Ted Yu commented on HBASE-6509: --- Alex's patch utilizes ReturnCode.SEEK_NEXT_USING_HINT under certain conditions. Looking at RowFilter, I don't see this optimization. Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b - Key: HBASE-6509 URL: https://issues.apache.org/jira/browse/HBASE-6509 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6509.patch, HBASE-6509_1.patch
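The fuzzy match itself (leaving aside the SEEK_NEXT_USING_HINT fast-forwarding that distinguishes the patch from a plain RowFilter) can be sketched as follows; this is an illustration, not the actual FuzzyRowFilter implementation:

```java
// Illustrative fuzzy row-key match: '?' in the mask matches any single
// character, fixed positions must match exactly.
public class FuzzyMatch {
    static boolean matches(String mask, String row) {
        if (row.length() < mask.length()) return false;
        for (int i = 0; i < mask.length(); i++) {
            char m = mask.charAt(i);
            if (m != '?' && m != row.charAt(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("???alex?b", "001alexab"));
        System.out.println(matches("???alex?b", "001bobby1"));
    }
}
```

Where the real filter differs is that on a mismatch it can compute the next row key that could possibly match and return SEEK_NEXT_USING_HINT, letting the scanner skip ahead instead of examining every row.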
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429262#comment-13429262 ] Dave Revell commented on HBASE-6358: {quote}If the size and speed don't matter, then wouldn't you have just used a normal (non-bulk-load) MR job to load the data?{quote} There are other reasons to atomically load hfiles even for non-huge datasets, such as ETL and restoring backups. And atomicity could have some benefits for certain use cases. But it's probably not asking too much for people with these use cases to use a distributed hfile loader that depends on mapreduce, so I'm willing to concede the point. @Todd, would you be in favor of adding another JIRA ticket for a distributed bulk loader, and having this ticket be blocked until it's done? I think it should be blocked so we don't remove the current bulkload from remote fs capability without offering an alternative, though the user does have the option of running distcp themselves. Bulkloading from remote filesystem is problematic - Key: HBASE-6358 URL: https://issues.apache.org/jira/browse/HBASE-6358 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Dave Revell Assignee: Dave Revell Attachments: 6358-suggestion.txt, HBASE-6358-trunk-v1.diff, HBASE-6358-trunk-v2.diff, HBASE-6358-trunk-v3.diff Bulk loading hfiles that don't live on the same filesystem as HBase can cause problems for subtle reasons. In Store.bulkLoadHFile(), the regionserver will copy the source hfile to its own filesystem if it's not already there. Since this can take a long time for large hfiles, it's likely that the client will timeout and retry. When the client retries repeatedly, there may be several bulkload operations in flight for the same hfile, causing lots of unnecessary IO and tying up handler threads. This can seriously impact performance. In my case, the cluster became unusable and the regionservers had to be kill -9'ed. 
Possible solutions: # Require that hfiles already be on the same filesystem as HBase in order for bulkloading to succeed. The copy could be handled by LoadIncrementalHFiles before the regionserver is called. # Others? I'm not familiar with Hadoop IPC so there may be tricks to extend the timeout or something else. I'm willing to write a patch but I'd appreciate recommendations on how to proceed.
[jira] [Updated] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6444: --- Status: Open (was: Patch Available) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable -- Key: HBASE-6444 URL: https://issues.apache.org/jira/browse/HBASE-6444 Project: HBase Issue Type: Improvement Components: rest Reporter: Erich Hochmuth Assignee: Jimmy Xiang Attachments: HBASE-6444-0.94.patch, HBASE-6444.patch, trunk-6444.patch, trunk-6444_v2.patch Original Estimate: 48h Remaining Estimate: 48h
[jira] [Updated] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6444: --- Attachment: trunk-6444_v2.patch
[jira] [Commented] (HBASE-6509) Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b
[ https://issues.apache.org/jira/browse/HBASE-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429269#comment-13429269 ] Jonathan Hsieh commented on HBASE-6509: --- Ted, interesting. I wonder whether this kind of optimization is more generally applicable, or whether it is worth pushing down into the regex comparator. My concern is that this seems a bit special-purpose. I wonder if there is a jira already filed for custom filter plugins/coprocessors. Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b - Key: HBASE-6509 URL: https://issues.apache.org/jira/browse/HBASE-6509 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6509.patch, HBASE-6509_1.patch Implement a fuzzy row key filter to allow fetching records by a criterion such as ???alex?b. This is very useful as an alternative way to select records by row key when the known part of the key is not a prefix. Due to the fast-forwarding nature of the filter, in many situations this helps avoid heavy full-table scans. It is especially effective when you have a composite row key and (some of) its parts have fixed length. E.g. with a key of format userId_actionId_time, given that the userId and actionId lengths are fixed, one can select user actions of a specific type using a fuzzy row key by specifying mask _myaction. Given the fast-forwarding nature of the filter, this will usually work much faster than a whole-table scan with any of the existing server-side filters. In many cases this can work as a secondary-indexing alternative. Many times users implement it as a custom filter, and many times they just don't know this is possible. Let's add it to the common codebase.
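The matching semantics described above are simple: a '?' in the mask matches any byte at that position, while every other byte must match exactly. A standalone sketch of just that predicate (the fast-forwarding part, which computes the next possible matching row key so the scanner can skip ahead, is omitted; this is not the actual FuzzyRowFilter code):

```java
public class FuzzyMatch {
    // '?' in the pattern matches any byte at that position; anything
    // else must match exactly. Rows shorter than the pattern don't match.
    static boolean matches(byte[] row, byte[] pattern) {
        if (row.length < pattern.length) return false;
        for (int i = 0; i < pattern.length; i++) {
            if (pattern[i] != '?' && pattern[i] != row[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] p = "???alex?b".getBytes();
        System.out.println(matches("123alexab".getBytes(), p)); // true
        System.out.println(matches("123alexbc".getBytes(), p)); // false
    }
}
```

Because the fixed positions are known, a mismatch also tells the filter exactly which byte to increment to produce the next candidate row key, which is what makes the fast-forwarding cheap compared to a generic regex.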
[jira] [Commented] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429268#comment-13429268 ] Jimmy Xiang commented on HBASE-6444: I looked into the cookie syntax and found out that the server sends a Set-Cookie header to the client, and the client sends back a Cookie header. So I updated the patch to not update the Cookie header, since it seems useless. Now, once the RemoteHTable is created, the client application needs to get the httpClient, handle the Set-Cookie logic, and set the Cookie header properly. The idea is that we leave the cookie business to the client application; how is that? http://msdn.microsoft.com/en-us/library/aa920098.aspx
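The Set-Cookie/Cookie round trip the comment describes can be sketched in isolation (this is a hypothetical helper to illustrate the client application's responsibility, not part of the org.apache.hadoop.hbase.rest.client.Client API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CookieJar {
    // name=value pairs remembered from Set-Cookie response headers
    private final Map<String, String> cookies = new LinkedHashMap<>();

    // Keep only the "name=value" part of a Set-Cookie header;
    // attributes such as Path or HttpOnly are not echoed back.
    void storeSetCookie(String setCookieHeader) {
        String pair = setCookieHeader.split(";", 2)[0].trim();
        int eq = pair.indexOf('=');
        if (eq > 0) cookies.put(pair.substring(0, eq), pair.substring(eq + 1));
    }

    // Build the Cookie request header for subsequent requests.
    String cookieHeader() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CookieJar jar = new CookieJar();
        jar.storeSetCookie("WAMSESSION=abc123; Path=/; HttpOnly");
        System.out.println(jar.cookieHeader()); // WAMSESSION=abc123
    }
}
```

Under the approach agreed in this thread, logic like this lives in the application: it reads Set-Cookie from the WAM responses and sets the resulting Cookie header (plus any WAM request headers) on the REST client before each request.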
[jira] [Created] (HBASE-6518) Bytes.toBytesBinary() incorrect trailing backslash escape
Tudor Scurtu created HBASE-6518: --- Summary: Bytes.toBytesBinary() incorrect trailing backslash escape Key: HBASE-6518 URL: https://issues.apache.org/jira/browse/HBASE-6518 Project: HBase Issue Type: Bug Components: util Reporter: Tudor Scurtu Assignee: Tudor Scurtu Priority: Trivial Bytes.toBytesBinary() converts escaped strings to byte arrays. When encountering a '\' character, it looks at the next one to see if it is an 'x', without checking that it exists.
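The bug class is easy to reproduce: a lookahead on the character after '\' reads past the end of the string when the '\' is the last character. A simplified standalone sketch of the decoding loop with the bounds check in place (illustrative only, not the actual Bytes.toBytesBinary source):

```java
import java.io.ByteArrayOutputStream;

public class BytesBinary {
    // Decode "\xNN" escapes; a trailing lone '\' (or an incomplete
    // "\x" escape) is kept literally instead of reading past the end.
    static byte[] toBytesBinary(String in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length(); i++) {
            char ch = in.charAt(i);
            // bounds checks first: guard every lookahead position
            if (ch == '\\' && i + 1 < in.length() && in.charAt(i + 1) == 'x'
                    && i + 3 < in.length()) {
                int hi = Character.digit(in.charAt(i + 2), 16);
                int lo = Character.digit(in.charAt(i + 3), 16);
                out.write((hi << 4) | lo);
                i += 3; // skip the consumed "xNN"
            } else {
                out.write((byte) ch);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(toBytesBinary("a\\x41b").length); // 3: 'a', 0x41, 'b'
        System.out.println(toBytesBinary("a\\").length);     // 2: no crash on trailing '\'
    }
}
```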
[jira] [Commented] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429271#comment-13429271 ] Erich Hochmuth commented on HBASE-6444: --- That should be fine. You are correct: setting the exact cookie and any request headers is the concern of the application and not HBase.
[jira] [Updated] (HBASE-5189) Add metrics to keep track of region-splits in RS
[ https://issues.apache.org/jira/browse/HBASE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5189: --- Hadoop Flags: (was: Reviewed) Status: Patch Available (was: Reopened) Add metrics to keep track of region-splits in RS Key: HBASE-5189 URL: https://issues.apache.org/jira/browse/HBASE-5189 Project: HBase Issue Type: Improvement Components: metrics, regionserver Affects Versions: 0.92.0, 0.90.5 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5189-persistent.patch, HBASE-5189.trunk.v1.patch, HBASE-5189.trunk.v2.patch For a write-heavy workload with region size 1 GB, the region-split rate is considerably high. We normally grep the NN log (grep mkdir*.split NN.log | sort | uniq -c) to get the count. I would like to have a counter incremented each time a region-split execution succeeds, and this counter exposed via the metrics stuff in HBase: - regionSplitSuccessCount - regionSplitFailureCount (will help us correlate the timestamp range in RS logs across all RS)
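At its core the request is two monotonically increasing counters, bumped at the end of each split attempt. A minimal sketch using plain atomics (the counter names come from the request above; the wiring into HBase's metrics framework is omitted):

```java
import java.util.concurrent.atomic.AtomicLong;

public class SplitMetrics {
    // counter names as proposed in the issue; exposure through the
    // regionserver metrics system is left out of this sketch
    final AtomicLong regionSplitSuccessCount = new AtomicLong();
    final AtomicLong regionSplitFailureCount = new AtomicLong();

    // called once when a split transaction finishes
    void onSplitFinished(boolean succeeded) {
        (succeeded ? regionSplitSuccessCount : regionSplitFailureCount)
            .incrementAndGet();
    }

    public static void main(String[] args) {
        SplitMetrics m = new SplitMetrics();
        m.onSplitFinished(true);
        m.onSplitFinished(true);
        m.onSplitFinished(false);
        System.out.println(m.regionSplitSuccessCount.get()); // 2
        System.out.println(m.regionSplitFailureCount.get()); // 1
    }
}
```

This replaces the NN-log grep workaround with a per-RS counter that can be charted and correlated against RS log timestamps directly.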
[jira] [Updated] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6444: --- Hadoop Flags: Reviewed Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429273#comment-13429273 ] Jimmy Xiang commented on HBASE-6444: Cool, thanks. I will commit patch 2 to trunk tomorrow if no objection.
[jira] [Updated] (HBASE-6518) Bytes.toBytesBinary() incorrect trailing backslash escape
[ https://issues.apache.org/jira/browse/HBASE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tudor Scurtu updated HBASE-6518: Attachment: HBASE-6518.patch
[jira] [Commented] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429275#comment-13429275 ] Erich Hochmuth commented on HBASE-6444: --- Not a problem... I appreciate the help!
[jira] [Updated] (HBASE-6487) assign region doesn't check if the region is already assigned
[ https://issues.apache.org/jira/browse/HBASE-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6487: --- Status: Open (was: Patch Available) assign region doesn't check if the region is already assigned - Key: HBASE-6487 URL: https://issues.apache.org/jira/browse/HBASE-6487 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0 Attachments: trunk-6487.patch I tried to assign a region that was already assigned somewhere from the hbase shell; the region is assigned to a different place, but the previous assignment is not closed. So it causes double assignment. In such a case, it's better to issue a warning instead.
[jira] [Commented] (HBASE-6509) Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b
[ https://issues.apache.org/jira/browse/HBASE-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429284#comment-13429284 ] Alex Baranau commented on HBASE-6509: - Yes, the point is to utilize fast-forwarding navigation between rows. Re a more general optimization for the regex comparator: I don't think it is doable in the general case, i.e. we can do fast-forwarding only for very specific kinds of regexp. If we do it for some, it can be misleading for users, and again, the functionality might be overlooked.
[jira] [Updated] (HBASE-6487) assign region doesn't check if the region is already assigned
[ https://issues.apache.org/jira/browse/HBASE-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6487: --- Attachment: trunk-6487_v2.patch
[jira] [Updated] (HBASE-6487) assign region doesn't check if the region is already assigned
[ https://issues.apache.org/jira/browse/HBASE-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6487: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429295#comment-13429295 ] nkeywal commented on HBASE-6364: bq. Fix formatting before commit N. Ok. Before committing, I would be interested in feedback from Suraj. There are just a few lines of code, so rebasing won't be complicated if he needs some time to test it. bq. I think I understand the notifying that is going on at the end of the addCall method. They line up w/ waits on Call and waits on the calls data member? It's mainly playing with the synchronized: Connection#addCall and Connection#setupIOstreams are both synchronized, so, on an exception during setupIOstreams, either: - you were waiting just before setupIOstreams, and in this case you've been cleaned up during setupIOstreams exception management - you were waiting before the addCall, and in this case you won't be added to the calls list - in both cases, when you enter setupIOstreams yourself you are filtered by the test on shouldCloseConnection bq. Would it be hard making a test of this bit of code? It's a difficult question, because there are both the behavior of this jira and the generic behavior to be tested. 1) Just for this jira: when I tested it, I added a sleep to simulate a connection timeout. I will soon provide a (small) set of utility functions to better simulate this, with real timeouts. This type of test (more in the category of regression tests than unit tests) could perhaps be added to the integration tests. I had various issues during the tests; it was more difficult than expected. 2) Testing the HBaseClient itself would be useful, but the interesting path is the multithreaded one. bq. What speed up around recovery are you seeing N? Should we change the default timeout too as Suraj does above? For the speedup, it's arbitrary, as it depends on the number of RS. In my tests, 20% of the calls were serialized, i.e. with an operation on 20 rs, the fix makes it 3 times faster. But it seems that Suraj had much worse serialization, on a bigger cluster, so for him we could expect much better results, likely 20 times faster or better. Another point is that in this fix we don't keep a list of dead rs, so we cut the connection attempts only if they are happening while another is taking place. So if he could try it, that would be great. For the default timeout, I think we can cut down the connect timeout. But I think it's safer to make it 5 seconds, so this fix remains important. I will work on this in another jira. bq. Do we need a 'synchronized (call)' block for the above notification ? It's a notification on the connection itself, and addCall is synchronized, so it's ok. And 'calls' is a 'ConcurrentSkipListMap', so we can access it concurrently. Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table - Key: HBASE-6364 URL: https://issues.apache.org/jira/browse/HBASE-6364 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Suraj Varma Assignee: nkeywal Labels: client Fix For: 0.96.0 Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, stacktrace.txt Observation: 1) Client threads spike up to maxThreads size ... and take over 35 mins to recover (i.e. for the thread count to go back to normal) - no client calls are serviced - they just back up on a synchronized method (see #2 below) 2) All the client app threads queue up behind the oahh.ipc.HBaseClient#setupIOStreams method
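The synchronization argument in the comment above can be sketched in a few lines: addCall and the connection setup/teardown are synchronized on the connection, and a caller that arrives after the connection is marked for close is rejected by the shouldCloseConnection check, so no call is left stranded on a dying connection. This is an illustrative standalone model of that pattern, not the actual HBaseClient code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class Connection {
    // pending calls keyed by call id; concurrent, as in HBaseClient
    final Map<Integer, String> calls = new ConcurrentSkipListMap<>();
    final AtomicBoolean shouldCloseConnection = new AtomicBoolean(false);

    // synchronized with markClosed: a call cannot be added while the
    // connection is being torn down
    synchronized boolean addCall(int id, String call) {
        if (shouldCloseConnection.get()) return false; // dying: refuse
        calls.put(id, call);
        notifyAll(); // wake writers waiting on this connection
        return true;
    }

    // models the exception path of connection setup: mark the
    // connection dead and clean up pending calls so none is stranded
    synchronized void markClosed() {
        shouldCloseConnection.set(true);
        calls.clear();
        notifyAll();
    }

    public static void main(String[] args) {
        Connection c = new Connection();
        System.out.println(c.addCall(1, "get")); // true
        c.markClosed();
        System.out.println(c.addCall(2, "put")); // false: filtered out
        System.out.println(c.calls.size());      // 0: nothing stranded
    }
}
```

Because both methods hold the same monitor, the race described in the earlier comment (a call slipping onto a dying connection between the close decision and the cleanup) cannot occur in this model.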
[jira] [Commented] (HBASE-6444) Expose the ability to set custom HTTP Request Headers for the REST client used by RemoteHTable
[ https://issues.apache.org/jira/browse/HBASE-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429311#comment-13429311 ] Hadoop QA commented on HBASE-6444: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539318/trunk-6444_v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2520//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2520//console This message is automatically generated.
[jira] [Commented] (HBASE-6137) RegionServer-level context and start/stop life-cycle methods for observer coprocessor
[ https://issues.apache.org/jira/browse/HBASE-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429319#comment-13429319 ] Lars Hofhansl commented on HBASE-6137: -- I missed this one when I filed HBASE-6505. Not sure I understand when exactly the start/stop methods would be called. RegionServer-level context and start/stop life-cycle methods for observer coprocessor - Key: HBASE-6137 URL: https://issues.apache.org/jira/browse/HBASE-6137 Project: HBase Issue Type: New Feature Components: coprocessors Affects Versions: 0.94.0 Reporter: James Taylor Coprocessors are a great way for an application to affect server-side processing. We're using observer coprocessors via the postScannerOpen to enable a scan to do aggregation. There's currently no way, however, to store/share state across coprocessor invocations on the regions within a region server. Ideally, we'd like to be able to have a context object that allows state to be shared across coprocessor invocations for the regions on the same region server. This would save us the setup cost of compiling our aggregators again for each region. Also useful would be: - a start/stop method invocation on this new region server context object before the first region invocation and after the last region invocation on a given region server. - a way to pass state to the start/stop method from the client. The scan.setAttribute works well for passing state for the invocation on each region, but ideally something that would allow state to be passed just once per region server. One use case would be to pass a cache of the row data for a hash join implementation, where we wouldn't want to pass this information for every region. Our current workaround is to either take the hit of the extra setup costs for the coprocessor invocation on each region or use an Endpoint coprocessor to initialize state prior to the client scan that will cause coprocessor invocations.
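The region-server-level context requested above could be sketched as a JVM-local shared object whose "start" runs when the first region begins using it and whose "stop" runs after the last region finishes. This is only an illustration of the lifecycle semantics; the names (`RegionServerContext`, `enter`, `exit`) are hypothetical, not an existing HBase API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: state shared by all coprocessor invocations in one
// region server JVM, with reference counting to drive start/stop.
class RegionServerContext {
  private static final ConcurrentMap<String, Object> state = new ConcurrentHashMap<>();
  private static final AtomicInteger activeRegions = new AtomicInteger();

  // Called when a region begins using the context; returns true for the
  // first caller, which should then run the one-time "start" logic
  // (e.g. compiling aggregators once instead of per region).
  static boolean enter() {
    return activeRegions.getAndIncrement() == 0;
  }

  // Called when a region is done; returns true for the last caller,
  // which should run the "stop" logic. Shared state is released then.
  static boolean exit() {
    boolean last = activeRegions.decrementAndGet() == 0;
    if (last) state.clear();
    return last;
  }

  static void put(String key, Object value) { state.put(key, value); }
  static Object get(String key) { return state.get(key); }
}
```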
[jira] [Commented] (HBASE-5189) Add metrics to keep track of region-splits in RS
[ https://issues.apache.org/jira/browse/HBASE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429327#comment-13429327 ] David S. Wang commented on HBASE-5189: -- Matteo, my non-binding opinion is that moving to PersistentMetricsTimeVaryingRate would be the right idea here in order to make this metric useful. Add metrics to keep track of region-splits in RS Key: HBASE-5189 URL: https://issues.apache.org/jira/browse/HBASE-5189 Project: HBase Issue Type: Improvement Components: metrics, regionserver Affects Versions: 0.90.5, 0.92.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5189-persistent.patch, HBASE-5189.trunk.v1.patch, HBASE-5189.trunk.v2.patch For write-heavy workloads with region-size 1 GB, the region-split rate is considerably high. We normally grep the NN log (grep mkdir*.split NN.log | sort | uniq -c) to get the count. I would like to have a counter incremented each time region-split execution succeeds, and this counter exposed via the metrics machinery in HBase. - regionSplitSuccessCount - regionSplitFailureCount (will help us to correlate the timestamp range in RS logs across all RS)
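The two counters requested in the issue amount to something like the following. This is a minimal stand-in sketch using plain AtomicLongs; the comment above suggests the real implementation would use HBase's PersistentMetricsTimeVaryingRate rather than raw counters, so treat the class below as illustrative only.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the proposed split metrics (illustrative, not the
// actual HBase metrics classes).
class SplitMetrics {
  static final AtomicLong regionSplitSuccessCount = new AtomicLong();
  static final AtomicLong regionSplitFailureCount = new AtomicLong();

  // Called at the end of a split attempt instead of grepping the NN log.
  static void recordSplit(boolean succeeded) {
    (succeeded ? regionSplitSuccessCount : regionSplitFailureCount).incrementAndGet();
  }
}
```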
[jira] [Commented] (HBASE-5189) Add metrics to keep track of region-splits in RS
[ https://issues.apache.org/jira/browse/HBASE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429328#comment-13429328 ] David S. Wang commented on HBASE-5189: -- ... for the reason you state (hit return too quickly).
[jira] [Commented] (HBASE-5189) Add metrics to keep track of region-splits in RS
[ https://issues.apache.org/jira/browse/HBASE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429330#comment-13429330 ] Hadoop QA commented on HBASE-5189: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538254/HBASE-5189-persistent.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2521//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2521//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2521//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2521//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2521//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2521//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2521//console This message is automatically generated.
[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters
[ https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429350#comment-13429350 ] Jean-Daniel Cryans commented on HBASE-6497: --- bq. Less parallelization per RS. If you have a lot of RSes, lowering file count does help reduce HBase RPCs too? I'm not sure I understand what you mean. HBase RPCs in which context? Revisit HLog sizing and roll parameters --- Key: HBASE-6497 URL: https://issues.apache.org/jira/browse/HBASE-6497 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Lars George The last major update to the HLog sizing and roll features was done in HBASE-1394. I am proposing to revisit these settings to overcome recent issues where the HLog becomes a major bottleneck.
[jira] [Commented] (HBASE-6487) assign region doesn't check if the region is already assigned
[ https://issues.apache.org/jira/browse/HBASE-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429353#comment-13429353 ] Hadoop QA commented on HBASE-6487: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539326/trunk-6487_v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. 
The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.util.TestHBaseFsck Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2522//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2522//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2522//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2522//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2522//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2522//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2522//console This message is automatically generated. assign region doesn't check if the region is already assigned - Key: HBASE-6487 URL: https://issues.apache.org/jira/browse/HBASE-6487 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0 Attachments: trunk-6487.patch, trunk-6487_v2.patch Tried to assign a region that was already assigned somewhere else from the hbase shell: the region is assigned to a different place, but the previous assignment is not closed, so it causes double assignment. In such a case, it's better to issue a warning instead.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429359#comment-13429359 ] Todd Lipcon commented on HBASE-6358: bq. @Todd, would you be in favor of adding another JIRA ticket for a distributed bulk loader, and having this ticket be blocked until it's done? I think it should be blocked so we don't remove the current bulkload from remote fs capability without offering an alternative, though the user does have the option of running distcp themselves. I could go either way on this. Up to folks who are more actively contributing code than I :) Bulkloading from remote filesystem is problematic - Key: HBASE-6358 URL: https://issues.apache.org/jira/browse/HBASE-6358 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Dave Revell Assignee: Dave Revell Attachments: 6358-suggestion.txt, HBASE-6358-trunk-v1.diff, HBASE-6358-trunk-v2.diff, HBASE-6358-trunk-v3.diff Bulk loading hfiles that don't live on the same filesystem as HBase can cause problems for subtle reasons. In Store.bulkLoadHFile(), the regionserver will copy the source hfile to its own filesystem if it's not already there. Since this can take a long time for large hfiles, it's likely that the client will timeout and retry. When the client retries repeatedly, there may be several bulkload operations in flight for the same hfile, causing lots of unnecessary IO and tying up handler threads. This can seriously impact performance. In my case, the cluster became unusable and the regionservers had to be kill -9'ed. Possible solutions: # Require that hfiles already be on the same filesystem as HBase in order for bulkloading to succeed. The copy could be handled by LoadIncrementalHFiles before the regionserver is called. # Others? I'm not familiar with Hadoop IPC so there may be tricks to extend the timeout or something else. 
I'm willing to write a patch but I'd appreciate recommendations on how to proceed.
[jira] [Commented] (HBASE-6515) Setting request size with protobuf
[ https://issues.apache.org/jira/browse/HBASE-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429360#comment-13429360 ] Todd Lipcon commented on HBASE-6515: -- Did you look into what the default max size is? I don't think we should arbitrarily raise the limit. Instead, if replication sends too-large RPCs, we should figure out how to make it do smaller batches to fit within the limit. RPC payloads in the 10s or 100s of MBs are not good. Setting request size with protobuf -- Key: HBASE-6515 URL: https://issues.apache.org/jira/browse/HBASE-6515 Project: HBase Issue Type: Bug Components: ipc, replication Affects Versions: 0.96.0 Reporter: Himanshu Vashishtha Priority: Critical While running replication on upstream code, I am hitting the size-limit exception while sending WALEdits to a different cluster. {code} com.google.protobuf.InvalidProtocolBufferException: IPC server unable to read call parameters: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. {code} Do we have a property to set some max size or something?
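Todd's suggestion (batch smaller rather than raise the limit) can be sketched with a simple greedy packer. For reference, protobuf's Java CodedInputStream defaults to a 64 MB size limit. The class below is an illustration only, with hypothetical names and byte sizes; it is not replication's actual shipping code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: greedily pack edit sizes (in bytes) into batches that each stay
// under a byte budget, so no single RPC exceeds the receiver's limit.
class EditBatcher {
  static List<List<Integer>> batch(List<Integer> editSizes, int maxBytes) {
    List<List<Integer>> batches = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    int currentBytes = 0;
    for (int size : editSizes) {
      // Start a new batch when adding this edit would exceed the budget.
      // An oversize single edit still gets its own batch rather than
      // being dropped (it would need further splitting in practice).
      if (!current.isEmpty() && currentBytes + size > maxBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(size);
      currentBytes += size;
    }
    if (!current.isEmpty()) batches.add(current);
    return batches;
  }
}
```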
[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429394#comment-13429394 ] stack commented on HBASE-6364: -- bq. Before committing, I would be interested by a feedback from Suraj. There are just a few lines of code, so rebasing won't be complicated if he needs some time to test it. Sounds good. bq. For the default timeout, I think we can cut down the connect timeout. But I think it's safer to make it to 5 seconds, so this fix remains important. I will work on this on another jira. Sounds good too. On test, even if its utility to simulate so you can prove your fix, that'd be great. I like the numbers you are quoting above. Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table - Key: HBASE-6364 URL: https://issues.apache.org/jira/browse/HBASE-6364 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Suraj Varma Assignee: nkeywal Labels: client Fix For: 0.96.0 Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, stacktrace.txt When a server host with a Region Server holding the .META. table is powered down on a live cluster, while the HBase cluster itself detects and reassigns the .META. table, connected HBase Client's take an excessively long time to detect this and re-discover the reassigned .META. Workaround: Decrease the ipc.socket.timeout on HBase Client side to a low value (default is 20s leading to 35 minute recovery time; we were able to get acceptable results with 100ms getting a 3 minute recovery) This was found during some hardware failure testing scenarios. Test Case: 1) Apply load via client app on HBase cluster for several minutes 2) Power down the region server holding the .META. server (i.e. power off ... 
and keep it off) 3) Measure how long it takes for the cluster to reassign the META table and for client threads to re-lookup and re-orient to the lesser cluster (minus the RS and DN on that host). Observation: 1) Client threads spike up to maxThreads size ... and take over 35 mins to recover (i.e. for the thread count to go back to normal) - no client calls are serviced - they just back up on a synchronized method (see #2 below) 2) All the client app threads queue up behind the oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj After taking several thread dumps we found that the thread within this synchronized method was blocked on NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf)); The client thread that gets the synchronized lock would try to connect to the dead RS (till the socket times out after 20s), retry, and then the next thread gets in, and so forth in a serial manner. Workaround: --- Default ipc.socket.timeout is set to 20s. We dropped this to a low number (1000 ms, 100 ms, etc) on the client-side hbase-site.xml. With this setting, the client threads recovered in a couple of minutes by failing fast and re-discovering the .META. table on a reassigned RS. Assumption: This ipc.socket.timeout is only ever used during the initial HConnection setup via NetUtils.connect and should only ever be used when connectivity to a region server is lost and needs to be re-established, i.e. it does not affect normal RPC activity as this is just the connect timeout. During RS GC periods, any _new_ clients trying to connect will fail and will require .META. table re-lookups. This above timeout workaround is only for the HBase client side.
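The workaround described above boils down to a client-side hbase-site.xml override; the 100 ms value is the one quoted in the report as giving a roughly 3-minute recovery instead of 35 minutes:

```xml
<!-- Client-side hbase-site.xml: fail fast on connect so the .META.
     re-lookup happens quickly (value taken from the report above). -->
<property>
  <name>ipc.socket.timeout</name>
  <value>100</value> <!-- milliseconds; the reported default is 20000 (20s) -->
</property>
```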
[jira] [Commented] (HBASE-6518) Bytes.toBytesBinary() incorrect trailing backslash escape
[ https://issues.apache.org/jira/browse/HBASE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429398#comment-13429398 ] stack commented on HBASE-6518: -- +1 on the patch. In future, you don't need to catch the exception and then fail... just let it out... that'll fail the test {code} +} catch (StringIndexOutOfBoundsException ex) { + fail("Illegal string access: " + ex.getMessage()); +} {code} Also, we put spaces around operators in our code. See the rest of the code. Let me run this by hadoopqa to see if it passes. Bytes.toBytesBinary() incorrect trailing backslash escape - Key: HBASE-6518 URL: https://issues.apache.org/jira/browse/HBASE-6518 Project: HBase Issue Type: Bug Components: util Reporter: Tudor Scurtu Assignee: Tudor Scurtu Priority: Trivial Labels: patch Attachments: HBASE-6518.patch Bytes.toBytesBinary() converts escaped strings to byte arrays. When encountering a '\' character, it looks at the next one to see if it is an 'x', without checking whether it exists.
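The bug and its fix can be reproduced in a self-contained toBytesBinary-style parser. Note this is a re-creation of the described problem, not the actual Bytes.toBytesBinary source: the guard on the lookahead is what the patch adds.

```java
import java.io.ByteArrayOutputStream;

// Sketch of a "\xHH"-escape decoder with the bounds check in place.
class BytesBinary {
  static byte[] toBytesBinary(String in) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < in.length(); i++) {
      char ch = in.charAt(i);
      // The fix: "i + 3 < in.length()" guards both the 'x' lookahead and
      // the two hex digits, so a trailing backslash (or truncated escape)
      // no longer triggers StringIndexOutOfBoundsException.
      if (ch == '\\' && i + 3 < in.length() && in.charAt(i + 1) == 'x') {
        out.write(Integer.parseInt(in.substring(i + 2, i + 4), 16));
        i += 3; // consume "xHH"
      } else {
        out.write(ch);
      }
    }
    return out.toByteArray();
  }
}
```

With the guard removed, an input ending in '\' reads past the end of the string, which is exactly the reported failure.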
[jira] [Updated] (HBASE-6518) Bytes.toBytesBinary() incorrect trailing backslash escape
[ https://issues.apache.org/jira/browse/HBASE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6518: - Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6513) Test errors when building on MacOS
[ https://issues.apache.org/jira/browse/HBASE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429399#comment-13429399 ] stack commented on HBASE-6513: -- Happens every time? Test errors when building on MacOS -- Key: HBASE-6513 URL: https://issues.apache.org/jira/browse/HBASE-6513 Project: HBase Issue Type: Bug Components: build Environment: MacOSX 10.8 Oracle JDK 1.7 Reporter: Archimedes Trajano Results : Failed tests: testBackgroundEvictionThread[0](org.apache.hadoop.hbase.io.hfile.TestLruBlockCache): expected:2 but was:1 testBackgroundEvictionThread[1](org.apache.hadoop.hbase.io.hfile.TestLruBlockCache): expected:2 but was:1 testSplitCalculatorEq(org.apache.hadoop.hbase.util.TestRegionSplitCalculator): expected:2 but was:1
[jira] [Updated] (HBASE-6373) Add more context information to audit log messages
[ https://issues.apache.org/jira/browse/HBASE-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6373: - Attachment: accesscontroller094.patch Patch for 0.94 branch. Add more context information to audit log messages -- Key: HBASE-6373 URL: https://issues.apache.org/jira/browse/HBASE-6373 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.96.0, 0.94.2 Reporter: Marcelo Vanzin Priority: Minor Fix For: 0.96.0 Attachments: accesscontroller.patch, accesscontroller.patch, accesscontroller094.patch The attached patch adds more information to the audit log messages; namely, it includes the IP address where the request originated, if it's available. The patch is against trunk, but I've tested it against the 0.92 branch. I didn't find any unit test for this code, please let me know if I missed something.
[jira] [Updated] (HBASE-6373) Add more context information to audit log messages
[ https://issues.apache.org/jira/browse/HBASE-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6373: - Fix Version/s: 0.94.2 Release Note: Applied to 0.94 branch too.
[jira] [Commented] (HBASE-6373) Add more context information to audit log messages
[ https://issues.apache.org/jira/browse/HBASE-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429414#comment-13429414 ] stack commented on HBASE-6373: -- Applied to 0.94 branch too...
[jira] [Commented] (HBASE-6302) Document how to run integration tests
[ https://issues.apache.org/jira/browse/HBASE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429416#comment-13429416 ] stack commented on HBASE-6302: -- Patch looks good to me Enis. What you think of Andrew's comments above? Document how to run integration tests - Key: HBASE-6302 URL: https://issues.apache.org/jira/browse/HBASE-6302 Project: HBase Issue Type: Sub-task Components: documentation Reporter: stack Assignee: Enis Soztutar Priority: Blocker Fix For: 0.96.0 Attachments: HBASE-6302_v1.patch HBASE-6203 has attached the old IT doc with some mods. When we figure out how ITs are to be run, update it and apply the documentation under this issue. Making a blocker against 0.96.
[jira] [Commented] (HBASE-6449) Dapper like tracing
[ https://issues.apache.org/jira/browse/HBASE-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429419#comment-13429419 ] stack commented on HBASE-6449: -- @Jonathan Sounds good. Would suggest new issue for adding trace hooks to hbase. Could be subissue of this one. Good on you. Dapper like tracing --- Key: HBASE-6449 URL: https://issues.apache.org/jira/browse/HBASE-6449 Project: HBase Issue Type: New Feature Components: client, ipc Affects Versions: 0.96.0 Reporter: Jonathan Leavitt Labels: tracing Attachments: htrace1.diff, htrace2.diff, trace.png Add [Dapper|http://research.google.com/pubs/pub36356.html] like tracing to HBase. [Accumulo|http://accumulo.apache.org] added something similar with their cloudtrace package.
[jira] [Commented] (HBASE-6495) HBaseAdmin shouldn't expect HConnection to be an HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429421#comment-13429421 ] stack commented on HBASE-6495: -- You have list of what HBaseAdmin needs beyond HConnection mighty Jesse? HBaseAdmin shouldn't expect HConnection to be an HConnectionImplementation -- Key: HBASE-6495 URL: https://issues.apache.org/jira/browse/HBASE-6495 Project: HBase Issue Type: Bug Affects Versions: 0.96.0, 0.94.1 Reporter: Jesse Yates Fix For: 0.96.0, 0.94.1 Currently, the HBaseAdmin has a constructor that takes an HConnection, but then immediately casts it to an HConnectionManager.HConnectionImplementation: {code} public HBaseAdmin(HConnection connection) throws MasterNotRunningException, ZooKeeperConnectionException { this.conf = connection.getConfiguration(); // We want the real class, without showing it our public interface, // hence the cast. this.connection = (HConnectionManager.HConnectionImplementation)connection; {code} However, this breaks the explicit contract in the javadocs and makes it basically impossible to mock out the hbaseadmin. We need to either make the hbaseadmin use a basic HConnection and optimize for cases where its smarter or bring up the couple of methods in HConnectionManager.HConnectionImplementation to the HConnection interface.
[jira] [Commented] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429423#comment-13429423 ] stack commented on HBASE-6488: -- You have a new patch for us RR? HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- 
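As a self-contained illustration of the failure mode (not the HBase patch itself): a hostname carrying an IPv6 zone-index `%` poisons any String.format call that uses it as a prefix, and a common guard is to escape `%` as `%%` first. The class and method names below are made up for this sketch.

```java
import java.util.IllegalFormatException;

public class ZoneIndexEscape {
    // Escape '%' so a hostname is safe to embed in a format string.
    static String escapeForFormat(String host) {
        return host.replace("%", "%%");
    }

    public static void main(String[] args) {
        String host = "fe80::1%0"; // zone-indexed IPv6 address, as in the report
        try {
            // Mirrors the eager String.format validation done on thread-name formats.
            String.format(host + "-pool-%d", 1);
        } catch (IllegalFormatException e) {
            // The '%0' prefix is parsed as a bogus format specifier.
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
        // With '%' escaped, the same call succeeds.
        System.out.println(String.format(escapeForFormat(host) + "-pool-%d", 1)); // fe80::1%0-pool-1
    }
}
```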
[jira] [Commented] (HBASE-5189) Add metrics to keep track of region-splits in RS
[ https://issues.apache.org/jira/browse/HBASE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429452#comment-13429452 ] stack commented on HBASE-5189: -- bq. Can someone comment on the original idea/use case of the region split counter? I'm pretty sure that the original implementation had little thought spent on it (and it sounds plain broke). Patch looks good to me (What you think Elliott). Any chance of a release note documenting its changed format, Matteo? Good on you. Add metrics to keep track of region-splits in RS Key: HBASE-5189 URL: https://issues.apache.org/jira/browse/HBASE-5189 Project: HBase Issue Type: Improvement Components: metrics, regionserver Affects Versions: 0.90.5, 0.92.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5189-persistent.patch, HBASE-5189.trunk.v1.patch, HBASE-5189.trunk.v2.patch For a write-heavy workload with a region size of 1 GB, the region-split count is considerably high. We normally grep the NN log (grep mkdir*.split NN.log | sort | uniq -c) to get the count. I would like to have a counter incremented each time a region-split succeeds, and this counter exposed via the metrics stuff in HBase. - regionSplitSuccessCount - regionSplitFailureCount (will help us to correlate the timestamp range in RS logs across all RS)
[jira] [Commented] (HBASE-6518) Bytes.toBytesBinary() incorrect trailing backslash escape
[ https://issues.apache.org/jira/browse/HBASE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429454#comment-13429454 ] Hadoop QA commented on HBASE-6518: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539320/HBASE-6518.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2523//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2523//console This 
message is automatically generated. Bytes.toBytesBinary() incorrect trailing backslash escape - Key: HBASE-6518 URL: https://issues.apache.org/jira/browse/HBASE-6518 Project: HBase Issue Type: Bug Components: util Reporter: Tudor Scurtu Assignee: Tudor Scurtu Priority: Trivial Labels: patch Attachments: HBASE-6518.patch Bytes.toBytesBinary() converts escaped strings to byte arrays. When encountering a '\' character, it looks at the next one to see if it is an 'x', without checking if it exists.
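A minimal sketch of the bounds check such a fix needs (illustrative names, not the actual patch): only treat `\` as the start of a `\xNN` escape when a full sequence remains, so a trailing backslash becomes a literal byte instead of an out-of-bounds read.

```java
import java.io.ByteArrayOutputStream;

public class BinaryEscapeSketch {
    // Decode the "\xNN" escape convention used by Bytes.toBytesBinary,
    // with an explicit length check before looking past a '\'.
    static byte[] toBytesBinary(String in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length(); i++) {
            char ch = in.charAt(i);
            // Escape only when "\xNN" fits entirely within the string.
            if (ch == '\\' && i + 3 < in.length() && in.charAt(i + 1) == 'x') {
                int hi = Character.digit(in.charAt(i + 2), 16);
                int lo = Character.digit(in.charAt(i + 3), 16);
                out.write((hi << 4) | lo);
                i += 3; // skip the consumed "xNN"
            } else {
                out.write((byte) ch); // literal char, including a trailing '\'
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 'a', 0x41, and a trailing literal '\' -- no exception, 3 bytes.
        System.out.println(toBytesBinary("a\\x41\\").length); // prints 3
    }
}
```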
[jira] [Commented] (HBASE-6373) Add more context information to audit log messages
[ https://issues.apache.org/jira/browse/HBASE-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429473#comment-13429473 ] Hudson commented on HBASE-6373: --- Integrated in HBase-0.94 #384 (See [https://builds.apache.org/job/HBase-0.94/384/]) HBASE-6373 Add more context information to audit log messages (Revision 1370005) Result = FAILURE stack : Files : * /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java Add more context information to audit log messages -- Key: HBASE-6373 URL: https://issues.apache.org/jira/browse/HBASE-6373 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.96.0, 0.94.2 Reporter: Marcelo Vanzin Priority: Minor Fix For: 0.96.0, 0.94.2 Attachments: accesscontroller.patch, accesscontroller.patch, accesscontroller094.patch The attached patch adds more information to the audit log messages; namely, it includes the IP address where the request originated, if it's available. The patch is against trunk, but I've tested it against the 0.92 branch. I didn't find any unit test for this code, please let me know if I missed something.
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6364: - Fix Version/s: 0.94.2 Not sure I wrapped my head around the issue completely. But from the discussion here and looking at the patch it looks right. This should be in 0.94 as well. Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table - Key: HBASE-6364 URL: https://issues.apache.org/jira/browse/HBASE-6364 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Suraj Varma Assignee: nkeywal Labels: client Fix For: 0.96.0, 0.94.2 Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, stacktrace.txt When a server host with a Region Server holding the .META. table is powered down on a live cluster, while the HBase cluster itself detects and reassigns the .META. table, connected HBase Clients take an excessively long time to detect this and re-discover the reassigned .META. Workaround: Decrease the ipc.socket.timeout on the HBase Client side to a low value (the default is 20s, leading to a 35 minute recovery time; we were able to get acceptable results with 100ms, getting a 3 minute recovery) This was found during some hardware failure testing scenarios. Test Case: 1) Apply load via client app on HBase cluster for several minutes 2) Power down the region server holding the .META. server (i.e. power off ... and keep it off) 3) Measure how long it takes for cluster to reassign META table and for client threads to re-lookup and re-orient to the lesser cluster (minus the RS and DN on that host). Observation: 1) Client threads spike up to maxThreads size ... and take over 35 mins to recover (i.e.
for the thread count to go back to normal) - no client calls are serviced - they just back up on a synchronized method (see #2 below) 2) All the client app threads queue up behind the oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj After taking several thread dumps, we found that the thread within this synchronized method was blocked on NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf)); The client thread that gets the synchronized lock would try to connect to the dead RS (till the socket times out after 20s), retries, and then the next thread gets in, and so forth in a serial manner. Workaround: --- Default ipc.socket.timeout is set to 20s. We dropped this to a low number (1000 ms, 100 ms, etc.) in the client-side hbase-site.xml. With this setting, the client threads recovered in a couple of minutes by failing fast and re-discovering the .META. table on a reassigned RS. Assumption: This ipc.socket.timeout is only ever used during the initial HConnection setup via NetUtils.connect, and should only ever be used when connectivity to a region server is lost and needs to be re-established, i.e. it does not affect normal RPC activity, as this is just the connect timeout. During RS GC periods, any _new_ clients trying to connect will fail and will require .META. table re-lookups. The above timeout workaround is only for the HBase client side.
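The workaround described above is just a client-side configuration change; as a sketch, the hbase-site.xml fragment would look like this (100 ms is the value the reporter found acceptable in their tests, not a recommended default):

```xml
<!-- Client-side hbase-site.xml: fail fast when connecting to a dead RS. -->
<property>
  <name>ipc.socket.timeout</name>
  <!-- milliseconds; the default is 20000 (20s) -->
  <value>100</value>
</property>
```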
[jira] [Created] (HBASE-6519) FSRegionScanner should be in its own file
Ramkumar Vadali created HBASE-6519: -- Summary: FSRegionScanner should be in its own file Key: HBASE-6519 URL: https://issues.apache.org/jira/browse/HBASE-6519 Project: HBase Issue Type: Improvement Components: util Environment: mac osx, jdk 1.6 Reporter: Ramkumar Vadali Priority: Minor I found this problem in the 0.89-fb branch. I was not able to start the master because of a ClassNotFoundException for FSRegionScanner. FSRegionScanner is a top-level class in FSUtils.java. Moving it to a separate file solved the problem.
[jira] [Commented] (HBASE-6495) HBaseAdmin shouldn't expect HConnection to be an HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429550#comment-13429550 ] Jesse Yates commented on HBASE-6495: @stack: yeah, when doing some work on HBASE-6055 it was basically impossible to cleanly mock out the client-to-master connection, since I couldn't pass in an HConnection that was expected from a generic package. Ended up having to move to the .client package and mock out an HConnection.HConnectionImplementation, which wasn't very nice. That's my main gripe - not the end of the world, but definitely a nice-to-have. Unless I'm missing something? HBaseAdmin shouldn't expect HConnection to be an HConnectionImplementation -- Key: HBASE-6495 URL: https://issues.apache.org/jira/browse/HBASE-6495 Project: HBase Issue Type: Bug Affects Versions: 0.96.0, 0.94.1 Reporter: Jesse Yates Fix For: 0.96.0, 0.94.1 Currently, HBaseAdmin has a constructor that takes an HConnection, but then immediately casts it to an HConnectionManager.HConnectionImplementation: {code} public HBaseAdmin(HConnection connection) throws MasterNotRunningException, ZooKeeperConnectionException { this.conf = connection.getConfiguration(); // We want the real class, without showing it our public interface, // hence the cast. this.connection = (HConnectionManager.HConnectionImplementation)connection; {code} However, this breaks the explicit contract in the javadocs and makes it basically impossible to mock out the HBaseAdmin. We need to either make HBaseAdmin use a basic HConnection and optimize for cases where it's smarter, or bring the couple of methods in HConnectionManager.HConnectionImplementation up to the HConnection interface.
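The mocking problem is generic; as a sketch with hypothetical Admin/Connection types (not HBase's actual classes), depending on the interface keeps a test double usable, whereas a cast to one concrete implementation would throw ClassCastException for anything else:

```java
// Hypothetical stand-ins for HConnection / HConnectionImplementation.
interface Connection {
    String getConfiguration();
}

class ConnectionImpl implements Connection {
    public String getConfiguration() { return "real"; }
}

public class AdminSketch {
    private final Connection connection;

    // Keep the declared interface type instead of casting to ConnectionImpl;
    // any Connection implementation (including a test double) is accepted.
    AdminSketch(Connection connection) {
        this.connection = connection;
    }

    String conf() {
        return connection.getConfiguration();
    }

    public static void main(String[] args) {
        // A lambda test double works here; a cast to ConnectionImpl would not.
        AdminSketch admin = new AdminSketch(() -> "mock");
        System.out.println(admin.conf()); // prints mock
    }
}
```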
[jira] [Commented] (HBASE-6519) FSRegionScanner should be in its own file
[ https://issues.apache.org/jira/browse/HBASE-6519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429580#comment-13429580 ] Ramkumar Vadali commented on HBASE-6519: https://reviews.facebook.net/D4533 FSRegionScanner should be in its own file - Key: HBASE-6519 URL: https://issues.apache.org/jira/browse/HBASE-6519 Project: HBase Issue Type: Improvement Components: util Environment: mac osx, jdk 1.6 Reporter: Ramkumar Vadali Priority: Minor I found this problem in the 0.89-fb branch. I was not able to start the master because of a ClassNotFoundException for FSRegionScanner. FSRegionScanner is a top-level class in FSUtils.java. Moving it to a separate file solved the problem.
[jira] [Updated] (HBASE-6052) Convert .META. and -ROOT- content to pb
[ https://issues.apache.org/jira/browse/HBASE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6052: - Attachment: HBASE-6052_v3.patch v3 up in PB. Thanks for the reviews. Convert .META. and -ROOT- content to pb --- Key: HBASE-6052 URL: https://issues.apache.org/jira/browse/HBASE-6052 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: Enis Soztutar Priority: Blocker Fix For: 0.96.0 Attachments: HBASE-6052_v1.patch, HBASE-6052_v2.patch, HBASE-6052_v3.patch
[jira] [Updated] (HBASE-6052) Convert .META. and -ROOT- content to pb
[ https://issues.apache.org/jira/browse/HBASE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6052: - Status: Open (was: Patch Available) Convert .META. and -ROOT- content to pb --- Key: HBASE-6052 URL: https://issues.apache.org/jira/browse/HBASE-6052 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: Enis Soztutar Priority: Blocker Fix For: 0.96.0 Attachments: HBASE-6052_v1.patch, HBASE-6052_v2.patch, HBASE-6052_v3.patch
[jira] [Updated] (HBASE-6052) Convert .META. and -ROOT- content to pb
[ https://issues.apache.org/jira/browse/HBASE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6052: - Status: Patch Available (was: Open) Convert .META. and -ROOT- content to pb --- Key: HBASE-6052 URL: https://issues.apache.org/jira/browse/HBASE-6052 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: Enis Soztutar Priority: Blocker Fix For: 0.96.0 Attachments: HBASE-6052_v1.patch, HBASE-6052_v2.patch, HBASE-6052_v3.patch
[jira] [Updated] (HBASE-6509) Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b
[ https://issues.apache.org/jira/browse/HBASE-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baranau updated HBASE-6509: Attachment: HBASE-6509_2.patch Ted, thank you for the review. Fixed nits, updated diff. Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b - Key: HBASE-6509 URL: https://issues.apache.org/jira/browse/HBASE-6509 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6509.patch, HBASE-6509_1.patch, HBASE-6509_2.patch Implement a fuzzy row key filter to allow fetching records e.g. by this criteria: ???alex?b. This seems very useful as an alternative way to select records by row key, by specifying a part of the key that is not the prefix. Due to the fast-forwarding nature of the filter, in many situations this helps avoid heavy full-table scans. This is especially effective when you have a composite row key and (some of) its parts have fixed length. E.g. with a key of format userId_actionId_time, given that userId and actionId have fixed length, one can select user actions of a specific type using a fuzzy row key with the mask _myaction. Given the fast-forwarding nature of the filter, this will usually work much faster than doing a whole-table scan with any of the existing server-side filters. In many cases this can work as a secondary-indexing alternative. Many times users implement it as a custom filter, and many times they just don't know this is possible. Let's add it to the common codebase.
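The per-row check behind such a filter can be sketched as follows. The mask convention here (0 = fixed byte, 1 = don't care) and all names are assumptions for illustration, not the patch's actual encoding; the real filter also computes a fast-forward hint (the next possible matching key) rather than just accepting or rejecting each row.

```java
public class FuzzyMatchSketch {
    // Accept a row if every "fixed" position of the fuzzy key matches;
    // positions flagged 1 in the mask are "don't care" wildcards.
    static boolean matches(byte[] row, byte[] fuzzyKey, byte[] mask) {
        if (row.length < fuzzyKey.length) {
            return false; // row too short to satisfy the fixed positions
        }
        for (int i = 0; i < fuzzyKey.length; i++) {
            if (mask[i] == 0 && row[i] != fuzzyKey[i]) {
                return false; // fixed position differs
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Mask ???alex?b : bytes 0-2 and byte 7 are wildcards.
        byte[] key  = "\0\0\0alex\0b".getBytes();
        byte[] mask = {1, 1, 1, 0, 0, 0, 0, 1, 0};
        System.out.println(matches("abcalexzb".getBytes(), key, mask)); // true
        System.out.println(matches("abcalexza".getBytes(), key, mask)); // false
    }
}
```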
[jira] [Updated] (HBASE-6407) Investigate moving to DI (guice) framework for plugin arch.
[ https://issues.apache.org/jira/browse/HBASE-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6407: - Attachment: HBASE-6407-4.patch There are still a lot of things that aren't Guice'd, but the patch works. Added more explicit dependencies on Guice. Moved HRegion and HMaster to a factory that takes in a configuration. Local clusters were too big of an issue, where sometimes the conf needed to be copied and other times it needed to stay the same. Added JavaDocs for the factory classes. Removed the CompatibilitySingletonFactory, as this is its replacement. Continued work on Guicifying things. I have one test that I need to finish cleaning up. I'll get to that before the next version. Investigate moving to DI (guice) framework for plugin arch. --- Key: HBASE-6407 URL: https://issues.apache.org/jira/browse/HBASE-6407 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6407-1.patch, HBASE-6407-2.patch, HBASE-6407-3.patch, HBASE-6407-4.patch Investigate using Guice to inject the correct compat object provided by compat plugins
[jira] [Updated] (HBASE-6317) Master clean start up and Partially enabled tables make region assignment inconsistent.
[ https://issues.apache.org/jira/browse/HBASE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-6317: -- Status: Patch Available (was: Open) Master clean start up and Partially enabled tables make region assignment inconsistent. --- Key: HBASE-6317 URL: https://issues.apache.org/jira/browse/HBASE-6317 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Fix For: 0.92.2, 0.96.0, 0.94.2 Attachments: HBASE-6317_94.patch, HBASE-6317_94_3.patch, HBASE-6317_trunk_2.patch If we have a table in a partially enabled state (ENABLING), then on HMaster restart we treat it as a clean cluster start-up and do a bulk assign. Currently in 0.94 bulk assign will not handle ALREADY_OPENED scenarios, and it leads to region assignment problems. Analysing this further, we found that we have a better way to handle these scenarios. {code} if (false == checkIfRegionBelongsToDisabled(regionInfo) && false == checkIfRegionsBelongsToEnabling(regionInfo)) { synchronized (this.regions) { regions.put(regionInfo, regionLocation); addToServers(regionLocation, regionInfo); } } {code} We don't add to the regions map so that the enable table handler can handle it. But as nothing is added to the regions map, we treat it as a clean cluster start-up. Will come up with a patch tomorrow.
[jira] [Updated] (HBASE-6317) Master clean start up and Partially enabled tables make region assignment inconsistent.
[ https://issues.apache.org/jira/browse/HBASE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-6317: -- Attachment: HBASE-6317_trunk_2.patch Same patch on RB. Master clean start up and Partially enabled tables make region assignment inconsistent. --- Key: HBASE-6317 URL: https://issues.apache.org/jira/browse/HBASE-6317 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Fix For: 0.92.2, 0.96.0, 0.94.2 Attachments: HBASE-6317_94.patch, HBASE-6317_94_3.patch, HBASE-6317_trunk_2.patch If we have a table in a partially enabled state (ENABLING), then on HMaster restart we treat it as a clean cluster start-up and do a bulk assign. Currently in 0.94 bulk assign will not handle ALREADY_OPENED scenarios, and it leads to region assignment problems. Analysing this further, we found that we have a better way to handle these scenarios. {code} if (false == checkIfRegionBelongsToDisabled(regionInfo) && false == checkIfRegionsBelongsToEnabling(regionInfo)) { synchronized (this.regions) { regions.put(regionInfo, regionLocation); addToServers(regionLocation, regionInfo); } } {code} We don't add to the regions map so that the enable table handler can handle it. But as nothing is added to the regions map, we treat it as a clean cluster start-up. Will come up with a patch tomorrow.
[jira] [Commented] (HBASE-6516) hbck cannot detect any IOException while .tableinfo file is missing
[ https://issues.apache.org/jira/browse/HBASE-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429654#comment-13429654 ] Jie Huang commented on HBASE-6516: -- Thanks Andrew. I have modified the patch file accordingly. bq. Have you tried running the unit test suite with your patch applied? What is the result? Yes, I have verified all unit tests before uploading the patch file. hbck cannot detect any IOException while .tableinfo file is missing - Key: HBASE-6516 URL: https://issues.apache.org/jira/browse/HBASE-6516 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Attachments: hbase-6516-v2.patch, hbase-6516.patch HBaseFsck checks for missing .tableinfo files in the loadHdfsRegionInfos() function. However, no IOException will be caught while .tableinfo is missing, since FSTableDescriptors.getTableDescriptor doesn't throw any IOException.
[jira] [Updated] (HBASE-6516) hbck cannot detect any IOException while .tableinfo file is missing
[ https://issues.apache.org/jira/browse/HBASE-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Huang updated HBASE-6516: - Attachment: hbase-6516-v2.patch hbck cannot detect any IOException while .tableinfo file is missing - Key: HBASE-6516 URL: https://issues.apache.org/jira/browse/HBASE-6516 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Attachments: hbase-6516-v2.patch, hbase-6516.patch HBaseFsck checks for missing .tableinfo files in the loadHdfsRegionInfos() function. However, no IOException will be caught while .tableinfo is missing, since FSTableDescriptors.getTableDescriptor doesn't throw any IOException.
[jira] [Commented] (HBASE-6515) Setting request size with protobuf
[ https://issues.apache.org/jira/browse/HBASE-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429661#comment-13429661 ] Himanshu Vashishtha commented on HBASE-6515: Not yet, but yes, will test in that direction. While using YCSB, I see it does support a 6 MB WALEdit object. HTable's write buffer can also play an interesting role here; it creates one WALEdit object per table flush. Setting it to a higher value may cause this exception. Given that the current RPC can handle 64 MB arrays, this is something to look out for. I will report back the default limit soon. Setting request size with protobuf -- Key: HBASE-6515 URL: https://issues.apache.org/jira/browse/HBASE-6515 Project: HBase Issue Type: Bug Components: ipc, replication Affects Versions: 0.96.0 Reporter: Himanshu Vashishtha Priority: Critical While running replication on upstream code, I am hitting the size-limit exception while sending WALEdits to a different cluster. {code} com.google.protobuf.InvalidProtocolBufferException: IPC server unable to read call parameters: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. {code} Do we have a property to set some max size or something?
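This is not HBase's fix, but a self-contained illustration of the kind of guard that CodedInputStream.setSizeLimit() provides on the protobuf side: validate a declared payload size against a configurable cap before allocating or reading it. All names and the length-prefixed wire layout here are invented for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SizeLimitSketch {
    // A cap comparable to the 64 MB arrays mentioned in the comment above.
    static final int DEFAULT_LIMIT = 64 * 1024 * 1024;

    // Reject an oversized declared length before allocating the buffer,
    // analogous to what a protobuf size limit does during parsing.
    static byte[] readPayload(DataInputStream in, int sizeLimit) throws IOException {
        int declared = in.readInt();
        if (declared < 0 || declared > sizeLimit) {
            throw new IOException("Protocol message was too large: " + declared);
        }
        byte[] buf = new byte[declared];
        in.readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(3);
        dos.write(new byte[] {1, 2, 3});
        byte[] wire = bos.toByteArray();
        // Accepted under a generous limit...
        System.out.println(readPayload(new DataInputStream(new ByteArrayInputStream(wire)), DEFAULT_LIMIT).length); // prints 3
        // ...rejected under a tiny one.
        try {
            readPayload(new DataInputStream(new ByteArrayInputStream(wire)), 2);
        } catch (IOException e) {
            System.out.println("rejected");
        }
    }
}
```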
[jira] [Commented] (HBASE-6509) Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b
[ https://issues.apache.org/jira/browse/HBASE-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429664#comment-13429664 ] Hadoop QA commented on HBASE-6509: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539386/HBASE-6509_2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2524//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2524//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2524//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2524//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2524//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2524//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2524//console This message is automatically generated. 
Implement fast-forwarding FuzzyRowFilter to allow filter rows e.g. by ???alex?b - Key: HBASE-6509 URL: https://issues.apache.org/jira/browse/HBASE-6509 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6509.patch, HBASE-6509_1.patch, HBASE-6509_2.patch Implement a fuzzy row key filter to allow fetching records e.g. by this criteria: ???alex?b. This seems very useful as an alternative way to select records by row key, by specifying a part of the key that is not the prefix. Due to the fast-forwarding nature of the filter, in many situations this helps avoid heavy full-table scans. This is especially effective when you have a composite row key and (some of) its parts have fixed length. E.g. with a key of format userId_actionId_time, given that userId and actionId have fixed length, one can select user actions of a specific type using a fuzzy row key with the mask _myaction. Given the fast-forwarding nature of the filter, this will usually work much faster than doing a whole-table scan with any of the existing server-side filters. In many cases this can work as a secondary-indexing alternative. Many times users implement it as a custom filter, and many times they just don't know this is possible. Let's add it to the common codebase.
[jira] [Commented] (HBASE-6052) Convert .META. and -ROOT- content to pb
[ https://issues.apache.org/jira/browse/HBASE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429685#comment-13429685 ] Hadoop QA commented on HBASE-6052: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539385/HBASE-6052_v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 67 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.catalog.TestMetaMigrationConvertingToPB org.apache.hadoop.hbase.master.TestAssignmentManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2525//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2525//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2525//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2525//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2525//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2525//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console 
output: https://builds.apache.org/job/PreCommit-HBASE-Build/2525//console This message is automatically generated. Convert .META. and -ROOT- content to pb --- Key: HBASE-6052 URL: https://issues.apache.org/jira/browse/HBASE-6052 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: Enis Soztutar Priority: Blocker Fix For: 0.96.0 Attachments: HBASE-6052_v1.patch, HBASE-6052_v2.patch, HBASE-6052_v3.patch
[jira] [Updated] (HBASE-6496) Example ZK based scan policy
[ https://issues.apache.org/jira/browse/HBASE-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6496: - Attachment: 6496-v2.txt Here's a patch based on HBASE-6505. All CP instances will use a single watcher, which keeps the data up to date asynchronously. If the watcher gets disconnected from ZK it will try to reconnect periodically. Example ZK based scan policy Key: HBASE-6496 URL: https://issues.apache.org/jira/browse/HBASE-6496 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.96.0, 0.94.2 Attachments: 6496-v2.txt, 6496.txt Provide an example of a RegionServer that listens to a ZK node to learn about what set of KVs can safely be deleted during a compaction.
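The periodic-reconnect behaviour described above is a general pattern; a minimal, library-agnostic sketch of it follows (the names are illustrative, not taken from the 6496-v2.txt patch):

```java
import java.util.concurrent.Callable;

// Generic retry-until-connected sketch: attempt the connection, and on
// failure sleep for a fixed interval before trying again, up to a cap.
public class ReconnectSketch {
    public static <T> T retryUntilConnected(Callable<T> connect,
                                            long retryMillis,
                                            int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call(); // success: hand back the session
            } catch (Exception e) {
                last = e;              // remember the failure, wait, retry
                Thread.sleep(retryMillis);
            }
        }
        throw last; // exhausted all attempts
    }
}
```

A ZK-backed watcher would wrap session creation in such a loop and re-register its watches after each successful reconnect.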
[jira] [Created] (HBASE-6520) MSLab May cause the Bytes.toLong does not work correctly for increment
ShiXing created HBASE-6520: -- Summary: MSLab May cause the Bytes.toLong does not work correctly for increment Key: HBASE-6520 URL: https://issues.apache.org/jira/browse/HBASE-6520 Project: HBase Issue Type: Bug Reporter: ShiXing Assignee: ShiXing When MemStoreLAB is used, the KeyValues share the byte array allocated by the MemStoreLAB; all the KeyValues' bytes attributes reference the same byte array. Consider functions such as Bytes.toLong(byte[] bytes, int offset):
{code}
public static long toLong(byte[] bytes, int offset) {
  return toLong(bytes, offset, SIZEOF_LONG);
}

public static long toLong(byte[] bytes, int offset, final int length) {
  if (length != SIZEOF_LONG || offset + length > bytes.length) {
    throw explainWrongLengthOrOffset(bytes, offset, length, SIZEOF_LONG);
  }
  long l = 0;
  for (int i = offset; i < offset + length; i++) {
    l <<= 8;
    l ^= bytes[i] & 0xFF;
  }
  return l;
}
{code}
If we do not put a long value into the KeyValue, and then read it as a long value in HRegion.increment(), the check
{code}
offset + length > bytes.length
{code}
has no effect, because bytes.length is not equal to keyLength + valueLength; it is instead the MemStoreLAB chunk size, which defaults to 2048 * 1024. I will paste the patch later.
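The failure mode is easy to reproduce outside HBase: store a 4-byte value inside a large shared chunk, and the offset + length > bytes.length guard never fires because bytes.length is the chunk size, not the value size. A minimal sketch, with the chunk layout simplified to zeros (only the bounds-check behaviour is the point):

```java
// Demonstrates why the toLong() bounds check is ineffective when the
// value lives inside a large shared buffer (as with an MSLAB chunk):
// the check compares against the chunk length, not the value length.
public class MslabToLongSketch {
    static final int SIZEOF_LONG = 8;

    public static long toLong(byte[] bytes, int offset, final int length) {
        if (length != SIZEOF_LONG || offset + length > bytes.length) {
            throw new IllegalArgumentException("wrong length or offset");
        }
        long l = 0;
        for (int i = offset; i < offset + length; i++) {
            l <<= 8;
            l ^= bytes[i] & 0xFF;
        }
        return l;
    }

    public static void main(String[] args) {
        // A 4-byte value copied to offset 0 of a 2 MB chunk, roughly as
        // MemStoreLAB would place it.
        byte[] chunk = new byte[2048 * 1024];
        chunk[0] = 1; chunk[1] = 2; chunk[2] = 3; chunk[3] = 4;

        // Against a standalone 4-byte array this call would throw, but
        // against the shared chunk it silently reads 4 garbage bytes.
        System.out.println(toLong(chunk, 0, SIZEOF_LONG));
    }
}
```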
[jira] [Commented] (HBASE-6454) Write PB definitions for filters
[ https://issues.apache.org/jira/browse/HBASE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429925#comment-13429925 ] Hudson commented on HBASE-6454: --- Integrated in HBase-TRUNK #3198 (See [https://builds.apache.org/job/HBase-TRUNK/3198/]) HBASE-6454 Write PB definitions for filters, addendum adds FilterProtos.java (Gregory) (Revision 1370111) Result = SUCCESS tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/FilterProtos.java Write PB definitions for filters Key: HBASE-6454 URL: https://issues.apache.org/jira/browse/HBASE-6454 Project: HBase Issue Type: Task Components: ipc, migration Reporter: Gregory Chanan Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-6454.patch See HBASE-5447. Conversion to protobuf requires writing protobuf definitions.
[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429936#comment-13429936 ] Lars Hofhansl commented on HBASE-3996: -- Somehow I missed this (probably because of HBaseCon and vacation in June). Apologies for that. Let's finish this and get it in. I'll look at the patch again tomorrow. Support multiple tables and scanners as input to the mapper in map/reduce jobs -- Key: HBASE-3996 URL: https://issues.apache.org/jira/browse/HBASE-3996 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Eran Kutner Assignee: Eran Kutner Fix For: 0.96.0 Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 3996-v6.txt, 3996-v7.txt, HBase-3996.patch It seems that in many cases feeding data from multiple tables or multiple scanners on a single table can save a lot of time when running map/reduce jobs. I propose a new MultiTableInputFormat class that would allow doing this.