[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294841#comment-13294841 ] Ted Yu commented on HBASE-5924: --- Strange that TestDrainingServer failed Hadoop QA. It passed locally. Patch integrated to trunk. Thanks for the patch, N. Thanks for the review, Stack. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v19.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294928#comment-13294928 ] Hudson commented on HBASE-5924: --- Integrated in HBase-TRUNK #3023 (See [https://builds.apache.org/job/HBase-TRUNK/3023/]) HBASE-5924 In the client code, don't wait for all the requests to be executed before resubmitting a request in error (N Keywal) Submitted by: N Keywal Reviewed by:Stack, Ted (Revision 1350105) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HConnection.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HTable.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/Triple.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v19.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294977#comment-13294977 ] Hudson commented on HBASE-5924: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #53 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/53/]) HBASE-5924 In the client code, don't wait for all the requests to be executed before resubmitting a request in error (N Keywal) Submitted by: N Keywal Reviewed by:Stack, Ted (Revision 1350105) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HConnection.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HTable.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/Triple.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v19.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295204#comment-13295204 ] stack commented on HBASE-5924: -- I took a look at committed patch. This looks great. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v19.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294587#comment-13294587 ] nkeywal commented on HBASE-5924: v19 with all the comments taken into account. I will create an another jira to rearrange the coprocessors tests on the znode. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v19.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294612#comment-13294612 ] Hadoop QA commented on HBASE-5924: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531976/5924.v19.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestDrainingServer Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2163//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2163//console This message is automatically generated. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v19.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293775#comment-13293775 ] nkeywal commented on HBASE-5924: @ted: Ok. These issues were already in my initial patch. Could you confirm that you have finished the review? I would like to deliver 'the' final patch. Thank you. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293778#comment-13293778 ] Zhihong Ted Yu commented on HBASE-5924: --- I don't have further comments. Thanks In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291624#comment-13291624 ] nkeywal commented on HBASE-5924: I've done some changes to TestRegionServerCoprocessorExceptionWithAbort. The new implementation is very different but I think closer to what the test really wants to check... In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291644#comment-13291644 ] Hadoop QA commented on HBASE-5924: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531380/5924.v14.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2129//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2129//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2129//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2129//console This message is automatically generated. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291656#comment-13291656 ] nkeywal commented on HBASE-5924: I don't reproduce locally the issue on TestServerCustomProtocol, seems ok to me. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291657#comment-13291657 ] Zhihong Ted Yu commented on HBASE-5924: --- With RSTracker gone, the following flag is no longer checked: {code} -public synchronized void nodeDeleted(String path) { - if (path.equals(rsNode)) { -regionZKNodeWasDeleted = true; {code} Can we keep the check ? {code} -assertTrue(RegionServer aborted on coprocessor exception, as expected., -rsTracker.regionZKNodeWasDeleted); {code} I think this should be kept: {code} -table.close(); {code} In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291663#comment-13291663 ] nkeywal commented on HBASE-5924: bq. With RSTracker gone, the following flag is no longer checked: Aggreed, but this unit test is supposed to test if the region server aborted when the coprocessor bugged, not if the regionserver znode is deleted on regionserver abort. I propose to check if there is an existing test on this in the tests suite and if not add it in the regionserver test package. I will comment in this jira if there is already a test, create a new one if I need to extend an existing test case. bq. table.close(); Yeah, I checked and it's still works, with or without the close. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291811#comment-13291811 ] Zhihong Ted Yu commented on HBASE-5924: --- {code} $ find hbase-server/src/test -name '*.java' -exec grep 'nized void nodeDeleted(Str' {} \; -print public synchronized void nodeDeleted(String path) { hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java public synchronized void nodeDeleted(String path) { hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java public synchronized void nodeDeleted(String path) { hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java {code} The other two checks are for master znode. w.r.t. table.close(), it is good programming practice of cleaning up resources. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291815#comment-13291815 ] nkeywal commented on HBASE-5924: bq. w.r.t. table.close(), it is good programming practice of cleaning up resources. Yes, I agree. I wanted to say in my previous answer: I tested, it works, it can be added. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291889#comment-13291889 ] Zhihong Ted Yu commented on HBASE-5924: --- {code} +if (loc == null) + throw new IOException(); {code} Without braces, the throw statement should be on the same line as if. Please include a brief message for the exception. Some long lines should be wrapped: {code} +final MapHRegionLocation, MultiActionR actionsByServer = new HashMapHRegionLocation, MultiActionR(); ... +new TripleMultiActionR, HRegionLocation, FutureMultiResponse(e.getValue(), e.getKey(), this.pool.submit(callable)); {code} In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290899#comment-13290899 ] nkeywal commented on HBASE-5924: @ted bq. 'origin' - 'original', 'what are the actions to replay' - 'what actions to replay' Done. bq. InterruptedIOException should be thrown. Done. bq. The above is hard to read. A period between 'records' and 'results' ? A period between 'list' and 'walk' ? It was already there previously :-). But I aggree, it's better with some periods or dots. Done. bq. hbase-server/src/main/java/org/apache/hadoop/hbase/util/Triple.java was not included in patch v9. Done @stack bq. Did you add the history in the first place? Why is it safe to remove it now? In the previous code we were updating the locations cache multiple times for the same row, and the second time without the RegionMovedException. So it was necessary to store that we had already taken the error into account for this row... We now update the locations cache only once, so we don't need to store the history anymore. bq. On your three comments above, on 1., on the unused code, it may not be triggered by the test suite – that could just be bad test coverage – but independent, there may have been a reason for it. If your review of processBatchCallback has it making no sense, by all means purge it (as you have done). Yep, for this one removing it allows to simplify the algorithm as I can find the original actions. bq. On 2., the callback, it looks like you kept it. I think that sensible. On 3., can we move it to HTable? Deprecate the current version in favor of the new HTable/HTableInterface version? Would that be too disruptive? We can keep the existing interface, deprecate it, and add the new one in HTable, making it call the old one. Then in the future remove if from HConnection and move the code in HTable. I've done it in v10. bq. Any way you can add tests to prove your claims of improvement above (its hard to review for that... It's hard. Testing that we restart immediately instead of waiting for all results is difficult without adding sleeps and/or mocking a lot of things, because it's not visible at all outside of the method: its interface has not changed, just the internal algorithm. Functionally, it's tested through testRegionCaching (with some extra checks in it in this patch), and it proves that: - it works on nominal case (and you can't start the mini cluster when the nominal case does not work). - it retries when one RS fails - it stops to retry when the number of retries is reached, and throws the right exception with the right content For the performance improvement on nominal case, unfortunately it does not make a big difference. It's cleaner, but the tests done show that it's not important vs. the remaining time. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291069#comment-13291069 ] stack commented on HBASE-5924: -- Upload v10 nkeywal and we'll commit it. Thanks for responses above. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291370#comment-13291370 ] nkeywal commented on HBASE-5924: v11: I changed the names to match HTableInterface#batch, so instead of HTableInterface#processBatchCallback I created HTableInterface#batchCallback. Local tests in progress. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v11.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291375#comment-13291375 ] nkeywal commented on HBASE-5924: Ok, I got a random failure in a test I touched (TestRegionServerCoprocessorExceptionWithAbort), but it's because the test is flaky I think (I can be wrong :-) ). I will have a look tomorrow to be sure, but I think the patch v11 is reasonable enough to be committed. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v11.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291434#comment-13291434 ] Hadoop QA commented on HBASE-5924: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531326/5924.v11.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort org.apache.hadoop.hbase.client.TestAdmin Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2125//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2125//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2125//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2125//console This message is automatically generated. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v11.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291476#comment-13291476 ] Zhihong Ted Yu commented on HBASE-5924: --- TestRegionServerCoprocessorExceptionWithAbort failed on QA machine. Should investigate. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v11.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290339#comment-13290339 ] nkeywal commented on HBASE-5924: v9, with: - Ted's comment taken into account - Full removal of UpdateHistory stuff - Fix for HBASE-6156 Can be committed imho. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290443#comment-13290443 ] stack commented on HBASE-5924: -- @nkeywal Did you add the history in the first place? Why is it safe to remove it now? Thanks for redoing processBatchCallback. On your three comments above, on 1., on the unused code, it may not be triggered by the test suite -- that could just be bad test coverage -- but independent, there may have been a reason for it. If your review of processBatchCallback has it making no sense, by all means purge it (as you have done). On 2., the callback, it looks like you kept it. I think that sensible. On 3., can we move it to HTable? Deprecate the current version in favor of the new HTable/HTableInterface version? Would that be too disruptive? Any way you can add tests to prove your claims of improvement above (its hard to review for that... but sounds like it at least works... going by it passing all unit tests... good on you nkeywal) In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290483#comment-13290483 ] Zhihong Ted Yu commented on HBASE-5924: --- {code} + // We need the origin multi action to find out what are the actions to replay if {code} 'origin' - 'original', 'what are the actions to replay' - 'what actions to replay' {code} +} catch (InterruptedException e) { + throw new IOException(e); {code} InterruptedIOException should be thrown. {code} + // mutate list so that it is empty for complete success, or contains + // only failed records results are returned in the same order as the + // requests in list walk the list backwards, so we can remove from list {code} The above is hard to read. A period between 'records' and 'results' ? A period between 'list' and 'walk' ? Hadoop QA didn't run tests: https://builds.apache.org/job/PreCommit-HBASE-Build/2116/console In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290509#comment-13290509 ] Zhihong Ted Yu commented on HBASE-5924: --- hbase-server/src/main/java/org/apache/hadoop/hbase/util/Triple.java was not included in patch v9. Hence: {code} [ERROR] /home/hduser/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:[2061,23] cannot find symbol [ERROR] symbol : class Triple [ERROR] location: class org.apache.hadoop.hbase.client.HConnectionManager.HConnectionImplementation.ProcessR {code} In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289591#comment-13289591 ] nkeywal commented on HBASE-5924: Bumped the priority to major to make clear it's a complete rewriting. Tests are in progress, I will push a version soon anyway to get feedback. Changes: Analyze the replies as they come, not in the initial request order Replay the failed request immediately, not when we have all the replies Reuse the actions in case of errors instead of recreating the objects Don't iterate on the results list to find the errors Don't reiterate on the results list to detail the errors. Note that I removed the 'updateHistory' list but not the code in case the feedback shows it should still be used. Even if it's a one to one implementation, it's preferable to add specific tests. Will do in a later update. And the current implementation stayed in HConnectionManager and kept its callback. Happy to change this. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289646#comment-13289646 ] Hadoop QA commented on HBASE-5924: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12530992/5924.v5.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2110//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2110//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2110//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2110//console This message is automatically generated. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289703#comment-13289703 ] Zhihong Ted Yu commented on HBASE-5924: --- Nice work. {code} private static class ProcessR { {code} Can we name the above class more meaningfully ? javadoc is desirable. {code} * @param sleepTime - sleep time befora actually executing the actions. Can be zero. {code} 'befora' - 'before' {code} for (ActionR aTodo : actionsList) { {code} aToDo - anAction ? {code} CallableMultiResponse callable = createDelayedCallable(sleepTime, e.getKey(), e.getValue()); TripleMultiActionR, HRegionLocation, FutureMultiResponse p = new TripleMultiActionR, HRegionLocation, FutureMultiResponse(e.getValue(), e.getKey(), this.pool.submit(callable)); {code} Wrap the two long lines above. {code} throw new IllegalArgumentException( argument results must be the same size as argument list); {code} It would be nice to include the sizes in exception message. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287582#comment-13287582 ] nkeywal commented on HBASE-5924: This leads to a complete rewriting of the processBatchCallback function. 3 comments: 1) I don't see how this piece of code can happen, and I ran the complete test suite without getting into this part. Do I miss anything? {noformat} for (PairInteger, Object regionResult : regionResults) { if (regionResult == null) { // if the first/only record is 'null' the entire region failed. LOG.debug(Failures for region: + Bytes.toStringBinary(regionName) + , removing from cache); } else { {noformat} 2) The callback is never used internally. Is this something we should keep for customer code? 3) Do I move it to HTable? There is a comment saying that it does not belong to Connection, and it's true. But it's public, so... In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287595#comment-13287595 ] Zhihong Yu commented on HBASE-5924: --- For #2 above, I think we can remove the callback in 0.96 In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267934#comment-13267934 ] stack commented on HBASE-5924: -- +1 In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira