[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Status: Patch Available (was: Open)

Delete the master znode after a master crash

Key: HBASE-5926
URL: https://issues.apache.org/jira/browse/HBASE-5926
Project: HBase
Issue Type: Improvement
Components: master, scripts
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Fix For: 0.96.0
Attachments: 5926.v6.patch, 5926.v8.patch, 5926.v9.patch

This is the continuation of the work done in HBASE-5844. But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while the master and the backup master share a single znode. So if we apply the same strategy as for a regionserver, we may have this scenario:
1) Master starts
2) Backup master starts
3) Master dies
4) ZK detects it
5) Backup master receives the update from ZK
6) Backup master creates the new master node and becomes the main master
7) Previous master script continues
8) Previous master script deletes the master node in ZK
9) = issue: we deleted the node just created by the new master

This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278147#comment-13278147 ]

nkeywal commented on HBASE-5926:

bq. You should look at the javadoc that is created from your src. It's going to be a jumble. Check it out. You need a little bit of html in there at least for your list of strategy dependencies.
Done.

bq. What is the filecontent? We don't need any, right? The name of the file is enough?
We need the content. For the regionserver, the content is the znode path. For the master, it's the full ServerName (stringified).

bq. This should be boolean rather than int? Or is it returned to shell? If so, should say so in the comment: + * @return if done returns 0 else -1.
Done.

bq. Is CleanZNode a good name? How about ZNodeCleaner or ZNodeClearer or CrashZNodeCleaner?
Renamed to ZNodeClearer.

bq. I think in HMasterCommandLine, should be start|stop|clear so it fits format of the other commands.
Done.

bq. In MasterAddressTracker, can you get the znode sequence id and only delete if the sequence id matches?
We store the full ServerName, so if there is a restart we will see it. But maybe you're speaking about the znode version? I looked at the ZK API, and with the version we could remove the race condition entirely...
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Attachment: 5926.v10.patch
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278259#comment-13278259 ]

nkeywal commented on HBASE-5926:

Yes, it could be the node name only. Does it make a difference? To me, both can be safely written to the fs. I check the version in v11, so there is no race condition at all now.
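The check discussed in this comment (compare the stored ServerName, then delete only if the znode version is unchanged) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the code from the patch: `ZNodeStoreSketch`, `clearIfOwner`, the `/hbase/master` path and the server names are all hypothetical, and the store is a plain map rather than the ZooKeeper client.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a znode store; hypothetical, NOT the ZooKeeper client API. */
class ZNodeStoreSketch {
    /** Immutable snapshot of a znode: its data and its data version. */
    static final class Node {
        final String data;
        final int version;
        Node(String data, int version) { this.data = data; this.version = version; }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    /** Creates the znode at version 0; returns false if it already exists. */
    synchronized boolean create(String path, String data) {
        if (nodes.containsKey(path)) return false;
        nodes.put(path, new Node(data, 0));
        return true;
    }

    /** Overwrites the data and bumps the version; returns false if the znode is absent. */
    synchronized boolean setData(String path, String data) {
        Node n = nodes.get(path);
        if (n == null) return false;
        nodes.put(path, new Node(data, n.version + 1));
        return true;
    }

    /** Returns data and version together, or null if the znode is absent. */
    synchronized Node read(String path) { return nodes.get(path); }

    /** Simulates the ephemeral znode vanishing when its owner's session expires. */
    synchronized void expire(String path) { nodes.remove(path); }

    /** Deletes only if the current version equals expectedVersion. */
    synchronized boolean deleteIfVersion(String path, int expectedVersion) {
        Node n = nodes.get(path);
        if (n == null || n.version != expectedVersion) return false;
        nodes.remove(path);
        return true;
    }

    /** Cleanup step: delete the znode only if it still names the crashed master. */
    static boolean clearIfOwner(ZNodeStoreSketch store, String path, String expectedServerName) {
        Node seen = store.read(path);
        if (seen == null || !seen.data.equals(expectedServerName)) return false;
        // The version guard rejects the delete if the znode changed after the read.
        return store.deleteIfVersion(path, seen.version);
    }

    public static void main(String[] args) {
        ZNodeStoreSketch store = new ZNodeStoreSketch();
        String path = "/hbase/master";             // assumed path, for illustration only
        store.create(path, "host1,60000,111");     // 1) master starts
        store.expire(path);                        // 3-4) master dies, ZK drops the znode
        store.create(path, "host2,60000,222");     // 6) backup master takes over
        // 7-8) the old master's script runs late and must not delete the new znode
        System.out.println(clearIfOwner(store, path, "host1,60000,111"));
        System.out.println(store.read(path).data);
    }
}
```

With an unconditional delete in place of `clearIfOwner`, step 8 of the scenario would remove the znode that the new master just created; the content comparison plus the version guard is what prevents that in this model.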
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Attachment: 5926.v11.patch
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278654#comment-13278654 ]

nkeywal commented on HBASE-5926:

v13 should do it...
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Attachment: 5926.v13.patch
[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278919#comment-13278919 ]

nkeywal commented on HBASE-5926:

I think it's ok, I don't have this locally...
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Attachment: 5926.v14.patch
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5998) Bulk assignment: regionserver optimization by using a temporary cache for table descriptors when receveing an open regions request
[ https://issues.apache.org/jira/browse/HBASE-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5998:
---
Resolution: Fixed
Status: Resolved (was: Patch Available)

Despite the failure mentioned by Jenkins, the patch is visible in the source code.

Bulk assignment: regionserver optimization by using a temporary cache for table descriptors when receveing an open regions request

Key: HBASE-5998
URL: https://issues.apache.org/jira/browse/HBASE-5998
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Fix For: 0.96.0
Attachments: 5998.v2.patch, 5998.v3.patch

During the assignment, on the regionserver, we load the table descriptor before creating the handlers. Even though there is a cache, we check the timestamps for each region, which is not necessary. The test below uses just one node; with more nodes the benefit should be larger. By limiting the time spent in HRegion#openRegion we increase the parallelization during cluster startup, as the master uses a pool of threads to call the RS.

-- Without the fix
2012-05-14 11:40:52,501 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 1193 region(s) to localhost,11003,1336988444043
2012-05-14 11:41:09,947 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done for localhost,11003,1336988444043

-- With the fix
2012-05-14 11:34:40,444 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 1193 region(s) to localhost,11003,1336988444043
2012-05-14 11:34:40,929 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done for localhost,11003,1336988065948
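The idea of a temporary, per-request cache for table descriptors can be sketched as follows. This is an illustration only, assuming that one open-regions request carries many regions from a few tables; `OpenRegionsCacheSketch`, `openRegions` and the `loadDescriptor` function are hypothetical names, not the HBase regionserver API. By the timestamps in the logs above, the bulk assignment of 1193 regions went from roughly 17.4 s to roughly 0.5 s.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a cache that lives only for one open-regions request. */
class OpenRegionsCacheSketch {
    /**
     * tableOfRegion holds, for each region in the request, the name of its table.
     * loadDescriptor stands in for the costly lookup (timestamp check, file read).
     * Returns the descriptor used for each region, in request order.
     */
    static List<String> openRegions(List<String> tableOfRegion,
                                    Function<String, String> loadDescriptor) {
        // The cache is created per request and discarded when the method returns,
        // so descriptors are never reused across requests.
        Map<String, String> requestCache = new HashMap<>();
        List<String> used = new ArrayList<>();
        for (String table : tableOfRegion) {
            // The costly lookup runs once per table, not once per region.
            used.add(requestCache.computeIfAbsent(table, loadDescriptor));
        }
        return used;
    }

    public static void main(String[] args) {
        List<String> request = new ArrayList<>();
        for (int i = 0; i < 1193; i++) {
            request.add("table" + (i % 3)); // assumed: three tables in the request
        }
        int[] lookups = {0};
        List<String> used = openRegions(request, t -> {
            lookups[0]++;
            return "descriptor-of-" + t;
        });
        System.out.println(used.size() + " regions, " + lookups[0] + " descriptor lookups");
    }
}
```

Because the cache does not outlive the request, a descriptor that changes between two requests is still reloaded by the next one.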
[jira] [Created] (HBASE-6057) Change some tests categories to optimize build time
nkeywal created HBASE-6057:
---
Summary: Change some tests categories to optimize build time
Key: HBASE-6057
URL: https://issues.apache.org/jira/browse/HBASE-6057
Project: HBase
Issue Type: Improvement
Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor

Some tests categorized as small take more than 15s: it's better if they are executed in parallel with the medium tests. Some medium tests last less than 2s: it's better to have them executed with the small tests, as we save a fork.
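The recategorization rule described in this issue can be written down as a tiny function. This is a hedged sketch: the 15s and 2s thresholds come from the issue text, while `TestCategorySketch`, `Category` and `suggest` are hypothetical names and not the category classes used by the HBase build.

```java
/** Hypothetical sketch of the recategorization rule; not the HBase test categories. */
class TestCategorySketch {
    enum Category { SMALL, MEDIUM }

    /** Suggests a category from the current one and the measured duration in seconds. */
    static Category suggest(Category current, double seconds) {
        if (current == Category.SMALL && seconds > 15.0) {
            return Category.MEDIUM; // too slow for the small run
        }
        if (current == Category.MEDIUM && seconds < 2.0) {
            return Category.SMALL;  // quick enough to skip its own fork
        }
        return current;             // otherwise leave the test where it is
    }

    public static void main(String[] args) {
        System.out.println(suggest(Category.SMALL, 20.0));
        System.out.println(suggest(Category.MEDIUM, 1.5));
        System.out.println(suggest(Category.MEDIUM, 8.0));
    }
}
```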
[jira] [Updated] (HBASE-6057) Change some tests categories to optimize build time
[ https://issues.apache.org/jira/browse/HBASE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6057: --- Status: Patch Available (was: Open) This should buy around 5%-10%... Change some tests categories to optimize build time --- Key: HBASE-6057 URL: https://issues.apache.org/jira/browse/HBASE-6057 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 6057.v1.patch Some tests categorized as small take more than 15s: it's better if they are executed in parallel with the medium tests. Some medium tests last less than 2s: it's better to have them executed with the small tests, as this saves a fork.
[jira] [Updated] (HBASE-6057) Change some tests categories to optimize build time
[ https://issues.apache.org/jira/browse/HBASE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6057: --- Attachment: 6057.v1.patch Change some tests categories to optimize build time --- Key: HBASE-6057 URL: https://issues.apache.org/jira/browse/HBASE-6057 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 6057.v1.patch Some tests categorized as small take more than 15s: it's better if they are executed in parallel with the medium tests. Some medium tests last less than 2s: it's better to have them executed with the small tests, as this saves a fork.
[jira] [Created] (HBASE-6058) Use ZK 3.4 API 'multi' in bulk assignment
nkeywal created HBASE-6058: -- Summary: Use ZK 3.4 API 'multi' in bulk assignment Key: HBASE-6058 URL: https://issues.apache.org/jira/browse/HBASE-6058 Project: HBase Issue Type: Improvement Components: master, zookeeper Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor We use the async API today. This is already much faster than the sync API. Still, it makes sense to use the 'multi' function: this will decrease the network and ZooKeeper load at startup/rolling restart. On a 500-node cluster, we see that 3 seconds are spent on updating ZK per bulk assignment. This should cut it in half (plus the benefits on the network/ZK load).
[jira] [Commented] (HBASE-6057) Change some tests categories to optimize build time
[ https://issues.apache.org/jira/browse/HBASE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279504#comment-13279504 ] nkeywal commented on HBASE-6057: None of these tests are impacted by the change. The patch is OK, IMHO. Change some tests categories to optimize build time --- Key: HBASE-6057 URL: https://issues.apache.org/jira/browse/HBASE-6057 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 6057.v1.patch Some tests categorized as small take more than 15s: it's better if they are executed in parallel with the medium tests. Some medium tests last less than 2s: it's better to have them executed with the small tests, as this saves a fork.
[jira] [Commented] (HBASE-5970) Improve the AssignmentManager#updateTimer and speed up handling opened event
[ https://issues.apache.org/jira/browse/HBASE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280367#comment-13280367 ] nkeywal commented on HBASE-5970: Hi, could you share the logs of the tests? I would be interested in having a look at them. The javadoc for updateTimers says it's not used for bulk assignment; is there a mix of 'bulk assigned' regions and other regions? I also see in the description that the time was measured once with 'retainAssignment=true' and once without. Are the results comparable in both cases? Thank you!
Improve the AssignmentManager#updateTimer and speed up handling opened event Key: HBASE-5970 URL: https://issues.apache.org/jira/browse/HBASE-5970 Project: HBase Issue Type: Improvement Components: master Reporter: chunhui shen Assignee: chunhui shen Attachments: 5970v3.patch, HBASE-5970.patch, HBASE-5970v2.patch, HBASE-5970v3.patch
We found handling the opened event very slow in an environment with lots of regions. The problem is the slow AssignmentManager#updateTimer. We ran a test bulk assigning 10w (i.e. 100k) regions; the whole bulk assignment took 1 hour.
2012-05-06 20:31:49,201 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 region(s) round-robin across 5 server(s)
2012-05-06 21:26:32,103 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
I think we could improve AssignmentManager#updateTimer by making a thread do this work. After the improvement, it took only 4.5 minutes.
2012-05-07 11:03:36,581 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 region(s) across 5 server(s), retainAssignment=true
2012-05-07 11:07:57,073 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
[jira] [Commented] (HBASE-1749) If RS looses lease, we used to restart by default; reinstitute
[ https://issues.apache.org/jira/browse/HBASE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280387#comment-13280387 ] nkeywal commented on HBASE-1749: Yes, because of HBASE-5844 and HBASE-5939, we now: - delete the znode immediately when we exit - restart after an unplanned stop. This is safer than trying to reinstitute a region server in the same JVM, as it removes any memory or static variable effect. In both cases, however, we trigger a reassignment of the regions. If RS looses lease, we used to restart by default; reinstitute -- Key: HBASE-1749 URL: https://issues.apache.org/jira/browse/HBASE-1749 Project: HBase Issue Type: Bug Reporter: stack Assignee: nkeywal
[jira] [Work started] (HBASE-1749) If RS looses lease, we used to restart by default; reinstitute
[ https://issues.apache.org/jira/browse/HBASE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-1749 started by nkeywal. If RS looses lease, we used to restart by default; reinstitute -- Key: HBASE-1749 URL: https://issues.apache.org/jira/browse/HBASE-1749 Project: HBase Issue Type: Bug Reporter: stack Assignee: nkeywal -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-1749) If RS looses lease, we used to restart by default; reinstitute
[ https://issues.apache.org/jira/browse/HBASE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal resolved HBASE-1749. Resolution: Duplicate If RS looses lease, we used to restart by default; reinstitute -- Key: HBASE-1749 URL: https://issues.apache.org/jira/browse/HBASE-1749 Project: HBase Issue Type: Bug Reporter: stack Assignee: nkeywal -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5573) Replace client ZooKeeper watchers by simple ZooKeeper reads
[ https://issues.apache.org/jira/browse/HBASE-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5573: --- Resolution: Fixed Fix Version/s: 0.96.0 Status: Resolved (was: Patch Available) Replace client ZooKeeper watchers by simple ZooKeeper reads --- Key: HBASE-5573 URL: https://issues.apache.org/jira/browse/HBASE-5573 Project: HBase Issue Type: Improvement Components: client, zookeeper Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5573.v1.patch, 5573.v2.patch, 5573.v4.patch, 5573.v6.patch, 5573.v7.patch, 5573.v8.patch Some code in the package needs to read data in ZK. This could be done by a simple read, but is actually implemented with a watcher. This holds ZK resources. Fixing this could also be an opportunity to remove the need for the client to provide the master address and port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6109) Improve RIT performances during assignment on large clusters
nkeywal created HBASE-6109: -- Summary: Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor The main points in this patch are: - lowering the number of copies of the RIT list - lowering the number of synchronizations - synchronizing on a region rather than on everything. It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify' - some test flakiness corrections, actually unrelated to this patch.
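To make "synchronizing on a region rather than on everything" concrete, here is a minimal sketch of a per-key lock registry. It is an illustration under assumptions, not the class from the patch: the name PerKeyLockSketch and all its methods are hypothetical, and as a simplification the sketch never removes a lock once it has been created for a key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical per-key lock registry; a sketch, not the class from the patch. */
final class PerKeyLockSketch {
  // One lock per key (for example a region name). Simplification: never cleaned up.
  private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

  /** Returns the lock associated with the key, creating it on first use. */
  ReentrantLock lockFor(String key) {
    return locks.computeIfAbsent(key, k -> new ReentrantLock());
  }

  /** Runs the action while holding only the lock of this key. */
  void withLock(String key, Runnable action) {
    ReentrantLock lock = lockFor(key);
    lock.lock();
    try {
      action.run();
    } finally {
      lock.unlock();
    }
  }

  /** Checks, from a separate thread, whether the key's lock is currently free. */
  static boolean canLockFromOtherThread(PerKeyLockSketch locker, String key) {
    AtomicBoolean acquired = new AtomicBoolean(false);
    Thread prober = new Thread(() -> {
      ReentrantLock lock = locker.lockFor(key);
      if (lock.tryLock()) {
        acquired.set(true);
        lock.unlock();
      }
    });
    prober.start();
    try {
      prober.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return acquired.get();
  }

  public static void main(String[] args) {
    PerKeyLockSketch locker = new PerKeyLockSketch();
    locker.withLock("region-a", () -> {
      System.out.println("region-a free? " + canLockFromOtherThread(locker, "region-a"));
      System.out.println("region-b free? " + canLockFromOtherThread(locker, "region-b"));
    });
  }
}
```

While one thread works on region-a, another thread is blocked only if it also wants region-a; work on region-b proceeds. With a single global lock, both would have to wait.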
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Attachment: 6109.v7.patch Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283858#comment-13283858 ] nkeywal commented on HBASE-6109: Here it is. I haven't merged it with trunk, as I don't know yet the impact of the modules and I expect many commits the next few days :-). Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284843#comment-13284843 ] nkeywal commented on HBASE-6109:
@stack
bq. Is this a generic locker? Should it be named for what its locking?
Renamed to LockerByString. If you have a better name...
bq. NotifiableConcurrentSkipListMap needs class comment. It seems like its for use in a very particular circumstance. It needs explaining.
Done.
bq. Does it need to be public? Only used in master package? Perhaps make it package private then?
The issue was:
{noformat}
public NotifiableConcurrentSkipListMap<String, RegionState> getRegionsInTransition() {
  return regionsInTransition;
}
{noformat}
But it's used in tests only, so I can actually make both package private. Done.
bq. internalList is a bad name for the internal delegate instance.
Is 'delegatee' a better name than internalList? Done.
bq. We checked rit contains a name but then in a separate statement we do the waitForListUpdate? What if the region we are looking for is removed between the check and the waitForListUpdate invocation?
Actually yes, it could happen. I added a timeout, so we will now check every 100ms.
bq. Will this log be annoying?
Removed. I added them while debugging. This one was already there, however, so I kept it:
{noformat}
public void removeClosedRegion(HRegionInfo hri) {
  if (regionsToReopen.remove(hri.getEncodedName()) != null) {
    LOG.debug("Removed region from reopening regions because it was closed");
  }
}
{noformat}
bq. Is this true / How is it enforced?
Oops, it's not enforced (I don't know how I could do it), but it's also not true: the update will set it as well. But it's not an issue, as it's an atomic long. Comment updated. By the way, it's tempting to: - change the implementation of updateTimestampToNow to use a lazySet - get the timestamp only once before looping on the region set. I didn't do it in my patch, but I think it should be done.
bq. needs space after curly parens. Sometimes you do it and sometimes you don't.
Done.
@ted
bq. It would be nice to have a test for NotifiableConcurrentSkipListMap.
Will do for the final release.
bq. Since internalList is actually a Map, name the above method waitForUpdate() ?
Done.
bq. the above should read 'A utility class to manage a set of locks. Each lock is identified by a String which serves'
Done.
bq. It should be Locker.class
Done.
bq. The constant should be named NB_CONCURRENT_LOCKS.
Done.
bq. The last word should be locked.
Done.
bq. It would be nice to add more about reason.
Done.
bq. Looking at batchRemove() of http://www.docjar.com/html/api/java/util/ArrayList.java.html around line 669, I don't see synchronization. Meaning, existence check of elements from nodes in regionsInTransition.keySet() may not be deterministic.
After looking at the Java API code, I don't think there is an issue here. The set we're using is documented as: "The view's iterator is a weakly consistent iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction." So we won't have any Java error. Then, if an element is added to or removed from the RIT while we're doing the removeAll, the change may or may not be taken into account, but we're not less deterministic than we would be by adding a lock around the removeAll: the add/remove could just as well be done right before or after we take the lock, and we would not know it.
I'm currently checking how it works with split, then I will update it to the current trunk.
Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
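The check-then-wait race discussed in the comment above (the region is removed between the containsKey check and the wait) and the timeout that bounds it can be sketched as follows. This is a hypothetical illustration, not the NotifiableConcurrentSkipListMap from the patch: the class name, the method waitUntilRemoved, and the demo values are assumptions; only the names delegatee, waitForUpdate and the 100 ms recheck come from the discussion.

```java
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch of a map that notifies waiters on update; not the patch's class. */
final class NotifiableMapSketch<K extends Comparable<K>, V> {
  private final ConcurrentSkipListMap<K, V> delegatee = new ConcurrentSkipListMap<>();

  synchronized V put(K key, V value) {
    V previous = delegatee.put(key, value);
    notifyAll(); // every modification wakes up the waiters
    return previous;
  }

  synchronized V remove(K key) {
    V previous = delegatee.remove(key);
    notifyAll();
    return previous;
  }

  boolean containsKey(K key) {
    return delegatee.containsKey(key);
  }

  /** Waits for a notification or for the timeout; timeoutMs must be greater than 0. */
  synchronized void waitForUpdate(long timeoutMs) throws InterruptedException {
    wait(timeoutMs);
  }

  /**
   * The check and the wait are separate steps, so a removal can slip in between
   * them. The bounded wait makes us recheck at least every pollMs milliseconds.
   */
  boolean waitUntilRemoved(K key, long pollMs, long maxWaitMs) throws InterruptedException {
    long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
    while (containsKey(key)) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // gave up: the key is still present
      }
      waitForUpdate(pollMs);
    }
    return true;
  }

  /** Small demonstration: one key is removed by another thread, one never is. */
  static boolean demo() {
    NotifiableMapSketch<String, String> rit = new NotifiableMapSketch<>();
    rit.put("region-1", "OPENING");
    rit.put("region-2", "OPENING");
    Thread remover = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      rit.remove("region-1");
    });
    remover.start();
    try {
      boolean removed = rit.waitUntilRemoved("region-1", 100, 5000);
      boolean stillPresent = !rit.waitUntilRemoved("region-2", 10, 50);
      remover.join();
      return removed && stillPresent;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("demo ok: " + demo());
  }
}
```

Without the timeout, a notification sent between the check and the wait would be lost and the waiter could block indefinitely; with it, the worst case is one extra polling interval.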
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Attachment: 6109.v19.patch Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 6109.v19.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Status: Patch Available (was: Open) Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 6109.v19.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285671#comment-13285671 ] nkeywal commented on HBASE-6109: I think it's ok for a commit. From the code I read, we should have the same behavior as before on split. I will write some parallel tests later on, but I would expect the same behavior as today at least. It may take time as I may encounter some flakiness on this path ;-). I don't have a test class for NotifiableConcurrentSkipListMap, this class is small so I don't think it's an issue right now. I will push one with the other tests I will write. Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 6109.v19.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Status: Open (was: Patch Available) Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285828#comment-13285828 ] nkeywal commented on HBASE-6109:
bq. Rename TestLocker class to TestKeyLocker ?
Done.
bq. 'shares with' -> 'shared with'
Done.
bq. Indentation in AssignmentManager.addToRITandCallClose() was off. It would be nice to correct the existing lines.
Done.
bq. 'share synchronized' -> 'synchronized'. Remove the 'todo nli:' at the end.
Done.
bq. Insert spaces around = sign.
Done.
bq. 'are in' -> 'is in'
Done.
bq. Why not call the method clone() ?
We don't really want the NotifiableConcurrentSkipListMap to be cloneable. However, some functions want to work on a copy of the data structure, for reporting or tests (with all the 'Map' semantics), hence the internal clone.
bq. Suppose delegatee is empty upon entry to the above method, what if an entry is added after the isEmpty() check ?
It will be equivalent to adding it just after the clear.
bq. 'A number' -> 'The number'
Done.
bq. 'number of people' -> 'number of users'
Done.
bq. 'it's equals to zero.' -> 'it's equal to zero.'
Done.
bq. The outer class is generic. The inner class shouldn't mention Region.
Done.
Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch.
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Attachment: 6109.v21.patch Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285909#comment-13285909 ] nkeywal commented on HBASE-6109: I need to have a look at this one. org.apache.hadoop.hbase.master.TestAssignmentManager.testRegionInOpeningStateOnDeadRSWhileMasterFailover Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286401#comment-13286401 ] nkeywal commented on HBASE-6109: testRegionInOpeningStateOnDeadRSWhileMasterFailover fails at this line:
{noformat}
public void testRegionInOpeningStateOnDeadRSWhileMasterFailover() throws IOException, KeeperException, ServiceException, InterruptedException {
  AssignmentManagerWithExtrasForTesting am = setUpMockedAssignmentManager(this.server, this.serverManager);
  ZKAssign.createNodeOffline(this.watcher, REGIONINFO, SERVERNAME_A);
== FAILED HERE: KeeperErrorCode = NodeExists for /hbase/unassigned/5c7fe078551611acb0923a9ca0e1e1f4
{noformat}
So it's more of a test error. This node should have been deleted in the after() clause of the previous test; for whatever reason it was not, or it was recreated after the delete. Investigating...
Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch.
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Attachment: 6109.v23.patch Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Status: Patch Available (was: Open) Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch.
[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286467#comment-13286467 ] nkeywal commented on HBASE-6109: @ram You're right, I forgot to remove my flakiness detector before doing the patch. Ok, I'm good for a v24 then. I will do it after looking at the test results for the v23... Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch.
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Status: Open (was: Patch Available) Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch.
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Status: Patch Available (was: Open) Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch.
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Attachment: 6109.v24.patch Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch.
[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286505#comment-13286505 ] nkeywal commented on HBASE-6109: Locally everything is ok and these tests are known as flaky, so I think it's ok. v24 is the version with the comments in TestAssignement. Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch.
[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286544#comment-13286544 ] nkeywal commented on HBASE-6109: ok for commit imho. Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch.
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287278#comment-13287278 ] nkeywal commented on HBASE-6122: @ram
bq. I found some changes in the trunk code. So not sure if it is applicable in trunk. Attached patches for 0.94 and 0.92.
Do you mean that the problem is not reproducible on trunk?
Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch
- Active master gets ZK expiry exception.
- Backup master becomes active.
- The previous active master retries and becomes the back up master.
Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step.
{code}
if (abortNow(msg, t)) {
  if (t != null) LOG.fatal(msg, t);
  else LOG.fatal(msg);
  this.abort = true;
  stop("Aborting");
}
{code}
In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active.
{code}
synchronized (this.clusterHasActiveMaster) {
  while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
    try {
      this.clusterHasActiveMaster.wait();
    } catch (InterruptedException e) {
      // We expect to be interrupted when a master dies, will fall out if so
      LOG.debug("Interrupted waiting for master to die", e);
    }
  }
  if (!clusterStatusTracker.isClusterUp()) {
    this.master.stop("Cluster went down before this master became active");
  }
  if (this.master.isStopped()) {
    return cleanSetOfActiveMaster;
  }
  // Try to become active master again now that there is no active master
  blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
}
return cleanSetOfActiveMaster;
{code}
When the back up master (which is in back up mode because it got the ZK exception) once again tries to become active, we don't use the return value that comes out of
{code}
// Try to become active master again now that there is no active master
blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
{code}
We tend to return the 'cleanSetOfActiveMaster' which was previously false. Because of this, instead of becoming active again the back up master goes down in the abort() code.
Thanks to Gopi, my colleague, for reporting this issue.
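The effect described above (a recursive retry whose result is dropped, so the caller sees a stale 'false') can be reproduced in isolation. This is a minimal sketch and not the HBase code: the class ElectionSketch, its counter and both method names are hypothetical, and the ZooKeeper interaction is replaced by a counter that makes the first attempts fail.

```java
// Hypothetical, simplified model of the pattern described above; not the HBase code.
final class ElectionSketch {
    // Number of attempts that will still fail before this candidate wins.
    private int failuresLeft;

    ElectionSketch(int failuresBeforeSuccess) {
        this.failuresLeft = failuresBeforeSuccess;
    }

    // Stand-in for "try to create the master znode": fails while failuresLeft > 0.
    private boolean tryToBecomeActive() {
        if (failuresLeft > 0) {
            failuresLeft--;
            return false;
        }
        return true;
    }

    // Buggy shape: the result of the recursive retry is dropped, so the caller
    // gets the stale 'false' computed on the first attempt.
    boolean blockUntilActiveIgnoringRetryResult() {
        boolean cleanSetOfActiveMaster = tryToBecomeActive();
        if (cleanSetOfActiveMaster) {
            return true;
        }
        blockUntilActiveIgnoringRetryResult(); // return value ignored
        return cleanSetOfActiveMaster;         // still false
    }

    // Fixed shape: the result of the retry is propagated to the caller.
    boolean blockUntilActivePropagatingRetryResult() {
        boolean cleanSetOfActiveMaster = tryToBecomeActive();
        if (cleanSetOfActiveMaster) {
            return true;
        }
        return blockUntilActivePropagatingRetryResult();
    }

    public static void main(String[] args) {
        System.out.println(new ElectionSketch(1).blockUntilActiveIgnoringRetryResult());    // false
        System.out.println(new ElectionSketch(1).blockUntilActivePropagatingRetryResult()); // true
    }
}
```

In both runs the second attempt succeeds; only the second variant reports it, which is the difference between aborting and becoming active in the scenario above.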
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287306#comment-13287306 ] nkeywal commented on HBASE-6122: Thanks, I will give it a try to be sure.
Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch
- Active master gets ZK expiry exception.
- Backup master becomes active.
- The previous active master retries and becomes the back up master.
Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step.
{code}
if (abortNow(msg, t)) {
  if (t != null) LOG.fatal(msg, t);
  else LOG.fatal(msg);
  this.abort = true;
  stop("Aborting");
}
{code}
In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active.
{code}
synchronized (this.clusterHasActiveMaster) {
  while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
    try {
      this.clusterHasActiveMaster.wait();
    } catch (InterruptedException e) {
      // We expect to be interrupted when a master dies, will fall out if so
      LOG.debug("Interrupted waiting for master to die", e);
    }
  }
  if (!clusterStatusTracker.isClusterUp()) {
    this.master.stop("Cluster went down before this master became active");
  }
  if (this.master.isStopped()) {
    return cleanSetOfActiveMaster;
  }
  // Try to become active master again now that there is no active master
  blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
}
return cleanSetOfActiveMaster;
{code}
When the back up master (which is in back up mode because it got the ZK exception) once again tries to become active, we don't use the return value that comes out of
{code}
// Try to become active master again now that there is no active master
blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
{code}
We tend to return the 'cleanSetOfActiveMaster' which was previously false. Because of this, instead of becoming active again the back up master goes down in the abort() code.
Thanks to Gopi, my colleague, for reporting this issue.
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287582#comment-13287582 ] nkeywal commented on HBASE-5924: This leads to a complete rewrite of the processBatchCallback function. 3 comments:
1) I don't see how this piece of code can be reached, and I ran the complete test suite without getting into this part. Am I missing anything?
{noformat}
for (Pair<Integer, Object> regionResult : regionResults) {
  if (regionResult == null) {
    // if the first/only record is 'null' the entire region failed.
    LOG.debug("Failures for region: " +
        Bytes.toStringBinary(regionName) +
        ", removing from cache");
  } else {
{noformat}
2) The callback is never used internally. Is this something we should keep for customer code?
3) Do I move it to HTable? There is a comment saying that it does not belong to Connection, and it's true. But it's public, so...
In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different.
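The arithmetic in the description can be checked with a small sketch. The class and method names are hypothetical, and it assumes (as "immediate error" suggests) that the failure is seen at time 0 while the other initial requests keep running for the full 5 seconds.

```java
// Worked arithmetic for the scenario above; all durations are in seconds.
final class RetryTimingSketch {
    // Two-step approach: wait for every initial request, then pause, then retry.
    static int twoStepTotal(int requestTime, int waitTime) {
        return requestTime + waitTime + requestTime;
    }

    // Immediate analysis: the failed request is retried as soon as its error is
    // seen (at failureSeenAt), while the other initial requests keep running.
    static int immediateRetryTotal(int requestTime, int waitTime, int failureSeenAt) {
        int retryDone = failureSeenAt + waitTime + requestTime;
        return Math.max(requestTime, retryDone);
    }

    public static void main(String[] args) {
        System.out.println(twoStepTotal(5, 1));           // 11
        System.out.println(immediateRetryTotal(5, 1, 0)); // 6
    }
}
```

Going from 11 s to 6 s is a reduction of 5/11, about 45%, which matches the "nearly 50%" in the description.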
[jira] [Created] (HBASE-6156) Improve multiop performances in HTable#flushCommits
nkeywal created HBASE-6156: -- Summary: Improve multiop performances in HTable#flushCommits Key: HBASE-6156 URL: https://issues.apache.org/jira/browse/HBASE-6156 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0
This code:
{noformat}
@Override
public void flushCommits() throws IOException {
  try {
    Object[] results = new Object[writeBuffer.size()];
    try {
      this.connection.processBatch(writeBuffer, tableName, pool, results);
    } catch (InterruptedException e) {
      throw new IOException(e);
    } finally {
      // mutate list so that it is empty for complete success, or contains
      // only failed records results are returned in the same order as the
      // requests in list walk the list backwards, so we can remove from list
      // without impacting the indexes of earlier members
      for (int i = results.length - 1; i >= 0; i--) {
        if (results[i] instanceof Result) {
          // successful Puts are removed from the list here.
          writeBuffer.remove(i);
        }
      }
    }
  } finally {
    if (clearBufferOnFail) {
      writeBuffer.clear();
      currentWriteBufferSize = 0;
    } else {
      // the write buffer was adjusted by processBatchOfPuts
      currentWriteBufferSize = 0;
      for (Put aPut : writeBuffer) {
        currentWriteBufferSize += aPut.heapSize();
      }
    }
  }
}
{noformat}
Can be improved by:
- not iterating on the list if clearBufferOnFail is set
- not iterating on the list if there are no errors
- iterating on the list only once instead of twice when we really have to.
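The first and third improvements listed above can be sketched as follows. This is an illustration under stated assumptions, not the HTable code or the committed fix: WriteBufferSketch, BufferedPut and retainFailures are hypothetical names, a boolean array stands in for the results array, and the array is assumed to have the same length as the buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the write-buffer cleanup; not the HTable code.
final class WriteBufferSketch {
    // Minimal stand-in for a buffered Put: only the heap size matters here.
    static final class BufferedPut {
        final long heapSize;
        BufferedPut(long heapSize) { this.heapSize = heapSize; }
    }

    /**
     * Keeps only the failed entries in writeBuffer and returns their total heap size.
     * succeeded[i] tells whether writeBuffer.get(i) was written successfully;
     * succeeded.length is assumed to equal writeBuffer.size().
     */
    static long retainFailures(List<BufferedPut> writeBuffer, boolean[] succeeded,
                               boolean clearBufferOnFail) {
        if (clearBufferOnFail) {
            writeBuffer.clear(); // nothing is kept, so no per-entry work is needed
            return 0;
        }
        long remainingSize = 0;
        int kept = 0;
        // Single forward pass: compact the failures to the front and sum their sizes.
        for (int i = 0; i < succeeded.length; i++) {
            if (!succeeded[i]) {
                BufferedPut failed = writeBuffer.get(i);
                writeBuffer.set(kept++, failed);
                remainingSize += failed.heapSize;
            }
        }
        // Drop the tail in one call instead of removing entries one by one.
        writeBuffer.subList(kept, writeBuffer.size()).clear();
        return remainingSize;
    }

    public static void main(String[] args) {
        List<BufferedPut> buffer = new ArrayList<>();
        buffer.add(new BufferedPut(10));
        buffer.add(new BufferedPut(20));
        buffer.add(new BufferedPut(30));
        long size = retainFailures(buffer, new boolean[] {true, false, false}, false);
        System.out.println(buffer.size() + " entries left, " + size + " bytes"); // 2 entries left, 50 bytes
    }
}
```

The second improvement (skipping the pass entirely when nothing failed) would need the batch call to report whether any error occurred; how that is signalled is not stated in the issue, so it is left out of the sketch.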
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Priority: Major (was: Minor) In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different.
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289591#comment-13289591 ] nkeywal commented on HBASE-5924: Bumped the priority to major to make clear it's a complete rewrite. Tests are in progress; I will push a version soon anyway to get feedback.
Changes:
- Analyze the replies as they come, not in the initial request order
- Replay the failed request immediately, not when we have all the replies
- Reuse the actions in case of errors instead of recreating the objects
- Don't iterate on the results list to find the errors
- Don't reiterate on the results list to detail the errors.
Note that I removed the 'updateHistory' list but not the code, in case the feedback shows it should still be used. Even if it's a one-to-one implementation, it's preferable to add specific tests. Will do in a later update. And the current implementation stayed in HConnectionManager and kept its callback. Happy to change this.
In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different.
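The idea of analyzing the replies as they come and replaying a failed request immediately can be sketched with the JDK's ExecutorCompletionService. This is only an illustration of the technique, not the patch: EagerRetrySketch, Attempt and runAll are hypothetical names, the actions are plain Callables rather than HBase actions, and the wait time before a retry is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of "analyze the replies as they come"; not the HBase client code.
final class EagerRetrySketch {
    // One action plus the number of times it has been submitted so far.
    private static final class Attempt {
        final Callable<String> action;
        int submissions = 1;
        Attempt(Callable<String> action) { this.action = action; }
    }

    /**
     * Runs all actions in parallel. A failed action is resubmitted as soon as its
     * failure is seen, at most maxRetries times. Returns how many actions succeeded,
     * or -1 if the calling thread was interrupted.
     */
    static int runAll(List<Callable<String>> actions, int maxRetries) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            CompletionService<String> replies = new ExecutorCompletionService<>(pool);
            Map<Future<String>, Attempt> inFlight = new HashMap<>();
            for (Callable<String> action : actions) {
                inFlight.put(replies.submit(action), new Attempt(action));
            }
            int succeeded = 0;
            while (!inFlight.isEmpty()) {
                // take() hands back replies in completion order, not submission order.
                Future<String> done = replies.take();
                Attempt attempt = inFlight.remove(done);
                try {
                    done.get();
                    succeeded++;
                } catch (ExecutionException e) {
                    if (attempt.submissions <= maxRetries) {
                        // Reuse the same action object; retry without waiting for the others.
                        attempt.submissions++;
                        inFlight.put(replies.submit(attempt.action), attempt);
                    }
                }
            }
            return succeeded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return -1;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> actions = new ArrayList<>();
        actions.add(() -> "first");
        actions.add(() -> "second");
        System.out.println(runAll(actions, 1));
    }
}
```

Because each reply is handled as soon as it completes, a request that fails quickly is back in flight while slower requests are still running, which is the source of the gain described in the issue.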
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Attachment: 5924.v5.patch In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different.
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Status: Patch Available (was: Open) local tests ok! In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different.
[jira] [Created] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method
nkeywal created HBASE-6175: -- Summary: TestFSUtils flaky on hdfs getFileStatus method Key: HBASE-6175 URL: https://issues.apache.org/jira/browse/HBASE-6175 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Fix For: 0.96.0
This is a simplified version of a TestFSUtils issue: with a sleep, the test passes 100% of the time; without the sleep, it becomes flaky. Root cause unknown. While the issue shows up in the tests, the root cause could affect real production systems as well.
{noformat}
@Test
public void testFSUTils() throws Exception {
  final String hosts[] = {"host1", "host2", "host3", "host4"};
  Path testFile = new Path("/test1.txt");

  HBaseTestingUtility htu = new HBaseTestingUtility();

  try {
    htu.startMiniDFSCluster(hosts).waitActive();
    FileSystem fs = htu.getDFSCluster().getFileSystem();

    for (int i = 0; i < 100; ++i) {
      FSDataOutputStream out = fs.create(testFile);
      byte[] data = new byte[1];
      out.write(data, 0, 1);
      out.close();

      // Put a sleep here to make me work
      //Thread.sleep(2000);

      FileStatus status = fs.getFileStatus(testFile);
      HDFSBlocksDistribution blocksDistribution =
          FSUtils.computeHDFSBlocksDistribution(fs, status, 0, status.getLen());
      assertEquals("Wrong number of hosts distributing blocks. at iteration " + i, 3,
          blocksDistribution.getTopHosts().size());

      fs.delete(testFile, true);
    }
  } finally {
    htu.shutdownMiniDFSCluster();
  }
}
{noformat}
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Attachment: 5924.v9.patch In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different.
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Status: Open (was: Patch Available) In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different.
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290339#comment-13290339 ] nkeywal commented on HBASE-5924: v9, with: - Ted's comment taken into account - Full removal of UpdateHistory stuff - Fix for HBASE-6156 Can be committed imho.
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290899#comment-13290899 ]

nkeywal commented on HBASE-5924:

@ted

bq. 'origin' - 'original', 'what are the actions to replay' - 'what actions to replay'
Done.

bq. InterruptedIOException should be thrown.
Done.

bq. The above is hard to read. A period between 'records' and 'results' ? A period between 'list' and 'walk' ?
It was already there previously :-). But I agree, it's better with some periods. Done.

bq. hbase-server/src/main/java/org/apache/hadoop/hbase/util/Triple.java was not included in patch v9.
Done.

@stack

bq. Did you add the history in the first place? Why is it safe to remove it now?
In the previous code we updated the locations cache multiple times for the same row, and the second time without the RegionMovedException. So we had to record that the error had already been taken into account for this row. We now update the locations cache only once, so we no longer need to store the history.

bq. On your three comments above, on 1., on the unused code, it may not be triggered by the test suite – that could just be bad test coverage – but independent, there may have been a reason for it. If your review of processBatchCallback has it making no sense, by all means purge it (as you have done).
Yep, for this one removing it lets me simplify the algorithm, as I can find the original actions.

bq. On 2., the callback, it looks like you kept it. I think that sensible. On 3., can we move it to HTable? Deprecate the current version in favor of the new HTable/HTableInterface version? Would that be too disruptive?
We can keep the existing interface, deprecate it, and add the new one in HTable, making it call the old one. Then, in the future, we can remove it from HConnection and move the code into HTable. I've done it in v10.

bq. Any way you can add tests to prove your claims of improvement above (its hard to review for that...
It's hard. Testing that we restart immediately instead of waiting for all results is difficult without adding sleeps and/or mocking a lot of things, because the change is not visible at all from outside the method: its interface has not changed, only the internal algorithm. Functionally, it's tested through testRegionCaching (with some extra checks added in this patch), which proves that:
- it works in the nominal case (and you can't start the mini cluster when the nominal case does not work)
- it retries when one RS fails
- it stops retrying once the maximum number of retries is reached, and throws the right exception with the right content

As for the performance improvement in the nominal case, unfortunately it does not make a big difference. The code is cleaner, but the tests I ran show that the gain is small compared to the rest of the execution time.
[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method
[ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290903#comment-13290903 ]

nkeywal commented on HBASE-6175:

Yes, without much success :-). You're right, I will try the hdfs list. If that doesn't work out, I will push a first patch to make the test non-flaky but keep this jira open, as the root cause remains.

TestFSUtils flaky on hdfs getFileStatus method
--
Key: HBASE-6175
URL: https://issues.apache.org/jira/browse/HBASE-6175
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Trivial
Fix For: 0.96.0

This is a simplified version of a TestFSUtils issue: with a sleep, the test works 100% of the time. Without the sleep, it becomes flaky. Root cause unknown. While the issue appears in the tests, the root cause could be an issue on a real production system as well.

{noformat}
@Test
public void testFSUTils() throws Exception {
  final String hosts[] = {"host1", "host2", "host3", "host4"};
  Path testFile = new Path("/test1.txt");

  HBaseTestingUtility htu = new HBaseTestingUtility();
  try {
    htu.startMiniDFSCluster(hosts).waitActive();
    FileSystem fs = htu.getDFSCluster().getFileSystem();

    for (int i = 0; i < 100; ++i) {
      FSDataOutputStream out = fs.create(testFile);
      byte[] data = new byte[1];
      out.write(data, 0, 1);
      out.close();

      // Put a sleep here to make me work
      //Thread.sleep(2000);

      FileStatus status = fs.getFileStatus(testFile);
      HDFSBlocksDistribution blocksDistribution =
          FSUtils.computeHDFSBlocksDistribution(fs, status, 0, status.getLen());
      assertEquals("Wrong number of hosts distributing blocks. at iteration " + i,
          3, blocksDistribution.getTopHosts().size());

      fs.delete(testFile, true);
    }
  } finally {
    htu.shutdownMiniDFSCluster();
  }
}
{noformat}
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Attachment: 5924.v11.patch
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291370#comment-13291370 ] nkeywal commented on HBASE-5924: v11: I changed the names to match HTableInterface#batch, so instead of HTableInterface#processBatchCallback I created HTableInterface#batchCallback. Local tests in progress.
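The deprecate-and-delegate approach discussed in this thread (keep the existing connection-level method, deprecate it, and have the new table-level batchCallback call it) could look roughly like the sketch below. All types here are simplified, hypothetical stand-ins: `ConnectionSketch`, `TableSketch`, `BatchCallbackSketch` and `UpperCaseConnection` are not HBase classes, and the real methods take different parameters.

```java
import java.util.List;

/** Hypothetical callback type, loosely modelled on a per-result callback. */
interface BatchCallbackSketch {
  void update(int index, Object result);
}

/** Hypothetical stand-in for the connection-level API. */
interface ConnectionSketch {
  /** @deprecated kept for compatibility; callers should use TableSketch#batchCallback. */
  @Deprecated
  void processBatchCallback(List<String> actions, Object[] results,
      BatchCallbackSketch callback);
}

/** Hypothetical stand-in for the table-level API that gets the new method. */
class TableSketch {
  private final ConnectionSketch connection;

  TableSketch(ConnectionSketch connection) {
    this.connection = connection;
  }

  /** New entry point; for now it only delegates to the deprecated method. */
  @SuppressWarnings("deprecation")
  void batchCallback(List<String> actions, Object[] results,
      BatchCallbackSketch callback) {
    connection.processBatchCallback(actions, results, callback);
  }
}

/** Toy implementation so the delegation can be exercised without a cluster. */
class UpperCaseConnection implements ConnectionSketch {
  @Override
  @SuppressWarnings("deprecation")
  public void processBatchCallback(List<String> actions, Object[] results,
      BatchCallbackSketch callback) {
    for (int i = 0; i < actions.size(); i++) {
      results[i] = actions.get(i).toUpperCase();
      callback.update(i, results[i]);
    }
  }
}
```

Callers move to the new method right away, and the implementation can later be moved behind it without another API change.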
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291375#comment-13291375 ] nkeywal commented on HBASE-5924: Ok, I got a random failure in a test I touched (TestRegionServerCoprocessorExceptionWithAbort), but I think that's because the test is flaky (I could be wrong :-) ). I will have a look tomorrow to be sure, but I think patch v11 is reasonable enough to be committed.
[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method
[ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291376#comment-13291376 ]

nkeywal commented on HBASE-6175:

The hbase-free version of the test:

{noformat}
package org.apache.hadoop.test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

import static junit.framework.Assert.assertEquals;

public class TestHDFS {
  @Test
  public void testFSUTils() throws Exception {
    final Configuration conf = new Configuration();
    final String hosts[] = {"host1", "host2", "host3", "host4"};
    final byte[] data = new byte[1]; // Will fit in one block
    final Path testFile = new Path("/test1.txt");

    MiniDFSCluster dfsCluster = new MiniDFSCluster(
        0, conf, hosts.length, true, true, true, null, null, hosts, null);
    try {
      FileSystem fs = dfsCluster.getFileSystem();
      dfsCluster.waitClusterUp();

      for (int i = 0; i < 200; ++i) {
        FSDataOutputStream out = fs.create(testFile);
        out.write(data, 0, 1);
        out.close();

        // Put a sleep here to make me work
        //Thread.sleep(1000);

        FileStatus status = fs.getFileStatus(testFile);
        int nbHosts =
            fs.getFileBlockLocations(status, 0, status.getLen())[0].getHosts().length;
        assertEquals(1, fs.getFileBlockLocations(status, 0, status.getLen()).length);
        assertEquals("Wrong number of hosts distributing blocks at iteration " + i,
            3, nbHosts);

        fs.delete(testFile, true);
      }
    } finally {
      dfsCluster.shutdown();
    }
  }
}
{noformat}
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Attachment: 5924.v14.patch
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291624#comment-13291624 ] nkeywal commented on HBASE-5924: I've made some changes to TestRegionServerCoprocessorExceptionWithAbort. The new implementation is very different, but I think it is closer to what the test really wants to check...
[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method
[ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291625#comment-13291625 ]

nkeywal commented on HBASE-6175:

Todd said on the hdfs mailing list:

{noformat}
This is the expected behavior based on the default configuration of dfs.replication.min. When you close the file, the client waits until all of the DNs have the block fully written, but the DNs report the replica to the NN asynchronously. So with the default configuration, the client then only waits for 1 replica to be available before allowing the file to be closed. If you need to wait for more replicas, I would recommend polling after closing the file.
{noformat}

So I need to check whether it's just the test or whether HBase really needs to know the exact number of replicas.
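The "poll after closing the file" suggestion could replace the fixed sleep with a bounded polling loop along these lines. This is a sketch under assumptions: `ReplicaPollSketch` and `waitForReplicas` are hypothetical names, and the replica count is passed in as an `IntSupplier` so the helper has no Hadoop dependency. In the test it would wrap the `fs.getFileBlockLocations(status, 0, status.getLen())[0].getHosts().length` expression used earlier in this thread, with its IOException handled by the caller.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: wait until enough replicas are reported, or give up. */
class ReplicaPollSketch {

  /**
   * Polls replicaCount until it reports at least expected replicas.
   *
   * @return true if the expected count was seen before the timeout,
   *         false if the timeout elapsed first
   */
  static boolean waitForReplicas(IntSupplier replicaCount, int expected,
      long timeoutMs, long pollMs) throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (replicaCount.getAsInt() >= expected) {
        return true;
      }
      if (System.nanoTime() >= deadline) {
        return false;
      }
      Thread.sleep(pollMs); // give the datanodes time to report to the namenode
    }
  }
}
```

Unlike a fixed Thread.sleep(2000), the loop returns as soon as the replicas are visible and still fails deterministically if they never show up.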
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291656#comment-13291656 ] nkeywal commented on HBASE-5924: I don't reproduce locally the issue on TestServerCustomProtocol, seems ok to me. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
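The timing argument in the description can be checked with a small model. This is a sketch of the arithmetic only, not the HConnectionManager code: the class `RetryTiming` and its methods are hypothetical, and it assumes the failing request reports its error at time 0 while the other requests run in parallel.

```java
class RetryTiming {
  /** Current behaviour: wait for every initial request, then pause, then retry. */
  static long waitForAllThenRetry(long[] initialSecs, long pauseSecs, long retrySecs) {
    long slowest = 0;
    for (long s : initialSecs) {
      slowest = Math.max(slowest, s);
    }
    return slowest + pauseSecs + retrySecs;
  }

  /** Proposed behaviour: the retry starts as soon as the failure is known. */
  static long retryImmediately(long[] successSecs, long failureDetectedAtSecs,
      long pauseSecs, long retrySecs) {
    long slowest = 0;
    for (long s : successSecs) {
      slowest = Math.max(slowest, s);
    }
    // The retry runs in parallel with the requests that are still in flight.
    return Math.max(slowest, failureDetectedAtSecs + pauseSecs + retrySecs);
  }

  public static void main(String[] args) {
    // Scenario from the description: requests take 5s, one fails immediately,
    // the wait time is 1s and the resubmitted request takes 5s.
    System.out.println(waitForAllThenRetry(new long[] {5, 5, 0}, 1, 5)); // prints 11
    System.out.println(retryImmediately(new long[] {5, 5}, 0, 1, 5));    // prints 6
  }
}
```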
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291663#comment-13291663 ] nkeywal commented on HBASE-5924: bq. With RSTracker gone, the following flag is no longer checked: Agreed, but this unit test is supposed to test if the region server aborted when the coprocessor bugged, not if the regionserver znode is deleted on regionserver abort. I propose to check if there is an existing test on this in the test suite and, if not, to add it in the regionserver test package. I will comment in this jira if there is already a test, and create a new one if I need to extend an existing test case. bq. table.close(); Yeah, I checked and it still works, with or without the close. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method
[ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291667#comment-13291667 ] nkeywal commented on HBASE-6175: It's used mainly to estimate and in a cache to prioritize. It's not an issue if we miss one replica sometimes. So it's just a question of fixing the test itself. TestFSUtils flaky on hdfs getFileStatus method -- Key: HBASE-6175 URL: https://issues.apache.org/jira/browse/HBASE-6175 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Fix For: 0.96.0 This is a simplified version of a TestFSUtils issue: with a sleep, the test works 100% of the time. Without the sleep, it becomes flaky. Root cause unknown. While the issue appears in the tests, the root cause could be an issue on a real production system as well.
{noformat}
@Test
public void testFSUTils() throws Exception {
  final String hosts[] = {"host1", "host2", "host3", "host4"};
  Path testFile = new Path("/test1.txt");
  HBaseTestingUtility htu = new HBaseTestingUtility();
  try {
    htu.startMiniDFSCluster(hosts).waitActive();
    FileSystem fs = htu.getDFSCluster().getFileSystem();
    for (int i = 0; i < 100; ++i) {
      FSDataOutputStream out = fs.create(testFile);
      byte[] data = new byte[1];
      out.write(data, 0, 1);
      out.close();
      // Put a sleep here to make me work
      //Thread.sleep(2000);
      FileStatus status = fs.getFileStatus(testFile);
      HDFSBlocksDistribution blocksDistribution =
        FSUtils.computeHDFSBlocksDistribution(fs, status, 0, status.getLen());
      assertEquals("Wrong number of hosts distributing blocks. at iteration " + i, 3,
        blocksDistribution.getTopHosts().size());
      fs.delete(testFile, true);
    }
  } finally {
    htu.shutdownMiniDFSCluster();
  }
}
{noformat}
-- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291815#comment-13291815 ] nkeywal commented on HBASE-5924: bq. w.r.t. table.close(), it is good programming practice of cleaning up resources. Yes, I agree. I wanted to say in my previous answer: I tested, it works, it can be added. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293775#comment-13293775 ] nkeywal commented on HBASE-5924: @ted: Ok. These issues were already in my initial patch. Could you confirm that you have finished the review? I would like to deliver 'the' final patch. Thank you. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Status: Open (was: Patch Available) In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Attachment: 5924.v19.patch In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v19.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294587#comment-13294587 ] nkeywal commented on HBASE-5924: v19 with all the comments taken into account. I will create another jira to rearrange the coprocessor tests on the znode. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v19.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Status: Patch Available (was: Open) In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v19.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters
[ https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6109: --- Resolution: Fixed Status: Resolved (was: Patch Available) Improve RIT performances during assignment on large clusters Key: HBASE-6109 URL: https://issues.apache.org/jira/browse/HBASE-6109 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 6109.v24.patch, 6109.v7.patch The main points in this patch are: - lowering the number of copy of the RIT list - lowering the number of synchronization - synchronizing on a region rather than on everything It also contains: - some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'. - some tests flakiness correction, actually unrelated to this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
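Two of the points listed above, synchronizing on a region rather than on everything and pairing every modification with a 'notify', can be illustrated with a small sketch. This is not the code of the 6109 patches: the class `RegionLocks`, its fields and its method names are hypothetical, and region states are reduced to plain strings.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class RegionLocks {
  // One lock object per region name, created on first use.
  private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();
  // Simplified "regions in transition" map: region name -> state.
  private final ConcurrentMap<String, String> regionsInTransition = new ConcurrentHashMap<>();

  Object lockFor(String regionName) {
    return locks.computeIfAbsent(regionName, k -> new Object());
  }

  /** Updates one region while holding only that region's lock. */
  void updateState(String regionName, String state) {
    Object lock = lockFor(regionName);
    synchronized (lock) {
      regionsInTransition.put(regionName, state);
      // Every modification is paired with a notification for waiters on this region.
      lock.notifyAll();
    }
  }

  String stateOf(String regionName) {
    return regionsInTransition.get(regionName);
  }

  public static void main(String[] args) {
    RegionLocks rit = new RegionLocks();
    rit.updateState("region-a", "OPENING");
    System.out.println(rit.stateOf("region-a"));
  }
}
```

Threads working on different regions take different locks in this sketch, so they do not block each other.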
[jira] [Updated] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5924: --- Resolution: Fixed Status: Resolved (was: Patch Available) In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v19.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before submitting again the failed request. If we have a scenario with all the requests taking 5 seconds we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing immediately the results. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution time is different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6156) Improve multiop performances in HTable#flushCommits
[ https://issues.apache.org/jira/browse/HBASE-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal resolved HBASE-6156. Resolution: Fixed Fixed in HBASE-5924 Improve multiop performances in HTable#flushCommits --- Key: HBASE-6156 URL: https://issues.apache.org/jira/browse/HBASE-6156 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 This code:
{noformat}
@Override
public void flushCommits() throws IOException {
  try {
    Object[] results = new Object[writeBuffer.size()];
    try {
      this.connection.processBatch(writeBuffer, tableName, pool, results);
    } catch (InterruptedException e) {
      throw new IOException(e);
    } finally {
      // mutate list so that it is empty for complete success, or contains
      // only failed records
      // results are returned in the same order as the requests in list
      // walk the list backwards, so we can remove from list
      // without impacting the indexes of earlier members
      for (int i = results.length - 1; i >= 0; i--) {
        if (results[i] instanceof Result) {
          // successful Puts are removed from the list here.
          writeBuffer.remove(i);
        }
      }
    }
  } finally {
    if (clearBufferOnFail) {
      writeBuffer.clear();
      currentWriteBufferSize = 0;
    } else {
      // the write buffer was adjusted by processBatchOfPuts
      currentWriteBufferSize = 0;
      for (Put aPut : writeBuffer) {
        currentWriteBufferSize += aPut.heapSize();
      }
    }
  }
}
{noformat}
Can be improved by:
- not iterating on the list if clearBufferOnFail is set
- not iterating on the list if there are no errors
- iterating on the list only once instead of twice when we really have to.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
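The three improvements listed in the description can be sketched as a single pruning step. This is an illustration under assumptions, not the change delivered in HBASE-5924: `WriteBufferSketch`, `Op`, `SUCCESS` and the `batchFailed` flag are hypothetical stand-ins for the write buffer of `Put`s, the `Result` check and the error signal of the real client.

```java
import java.util.ArrayList;
import java.util.List;

class WriteBufferSketch {
  /** Stand-in for a buffered Put; only its heap size matters here. */
  static final class Op {
    final long heapSize;
    Op(long heapSize) { this.heapSize = heapSize; }
  }

  /** Marker standing in for a successful Result. */
  static final Object SUCCESS = new Object();

  /**
   * Removes the successful operations from the buffer and returns the heap
   * size of what is left, walking the list at most once.
   */
  static long pruneAfterBatch(List<Op> writeBuffer, Object[] results,
      boolean batchFailed, boolean clearBufferOnFail) {
    if (clearBufferOnFail || !batchFailed) {
      // Nothing is kept: no need to look at the individual results.
      writeBuffer.clear();
      return 0;
    }
    long remaining = 0;
    // Walk backwards so that a removal does not shift indexes not yet visited.
    for (int i = results.length - 1; i >= 0; i--) {
      if (results[i] == SUCCESS) {
        writeBuffer.remove(i);
      } else {
        remaining += writeBuffer.get(i).heapSize;
      }
    }
    return remaining;
  }

  public static void main(String[] args) {
    List<Op> buffer = new ArrayList<>();
    buffer.add(new Op(10));
    buffer.add(new Op(20));
    buffer.add(new Op(30));
    Object[] results = {SUCCESS, null, SUCCESS};
    long size = pruneAfterBatch(buffer, results, true, false);
    System.out.println(buffer.size() + " op left, heap size " + size); // prints "1 op left, heap size 20"
  }
}
```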
[jira] [Commented] (HBASE-6156) Improve multiop performances in HTable#flushCommits
[ https://issues.apache.org/jira/browse/HBASE-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397556#comment-13397556 ] nkeywal commented on HBASE-6156: Small status as of June. * Improvements identified Failure detection time: performed by ZK, with a timeout. With the default value, we needed 90 seconds before starting to act on a software or hardware issue. Recovery time - server side: split in two parts: reassigning the regions of a dead RS to a new RS, replaying the WAL. Must be as fast as possible. Recovery time - client side: errors should be transparent for the user code. On the client side, we must as well limit the time lost on errors to a minimum. Planned rolling restart: just make this as fast and less disruptive as possible Other possible changes. detailed below. * Status Failure detection time: software crash - done Done in HBASE-5844, HBASE-5926 Failure detection time: hardware issue - not started 1) as much as possible, it should be handled by ZooKeeper and not HBase, see open Jira as ZOOKEEPER-702, ZOOKEEPER-922, ... 2) we need to make easy for a monitoring tool to tag a RS or Master as dead. This way, specialized HW tools could point out dead RS. Jira to open. Recovery time - Server: in progress 1) bulk assignment: To be retested, there are many just-closed JIRA on this (HBASE-5998, HBASE-6109, HBASE-5970, ...). A lot of work by many people. There are still possible improvements (HBASE-6058, ...) 2) Log replay: To be retested, there are many just-closed JIRA on this (HBASE-6134, ...). Recovery time - Client - done 1) The RS now returns the new RS to the client after a region move (HBASE-5992, HBASE-5877) 2) Client retries sooner on errors (HBASE-5924). 3) In the future, it could be interesting to share the region location in ZK with the client. It's not reasonable today as it could lead to have too many connection to ZK. ZOOKEEPER-1147 is an open JIRA on this. 
Planned rolling restart performances - in progress Benefits from the modifications in the client mentioned above. To do: analyze move performances to make it faster if possible. Other possible changes - Restart the server immediately on software crash: done in HBASE-5939 - Reuse the same assignment on software crash: not planned - Use spare hardware to reuse the same assignment on hardware failure: not planned - Multiple RS for the same region (excluded in the initial document: hbase architecture change previously discussed by Gary H./Andy P.): not planned Improve multiop performances in HTable#flushCommits --- Key: HBASE-6156 URL: https://issues.apache.org/jira/browse/HBASE-6156 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 This code: {noformat} @Override public void flushCommits() throws IOException { try { Object[] results = new Object[writeBuffer.size()]; try { this.connection.processBatch(writeBuffer, tableName, pool, results); } catch (InterruptedException e) { throw new IOException(e); } finally { // mutate list so that it is empty for complete success, or contains // only failed records results are returned in the same order as the // requests in list walk the list backwards, so we can remove from list // without impacting the indexes of earlier members for (int i = results.length - 1; i=0; i--) { if (results[i] instanceof Result) { // successful Puts are removed from the list here. 
writeBuffer.remove(i); } } } } finally { if (clearBufferOnFail) { writeBuffer.clear(); currentWriteBufferSize = 0; } else { // the write buffer was adjusted by processBatchOfPuts currentWriteBufferSize = 0; for (Put aPut : writeBuffer) { currentWriteBufferSize += aPut.heapSize(); } } } } {noformat} Can be improved by: - not iterating on the list if clearBufferOnFail is set - not iterating the the list of there are no error - iterating on the list only once instead of two when we really have to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
[ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397559#comment-13397559 ] nkeywal commented on HBASE-5843: Small status as of June.

* Improvements identified
- Failure detection time: performed by ZK, with a timeout. With the default value, we needed 90 seconds before starting to act on a software or hardware issue.
- Recovery time - server side: split in two parts: reassigning the regions of a dead RS to a new RS, and replaying the WAL. Must be as fast as possible.
- Recovery time - client side: errors should be transparent for the user code. On the client side, we must also limit the time lost on errors to a minimum.
- Planned rolling restart: just make this as fast and as little disruptive as possible.
- Other possible changes: detailed below.

* Status
- Failure detection time: software crash - done
Done in HBASE-5844, HBASE-5926.
- Failure detection time: hardware issue - not started
1) As much as possible, it should be handled by ZooKeeper and not HBase; see open Jiras such as ZOOKEEPER-702, ZOOKEEPER-922, ...
2) We need to make it easy for a monitoring tool to tag a RS or Master as dead. This way, specialized HW tools could point out dead RS. Jira to open.
- Recovery time - Server: in progress
1) Bulk assignment: to be retested, there are many just-closed JIRAs on this (HBASE-5998, HBASE-6109, HBASE-5970, ...). A lot of work by many people. There are still possible improvements (HBASE-6058, ...)
2) Log replay: to be retested, there are many just-closed JIRAs on this (HBASE-6134, ...).
- Recovery time - Client - done
1) The RS now returns the new RS to the client after a region move (HBASE-5992, HBASE-5877)
2) Client retries sooner on errors (HBASE-5924).
3) In the future, it could be interesting to share the region location in ZK with the client. It's not reasonable today as it could lead to too many connections to ZK. ZOOKEEPER-1147 is an open JIRA on this.
- Planned rolling restart performances - in progress
Benefits from the modifications in the client mentioned above. To do: analyze move performances to make it faster if possible.
- Other possible changes
  - Restart the server immediately on software crash: done in HBASE-5939
  - Reuse the same assignment on software crash: not planned
  - Use spare hardware to reuse the same assignment on hardware failure: not planned
  - Multiple RS for the same region (excluded in the initial document: hbase architecture change previously discussed by Gary H./Andy P.): not planned

Improve HBase MTTR - Mean Time To Recover - Key: HBASE-5843 URL: https://issues.apache.org/jira/browse/HBASE-5843 Project: HBase Issue Type: Umbrella Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal A part of the approach is described here: https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit The ideal target is: - failure impact client applications only by an added delay to execute a query, whatever the failure. - this delay is always inferior to 1 second. We're not going to achieve that immediately... Priority will be given to the most frequent issues. Short term: - software crash - standard administrative tasks as stop/start of a cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6156) Improve multiop performances in HTable#flushCommits
[ https://issues.apache.org/jira/browse/HBASE-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397562#comment-13397562 ] nkeywal commented on HBASE-6156: It seems I can't update my comments... The comment above is for HBASE-5843. Improve multiop performances in HTable#flushCommits --- Key: HBASE-6156 URL: https://issues.apache.org/jira/browse/HBASE-6156 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 This code:
{noformat}
@Override
public void flushCommits() throws IOException {
  try {
    Object[] results = new Object[writeBuffer.size()];
    try {
      this.connection.processBatch(writeBuffer, tableName, pool, results);
    } catch (InterruptedException e) {
      throw new IOException(e);
    } finally {
      // mutate list so that it is empty for complete success, or contains
      // only failed records
      // results are returned in the same order as the requests in list
      // walk the list backwards, so we can remove from list
      // without impacting the indexes of earlier members
      for (int i = results.length - 1; i >= 0; i--) {
        if (results[i] instanceof Result) {
          // successful Puts are removed from the list here.
          writeBuffer.remove(i);
        }
      }
    }
  } finally {
    if (clearBufferOnFail) {
      writeBuffer.clear();
      currentWriteBufferSize = 0;
    } else {
      // the write buffer was adjusted by processBatchOfPuts
      currentWriteBufferSize = 0;
      for (Put aPut : writeBuffer) {
        currentWriteBufferSize += aPut.heapSize();
      }
    }
  }
}
{noformat}
Can be improved by:
- not iterating on the list if clearBufferOnFail is set
- not iterating on the list if there are no errors
- iterating on the list only once instead of twice when we really have to.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6058) Use ZK 3.4 API 'multi' in bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398282#comment-13398282 ]

nkeywal commented on HBASE-6058:

All the tests I made with multi were successful. But they were only tests :-). Note that it's not totally trivial to use it in bulk assignment, because we have two levels of asynchronous calls (the callback calls another asynchronous function). So supporting both ZK versions (with and without multi) would be asking for trouble, imho. Fixing ZOOKEEPER-1381 would also help on deployment: today we hang if we call multi on a 3.3 ZK server... Anyway, I will redo some perf tests to see where we are now with the current implementation.

Use ZK 3.4 API 'multi' in bulk assignment
-
Key: HBASE-6058
URL: https://issues.apache.org/jira/browse/HBASE-6058
Project: HBase
Issue Type: Improvement
Components: master, zookeeper
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor

We use the async API today. This is already much, much faster than the sync API. Still, it makes sense to use the 'multi' function: this will decrease the network and ZooKeeper load at startup/rolling restart. On a 500-node cluster, we see that 3 seconds are spent on updating ZK per bulk assignment. This should cut it in half (+ the benefits on the network/zk load).
[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
[ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398535#comment-13398535 ]

nkeywal commented on HBASE-5843:

@ram Thanks for pointing this one out. I will wait for the fix before redoing a perf check.
@andrew Yes, there should be no issue. HBASE-5926 modifies the content of HBASE-5844, so the merged patch will be smaller. I will have a look.

Improve HBase MTTR - Mean Time To Recover
-
Key: HBASE-5843
URL: https://issues.apache.org/jira/browse/HBASE-5843
Project: HBase
Issue Type: Umbrella
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal

A part of the approach is described here: https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit

The ideal target is:
- a failure impacts client applications only by an added delay to execute a query, whatever the failure.
- this delay is always below 1 second.

We're not going to achieve that immediately... Priority will be given to the most frequent issues.

Short term:
- software crash
- standard administrative tasks such as stop/start of a cluster.
[jira] [Commented] (HBASE-4671) HBaseTestingUtility unable to connect to regionserver because of 127.0.0.1 / 127.0.1.1 discrepancy
[ https://issues.apache.org/jira/browse/HBASE-4671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399974#comment-13399974 ]

nkeywal commented on HBASE-4671:

From the hbase reference guide: http://hbase.apache.org/book.html#os
{noformat}
2.2.3. Loopback IP
HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions, for example, will default to 127.0.1.1 and this will cause problems for you.
/etc/hosts should look something like this:
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
{noformat}

HBaseTestingUtility unable to connect to regionserver because of 127.0.0.1 / 127.0.1.1 discrepancy
--
Key: HBASE-4671
URL: https://issues.apache.org/jira/browse/HBASE-4671
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.90.4
Environment: At least Ubuntu 11.10 with a default hosts file.
Reporter: Ferdy Galema

When /etc/hosts contains following lines (and this is not uncommon) it will cause HBaseTestingUtility to malfunction.
127.0.0.1 localhost
127.0.1.1 myMachineName

Symptoms:
2011-10-25 17:38:30,875 WARN master.AssignmentManager - Failed assignment of -ROOT-,,0.70236052 to serverName=localhost,34462,1319557102914, load=(requests=0, regions=0, usedHeap=46, maxHeap=865), trying to assign elsewhere instead; retry=0
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to /127.0.0.1:34462 after attempts=1
because
2011-10-25 17:38:28,371 INFO regionserver.HRegionServer - Serving as localhost,34462,1319557102914, RPC listening on /127.0.1.1:34462, sessionid=0x1333bbb7a180002
caused by /127.0.0.1:34462 vs /127.0.1.1:34462

Workaround: Changing 127.0.1.1 to 127.0.0.1 works.

Permanent solution: Dunno, my understanding of inner workings is not sufficient enough. Although it seems like it has something to do with changing the machine name from myMachineName to localhost during the test:
2011-10-25 17:38:28,056 INFO regionserver.HRegionServer - Master passed us address to use. Was=myMachineName:34462, Now=localhost:34462
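The 127.0.0.1 / 127.0.1.1 mismatch described above can be detected before starting a test cluster. The helper below is a minimal sketch using only `java.net.InetAddress`; the class and method names are made up for this example and are not part of HBaseTestingUtility.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class LoopbackCheck {
    /**
     * True when the address is an IPv4 loopback address other than 127.0.0.1,
     * for example the 127.0.1.1 entry that some distributions put in /etc/hosts.
     */
    static boolean isNonStandardLoopback(InetAddress addr) {
        byte[] b = addr.getAddress();
        if (b.length != 4 || !addr.isLoopbackAddress()) {
            return false; // IPv6 or not a loopback address: out of scope here
        }
        boolean standard = b[1] == 0 && b[2] == 0 && b[3] == 1;
        return !standard;
    }

    /** Builds an address from a dotted IPv4 literal without any DNS lookup. */
    static InetAddress parseLiteral(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 literal: " + ipv4);
        }
        byte[] raw = new byte[4];
        for (int i = 0; i < 4; i++) {
            raw[i] = (byte) Integer.parseInt(parts[i]);
        }
        try {
            return InetAddress.getByAddress(raw);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(ipv4, e);
        }
    }
}
```

On a real machine one would call `LoopbackCheck.isNonStandardLoopback(InetAddress.getLocalHost())`; a `true` result suggests that the machine name resolves to something like 127.0.1.1 and that /etc/hosts needs the change shown in the reference guide.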
[jira] [Commented] (HBASE-6273) HMasterInterface.isMasterRunning() requires clean up
[ https://issues.apache.org/jira/browse/HBASE-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402064#comment-13402064 ]

nkeywal commented on HBASE-6273:

+1 as well.

Errors can be:
1) Can't connect to ZK. Hence you can't get the master address. This is especially true in 0.96 as the connection to ZK is made on demand. I think we need to have/keep this as an exception.
2) There is no info on the master address in ZK. We should make this clear as well. The best option I see is a specific exception such as NoMasterAddressException or NoMasterAddressInZooKeeperException.

And MasterNotRunning can be: the master is actually there, but marked internally as not running (stopped). The server side method 'isMasterRunning' will return false (you can contact the master but it's stopping). As such, I think it is better not to have an exception named MasterNotRunning, as it creates confusion between the technical status and the functional one.

I think the pseudo code could be, on the client interface:
{noformat}
public boolean isMasterRunning() throws MasterConnectionException, NoMasterAddressException, ZooKeeperConnectionException {
  ZK zk = getZK(); // Can throw ZooKeeperConnectionException
  Master m = getMaster(zk); // Can throw MasterConnectionException or NoMasterAddressException
  boolean isRunning = m.isMasterRunning(); // can throw MasterConnectionException
  return isRunning;
}
{noformat}

HMasterInterface.isMasterRunning() requires clean up
Key: HBASE-6273
URL: https://issues.apache.org/jira/browse/HBASE-6273
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Fix For: 0.96.0

This JIRA is in reference to JD's comments regarding the clean up needed in isMasterRunning(). Refer to https://issues.apache.org/jira/browse/HBASE-6240?focusedCommentId=13400772&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13400772
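To make the distinction between the technical failures and the functional "not running" state concrete, here is a self-contained sketch of the proposal above. Everything in it is hypothetical: the three exception classes only reuse the names suggested in the comment, and `Master` and `MasterLocator` are stand-ins for the real RPC interface and ZooKeeper lookup.

```java
import java.io.IOException;

final class MasterStatusSketch {
    // Hypothetical exception types, named after the ones proposed in the comment.
    static class ZooKeeperConnectionException extends IOException {
        ZooKeeperConnectionException(String msg) { super(msg); }
    }
    static class NoMasterAddressException extends IOException {
        NoMasterAddressException(String msg) { super(msg); }
    }
    static class MasterConnectionException extends IOException {
        MasterConnectionException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for the master RPC interface. */
    interface Master {
        boolean isMasterRunning() throws MasterConnectionException;
    }

    /** Hypothetical stand-in for the ZK lookup; returning null models "no master znode". */
    interface MasterLocator {
        Master locate() throws ZooKeeperConnectionException, MasterConnectionException;
    }

    /** Returns the functional status; every technical failure is a distinct exception. */
    static boolean isMasterRunning(MasterLocator zk)
            throws ZooKeeperConnectionException, NoMasterAddressException,
                   MasterConnectionException {
        Master m = zk.locate();
        if (m == null) {
            throw new NoMasterAddressException("no master address in ZooKeeper");
        }
        return m.isMasterRunning(); // false means reachable but stopping
    }

    /** Shows how a caller can tell the four outcomes apart. */
    static String describe(MasterLocator zk) {
        try {
            return isMasterRunning(zk) ? "running" : "stopping";
        } catch (ZooKeeperConnectionException e) {
            return "zk-unreachable";
        } catch (NoMasterAddressException e) {
            return "no-master-address";
        } catch (MasterConnectionException e) {
            return "master-unreachable";
        }
    }
}
```

With this shape, a `false` return value always means "the master answered and says it is stopping", which is the separation between technical and functional status argued for in the comment.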
[jira] [Updated] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method
[ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6175:
---
Attachment: 6175.v1.patch

TestFSUtils flaky on hdfs getFileStatus method
--
Key: HBASE-6175
URL: https://issues.apache.org/jira/browse/HBASE-6175
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Trivial
Fix For: 0.96.0
Attachments: 6175.v1.patch

This is a simplified version of a TestFSUtils issue: with a sleep, the test works 100% of the time. Without the sleep, it becomes flaky. Root cause unknown. While the issue appears in the tests, the root cause could be an issue on a real production system as well.
{noformat}
@Test
public void testFSUTils() throws Exception {
  final String hosts[] = {"host1", "host2", "host3", "host4"};
  Path testFile = new Path("/test1.txt");
  HBaseTestingUtility htu = new HBaseTestingUtility();
  try {
    htu.startMiniDFSCluster(hosts).waitActive();
    FileSystem fs = htu.getDFSCluster().getFileSystem();
    for (int i = 0; i < 100; ++i) {
      FSDataOutputStream out = fs.create(testFile);
      byte[] data = new byte[1];
      out.write(data, 0, 1);
      out.close();
      // Put a sleep here to make me work
      // Thread.sleep(2000);
      FileStatus status = fs.getFileStatus(testFile);
      HDFSBlocksDistribution blocksDistribution =
        FSUtils.computeHDFSBlocksDistribution(fs, status, 0, status.getLen());
      assertEquals("Wrong number of hosts distributing blocks. at iteration " + i,
        3, blocksDistribution.getTopHosts().size());
      fs.delete(testFile, true);
    }
  } finally {
    htu.shutdownMiniDFSCluster();
  }
}
{noformat}
[jira] [Updated] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method
[ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6175:
---
Status: Patch Available (was: Open)

TestFSUtils flaky on hdfs getFileStatus method
--
Key: HBASE-6175
URL: https://issues.apache.org/jira/browse/HBASE-6175
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Trivial
Fix For: 0.96.0
Attachments: 6175.v1.patch
[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method
[ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402990#comment-13402990 ]

nkeywal commented on HBASE-6175:

Here is the fix. Unless there is a 'no go', I'll commit it this weekend.

TestFSUtils flaky on hdfs getFileStatus method
--
Key: HBASE-6175
URL: https://issues.apache.org/jira/browse/HBASE-6175
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Trivial
Fix For: 0.96.0
Attachments: 6175.v1.patch
[jira] [Created] (HBASE-6290) Add a function a mark a server as dead and start the recovery the process
nkeywal created HBASE-6290:
--
Summary: Add a function a mark a server as dead and start the recovery the process
Key: HBASE-6290
URL: https://issues.apache.org/jira/browse/HBASE-6290
Project: HBase
Issue Type: Improvement
Components: monitoring
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor

ZooKeeper is used as a monitoring tool: we use znodes, and we start the recovery process when a znode is deleted by ZK because it got a timeout. This timeout defaults to 90 seconds, and is often set to 30s.

However, some HW issues could be detected by specialized HW monitoring tools before the ZK timeout. For this reason, it makes sense to offer a very simple function to mark a RS as dead. This should not take in

It could be a hbase shell function such as
considerAsDead ipAddress|serverName

This would delete all the znodes of the server running on this box, starting the recovery process.

Such a function would be easily callable (at the caller's risk) by any fault detection tool... However, we could have issues identifying the right master and region servers across IPv4, IPv6 and multi-networked boxes.
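The first step of such a function is to find which registered servers run on the given box. The sketch below assumes the stringified server name format `host,port,startcode` seen in HBase logs, and matches on the host part only; the class and method names are invented for this example. It does not touch ZooKeeper, and it does not solve the hostname versus IP address matching problem mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

final class DeadServerSelector {
    /**
     * Returns the server names ("host,port,startcode") whose host part equals
     * the given host. A real considerAsDead would then delete the znode of each
     * match; that part is deliberately left out of this sketch.
     */
    static List<String> serversOnHost(List<String> liveServerNames, String host) {
        List<String> matches = new ArrayList<String>();
        for (String serverName : liveServerNames) {
            int comma = serverName.indexOf(',');
            String serverHost = comma < 0 ? serverName : serverName.substring(0, comma);
            if (serverHost.equalsIgnoreCase(host)) {
                matches.add(serverName);
            }
        }
        return matches;
    }
}
```

Matching on the full server name instead of the host would restrict the operation to a single process, which is the safer choice when several servers share a box.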
[jira] [Created] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
nkeywal created HBASE-6295:
--
Summary: Possible performance improvement in client batch operations: presplit and send in background
Key: HBASE-6295
URL: https://issues.apache.org/jira/browse/HBASE-6295
Project: HBase
Issue Type: Improvement
Components: client
Affects Versions: 0.96.0
Reporter: nkeywal

Today the batch algorithm is:
{noformat}
for Operation o: List<Op> {
  add o to todolist
  if todolist > maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could:
- create the final object immediately instead of an intermediate array
- split per location immediately
- instead of sending when the list as a whole is full, send it when there is enough data for a single location

It would be:
{noformat}
for Operation o: List<Op> {
  get location
  add o to todo location.todolist
  if (location.todolist > maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write once you add error management: the retried list must be shared with the operations added to the todolist. But it's doable. It's interesting mainly for 'big' writes.
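The second pseudocode block could look roughly like this in Java. It is a sketch under simplifying assumptions: locations and operations are plain strings, "sending" only records the batch in a list instead of doing an asynchronous RPC, the threshold test is "size reached" rather than "size exceeded", and error management and retries are left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PerLocationBatcher {
    private final int maxLocationSize;
    /** One pending todolist per location (region server). */
    private final Map<String, List<String>> pending = new HashMap<String, List<String>>();
    /** Batches handed to the simulated sender, in send order. */
    final List<List<String>> sent = new ArrayList<List<String>>();

    PerLocationBatcher(int maxLocationSize) {
        this.maxLocationSize = maxLocationSize;
    }

    /** Adds one operation and sends that location's list as soon as it is full. */
    void add(String location, String op) {
        List<String> todo = pending.get(location);
        if (todo == null) {
            todo = new ArrayList<String>();
            pending.put(location, todo);
        }
        todo.add(op);
        if (todo.size() >= maxLocationSize) {
            send(pending.remove(location)); // don't wait, continue the loop
        }
    }

    /** Sends whatever is left; a real client would then wait for all replies. */
    void flush() {
        for (List<String> todo : pending.values()) {
            send(todo);
        }
        pending.clear();
    }

    private void send(List<String> batch) {
        sent.add(batch); // stands in for an asynchronous call to the region server
    }
}
```

Because a full list is removed from the map before being sent, later operations for the same location start a fresh list, so the in-flight batch is never mutated.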
[jira] [Created] (HBASE-6315) ipc.HBaseClient should support address change as does hdfs
nkeywal created HBASE-6315:
--
Summary: ipc.HBaseClient should support address change as does hdfs
Key: HBASE-6315
URL: https://issues.apache.org/jira/browse/HBASE-6315
Project: HBase
Issue Type: Bug
Components: ipc
Affects Versions: 0.96.0
Reporter: nkeywal
Priority: Minor

ipc.HBaseClient is a copy/paste of Hadoop's ipc.Client. The Hadoop implementation now supports address changes. As a side note, the HBase comment 'the max number of retries is 45' is now wrong.

--- HBaseClient
} catch (SocketTimeoutException toe) {
  /* The max number of retries is 45,
   * which amounts to 20s*45 = 15 minutes retries.
   */
  handleConnectionFailure(timeoutFailures++, maxRetries, toe);
} catch (IOException ie) {
  handleConnectionFailure(ioFailures++, maxRetries, ie);
}

--- Hadoop Client
} catch (SocketTimeoutException toe) {
  /* Check for an address change and update the local reference.
   * Reset the failure counter if the address was changed
   */
  if (updateAddress()) {
    timeoutFailures = ioFailures = 0;
  }
  /* The max number of retries is 45,
   * which amounts to 20s*45 = 15 minutes retries.
   */
  handleConnectionFailure(timeoutFailures++, 45, toe);
} catch (IOException ie) {
  if (updateAddress()) {
    timeoutFailures = ioFailures = 0;
  }
  handleConnectionFailure(ioFailures++, maxRetries, ie);
}

private synchronized boolean updateAddress() throws IOException {
  // Do a fresh lookup with the old host name.
  InetSocketAddress currentAddr = NetUtils.makeSocketAddr(
      server.getHostName(), server.getPort());
  if (!server.equals(currentAddr)) {
    LOG.warn("Address change detected. Old: " + server.toString() +
        " New: " + currentAddr.toString());
    server = currentAddr;
    return true;
  }
  return false;
}
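The behaviour of the Hadoop client quoted above can be reproduced in isolation with the JDK alone. In the sketch below the fresh lookup is injected as a `Supplier` so that an address change can be simulated; `AddressTracker` and `recordFailure` are names invented for this example, and Hadoop's `NetUtils.makeSocketAddr` is not used.

```java
import java.net.InetSocketAddress;
import java.util.function.Supplier;

final class AddressTracker {
    private InetSocketAddress server;
    private final Supplier<InetSocketAddress> resolver;
    private int failures;

    /** The resolver stands in for "do a fresh lookup with the old host name". */
    AddressTracker(InetSocketAddress initial, Supplier<InetSocketAddress> resolver) {
        this.server = initial;
        this.resolver = resolver;
    }

    /** Returns true and switches to the new address if the lookup result changed. */
    synchronized boolean updateAddress() {
        InetSocketAddress currentAddr = resolver.get();
        if (!server.equals(currentAddr)) {
            server = currentAddr;
            return true;
        }
        return false;
    }

    /**
     * Mirrors the catch blocks: the failure counter restarts when the address
     * moved. Returns the number of consecutive failures on the current address.
     */
    synchronized int recordFailure() {
        if (updateAddress()) {
            failures = 0;
        }
        return ++failures;
    }

    synchronized InetSocketAddress current() {
        return server;
    }
}
```

In a real client the resolver would be something like `() -> new InetSocketAddress(hostName, port)`, since constructing an `InetSocketAddress` from a host name attempts a new resolution.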
[jira] [Commented] (HBASE-6309) [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager
[ https://issues.apache.org/jira/browse/HBASE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405976#comment-13405976 ] nkeywal commented on HBASE-6309: bq. IMO we should move everything that talks to ZK and NN out of that path. +1... [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager Key: HBASE-6309 URL: https://issues.apache.org/jira/browse/HBASE-6309 Project: HBase Issue Type: Improvement Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.96.0 We found this issue during the leap second cataclysm which prompted a distributed splitting of all our logs. I saw that none of the RS were splitting after some time while the master was showing that it wasn't even 30% done. jstack'ing I saw this: {noformat} main-EventThread daemon prio=10 tid=0x7f6ce46d8800 nid=0x5376 in Object.wait() [0x7f6ce2ecb000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at org.apache.hadoop.ipc.Client.call(Client.java:1093) - locked 0x0005fdd661a0 (a org.apache.hadoop.ipc.Client$Call) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226) at $Proxy9.rename(Unknown Source) at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy9.rename(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:759) at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:253) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:553) at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:519) at org.apache.hadoop.hbase.master.SplitLogManager$1.finish(SplitLogManager.java:138) at org.apache.hadoop.hbase.master.SplitLogManager.getDataSetWatchSuccess(SplitLogManager.java:431) at org.apache.hadoop.hbase.master.SplitLogManager.access$1200(SplitLogManager.java:95) at org.apache.hadoop.hbase.master.SplitLogManager$GetDataAsyncCallback.processResult(SplitLogManager.java:1011) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:571) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497) {noformat} We are effectively bottlenecking on doing NN operations and whatever else is happening in GetDataAsyncCallback. It was so bad that on our 100 offline cluster it took a few hours for the master to process all the incoming ZK events while the actual splitting took a fraction of that time. I'm marking this as critical and against 0.96 but depending on how involved the fix is we might want to backport. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
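The general shape of the fix discussed here, moving blocking work out of the callback thread, can be sketched with a plain `ExecutorService`. This is only an illustration, not the SplitLogManager change: `processResult` imitates the ZK callback, the `Thread.sleep` stands in for the blocking NameNode rename, and the class, thread and helper names are all made up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

final class CallbackOffload {
    /** Dedicated worker so that slow file system calls never run on the event thread. */
    private final ExecutorService worker =
        Executors.newSingleThreadExecutor(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "split-finisher");
                t.setDaemon(true);
                return t;
            }
        });

    /** Called on the event thread: only queues the slow work and returns at once. */
    Future<String> processResult(final String logPath) {
        return worker.submit(new Callable<String>() {
            public String call() throws Exception {
                Thread.sleep(20); // stands in for the blocking NameNode rename
                return Thread.currentThread().getName() + " finished " + logPath;
            }
        });
    }

    /** Helper for callers that need the outcome: waits up to five seconds. */
    static String await(Future<String> f) {
        try {
            return f.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The event thread pays only the cost of a queue insertion per callback, so incoming ZK events keep flowing even when each rename takes a long time. A single worker preserves ordering; a small pool would trade ordering for throughput.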