[jira] [Commented] (HBASE-4177) Handling read failures during recovery - when HMaster calls Namenode recovery, recovery may be a failure leading to read failure while splitting logs
[ https://issues.apache.org/jira/browse/HBASE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440578#comment-13440578 ] Lars Hofhansl commented on HBASE-4177: That is superseded by all of N's work, correct?

Handling read failures during recovery - when HMaster calls Namenode recovery, recovery may be a failure leading to read failure while splitting logs
Key: HBASE-4177
URL: https://issues.apache.org/jira/browse/HBASE-4177
Project: HBase
Issue Type: Bug
Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical

This problem was identified in the mailing-list thread titled 'Handling read failures during recovery'. As part of log splitting, the HMaster calls Namenode recovery, which is an asynchronous process.
In HDFS
===
Even though the client gets the updated block info from the Namenode on the first read failure, it discards the new info and keeps using the old info to retrieve the data from the datanode, so all read retries fail. [Method parameter reassignment - Not reflected in caller].
In HBASE
===
The HMaster code waits for 1 second. But if the recovery fails, the log split may not happen, which can lead to data loss. So we may need to decide on the actual delay to introduce once the HMaster calls NN recovery.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
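The HDFS-side cause named in the issue ("Method parameter reassignment - Not reflected in caller") is a general Java pitfall: object references are passed by value, so assigning a new object to a parameter inside a method does not change the caller's variable. The minimal sketch below illustrates the bug and one possible fix (returning the refreshed value). `BlockInfo`, `fetchFromNamenode()`, `refreshBuggy()` and `refreshFixed()` are hypothetical stand-ins, not the real HDFS client code.

```java
// Hypothetical names throughout: BlockInfo and fetchFromNamenode() are illustrative
// stand-ins, not the real HDFS DFSClient/DFSInputStream API.
public class ParamReassignmentDemo {

    /** Stand-in for the block metadata a client keeps between read attempts. */
    static final class BlockInfo {
        final long generation;

        BlockInfo(long generation) {
            this.generation = generation;
        }
    }

    /** Stand-in for asking the Namenode for updated block info after a read failure. */
    static BlockInfo fetchFromNamenode() {
        return new BlockInfo(2);
    }

    /** Buggy: reassigning the parameter only rebinds this method's local copy of the reference. */
    static void refreshBuggy(BlockInfo info) {
        info = fetchFromNamenode(); // the caller never sees this object
    }

    /** Fixed: return the refreshed info so the caller can replace its own reference. */
    static BlockInfo refreshFixed() {
        return fetchFromNamenode();
    }

    public static void main(String[] args) {
        BlockInfo stale = new BlockInfo(1);
        refreshBuggy(stale);
        // Still generation 1: every retry would keep using the outdated info.
        System.out.println("buggy caller sees generation " + stale.generation);

        BlockInfo current = new BlockInfo(1);
        current = refreshFixed();
        System.out.println("fixed caller sees generation " + current.generation);
    }
}
```

Another option would be to store the new info in a field of a shared object. Returning the value keeps the data flow explicit at the call site.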
[ https://issues.apache.org/jira/browse/HBASE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440652#comment-13440652 ] nkeywal commented on HBASE-4177: Hmm, it's really close to what I've done, but this problem may still be there. Ram, what do you think? If you don't have the time, I can give it a try.
[ https://issues.apache.org/jira/browse/HBASE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440919#comment-13440919 ] ramkrishna.s.vasudevan commented on HBASE-4177: @N I also think the problem is still there, but we have not started working on it internally yet. At the time, we discussed that some changes are also needed on the HDFS side, and Stack has already raised this in an HDFS JIRA. Sure, you can take a stab at it, N.
[ https://issues.apache.org/jira/browse/HBASE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199033#comment-13199033 ] ramkrishna.s.vasudevan commented on HBASE-4177: Any suggestions on this? We run into this problem every now and then.
[ https://issues.apache.org/jira/browse/HBASE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092981#comment-13092981 ] stack commented on HBASE-4177: I created HDFS-2296 at Hairong's suggestion.
[ https://issues.apache.org/jira/browse/HBASE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093003#comment-13093003 ] ramkrishna.s.vasudevan commented on HBASE-4177: @Stack Thanks for tracking this and raising a corresponding issue in HDFS.
[ https://issues.apache.org/jira/browse/HBASE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081098#comment-13081098 ] Ted Yu commented on HBASE-4177: Looking at FSUtils.recoverFileLease(), we check the type of fs inside the while loop. This check is unnecessary. With respect to the soft limit for the lease, we have:
{code}
if (waitedFor > FSConstants.LEASE_SOFTLIMIT_PERIOD) {
  LOG.warn("Waited " + waitedFor + "ms for lease recovery on " + p +
    ":" + e.getMessage());
}
{code}
I think we should wait for the remainder of the soft limit (which is 60 seconds).
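Ted's two suggestions can be combined in one sketch. The file-system type check runs once, before the retry loop, and recovery is retried with a pause for up to the lease soft limit (60 seconds per the comment above) instead of a fixed 1-second wait. This is not the actual FSUtils.recoverFileLease(). `DistributedFs`, `RecoveryAttempt` and `recoverLease` are hypothetical stand-ins. In a real implementation, `tryRecover()` might wrap the HDFS client's lease-recovery call (for example `DistributedFileSystem.recoverLease(Path)` on Hadoop versions that provide it).

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical sketch; DistributedFs, RecoveryAttempt and recoverLease are illustrative
// names, not the actual HBase FSUtils or HDFS API.
public class LeaseRecoveryWait {

    /** Marker for a file system whose files carry leases (HDFS-like). */
    interface DistributedFs {}

    /** One lease-recovery attempt; returns true once the lease has been released. */
    interface RecoveryAttempt {
        boolean tryRecover() throws IOException;
    }

    /**
     * Retries recovery with a pause between attempts until it succeeds or
     * softLimitMs has elapsed. Returns false if the soft limit ran out first.
     */
    static boolean recoverLease(Object fs, RecoveryAttempt attempt, long softLimitMs, long pauseMs)
            throws IOException {
        // Checked once, before the loop: the file-system type cannot change between attempts.
        if (!(fs instanceof DistributedFs)) {
            return true; // nothing to recover, e.g. on a local file system
        }
        long start = System.currentTimeMillis();
        while (true) {
            if (attempt.tryRecover()) {
                return true;
            }
            long waited = System.currentTimeMillis() - start;
            if (waited >= softLimitMs) {
                return false; // caller decides whether to keep waiting or fail the split
            }
            try {
                // Pause, but never past the remaining soft-limit budget.
                Thread.sleep(Math.min(pauseMs, softLimitMs - waited));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                InterruptedIOException iioe =
                        new InterruptedIOException("Interrupted during lease recovery");
                iioe.initCause(e);
                throw iioe;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Simulated HDFS-like file system whose recovery succeeds on the third attempt.
        boolean ok = recoverLease(new DistributedFs() {}, () -> ++calls[0] >= 3, 60_000L, 10L);
        System.out.println("recovered=" + ok + " after " + calls[0] + " attempts");
    }
}
```

Returning false after the soft limit, instead of only logging a warning, leaves the caller (here, the log splitter) to choose between waiting longer and failing loudly. Both are safer than proceeding to split a log whose lease was never recovered.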