[jira] [Comment Edited] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
[ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987714#comment-16987714 ] Yiqun Lin edited comment on HDFS-13811 at 12/4/19 10:22 AM: Committed this to trunk. Thanks [~LiJinglun] for the contribution, [~ayushtkn] for the review and [~dibyendu_hadoop] for reporting this. was (Author: linyiqun): Committed this to trunk. Thanks [~LiJinglun] for the contribution and thanks [~ayushtkn] for the review. > RBF: Race condition between router admin quota update and periodic quota > update service > --- > > Key: HDFS-13811 > URL: https://issues.apache.org/jira/browse/HDFS-13811 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Dibyendu Karmakar >Assignee: Jinglun >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-13811-000.patch, HDFS-13811-HDFS-13891-000.patch, > HDFS-13811.001.patch, HDFS-13811.002.patch, HDFS-13811.003.patch, > HDFS-13811.004.patch, HDFS-13811.005.patch, HDFS-13811.006.patch, > HDFS-13811.007.patch > > > If we try to update quota of an existing mount entry and at the same time > periodic quota update service is running on the same mount entry, it is > leading the mount table to _inconsistent state._ > Here transactions are: > A - Quota update service is fetching mount table entries. > B - Quota update service is updating the mount table with current usage. > A' - User is trying to update quota using admin cmd. > and the transaction sequence is [ A A' B ] > quota update service is updating the mount table with old quota value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
[ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984758#comment-16984758 ] Yiqun Lin edited comment on HDFS-13811 at 11/29/19 6:55 AM: Hi [~LiJinglun], I take a look for the major change of the patch. I am agreed with that updating mount table statestore behavior should be called only by admin call and let periodic service update the local cache. This will broken one thing that the usage in mount table cannot be updated. And the quota usage displayed in web UI will be invalid. How do we plan to fix this? Can you use the similar way that [~dibyendu_hadoop] did before to extract the quota usage from quota manager when getting the mount table entries? {noformat} @@ -142,6 +143,20 @@ public GetMountTableEntriesResponse getMountTableEntries( it.remove(); } } +// If quotamanager is not null, update quota usage from quota cache. +if (this.getQuotaManager() != null && request.isUpdateQuotaCache()) { + RouterQuotaUsage quota = + this.getQuotaManager().getQuotaUsage(record.getSourcePath()); + if(quota != null) { +RouterQuotaUsage oldquota = record.getQuota(); +RouterQuotaUsage newQuota = new RouterQuotaUsage.Builder() +.fileAndDirectoryCount(quota.getFileAndDirectoryCount()) +.quota(oldquota.getQuota()) +.spaceConsumed(quota.getSpaceConsumed()) +.spaceQuota(oldquota.getSpaceQuota()).build(); +record.setQuota(newQuota); + } +} } } {noformat} was (Author: linyiqun): Hi [~LiJinglun], I take a look for the major change of the patch. I am agreed with that updating mount table statestore behavior should be called only by admin call and let periodic service update the local cache. This will broken one thing that the usage in mount table cannot be updated. And the quota usage displayed in web UI will be invalid. How do we plan to fix this?The definition of update service is that: {noformat} /** * Service to periodically update the {@link RouterQuotaUsage} * cached information in the {@link Router} and update corresponding * mount table in State Store. */ public class RouterQuotaUpdateService extends PeriodicService { {noformat} > RBF: Race condition between router admin quota update and periodic quota > update service > --- > > Key: HDFS-13811 > URL: https://issues.apache.org/jira/browse/HDFS-13811 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Dibyendu Karmakar >Assignee: Jinglun >Priority: Major > Attachments: HDFS-13811-000.patch, HDFS-13811-HDFS-13891-000.patch, > HDFS-13811.001.patch, HDFS-13811.002.patch, HDFS-13811.003.patch > > > If we try to update quota of an existing mount entry and at the same time > periodic quota update service is running on the same mount entry, it is > leading the mount table to _inconsistent state._ > Here transactions are: > A - Quota update service is fetching mount table entries. > B - Quota update service is updating the mount table with current usage. > A' - User is trying to update quota using admin cmd. > and the transaction sequence is [ A A' B ] > quota update service is updating the mount table with old quota value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
[ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984758#comment-16984758 ] Yiqun Lin edited comment on HDFS-13811 at 11/29/19 6:49 AM: Hi [~LiJinglun], I take a look for the major change of the patch. I am agreed with that updating mount table statestore behavior should be called only by admin call and let periodic service update the local cache. This will broken one thing that the usage in mount table cannot be updated. And the quota usage displayed in web UI will be invalid. How do we plan to fix this?The definition of update service is that: {noformat} /** * Service to periodically update the {@link RouterQuotaUsage} * cached information in the {@link Router} and update corresponding * mount table in State Store. */ public class RouterQuotaUpdateService extends PeriodicService { {noformat} was (Author: linyiqun): Hi [~LiJinglun], I take a look for the major change of the patch. I am agreed with that updating mount table statestore behavior should be called only by admin call and let periodic service update the local cache. This will broken one thing that the usage in mount table cannot be updated. And the quota usage displayed in web UI will be invalid. How do we plan to fix this? > RBF: Race condition between router admin quota update and periodic quota > update service > --- > > Key: HDFS-13811 > URL: https://issues.apache.org/jira/browse/HDFS-13811 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Dibyendu Karmakar >Assignee: Jinglun >Priority: Major > Attachments: HDFS-13811-000.patch, HDFS-13811-HDFS-13891-000.patch, > HDFS-13811.001.patch, HDFS-13811.002.patch, HDFS-13811.003.patch > > > If we try to update quota of an existing mount entry and at the same time > periodic quota update service is running on the same mount entry, it is > leading the mount table to _inconsistent state._ > Here transactions are: > A - Quota update service is fetching mount table entries. > B - Quota update service is updating the mount table with current usage. > A' - User is trying to update quota using admin cmd. > and the transaction sequence is [ A A' B ] > quota update service is updating the mount table with old quota value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
[ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704423#comment-16704423 ] Yiqun Lin edited comment on HDFS-13811 at 11/30/18 8:53 AM: Hi [~dibyendu_hadoop], {quote} I haven't change the existing logic for getQuotaUsage. I have added a check to verify the quota cache value for the requested mount entry, if the cache is null, it will try to refresh the cache for that mount entry only. {quote} I suppose we can extract this as a new function and be called in {{getQuotaUsage(String path)}}. {quote} If we don't use updateQuotaCache flag, from RouterQuotaUpdateService#periodicInvoke getQuotaSetMountTables will eventually call getQuotaUsage, which will again call periodicInvoke and it will go into an infinite loop. That's why updateQuotaCache flag is required. {quote} I am not fully understanding these. I don't mean we must hard-coded the flag {{updateQuotaCache}} value as false in {{GetMountTableEntriesRequest}}. As I see {{getMountTableEntries(boolean updateQuotaCache)}} is used only by {{getQuotaSetMountTables}}. And then quota service passes a false value to {{getQuotaSetMountTables}}. No other places use this function. So I suggest to remove the parameter in function. Please feel free to attach next patch now. was (Author: linyiqun): Hi [~dibyendu_hadoop], {quote} I haven't change the existing logic for getQuotaUsage. I have added a check to verify the quota cache value for the requested mount entry, if the cache is null, it will try to refresh the cache for that mount entry only. {quote} I suppose we can extract this as a new function and be called in {{getQuotaUsage(String path)}}. {quote} If we don't use updateQuotaCache flag, from RouterQuotaUpdateService#periodicInvoke getQuotaSetMountTables will eventually call getQuotaUsage, which will again call periodicInvoke and it will go into an infinite loop. That's why updateQuotaCache flag is required. {quote} I am not fully understanding these. I don't mean we must hard-coded the flag {{updateQuotaCache}} value as value in {{GetMountTableEntriesRequest}}. As I see {{getMountTableEntries(boolean updateQuotaCache)}} is used only by {{getQuotaSetMountTables}}. And then quota service passes a false value to {{getQuotaSetMountTables}}. And no others places use this function. So I suggest to remove the parameter in function. Please feel free to attach next patch now. > RBF: Race condition between router admin quota update and periodic quota > update service > --- > > Key: HDFS-13811 > URL: https://issues.apache.org/jira/browse/HDFS-13811 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Dibyendu Karmakar >Assignee: Dibyendu Karmakar >Priority: Major > Attachments: HDFS-13811-000.patch, HDFS-13811-HDFS-13891-000.patch > > > If we try to update quota of an existing mount entry and at the same time > periodic quota update service is running on the same mount entry, it is > leading the mount table to _inconsistent state._ > Here transactions are: > A - Quota update service is fetching mount table entries. > B - Quota update service is updating the mount table with current usage. > A' - User is trying to update quota using admin cmd. > and the transaction sequence is [ A A' B ] > quota update service is updating the mount table with old quota value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
[ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696359#comment-16696359 ] Yiqun Lin edited comment on HDFS-13811 at 11/23/18 3:39 AM: Thanks for the explanation, [~dibyendu_hadoop]. I think I have got your thought. I am still reviewing, but some initial comments for you: *RouterQuotaManager.java* I'd like to keep original logic in {{getQuotaUsage}} and make that cleaned. We can allow usage is not found within a short time and wait for quota periodic update behaviour. Also the logic we change is incorrect, if we don't find the usage, it will get its parent usage until we find the right one. *RouterQuotaUpdateService* Line84: I'd like to add a try-catch for {{periodicInvoke}} method. So that one mount table updated error won't lead a loop exit. Line92: Rename {{periodicInvoke}} to {{updateQuotaUsage}}. Line124: {{currentQuotaUsage}} is an aggregated quota. The quota here (currentQuotaUsage.getQuota) only mean the last subcluster's quota value not mean all sub-clusters. If one subcluster filesysem's quota was changed, it still cannot be checked in following logic. Here I prefer to file another JIRA to improve this and keep original logic temporary. {code} // If there is a mismatch between the quota values in router cache // and sub-cluster file-system, sync the quota. if (currentQuotaUsage.getQuota() != nsQuota || currentQuotaUsage.getSpaceQuota() != ssQuota) { try { this.rpcServer.setQuota(src, nsQuota, ssQuota, null); } catch (IOException ioe) { LOG.error("Unable to set quota at remote location for " + src, ioe); } } {code} Line137: This line isn't needed. Line176: In quota update service, we don't really need to use parameter {{updateQuotaCache}}. Why not just set {{false}}. And no need to pass {{updateQuotaCache}} parameter. Haven't fully reviewed the UT, but I think we need to add a new test case for quota cache updating behaviour since we introduce the {{updateQuotaCache}} flag for mount table getting. was (Author: linyiqun): Thanks for the explanation, [~dibyendu_hadoop]. I think I have got your thought. I am still reviewing, but some initial comments for you: *RouterQuotaManager.java* I'd like to keep original logic in {{getQuotaUsage}} and make that cleaned. We can allow usage is not found within a short time and wait for quota periodic update behaviour. Also the logic we change is incorrect, if we don't find the usage, it will get its parent usage until we find the right one. *RouterQuotaUpdateService.periodicInvoke(MountTable entry)* Line84: I'd like to add a try-catch for {{periodicInvoke}} method. So that one mount table updated error won't lead a loop exit. Line92: Rename {{periodicInvoke}} to {{updateQuotaUsage}}. Line124: {{currentQuotaUsage}} is an aggregated quota. The quota here (currentQuotaUsage.getQuota) only mean the last subcluster's quota value not mean all sub-clusters. If one subcluster filesysem's quota was changed, it still cannot be checked in following logic. Here I prefer to file another JIRA to improve this and keep original logic temporary. {code} // If there is a mismatch between the quota values in router cache // and sub-cluster file-system, sync the quota. if (currentQuotaUsage.getQuota() != nsQuota || currentQuotaUsage.getSpaceQuota() != ssQuota) { try { this.rpcServer.setQuota(src, nsQuota, ssQuota, null); } catch (IOException ioe) { LOG.error("Unable to set quota at remote location for " + src, ioe); } } {code} Line137: This line isn't needed. Line176: In quota update service, we don't really need to use parameter {{updateQuotaCache}}. Why not just set {{false}}. And no need to pass {{updateQuotaCache}} parameter. Haven't fully reviewed the UT, but I think we need to add a new test case for quota cache updating behaviour since we introduce the {{updateQuotaCache}} flag for mount table getting. > RBF: Race condition between router admin quota update and periodic quota > update service > --- > > Key: HDFS-13811 > URL: https://issues.apache.org/jira/browse/HDFS-13811 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Dibyendu Karmakar >Assignee: Dibyendu Karmakar >Priority: Major > Attachments: HDFS-13811-000.patch, HDFS-13811-HDFS-13891-000.patch > > > If we try to update quota of an existing mount entry and at the same time > periodic quota update service is running on the same mount entry, it is > leading the mount table to _inconsistent state._ > Here transactions are: > A - Quota update service is fetching mount table entries. > B - Quota update service is updating the mount table with
[jira] [Comment Edited] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
[ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695590#comment-16695590 ] Dibyendu Karmakar edited comment on HDFS-13811 at 11/22/18 6:33 AM: Thanks [~linyiqun] for the review. Here I am taking the following approach- Updating the quota in state store is an admin call, If we try to update statestore internally by periodic service it will lead to race condition. IMO we should not update the mount table statestore internally, only the admin call should be able to update mount table statestore. In this patch I am restricting the periodic service to update mount table state store. Periodic service will only update the quota usage and store it in local cache. So, whenever the get call will come it will fetch all the other details from satestore and quota usage will be fetched from the cache. During get call if we get null value from cache, for quota usage then also we are trying to update the usage once, for that specific mount entry. Though there is a bit code change but I think it will keep the functionality clean. Please let me know your thoughts, [~linyiqun]. was (Author: dibyendu_hadoop): Thanks [~linyiqun] for the review. Here I am taking the following approach- Updating the quota in state store is an admin call, If we try to update statestore internally by periodic service it will lead to race condition. IMO we should not update the mount table statestore internally, only the admin call should be able to update mount table statestore. In this patch I am restricting the periodic service to update mount table state store. Periodic service will only update the quota usage and store it in local cache. So, whenever the get call will come it will fetch all the other details from satestore and quota usage will be fetched from the cache. During get call if we get null value from cache, for quota usage then also we are trying to update the usage once, for that specific mount entry. Though there is a bit code change but I think it will keep the functionality clean. Please let me know your thoughts [~linyiqun]. > RBF: Race condition between router admin quota update and periodic quota > update service > --- > > Key: HDFS-13811 > URL: https://issues.apache.org/jira/browse/HDFS-13811 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Dibyendu Karmakar >Assignee: Dibyendu Karmakar >Priority: Major > Attachments: HDFS-13811-000.patch > > > If we try to update quota of an existing mount entry and at the same time > periodic quota update service is running on the same mount entry, it is > leading the mount table to _inconsistent state._ > Here transactions are: > A - Quota update service is fetching mount table entries. > B - Quota update service is updating the mount table with current usage. > A' - User is trying to update quota using admin cmd. > and the transaction sequence is [ A A' B ] > quota update service is updating the mount table with old quota value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
[ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694887#comment-16694887 ] Yiqun Lin edited comment on HDFS-13811 at 11/22/18 2:58 AM: Take a quick look for the patch, looks like we change a lot. Thinking again for the corner case mentioned, can we use a more easier way: 1) When prepare to update mount tables in RouterQuotaUpdateService#updateMountTableEntries, get the latest mount table entries in quota update service. 2) When latest quota value is not matched with fetched before, then generate a new quota value(use latest quota and latest quota usage). I mean we can just take an additional check when updating the quota usage in update service. At least one thing we should promise: admin cmd be respected and making sense.. Maybe there will still be a little probability race condition happened during updating mount tables in update service. But I think this will be okay for us. The most of time is spent on quota usage get/update in RouterQuotaUpdateService#periodicInvoke. If this way is okay, I'm not sure other change is really needed. Please let me know your thought, [~dibyendu_hadoop]. I think this way can address your case. was (Author: linyiqun): Take a quick look for the patch, looks like we change a lot. Thinking again for the corner case mentioned, can we use a more easier way: 1) When prepare to update mount tables in RouterQuotaUpdateService#updateMountTableEntries, get the latest mount table entries in quota update service. 2) When latest quota value is not matched with fetched before, then generate a new quota value(use latest quota and latest quota usage). I mean we can just take an additional check when updating the quota usage in update service. At least one thing we should promise: admin cmd be respected and making sense.. If this way is okay, I'm not sure other change is really needed. Please let me know your thought, [~dibyendu_hadoop]. I think this way can address your case. > RBF: Race condition between router admin quota update and periodic quota > update service > --- > > Key: HDFS-13811 > URL: https://issues.apache.org/jira/browse/HDFS-13811 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Dibyendu Karmakar >Assignee: Dibyendu Karmakar >Priority: Major > Attachments: HDFS-13811-000.patch > > > If we try to update quota of an existing mount entry and at the same time > periodic quota update service is running on the same mount entry, it is > leading the mount table to _inconsistent state._ > Here transactions are: > A - Quota update service is fetching mount table entries. > B - Quota update service is updating the mount table with current usage. > A' - User is trying to update quota using admin cmd. > and the transaction sequence is [ A A' B ] > quota update service is updating the mount table with old quota value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
[ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694887#comment-16694887 ] Yiqun Lin edited comment on HDFS-13811 at 11/21/18 4:08 PM: Take a quick look for the patch, looks like we change a lot. Thinking again for the corner case mentioned, can we use a more easier way: 1) When prepare to update mount tables in RouterQuotaUpdateService#updateMountTableEntries, get the latest mount table entries in quota update service. 2) When latest quota value is not matched with fetched before, then generate a new quota value(use latest quota and latest quota usage). I mean we can just take an additional check when updating the quota usage in update service. At least one thing we should promise: admin cmd be respected and making sense.. If this way is okay, I'm not sure other change is really needed. Please let me know your thought, [~dibyendu_hadoop]. I think this way can address your case. was (Author: linyiqun): Take a quick look for the patch, looks like we change a lot. Thinking again for the corner case mentioned, can we use a more easier way: 1) When prepare to update mount tables in RouterQuotaUpdateService#updateMountTableEntries, get the latest mount table entries in quota update service. 2) When latest quota value is not matched with fetched before, then generate a new quota value. I mean we can just take an additional check when updating the quota usage in update service. At least one thing we should promise: admin cmd be respected and making sense.. If this way is okay, I'm not sure other change is really needed. Please let me know your thought, [~dibyendu_hadoop]. I think this way can address your case. > RBF: Race condition between router admin quota update and periodic quota > update service > --- > > Key: HDFS-13811 > URL: https://issues.apache.org/jira/browse/HDFS-13811 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Dibyendu Karmakar >Assignee: Dibyendu Karmakar >Priority: Major > Attachments: HDFS-13811-000.patch > > > If we try to update quota of an existing mount entry and at the same time > periodic quota update service is running on the same mount entry, it is > leading the mount table to _inconsistent state._ > Here transactions are: > A - Quota update service is fetching mount table entries. > B - Quota update service is updating the mount table with current usage. > A' - User is trying to update quota using admin cmd. > and the transaction sequence is [ A A' B ] > quota update service is updating the mount table with old quota value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org