[jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171883#comment-16171883 ]

Sean Mackrory edited comment on HDFS-10702 at 9/19/17 3:38 PM:
---------------------------------------------------------------

The assumption of this feature is that an application is responsible for knowing when a dataset is stable enough to work on, and that events that happen after the specified transaction ID may affect the accuracy of the results as seen by the application. There are obviously cases where it isn't reasonable for an application to make an assumption like that, but like I said above, this isn't intended for every situation.

That said, I'd be all for testing the sequence you described to verify exactly how it fails and that it doesn't bring all of HDFS down with it - just the client. But if a file is deleted after the specified transaction ID and the application tries to access it, returning an exception would be the correct behavior, IMO.

I was actually wondering if what you meant was that the block locations were out of date because the file had been re-replicated in a different configuration due to cluster health issues or decommissioning. Cluster state is distinct from an application knowing when it's safe to assume that a dataset is finalized, so that complicates the assumption somewhat. But if it's just a clearly stated assumption that this feature transfers responsibility for knowing that a dataset is complete to the client application, and we test that accessing a deleted file fails in a correct manner, would that address your concerns, [~mingma]?

was (Author: mackrorysd):
The assumption of this feature is that an application is responsible for knowing when a dataset is stable enough to work on, and that any failures or inaccuracies resulting in stuff that happens after the minimum transaction ID is assumed by the application. There are obviously case where that's not reasonable, but like I said above, this isn't intended for every situation.

That said, I'd be all for testing the sequence you described to verify exactly how it fails and that it doesn't bring all of HDFS down with it - just the client. But if a file is deleted after the specified transaction ID and the application tries to access it, returning an exception would be the correct behavior, IMO.

I was actually wondering if what you meant was the block locations were out of date because the file had been re-replicated in a different configuration due to cluster health issues, or decommissioning. Cluster state is distinct from an application knowing when it's safe to assume that a dataset is finalized, so that complicates the assumption somewhat. But if it's just a clearly stated assumption that this feature transfers responsibility for knowing that a dataset is complete to the client application and we test the accessing a deleted file fails in a correct manner, would that address your concerns, [~mingma]?

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10702
>                 URL: https://issues.apache.org/jira/browse/HDFS-10702
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jiayi Zhou
>            Assignee: Sean Mackrory
>            Priority: Minor
>         Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch,
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch,
> HDFS-10702.006.patch, HDFS-10702.007.patch, HDFS-10702.008.patch,
> StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing
> any metadata operation, which means active NameNode could be a bottleneck for
> scalability. One way to solve this problem is to send read-only operations to
> Standby NameNode. The disadvantage is that it might be a stale read.
> Here, I'm thinking of adding a Client API to enable/disable stale read from
> Standby which gives Client the power to set the staleness restriction.
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
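The behavior discussed in the comment above (a read is served only once the standby has caught up to the client's minimum transaction ID, and a file deleted after that point produces an exception for that client only) can be sketched roughly as follows. This is an illustration under stated assumptions, not code from any HDFS-10702 patch: the class `StandbyNamespaceSketch` and all of its methods are hypothetical, and the choice of `FileNotFoundException` for the deleted-file case is an assumption about what "returning an exception" might look like.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a Standby NameNode's namespace view; not HDFS code.
class StandbyNamespaceSketch {
    private final Set<String> paths = new HashSet<>();
    private long lastAppliedTxId = 0;

    // Apply an edit that creates a file, advancing the applied transaction ID.
    void applyCreate(String path) {
        paths.add(path);
        lastAppliedTxId++;
    }

    // Apply an edit that deletes a file, advancing the applied transaction ID.
    void applyDelete(String path) {
        paths.remove(path);
        lastAppliedTxId++;
    }

    long getLastAppliedTxId() {
        return lastAppliedTxId;
    }

    // Serve a read only if this view has caught up to the client's minimum txid.
    String getFileInfo(String path, long minTxId) throws IOException {
        if (lastAppliedTxId < minTxId) {
            throw new IOException("standby at txid " + lastAppliedTxId
                + " is behind requested minimum " + minTxId);
        }
        if (!paths.contains(path)) {
            // The file was deleted (or never existed): only this call fails.
            throw new FileNotFoundException(path);
        }
        return path + " as of txid " + lastAppliedTxId;
    }
}
```

In this sketch the minimum transaction ID is only a lower bound. A delete applied after it is still visible to the reader, and the failure surfaces as an exception on that one call rather than as any change to the namespace itself.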
[jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171883#comment-16171883 ]

Sean Mackrory edited comment on HDFS-10702 at 9/19/17 3:36 PM:
---------------------------------------------------------------

The assumption of this feature is that an application is responsible for knowing when a dataset is stable enough to work on, and that any failures or inaccuracies resulting from stuff that happens after the minimum transaction ID are assumed by the application. There are obviously cases where that's not reasonable, but like I said above, this isn't intended for every situation.

That said, I'd be all for testing the sequence you described to verify exactly how it fails and that it doesn't bring all of HDFS down with it - just the client. But if a file is deleted after the specified transaction ID and the application tries to access it, returning an exception would be the correct behavior, IMO.

I was actually wondering if what you meant was that the block locations were out of date because the file had been re-replicated in a different configuration due to cluster health issues or decommissioning. Cluster state is distinct from an application knowing when it's safe to assume that a dataset is finalized, so that complicates the assumption somewhat. But if it's just a clearly stated assumption that this feature transfers responsibility for knowing that a dataset is complete to the client application, and we test that accessing a deleted file fails in a correct manner, would that address your concerns, [~mingma]?

was (Author: mackrorysd):
The assumption of this feature is that an application is responsible for knowing when a dataset is stable enough to work on, and that any failures or inaccuracies resulting in stuff that happens after the minimum transaction ID is assumed by the application.

That said, I'd be all for testing the scenario above to verify exactly how it fails and that it doesn't bring all of HDFS down with it - just the client. But if file is deleted after the specified transaction and the application tries to access it, returning an exception would be the correct behavior.

I was actually wondering if what you meant was the block locations were out of date because the file had been re-replicated in a different configuration due to cluster health issues, or decommissioning. Cluster state is distinct from an application knowing when it's safe to assume that a dataset is finalized, so that complicates the assumption somewhat. But if it's just a clearly stated assumption that this feature transfers reponsibility for knowing that a dataset is complete to the client application and we test the accessing a deleted file fails in a correct manner, would that address your concerns, [~mingma]?

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10702
>                 URL: https://issues.apache.org/jira/browse/HDFS-10702
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jiayi Zhou
>            Assignee: Sean Mackrory
>            Priority: Minor
>         Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch,
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch,
> HDFS-10702.006.patch, HDFS-10702.007.patch, HDFS-10702.008.patch,
> StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing
> any metadata operation, which means active NameNode could be a bottleneck for
> scalability. One way to solve this problem is to send read-only operations to
> Standby NameNode. The disadvantage is that it might be a stale read.
> Here, I'm thinking of adding a Client API to enable/disable stale read from
> Standby which gives Client the power to set the staleness restriction.
[jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816062#comment-15816062 ]

Sean Mackrory edited comment on HDFS-10702 at 1/10/17 8:25 PM:
---------------------------------------------------------------

Attaching a patch that I believe wraps up everyone's feedback so far. Specifically:

* A SyncInfo object is used both to request current transaction info, and submit minimum transaction info in requests - as suggested by [~andrew.wang] to future-proof the interface.
* I had a good look at removing that from the RPC layer. Unless it's in the header, we'd have to add it to each individual request, and the more I look at it the more cumbersome it is to remove. The best solution there is having an HDFS-specific RPC header. I don't think it's that bad that it's in the RPC header, myself - no immediate plans to use this for YARN, obviously, but specifying bounds on the staleness of data could definitely be more generally useful for distributed systems than just in HDFS.
* On a similar note, I was a little concerned about it being a static ThreadLocal on the RPC client, but it seems there are other analogous settings there so I gather there's some guarantee that clients for different filesystems are in different threads?
* I also had a look at supporting federation better. There's some more work I need to do there - it wasn't immediately working for me the way it seems it should. That's easy to add on later, though. I would suggest that for that and for the optimization of using the non-checkpointing standby I file a follow-up JIRA and build on top of this patch as-is.
* I had a look at the checkpointer, and I didn't see any dangerous assumptions that it was the only one reading the state.

Thanks for the reviews everyone!

was (Author: mackrorysd):
Attaching a patch that I believe wraps up everyone's feedback so far. Specifically:

* A SyncInfo object is used both to request current transaction info, and submit minimum transaction info in requests - as suggested by awang to future-proof the interface.
* I had a good look at removing that from the RPC layer. Unless it's in the header, we'd have to add it to each individual request, and the more I look at it the more cumbersome it is to remove. The best solution there is having an HDFs-specific RPC header. I don't think it's that bad that it's in the RPC header, myself - no immediate plans to use this for YARN, obviously, but specifying bounds on the staleness of data could definitely more generally useful for dist. systems than just in HDFS.
* On a similar note, I was a little concerned about it being a static ThreadLocal on the RPC client, but it seems there are other analagous settings there so I gather there's some guarantee that clients for different filesystems are in different threads?
* I also had a look at supporting federation better. There's some more work I need to do there - it wasn't immediately working for me the way it seems it should. That's easy to add on later, though. I would suggest that for that and for the optimization of using the non-checkpointing standby I file a follow-up JIRA and build on top of this patch as-is.
* I had a look at the checkpointer, and I didn't see any dangerous assumptions that it was the only one reading the state.

Thanks for the reviews everyone!

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10702
>                 URL: https://issues.apache.org/jira/browse/HDFS-10702
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jiayi Zhou
>            Assignee: Jiayi Zhou
>            Priority: Minor
>         Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch,
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch,
> HDFS-10702.006.patch, HDFS-10702.007.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing
> any metadata operation, which means active NameNode could be a bottleneck for
> scalability. One way to solve this problem is to send read-only operations to
> Standby NameNode. The disadvantage is that it might be a stale read.
> Here, I'm thinking of adding a Client API to enable/disable stale read from
> Standby which gives Client the power to set the staleness restriction.

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
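The ThreadLocal concern raised in the comment above can be made concrete with a small sketch. Everything here is hypothetical: `SyncInfoSketch` and `RpcClientSketch` are illustrative stand-ins, not the patch's SyncInfo class or Hadoop's RPC client, and the "header" is just a string. The sketch only shows the general property of a static `ThreadLocal`: every client instance used from the same thread sees the same value, regardless of which filesystem it talks to.

```java
// Hypothetical names throughout; not the patch's SyncInfo or Hadoop's RPC client.
final class SyncInfoSketch {
    final long minTxId;

    SyncInfoSketch(long minTxId) {
        this.minTxId = minTxId;
    }
}

class RpcClientSketch {
    // One slot per thread, shared by every RpcClientSketch used on that thread.
    private static final ThreadLocal<SyncInfoSketch> SYNC_INFO = new ThreadLocal<>();

    static void setSyncInfo(SyncInfoSketch info) {
        SYNC_INFO.set(info);
    }

    static void clearSyncInfo() {
        SYNC_INFO.remove();
    }

    private final String target;

    RpcClientSketch(String target) {
        this.target = target;
    }

    // Build a request "header" carrying whatever bound the calling thread last set.
    String buildHeader(String method) {
        SyncInfoSketch info = SYNC_INFO.get();
        String bound = (info == null) ? "none" : Long.toString(info.minTxId);
        return target + "/" + method + " minTxId=" + bound;
    }
}
```

If one thread calls `RpcClientSketch.setSyncInfo(new SyncInfoSketch(42))` and then uses two clients for different targets, both headers carry `minTxId=42`, while a client used from a different thread carries `none`. A transaction ID is only meaningful for the namespace that issued it, which is why the question of whether clients for different filesystems ever share a thread matters.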
[jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608805#comment-15608805 ]

Sean Mackrory edited comment on HDFS-10702 at 10/26/16 3:39 PM:

I think we should also consider ways to not send requests to multiple SNNs. I did some testing with the latest patch, and while this does indeed decrease traffic to the NN and increase performance when a single SNN is there, if you have multiple SNNs this only decreases traffic to the NN - it actually also decreases the performance of the client and would further increase overall network usage. I don't think we need to be particularly sophisticated about picking the most ideal SNN, especially since this feature inherently accepts the idea of not having the most up to date metadata.

was (Author: mackrorysd):
I think we should also consider ways to not send requests to multiple SNNs. I did some testing, and while this does indeed decrease traffic to the NN and increase performance when a single SNN is there, if you have multiple SNNs this only decreases traffic to the NN - it actually also decreases the performance of the client and would further increase overall network usage. I don't think we need to be particularly sophisticated about picking the most ideal SNN, especially since this feature inherently accepts the idea of not having the most up to date metadata.

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10702
>                 URL: https://issues.apache.org/jira/browse/HDFS-10702
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jiayi Zhou
>            Assignee: Jiayi Zhou
>            Priority: Minor
>         Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch,
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch,
> StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing
> any metadata operation, which means active NameNode could be a bottleneck for
> scalability. One way to solve this problem is to send read-only operations to
> Standby NameNode. The disadvantage is that it might be a stale read.
> Here, I'm thinking of adding a Client API to enable/disable stale read from
> Standby which gives Client the power to set the staleness restriction.
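One unsophisticated way to avoid sending each read to multiple SNNs, in the spirit of the comment above, is to pick a single standby at random per request. The sketch below is only an illustration of that idea under stated assumptions: `StandbyPickerSketch` is a hypothetical class, standbys are represented by plain address strings, and nothing here reflects how the patch's proxy provider actually selects a NameNode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper; not part of any HDFS-10702 patch.
class StandbyPickerSketch {
    private final List<String> standbys;
    private final Random random;

    StandbyPickerSketch(List<String> standbys, Random random) {
        if (standbys.isEmpty()) {
            throw new IllegalArgumentException("no standbys configured");
        }
        // Defensive copy so later changes to the caller's list do not affect us.
        this.standbys = new ArrayList<>(standbys);
        this.random = random;
    }

    // Pick one standby uniformly at random; no attempt to find the "best" one.
    String pick() {
        return standbys.get(random.nextInt(standbys.size()));
    }
}
```

With this approach each read costs one request no matter how many SNNs are configured, at the price of sometimes landing on a standby that is further behind than another would have been.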
[jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428772#comment-15428772 ]

Jiayi Zhou edited comment on HDFS-10702 at 8/19/16 8:35 PM:

Upload a patch to fix some of the issues commented by Andrew. I haven't worked on the RpcHeader modification yet, want to see if the new patch runs properly first.

* Fix nitty issues.
* Remove MethodCategory and use annotations instead. Add a new annotation ReadOnly.
* Idea of SyncInfo is also to add more fields in the future. NamenodeProtocol#getTransactionID is commented as NameNodeProtocol rather than ClientProtocol, so I add getSyncInfo() for ClientProtocol. Remove state from SyncInfo because we no longer need it.

was (Author: clouderajiayi):
Upload a path to fix some of the issues commented by Andrew. I haven't worked on the RpcHeader modification yet, want to see if the new patch runs properly first.

* Fix nitty issues.
* Remove MethodCategory and use annotations instead. Add a new annotation ReadOnly.
* Idea of SyncInfo is also to add more fields in the future. NamenodeProtocol#getTransactionID is commented as NameNodeProtocol rather that ClientProtocol, so I add getSyncInfo() for ClientProtocol. Remove state from SyncInfo because we no longer need it.

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10702
>                 URL: https://issues.apache.org/jira/browse/HDFS-10702
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jiayi Zhou
>            Assignee: Jiayi Zhou
>            Priority: Minor
>         Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch,
> HDFS-10702.003.patch, HDFS-10702.004.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing
> any metadata operation, which means active NameNode could be a bottleneck for
> scalability. One way to solve this problem is to send read-only operations to
> Standby NameNode. The disadvantage is that it might be a stale read.
> Here, I'm thinking of adding a Client API to enable/disable stale read from
> Standby which gives Client the power to set the staleness restriction.
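The move from a MethodCategory to a ReadOnly annotation, as described in the comment above, might look roughly like the following. This is a sketch under stated assumptions rather than the patch's code: the annotation is named `ReadOnlySketch` to make clear it is not the real `ReadOnly`, `ClientProtocolSketch` is a hypothetical two-method protocol, and the routing helper is invented for illustration. It relies only on standard Java reflection (`Method.isAnnotationPresent`).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical marker annotation; the patch's real ReadOnly may differ.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ReadOnlySketch {}

// Hypothetical protocol with one read operation and one write operation.
interface ClientProtocolSketch {
    @ReadOnlySketch
    String getFileInfo(String path);

    boolean delete(String path);
}

class ReadOnlyRouterSketch {
    // A call may go to a standby only if its protocol method is marked read-only.
    static boolean mayUseStandby(Method method) {
        return method.isAnnotationPresent(ReadOnlySketch.class);
    }

    // Look up a protocol method by name and report where it would be routed.
    static String route(String methodName, Class<?>... paramTypes) {
        try {
            Method m = ClientProtocolSketch.class.getMethod(methodName, paramTypes);
            return mayUseStandby(m) ? "standby" : "active";
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("unknown protocol method: " + methodName, e);
        }
    }
}
```

Compared with a separate category table, an annotation keeps the read-only marking next to the method declaration, so a newly added protocol method defaults to the active NameNode unless it is explicitly marked.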