[jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

2017-09-19 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171883#comment-16171883
 ] 

Sean Mackrory edited comment on HDFS-10702 at 9/19/17 3:38 PM:
---

The assumption of this feature is that an application is responsible for 
knowing when a dataset is stable enough to work on, and that events that happen 
after that transaction ID may affect the accuracy of the results as seen by the 
application. There are obviously cases where it isn't reasonable for an 
application to make an assumption like that, but like I said above, this isn't 
intended for every situation. That said, I'd be all for testing the sequence 
you described to verify exactly how it fails and that it doesn't bring all of 
HDFS down with it - just the client. But if a file is deleted after the 
specified transaction ID and the application tries to access it, returning an 
exception would be the correct behavior, IMO.

I was actually wondering if what you meant was the block locations were out of 
date because the file had been re-replicated in a different configuration due 
to cluster health issues, or decommissioning. Cluster state is distinct from an 
application knowing when it's safe to assume that a dataset is finalized, so 
that complicates the assumption somewhat.

But if it's just a clearly stated assumption that this feature transfers 
responsibility for knowing that a dataset is complete to the client application 
and we test that accessing a deleted file fails in a correct manner, would that 
address your concerns, [~mingma]?
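
To make the failure mode discussed above concrete, here is a minimal sketch in plain Java. It is not code from any of the attached patches: StandbySketch, applyCreate, applyDelete and getFileInfo are hypothetical names, and the standby is modeled as an in-memory set of paths plus a counter for the last applied transaction ID. It assumes the client passes the minimum transaction ID it is willing to accept.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a standby that has applied edits up to some transaction ID.
final class StandbySketch {
    private final Set<String> paths = new HashSet<>();
    private long lastAppliedTxId = 0;

    // Each applied edit advances the transaction ID by one (an assumption of this sketch).
    void applyCreate(String path) { paths.add(path); lastAppliedTxId++; }
    void applyDelete(String path) { paths.remove(path); lastAppliedTxId++; }

    long getLastAppliedTxId() { return lastAppliedTxId; }

    // Serve the read only if this standby has caught up to the client's minimum txid.
    String getFileInfo(String path, long minTxId) throws IOException {
        if (lastAppliedTxId < minTxId) {
            throw new IOException("standby at txid " + lastAppliedTxId
                + " is behind requested minimum " + minTxId);
        }
        // A file deleted after the client's minimum txid is simply gone:
        // the client gets an exception, nothing else is affected.
        if (!paths.contains(path)) {
            throw new FileNotFoundException(path);
        }
        return "file:" + path;
    }
}
```

In this model, an application that records the transaction ID at which it considered its dataset stable and later reads a path that was deleted afterwards gets a FileNotFoundException, which is the behavior argued for above.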


was (Author: mackrorysd):
The assumption of this feature is that an application is responsible for 
knowing when a dataset is stable enough to work on, and that any failures or 
inaccuracies resulting from things that happen after the minimum transaction ID 
are assumed by the application. There are obviously cases where that's not 
reasonable, but like I said above, this isn't intended for every situation. 
That said, I'd be all for testing the sequence you described to verify exactly 
how it fails and that it doesn't bring all of HDFS down with it - just the 
client. But if a file is deleted after the specified transaction ID and the 
application tries to access it, returning an exception would be the correct 
behavior, IMO.

I was actually wondering if what you meant was the block locations were out of 
date because the file had been re-replicated in a different configuration due 
to cluster health issues, or decommissioning. Cluster state is distinct from an 
application knowing when it's safe to assume that a dataset is finalized, so 
that complicates the assumption somewhat.

But if it's just a clearly stated assumption that this feature transfers 
responsibility for knowing that a dataset is complete to the client application 
and we test that accessing a deleted file fails in a correct manner, would that 
address your concerns, [~mingma]?

> Add a Client API and Proxy Provider to enable stale read from Standby
> -
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jiayi Zhou
> Assignee: Sean Mackrory
> Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, 
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, 
> HDFS-10702.006.patch, HDFS-10702.007.patch, HDFS-10702.008.patch, 
> StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing 
> any metadata operation, which means active NameNode could be a bottleneck for 
> scalability. One way to solve this problem is to send read-only operations to 
> Standby NameNode. The disadvantage is that it might be a stale read. 
> Here, I'm thinking of adding a Client API to enable/disable stale read from 
> Standby which gives Client the power to set the staleness restriction.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

2017-09-19 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171883#comment-16171883
 ] 

Sean Mackrory edited comment on HDFS-10702 at 9/19/17 3:36 PM:
---

The assumption of this feature is that an application is responsible for 
knowing when a dataset is stable enough to work on, and that any failures or 
inaccuracies resulting from things that happen after the minimum transaction ID 
are assumed by the application. There are obviously cases where that's not 
reasonable, but like I said above, this isn't intended for every situation. 
That said, I'd be all for testing the sequence you described to verify exactly 
how it fails and that it doesn't bring all of HDFS down with it - just the 
client. But if a file is deleted after the specified transaction ID and the 
application tries to access it, returning an exception would be the correct 
behavior, IMO.

I was actually wondering if what you meant was the block locations were out of 
date because the file had been re-replicated in a different configuration due 
to cluster health issues, or decommissioning. Cluster state is distinct from an 
application knowing when it's safe to assume that a dataset is finalized, so 
that complicates the assumption somewhat.

But if it's just a clearly stated assumption that this feature transfers 
responsibility for knowing that a dataset is complete to the client application 
and we test that accessing a deleted file fails in a correct manner, would that 
address your concerns, [~mingma]?


was (Author: mackrorysd):
The assumption of this feature is that an application is responsible for 
knowing when a dataset is stable enough to work on, and that any failures or 
inaccuracies resulting from things that happen after the minimum transaction ID 
are assumed by the application. That said, I'd be all for testing the scenario 
above to verify exactly how it fails and that it doesn't bring all of HDFS down 
with it - just the client. But if a file is deleted after the specified 
transaction and the application tries to access it, returning an exception 
would be the correct behavior.

I was actually wondering if what you meant was the block locations were out of 
date because the file had been re-replicated in a different configuration due 
to cluster health issues, or decommissioning. Cluster state is distinct from an 
application knowing when it's safe to assume that a dataset is finalized, so 
that complicates the assumption somewhat.

But if it's just a clearly stated assumption that this feature transfers 
responsibility for knowing that a dataset is complete to the client application 
and we test that accessing a deleted file fails in a correct manner, would that 
address your concerns, [~mingma]?

> Add a Client API and Proxy Provider to enable stale read from Standby
> -
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jiayi Zhou
> Assignee: Sean Mackrory
> Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, 
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, 
> HDFS-10702.006.patch, HDFS-10702.007.patch, HDFS-10702.008.patch, 
> StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing 
> any metadata operation, which means active NameNode could be a bottleneck for 
> scalability. One way to solve this problem is to send read-only operations to 
> Standby NameNode. The disadvantage is that it might be a stale read. 
> Here, I'm thinking of adding a Client API to enable/disable stale read from 
> Standby which gives Client the power to set the staleness restriction.






[jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

2017-01-10 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816062#comment-15816062
 ] 

Sean Mackrory edited comment on HDFS-10702 at 1/10/17 8:25 PM:
---

Attaching a patch that I believe wraps up everyone's feedback so far. 
Specifically:

* A SyncInfo object is used both to request current transaction info, and 
submit minimum transaction info in requests - as suggested by [~andrew.wang] to 
future-proof the interface.
* I had a good look at removing that from the RPC layer. Unless it's in the 
header, we'd have to add it to each individual request, and the more I look at 
it the more cumbersome it is to remove. The best solution there is having an 
HDFS-specific RPC header. I don't think it's that bad that it's in the RPC 
header, myself - no immediate plans to use this for YARN, obviously, but 
specifying bounds on the staleness of data could definitely be more generally 
useful for distributed systems than just in HDFS. 
* On a similar note, I was a little concerned about it being a static 
ThreadLocal on the RPC client, but it seems there are other analogous settings 
there so I gather there's some guarantee that clients for different filesystems 
are in different threads?
* I also had a look at supporting federation better. There's some more work I 
need to do there - it wasn't immediately working for me the way it seems it 
should. That's easy to add on later, though. I would suggest that for that and 
for the optimization of using the non-checkpointing standby I file a follow-up 
JIRA and build on top of this patch as-is.
* I had a look at the checkpointer, and I didn't see any dangerous assumptions 
that it was the only one reading the state.
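
As a rough illustration of the SyncInfo and ThreadLocal points in the list above, here is a minimal sketch. It is not the code in the attached patch: SyncInfoSketch and StaleBoundContext are hypothetical names, and the only field modeled is a transaction ID. It shows why a static ThreadLocal scopes the staleness bound to a thread rather than to a particular filesystem client.

```java
// Hypothetical stand-in for SyncInfo: carries only a transaction ID for now,
// leaving room for more fields to be added later.
final class SyncInfoSketch {
    private final long txId;

    SyncInfoSketch(long txId) { this.txId = txId; }

    long getTxId() { return txId; }
}

// Hypothetical holder for the minimum sync info that a client-side RPC layer
// would copy into the request header.
final class StaleBoundContext {
    // Static ThreadLocal: one value per thread, shared by every caller on that thread.
    private static final ThreadLocal<SyncInfoSketch> MIN_SYNC = new ThreadLocal<>();

    static void set(SyncInfoSketch info) { MIN_SYNC.set(info); }

    // Returns null when no bound has been set on the calling thread.
    static SyncInfoSketch get() { return MIN_SYNC.get(); }

    static void clear() { MIN_SYNC.remove(); }
}
```

A bound set on one thread is not visible from another thread, but two clients used from the same thread would see the same bound, which is the scenario the ThreadLocal question above is about.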

Thanks for the reviews everyone!


was (Author: mackrorysd):
Attaching a patch that I believe wraps up everyone's feedback so far. 
Specifically:

* A SyncInfo object is used both to request current transaction info, and 
submit minimum transaction info in requests - as suggested by awang to 
future-proof the interface.
* I had a good look at removing that from the RPC layer. Unless it's in the 
header, we'd have to add it to each individual request, and the more I look at 
it the more cumbersome it is to remove. The best solution there is having an 
HDFS-specific RPC header. I don't think it's that bad that it's in the RPC 
header, myself - no immediate plans to use this for YARN, obviously, but 
specifying bounds on the staleness of data could definitely be more generally 
useful for distributed systems than just in HDFS. 
* On a similar note, I was a little concerned about it being a static 
ThreadLocal on the RPC client, but it seems there are other analogous settings 
there so I gather there's some guarantee that clients for different filesystems 
are in different threads?
* I also had a look at supporting federation better. There's some more work I 
need to do there - it wasn't immediately working for me the way it seems it 
should. That's easy to add on later, though. I would suggest that for that and 
for the optimization of using the non-checkpointing standby I file a follow-up 
JIRA and build on top of this patch as-is.
* I had a look at the checkpointer, and I didn't see any dangerous assumptions 
that it was the only one reading the state.

Thanks for the reviews everyone!

> Add a Client API and Proxy Provider to enable stale read from Standby
> -
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jiayi Zhou
> Assignee: Jiayi Zhou
> Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, 
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, 
> HDFS-10702.006.patch, HDFS-10702.007.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing 
> any metadata operation, which means active NameNode could be a bottleneck for 
> scalability. One way to solve this problem is to send read-only operations to 
> Standby NameNode. The disadvantage is that it might be a stale read. 
> Here, I'm thinking of adding a Client API to enable/disable stale read from 
> Standby which gives Client the power to set the staleness restriction.






[jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

2016-10-26 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608805#comment-15608805
 ] 

Sean Mackrory edited comment on HDFS-10702 at 10/26/16 3:39 PM:


I think we should also consider ways to not send requests to multiple SNNs. I 
did some testing with the latest patch, and while this does indeed decrease 
traffic to the NN and increase performance when a single SNN is there, if you 
have multiple SNNs this only decreases traffic to the NN - it actually also 
decreases the performance of the client and would further increase overall 
network usage. I don't think we need to be particularly sophisticated about 
picking the most ideal SNN, especially since this feature inherently accepts 
the idea of not having the most up-to-date metadata.
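
One unsophisticated way to avoid fanning a read out to every SNN is to pick a single standby per request. The sketch below is only an illustration of that idea, not what the patch does: StandbyPickerSketch is a hypothetical name, standbys are represented as plain address strings, and uniform random choice is an assumed policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical picker: each read goes to exactly one standby instead of all of them.
final class StandbyPickerSketch {
    private final List<String> standbys;
    private final Random random;

    StandbyPickerSketch(List<String> standbys, Random random) {
        if (standbys.isEmpty()) {
            throw new IllegalArgumentException("at least one standby is required");
        }
        // Defensive copy so later changes to the caller's list do not affect the picker.
        this.standbys = new ArrayList<>(standbys);
        this.random = random;
    }

    // Uniform random choice; no attempt is made to find the most up-to-date standby.
    String pick() {
        return standbys.get(random.nextInt(standbys.size()));
    }
}
```

With a single configured standby this always returns that standby, so the single-SNN case behaves as before.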


was (Author: mackrorysd):
I think we should also consider ways to not send requests to multiple SNNs. I 
did some testing, and while this does indeed decrease traffic to the NN and 
increase performance when a single SNN is there, if you have multiple SNNs this 
only decreases traffic to the NN - it actually also decreases the performance 
of the client and would further increase overall network usage. I don't think 
we need to be particularly sophisticated about picking the most ideal SNN, 
especially since this feature inherently accepts the idea of not having the 
most up-to-date metadata.

> Add a Client API and Proxy Provider to enable stale read from Standby
> -
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jiayi Zhou
> Assignee: Jiayi Zhou
> Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, 
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, 
> StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing 
> any metadata operation, which means active NameNode could be a bottleneck for 
> scalability. One way to solve this problem is to send read-only operations to 
> Standby NameNode. The disadvantage is that it might be a stale read. 
> Here, I'm thinking of adding a Client API to enable/disable stale read from 
> Standby which gives Client the power to set the staleness restriction.






[jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

2016-08-19 Thread Jiayi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428772#comment-15428772
 ] 

Jiayi Zhou edited comment on HDFS-10702 at 8/19/16 8:35 PM:


Uploaded a patch to fix some of the issues raised by Andrew. I haven't worked 
on the RpcHeader modification yet; I want to see if the new patch runs properly 
first.

* Fix nitty issues. 
* Remove MethodCategory and use annotations instead. Add a new annotation 
ReadOnly.
* The idea of SyncInfo is also to allow adding more fields in the future. 
NamenodeProtocol#getTransactionID belongs to NamenodeProtocol rather than 
ClientProtocol, so I added getSyncInfo() to ClientProtocol. Remove state from 
SyncInfo because we no longer need it. 
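
For readers unfamiliar with the annotation approach mentioned in the list above, here is a minimal sketch of how a marker annotation can replace a per-method category lookup. The names ReadOnlySketch, ClientProtocolSketch and ReadOnlyRouting are hypothetical and are not the classes in the patch; the routing rule (annotated methods may go to a standby, everything else goes to the active) is an assumption of this sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical marker annotation; RUNTIME retention so it can be read via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ReadOnlySketch {}

// Hypothetical slice of a client protocol with one read and one write method.
interface ClientProtocolSketch {
    @ReadOnlySketch
    String getFileInfo(String path);

    boolean delete(String path);
}

final class ReadOnlyRouting {
    // A method is a candidate for the standby only if it carries the marker.
    static boolean isReadOnly(Method method) {
        return method.isAnnotationPresent(ReadOnlySketch.class);
    }

    // Assumed routing rule: read-only calls may be stale, everything else needs the active.
    static String targetFor(Method method) {
        return isReadOnly(method) ? "standby" : "active";
    }
}
```

The benefit over a separate category table is that the read-only property lives next to the method declaration, so a new protocol method defaults to the active NameNode unless it is explicitly annotated.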


was (Author: clouderajiayi):
Upload a patch to fix some of the issues commented by Andrew. I haven't worked 
on the RpcHeader modification yet, want to see if the new patch runs properly 
first.

* Fix nitty issues. 
* Remove MethodCategory and use annotations instead. Add a new annotation 
ReadOnly.
* Idea of SyncInfo is also to add more fields in the future. 
NamenodeProtocol#getTransactionID is commented as NameNodeProtocol rather than 
ClientProtocol, so I add getSyncInfo() for ClientProtocol. Remove state from 
SyncInfo because we no longer need it. 

> Add a Client API and Proxy Provider to enable stale read from Standby
> -
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jiayi Zhou
> Assignee: Jiayi Zhou
> Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, 
> HDFS-10702.003.patch, HDFS-10702.004.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing 
> any metadata operation, which means active NameNode could be a bottleneck for 
> scalability. One way to solve this problem is to send read-only operations to 
> Standby NameNode. The disadvantage is that it might be a stale read. 
> Here, I'm thinking of adding a Client API to enable/disable stale read from 
> Standby which gives Client the power to set the staleness restriction.


