[jira] [Updated] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-21 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14660:

Attachment: HDFS-14660.000.patch

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.
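
A minimal sketch of the proposed check. The names and the "stateId not set"
sentinel below are illustrative assumptions, not the actual patch, which may
hook into the RPC server's state-alignment layer instead:
{code:java}
import org.apache.hadoop.ipc.StandbyException;

public final class ObserverStateIdCheck {
  // Assumption: a request whose RPC header carries no stateId surfaces as 0.
  private static final long STATE_ID_NOT_SET = 0L;

  static void rejectIfNoStateId(long clientStateId) throws StandbyException {
    if (clientStateId == STATE_ID_NOT_SET) {
      // Only ObserverReadProxyProvider clients populate the stateId, so a
      // missing one means the client cannot track read consistency; throwing
      // StandbyException makes it fail over to another NameNode.
      throw new StandbyException(
          "Request without a stateId received on an observer node");
    }
  }
}
{code}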






[jira] [Updated] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-21 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14660:

Status: Patch Available  (was: Open)

Thanks everyone for the discussion! Submitted patch v0. I would appreciate it 
if you could review it.

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Commented] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-21 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889832#comment-16889832
 ] 

Chao Sun commented on HDFS-14660:
-

{quote}For the special value we may raise a separate JIRA. (Just we should 
ensure it gets done too.)
{quote}
[~ayushtkn]: could you clarify this use case a bit? Is the purpose to let the 
client read stale data from the observer?

If so, you can already achieve this with {{msync}}, no? Just don't call 
{{msync}} from the client side.
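
For reference, a hedged sketch of the two behaviors through
{{ClientProtocol}} (HDFS-13688 introduced {{msync}}; the proxy wiring and the
{{getFileInfo}} call are only illustrative):
{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

public final class MsyncSketch {
  /** Consistent read: sync the client's stateId with the active first. */
  static HdfsFileStatus readAfterMsync(ClientProtocol proxy, String src)
      throws IOException {
    proxy.msync();                 // pick up the active's latest stateId
    return proxy.getFileInfo(src); // observer waits until it has caught up
  }

  /** Potentially stale read: skip msync and take whatever the observer has. */
  static HdfsFileStatus readWithoutMsync(ClientProtocol proxy, String src)
      throws IOException {
    return proxy.getFileInfo(src);
  }
}
{code}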

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Comment Edited] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-21 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889832#comment-16889832
 ] 

Chao Sun edited comment on HDFS-14660 at 7/21/19 10:08 PM:
---

{quote}For the special value we may raise a separate JIRA. (Just we should 
ensure it gets done too.)
{quote}
[~ayushtkn]: could you clarify this use case a bit? Is the purpose to let the 
client read stale data from the observer?

If so, you can already achieve this with {{msync}}, no? Just don't call 
{{msync}} from the client side.


was (Author: csun):
{quote}*For the special value we may raise separate JIRA.(Just we should ensure 
it gets done too.)
{quote}
[~ayushtkn]: could you clarify a little bit on this use case? is the purpose to 
let client read stale data from observer? 

If so, you can already achieve this with {{msync}}, no? just don't do {{msync}} 
from client side.

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Updated] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-21 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14660:

Attachment: HDFS-14660.001.patch

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Commented] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-22 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890315#comment-16890315
 ] 

Chao Sun commented on HDFS-14660:
-

{quote}Now what I think: even if ObserverProxyProvider is set and the client 
hasn't done msync, does the request now go to the Active rather than to the 
observer? Earlier it used to go to the observer and get served irrespective 
of the state the observer is at. Maybe I need to check again. Correct me if 
wrong.
{quote}

I think in this case the client's stateId will be smaller than the server's, 
so the observer will happily serve the request.
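
In other words, a simplified, illustrative sketch of the comparison (the real
logic lives in the server-side state-alignment code):
{code:java}
public final class StateIdComparisonSketch {
  /**
   * The observer can serve a read once its last applied transaction id has
   * reached the client's last-seen stateId. A client that never called msync
   * carries an old (small) stateId, so this returns true immediately and the
   * observer serves the request.
   */
  static boolean canServe(long clientStateId, long observerLastAppliedTxId) {
    return clientStateId <= observerLastAppliedTxId;
  }
}
{code}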

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Commented] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-22 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890699#comment-16890699
 ] 

Chao Sun commented on HDFS-14660:
-

Thanks [~ayushtkn] and [~Harsha1206] for the feedback!
{quote}Regarding the test. I guess we should assert the standby exception from 
the Observer, which is the actual intention, so as to be sure observer didn't 
serve when configured without observer proxy provider, and it serves when it is 
configured.
{quote}
I guess the current test covers this by checking that, when hitting the 
observer, the client fails over and goes to the active. It also checks that 
the request is served by the observer, since otherwise the number of files 
from {{listStatus}} would be 0, not 1. I agree it would ideally be better to 
explicitly check that this is a {{StandbyException}} from the observer, but 
I'm lacking ideas on how to do that in a simple manner. Do you happen to know 
any way that would allow me to do that easily?
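
One possible approach, sketched under the assumption that the test can obtain
a client connected directly to the observer (e.g. via a non-HA URI against the
MiniDFSCluster); the server-side {{StandbyException}} comes back wrapped in a
{{RemoteException}}:
{code:java}
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.ipc.RemoteException;
import org.apache.hadoop.ipc.StandbyException;
import org.apache.hadoop.test.LambdaTestUtils;

public class ObserverRejectionSketch {
  static void assertObserverThrowsStandby(FileSystem fsToObserver)
      throws Exception {
    // Expect the call to fail instead of being served by the observer.
    RemoteException re = LambdaTestUtils.intercept(RemoteException.class,
        () -> fsToObserver.listStatus(new Path("/")));
    // Unwrap to confirm the failure really is a StandbyException.
    assertTrue(re.unwrapRemoteException() instanceof StandbyException);
  }
}
{code}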

The comment on the ordering and finally block looks good. I'll make the change.

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Updated] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-07-25 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14034:

Attachment: HDFS-14034.004.patch

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}}, which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS.
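
A hedged usage sketch of how the call would look once the WebHDFS side is
wired up ({{FileSystem#getQuotaUsage}} itself comes from HDFS-8898; host,
port, and path below are placeholders):
{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.QuotaUsage;

public class WebHdfsQuotaUsageSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(
        URI.create("webhdfs://namenode-host:9870"), new Configuration());
    // Unlike getContentSummary, this does not recursively sum up file
    // lengths, so it is much cheaper on large directory trees.
    QuotaUsage qu = fs.getQuotaUsage(new Path("/user/example"));
    System.out.println("ns quota: " + qu.getQuota()
        + ", space consumed: " + qu.getSpaceConsumed());
  }
}
{code}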






[jira] [Commented] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-07-25 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893196#comment-16893196
 ] 

Chao Sun commented on HDFS-14034:
-

Thanks [~xkrogen] for the comments! Attached patch v4 to address them.

[~jojochuang]: it would be great if you could also take a look. Thanks!

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}}, which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS.






[jira] [Created] (HDFS-14671) WebHDFS: Add erasureCodingPolicy to ContentSummary

2019-07-25 Thread Chao Sun (JIRA)
Chao Sun created HDFS-14671:
---

 Summary: WebHDFS: Add erasureCodingPolicy to ContentSummary
 Key: HDFS-14671
 URL: https://issues.apache.org/jira/browse/HDFS-14671
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: webhdfs
Reporter: Chao Sun
Assignee: Chao Sun


HDFS-11647 added {{erasureCodingPolicy}} to {{ContentSummary}}. We should add 
this info to the result of the WebHDFS {{getContentSummary}} call as well.
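
For context, a hedged sketch of the field on the RPC side; surfacing the same
information in the WebHDFS JSON response is what this issue would add:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EcPolicySketch {
  // HDFS-11647 exposes the directory's EC policy on ContentSummary; the
  // proposal here is to surface the same field in the WebHDFS response.
  static String ecPolicyOf(FileSystem fs, Path dir) throws IOException {
    ContentSummary cs = fs.getContentSummary(dir);
    return cs.getErasureCodingPolicy(); // e.g. "RS-6-3-1024k"
  }
}
{code}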






[jira] [Updated] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-25 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14660:

Attachment: HDFS-14660.002.patch

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Updated] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-25 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14660:

Attachment: HDFS-14660.002.patch

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch, 
> HDFS-14660.002.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Updated] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-25 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14660:

Attachment: (was: HDFS-14660.002.patch)

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Commented] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-25 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893349#comment-16893349
 ] 

Chao Sun commented on HDFS-14660:
-

[~ayushtkn] I updated the test case to assert the {{StandbyException}}. Can 
you take another look? Thanks.

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch, 
> HDFS-14660.002.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Commented] (HDFS-14674) Got an unexpected txid when tail editlog

2019-07-26 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893451#comment-16893451
 ] 

Chao Sun commented on HDFS-14674:
-

[~wangzhaohui]: can you clean up your patch? It doesn't apply and contains 
lots of unnecessary changes. Also, it would be great if you could give more 
details on the root cause you found. Thanks.

> Got an unexpected txid when tail editlog
> 
>
> Key: HDFS-14674
> URL: https://issues.apache.org/jira/browse/HDFS-14674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: wangzhaohui
>Priority: Major
> Attachments: HDFS-14674.patch, image-2019-07-26-11-34-23-405.png
>
>
> Add the following configuration
> !image-2019-07-26-11-34-23-405.png!
> error:
> {code:java}
> // code placeholder
> [2019-07-17T11:50:21.048+08:00] [INFO] [Edit log tailer] : replaying edit 
> log: 1/20512836 transactions completed. (0%) [2019-07-17T11:50:21.059+08:00] 
> [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  of size 3126782311 edits # 500 loaded in 3 seconds 
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Reading 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@51ceb7bc 
> expecting start txid #232056752162 [2019-07-17T11:50:21.059+08:00] [INFO] 
> [Edit log tailer] : Start loading edits file 
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  maxTxnipsToRead = 500 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log 
> tailer] : Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit 
> log tailer] ip: Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.061+08:00] [ERROR] [Edit 
> log tailer] : Unknown error encountered while tailing edits. Shutting down 
> standby NN. java.io.IOException: There appears to be a gap in the edit log. 
> We expected txid 232056752162, but got txid 232077264498. at 
> org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:239)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:895) at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
>  [2019-07-17T11:50:21.064+08:00] [INFO] [Edit log tailer] : Exiting with 
> status 1 [2019-07-17T11:50:21.066+08:00] [INFO] [Thread-1] : SHUTDOWN_MSG: 
> / SHUTDOWN_MSG: 
> Shutting down NameNode at ip 
> /
> {code}
>  
> If dfs.ha.tail-edits.max-txns-per-lock is set to 500, the NameNode loads 
> only 500 transactions from the current edit log segment and then moves on 
> to the next one, even though the current segment contains more than 500 
> transactions. So the NameNode gets an unexpected txid when tailing the 
> edit log.
>  




[jira] [Updated] (HDFS-14464) Remove unnecessary log message from DFSInputStream

2019-07-26 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14464:

Attachment: HDFS-14464-branch-2.000.patch

> Remove unnecessary log message from DFSInputStream
> --
>
> Key: HDFS-14464
> URL: https://issues.apache.org/jira/browse/HDFS-14464
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Trivial
> Attachments: HDFS-14464-branch-2.000.patch
>
>
> This was added by HDFS-8703. This usually doesn't come out unless the user 
> makes 0-byte read calls, which does happen.
> {code:java}
>  if (ret == 0) {
>DFSClient.LOG.warn("zero");
>  }
> {code}
> This was removed by HDFS-8905 in trunk and 3.x, but remained in 2.x.






[jira] [Updated] (HDFS-14464) Remove unnecessary log message from DFSInputStream

2019-07-26 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14464:

Status: Patch Available  (was: Open)

> Remove unnecessary log message from DFSInputStream
> --
>
> Key: HDFS-14464
> URL: https://issues.apache.org/jira/browse/HDFS-14464
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Chao Sun
>Priority: Trivial
> Attachments: HDFS-14464-branch-2.000.patch
>
>
> This was added by HDFS-8703. This usually doesn't come out unless the user 
> makes 0-byte read calls, which does happen.
> {code:java}
>  if (ret == 0) {
>DFSClient.LOG.warn("zero");
>  }
> {code}
> This was removed by HDFS-8905 in trunk and 3.x, but remained in 2.x.






[jira] [Assigned] (HDFS-14464) Remove unnecessary log message from DFSInputStream

2019-07-26 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HDFS-14464:
---

Assignee: Chao Sun

> Remove unnecessary log message from DFSInputStream
> --
>
> Key: HDFS-14464
> URL: https://issues.apache.org/jira/browse/HDFS-14464
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Chao Sun
>Priority: Trivial
> Attachments: HDFS-14464-branch-2.000.patch
>
>
> This was added by HDFS-8703. This usually doesn't come out unless the user 
> makes 0-byte read calls, which does happen.
> {code:java}
>  if (ret == 0) {
>DFSClient.LOG.warn("zero");
>  }
> {code}
> This was removed by HDFS-8905 in trunk and 3.x, but remained in 2.x.






[jira] [Commented] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-26 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894271#comment-16894271
 ] 

Chao Sun commented on HDFS-14660:
-

[~ayushtkn] Oops, not sure how I missed that. Attached patch v3.

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch, 
> HDFS-14660.002.patch, HDFS-14660.003.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Updated] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-26 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14660:

Attachment: HDFS-14660.003.patch

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch, 
> HDFS-14660.002.patch, HDFS-14660.003.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Updated] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-27 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14660:

Attachment: HDFS-14660.004.patch

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch, 
> HDFS-14660.002.patch, HDFS-14660.003.patch, HDFS-14660.004.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Commented] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-27 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894474#comment-16894474
 ] 

Chao Sun commented on HDFS-14660:
-

Fixed and attached patch v4.

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch, 
> HDFS-14660.002.patch, HDFS-14660.003.patch, HDFS-14660.004.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Commented] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-29 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895464#comment-16895464
 ] 

Chao Sun commented on HDFS-14660:
-

Thanks for committing it [~ayushtkn]! Could you also help commit this to 
branch-2? I think we'll need this in the upcoming 2.10 as well.

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch, 
> HDFS-14660.002.patch, HDFS-14660.003.patch, HDFS-14660.004.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Commented] (HDFS-14660) [SBN Read] ObserverNameNode should throw StandbyException for requests not from ObserverProxyProvider

2019-07-29 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895474#comment-16895474
 ] 

Chao Sun commented on HDFS-14660:
-

[~ayushtkn] my bad - we'll need to wait until HDFS-14204 is committed. I'll 
submit a patch for branch-2 after that. Thanks.

> [SBN Read] ObserverNameNode should throw StandbyException for requests not 
> from ObserverProxyProvider
> -
>
> Key: HDFS-14660
> URL: https://issues.apache.org/jira/browse/HDFS-14660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14660.000.patch, HDFS-14660.001.patch, 
> HDFS-14660.002.patch, HDFS-14660.003.patch, HDFS-14660.004.patch
>
>
> In an HDFS HA cluster with consistent reads enabled (HDFS-12943), clients 
> could be using {{ObserverReadProxyProvider}}, {{ConfiguredProxyProvider}}, 
> or something else. Since an observer is just a special type of SBN and we 
> allow transitions between them, a client NOT using 
> {{ObserverReadProxyProvider}} will need to have {{dfs.ha.namenodes.}} 
> include all NameNodes in the cluster, and therefore it may send requests 
> to an observer node.
> For this case, we should check whether the {{stateId}} in the incoming RPC 
> header is set, and throw a {{StandbyException}} when it is not.






[jira] [Commented] (HDFS-14678) Allow triggerBlockReport to a specific namenode

2019-07-29 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895664#comment-16895664
 ] 

Chao Sun commented on HDFS-14678:
-

cc [~jojochuang]: could you add [~LeonG] as a contributor and assign this 
JIRA to him? Also, it would be great if you know whether this issue has been 
raised before :) - it is one of the items we want to improve in H2.

> Allow triggerBlockReport to a specific namenode
> ---
>
> Key: HDFS-14678
> URL: https://issues.apache.org/jira/browse/HDFS-14678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.8.2
>Reporter: Leon Gao
>Priority: Minor
>
> In our largest prod cluster (running 2.8.2) we have >3k hosts. Every time we 
> rolling-restart the NNs, we need to wait for block reports, which takes 
> >2.5 hours for each NN.
> One way to make it faster is to manually trigger a full block report from 
> all datanodes ([HDFS-7278|https://issues.apache.org/jira/browse/HDFS-7278]). 
> However, the current triggerBlockReport command triggers a block report on 
> all NNs, which floods the active NN as well.
> A quick solution would be adding an option to specify the NN that the 
> manually triggered block report goes to, something like:
> *_hdfs dfsadmin [-triggerBlockReport [-incremental] <datanode_host:ipc_port>] 
> [-namenode <namenode_host:port>]_*
> So when restarting a standby or observer NN, we can trigger an aggressive 
> block report to a specific NN to exit safe mode faster without risking 
> active NN performance.






[jira] [Assigned] (HDFS-14681) TestDisableRouterQuota failed because port 8888 was occupied

2019-07-30 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HDFS-14681:
---

Assignee: Chao Sun

> TestDisableRouterQuota failed because port 8888 was occupied
> 
>
> Key: HDFS-14681
> URL: https://issues.apache.org/jira/browse/HDFS-14681
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Wei-Chiu Chuang
>Assignee: Chao Sun
>Priority: Minor
>
> HDFS-13710 added TestDisableRouterQuota. 
> It appears the test always occupies port 8888, and if I happen to have 
> something using that port already, the test fails.
> {noformat}
> [ERROR] 
> org.apache.hadoop.hdfs.server.federation.router.TestDisableRouterQuota  Time 
> elapsed: 0.533 s  <<< ERROR!
> org.apache.hadoop.service.ServiceStateException: java.net.BindException: 
> Problem binding to [0.0.0.0:8888] java.net.BindException: Address already 
> in use; For more details see: http://wiki.apache.org/hadoop/BindException
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
> at 
> org.apache.hadoop.hdfs.server.federation.router.TestDisableRouterQuota.setUp(TestDisableRouterQuota.java:49)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> {noformat}






[jira] [Commented] (HDFS-14681) TestDisableRouterQuota failed because port 8888 was occupied

2019-07-30 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896477#comment-16896477
 ] 

Chao Sun commented on HDFS-14681:
-

{quote}8888 is the Router RPC default port. Any idea what stayed up on 8888? 
Is it a problem with the UT that blocks the port?
{quote}
A quick grep on 8888 shows it's being used in multiple places, so I won't be 
surprised if this failure happens. It could also be some external application 
occupying that port. Anyhow, in UTs we'd better use 
{{NetUtils#getFreeSocketPort}} to avoid port collisions.
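
A hedged sketch of that suggestion (assuming the standard router RPC address
key; the exact setup in the test may differ):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;

public class FreePortSketch {
  // Bind the Router RPC server to a free ephemeral port instead of the
  // hard-coded default, so concurrent tests cannot collide on 8888.
  static Configuration routerConfWithFreePort() {
    Configuration conf = new Configuration();
    int port = NetUtils.getFreeSocketPort(); // probes for an unused port
    conf.set("dfs.federation.router.rpc-address", "0.0.0.0:" + port);
    return conf;
  }
}
{code}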

> TestDisableRouterQuota failed because port 8888 was occupied
> 
>
> Key: HDFS-14681
> URL: https://issues.apache.org/jira/browse/HDFS-14681
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Wei-Chiu Chuang
>Assignee: Chao Sun
>Priority: Minor
>
> HDFS-13710 added TestDisableRouterQuota. 
> It appears the test always occupies port 8888, and if I happen to have 
> something using that port already, the test fails.
> {noformat}
> [ERROR] 
> org.apache.hadoop.hdfs.server.federation.router.TestDisableRouterQuota  Time 
> elapsed: 0.533 s  <<< ERROR!
> org.apache.hadoop.service.ServiceStateException: java.net.BindException: 
> Problem binding to [0.0.0.0:8888] java.net.BindException: Address already 
> in use; For more details see: http://wiki.apache.org/hadoop/BindException
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
> at 
> org.apache.hadoop.hdfs.server.federation.router.TestDisableRouterQuota.setUp(TestDisableRouterQuota.java:49)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> {noformat}






[jira] [Commented] (HDFS-14674) Got an unexpected txid when tail editlog

2019-07-30 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896818#comment-16896818
 ] 

Chao Sun commented on HDFS-14674:
-

Thanks [~wangzhaohui] and [~wuweiwei]. Now I see the issue: suppose 
{{editStreams}} contains two edit streams, {{[0,1000)}} and {{[1000,2000)}}, 
and the config is set to {{500}}. Then in the loop it will load the first 500 
edits of the first stream and then continue to load the first 500 of the 
second stream, skipping the remaining 500 edits in the first stream.

The patch looks good to me. It would be great if you could add a test case 
for this though. Also cc [~vagarychen] and [~shv].
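
A simplified, illustrative sketch of that loop (the types below are made up
for illustration, not the real {{FSEditLogLoader}} code):
{code:java}
import java.util.List;

public class TailSkipSketch {
  interface EditStream {
    /** Loads up to maxTxns transactions starting at fromTxId; returns count. */
    long load(long fromTxId, long maxTxns);
    boolean exhausted();
  }

  // BUGGY: advances to the next stream after maxTxns transactions even if the
  // current stream still has unread ones, producing the txid gap.
  static long loadEditsBuggy(List<EditStream> streams, long start, long maxTxns) {
    long expected = start;
    for (EditStream s : streams) {
      expected += s.load(expected, maxTxns);
    }
    return expected;
  }

  // FIXED: keep re-reading the same stream until it is fully consumed before
  // moving on to the next one.
  static long loadEditsFixed(List<EditStream> streams, long start, long maxTxns) {
    long expected = start;
    for (EditStream s : streams) {
      while (!s.exhausted()) {
        expected += s.load(expected, maxTxns);
      }
    }
    return expected;
  }
}
{code}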

> Got an unexpected txid when tail editlog
> 
>
> Key: HDFS-14674
> URL: https://issues.apache.org/jira/browse/HDFS-14674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Major
> Attachments: HDFS-14674-001.patch, image-2019-07-26-11-34-23-405.png
>
>
> Add the following configuration
> !image-2019-07-26-11-34-23-405.png!
> error:
> {code:java}
> // code placeholder
> [2019-07-17T11:50:21.048+08:00] [INFO] [Edit log tailer] : replaying edit 
> log: 1/20512836 transactions completed. (0%) [2019-07-17T11:50:21.059+08:00] 
> [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  of size 3126782311 edits # 500 loaded in 3 seconds 
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Reading 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@51ceb7bc 
> expecting start txid #232056752162 [2019-07-17T11:50:21.059+08:00] [INFO] 
> [Edit log tailer] : Start loading edits file 
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  maxTxnipsToRead = 500 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log 
> tailer] : Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit 
> log tailer] ip: Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.061+08:00] [ERROR] [Edit 
> log tailer] : Unknown error encountered while tailing edits. Shutting down 
> standby NN. java.io.IOException: There appears to be a gap in the edit log. 
> We expected txid 232056752162, but got txid 232077264498. at 
> org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:239)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:895) at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
>  [2019-07-17T11:50:21.064+08:00] [INFO] [Edit log tailer] : Exiting with 
> status 1 [2019-07-17T11:50:21.066+08:00] [INFO] [Thread-1] : SHUTDOWN_MSG: 
> / SHUTDOWN_MSG: 
> Shutting down NameNode at ip 
> **

[jira] [Commented] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-07-30 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896826#comment-16896826
 ] 

Chao Sun commented on HDFS-14034:
-

Thanks [~jojochuang] and [~xkrogen] for the review! I'll submit another patch 
for branch-2 later. [~smeng] I've already created HDFS-14671 for the 
{{erasureCodingPolicy}} issue under the parent Jira.

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}}, which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS.






[jira] [Resolved] (HDFS-14671) WebHDFS: Add erasureCodingPolicy to ContentSummary

2019-07-31 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-14671.
-
Resolution: Duplicate

> WebHDFS: Add erasureCodingPolicy to ContentSummary
> --
>
> Key: HDFS-14671
> URL: https://issues.apache.org/jira/browse/HDFS-14671
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> HDFS-11647 added {{erasureCodingPolicy}} to {{ContentSummary}}. We should add 
> this info to the result of the WebHDFS {{getContentSummary}} call as well.






[jira] [Comment Edited] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-07-31 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896826#comment-16896826
 ] 

Chao Sun edited comment on HDFS-14034 at 7/31/19 7:12 AM:
--

Thanks [~jojochuang] and [~xkrogen] for the review! I'll submit another patch 
for branch-2 later. [~smeng] I've already created HDFS-14671 for the 
{{erasureCodingPolicy}} issue under the parent Jira. I saw you are working 
on it, so I resolved it as a duplicate.


was (Author: csun):
Thanks [~jojochuang] and [~xkrogen] for the review! I'll submit another patch 
for branch-2 later. [~smeng] I've already created HDFS-14671 for the 
{{erasureCodingPolicy}} issue under the parent Jira.

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-8631) WebHDFS : Support get/setQuota

2019-07-31 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HDFS-8631:
--

Assignee: Chao Sun  (was: Xue Liu)

> WebHDFS : Support get/setQuota
> --
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.2
>Reporter: nijel
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, 
> HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, 
> HDFS-8631-006.patch
>
>
> Users are able to do quota management from the filesystem object. The same 
> operation can be allowed through the REST API.
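
For context, a minimal sketch of the existing filesystem-object path mentioned above (the path and quota values are arbitrary examples); this JIRA tracks exposing the same operation over REST:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SetQuotaExample {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    // Namespace quota of 100k objects and a 10 GB space quota (examples).
    dfs.setQuota(new Path("/user/chao"), 100000L, 10L * 1024 * 1024 * 1024);
  }
}
{code}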



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8631) WebHDFS : Support setQuota

2019-07-31 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-8631:
---
Summary: WebHDFS : Support setQuota  (was: WebHDFS : Support get/setQuota)

> WebHDFS : Support setQuota
> --
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.2
>Reporter: nijel
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, 
> HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, 
> HDFS-8631-006.patch
>
>
> Users are able to do quota management from the filesystem object. The same 
> operation can be allowed through the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13616) Batch listing of multiple directories

2019-07-31 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897311#comment-16897311
 ] 

Chao Sun commented on HDFS-13616:
-

This seems like a great feature. [~andrew.wang] do you still plan to finish 
this? I'd be happy to help move this forward.

> Batch listing of multiple directories
> -
>
> Key: HDFS-13616
> URL: https://issues.apache.org/jira/browse/HDFS-13616
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Major
> Attachments: BenchmarkListFiles.java, HDFS-13616.001.patch, 
> HDFS-13616.002.patch
>
>
> One of the dominant workloads for external metadata services is listing of 
> partition directories. This can end up being bottlenecked on RTT time when 
> partition directories contain a small number of files. This is fairly common, 
> since fine-grained partitioning is used for partition pruning by the query 
> engines.
> A batched listing API that takes multiple paths amortizes the RTT cost. 
> Initial benchmarks show a 10-20x improvement in metadata loading performance.
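
To make the idea concrete, here is a hedged sketch of what a batched listing call could look like; the {{batchedListStatusIterator}} API below is an assumption for illustration, since the feature was still unfinished at the time of this comment:

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.PartialListing;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class BatchListingSketch {
  // One RPC fetches listings for many partition directories, amortizing RTT.
  static void listPartitions(DistributedFileSystem dfs) throws IOException {
    List<Path> partitions = Arrays.asList(
        new Path("/warehouse/tbl/ds=2019-07-30"),
        new Path("/warehouse/tbl/ds=2019-07-31"));
    RemoteIterator<PartialListing<FileStatus>> it =
        dfs.batchedListStatusIterator(partitions);
    while (it.hasNext()) {
      for (FileStatus st : it.next().get()) {
        System.out.println(st.getPath());
      }
    }
  }
}
{code}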



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-07-31 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897357#comment-16897357
 ] 

Chao Sun commented on HDFS-14034:
-

Submitted a GitHub PR for backporting to branch-2. Please take a look. Thanks.

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-07-31 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897550#comment-16897550
 ] 

Chao Sun commented on HDFS-14034:
-

[~xkrogen] the backport was pretty smooth with not many conflicts - we can wait 
until the Yetus result comes back.

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-08-01 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14034:

Attachment: HDFS-14034-branch-2.000.patch

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14034-branch-2.000.patch, HDFS-14034.000.patch, 
> HDFS-14034.001.patch, HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-08-07 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902686#comment-16902686
 ] 

Chao Sun commented on HDFS-14034:
-

Not sure why CI wasn't triggered for the branch-2 patch. Re-attaching patch v1 to try again.

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14034-branch-2.000.patch, 
> HDFS-14034-branch-2.001.patch, HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-08-07 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14034:

Attachment: HDFS-14034-branch-2.001.patch

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14034-branch-2.000.patch, 
> HDFS-14034-branch-2.001.patch, HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-08-07 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14034:

Status: Patch Available  (was: Reopened)

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14034-branch-2.000.patch, 
> HDFS-14034-branch-2.001.patch, HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-08-07 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902688#comment-16902688
 ] 

Chao Sun commented on HDFS-14034:
-

Thanks [~ayushtkn]. Let me re-open the Jira and try that.

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14034-branch-2.000.patch, 
> HDFS-14034-branch-2.001.patch, HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-14034) Support getQuotaUsage API in WebHDFS

2019-08-07 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reopened HDFS-14034:
-

Re-opening this for backporting to branch-2.

> Support getQuotaUsage API in WebHDFS
> 
>
> Key: HDFS-14034
> URL: https://issues.apache.org/jira/browse/HDFS-14034
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, webhdfs
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14034-branch-2.000.patch, 
> HDFS-14034-branch-2.001.patch, HDFS-14034.000.patch, HDFS-14034.001.patch, 
> HDFS-14034.002.patch, HDFS-14034.004.patch
>
>
> HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch 
> quota usage on a directory with significantly lower impact than the similar 
> {{getContentSummary}}. This JIRA is to track adding support for this API to 
> WebHDFS. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14709) Add encryption zone related REST APIs to WebHDFS

2019-08-08 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HDFS-14709:
---

Assignee: Chao Sun

> Add encryption zone related REST APIs to WebHDFS
> 
>
> Key: HDFS-14709
> URL: https://issues.apache.org/jira/browse/HDFS-14709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Chao Sun
>Priority: Major
>
> Webhdfs doesn't handle encryption zone related REST APIs: 
> createEncryptionZone,
> getEZForPath,
> listEncryptionZones,
> reencryptEncryptionZone,
> listReencryptionStatus,
> getFileEncryptionInfo,
> provisionEZTrash,
> This is related, but not the same as HDFS-12355.
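
For reference, these operations are currently reachable only through the RPC-based {{HdfsAdmin}} client; a minimal sketch (the URI and path are example values):

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class EzAdminExample {
  public static void main(String[] args) throws Exception {
    HdfsAdmin admin = new HdfsAdmin(
        URI.create("hdfs://namenode:8020"), new Configuration());
    // Look up the encryption zone containing a path (RPC today, not REST).
    System.out.println(admin.getEncryptionZoneForPath(new Path("/secure/data")));
  }
}
{code}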



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14674) [SBN read] Got an unexpected txid when tail editlog

2019-08-09 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903640#comment-16903640
 ] 

Chao Sun commented on HDFS-14674:
-

[~wangzhaohui] I'm not sure the test is effective, since it also passes 
without your modification to {{FSImage}}. I think it might be because the number 
of edits is less than the value of {{DFS_HA_TAILEDITS_MAX_TXNS_PER_LOCK_KEY}}.

Besides that, some nits on style:
- Please don't use star imports
- The {{InterruptedException}} is never thrown from 
{{testMultiStreamsLoadEditWithConfMaxTxns}}
- One extra space before {{getConf}} in 
{{testMultiStreamsLoadEditWithConfMaxTxns}}
- In line 316 of {{TestEditLog}} you set 
{{DFS_HA_TAILEDITS_MAX_TXNS_PER_LOCK_KEY}} and then in 317 immediately read the 
value back and assign it to {{remainingReadTxns}} - can you just assign 100 to it?
- Need a space before {{catch}} in {{testMultiStreamsLoadEditWithConfMaxTxns}}

> [SBN read] Got an unexpected txid when tail editlog
> ---
>
> Key: HDFS-14674
> URL: https://issues.apache.org/jira/browse/HDFS-14674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Blocker
> Attachments: HDFS-14674-001.patch, HDFS-14674-003.patch, 
> HDFS-14674-004.patch, HDFS-14674-005.patch, HDFS-14674-006.patch, image.png
>
>
> Add the following configuration
> !image-2019-07-26-11-34-23-405.png!
> error:
> {code:java}
> //
> [2019-07-17T11:50:21.048+08:00] [INFO] [Edit log tailer] : replaying edit 
> log: 1/20512836 transactions completed. (0%) [2019-07-17T11:50:21.059+08:00] 
> [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  of size 3126782311 edits # 500 loaded in 3 seconds 
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Reading 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@51ceb7bc 
> expecting start txid #232056752162 [2019-07-17T11:50:21.059+08:00] [INFO] 
> [Edit log tailer] : Start loading edits file 
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  maxTxnipsToRead = 500 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log 
> tailer] : Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit 
> log tailer] ip: Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.061+08:00] [ERROR] [Edit 
> log tailer] : Unknown error encountered while tailing edits. Shutting down 
> standby NN. java.io.IOException: There appears to be a gap in the edit log. 
> We expected txid 232056752162, but got txid 232077264498. at 
> org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:239)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:895) at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)

[jira] [Commented] (HDFS-14317) Standby does not trigger edit log rolling when in-progress edit log tailing is enabled

2019-03-01 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782170#comment-16782170
 ] 

Chao Sun commented on HDFS-14317:
-

Regarding the time unit issue for log rolling and tailing, perhaps we should 
file a JIRA to fix it: someone may want to use a sub-second tailing frequency 
for standby reads, such as 100ms, but right now that value loses precision and 
is converted to 0ms.

> Standby does not trigger edit log rolling when in-progress edit log tailing 
> is enabled
> --
>
> Key: HDFS-14317
> URL: https://issues.apache.org/jira/browse/HDFS-14317
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Ekanth Sethuramalingam
>Assignee: Ekanth Sethuramalingam
>Priority: Critical
> Attachments: HDFS-14317.001.patch, HDFS-14317.002.patch, 
> HDFS-14317.003.patch, HDFS-14317.004.patch
>
>
> The standby uses the following method to check if it is time to trigger edit 
> log rolling on active.
> {code}
>   /**
>* @return true if the configured log roll period has elapsed.
>*/
>   private boolean tooLongSinceLastLoad() {
> return logRollPeriodMs >= 0 && 
>   (monotonicNow() - lastLoadTimeMs) > logRollPeriodMs ;
>   }
> {code}
> In doTailEdits(), lastLoadTimeMs is updated when standby is able to 
> successfully tail any edits
> {code}
>   if (editsLoaded > 0) {
> lastLoadTimeMs = monotonicNow();
>   }
> {code}
> The default configuration for {{dfs.ha.log-roll.period}} is 120 seconds and 
> {{dfs.ha.tail-edits.period}} is 60 seconds. With in-progress edit log tailing 
> enabled, tooLongSinceLastLoad() will almost never return true, resulting in 
> edit logs not being rolled for a long time until the configuration 
> {{dfs.namenode.edit.log.autoroll.multiplier.threshold}} takes effect.
> [In our deployment, this resulted in in-progress edit logs getting deleted. 
> The sequence of events is that standby was able to checkpoint twice while the 
> in-progress edit log was growing on active. When the 
> NNStorageRetentionManager decided to cleanup old checkpoints and edit logs, 
> it cleaned up the in-progress edit log from active and QJM (as the txnid on 
> in-progress edit log was older than the 2 most recent checkpoints) resulting 
> in irrecoverably losing a few minutes worth of metadata].
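
A minimal sketch of one possible fix, assuming the roll timer is decoupled from the load timer (an illustration, not necessarily the committed patch):

{code:java}
// Track when we last asked the active to roll, independently of tailing.
private long lastRollTimeMs = monotonicNow();

/**
 * @return true if the configured log roll period has elapsed since the last
 * roll, regardless of how recently in-progress edits were tailed.
 */
private boolean tooLongSinceLastRoll() {
  return logRollPeriodMs >= 0 &&
      (monotonicNow() - lastRollTimeMs) > logRollPeriodMs;
}

// ...and after a successful triggerActiveLogRoll():
//   lastRollTimeMs = monotonicNow();
{code}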



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-01 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14205:

Attachment: HDFS-14205-branch-2.006.patch

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as it is required by the HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-04 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14205:

Attachment: HDFS-14205-branch-2.007.patch

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as it is required by the HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-04 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783817#comment-16783817
 ] 

Chao Sun commented on HDFS-14205:
-

Re-attaching patch v6 to trigger Jenkins.

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as it is required by the HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-04 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784124#comment-16784124
 ] 

Chao Sun commented on HDFS-14205:
-

Hmm... all the failed tests are passing locally on my laptop. [~vagarychen]: 
if possible, could you verify the patch on your side and see if the tests 
pass?

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as it is required by the HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14346) EditLogTailer loses precision for sub-second edit log tailing and rolling interval

2019-03-07 Thread Chao Sun (JIRA)
Chao Sun created HDFS-14346:
---

 Summary: EditLogTailer loses precision for sub-second edit log 
tailing and rolling interval
 Key: HDFS-14346
 URL: https://issues.apache.org/jira/browse/HDFS-14346
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Chao Sun
Assignee: Chao Sun


{{EditLogTailer}} currently uses the following:
{code}
logRollPeriodMs = conf.getTimeDuration(
DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;

sleepTimeMs = conf.getTimeDuration(
DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
{code}
to determine the edit log roll and tail frequency. However, if a user specifies a 
sub-second frequency, such as {{100ms}}, it will lose precision and become 0s. 
This is not ideal for some scenarios such as standby reads (HDFS-12943).
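
A small self-contained demonstration of the truncation (the key name is taken from the description above; the value is an example):

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class PrecisionLossDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.ha.tail-edits.period", "100ms");  // sub-second value
    long sleepTimeMs = conf.getTimeDuration(
        "dfs.ha.tail-edits.period", 60, TimeUnit.SECONDS) * 1000;
    // "100ms" is converted to whole SECONDS first (0), then multiplied.
    System.out.println(sleepTimeMs);  // prints 0, not 100
  }
}
{code}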



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14346) EditLogTailer loses precision for sub-second edit log tailing and rolling interval

2019-03-07 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787231#comment-16787231
 ] 

Chao Sun commented on HDFS-14346:
-

Thanks [~xkrogen]! You are totally right - we need to be careful to preserve 
backward compatibility here. I think the method signature you suggested should work. 
Let me try to come up with a patch based on this idea. :)

> EditLogTailer loses precision for sub-second edit log tailing and rolling 
> interval
> --
>
> Key: HDFS-14346
> URL: https://issues.apache.org/jira/browse/HDFS-14346
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
>
> {{EditLogTailer}} currently uses the following:
> {code}
> logRollPeriodMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
> sleepTimeMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 
> 1000;
> {code}
> to determine the edit log roll and tail frequency. However, if a user specifies 
> a sub-second frequency, such as {{100ms}}, it will lose precision and become 
> 0s. This is not ideal for some scenarios such as standby reads (HDFS-12943).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-07 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14205:

Attachment: HDFS-14205-branch-2.008.patch

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as it is required by the HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14346) EditLogTailer loses precision for sub-second edit log tailing and rolling interval

2019-03-10 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14346:

Attachment: HDFS-14346.000.patch

> EditLogTailer loses precision for sub-second edit log tailing and rolling 
> interval
> --
>
> Key: HDFS-14346
> URL: https://issues.apache.org/jira/browse/HDFS-14346
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14346.000.patch
>
>
> {{EditLogTailer}} currently uses the following:
> {code}
> logRollPeriodMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
> sleepTimeMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 
> 1000;
> {code}
> to determine the edit log roll and tail frequency. However, if a user specifies 
> a sub-second frequency, such as {{100ms}}, it will lose precision and become 
> 0s. This is not ideal for some scenarios such as standby reads (HDFS-12943).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14346) EditLogTailer loses precision for sub-second edit log tailing and rolling interval

2019-03-10 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14346:

Status: Patch Available  (was: Open)

> EditLogTailer loses precision for sub-second edit log tailing and rolling 
> interval
> --
>
> Key: HDFS-14346
> URL: https://issues.apache.org/jira/browse/HDFS-14346
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14346.000.patch
>
>
> {{EditLogTailer}} currently uses the following:
> {code}
> logRollPeriodMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
> sleepTimeMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 
> 1000;
> {code}
> to determine the edit log roll and tail frequency. However, if a user specifies 
> a sub-second frequency, such as {{100ms}}, it will lose precision and become 
> 0s. This is not ideal for some scenarios such as standby reads (HDFS-12943).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-10 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14205:

Attachment: HDFS-14205-branch-2.009.patch

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as it is required by the HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14346) Better time precision in getTimeDuration

2019-03-11 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14346:

Description: 
Currently, {{Configuration#getTimeDuration}} has the following signature:
{code}
  /**
   * Return time duration in the given time unit. Valid units are encoded in
   * properties as suffixes: nanoseconds (ns), microseconds (us), milliseconds
   * (ms), seconds (s), minutes (m), hours (h), and days (d).
   * @param name Property name
   * @param defaultValue Value returned if no mapping exists.
   * @param unit Unit to convert the stored property, if it exists.
   * @throws NumberFormatException If the property stripped of its unit is not
   * a number
   */
  public long getTimeDuration(String name, long defaultValue, TimeUnit unit)
{code}

This may lose precision when the default time unit is larger than the time 
unit that the configuration value is converted to at the call sites of this 
method. For instance, in {{EditLogTailer}} this method is used in the following 
manner:

{code}
logRollPeriodMs = conf.getTimeDuration(
DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;

sleepTimeMs = conf.getTimeDuration(
DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
{code}

in both cases the default time unit is seconds, and the configuration value is 
converted into milliseconds. Precision is lost when people want to specify a 
sub-second time duration such as {{100ms}}, which will be converted to {{0ms}}.


  was:
{{EditLogTailer}} currently uses the following:
{code}
logRollPeriodMs = conf.getTimeDuration(
DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;

sleepTimeMs = conf.getTimeDuration(
DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
{code}
to determine the edit log roll and tail frequency. However, if a user specifies 
a sub-second frequency, such as {{100ms}}, it will lose precision and become 0s. 
This is not ideal for some scenarios such as standby reads (HDFS-12943).

 Issue Type: Improvement  (was: Bug)
Summary: Better time precision in getTimeDuration  (was: EditLogTailer 
loses precision for sub-second edit log tailing and rolling interval)

> Better time precision in getTimeDuration
> 
>
> Key: HDFS-14346
> URL: https://issues.apache.org/jira/browse/HDFS-14346
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14346.000.patch
>
>
> Currently, {{Configuration#getTimeDuration}} has the following signature:
> {code}
>   /**
>* Return time duration in the given time unit. Valid units are encoded in
>* properties as suffixes: nanoseconds (ns), microseconds (us), milliseconds
>* (ms), seconds (s), minutes (m), hours (h), and days (d).
>* @param name Property name
>* @param defaultValue Value returned if no mapping exists.
>* @param unit Unit to convert the stored property, if it exists.
>* @throws NumberFormatException If the property stripped of its unit is not
>* a number
>*/
>   public long getTimeDuration(String name, long defaultValue, TimeUnit unit)
> {code}
> This may lose precision when the default time unit is larger than the time 
> unit that the configuration value is converted to at the call sites of this 
> method. For instance, in {{EditLogTailer}} this method is used in the 
> following manner:
> {code}
> logRollPeriodMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
> sleepTimeMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 
> 1000;
> {code}
> in both cases the default time unit is seconds, and the configuration value 
> is converted into milliseconds. Precision is lost when people want to 
> specify a sub-second time duration such as {{100ms}}, which will be converted 
> to {{0ms}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14346) Better time precision in getTimeDuration

2019-03-11 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789787#comment-16789787
 ] 

Chao Sun commented on HDFS-14346:
-

Thanks [~xkrogen] for the review! Changed the title and description.

Regarding the added method, it is currently only used by the method:
{code:java}
public long getTimeDuration(String name, long defaultValue, TimeUnit unit)
{code}
which simply delegates to the former.

It may also be useful when people use a string to specify the default time 
duration. Taking the following config as an example:
{code:java}
  public static final String DFS_NAMENODE_REENCRYPT_SLEEP_INTERVAL_DEFAULT = 
"1m";
{code}
which is used by the following code:
{code:java}
this.interval =
conf.getTimeDuration(DFS_NAMENODE_REENCRYPT_SLEEP_INTERVAL_KEY,
DFS_NAMENODE_REENCRYPT_SLEEP_INTERVAL_DEFAULT,
TimeUnit.MILLISECONDS);
{code}
If someone (perhaps accidentally) sets {{"10"}} for this config, we may want to 
use {{TimeUnit.MINUTES}} for the {{defaultUnit}} and {{TimeUnit.MILLISECONDS}} 
for the {{returnUnit}}.
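
A minimal sketch of the overload being discussed, assuming a parsing helper for suffixed values like {{100ms}} (the committed implementation may differ):

{code:java}
public long getTimeDuration(String name, long defaultValue,
    TimeUnit defaultUnit, TimeUnit returnUnit) {
  String vStr = get(name);
  if (vStr == null) {
    // The default carries its own unit, so nothing is silently truncated.
    return returnUnit.convert(defaultValue, defaultUnit);
  }
  // Assumed helper that parses values with unit suffixes such as "100ms".
  return getTimeDurationHelper(name, vStr, returnUnit);
}
{code}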

 

> Better time precision in getTimeDuration
> 
>
> Key: HDFS-14346
> URL: https://issues.apache.org/jira/browse/HDFS-14346
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14346.000.patch
>
>
> Currently, {{Configuration#getTimeDuration}} has the following signature:
> {code}
>   /**
>* Return time duration in the given time unit. Valid units are encoded in
>* properties as suffixes: nanoseconds (ns), microseconds (us), milliseconds
>* (ms), seconds (s), minutes (m), hours (h), and days (d).
>* @param name Property name
>* @param defaultValue Value returned if no mapping exists.
>* @param unit Unit to convert the stored property, if it exists.
>* @throws NumberFormatException If the property stripped of its unit is not
>* a number
>*/
>   public long getTimeDuration(String name, long defaultValue, TimeUnit unit)
> {code}
> This may lose precision when the default time unit is larger than the time 
> unit that the configuration value is converted to at the call sites of this 
> method. For instance, in {{EditLogTailer}} this method is used in the 
> following manner:
> {code}
> logRollPeriodMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
> sleepTimeMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 
> 1000;
> {code}
> in both cases the default time unit is seconds, and the configuration value 
> is converted into milliseconds. Precision is lost when people want to 
> specify a sub-second time duration such as {{100ms}}, which will be converted 
> to {{0ms}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-11 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789807#comment-16789807
 ] 

Chao Sun commented on HDFS-14205:
-

Turns out I needed to generate the patch using the {{--binary}} flag. There are 
still two failed tests in the latest run. However, they do not seem related to 
the patch: {{TestWebHdfsTimeouts}} is a known flaky test, while 
{{TestJournalNodeRespectsBindHostKeys}} failed due to:
{code}
java.io.FileNotFoundException: /home/jenkins/.keystore (No such file or 
directory)
{code}

I tested the latter locally and it passed.

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as it is required by the HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14211) [Consistent Observer Reads] Allow for configurable "always msync" mode

2019-03-11 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789811#comment-16789811
 ] 

Chao Sun commented on HDFS-14211:
-

One potential downside with this approach, IMO, is that {{msync}} still has to 
go through the RPC queue on the active NN. In a busy cluster this could impact 
read-only performance. For instance, in our environment the RPC queue time 
on observer nodes is at least 10X lower than on the active NN. This is 
also one major motivation for us to use observers for Presto workloads.

> [Consistent Observer Reads] Allow for configurable "always msync" mode
> --
>
> Key: HDFS-14211
> URL: https://issues.apache.org/jira/browse/HDFS-14211
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14211.000.patch
>
>
> To allow for reads to be serviced from an ObserverNode (see HDFS-12943) in a 
> consistent way, an {{msync}} API was introduced (HDFS-13688) to allow for a 
> client to fetch the latest transaction ID from the Active NN, thereby 
> ensuring that subsequent reads from the ObserverNode will be up-to-date with 
> the current state of the Active.
> Using this properly, however, requires application-side changes: for 
> example, a NodeManager should call {{msync}} before localizing the resources 
> for a client, since it received notification of the existence of those 
> resources via communication which is out-of-band to HDFS and thus could 
> potentially attempt to localize them prior to the availability of those 
> resources on the ObserverNode.
> Until such application-side changes can be made, which will be a longer-term 
> effort, we need to provide a mechanism for unchanged clients to utilize the 
> ObserverNode without exposing such a client to inconsistencies. This is 
> essentially phase 3 of the roadmap outlined in the [design 
> document|https://issues.apache.org/jira/secure/attachment/12915990/ConsistentReadsFromStandbyNode.pdf]
>  for HDFS-12943.
> The design document proposes some heuristics based on understanding of how 
> common applications (e.g. MR) use HDFS for resources. As an initial pass, we 
> can simply have a flag which tells a client to call {{msync}} before _every 
> single_ read operation. This may seem counterintuitive, as it turns every 
> read operation into two RPCs: {{msync}} to the Active following by an actual 
> read operation to the Observer. However, the {{msync}} operation is extremely 
> lightweight, as it does not acquire the {{FSNamesystemLock}}, and in 
> experiments we have found that this approach can easily scale to well over 
> 100,000 {{msync}} operations per second on the Active (while still servicing 
> approx. 10,000 write op/s). Combined with the fast-path edit log tailing for 
> standby/observer nodes (HDFS-13150), this "always msync" approach should 
> introduce only a few ms of extra latency to each read call.
> Below are some experimental results collected from experiments which convert 
> a normal RPC workload into one in which all read operations are turned into 
> an {{msync}}. The baseline is a workload of 1.5k write op/s and 25k read op/s.
> ||Rate Multiplier||2||4||6||8||
> |RPC Queue Avg Time (ms)|14|53|110|125|
> |RPC Queue NumOps Avg (k)|51|102|147|177|
> |RPC Queue NumOps Max (k)|148|269|306|312|
> _(numbers are approximate and should be viewed primarily for their trends)_
> Results are promising up to between 4x and 6x of the baseline workload, which 
> is approx. 100-150k read op/s.
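
An illustration of the "always msync" behavior described above (a wrapper sketch only; the real flag would live inside the proxy provider, and the client-side {{msync}} entry point is an assumption here):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

class AlwaysMsyncReads {
  private final DistributedFileSystem dfs;
  AlwaysMsyncReads(DistributedFileSystem dfs) { this.dfs = dfs; }

  FileStatus getFileStatus(Path p) throws IOException {
    dfs.msync();                  // lightweight call to the Active: sync txid
    return dfs.getFileStatus(p);  // the read can now safely hit an Observer
  }
}
{code}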



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)
Chao Sun created HDFS-14366:
---

 Summary: Improve HDFS append performance
 Key: HDFS-14366
 URL: https://issues.apache.org/jira/browse/HDFS-14366
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Chao Sun
Assignee: Chao Sun


In our HDFS cluster we observed that the {{append}} operation can take as much as 
10X the write lock time of other write operations. By collecting a flamegraph on 
the namenode (see attachment), we found that most of the append call time is spent 
in {{getNumLiveDataNodes()}}:

{code}
  /** @return the number of live datanodes. */
  public int getNumLiveDataNodes() {
int numLive = 0;
synchronized (this) {
  for(DatanodeDescriptor dn : datanodeMap.values()) {
if (!isDatanodeDead(dn) ) {
  numLive++;
}
  }
}
return numLive;
  }
{code}
this method synchronizes on the {{DatanodeManager}} which is particularly 
expensive in large clusters since {{datanodeMap}} is being modified in many 
places such as processing DN heartbeats.

For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
{{isSufficientlyReplicated}}:
{code}
  /**
   * Check if a block is replicated to at least the minimum replication.
   */
  public boolean isSufficientlyReplicated(BlockInfo b) {
// Compare against the lesser of the minReplication and number of live DNs.
final int replication =
Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
return countNodes(b).liveReplicas() >= replication;
  }
{code}

The way the {{replication}} is calculated is not optimal, as it calls 
{{getNumLiveDataNodes()}} every time even though {{minReplication}} is usually 
much smaller than the number of live datanodes.
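
A hedged sketch of one possible optimization (an illustration, not necessarily the committed patch): short-circuit on {{minReplication}} before touching the synchronized {{getNumLiveDataNodes()}}:

{code:java}
public boolean isSufficientlyReplicated(BlockInfo b) {
  final int liveReplicas = countNodes(b).liveReplicas();
  if (liveReplicas >= minReplication) {
    // Common case: no need to synchronize on DatanodeManager at all.
    return true;
  }
  // Equivalent to the original min(...) comparison when below minReplication.
  return liveReplicas >= getDatanodeManager().getNumLiveDataNodes();
}
{code}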




 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14366:

Attachment: append-flamegraph.png

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph 
> on the namenode (see attachment), we found that most of the append call time is 
> spent in {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way the {{replication}} is calculated is not optimal, as it calls 
> {{getNumLiveDataNodes()}} every time even though {{minReplication}} is usually 
> much smaller than the number of live datanodes.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14366:

Description: 
In our HDFS cluster we observed that the {{append}} operation can take as much as 
10X the write lock time of other write operations. By collecting a flamegraph on 
the namenode (see attachment: append-flamegraph.png), we found that most of the 
append call time is spent in {{getNumLiveDataNodes()}}:

{code}
  /** @return the number of live datanodes. */
  public int getNumLiveDataNodes() {
int numLive = 0;
synchronized (this) {
  for(DatanodeDescriptor dn : datanodeMap.values()) {
if (!isDatanodeDead(dn) ) {
  numLive++;
}
  }
}
return numLive;
  }
{code}
this method synchronizes on the {{DatanodeManager}} which is particularly 
expensive in large clusters since {{datanodeMap}} is being modified in many 
places such as processing DN heartbeats.

For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
{{isSufficientlyReplicated}}:
{code}
  /**
   * Check if a block is replicated to at least the minimum replication.
   */
  public boolean isSufficientlyReplicated(BlockInfo b) {
// Compare against the lesser of the minReplication and number of live DNs.
final int replication =
Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
return countNodes(b).liveReplicas() >= replication;
  }
{code}

The way the {{replication}} is calculated is not optimal, as it calls 
{{getNumLiveDataNodes()}} every time even though {{minReplication}} is usually 
much smaller than the number of live datanodes.




 

  was:
In our HDFS cluster we observed that the {{append}} operation can take as much as 
10X the write lock time of other write operations. By collecting a flamegraph on 
the namenode (see attachment), we found that most of the append call time is spent 
in {{getNumLiveDataNodes()}}:

{code}
  /** @return the number of live datanodes. */
  public int getNumLiveDataNodes() {
int numLive = 0;
synchronized (this) {
  for(DatanodeDescriptor dn : datanodeMap.values()) {
if (!isDatanodeDead(dn) ) {
  numLive++;
}
  }
}
return numLive;
  }
{code}
this method synchronizes on the {{DatanodeManager}} which is particularly 
expensive in large clusters since {{datanodeMap}} is being modified in many 
places such as processing DN heartbeats.

For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
{{isSufficientlyReplicated}}:
{code}
  /**
   * Check if a block is replicated to at least the minimum replication.
   */
  public boolean isSufficientlyReplicated(BlockInfo b) {
// Compare against the lesser of the minReplication and number of live DNs.
final int replication =
Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
return countNodes(b).liveReplicas() >= replication;
  }
{code}

The way the {{replication}} is calculated is not optimal, as it calls 
{{getNumLiveDataNodes()}} every time even though {{minReplication}} is usually 
much smaller than the number of live datanodes.




 


> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} every time even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 

[jira] [Updated] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14366:

Description: 
In our HDFS cluster we observed that the {{append}} operation can take as much as 
10X the write lock time of other write operations. By collecting a flamegraph on 
the namenode (see attachment: append-flamegraph.png), we found that most of the 
append call is spent on {{getNumLiveDataNodes()}}:

{code}
  /** @return the number of live datanodes. */
  public int getNumLiveDataNodes() {
int numLive = 0;
synchronized (this) {
  for(DatanodeDescriptor dn : datanodeMap.values()) {
if (!isDatanodeDead(dn) ) {
  numLive++;
}
  }
}
return numLive;
  }
{code}
this method synchronizes on the {{DatanodeManager}} which is particularly 
expensive in large clusters since {{datanodeMap}} is being modified in many 
places such as processing DN heartbeats.

For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
{{isSufficientlyReplicated}}:
{code}
  /**
   * Check if a block is replicated to at least the minimum replication.
   */
  public boolean isSufficientlyReplicated(BlockInfo b) {
// Compare against the lesser of the minReplication and number of live DNs.
final int replication =
Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
return countNodes(b).liveReplicas() >= replication;
  }
{code}

The way that {{replication}} is calculated is not optimal, as it will call 
{{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
much smaller than the number of live DataNodes. 




 

  was:
In our HDFS cluster we observed that the {{append}} operation can take as much as 
10X the write lock time of other write operations. By collecting a flamegraph on 
the namenode (see attachment: append-flamegraph.png), we found that most of the 
append call is spent on {{getNumLiveDataNodes()}}:

{code}
  /** @return the number of live datanodes. */
  public int getNumLiveDataNodes() {
int numLive = 0;
synchronized (this) {
  for(DatanodeDescriptor dn : datanodeMap.values()) {
if (!isDatanodeDead(dn) ) {
  numLive++;
}
  }
}
return numLive;
  }
{code}
this method synchronizes on the {{DatanodeManager}} which is particularly 
expensive in large clusters since {{datanodeMap}} is being modified in many 
places such as processing DN heartbeats.

For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
{{isSufficientlyReplicated}}:
{code}
  /**
   * Check if a block is replicated to at least the minimum replication.
   */
  public boolean isSufficientlyReplicated(BlockInfo b) {
// Compare against the lesser of the minReplication and number of live DNs.
final int replication =
Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
return countNodes(b).liveReplicas() >= replication;
  }
{code}

The way that {{replication}} is calculated is not optimal, as it will call 
{{getNumLiveDataNodes()}} every time even though {{minReplication}} is usually 
much smaller than the number of live DataNodes. 




 


> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 

[jira] [Updated] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14366:

Status: Patch Available  (was: Open)

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14366:

Attachment: HDFS-14366.000.patch

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791054#comment-16791054
 ] 

Chao Sun commented on HDFS-14366:
-

Attached patch v0, which only calls {{getNumLiveDataNodes}} when the number of 
live replicas is less than the minimum replication. I think this should solve 
most of the issue. Ideally, though, we should make {{getNumLiveDataNodes}} 
lock-free and cheap, but that may be a more involved change - perhaps we can do 
that in a separate JIRA?
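
For illustration, here is a minimal sketch of that short-circuit, assuming the 
{{BlockManager}} members shown in the issue description ({{minReplication}}, 
{{countNodes}}, {{getDatanodeManager}}); the actual patch may differ in its 
details:

{code:java}
/**
 * Check if a block is replicated to at least the minimum replication.
 * Sketch: consult the synchronized live-datanode count only when the
 * block has fewer live replicas than minReplication, so the common case
 * never takes the DatanodeManager lock.
 */
public boolean isSufficientlyReplicated(BlockInfo b) {
  final int liveReplicas = countNodes(b).liveReplicas();
  if (liveReplicas >= minReplication) {
    // min(minReplication, numLiveDataNodes) <= minReplication, so the
    // old check would also return true here.
    return true;
  }
  // liveReplicas < minReplication: the old comparison reduces to
  // comparing against the number of live datanodes.
  return liveReplicas >= getDatanodeManager().getNumLiveDataNodes();
}
{code}

This preserves the original min(minReplication, numLiveDataNodes) semantics 
while keeping {{getNumLiveDataNodes()}} off the common path.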

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14366:

Affects Version/s: 2.8.2

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14366:

Attachment: (was: HDFS-14366.000.patch)

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14366:

Attachment: HDFS-14366.000.patch

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14346) Better time precision in getTimeDuration

2019-03-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14346:

Attachment: HDFS-14346.001.patch

> Better time precision in getTimeDuration
> 
>
> Key: HDFS-14346
> URL: https://issues.apache.org/jira/browse/HDFS-14346
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14346.000.patch, HDFS-14346.001.patch
>
>
> Currently, {{Configuration#getTimeDuration}} has the following signature:
> {code}
>   /**
>* Return time duration in the given time unit. Valid units are encoded in
>* properties as suffixes: nanoseconds (ns), microseconds (us), milliseconds
>* (ms), seconds (s), minutes (m), hours (h), and days (d).
>* @param name Property name
>* @param defaultValue Value returned if no mapping exists.
>* @param unit Unit to convert the stored property, if it exists.
>* @throws NumberFormatException If the property stripped of its unit is not
>* a number
>*/
>   public long getTimeDuration(String name, long defaultValue, TimeUnit unit)
> {code}
> This may lose precision when the default time unit is larger than the time 
> unit that the configuration value is converted to at the call sites of this 
> method. For instance, in {{EditLogTailer}} this method is used in the 
> following manner:
> {code}
> logRollPeriodMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
> sleepTimeMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 
> 1000;
> {code}
> In both cases the default time unit is seconds and the configuration value 
> is converted into milliseconds. Precision is lost when users specify a 
> sub-second duration such as {{100ms}}, which is converted to {{0ms}}.
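
To make the loss concrete, here is a small sketch against the long-standing 
three-argument {{getTimeDuration}} overload; it assumes 
{{DFS_HA_TAILEDITS_PERIOD_KEY}} resolves to {{dfs.ha.tail-edits.period}} and 
mirrors the {{EditLogTailer}} snippet above. It only demonstrates the bug and a 
workaround, not the API added by the patch:

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class TimeDurationPrecision {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("dfs.ha.tail-edits.period", "100ms");

    // Problematic pattern: convert to seconds first, then multiply.
    // "100ms" becomes 0 seconds, so the result is 0 ms.
    long lossyMs = conf.getTimeDuration(
        "dfs.ha.tail-edits.period", 60, TimeUnit.SECONDS) * 1000;

    // Workaround: ask for milliseconds directly and express the
    // default in milliseconds, so "100ms" survives as 100.
    long preciseMs = conf.getTimeDuration(
        "dfs.ha.tail-edits.period", 60 * 1000, TimeUnit.MILLISECONDS);

    System.out.println(lossyMs);    // 0
    System.out.println(preciseMs);  // 100
  }
}
{code}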



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14346) Better time precision in getTimeDuration

2019-03-12 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791066#comment-16791066
 ] 

Chao Sun commented on HDFS-14346:
-

Good spot [~xkrogen]. It should be private. I also wanted to make the old one 
private, but {{NameNode#reconfHeartbeatInterval}} uses it and changing that code 
is non-trivial, so I gave up on that.

Attached patch v1.

> Better time precision in getTimeDuration
> 
>
> Key: HDFS-14346
> URL: https://issues.apache.org/jira/browse/HDFS-14346
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14346.000.patch, HDFS-14346.001.patch
>
>
> Currently, {{Configuration#getTimeDuration}} has the following signature:
> {code}
>   /**
>* Return time duration in the given time unit. Valid units are encoded in
>* properties as suffixes: nanoseconds (ns), microseconds (us), milliseconds
>* (ms), seconds (s), minutes (m), hours (h), and days (d).
>* @param name Property name
>* @param defaultValue Value returned if no mapping exists.
>* @param unit Unit to convert the stored property, if it exists.
>* @throws NumberFormatException If the property stripped of its unit is not
>* a number
>*/
>   public long getTimeDuration(String name, long defaultValue, TimeUnit unit)
> {code}
> This may lose precision when the default time unit is larger than the time 
> unit that the configuration value is converted to at the call sites of this 
> method. For instance, in {{EditLogTailer}} this method is used in the 
> following manner:
> {code}
> logRollPeriodMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
> sleepTimeMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 
> 1000;
> {code}
> In both cases the default time unit is seconds and the configuration value 
> is converted into milliseconds. Precision is lost when users specify a 
> sub-second duration such as {{100ms}}, which is converted to {{0ms}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791077#comment-16791077
 ] 

Chao Sun commented on HDFS-14366:
-

Thanks [~jojochuang]! Yes, similar to HDFS-14171, the issue is again caused by 
{{getNumLiveDataNodes}}. It is much worse in the {{append}} case, though, since 
the method is called on the hot path and directly affects NameNode performance, 
as it holds the write lock while doing so. Here, the proposed change is pretty 
simple and, IMO, risk-free. It'd be great if you could take a look. :)
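
On the lock-free direction mentioned earlier, one hypothetical shape 
(illustrative names only, not from any actual patch) is a counter maintained at 
datanode registration and dead-node detection, so hot-path readers never 
synchronize on {{datanodeMap}}:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of a lock-free live-datanode counter. The class
 * and method names are illustrative, not taken from DatanodeManager.
 */
class LiveNodeCounter {
  private final AtomicInteger liveNodeCount = new AtomicInteger(0);

  /** Called when a datanode registers or is declared alive again. */
  void onDatanodeAlive() {
    liveNodeCount.incrementAndGet();
  }

  /** Called when heartbeat monitoring declares a datanode dead. */
  void onDatanodeDead() {
    liveNodeCount.decrementAndGet();
  }

  /** Lock-free read for hot paths such as append. */
  int getNumLiveDataNodes() {
    return liveNodeCount.get();
  }
}
{code}

The counter may briefly lag the authoritative {{datanodeMap}} scan, which 
should be acceptable for a threshold check like {{isSufficientlyReplicated}}.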

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791089#comment-16791089
 ] 

Chao Sun commented on HDFS-14366:
-

Ah, I see. I guess not many people are using {{append}} in large clusters like 
ours :D. Anyway, thanks for the quick response and review!

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14366:

Attachment: HDFS-14366.001.patch

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, HDFS-14366.001.patch, 
> append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791104#comment-16791104
 ] 

Chao Sun edited comment on HDFS-14366 at 3/12/19 11:57 PM:
---

Thanks [~elgoiri]. Attached patch v1 according to your comments. I don't see an 
easier way to optimize {{getNumLiveDataNodes}} right now - I will update this 
once a JIRA is filed.


was (Author: csun):
Thanks [~elgoiri]. Attached patch v1 according to your comments.

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, HDFS-14366.001.patch, 
> append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14366) Improve HDFS append performance

2019-03-12 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791104#comment-16791104
 ] 

Chao Sun commented on HDFS-14366:
-

Thanks [~elgoiri]. Attached patch v1 according to your comments.

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, HDFS-14366.001.patch, 
> append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14346) Better time precision in getTimeDuration

2019-03-13 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791842#comment-16791842
 ] 

Chao Sun commented on HDFS-14346:
-

Thanks [~xkrogen]. Attached patch v2 to address the checkstyle issues.

> Better time precision in getTimeDuration
> 
>
> Key: HDFS-14346
> URL: https://issues.apache.org/jira/browse/HDFS-14346
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14346.000.patch, HDFS-14346.001.patch, 
> HDFS-14346.002.patch
>
>
> Currently, {{Configuration#getTimeDuration}} has the following signature:
> {code}
>   /**
>* Return time duration in the given time unit. Valid units are encoded in
>* properties as suffixes: nanoseconds (ns), microseconds (us), milliseconds
>* (ms), seconds (s), minutes (m), hours (h), and days (d).
>* @param name Property name
>* @param defaultValue Value returned if no mapping exists.
>* @param unit Unit to convert the stored property, if it exists.
>* @throws NumberFormatException If the property stripped of its unit is not
>* a number
>*/
>   public long getTimeDuration(String name, long defaultValue, TimeUnit unit)
> {code}
> This may lose precision when the default time unit is larger than the time 
> unit that the configuration value is converted to at the call sites of this 
> method. For instance, in {{EditLogTailer}} this method is used in the 
> following manner:
> {code}
> logRollPeriodMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
> sleepTimeMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 
> 1000;
> {code}
> In both cases the default time unit is seconds and the configuration value 
> is converted into milliseconds. Precision is lost when users specify a 
> sub-second duration such as {{100ms}}, which is converted to {{0ms}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14346) Better time precision in getTimeDuration

2019-03-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14346:

Attachment: HDFS-14346.002.patch

> Better time precision in getTimeDuration
> 
>
> Key: HDFS-14346
> URL: https://issues.apache.org/jira/browse/HDFS-14346
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14346.000.patch, HDFS-14346.001.patch, 
> HDFS-14346.002.patch
>
>
> Currently, {{Configuration#getTimeDuration}} has the following signature:
> {code}
>   /**
>* Return time duration in the given time unit. Valid units are encoded in
>* properties as suffixes: nanoseconds (ns), microseconds (us), milliseconds
>* (ms), seconds (s), minutes (m), hours (h), and days (d).
>* @param name Property name
>* @param defaultValue Value returned if no mapping exists.
>* @param unit Unit to convert the stored property, if it exists.
>* @throws NumberFormatException If the property stripped of its unit is not
>* a number
>*/
>   public long getTimeDuration(String name, long defaultValue, TimeUnit unit)
> {code}
> This may lose precision when the default time unit is larger than the time 
> unit that the configuration value is converted to at the call sites of this 
> method. For instance, in {{EditLogTailer}} this method is used in the 
> following manner:
> {code}
> logRollPeriodMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
> sleepTimeMs = conf.getTimeDuration(
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY,
> DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_DEFAULT, TimeUnit.SECONDS) * 
> 1000;
> {code}
> In both cases the default time unit is seconds and the configuration value 
> is converted into milliseconds. Precision is lost when users specify a 
> sub-second duration such as {{100ms}}, which is converted to {{0ms}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-13 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792095#comment-16792095
 ] 

Chao Sun commented on HDFS-14205:
-

[~vagarychen], there are a few things I fixed in the latest patch:
 1. One line:
{code:java}
Configuration hdfsConf = new Configuration(conf);
{code}
was missing from {{MiniDFSCluster}}. The change was added later by HDFS-9142 
and conflicts with HDFS-6440. It also changed the return type of 
{{createNameNode}} from {{NameNodeInfo}} to {{void}}.
 2. In {{TestEditLogRace}}, it should be:
{code:java}
private static final String NAME_DIR = MiniDFSCluster.getBaseDirectory() + 
"name-0-1";
{code}
for the latest change.
 3. In {{TestBootstrapStandby#testSuccessfulBaseCase}}, some of the code before 
{{restartNameNodesFromIndex}} needs to be moved into the for loop.
 4. In {{TestBootstrapStandby#testDownloadingLaterCheckpoint}}, it should be:
{code:java}
URI editsUri = cluster.getSharedEditsDir(0, maxNNCount - 1);
{code}
instead of:
{code:java}
URI editsUri = cluster.getSharedEditsDir(0, 1);
{code}
5. Binary changes (e.g., {{hadoop-1-reserved.tgz}}) were missing in the early 
patches.
 6. Before patch v2, {{MultipleNameNodeProxy}} was defined but not used. This 
has been fixed since patch v3.

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14366) Improve HDFS append performance

2019-03-15 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793785#comment-16793785
 ] 

Chao Sun commented on HDFS-14366:
-

[~elgoiri], [~jojochuang]: can you help to get committed? Thanks.

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, HDFS-14366.001.patch, 
> append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14366) Improve HDFS append performance

2019-03-15 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793785#comment-16793785
 ] 

Chao Sun edited comment on HDFS-14366 at 3/15/19 5:16 PM:
--

[~elgoiri], [~jojochuang]: can you help to get this committed? Thanks.


was (Author: csun):
[~elgoiri], [~jojochuang]: can you help to get committed? Thanks.

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14366.000.patch, HDFS-14366.001.patch, 
> append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14366) Improve HDFS append performance

2019-03-15 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793906#comment-16793906
 ] 

Chao Sun commented on HDFS-14366:
-

Thanks [~elgoiri]! Do you think we should backport this to other branches, such 
as branch-2, as well?

> Improve HDFS append performance
> ---
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14366.000.patch, HDFS-14366.001.patch, 
> append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much 
> as 10X the write lock time of other write operations. By collecting a flamegraph on 
> the namenode (see attachment: append-flamegraph.png), we found that most of 
> the append call is spent on {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
> int numLive = 0;
> synchronized (this) {
>   for(DatanodeDescriptor dn : datanodeMap.values()) {
> if (!isDatanodeDead(dn) ) {
>   numLive++;
> }
>   }
> }
> return numLive;
>   }
> {code}
> this method synchronizes on the {{DatanodeManager}} which is particularly 
> expensive in large clusters since {{datanodeMap}} is being modified in many 
> places such as processing DN heartbeats.
> For {{append}} operation, {{getNumLiveDataNodes()}} is invoked in 
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int replication =
> Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
> return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way that {{replication}} is calculated is not optimal, as it will call 
> {{getNumLiveDataNodes()}} _every time_ even though {{minReplication}} is usually 
> much smaller than the number of live DataNodes. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-18 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795417#comment-16795417
 ] 

Chao Sun commented on HDFS-14205:
-

Thanks [~vagarychen]. I think we need to include the changes for #2 and #4. In 
trunk, HDFS-6440 came before HDFS-13676 and HDFS-9533. However, branch-2 does 
not have HDFS-6440 but does have the latter two: the backported versions of 
those two were slightly modified to accommodate the single-SBN architecture and 
no longer work with HDFS-6440.

Yes, after this we'll need to backport a few follow-up JIRAs that came after 
HDFS-6440. I'll create JIRAs for that work. Hopefully they will be relatively 
straightforward. :)
 

 

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-26 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802098#comment-16802098
 ] 

Chao Sun commented on HDFS-14205:
-

Thanks [~vagarychen]! I'll backport the follow-up JIRAs soon.

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Fix For: 2.10.0
>
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14391) Backport HDFS-9659 to branch-2

2019-03-27 Thread Chao Sun (JIRA)
Chao Sun created HDFS-14391:
---

 Summary: Backport HDFS-9659 to branch-2
 Key: HDFS-14391
 URL: https://issues.apache.org/jira/browse/HDFS-14391
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chao Sun
Assignee: Chao Sun


As the multi-SBN feature has already been backported to branch-2, this is a 
follow-up to backport HDFS-9659.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14391) Backport HDFS-9659 to branch-2

2019-03-27 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14391:

Attachment: HDFS-14391-branch-2.000.patch

> Backport HDFS-9659 to branch-2
> --
>
> Key: HDFS-14391
> URL: https://issues.apache.org/jira/browse/HDFS-14391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14391-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HDFS-9659.



[jira] [Updated] (HDFS-14391) Backport HDFS-9659 to branch-2

2019-03-27 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14391:

Status: Patch Available  (was: Open)

> Backport HDFS-9659 to branch-2
> --
>
> Key: HDFS-14391
> URL: https://issues.apache.org/jira/browse/HDFS-14391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14391-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HDFS-9659.



[jira] [Created] (HDFS-14392) Backport HDFS-9787 to branch-2

2019-03-27 Thread Chao Sun (JIRA)
Chao Sun created HDFS-14392:
---

 Summary: Backport HDFS-9787 to branch-2
 Key: HDFS-14392
 URL: https://issues.apache.org/jira/browse/HDFS-14392
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Chao Sun
Assignee: Chao Sun


As the multi-SBN feature has already been backported to branch-2, this is a 
follow-up to backport HDFS-9787.





[jira] [Commented] (HDFS-14391) Backport HDFS-9659 to branch-2

2019-03-27 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803350#comment-16803350
 ] 

Chao Sun commented on HDFS-14391:
-

The failed tests (except {{TestWebHdfsTimeouts}}, which is a known flaky test) 
are passing on my laptop, and I don't think they are related to this change. 
[~vagarychen]: can you take a look? Thanks.
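
In case it helps anyone verifying locally, a sketch of rerunning a single suite 
from the hadoop-hdfs module (assuming a branch-2 checkout that has already been 
built once):

{code:bash}
# From the root of the Hadoop source tree; build the tree first, e.g.
# mvn install -DskipTests, so the module's dependencies resolve.
mvn test -pl hadoop-hdfs-project/hadoop-hdfs -Dtest=TestWebHdfsTimeouts
{code}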

> Backport HDFS-9659 to branch-2
> --
>
> Key: HDFS-14391
> URL: https://issues.apache.org/jira/browse/HDFS-14391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14391-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HDFS-9659.



[jira] [Updated] (HDFS-14392) Backport HDFS-9787 to branch-2

2019-03-27 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14392:

Status: Patch Available  (was: Open)

> Backport HDFS-9787 to branch-2
> --
>
> Key: HDFS-14392
> URL: https://issues.apache.org/jira/browse/HDFS-14392
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14392-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HDFS-9787.



[jira] [Updated] (HDFS-14392) Backport HDFS-9787 to branch-2

2019-03-27 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14392:

Attachment: HDFS-14392-branch-2.000.patch

> Backport HDFS-9787 to branch-2
> --
>
> Key: HDFS-14392
> URL: https://issues.apache.org/jira/browse/HDFS-14392
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14392-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HDFS-9787.



[jira] [Commented] (HDFS-14392) Backport HDFS-9787 to branch-2

2019-03-28 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804141#comment-16804141
 ] 

Chao Sun commented on HDFS-14392:
-

cc [~vagarychen]: the test failures for this also do not seem to be related. 

> Backport HDFS-9787 to branch-2
> --
>
> Key: HDFS-14392
> URL: https://issues.apache.org/jira/browse/HDFS-14392
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14392-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HDFS-9787.



[jira] [Created] (HDFS-14397) Backport HADOOP-15684 to branch-2

2019-03-28 Thread Chao Sun (JIRA)
Chao Sun created HDFS-14397:
---

 Summary: Backport HADOOP-15684 to branch-2
 Key: HDFS-14397
 URL: https://issues.apache.org/jira/browse/HDFS-14397
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Chao Sun
Assignee: Chao Sun


As the multi-SBN feature has already been backported to branch-2, this is a 
follow-up to backport HADOOP-15684.





[jira] [Updated] (HDFS-14397) Backport HADOOP-15684 to branch-2

2019-03-28 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14397:

Attachment: HDFS-14397-branch-2.000.patch

> Backport HADOOP-15684 to branch-2
> -
>
> Key: HDFS-14397
> URL: https://issues.apache.org/jira/browse/HDFS-14397
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14397-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HADOOP-15684.



[jira] [Updated] (HDFS-14397) Backport HADOOP-15684 to branch-2

2019-03-28 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14397:

Status: Patch Available  (was: Open)

> Backport HADOOP-15684 to branch-2
> -
>
> Key: HDFS-14397
> URL: https://issues.apache.org/jira/browse/HDFS-14397
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14397-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HADOOP-15684.



[jira] [Created] (HDFS-14399) Backport HDFS-10536 to branch-2

2019-03-29 Thread Chao Sun (JIRA)
Chao Sun created HDFS-14399:
---

 Summary: Backport HDFS-10536 to branch-2
 Key: HDFS-14399
 URL: https://issues.apache.org/jira/browse/HDFS-14399
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Chao Sun
Assignee: Chao Sun


As the multi-SBN feature has already been backported to branch-2, this is a 
follow-up to backport HADOOP-10536.





[jira] [Updated] (HDFS-14399) Backport HDFS-10536 to branch-2

2019-03-29 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14399:

Description: 
As the multi-SBN feature has already been backported to branch-2, this is a 
follow-up to backport HDFS-10536.



  was:
As the multi-SBN feature has already been backported to branch-2, this is a 
follow-up to backport HADOOP-10536.




> Backport HDFS-10536 to branch-2
> ---
>
> Key: HDFS-14399
> URL: https://issues.apache.org/jira/browse/HDFS-14399
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Critical
> Attachments: HDFS-14399-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HDFS-10536.



[jira] [Updated] (HDFS-14399) Backport HDFS-10536 to branch-2

2019-03-29 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14399:

Attachment: HDFS-14399-branch-2.000.patch

> Backport HDFS-10536 to branch-2
> ---
>
> Key: HDFS-14399
> URL: https://issues.apache.org/jira/browse/HDFS-14399
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Critical
> Attachments: HDFS-14399-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HDFS-10536.



[jira] [Updated] (HDFS-14399) Backport HDFS-10536 to branch-2

2019-03-29 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14399:

Status: Patch Available  (was: Open)

> Backport HDFS-10536 to branch-2
> ---
>
> Key: HDFS-14399
> URL: https://issues.apache.org/jira/browse/HDFS-14399
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Critical
> Attachments: HDFS-14399-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HDFS-10536.



[jira] [Commented] (HDFS-14397) Backport HADOOP-15684 to branch-2

2019-03-29 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805244#comment-16805244
 ] 

Chao Sun commented on HDFS-14397:
-

The tests fail because this patch depends on HDFS-10536. Filed HDFS-14399 and 
will upload a new patch after that is resolved.

> Backport HADOOP-15684 to branch-2
> -
>
> Key: HDFS-14397
> URL: https://issues.apache.org/jira/browse/HDFS-14397
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14397-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HADOOP-15684.


