[jira] [Created] (HDFS-15299) Add an option to enable reporting flaky disk

2020-04-25 Thread star (Jira)
star created HDFS-15299:
---

 Summary: Add an option to enable  reporting flaky disk 
 Key: HDFS-15299
 URL: https://issues.apache.org/jira/browse/HDFS-15299
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: star
Assignee: star


In our production environment, with disks more than 8 years old, many DNs are 
treated as dead because they are partially broken. The NN then balances data 
blocks across the cluster, introducing high disk load. To reduce the impact of 
flaky disks, we'd like to extend the tolerance mechanism to partial disk failures. 

       As described in HDFS-10777, the du command can still throw an exception on 
a heavily loaded disk. Simply removing a flaky disk is brittle because it may 
recover later; however, that is a rare case in our production environment. So can 
we just add an option to enable partial disk failure tolerance for users who 
have mostly broken disks and care more about the stability of the cluster?

      We will replace those old disks in the future, but until then the HDFS 
cluster will have to keep running on those servers for a long time.

      Comments are appreciated.
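
For illustration only, a minimal sketch of how such an opt-in switch might be read on the DataNode side; the property name dfs.datanode.tolerate.partial.disk.failure is hypothetical and not an existing Hadoop key.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class FlakyDiskOption {
  // Hypothetical key, used here only to illustrate the proposed opt-in switch.
  static final String TOLERATE_PARTIAL_DISK_FAILURE_KEY =
      "dfs.datanode.tolerate.partial.disk.failure";

  // Default false keeps today's behavior of failing the volume outright.
  static boolean tolerateFlakyDisk(Configuration conf) {
    return conf.getBoolean(TOLERATE_PARTIAL_DISK_FAILURE_KEY, false);
  }
}
{code}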






[jira] [Comment Edited] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-09-09 Thread star (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925622#comment-16925622
 ] 

star edited comment on HDFS-14378 at 9/9/19 11:51 AM:
--

Thanks [~jojochuang] for the review and advice.

Section 7.6 of HDFS-1073 emphasizes the problem of multiple NNs. HDFS-6440 
didn't take care of edit log rolling, and it avoids multiple fsimage uploads via 
the 'primary checkpointer' status for multiple SNNs.

I'd like to create two sub-JIRAs, one for edit log rolling and one for fsimage 
downloading. The ANN will roll its own edit logs. As for the fsimage, I see two 
options:
 # Each SNN does its own checkpoint and the ANN downloads the fsimage from a 
randomly selected SNN.
 # The ANN issues a checkpoint command to the SNNs via a special edit log op 
(like "OP_ROLLING_UPGRADE_START"), then downloads the fsimage from a randomly 
selected SNN.

[~jojochuang], [~tlipcon], what's your opinion?


was (Author: starphin):
Thanks [~jojochuang] for the review and advice.

Section 7.6 of HDFS-1073 emphasizes the problem of multiple NNs. HDFS-6440 
didn't take care of edit log rolling, and it avoids multiple fsimage uploads via 
the 'primary checkpointer' status for multiple SNNs.

I'd like to create two sub-JIRAs, one for edit log rolling and one for fsimage 
downloading. The ANN will roll its own edit logs. As for the fsimage, I see two 
options:
 # Each SNN does its own checkpoint and the ANN downloads the fsimage from a 
randomly selected SNN.
 # 2. The ANN issues a checkpoint command to the SNNs via a special edit log op 
(like "OP_ROLLING_UPGRADE_START"), then downloads the fsimage from a randomly 
selected SNN.

[~jojochuang], [~tlipcon], what's your opinion?

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, 
> HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>
>
> HDFS-6440 introduced a mechanism to support more than two NNs. It implements a 
> first-writer-wins policy to avoid duplicated fsimage downloads. The variable 
> 'isPrimaryCheckPointer' holds the first-writer state, with which the SNN will 
> provide the fsimage for the ANN next time. So we have three roles in the NN 
> cluster: the ANN, one primary SNN, and one or more normal SNNs.
> Since HDFS-12248, there may be more than two primary SNNs shortly after an 
> exception occurs. That change handles a scenario in which the SNN will not 
> upload the fsimage on IOExceptions and InterruptedExceptions. Though it will 
> not cause any further functional issues, it is inconsistent.
> Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple standby NameNodes (HDFS-14349). (I'm not so sure about this; I will 
> verify it with unit tests, or anyone could point it out.)
> Above all, I'm wondering if we could simplify this with the following changes:
>  * There are only two roles: ANN and SNN.
>  * The ANN rolls its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * The ANN selects an SNN from which to download the checkpoint.
> The SNN just tails the edit log and checkpoints, and provides a servlet for 
> fsimage downloading as usual. The SNN does not try to roll the edit log or 
> send checkpoint requests to the ANN.
> In a word, the ANN becomes more active. Suggestions are welcome.
>  
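
For context, the 'primary checkpointer' first-writer-wins state referenced above is the flag shown in the StandbyCheckpointer snippet quoted under HDFS-14361 later in this archive; a minimal sketch of that decision, assuming the fields quoted there, not a complete method:
{code:java}
// Sketch of the first-writer-wins upload decision, based on the StandbyCheckpointer
// snippet quoted in HDFS-14361 below.
boolean sendRequest = isPrimaryCheckPointer
    || secsSinceLastUpload >= checkpointConf.getQuietPeriod();
doCheckpoint(sendRequest);
// Only the SNN whose upload to the ANN succeeds keeps (or gains) the primary role.
this.isPrimaryCheckPointer = success;
{code}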






[jira] [Comment Edited] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-09-09 Thread star (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925622#comment-16925622
 ] 

star edited comment on HDFS-14378 at 9/9/19 11:51 AM:
--

Thanks [~jojochuang] for the review and advice.

Section 7.6 of HDFS-1073 emphasizes the problem of multiple NNs. HDFS-6440 
didn't take care of edit log rolling, and it avoids multiple fsimage uploads via 
the 'primary checkpointer' status for multiple SNNs.

I'd like to create two sub-JIRAs, one for edit log rolling and one for fsimage 
downloading. The ANN will roll its own edit logs. As for the fsimage, I see two 
options:
 # Each SNN does its own checkpoint and the ANN downloads the fsimage from a 
randomly selected SNN.
 # 2. The ANN issues a checkpoint command to the SNNs via a special edit log op 
(like "OP_ROLLING_UPGRADE_START"), then downloads the fsimage from a randomly 
selected SNN.

[~jojochuang], [~tlipcon], what's your opinion?


was (Author: starphin):
Thanks [~jojochuang] for the review and advice.

Section 7.6 of HDFS-1073 emphasizes the problem of multiple NNs. HDFS-6440 
didn't take care of edit log rolling, and it avoids multiple fsimage uploads via 
the 'primary checkpointer' status for multiple SNNs.

I'd like to create two sub-JIRAs, one for edit log rolling and one for fsimage 
downloading. The ANN will roll its own edit logs. As for the fsimage, I see two 
options: 1. Each SNN does its own checkpoint and the ANN downloads the fsimage 
from a randomly selected SNN. 2. The ANN issues a checkpoint command to the SNNs 
via a special edit log op (like "OP_ROLLING_UPGRADE_START"), then downloads the 
fsimage from a randomly selected SNN.

[~jojochuang], [~tlipcon], what's your opinion?

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, 
> HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>
>
> HDFS-6440 introduced a mechanism to support more than two NNs. It implements a 
> first-writer-wins policy to avoid duplicated fsimage downloads. The variable 
> 'isPrimaryCheckPointer' holds the first-writer state, with which the SNN will 
> provide the fsimage for the ANN next time. So we have three roles in the NN 
> cluster: the ANN, one primary SNN, and one or more normal SNNs.
> Since HDFS-12248, there may be more than two primary SNNs shortly after an 
> exception occurs. That change handles a scenario in which the SNN will not 
> upload the fsimage on IOExceptions and InterruptedExceptions. Though it will 
> not cause any further functional issues, it is inconsistent.
> Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple standby NameNodes (HDFS-14349). (I'm not so sure about this; I will 
> verify it with unit tests, or anyone could point it out.)
> Above all, I'm wondering if we could simplify this with the following changes:
>  * There are only two roles: ANN and SNN.
>  * The ANN rolls its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * The ANN selects an SNN from which to download the checkpoint.
> The SNN just tails the edit log and checkpoints, and provides a servlet for 
> fsimage downloading as usual. The SNN does not try to roll the edit log or 
> send checkpoint requests to the ANN.
> In a word, the ANN becomes more active. Suggestions are welcome.
>  






[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-09-09 Thread star (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925622#comment-16925622
 ] 

star commented on HDFS-14378:
-

Thanks [~jojochuang] for the review and advice.

Section 7.6 of HDFS-1073 emphasizes the problem of multiple NNs. HDFS-6440 
didn't take care of edit log rolling, and it avoids multiple fsimage uploads via 
the 'primary checkpointer' status for multiple SNNs.

I'd like to create two sub-JIRAs, one for edit log rolling and one for fsimage 
downloading. The ANN will roll its own edit logs. As for the fsimage, I see two 
options: 1. Each SNN does its own checkpoint and the ANN downloads the fsimage 
from a randomly selected SNN. 2. The ANN issues a checkpoint command to the SNNs 
via a special edit log op (like "OP_ROLLING_UPGRADE_START"), then downloads the 
fsimage from a randomly selected SNN.

[~jojochuang], [~tlipcon], what's your opinion?

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, 
> HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>
>
> HDFS-6440 introduced a mechanism to support more than two NNs. It implements a 
> first-writer-wins policy to avoid duplicated fsimage downloads. The variable 
> 'isPrimaryCheckPointer' holds the first-writer state, with which the SNN will 
> provide the fsimage for the ANN next time. So we have three roles in the NN 
> cluster: the ANN, one primary SNN, and one or more normal SNNs.
> Since HDFS-12248, there may be more than two primary SNNs shortly after an 
> exception occurs. That change handles a scenario in which the SNN will not 
> upload the fsimage on IOExceptions and InterruptedExceptions. Though it will 
> not cause any further functional issues, it is inconsistent.
> Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple standby NameNodes (HDFS-14349). (I'm not so sure about this; I will 
> verify it with unit tests, or anyone could point it out.)
> Above all, I'm wondering if we could simplify this with the following changes:
>  * There are only two roles: ANN and SNN.
>  * The ANN rolls its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * The ANN selects an SNN from which to download the checkpoint.
> The SNN just tails the edit log and checkpoints, and provides a servlet for 
> fsimage downloading as usual. The SNN does not try to roll the edit log or 
> send checkpoint requests to the ANN.
> In a word, the ANN becomes more active. Suggestions are welcome.
>  






[jira] [Commented] (HDFS-14361) SNN will always upload fsimage

2019-07-07 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879988#comment-16879988
 ] 

star commented on HDFS-14361:
-

Right. isPrimaryCheckPointer is not changed when any error/exception occurs.

> SNN will always upload fsimage
> --
>
> Key: HDFS-14361
> URL: https://issues.apache.org/jira/browse/HDFS-14361
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Priority: Major
> Fix For: 3.2.0
>
>
> Related to -HDFS-12248.-
> {code:java}
> boolean sendRequest = isPrimaryCheckPointer
> || secsSinceLastUpload >= checkpointConf.getQuietPeriod();
> doCheckpoint(sendRequest);
> {code}
> If sendRequest is true, the SNN will upload the fsimage. But isPrimaryCheckPointer 
> is always true:
> {code:java}
> if (ie == null && ioe == null) {
>   //Update only when response from remote about success or
>   lastUploadTime = monotonicNow();
>   // we are primary if we successfully updated the ANN
>   this.isPrimaryCheckPointer = success;
> }
> {code}
> isPrimaryCheckPointer should be outside the if condition.
> If the ANN update was not successful, then isPrimaryCheckPointer should be 
> set to false.
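
A minimal sketch of the change described above (moving the flag assignment out of the success-only branch); this is an illustration against the snippet quoted in this report, not the committed patch:
{code:java}
// Sketch only: keep the timestamp update success-only, but set the flag on every
// attempt so a failed upload demotes this SNN from primary checkpointer.
if (ie == null && ioe == null) {
  // Update only on a successful response from the ANN.
  lastUploadTime = monotonicNow();
}
// We are primary only if we successfully updated the ANN.
this.isPrimaryCheckPointer = success;
{code}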






[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-07-02 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877439#comment-16877439
 ] 

star commented on HDFS-12914:
-

Yes, it is broken in branch-3.0 because of the 'private' modifier on 
'processReport'. There's no such issue on other branches after HDFS-11673, in 
which the 'private' modifier is removed.
{quote}
Collection<Block> processReport(
    final DatanodeStorageInfo storageInfo,
    final BlockListAsLongs report,
    BlockReportContext context) throws IOException {
{quote}
I am not sure whether branch-3.0 should be covered for this issue. 

[~jojochuang], what do you think? 

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-2.patch, 
> HDFS-12914.branch-3.1.001.patch, HDFS-12914.branch-3.1.002.patch, 
> HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-07-02 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877439#comment-16877439
 ] 

star edited comment on HDFS-12914 at 7/3/19 2:43 AM:
-

Yes, it is broken in branch-3.0 because of the 'private' modifier on 
'processReport'. There's no such issue on other branches after HDFS-11673, in 
which the 'private' modifier is removed.
{quote}
Collection<Block> processReport(
    final DatanodeStorageInfo storageInfo,
    final BlockListAsLongs report,
    BlockReportContext context) throws IOException {
{quote}
I am not sure whether branch-3.0 should be covered for this issue. 

[~jojochuang], what do you think? 


was (Author: starphin):
Yes, it is broken in branch-3.0 because of the 'private' modifier on 
'processReport'. There's no such issue on other branches after HDFS-11673, in 
which the 'private' modifier is removed.
{quote}
Collection<Block> processReport(
    final DatanodeStorageInfo storageInfo,
    final BlockListAsLongs report,
    BlockReportContext context) throws IOException {
{quote}
I am not sure whether branch-3.0 should be covered for this issue. 

[~jojochuang], what do you think? 

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-2.patch, 
> HDFS-12914.branch-3.1.001.patch, HDFS-12914.branch-3.1.002.patch, 
> HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Resolved] (HDFS-14424) NN failover failed because of losing paxos directory

2019-06-11 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star resolved HDFS-14424.
-
Resolution: Duplicate

> NN failover failed because of losing paxos directory
> -
>
> Key: HDFS-14424
> URL: https://issues.apache.org/jira/browse/HDFS-14424
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Assignee: star
>Priority: Major
>
> Recently, our Hadoop NameNode shut down when switching the active NameNode, just 
> because of a missing paxos directory. The directory is created under the default 
> /tmp path and was deleted by the OS after 7 days without activity. We can avoid 
> this by moving the journal directory to a non-tmp dir, but it's better to make 
> sure the NameNode works well with the default config.
> The issue throws an exception similar to HDFS-10659, which is also caused by a 
> missing paxos directory.
>  






[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-08 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859347#comment-16859347
 ] 

star commented on HDFS-12914:
-

LGTM. Thanks [~hexiaoqiao] for answering. The failing tests may need to be checked.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-08 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859272#comment-16859272
 ] 

star commented on HDFS-12914:
-

Yes, by client I mean the DataNode. The DN gets a REGISTER command after the NN 
starts up and makes a full block report every 6 hours. It is fine for it to send 
other RPC requests to the NameNode after its first REGISTER command, so there is 
no need to register again just because the full block report lease ID has 
expired. Feel free to ignore this if it doesn't make sense to you.
{quote}
I think `client` you mentioned is Datanode, right? If one datanode not register 
and send some other RPC request to NameNode, it should get 
RegisterCommand.REGISTER and try to re-register. It seems a normal flow.
Thanks Íñigo Goiri and star again. HDFS-12914.007.patch please take another 
kindly review.
{quote}
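
For reference, the 6-hour cadence mentioned above is the standard full block report interval; a minimal sketch of reading it, assuming a DataNode-side Configuration object named conf:
{code:java}
// The DN schedules full block reports on this interval (default 6 h).
long brIntervalMs = conf.getLong(
    DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_KEY,
    DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT);
{code}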

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-08 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859250#comment-16859250
 ] 

star edited comment on HDFS-12914 at 6/8/19 4:10 PM:
-

A few comments about your unit tests.
 # The following code bypasses the lease expiration checking logic by removing a 
valid lease ID. It would be better to keep the lease as it would be at run time.

{code:java}
// Remove the full block report lease for the DN
spyBlockManager.getBlockReportLeaseManager()
    .removeLease(datanodeDescriptor);
{code}
      2. Do we really need to respond with a RegisterCommand.REGISTER command to 
the client? It's a somewhat heavy command. Should we instead just let the client 
know its block report failed (such as with an IOException response, as in my 
code) and retry a few more times?


was (Author: starphin):
A few comments about your unit tests.
 # The following code bypasses the lease expiration checking logic by removing a 
valid lease ID. It would be better to keep the lease as it would be at run time.

{code:java}
// Remove the full block report lease for the DN
spyBlockManager.getBlockReportLeaseManager()
    .removeLease(datanodeDescriptor);
{code}
      2. Do we really need to respond with a RegisterCommand.REGISTER command to 
the client? It's a somewhat heavy command. Should we instead just let the client 
know its block report failed and retry a few more times?

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-08 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859250#comment-16859250
 ] 

star edited comment on HDFS-12914 at 6/8/19 4:09 PM:
-

A few comments about your unit tests.
 # The following code bypasses the lease expiration checking logic by removing a 
valid lease ID. It would be better to keep the lease as it would be at run time.

{code:java}
// Remove the full block report lease for the DN
spyBlockManager.getBlockReportLeaseManager()
    .removeLease(datanodeDescriptor);
{code}
      2. Do we really need to respond with a RegisterCommand.REGISTER command to 
the client? It's a somewhat heavy command. Should we instead just let the client 
know its block report failed and retry a few more times?


was (Author: starphin):
A few comments about your unit tests.

The following code bypasses the lease expiration checking logic by removing a 
valid lease ID. It would be better to keep the lease as it would be at run time.
{code:java}
// Remove the full block report lease for the DN
spyBlockManager.getBlockReportLeaseManager()
    .removeLease(datanodeDescriptor);
{code}

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-08 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859246#comment-16859246
 ] 

star edited comment on HDFS-12914 at 6/8/19 3:56 PM:
-

[~hexiaoqiao], I also wrote a unit test for this issue, mostly similar to 
yours. Pasted here just for reference.

Besides the test code, one piece of code is changed: BlockManager#processReport 
will throw an IOException to indicate an invalid lease ID, so the client will 
get the exception.
{code:java}
if (context != null) {
  if (!blockReportLeaseManager.checkLease(node, startTime,
      context.getLeaseId())) {
    throw new IOException("Invalid block report lease id '"
        + context.getLeaseId() + "'");
  }
}{code}
{code:java}
// Before the test starts
conf.setLong(DFSConfigKeys.DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS,
    500L);


@Test
public void testDelayedBlockReport() throws IOException {
  FSNamesystem namesystem = cluster.getNameNode(0).getNamesystem();

  BlockManager testBlockManager = Mockito.spy(namesystem.getBlockManager());

  Mockito.doAnswer(new Answer<Boolean>() {
    @Override
    public Boolean answer(InvocationOnMock invocationOnMock) throws Throwable {
      // sleep 1000 ms to delay processing of the current report
      Thread.sleep(1000);
      return (Boolean) invocationOnMock.callRealMethod();
    }
  }).when(testBlockManager).processReport(
      Mockito.any(DatanodeID.class), Mockito.any(DatanodeStorage.class),
      Mockito.any(BlockListAsLongs.class),
      Mockito.any(BlockReportContext.class));
  namesystem.setBlockManagerForTesting(testBlockManager);

  String bpid = namesystem.getBlockPoolId();
  DataNode dn = cluster.getDataNodes().get(0);
  DatanodeRegistration dnReg = dn.getDNRegistrationForBP(bpid);

  namesystem.readLock();
  long leaseId = testBlockManager.requestBlockReportLeaseId(dnReg);
  namesystem.readUnlock();

  Map<DatanodeStorage, BlockListAsLongs> report = cluster.getBlockReport(bpid, 0);

  List<StorageBlockReport> reportList = new ArrayList<>();
  for (Map.Entry<DatanodeStorage, BlockListAsLongs> en : report.entrySet()) {
    reportList.add(new StorageBlockReport(en.getKey(), en.getValue()));
  }

  // it will throw IOException if the lease id is invalid
  cluster.getNameNode().getRpcServer().blockReport(
      dnReg, bpid, reportList.toArray(new StorageBlockReport[]{}),
      new BlockReportContext(1, 0, System.nanoTime(), leaseId, true));
}
{code}


was (Author: starphin):
[~hexiaoqiao], I also wrote a unit test for this issue, mostly similar to 
yours. Pasted here just for reference.

Besides the test code, one piece of code is changed: BlockManager#processReport 
will throw an IOException to indicate an invalid lease ID, so the client will 
get the exception.
{code:java}
if (context != null) {
  if (!blockReportLeaseManager.checkLease(node, startTime,
      context.getLeaseId())) {
    throw new IOException("Invalid block report lease id '"
        + context.getLeaseId() + "'");
  }
}{code}
{code:java}
@Test
public void testDelayedBlockReport() throws IOException {
  FSNamesystem namesystem = cluster.getNameNode(0).getNamesystem();

  BlockManager testBlockManager = Mockito.spy(namesystem.getBlockManager());

  Mockito.doAnswer(new Answer<Boolean>() {
    @Override
    public Boolean answer(InvocationOnMock invocationOnMock) throws Throwable {
      // sleep 1000 ms to delay processing of the current report
      Thread.sleep(1000);
      return (Boolean) invocationOnMock.callRealMethod();
    }
  }).when(testBlockManager).processReport(
      Mockito.any(DatanodeID.class), Mockito.any(DatanodeStorage.class),
      Mockito.any(BlockListAsLongs.class),
      Mockito.any(BlockReportContext.class));
  namesystem.setBlockManagerForTesting(testBlockManager);

  String bpid = namesystem.getBlockPoolId();
  DataNode dn = cluster.getDataNodes().get(0);
  DatanodeRegistration dnReg = dn.getDNRegistrationForBP(bpid);

  namesystem.readLock();
  long leaseId = testBlockManager.requestBlockReportLeaseId(dnReg);
  namesystem.readUnlock();

  Map<DatanodeStorage, BlockListAsLongs> report = cluster.getBlockReport(bpid, 0);

  List<StorageBlockReport> reportList = new ArrayList<>();
  for (Map.Entry<DatanodeStorage, BlockListAsLongs> en : report.entrySet()) {
    reportList.add(new StorageBlockReport(en.getKey(), en.getValue()));
  }

  // it will throw IOException if the lease id is invalid
  cluster.getNameNode().getRpcServer().blockReport(
      dnReg, bpid, reportList.toArray(new StorageBlockReport[]{}),
      new BlockReportContext(1, 0, System.nanoTime(), leaseId, true));
}
{code}

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, 

[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-08 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859250#comment-16859250
 ] 

star commented on HDFS-12914:
-

A few comments about your unit tests.

The following code bypasses the lease expiration checking logic by removing a 
valid lease ID. It would be better to keep the lease as it would be at run time.
{code:java}
// Remove the full block report lease for the DN
spyBlockManager.getBlockReportLeaseManager()
    .removeLease(datanodeDescriptor);
{code}

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-08 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859246#comment-16859246
 ] 

star commented on HDFS-12914:
-

[~hexiaoqiao], I also wrote a unit test for this issue, mostly similar to 
yours. Pasted here just for reference.

Besides the test code, one piece of code is changed: BlockManager#processReport 
will throw an IOException to indicate an invalid lease ID, so the client will 
get the exception.
{code:java}
if (context != null) {
  if (!blockReportLeaseManager.checkLease(node, startTime,
      context.getLeaseId())) {
    throw new IOException("Invalid block report lease id '"
        + context.getLeaseId() + "'");
  }
}{code}
{code:java}
@Test
public void testDelayedBlockReport() throws IOException {
  FSNamesystem namesystem = cluster.getNameNode(0).getNamesystem();

  BlockManager testBlockManager = Mockito.spy(namesystem.getBlockManager());

  Mockito.doAnswer(new Answer<Boolean>() {
    @Override
    public Boolean answer(InvocationOnMock invocationOnMock) throws Throwable {
      // sleep 1000 ms to delay processing of the current report
      Thread.sleep(1000);
      return (Boolean) invocationOnMock.callRealMethod();
    }
  }).when(testBlockManager).processReport(
      Mockito.any(DatanodeID.class), Mockito.any(DatanodeStorage.class),
      Mockito.any(BlockListAsLongs.class),
      Mockito.any(BlockReportContext.class));
  namesystem.setBlockManagerForTesting(testBlockManager);

  String bpid = namesystem.getBlockPoolId();
  DataNode dn = cluster.getDataNodes().get(0);
  DatanodeRegistration dnReg = dn.getDNRegistrationForBP(bpid);

  namesystem.readLock();
  long leaseId = testBlockManager.requestBlockReportLeaseId(dnReg);
  namesystem.readUnlock();

  Map<DatanodeStorage, BlockListAsLongs> report = cluster.getBlockReport(bpid, 0);

  List<StorageBlockReport> reportList = new ArrayList<>();
  for (Map.Entry<DatanodeStorage, BlockListAsLongs> en : report.entrySet()) {
    reportList.add(new StorageBlockReport(en.getKey(), en.getValue()));
  }

  // it will throw IOException if the lease id is invalid
  cluster.getNameNode().getRpcServer().blockReport(
      dnReg, bpid, reportList.toArray(new StorageBlockReport[]{}),
      new BlockReportContext(1, 0, System.nanoTime(), leaseId, true));
}
{code}

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Assigned] (HDFS-5853) Add "hadoop.user.group.metrics.percentiles.intervals" to hdfs-default.xml

2019-06-05 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-5853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star reassigned HDFS-5853:
--

Assignee: (was: star)

> Add "hadoop.user.group.metrics.percentiles.intervals" to hdfs-default.xml
> -
>
> Key: HDFS-5853
> URL: https://issues.apache.org/jira/browse/HDFS-5853
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, namenode
>Affects Versions: 2.3.0
>Reporter: Akira Ajisaka
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HDFS-5853.patch
>
>
> "hadoop.user.group.metrics.percentiles.intervals" was added in HDFS-5220, but 
> the parameter is not written in hdfs-default.xml.






[jira] [Assigned] (HDFS-5853) Add "hadoop.user.group.metrics.percentiles.intervals" to hdfs-default.xml

2019-06-05 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-5853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star reassigned HDFS-5853:
--

Assignee: star  (was: Akira Ajisaka)

> Add "hadoop.user.group.metrics.percentiles.intervals" to hdfs-default.xml
> -
>
> Key: HDFS-5853
> URL: https://issues.apache.org/jira/browse/HDFS-5853
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, namenode
>Affects Versions: 2.3.0
>Reporter: Akira Ajisaka
>Assignee: star
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: HDFS-5853.patch
>
>
> "hadoop.user.group.metrics.percentiles.intervals" was added in HDFS-5220, but 
> the parameter is not written in hdfs-default.xml.






[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-05-21 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845414#comment-16845414
 ] 

star edited comment on HDFS-12914 at 5/22/19 2:58 AM:
--

Thanks [~smarella]. A few questions, for reference only.

1. I guess the protobuf version update is a mistake:
{code:java}
2.5.0.t02
{code}
2. Do we need a test case?

3. Should we log more about the full block report lease at INFO level so that we 
can inspect issues more easily? [~smarella], [~hexiaoqiao], [~jojochuang].
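
A minimal sketch of the kind of INFO logging meant here, assuming a logger LOG and lease/DN variables in scope on the lease manager side; message wording and placement are illustrative only, not the patch:
{code:java}
// Illustrative only: announce lease issuance at INFO so operators can later
// correlate a DN's lease with any "lease expired" rejection of its FBR.
LOG.info("Issued full block report lease 0x" + Long.toHexString(leaseId)
    + " to DN " + dn);
{code}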


was (Author: starphin):
Thanks [~smarella]. A few questions, for reference only.

1. I guess the protobuf version update is a mistake:
{code:java}
2.5.0.t02
{code}
2. Do we need a test case?

3. Should we log more about the full block report lease so that we can inspect 
issues more easily? [~smarella], [~hexiaoqiao], [~jojochuang].

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-05-21 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845414#comment-16845414
 ] 

star commented on HDFS-12914:
-

Thanks [~smarella]. A few questions, for reference only.

1. I guess the protobuf version update is a mistake:
{code:java}
2.5.0.t02
{code}
2. Do we need a test case?

3. Should we log more about the full block report lease so that we can inspect 
issues more easily? [~smarella], [~hexiaoqiao], [~jojochuang].

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch, HDFS-12914-trunk.00.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-05-20 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844056#comment-16844056
 ] 

star edited comment on HDFS-12914 at 5/20/19 3:47 PM:
--

[~hexiaoqiao] proposed a good solution to this issue.

Besides the RPC call queue, there is also a queue for processing block reports. 
Maybe we should take this into account and check the lease ID before putting a 
report into the block report queue; I think this queue is the main cause of the 
block report processing delay.


was (Author: starphin):
[~hexiaoqiao] proposed a good solution to this issue.

Besides the RPC call queue, there is also a queue for processing block reports. 
Maybe we should take this into account and check the lease ID before putting a 
report into the block report queue.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-05-20 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844056#comment-16844056
 ] 

star commented on HDFS-12914:
-

[~hexiaoqiao] proposed a good solution to this issue.

Besides the RPC call queue, there is also a queue for processing block reports. 
Maybe we should take this into account and check the lease ID before putting a 
report into the block report queue.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-05-20 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843829#comment-16843829
 ] 

star edited comment on HDFS-12914 at 5/20/19 10:08 AM:
---

[~smarella], how many DNs do you have? According to the limited logs, I think it 
is caused by the following case: a high CPU load on the SNN delayed the 
processing of the full block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Report| |
|...|request Lease|
|process Report|{color:#707070}_more than 5 minutes_{color}|
|...|{color:#d04437}process Report (failed){color}|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs 
unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]?

During that time, I think the SNN was processing block reports from other DNs. 
At 2019-05-16 15:31:11 the SNN began to process block reports from that DN. That 
is 6 minutes after the full block report lease ID was requested, beyond the 
default expiration of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT).

I don't know when the full block report lease ID was obtained from the server, 
since there is no INFO log about it. I guess it was about 5 minutes before the 
first failed report, say 15:26:29.
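
As an illustration only (not something proposed in this thread), the lease length referenced above can be raised where report processing is known to lag; the constant name is the one already quoted in the test snippet earlier in this archive, and conf is an assumed NameNode Configuration object:
{code:java}
// Raise the FBR lease length above the 5-minute default (300000 ms) so slow
// report processing does not invalidate the lease; the value is illustrative.
conf.setLong(DFSConfigKeys.DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS,
    10 * 60 * 1000L);
{code}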

 


was (Author: starphin):
[~smarella], how many DNs do you have? According to the limited logs, I think it 
is caused by the following case: a high CPU load on the SNN delayed the 
processing of the full block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Report| |
|...|request Lease|
|process Report|{color:#707070}_more than 5 minutes_{color}|
|...|process Report|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs 
unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]?

During that time, I think the SNN was processing block reports from other DNs. 
At 2019-05-16 15:31:11 the SNN began to process block reports from that DN. That 
is 6 minutes after the full block report lease ID was requested, beyond the 
default expiration of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT).

I don't know when the full block report lease ID was obtained from the server, 
since there is no INFO log about it. I guess it was about 5 minutes before the 
first failed report, say 15:26:29.

 

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.






[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-05-20 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843829#comment-16843829
 ] 

star edited comment on HDFS-12914 at 5/20/19 10:07 AM:
---

[~smarella], how many DNs do you have? According to the limited logs, I think it 
is caused by the following case: a high CPU load on the SNN delayed the 
processing of the full block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Request| |
|...|request Lease|
|process Request|{color:#707070}_more than 5 minutes_{color}|
|...|process Request|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs 
unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]?

During that time, I think the SNN was processing block reports from other DNs. 
At 2019-05-16 15:31:11 the SNN began to process block reports from that DN. That 
is 6 minutes after the full block report lease ID was requested, beyond the 
default expiration of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT).

I don't know when the full block report lease ID was obtained from the server, 
since there is no INFO log about it. I guess it was about 5 minutes before the 
first failed report, say 15:26:29.

 


was (Author: starphin):
[~smarella], how many DNs do you have? According to the limited logs, I think it 
is caused by the following case: a high load delayed the processing of the full 
block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Request| |
|...|request Lease|
|process Request|{color:#707070}_more than 5 minutes_{color}|
|...|process Request|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs 
unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]?

During that time, I think the SNN was processing block reports from other DNs. 
At 2019-05-16 15:31:11 the SNN began to process block reports from that DN. That 
is 6 minutes after the full block report lease ID was requested, beyond the 
default expiration of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT).

I don't know when the full block report lease ID was obtained from the server, 
since there is no INFO log about it. I guess it was about 5 minutes before the 
first failed report, say 15:26:29.

 

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-05-20 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843829#comment-16843829
 ] 

star edited comment on HDFS-12914 at 5/20/19 10:07 AM:
---

[~smarella] how many DNs do you have? According to the limited logs, I think 
it is caused by the following sequence. A high CPU load on the SNN delayed the 
processing of the full block report.

 
||DN1...||DN2||
|register|register|
|request Lease| |
|process Report| |
|...|request Lease|
|process Report|{color:#707070}_more than 5 minutes_{color}|
|...|process Report|

 

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs 
unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]?

During that time, I think the SNN was processing block reports from other DNs. 
Not until 2019-05-16 15:31:11 did the SNN begin to process block reports from 
that DN. That is about 6 minutes after the full block report lease id was 
requested, beyond the default expiry of 5 minutes 
(DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT). 

I don't know exactly when the full block report lease id was obtained from the 
server, since there is no info log about it. I guess it was about 5 minutes 
before the first failed report, say 15:26:29. 

 


was (Author: starphin):
[~smarella] how many DNs do you have? According to the limited logs, I think 
it is caused by the following sequence. A high CPU load on the SNN delayed the 
processing of the full block report.

 
||DN1...||DN2||
|register|register|
|request Lease| |
|process Request| |
|...|request Lease|
|process Request|{color:#707070}_more than 5 minutes_{color}|
|...|process Request|

 

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs 
unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]?

During that time, I think the SNN was processing block reports from other DNs. 
Not until 2019-05-16 15:31:11 did the SNN begin to process block reports from 
that DN. That is about 6 minutes after the full block report lease id was 
requested, beyond the default expiry of 5 minutes 
(DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT). 

I don't know exactly when the full block report lease id was obtained from the 
server, since there is no info log about it. I guess it was about 5 minutes 
before the first failed report, say 15:26:29. 

 

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-05-20 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843829#comment-16843829
 ] 

star commented on HDFS-12914:
-

[~smarella] how many DNs do you have? According to the limited logs, I think 
it is caused by the following sequence. A high load delayed the processing of 
the full block report.

 
||DN1...||DN2||
|register|register|
|request Lease| |
|process Request| |
|...|request Lease|
|process Request|{color:#707070}_more than 5 minutes_{color}|
|...|process Request|

 

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs 
unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]?

During that time, I think the SNN was processing block reports from other DNs. 
Not until 2019-05-16 15:31:11 did the SNN begin to process block reports from 
that DN. That is about 6 minutes after the full block report lease id was 
requested, beyond the default expiry of 5 minutes 
(DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT). 

I don't know exactly when the full block report lease id was obtained from the 
server, since there is no info log about it. I guess it was about 5 minutes 
before the first failed report, say 15:26:29. 

 

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14489) fix naming issue for ScmBlockLocationTestingClient

2019-05-14 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14489:

Component/s: (was: fs/ozone)
 ozone

> fix naming issue for ScmBlockLocationTestingClient
> --
>
> Key: HDFS-14489
> URL: https://issues.apache.org/jira/browse/HDFS-14489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14489.patch
>
>
> class 'ScmBlockLocationTestIngClient' does not follow camel-case naming. Rename 
> it to ScmBlockLocationTestingClient.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14489) fix naming issue for ScmBlockLocationTestingClient

2019-05-14 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14489:

Attachment: HDFS-14489.patch

> fix naming issue for ScmBlockLocationTestingClient
> --
>
> Key: HDFS-14489
> URL: https://issues.apache.org/jira/browse/HDFS-14489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs/ozone
>Affects Versions: HDFS-7240
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14489.patch
>
>
> class 'ScmBlockLocationTestIngClient' does not follow camel-case naming. Rename 
> it to ScmBlockLocationTestingClient.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14489) fix naming issue for ScmBlockLocationTestingClient

2019-05-14 Thread star (JIRA)
star created HDFS-14489:
---

 Summary: fix naming issue for ScmBlockLocationTestingClient
 Key: HDFS-14489
 URL: https://issues.apache.org/jira/browse/HDFS-14489
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs/ozone
Affects Versions: HDFS-7240
Reporter: star
Assignee: star


class 'ScmBlockLocationTestIngClient' does not follow camel-case naming. Rename 
it to ScmBlockLocationTestingClient.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-05-07 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834675#comment-16834675
 ] 

star commented on HDFS-14476:
-

HDFS-10477 may be a useful reference. This seems to be a ubiquitous problem with 
a large number of blocks.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
>
> When the directoryScanner has the results of differences between on-disk and 
> in-memory blocks, it will try to run `checkAndUpdate` to fix them. However, 
> `FsDatasetImpl.checkAndUpdate` is a synchronized call.
> As I have about 6 million blocks on every datanode, every 6-hour scan finds 
> about 25000 abnormal blocks to fix. That leads to the FsDatasetImpl lock being 
> held for a long time.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); that 
> will take about 250 seconds to finish. That means all reads and writes on that 
> datanode will be blocked for several minutes.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *To fix:*
> Just as invalidate commands from the namenode are processed with a batch size 
> of 1000, fixing these abnormal blocks should be handled in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.
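
A minimal sketch of the batching idea above, under the assumption that the per-block fix-up can be factored out; the names BatchedCheckAndUpdate, fixAbnormalBlock and diffRecords are hypothetical placeholders, not the real FsDatasetImpl API:

{code:java}
import java.util.List;

// Process scanner differences in batches, releasing the dataset lock and sleeping
// between batches so normal block reads/writes on the DataNode are not starved.
public class BatchedCheckAndUpdate {
  private static final int BATCH_SIZE = 1000;            // mirror the invalidate batch size
  private static final long SLEEP_BETWEEN_BATCHES_MS = 2000;

  public static void checkAndUpdateInBatches(Object datasetLock, List<Object> diffRecords)
      throws InterruptedException {
    for (int start = 0; start < diffRecords.size(); start += BATCH_SIZE) {
      int end = Math.min(start + BATCH_SIZE, diffRecords.size());
      synchronized (datasetLock) {                        // hold the lock for one batch only
        for (Object diff : diffRecords.subList(start, end)) {
          fixAbnormalBlock(diff);                         // placeholder per-block reconciliation
        }
      }
      Thread.sleep(SLEEP_BETWEEN_BATCHES_MS);             // let blocked readers/writers proceed
    }
  }

  private static void fixAbnormalBlock(Object diff) {
    // placeholder: reconcile the in-memory replica with what is actually on disk
  }
}
{code}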



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes

2019-05-02 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795553#comment-16795553
 ] 

star edited comment on HDFS-14349 at 5/2/19 3:41 PM:
-

It has been verified that edit log rolls will be triggered by multiple SNNs.

I've proposed an improvement issue, HDFS-14378, to put things right once and for 
all. The main idea is to have the ANN roll its own edit log and download the 
fsimage from one randomly selected SNN. The SNN will just do checkpointing and 
tail edit logs. You are welcome to review it or contribute.


was (Author: starphin):
Yes, it seems that a normal edit log roll will be triggered by multiple SNNs. I am 
running unit tests to verify this behavior. 

> Edit log may be rolled more frequently than necessary with multiple Standby 
> nodes
> -
>
> Key: HDFS-14349
> URL: https://issues.apache.org/jira/browse/HDFS-14349
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs, qjm
>Reporter: Erik Krogen
>Assignee: Ekanth Sethuramalingam
>Priority: Major
>
> When HDFS-14317 was fixed, we tackled the problem that in a cluster with 
> in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the 
> edit logs, which can eventually cause data loss.
> Unfortunately, in the process, it was made so that if there are multiple 
> Standby NameNodes, they will all roll the edit logs at their specified 
> frequency, so the edit log will be rolled X times more frequently than they 
> should be (where X is the number of Standby NNs). This is not as bad as the 
> original bug since rolling frequently does not affect correctness or data 
> availability, but may degrade performance by creating more edit log segments 
> than necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13189) Standby NameNode should roll active edit log when checkpointing

2019-05-02 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831694#comment-16831694
 ] 

star commented on HDFS-13189:
-

I've proposed an improvement issue, HDFS-14378, to put things right once and for 
all. The main idea is to have the ANN roll its own edit log and download the 
fsimage from one randomly selected SNN. The SNN will just do checkpointing and 
tail edit logs. You are welcome to review it or contribute.

> Standby NameNode should roll active edit log when checkpointing
> ---
>
> Key: HDFS-13189
> URL: https://issues.apache.org/jira/browse/HDFS-13189
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chao Sun
>Priority: Minor
>
> When the SBN is doing checkpointing, it will hold the {{cpLock}}. In the 
> current implementation of edit log tailer thread, it will first check and 
> roll active edit log, and then tail and apply edits. In the case of 
> checkpointing, it will be blocked on the {{cpLock}} and will not roll the 
> edit log.
> It seems there is no dependency between the edit log roll and tailing edits, 
> so a better approach may be to do these in separate threads. This will be helpful 
> for people who use the observer feature without in-progress edit log tailing. 
> An alternative is to configure 
> {{dfs.namenode.edit.log.autoroll.multiplier.threshold}} and 
> {{dfs.namenode.edit.log.autoroll.check.interval.ms}} to let ANN roll its own 
> log more frequently in case SBN is stuck on the lock.
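
For reference, the alternative in the last paragraph is plain configuration. A sketch, assuming the two property names quoted above; the values are examples only, not recommendations (check hdfs-default.xml for the shipped defaults):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class EditLogAutoRollExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Roll once the ANN's in-progress segment grows past this multiple of the
    // checkpoint transaction count (example value).
    conf.setFloat("dfs.namenode.edit.log.autoroll.multiplier.threshold", 2.0f);
    // How often the ANN checks whether it should roll its own log (example value).
    conf.setInt("dfs.namenode.edit.log.autoroll.check.interval.ms", 5 * 60 * 1000);
  }
}
{code}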



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-25 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826604#comment-16826604
 ] 

star edited comment on HDFS-14437 at 4/26/19 3:21 AM:
--

As [~kihwal] said in HDFS-10943, '{{rollEditLog()}} does not and cannot solely 
depend on {{FSEditLog}} synchronization', so it is not enough to reproduce this 
issue at the FSEditLog level. We should reproduce it at the FSNamesystem level or 
even the RPC level. I've tried that but failed. As far as I know, all methods of 
FSEditLog are called from FSNamesystem while holding either the read lock or the 
write lock. 


was (Author: starphin):
As [~kihwal] said in HDFS-10943, '{{rollEditLog()}} does not and cannot solely 
depend on {{FSEditLog}} synchronization', so it is not enough to reproduce such an 
issue at the FSEditLog level. We should reproduce it at the FSNamesystem level or 
even the RPC level. As far as I know, all methods of FSEditLog are called from 
FSNamesystem while holding either the read lock or the write lock.

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have traced the process of writing and flushing the EditLog and some of the 
> important functions. I found that in the FSEditLog class, the close() function 
> calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we hold the object lock in close(), when the waitForSyncToFinish() method 
> returns it means all logSync work has finished and all data in bufReady has been 
> flushed out. And since the current thread holds the lock on this object, no other 
> thread can acquire it while endCurrentLogSegment() is called, so no thread can 
> write a new edit log entry into currentBuf.
> But if we do not call waitForSyncToFinish() before endCurrentLogSegment(), an 
> auto-scheduled logSync()'s flush may still be in progress, because that flush step 
> does not need synchronization, as mentioned in the comment of the logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new 

[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-25 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826604#comment-16826604
 ] 

star commented on HDFS-14437:
-

As [~kihwal] said in HDFS-10943, '{{rollEditLog()}} does not and cannot solely 
depend on {{FSEditLog}} synchronization', so it is not enough to reproduce such an 
issue at the FSEditLog level. We should reproduce it at the FSNamesystem level or 
even the RPC level. As far as I know, all methods of FSEditLog are called from 
FSNamesystem while holding either the read lock or the write lock.

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have traced the process of writing and flushing the EditLog and some of the 
> important functions. I found that in the FSEditLog class, the close() function 
> calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we hold the object lock in close(), when the waitForSyncToFinish() method 
> returns it means all logSync work has finished and all data in bufReady has been 
> flushed out. And since the current thread holds the lock on this object, no other 
> thread can acquire it while endCurrentLogSegment() is called, so no thread can 
> write a new edit log entry into currentBuf.
> But if we do not call waitForSyncToFinish() before endCurrentLogSegment(), an 
> auto-scheduled logSync()'s flush may still be in progress, because that flush step 
> does not need synchronization, as mentioned in the comment of the logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> 

[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-23 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823750#comment-16823750
 ] 

star commented on HDFS-14437:
-

OK, I will take a look at it later.

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have traced the process of writing and flushing the EditLog and some of the 
> important functions. I found that in the FSEditLog class, the close() function 
> calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we hold the object lock in close(), when the waitForSyncToFinish() method 
> returns it means all logSync work has finished and all data in bufReady has been 
> flushed out. And since the current thread holds the lock on this object, no other 
> thread can acquire it while endCurrentLogSegment() is called, so no thread can 
> write a new edit log entry into currentBuf.
> But if we do not call waitForSyncToFinish() before endCurrentLogSegment(), an 
> auto-scheduled logSync()'s flush may still be in progress, because that flush step 
> does not need synchronization, as mentioned in the comment of the logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> IOUtils.cleanup(LOG, journalSet);
>   }
>   terminate(1, msg);
> }
>   } finally {
> // Prevent RuntimeException from blocking other log edit write 
> doneWithAutoSyncScheduling();
>   }
>   //editLogStream may become null,
>   //so store a local variable for flush.
>   logStream = 

[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-23 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823714#comment-16823714
 ] 

star edited comment on HDFS-14437 at 4/23/19 6:55 AM:
--

[~angerszhuuu], I think I've got that.

 
||Thread0||thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#{color:#33}logSyncAll{color}| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logEdit*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|
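
To make the interleaving above concrete, here is a toy model of the double-buffer pattern, not the real FSEditLog/EditsDoubleBuffer code: the flush runs outside the object lock, so an edit appended by Thread2 after the buffers were swapped, but before the segment is finalized, leaves bufCurrent non-empty, as in the last row of the table.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model only. logEdit() plays Thread2, logSync() plays Thread1, and
// endCurrentLogSegment() plays Thread0 in the table above.
public class DoubleBufferSketch {
  private final List<String> bufCurrent = new ArrayList<>();
  private final List<String> bufReady = new ArrayList<>();

  public synchronized void logEdit(String op) {
    bufCurrent.add(op);                        // Thread2 appends while the flush is running
  }

  public void logSync() {
    synchronized (this) {                      // step 1: swap buffers under the lock
      bufReady.addAll(bufCurrent);
      bufCurrent.clear();
    }
    flushUnlocked();                           // step 2: flush WITHOUT the lock
    synchronized (this) {                      // step 3: sync finished
      bufReady.clear();
    }
  }

  public synchronized void endCurrentLogSegment() {
    if (!bufCurrent.isEmpty()) {
      // the "finalize error" row: bufCurrent was expected to be empty
      throw new IllegalStateException("bufCurrent not empty: " + bufCurrent);
    }
  }

  private void flushUnlocked() {
    // pretend to write bufReady out to the journals
  }
}
{code}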

 

 


was (Author: starphin):
[~angerszhuuu], I think I've got that.

 
||Thread0||thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#{color:#33}logSyncAll{color}| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|

 

 

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have traced the process of writing and flushing the EditLog and some of the 
> important functions. I found that in the FSEditLog class, the close() function 
> calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we hold the object lock in close(), when the waitForSyncToFinish() method 
> returns it means all logSync work has finished and all data in bufReady has been 
> flushed out. And since the current thread holds the lock on this object, no other 
> thread can acquire it while endCurrentLogSegment() is called, so no thread can 
> write a new edit log entry into currentBuf.
> But if we do not call waitForSyncToFinish() before endCurrentLogSegment(), an 
> auto-scheduled logSync()'s flush may still be in progress, because that flush step 
> does not need synchronization, as mentioned in the comment of the logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } 

[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-23 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823742#comment-16823742
 ] 

star commented on HDFS-14437:
-

[~angerszhuuu] Right, the lock state in the table is fixed.
{quote}But logAppend also needs the lock.
{quote}

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have traced the process of writing and flushing the EditLog and some of the 
> important functions. I found that in the FSEditLog class, the close() function 
> calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we hold the object lock in close(), when the waitForSyncToFinish() method 
> returns it means all logSync work has finished and all data in bufReady has been 
> flushed out. And since the current thread holds the lock on this object, no other 
> thread can acquire it while endCurrentLogSegment() is called, so no thread can 
> write a new edit log entry into currentBuf.
> But if we do not call waitForSyncToFinish() before endCurrentLogSegment(), an 
> auto-scheduled logSync()'s flush may still be in progress, because that flush step 
> does not need synchronization, as mentioned in the comment of the logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> IOUtils.cleanup(LOG, journalSet);
>   }
>   terminate(1, msg);
> }
>   } finally {
> // Prevent RuntimeException from blocking other log edit write 
> doneWithAutoSyncScheduling();
>   }
>   //editLogStream may become null,
>   //so 

[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-23 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823714#comment-16823714
 ] 

star edited comment on HDFS-14437 at 4/23/19 6:38 AM:
--

[~angerszhuuu], I think I've got that.

 
||Thread0||thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#{color:#33}logSyncAll{color}| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|

 

 


was (Author: starphin):
[~angerszhuuu], I think I've got that. 

 
||Thread0||thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#{color:#33}logSyncAll{color}| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|false|false|
|{color:#d04437}finalize error{color}| | |true|false|

 

 

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have traced the process of writing and flushing the EditLog and some of the 
> important functions. I found that in the FSEditLog class, the close() function 
> calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we hold the object lock in close(), when the waitForSyncToFinish() method 
> returns it means all logSync work has finished and all data in bufReady has been 
> flushed out. And since the current thread holds the lock on this object, no other 
> thread can acquire it while endCurrentLogSegment() is called, so no thread can 
> write a new edit log entry into currentBuf.
> But if we do not call waitForSyncToFinish() before endCurrentLogSegment(), an 
> auto-scheduled logSync()'s flush may still be in progress, because that flush step 
> does not need synchronization, as mentioned in the comment of the logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
>   

[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-23 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823726#comment-16823726
 ] 

star commented on HDFS-14437:
-

{{locked}} is the object lock of FSEditLog.

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have traced the process of writing and flushing the EditLog and some of the 
> important functions. I found that in the FSEditLog class, the close() function 
> calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we hold the object lock in close(), when the waitForSyncToFinish() method 
> returns it means all logSync work has finished and all data in bufReady has been 
> flushed out. And since the current thread holds the lock on this object, no other 
> thread can acquire it while endCurrentLogSegment() is called, so no thread can 
> write a new edit log entry into currentBuf.
> But if we do not call waitForSyncToFinish() before endCurrentLogSegment(), an 
> auto-scheduled logSync()'s flush may still be in progress, because that flush step 
> does not need synchronization, as mentioned in the comment of the logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> IOUtils.cleanup(LOG, journalSet);
>   }
>   terminate(1, msg);
> }
>   } finally {
> // Prevent RuntimeException from blocking other log edit write 
> doneWithAutoSyncScheduling();
>   }
>   //editLogStream may become null,
>   //so store a local variable for flush.
>   logStream 

[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-23 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823726#comment-16823726
 ] 

star edited comment on HDFS-14437 at 4/23/19 6:24 AM:
--

{{locked}} is the state of the object lock of FSEditLog.


was (Author: starphin):
{{locked}} is the object lock of FSEditLog.

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have traced the process of writing and flushing the EditLog and some of the 
> important functions. I found that in the FSEditLog class, the close() function 
> calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we hold the object lock in close(), when the waitForSyncToFinish() method 
> returns it means all logSync work has finished and all data in bufReady has been 
> flushed out. And since the current thread holds the lock on this object, no other 
> thread can acquire it while endCurrentLogSegment() is called, so no thread can 
> write a new edit log entry into currentBuf.
> But if we do not call waitForSyncToFinish() before endCurrentLogSegment(), an 
> auto-scheduled logSync()'s flush may still be in progress, because that flush step 
> does not need synchronization, as mentioned in the comment of the logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> IOUtils.cleanup(LOG, journalSet);
>   }
>   terminate(1, msg);
> }
>   } finally {
> // Prevent RuntimeException from blocking other log edit write 
> 

[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-23 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823714#comment-16823714
 ] 

star commented on HDFS-14437:
-

[~angerszhuuu], I think I've got that. 

 
||Thread0||thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#{color:#33}logSyncAll{color}| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|false|false|
|{color:#d04437}finalize error{color}| | |true|false|

 

 

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have traced the process of writing and flushing the EditLog and some of the 
> important functions. I found that in the FSEditLog class, the close() function 
> calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we hold the object lock in close(), when the waitForSyncToFinish() method 
> returns it means all logSync work has finished and all data in bufReady has been 
> flushed out. And since the current thread holds the lock on this object, no other 
> thread can acquire it while endCurrentLogSegment() is called, so no thread can 
> write a new edit log entry into currentBuf.
> But if we do not call waitForSyncToFinish() before endCurrentLogSegment(), an 
> auto-scheduled logSync()'s flush may still be in progress, because that flush step 
> does not need synchronization, as mentioned in the comment of the logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> IOUtils.cleanup(LOG, journalSet);
>   }
> 

[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-19 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822296#comment-16822296
 ] 

star commented on HDFS-14378:
-

The failed tests pass locally.

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Minor
>  Labels: patch
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, 
> HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> a exception occurred. It takes care with a scenario  that SNN will not upload 
> fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Futher more, edit log may be rolled more frequently than necessary with 
> multiple Standby name nodes, HDFS-14349. (I'm not so sure about this, will 
> verify by unit tests or any one could point it out.)
>       Above all, I‘m wondering if we could make it simple with following 
> changes:
>  * There are only two roles:ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select a SNN to download checkpoint.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14439) FileNotFoundException thrown in TestEditLogRace when just add a setPermission operation

2019-04-19 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star resolved HDFS-14439.
-
Resolution: Not A Problem

> FileNotFoundException thrown in TestEditLogRace when just add a setPermission 
> operation
> ---
>
> Key: HDFS-14439
> URL: https://issues.apache.org/jira/browse/HDFS-14439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Reporter: star
>Assignee: star
>Priority: Minor
>
> In TestEditLogRace.Transactions#run, add following code between mkdirs and 
> delete.
>  
> {panel}
> fs.setPermission(dirnamePath, p);
> {panel}
> It will be like. 
> {panel}
> fs.mkdirs(dirnamePath);
> *fs.setPermission(dirnamePath, p);*
> fs.delete(dirnamePath, true);{panel}
> then run test TestEditLogRace#testEditLogRolling, it will throw 
> FileNotFoundException.
> {code:java}
> java.io.FileNotFoundException: cannot find /thr-291-dir-2
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolveLastINode(FSDirectory.java:1524)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetPermission(FSDirAttrOp.java:264)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:656)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:287)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:182)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:159)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.verifyEditLogs(TestEditLogRace.java:293)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.testEditLogRolling(TestEditLogRace.java:258)
> {code}
> It happens when verifing edit logs and finds the target dir does not exits. 
> Any one could help figure out whether it makes sense or what makes it 
> behaving that way?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14439) FileNotFoundException thrown in TestEditLogRace when just add a setPermission operation

2019-04-19 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822295#comment-16822295
 ] 

star commented on HDFS-14439:
-

Not an issue. verifyEditLogs loads a single edit log segment, which is why it throws 
this exception: the mkdir op lands in the previous segment, while the segment being 
verified only contains the setPermission op, so the target directory is not present 
in the replayed namespace.
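
To illustrate, here is a minimal, self-contained sketch (plain Java, not HDFS code; 
the class, record and op names are invented for illustration) of why replaying only 
the latest segment fails while replaying both segments in order succeeds:
{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SingleSegmentReplay {
  record Op(String type, String path) {}

  // Apply ops to a toy namespace; a SET_PERMISSIONS on a missing path fails,
  // mirroring the FileNotFoundException thrown by FSEditLogLoader.
  static void replay(Set<String> namespace, List<Op> segment) {
    for (Op op : segment) {
      switch (op.type()) {
        case "MKDIR" -> namespace.add(op.path());
        case "DELETE" -> namespace.remove(op.path());
        case "SET_PERMISSIONS" -> {
          if (!namespace.contains(op.path())) {
            throw new IllegalStateException("cannot find " + op.path());
          }
        }
      }
    }
  }

  public static void main(String[] args) {
    List<Op> segment1 = List.of(new Op("MKDIR", "/thr-291-dir-2"));
    List<Op> segment2 = List.of(new Op("SET_PERMISSIONS", "/thr-291-dir-2"),
                                new Op("DELETE", "/thr-291-dir-2"));

    Set<String> ns = new HashSet<>();
    replay(ns, segment1);               // ok: the mkdir is recorded
    replay(ns, segment2);               // ok: both segments replayed in order

    replay(new HashSet<>(), segment2);  // throws: the mkdir lives in segment1
  }
}
{code}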

> FileNotFoundException thrown in TestEditLogRace when just add a setPermission 
> operation
> ---
>
> Key: HDFS-14439
> URL: https://issues.apache.org/jira/browse/HDFS-14439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Reporter: star
>Assignee: star
>Priority: Minor
>
> In TestEditLogRace.Transactions#run, add following code between mkdirs and 
> delete.
>  
> {panel}
> fs.setPermission(dirnamePath, p);
> {panel}
> It will be like. 
> {panel}
> fs.mkdirs(dirnamePath);
> *fs.setPermission(dirnamePath, p);*
> fs.delete(dirnamePath, true);{panel}
> then run test TestEditLogRace#testEditLogRolling, it will throw 
> FileNotFoundException.
> {code:java}
> java.io.FileNotFoundException: cannot find /thr-291-dir-2
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolveLastINode(FSDirectory.java:1524)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetPermission(FSDirAttrOp.java:264)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:656)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:287)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:182)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:159)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.verifyEditLogs(TestEditLogRace.java:293)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.testEditLogRolling(TestEditLogRace.java:258)
> {code}
> It happens when verifing edit logs and finds the target dir does not exits. 
> Any one could help figure out whether it makes sense or what makes it 
> behaving that way?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-19 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

Attachment: HDFS-14378-trunk.006.patch

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Minor
>  Labels: patch
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, 
> HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> a exception occurred. It takes care with a scenario  that SNN will not upload 
> fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Futher more, edit log may be rolled more frequently than necessary with 
> multiple Standby name nodes, HDFS-14349. (I'm not so sure about this, will 
> verify by unit tests or any one could point it out.)
>       Above all, I‘m wondering if we could make it simple with following 
> changes:
>  * There are only two roles:ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select a SNN to download checkpoint.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-19 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821862#comment-16821862
 ] 

star commented on HDFS-14437:
-

[~angerszhuuu], it would be better to upload your unit test code so that others can 
reproduce the issue.

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sort the process of write and flush EditLog and some important 
> function, I found the in the class  FSEditLog class, the close() function 
> will call such process like below:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> since we have gain the object lock in the function close(), so when  
> waitForSyncToFish() method return, it mean all logSync job has done and all 
> data in bufReady has been flushed out, and since current thread has the lock 
> of this object, when call endCurrentLogSegment(), no other thread will gain 
> the lock so they can't write new editlog into currentBuf.
> But when we don't call waitForSyncToFish() before endCurrentLogSegment(), 
> there may be some autoScheduled logSync()'s flush process is doing, since 
> this process don't need
> synchronization since it has mention in the comment of logSync() method :
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> IOUtils.cleanup(LOG, journalSet);
>   }
>   terminate(1, msg);
> }
>   } finally {
> // Prevent RuntimeException from blocking other log edit write 
> doneWithAutoSyncScheduling();
>   }
>   //editLogStream may become null,
>   //so 

[jira] [Assigned] (HDFS-14439) FileNotFoundException thrown in TestEditLogRace when just add a setPermission operation

2019-04-18 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star reassigned HDFS-14439:
---

Assignee: star

> FileNotFoundException thrown in TestEditLogRace when just add a setPermission 
> operation
> ---
>
> Key: HDFS-14439
> URL: https://issues.apache.org/jira/browse/HDFS-14439
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Reporter: star
>Assignee: star
>Priority: Minor
>
> In TestEditLogRace.Transactions#run, add following code between mkdirs and 
> delete.
>  
> {panel}
> fs.setPermission(dirnamePath, p);
> {panel}
> It will be like. 
> {panel}
> fs.mkdirs(dirnamePath);
> *fs.setPermission(dirnamePath, p);*
> fs.delete(dirnamePath, true);{panel}
> then run test TestEditLogRace#testEditLogRolling, it will throw 
> FileNotFoundException.
> {code:java}
> java.io.FileNotFoundException: cannot find /thr-291-dir-2
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolveLastINode(FSDirectory.java:1524)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetPermission(FSDirAttrOp.java:264)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:656)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:287)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:182)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:159)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.verifyEditLogs(TestEditLogRace.java:293)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.testEditLogRolling(TestEditLogRace.java:258)
> {code}
> It happens when verifing edit logs and finds the target dir does not exits. 
> Any one could help figure out whether it makes sense or what makes it 
> behaving that way?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-18 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

Attachment: HDFS-14378-trunk.005.patch

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Minor
>  Labels: patch
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, 
> HDFS-14378-trunk.005.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> a exception occurred. It takes care with a scenario  that SNN will not upload 
> fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Futher more, edit log may be rolled more frequently than necessary with 
> multiple Standby name nodes, HDFS-14349. (I'm not so sure about this, will 
> verify by unit tests or any one could point it out.)
>       Above all, I‘m wondering if we could make it simple with following 
> changes:
>  * There are only two roles:ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select a SNN to download checkpoint.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14439) FileNotFoundException thrown in TestEditLogRace when just add a setPermission operation

2019-04-18 Thread star (JIRA)
star created HDFS-14439:
---

 Summary: FileNotFoundException thrown in TestEditLogRace when just 
add a setPermission operation
 Key: HDFS-14439
 URL: https://issues.apache.org/jira/browse/HDFS-14439
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs
Reporter: star


In TestEditLogRace.Transactions#run, add the following code between mkdirs and 
delete.

 
{panel}
fs.setPermission(dirnamePath, p);
{panel}
It will then look like:
{panel}
fs.mkdirs(dirnamePath);
*fs.setPermission(dirnamePath, p);*
fs.delete(dirnamePath, true);{panel}
Then run the test TestEditLogRace#testEditLogRolling; it will throw a 
FileNotFoundException.
{code:java}
java.io.FileNotFoundException: cannot find /thr-291-dir-2

at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolveLastINode(FSDirectory.java:1524)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetPermission(FSDirAttrOp.java:264)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:656)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:287)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:182)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:159)
at 
org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.verifyEditLogs(TestEditLogRace.java:293)
at 
org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.testEditLogRolling(TestEditLogRace.java:258)
{code}
It happens when verifying edit logs and finding that the target dir does not exist. 
Could anyone help figure out whether this makes sense or what makes it behave that 
way?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-18 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821092#comment-16821092
 ] 

star commented on HDFS-14437:
-

[~angerszhuuu], if fsLock is ignored, the scenario He Xiaoqiao listed could occur, 
because flush does not need the object lock of FSEditLog.

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sort the process of write and flush EditLog and some important 
> function, I found the in the class  FSEditLog class, the close() function 
> will call such process like below:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> since we have gain the object lock in the function close(), so when  
> waitForSyncToFish() method return, it mean all logSync job has done and all 
> data in bufReady has been flushed out, and since current thread has the lock 
> of this object, when call endCurrentLogSegment(), no other thread will gain 
> the lock so they can't write new editlog into currentBuf.
> But when we don't call waitForSyncToFish() before endCurrentLogSegment(), 
> there may be some autoScheduled logSync()'s flush process is doing, since 
> this process don't need
> synchronization since it has mention in the comment of logSync() method :
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> IOUtils.cleanup(LOG, journalSet);
>   }
>   terminate(1, msg);
> }
>   } finally {
> // Prevent RuntimeException from blocking other log edit write 
> doneWithAutoSyncScheduling();
>   }
>   

[jira] [Commented] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.

2019-04-18 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821045#comment-16821045
 ] 

star commented on HDFS-14436:
-

Maybe I should move it to the hadoop-common project.

> Configuration#getTimeDuration is not consistent between default value and 
> manual settings.
> --
>
> Key: HDFS-14436
> URL: https://issues.apache.org/jira/browse/HDFS-14436
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14436.001.patch
>
>
> When call getTimeDuration like this:
> {quote}conf.getTimeDuration("nn.interval", 10, 
> TimeUnit.{color:#9876aa}SECONDS{color}{color:#cc7832}, 
> {color}TimeUnit.{color:#9876aa}MILLISECONDS{color}){color:#cc7832};{color}
> {quote}
>  {color:#33}If "nn.interval" is set manually or configured in xml file, 
> 1 will be retrurned.{color}
>  If not, 10 will be returned while 1 is expected. 
>  The logic is not consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.

2019-04-18 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821044#comment-16821044
 ] 

star commented on HDFS-14436:
-

It is a little weird to return only the raw literal number when both a default time 
unit and a literal number are provided. It may mislead us into unexpected behavior 
if we only take the method's name and parameters into account. Though changing the 
long 10 to the string '10s' returns the expected result, I think the call should 
return the same result when the long 10 and the default time unit SECONDS are given.
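
For reference, a small standalone sketch of the inconsistency (assuming the 
four-argument getTimeDuration overload used in this report and the pre-fix behaviour 
described above; the property name is just an example):
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class TimeDurationDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Property not set: the default literal 10 comes back unconverted.
    long unset = conf.getTimeDuration("nn.interval", 10,
        TimeUnit.SECONDS, TimeUnit.MILLISECONDS);   // 10

    // Property set to the equivalent "10s": the unit conversion is applied.
    conf.set("nn.interval", "10s");
    long set = conf.getTimeDuration("nn.interval", 10,
        TimeUnit.SECONDS, TimeUnit.MILLISECONDS);   // 10000

    System.out.println(unset + " vs " + set);
  }
}
{code}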

> Configuration#getTimeDuration is not consistent between default value and 
> manual settings.
> --
>
> Key: HDFS-14436
> URL: https://issues.apache.org/jira/browse/HDFS-14436
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14436.001.patch
>
>
> When call getTimeDuration like this:
> {quote}conf.getTimeDuration("nn.interval", 10, 
> TimeUnit.{color:#9876aa}SECONDS{color}{color:#cc7832}, 
> {color}TimeUnit.{color:#9876aa}MILLISECONDS{color}){color:#cc7832};{color}
> {quote}
>  {color:#33}If "nn.interval" is set manually or configured in xml file, 
> 1 will be retrurned.{color}
>  If not, 10 will be returned while 1 is expected. 
>  The logic is not consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.

2019-04-18 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14436:

Attachment: HDFS-14436.001.patch

> Configuration#getTimeDuration is not consistent between default value and 
> manual settings.
> --
>
> Key: HDFS-14436
> URL: https://issues.apache.org/jira/browse/HDFS-14436
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14436.001.patch
>
>
> When call getTimeDuration like this:
> {quote}conf.getTimeDuration("nn.interval", 10, 
> TimeUnit.{color:#9876aa}SECONDS{color}{color:#cc7832}, 
> {color}TimeUnit.{color:#9876aa}MILLISECONDS{color}){color:#cc7832};{color}
> {quote}
>  {color:#33}If "nn.interval" is set manually or configured in xml file, 
> 1 will be retrurned.{color}
>  If not, 10 will be returned while 1 is expected. 
>  The logic is not consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-18 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820950#comment-16820950
 ] 

star edited comment on HDFS-14437 at 4/18/19 10:43 AM:
---

[~hexiaoqiao], agreed. There may be more edit log records written into bufCurrent 
before the flush.
{quote}In my opinion, The following code segment in FSEditLog#logSync which is 
out of {{synchronized}} is core reason.
{quote}
But the interleaving in the table you presented will hardly ever occur, because 
rollEditLog and FSEditLog#logEdit are almost always called with FSNamesystem.fsLock 
held, e.g. by mkdir and setAcl. I'm still working on finding that racing case.


was (Author: starphin):
[~hexiaoqiao], agreed.
{quote}In my opinion, The following code segment in FSEditLog#logSync which is 
out of {{synchronized}} is core reason.
{code:java}
try {
  if (logStream != null) {
    logStream.flush();
  }
}
{code}
{quote}

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sort the process of write and flush EditLog and some important 
> function, I found the in the class  FSEditLog class, the close() function 
> will call such process like below:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> since we have gain the object lock in the function close(), so when  
> waitForSyncToFish() method return, it mean all logSync job has done and all 
> data in bufReady has been flushed out, and since current thread has the lock 
> of this object, when call endCurrentLogSegment(), no other thread will gain 
> the lock so they can't write new editlog into currentBuf.
> But when we don't call waitForSyncToFish() before endCurrentLogSegment(), 
> there may be some autoScheduled logSync()'s flush process is doing, since 
> this process don't need
> synchronization since it has mention in the comment of logSync() method :
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   

[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-18 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820950#comment-16820950
 ] 

star commented on HDFS-14437:
-

[~hexiaoqiao], agreed.
{quote}In my opinion, The following code segment in FSEditLog#logSync which is 
out of {{synchronized}} is core reason.
{code:java}
try {
  if (logStream != null) {
    logStream.flush();
  }
}
{code}
{quote}

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sort the process of write and flush EditLog and some important 
> function, I found the in the class  FSEditLog class, the close() function 
> will call such process like below:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> since we have gain the object lock in the function close(), so when  
> waitForSyncToFish() method return, it mean all logSync job has done and all 
> data in bufReady has been flushed out, and since current thread has the lock 
> of this object, when call endCurrentLogSegment(), no other thread will gain 
> the lock so they can't write new editlog into currentBuf.
> But when we don't call waitForSyncToFish() before endCurrentLogSegment(), 
> there may be some autoScheduled logSync()'s flush process is doing, since 
> this process don't need
> synchronization since it has mention in the comment of logSync() method :
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> IOUtils.cleanup(LOG, journalSet);
>   }
>   terminate(1, msg);
> }
>   } finally {
> // Prevent 

[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-18 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820886#comment-16820886
 ] 

star edited comment on HDFS-14437 at 4/18/19 9:29 AM:
--

Thanks [~hexiaoqiao]. The table really makes the issue clearer.

I've been tracking FSNamesystem and FSEditLog and found that FSEditLog#rollEditLog 
is always called with fsLock held, e.g. from rollEditLog, startRollingUpgrade and 
finalizeRollingUpgrade in FSNamesystem (abridged):
{code:java}
CheckpointSignature rollEditLog() throws IOException {
  writeLock();
  try {
    result = getFSImage().rollEditLog(getEffectiveLayoutVersion());
  } finally {
    writeUnlock(operationName);
  }
  return result;
}
{code}
FSEditLog#logSync is also called with fsLock held in most file/dir related 
operations, such as mkdir and delete. But permission related operations call it 
outside fsLock, e.g. setPermission, setOwner and setAcl (abridged):
{code:java}
void setPermission(String src, FsPermission permission) throws IOException {
  writeLock();
  try {
    auditStat = FSDirAttrOp.setPermission(dir, pc, src, permission);
  } catch (AccessControlException e) {
    // ...
  } finally {
    writeUnlock(operationName);
  }
  getEditLog().logSync();  // called after the namesystem write lock is released
}
{code}
Theoretically, a race could cause this issue. The sequence of operations in the 
table below would reproduce it, though it seems very hard to hit.
||Time||Thread1(rollEditlog)||Thread2(setPermission)||
|t0| |writeUnlock|
|t1|logEdit|-|
|t2|logSyncAll|-|
|t3|-|logSync|
|t4|finalize|-|
|t5|exception and terminate| |
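
To make the hazard concrete, here is a toy, self-contained model (plain Java, not 
the real FSEditLog; all names are invented) of the double-buffer contract quoted 
from the logSync() javadoc: the flush in step 2 runs without the lock, so another 
thread can add an edit to bufCurrent while a sync is in flight, and a finalize step 
that only takes the lock, without draining pending edits and waiting for the 
in-flight sync, can observe a non-empty bufCurrent:
{code:java}
import java.util.ArrayList;
import java.util.List;

public class DoubleBufferRace {
  private List<String> bufCurrent = new ArrayList<>();
  private List<String> bufReady = new ArrayList<>();
  private volatile boolean isSyncRunning = false;

  public synchronized void logEdit(String op) {
    bufCurrent.add(op);
  }

  public void logSync() throws InterruptedException {
    List<String> toFlush;
    synchronized (this) {              // step 1: swap buffers under the lock
      while (isSyncRunning) {
        wait();
      }
      List<String> tmp = bufReady;
      bufReady = bufCurrent;
      bufCurrent = tmp;
      isSyncRunning = true;
      toFlush = bufReady;
    }
    Thread.sleep(50);                  // step 2: slow flush, no lock held
    toFlush.clear();
    synchronized (this) {              // step 3: reset the flag and notify
      isSyncRunning = false;
      notifyAll();
    }
  }

  // Checks bufCurrent while holding only the lock; it does not drain pending
  // edits or wait for an in-flight sync, so it can see a non-empty buffer.
  public synchronized void endSegmentWithoutWaiting() {
    if (!bufCurrent.isEmpty()) {
      throw new IllegalStateException("expected empty bufCurrent, found " + bufCurrent);
    }
  }

  public static void main(String[] args) throws Exception {
    DoubleBufferRace log = new DoubleBufferRace();
    log.logEdit("OP_MKDIR");

    Thread syncer = new Thread(() -> {
      try {
        log.logSync();
      } catch (InterruptedException ignored) {
      }
    });
    syncer.start();

    Thread.sleep(10);                  // let the syncer reach step 2
    log.logEdit("OP_SET_PERMISSIONS"); // lands in bufCurrent during the flush
    log.endSegmentWithoutWaiting();    // throws: bufCurrent is not empty
  }
}
{code}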

 

 


was (Author: starphin):
Thanks [~hexiaoqiao] . The table really make the issue clearer.

I've tracking FSNamesystem and FSEditLog, found that FSEditLog#rollEditlog 
always being called holding fsLock, such as {color:#33}rollEditLog{color}, 
startRollingUpgrade,  finalizeRollingUpgrade in FSNamesystem.
{quote}CheckpointSignature {color:#ffc66d}rollEditLog{color}() 
{color:#cc7832}throws {color}IOException {
   writeLock();{color:#cc7832} {color}

{color:#cc7832}  try {color}{
     result = 
getFSImage().rollEditLog(getEffectiveLayoutVersion()){color:#cc7832};{color}
 

  } {color:#cc7832}finally {color}{
     writeUnlock(operationName){color:#cc7832};{color}  

  }{color:#cc7832}   {color}

{color:#cc7832}   return {color}result{color:#cc7832};{color}

}
{quote}
FSEditLog#logSync is also called holding fsLock in most file/dir related 
operations, such as mkdir, delete. But permission related operation is called 
out of fsLock, such as setPermission, setOwner, setAcl.
{quote}{color:#cc7832}void {color}{color:#ffc66d}setPermission{color}(String 
src{color:#cc7832}, {color}FsPermission permission) {color:#cc7832}throws 
{color}IOException {
    writeLock();{color:#cc7832} {color}

{color:#cc7832}       try {color}{
   auditStat = FSDirAttrOp.setPermission(dir{color:#cc7832}, 
{color}pc{color:#cc7832}, {color}src{color:#cc7832}, 
{color}permission){color:#cc7832};{color}

   } {color:#cc7832}catch {color}(AccessControlException e) {
    } finally {
  writeUnlock(operationName){color:#cc7832};{color}  

   }
    getEditLog().logSync(){color:#cc7832};{color}

}
{quote}
Theoretically it will be a race case to cause this issue. Following operations 
as table below will reproduce the issue though it seems very difficult.
||Time||Thread1(rollEditlog)||Thread2(somewhat write)||
|t0| |writeUnlock|
|t1|logEdit|-|
|t2|logSyncAll|-|
|t3|-|logSync|
|t4|finalize|-|
|t5|exception and terminate| |

 

 

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sort the process of write and flush EditLog and some important 
> function, I found the in the class  FSEditLog 

[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-18 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820886#comment-16820886
 ] 

star edited comment on HDFS-14437 at 4/18/19 9:25 AM:
--

Thanks [~hexiaoqiao] . The table really make the issue clearer.

I've tracking FSNamesystem and FSEditLog, found that FSEditLog#rollEditlog 
always being called holding fsLock, such as {color:#33}rollEditLog{color}, 
startRollingUpgrade,  finalizeRollingUpgrade in FSNamesystem.
{quote}CheckpointSignature {color:#ffc66d}rollEditLog{color}() 
{color:#cc7832}throws {color}IOException {
   writeLock();{color:#cc7832} {color}

{color:#cc7832}  try {color}{
     result = 
getFSImage().rollEditLog(getEffectiveLayoutVersion()){color:#cc7832};{color}
 

  } {color:#cc7832}finally {color}{
     writeUnlock(operationName){color:#cc7832};{color}  

  }{color:#cc7832}   {color}

{color:#cc7832}   return {color}result{color:#cc7832};{color}

}
{quote}
FSEditLog#logSync is also called holding fsLock in most file/dir related 
operations, such as mkdir, delete. But permission related operation is called 
out of fsLock, such as setPermission, setOwner, setAcl.
{quote}{color:#cc7832}void {color}{color:#ffc66d}setPermission{color}(String 
src{color:#cc7832}, {color}FsPermission permission) {color:#cc7832}throws 
{color}IOException {
    writeLock();{color:#cc7832} {color}

{color:#cc7832}       try {color}{
   auditStat = FSDirAttrOp.setPermission(dir{color:#cc7832}, 
{color}pc{color:#cc7832}, {color}src{color:#cc7832}, 
{color}permission){color:#cc7832};{color}

   } {color:#cc7832}catch {color}(AccessControlException e) {
    } finally {
  writeUnlock(operationName){color:#cc7832};{color}  

   }
    getEditLog().logSync(){color:#cc7832};{color}

}
{quote}
Theoretically it will be a race case to cause this issue. Following operations 
as table below will reproduce the issue though it seems very difficult.
||Time||Thread1(rollEditlog)||Thread2(somewhat write)||
|t0| |writeUnlock|
|t1|logEdit|-|
|t2|logSyncAll|-|
|t3|-|logSync|
|t4|finalize|-|
|t5|exception and terminate| |

 

 


was (Author: starphin):
Thanks [~hexiaoqiao] . The table really make the issue clearer.

I've tracking FSNamesystem and FSEditLog, found that FSEditLog#rollEditlog 
always being called holding fsLock, such as {color:#33}rollEditLog{color}, 
startRollingUpgrade,  finalizeRollingUpgrade in FSNamesystem.
{quote}CheckpointSignature {color:#ffc66d}rollEditLog{color}() 
{color:#cc7832}throws {color}IOException {{color:#cc7832}
{color}  writeLock(){color:#cc7832};
{color}{color:#cc7832}  try {color}{
    result = 
getFSImage().rollEditLog(getEffectiveLayoutVersion()){color:#cc7832};
{color}  } {color:#cc7832}finally {color}{
    writeUnlock(operationName){color:#cc7832};
{color}   }{color:#cc7832}
{color}{color:#cc7832}   return {color}result{color:#cc7832};
{color}}{quote}
FSEditLog#logSync is also called holding fsLock in most file/dir related 
operations, such as mkdir, delete. But permission related operation is called 
out of fsLock, such as setPermission, setOwner, setAcl.
{quote}{color:#cc7832}void {color}{color:#ffc66d}setPermission{color}(String 
src{color:#cc7832}, {color}FsPermission permission) {color:#cc7832}throws 
{color}IOException {{color:#cc7832}
{color}   writeLock(){color:#cc7832};
{color}{color:#cc7832}   try {color}{{color:#cc7832}
{color}  auditStat = 
FSDirAttrOp.setPermission({color:#9876aa}dir{color}{color:#cc7832}, 
{color}pc{color:#cc7832}, {color}src{color:#cc7832}, 
{color}permission){color:#cc7832};
{color}   } {color:#cc7832}catch {color}(AccessControlException e) 
{{color:#cc7832}
{color}   } {color:#cc7832}finally {color}{
 writeUnlock(operationName){color:#cc7832};
{color}   }
   getEditLog().logSync(){color:#cc7832};{color}{color:#cc7832}
{color}}{quote}
Theoretically it will be a race case to cause this issue. Following operations 
as table below will reproduce the issue though it seems very difficult.
||Time||Thread1(rollEditlog)||Thread2(somewhat write)||
|t0| |writeUnlock|
|t1|logEdit|-|
|t2|logSyncAll|-|
|t3|-|logSync|
|t4|finalize|-|
|t5|exception and terminate| |

 

 

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sort the process of write and flush 

[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-18 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820886#comment-16820886
 ] 

star commented on HDFS-14437:
-

Thanks [~hexiaoqiao] . The table really make the issue clearer.

I've tracking FSNamesystem and FSEditLog, found that FSEditLog#rollEditlog 
always being called holding fsLock, such as {color:#33}rollEditLog{color}, 
startRollingUpgrade,  finalizeRollingUpgrade in FSNamesystem.
{quote}CheckpointSignature {color:#ffc66d}rollEditLog{color}() 
{color:#cc7832}throws {color}IOException {{color:#cc7832}
{color}  writeLock(){color:#cc7832};
{color}{color:#cc7832}  try {color}{
    result = 
getFSImage().rollEditLog(getEffectiveLayoutVersion()){color:#cc7832};
{color}  } {color:#cc7832}finally {color}{
    writeUnlock(operationName){color:#cc7832};
{color}   }{color:#cc7832}
{color}{color:#cc7832}   return {color}result{color:#cc7832};
{color}}{quote}
FSEditLog#logSync is also called holding fsLock in most file/dir related 
operations, such as mkdir, delete. But permission related operation is called 
out of fsLock, such as setPermission, setOwner, setAcl.
{quote}{color:#cc7832}void {color}{color:#ffc66d}setPermission{color}(String 
src{color:#cc7832}, {color}FsPermission permission) {color:#cc7832}throws 
{color}IOException {{color:#cc7832}
{color}   writeLock(){color:#cc7832};
{color}{color:#cc7832}   try {color}{{color:#cc7832}
{color}  auditStat = 
FSDirAttrOp.setPermission({color:#9876aa}dir{color}{color:#cc7832}, 
{color}pc{color:#cc7832}, {color}src{color:#cc7832}, 
{color}permission){color:#cc7832};
{color}   } {color:#cc7832}catch {color}(AccessControlException e) 
{{color:#cc7832}
{color}   } {color:#cc7832}finally {color}{
 writeUnlock(operationName){color:#cc7832};
{color}   }
   getEditLog().logSync(){color:#cc7832};{color}{color:#cc7832}
{color}}{quote}
Theoretically it will be a race case to cause this issue. Following operations 
as table below will reproduce the issue though it seems very difficult.
||Time||Thread1(rollEditlog)||Thread2(somewhat write)||
|t0| |writeUnlock|
|t1|logEdit|-|
|t2|logSyncAll|-|
|t3|-|logSync|
|t4|finalize|-|
|t5|exception and terminate| |

 

 

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sort the process of write and flush EditLog and some important 
> function, I found the in the class  FSEditLog class, the close() function 
> will call such process like below:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> since we have gain the object lock in the function close(), so when  
> waitForSyncToFish() method return, it mean all logSync job has done and all 
> data in bufReady has been flushed out, and since current thread has the lock 
> of this object, when call endCurrentLogSegment(), no other thread will gain 
> the lock so they can't write new editlog into currentBuf.
> But when we don't call waitForSyncToFish() before endCurrentLogSegment(), 
> there may be some autoScheduled logSync()'s flush process is doing, since 
> this process don't need
> synchronization since it has mention in the comment of logSync() method :
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() 

[jira] [Updated] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.

2019-04-17 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14436:

Description: 
When call getTimeDuration like this:
{quote}conf.getTimeDuration("nn.interval", 10, 
TimeUnit.{color:#9876aa}SECONDS{color}{color:#cc7832}, 
{color}TimeUnit.{color:#9876aa}MILLISECONDS{color}){color:#cc7832};{color}
{quote}
 {color:#33}If "nn.interval" is set manually or configured in xml file, 
1 will be retrurned.{color}
 If not, 10 will be returned while 1 is expected. 
 The logic is not consistent.

  was:
When call getTimeDuration like this:
{quote}conf.getTimeDuration("nn.interval", 10, 
TimeUnit.{color:#9876aa}SECONDS{color}{color:#cc7832}, 
{color}TimeUnit.{color:#9876aa}MILLISECONDS{color}){color:#cc7832};{color}
{quote}
 

{color:#33}{color:#cc7832}If "nn.interval" is set manually or configured in 
xml file, 1 will be retrurned.{color}{color}
 If not, 10 will be returned while 1 is expected. 
 The logic is not consistent.


> Configuration#getTimeDuration is not consistent between default value and 
> manual settings.
> --
>
> Key: HDFS-14436
> URL: https://issues.apache.org/jira/browse/HDFS-14436
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Assignee: star
>Priority: Major
>
> When call getTimeDuration like this:
> {quote}conf.getTimeDuration("nn.interval", 10, 
> TimeUnit.{color:#9876aa}SECONDS{color}{color:#cc7832}, 
> {color}TimeUnit.{color:#9876aa}MILLISECONDS{color}){color:#cc7832};{color}
> {quote}
>  {color:#33}If "nn.interval" is set manually or configured in xml file, 
> 1 will be retrurned.{color}
>  If not, 10 will be returned while 1 is expected. 
>  The logic is not consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.

2019-04-17 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14436:

Description: 
When call getTimeDuration like this:
{quote}conf.getTimeDuration("nn.interval", 10, 
TimeUnit.{color:#9876aa}SECONDS{color}{color:#cc7832}, 
{color}TimeUnit.{color:#9876aa}MILLISECONDS{color}){color:#cc7832};{color}
{quote}
 

{color:#33}{color:#cc7832}If "nn.interval" is set manually or configured in 
xml file, 1 will be retrurned.{color}{color}
 If not, 10 will be returned while 1 is expected. 
 The logic is not consistent.

  was:
When call getTimeDuration like this:
{quote}conf.getTimeDuration("nn.interval", 10, 
TimeUnit.{color:#9876aa}SECONDS{color}{color:#cc7832}, 
{color}TimeUnit.{color:#9876aa}MILLISECONDS{color}){color:#cc7832};
{color}
{quote}
{color:#cc7832}{color:#33}If "nn.interval" is set manually or configured in 
xml file, 1 will be retrurned.
{color}{color}

{color:#cc7832}{color:#33}If not, 10 will be returned while 1 is 
expected.{color} {color}

{color:#33}{color:#cc7832}The logic is not consistent.{color}{color}


> Configuration#getTimeDuration is not consistent between default value and 
> manual settings.
> --
>
> Key: HDFS-14436
> URL: https://issues.apache.org/jira/browse/HDFS-14436
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Assignee: star
>Priority: Major
>
> When call getTimeDuration like this:
> {quote}conf.getTimeDuration("nn.interval", 10, 
> TimeUnit.{color:#9876aa}SECONDS{color}{color:#cc7832}, 
> {color}TimeUnit.{color:#9876aa}MILLISECONDS{color}){color:#cc7832};{color}
> {quote}
>  
> {color:#33}{color:#cc7832}If "nn.interval" is set manually or configured 
> in xml file, 1 will be retrurned.{color}{color}
>  If not, 10 will be returned while 1 is expected. 
>  The logic is not consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.

2019-04-17 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14436:

Description: 
When calling getTimeDuration like this:
{quote}conf.getTimeDuration("nn.interval", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
{quote}
If "nn.interval" is set manually or configured in an xml file, 10000 will be returned.

If not, 10 will be returned while 10000 is expected.

The logic is not consistent.

  was:
When call getTimeDuration like this:
{quote}conf.getTimeDuration("property", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
{quote}


> Configuration#getTimeDuration is not consistent between default value and 
> manual settings.
> --
>
> Key: HDFS-14436
> URL: https://issues.apache.org/jira/browse/HDFS-14436
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Assignee: star
>Priority: Major
>
> When calling getTimeDuration like this:
> {quote}conf.getTimeDuration("nn.interval", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
> {quote}
> If "nn.interval" is set manually or configured in an xml file, 10000 will be returned.
> If not, 10 will be returned while 10000 is expected.
> The logic is not consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.

2019-04-17 Thread star (JIRA)
star created HDFS-14436:
---

 Summary: Configuration#getTimeDuration is not consistent between 
default value and manual settings.
 Key: HDFS-14436
 URL: https://issues.apache.org/jira/browse/HDFS-14436
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: star
Assignee: star


When call getTimeDuration like this:
{quote}conf.getTimeDuration("property", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory

2019-04-17 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820206#comment-16820206
 ] 

star commented on HDFS-10659:
-

[~jojochuang], would you like to review the patch?

> Namenode crashes after Journalnode re-installation in an HA cluster due to 
> missing paxos directory
> --
>
> Key: HDFS-10659
> URL: https://issues.apache.org/jira/browse/HDFS-10659
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, journal-node
>Affects Versions: 2.7.0
>Reporter: Amit Anand
>Assignee: star
>Priority: Major
> Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, 
> HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch, 
> HDFS-10659.005.patch, HDFS-10659.006.patch
>
>
> In my environment I am seeing {{Namenodes}} crashing down after majority of 
> {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling 
> upgrades followed by rolling re-install of each node including master(NN, JN, 
> RM, ZK) nodes. When a journal node is re-installed or moved to a new 
> disk/host, instead of running {{"initializeSharedEdits"}} command, I copy 
> {{VERSION}} file from one of the other {{Journalnode}} and that allows my 
> {{NN}} to start writing data to the newly installed {{Journalnode}}.
> To achieve quorum for JN and recover unfinalized segments, NN during startup 
> creates .tmp files under the {{"/jn/current/paxos"}} directory. In the 
> current implementation the "paxos" directory is only created during the 
> {{"initializeSharedEdits"}} command, and if a JN is re-installed the "paxos" 
> directory is not created upon JN startup or by NN while writing .tmp 
> files, which causes NN to crash with the following error message:
> {code}
> 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No 
> such file or directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:171)
> at 
> org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249)
> at 
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> {code}
> The current 
> [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130]
>  method simply returns a path to a file under the "paxos" directory without 
> verifying its existence. Since the "paxos" directory holds files that are 
> required for NN recovery and achieving JN quorum, my proposed solution is to 
> add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if 
> it is missing.
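
To make the proposal concrete, here is a hedged sketch of the suggested check; the class and the currentDir field are simplified stand-ins for JNStorage internals, not the attached patch:
{code}
import java.io.File;

// Sketch only: ensure the "paxos" directory exists before handing out
// paths under it, so a later AtomicFileOutputStream write cannot fail
// with "No such file or directory".
class PaxosDirSketch {
  private final File currentDir; // stands in for the JN's current/ dir

  PaxosDirSketch(File currentDir) {
    this.currentDir = currentDir;
  }

  File getPaxosFile(long segmentTxId) {
    File paxosDir = new File(currentDir, "paxos");
    if (!paxosDir.exists() && !paxosDir.mkdirs()) {
      // Surface the failure immediately instead of crashing the NN later.
      throw new IllegalStateException("Could not create " + paxosDir);
    }
    return new File(paxosDir, String.valueOf(segmentTxId));
  }
}
{code}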



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-17 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820196#comment-16820196
 ] 

star commented on HDFS-14378:
-

Would anyone like to review the patch? Appreciated.

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Minor
>  Labels: patch
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It takes care of a scenario where SNN will not upload 
> fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simpler with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select an SNN to download the checkpoint from.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  
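
To make the proposed ANN-driven flow above concrete, here is a purely illustrative sketch; none of these types (EditLogRoller, StandbyClient, ActiveNnDriverSketch) exist in Hadoop, and the real change would live inside the NameNode and checkpointer code:
{code}
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

class ActiveNnDriverSketch {
  interface EditLogRoller { void rollEditLog(); }
  interface StandbyClient {
    boolean hasNewerCheckpoint();
    void downloadFsImage();
  }

  private final EditLogRoller editLog;
  private final List<StandbyClient> standbys;
  private final long rollPeriodMs; // would come from DFS_HA_LOGROLL_PERIOD_KEY
  private final Random random = new Random();

  ActiveNnDriverSketch(EditLogRoller editLog, List<StandbyClient> standbys,
                       long rollPeriodMs) {
    this.editLog = editLog;
    this.standbys = standbys;
    this.rollPeriodMs = rollPeriodMs;
  }

  void run() throws InterruptedException {
    while (true) {
      TimeUnit.MILLISECONDS.sleep(rollPeriodMs);
      // ANN rolls its own edit log; SNNs no longer send roll requests.
      editLog.rollEditLog();
      // ANN picks one SNN and pulls the fsimage from it, so there is no
      // "primary checkpointer" race between multiple standbys.
      StandbyClient snn = standbys.get(random.nextInt(standbys.size()));
      if (snn.hasNewerCheckpoint()) {
        snn.downloadFsImage();
      }
    }
  }
}
{code}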



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-17 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

Attachment: HDFS-14378-trunk.004.patch

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Minor
>  Labels: patch
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It takes care of a scenario where SNN will not upload 
> fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simpler with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select an SNN to download the checkpoint from.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-16 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

Attachment: HDFS-14378-trunk.003.patch

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Minor
>  Labels: patch
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It takes care of a scenario where SNN will not upload 
> fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simpler with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select an SNN to download the checkpoint from.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-15 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

Attachment: (was: HDFS-14378-trunk.002.patch)

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Minor
>  Labels: patch
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It takes care of a scenario where SNN will not upload 
> fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simpler with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select an SNN to download the checkpoint from.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-15 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

   Labels: patch  (was: )
Affects Version/s: 3.1.2
   Attachment: HDFS-14378-trunk.002.patch
   Status: Patch Available  (was: Open)

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Minor
>  Labels: patch
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.002.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It takes care of a scenario where SNN will not upload 
> fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simpler with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select an SNN to download the checkpoint from.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14424) NN failover failed because of losing paxos directory

2019-04-15 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14424:

External issue ID:   (was: HDFS-10659)

> NN failover failed because of losing paxos directory
> -
>
> Key: HDFS-14424
> URL: https://issues.apache.org/jira/browse/HDFS-14424
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Assignee: star
>Priority: Major
>
> Recently, our hadoop namenode shut down when switching the active namenode, just 
> because of a missing paxos directory. It is created in the default /tmp path 
> and deleted by the OS after 7 days of no activity. We can avoid this by moving 
> the journal directory to a non-tmp dir, but it's better to make sure the namenode 
> works well with the default config.
> The issue throws an exception similar to HDFS-10659, also caused by a missing 
> paxos directory.
>  
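
For reference, the workaround mentioned above usually amounts to pointing the JournalNode storage at a persistent disk in hdfs-site.xml (assuming QJM with dfs.journalnode.edits.dir left at its /tmp-based default; the path below is only an example):
{code}
<!-- hdfs-site.xml on each JournalNode. /data/hadoop/journal is an example
     path on a persistent disk, not a required location; the paxos directory
     then lives under <edits.dir>/<journal id>/current/paxos. -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/hadoop/journal</value>
</property>
{code}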



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14424) NN failover failed because of losing paxos directory

2019-04-15 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14424:

External issue ID: HDFS-10659

> NN failover failed because of losing paxos directory
> -
>
> Key: HDFS-14424
> URL: https://issues.apache.org/jira/browse/HDFS-14424
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Assignee: star
>Priority: Major
>
> Recently, our hadoop namenode shut down when switching the active namenode, just 
> because of a missing paxos directory. It is created in the default /tmp path 
> and deleted by the OS after 7 days of no activity. We can avoid this by moving 
> the journal directory to a non-tmp dir, but it's better to make sure the namenode 
> works well with the default config.
> The issue throws an exception similar to HDFS-10659, also caused by a missing 
> paxos directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory

2019-04-15 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818634#comment-16818634
 ] 

star commented on HDFS-10659:
-

Failed test is unrelated. The checkstyle warning for the hidden variable 'sd' is not newly added.

> Namenode crashes after Journalnode re-installation in an HA cluster due to 
> missing paxos directory
> --
>
> Key: HDFS-10659
> URL: https://issues.apache.org/jira/browse/HDFS-10659
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, journal-node
>Affects Versions: 2.7.0
>Reporter: Amit Anand
>Assignee: star
>Priority: Major
> Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, 
> HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch, 
> HDFS-10659.005.patch, HDFS-10659.006.patch
>
>
> In my environment I am seeing {{Namenodes}} crashing down after majority of 
> {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling 
> upgrades followed by rolling re-install of each node including master(NN, JN, 
> RM, ZK) nodes. When a journal node is re-installed or moved to a new 
> disk/host, instead of running {{"initializeSharedEdits"}} command, I copy 
> {{VERSION}} file from one of the other {{Journalnode}} and that allows my 
> {{NN}} to start writing data to the newly installed {{Journalnode}}.
> To achieve quorum for JN and recover unfinalized segments, NN during startup 
> creates .tmp files under the {{"/jn/current/paxos"}} directory. In the 
> current implementation the "paxos" directory is only created during the 
> {{"initializeSharedEdits"}} command, and if a JN is re-installed the "paxos" 
> directory is not created upon JN startup or by NN while writing .tmp 
> files, which causes NN to crash with the following error message:
> {code}
> 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No 
> such file or directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:171)
> at 
> org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249)
> at 
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> {code}
> The current 
> [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130]
>  method simply returns a path to a file under the "paxos" directory without 
> verifying its existence. Since the "paxos" directory holds files that are 
> required for NN recovery and achieving JN quorum, my proposed solution is to 
> add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if 
> it is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory

2019-04-15 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818480#comment-16818480
 ] 

star commented on HDFS-10659:
-

Fixed checkstyle.

It seems the function JNStorage#findFinalizedEditsFile is never used. Should it be removed?

> Namenode crashes after Journalnode re-installation in an HA cluster due to 
> missing paxos directory
> --
>
> Key: HDFS-10659
> URL: https://issues.apache.org/jira/browse/HDFS-10659
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, journal-node
>Affects Versions: 2.7.0
>Reporter: Amit Anand
>Assignee: star
>Priority: Major
> Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, 
> HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch, 
> HDFS-10659.005.patch, HDFS-10659.006.patch
>
>
> In my environment I am seeing {{Namenodes}} crashing down after majority of 
> {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling 
> upgrades followed by rolling re-install of each node including master(NN, JN, 
> RM, ZK) nodes. When a journal node is re-installed or moved to a new 
> disk/host, instead of running {{"initializeSharedEdits"}} command, I copy 
> {{VERSION}} file from one of the other {{Journalnode}} and that allows my 
> {{NN}} to start writing data to the newly installed {{Journalnode}}.
> To achieve quorum for JN and recover unfinalized segments, NN during startup 
> creates .tmp files under the {{"/jn/current/paxos"}} directory. In the 
> current implementation the "paxos" directory is only created during the 
> {{"initializeSharedEdits"}} command, and if a JN is re-installed the "paxos" 
> directory is not created upon JN startup or by NN while writing .tmp 
> files, which causes NN to crash with the following error message:
> {code}
> 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No 
> such file or directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:171)
> at 
> org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249)
> at 
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> {code}
> The current 
> [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130]
>  method simply returns a path to a file under the "paxos" directory without 
> verifying its existence. Since the "paxos" directory holds files that are 
> required for NN recovery and achieving JN quorum, my proposed solution is to 
> add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if 
> it is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory

2019-04-15 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-10659:

Attachment: HDFS-10659.006.patch

> Namenode crashes after Journalnode re-installation in an HA cluster due to 
> missing paxos directory
> --
>
> Key: HDFS-10659
> URL: https://issues.apache.org/jira/browse/HDFS-10659
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, journal-node
>Affects Versions: 2.7.0
>Reporter: Amit Anand
>Assignee: star
>Priority: Major
> Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, 
> HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch, 
> HDFS-10659.005.patch, HDFS-10659.006.patch
>
>
> In my environment I am seeing {{Namenodes}} crashing down after majority of 
> {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling 
> upgrades followed by rolling re-install of each node including master(NN, JN, 
> RM, ZK) nodes. When a journal node is re-installed or moved to a new 
> disk/host, instead of running {{"initializeSharedEdits"}} command, I copy 
> {{VERSION}} file from one of the other {{Journalnode}} and that allows my 
> {{NN}} to start writing data to the newly installed {{Journalnode}}.
> To achieve quorum for JN and recover unfinalized segments, NN during startup 
> creates .tmp files under the {{"/jn/current/paxos"}} directory. In the 
> current implementation the "paxos" directory is only created during the 
> {{"initializeSharedEdits"}} command, and if a JN is re-installed the "paxos" 
> directory is not created upon JN startup or by NN while writing .tmp 
> files, which causes NN to crash with the following error message:
> {code}
> 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No 
> such file or directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:171)
> at 
> org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249)
> at 
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> {code}
> The current 
> [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130]
>  method simply returns a path to a file under the "paxos" directory without 
> verifying its existence. Since the "paxos" directory holds files that are 
> required for NN recovery and achieving JN quorum, my proposed solution is to 
> add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if 
> it is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory

2019-04-15 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817936#comment-16817936
 ] 

star commented on HDFS-10659:
-

[~hanishakoneru] thanks.

[~jojochuang], patch for trunk uploaded. Would you mind reviewing it?

> Namenode crashes after Journalnode re-installation in an HA cluster due to 
> missing paxos directory
> --
>
> Key: HDFS-10659
> URL: https://issues.apache.org/jira/browse/HDFS-10659
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, journal-node
>Affects Versions: 2.7.0
>Reporter: Amit Anand
>Assignee: star
>Priority: Major
> Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, 
> HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch, 
> HDFS-10659.005.patch
>
>
> In my environment I am seeing {{Namenodes}} crashing down after majority of 
> {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling 
> upgrades followed by rolling re-install of each node including master(NN, JN, 
> RM, ZK) nodes. When a journal node is re-installed or moved to a new 
> disk/host, instead of running {{"initializeSharedEdits"}} command, I copy 
> {{VERSION}} file from one of the other {{Journalnode}} and that allows my 
> {{NN}} to start writing data to the newly installed {{Journalnode}}.
> To achieve quorum for JN and recover unfinalized segments, NN during startup 
> creates .tmp files under the {{"/jn/current/paxos"}} directory. In the 
> current implementation the "paxos" directory is only created during the 
> {{"initializeSharedEdits"}} command, and if a JN is re-installed the "paxos" 
> directory is not created upon JN startup or by NN while writing .tmp 
> files, which causes NN to crash with the following error message:
> {code}
> 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No 
> such file or directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:171)
> at 
> org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249)
> at 
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> {code}
> The current 
> [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130]
>  method simply returns a path to a file under the "paxos" directory without 
> verifying its existence. Since the "paxos" directory holds files that are 
> required for NN recovery and achieving JN quorum, my proposed solution is to 
> add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if 
> it is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory

2019-04-15 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-10659:

Attachment: HDFS-10659.005.patch

> Namenode crashes after Journalnode re-installation in an HA cluster due to 
> missing paxos directory
> --
>
> Key: HDFS-10659
> URL: https://issues.apache.org/jira/browse/HDFS-10659
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, journal-node
>Affects Versions: 2.7.0
>Reporter: Amit Anand
>Assignee: star
>Priority: Major
> Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, 
> HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch, 
> HDFS-10659.005.patch
>
>
> In my environment I am seeing {{Namenodes}} crashing down after majority of 
> {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling 
> upgrades followed by rolling re-install of each node including master(NN, JN, 
> RM, ZK) nodes. When a journal node is re-installed or moved to a new 
> disk/host, instead of running {{"initializeSharedEdits"}} command, I copy 
> {{VERSION}} file from one of the other {{Journalnode}} and that allows my 
> {{NN}} to start writing data to the newly installed {{Journalnode}}.
> To achieve quorum for JN and recover unfinalized segments, NN during startup 
> creates .tmp files under the {{"/jn/current/paxos"}} directory. In the 
> current implementation the "paxos" directory is only created during the 
> {{"initializeSharedEdits"}} command, and if a JN is re-installed the "paxos" 
> directory is not created upon JN startup or by NN while writing .tmp 
> files, which causes NN to crash with the following error message:
> {code}
> 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No 
> such file or directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:171)
> at 
> org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249)
> at 
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> {code}
> The current 
> [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130]
>  method simply returns a path to a file under the "paxos" directory without 
> verifying its existence. Since the "paxos" directory holds files that are 
> required for NN recovery and achieving JN quorum, my proposed solution is to 
> add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if 
> it is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory

2019-04-15 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star reassigned HDFS-10659:
---

Assignee: star  (was: Hanisha Koneru)

> Namenode crashes after Journalnode re-installation in an HA cluster due to 
> missing paxos directory
> --
>
> Key: HDFS-10659
> URL: https://issues.apache.org/jira/browse/HDFS-10659
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, journal-node
>Affects Versions: 2.7.0
>Reporter: Amit Anand
>Assignee: star
>Priority: Major
> Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, 
> HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch
>
>
> In my environment I am seeing {{Namenodes}} crashing down after majority of 
> {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling 
> upgrades followed by rolling re-install of each node including master(NN, JN, 
> RM, ZK) nodes. When a journal node is re-installed or moved to a new 
> disk/host, instead of running {{"initializeSharedEdits"}} command, I copy 
> {{VERSION}} file from one of the other {{Journalnode}} and that allows my 
> {{NN}} to start writing data to the newly installed {{Journalnode}}.
> To achieve quorum for JN and recover unfinalized segments, NN during startup 
> creates .tmp files under the {{"/jn/current/paxos"}} directory. In the 
> current implementation the "paxos" directory is only created during the 
> {{"initializeSharedEdits"}} command, and if a JN is re-installed the "paxos" 
> directory is not created upon JN startup or by NN while writing .tmp 
> files, which causes NN to crash with the following error message:
> {code}
> 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No 
> such file or directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:171)
> at 
> org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249)
> at 
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> {code}
> The current 
> [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130]
>  method simply returns a path to a file under the "paxos" directory without 
> verifying its existence. Since the "paxos" directory holds files that are 
> required for NN recovery and achieving JN quorum, my proposed solution is to 
> add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if 
> it is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby

2019-04-15 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817908#comment-16817908
 ] 

star commented on HDFS-10477:
-

[~jojochuang], [^HDFS-10477.branch-2.8.patch] is for branch 2.8.

Is there anything else I should do to get the patch committed?

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, 
> HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch, 
> HDFS-10477.007.patch, HDFS-10477.branch-2.8.patch, HDFS-10477.branch-2.patch, 
> HDFS-10477.patch
>
>
> In our cluster, when we stop decommissioning a rack which has 46 DataNodes, 
> it locked the Namesystem for about 7 minutes, as the log below shows:
> {code}
> 2016-05-26 20:11:41,697 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.27:1004
> 2016-05-26 20:11:51,171 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning
> 2016-05-26 20:11:51,171 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.118:1004
> 2016-05-26 20:11:59,972 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning
> 2016-05-26 20:11:59,972 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.113:1004
> 2016-05-26 20:12:09,007 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 294307 over-replicated blocks on 10.142.27.113:1004 during recommissioning
> 2016-05-26 20:12:09,008 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.117:1004
> 2016-05-26 20:12:18,055 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 314381 over-replicated blocks on 10.142.27.117:1004 during recommissioning
> 2016-05-26 20:12:18,056 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.130:1004
> 2016-05-26 20:12:25,938 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 272779 over-replicated blocks on 10.142.27.130:1004 during recommissioning
> 2016-05-26 20:12:25,939 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.121:1004
> 2016-05-26 20:12:34,134 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 287248 over-replicated blocks on 10.142.27.121:1004 during recommissioning
> 2016-05-26 20:12:34,134 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.33:1004
> 2016-05-26 20:12:43,020 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 299868 over-replicated blocks on 10.142.27.33:1004 during recommissioning
> 2016-05-26 20:12:43,020 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.137:1004
> 2016-05-26 20:12:52,220 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 303914 over-replicated blocks on 10.142.27.137:1004 during recommissioning
> 2016-05-26 20:12:52,220 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.51:1004
> 2016-05-26 20:13:00,362 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 281175 over-replicated blocks on 10.142.27.51:1004 during recommissioning
> 2016-05-26 20:13:00,362 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.12:1004
> 2016-05-26 20:13:08,756 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 274880 over-replicated blocks on 10.142.27.12:1004 during recommissioning
> 2016-05-26 20:13:08,757 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.15:1004
> 2016-05-26 20:13:17,185 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 286334 over-replicated blocks on 10.142.27.15:1004 during recommissioning
> 2016-05-26 20:13:17,185 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.14:1004
> 2016-05-26 

[jira] [Assigned] (HDFS-14424) NN failover failed because of losing paxos directory

2019-04-13 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star reassigned HDFS-14424:
---

Assignee: star

> NN failover failed because of losing paxos directory
> -
>
> Key: HDFS-14424
> URL: https://issues.apache.org/jira/browse/HDFS-14424
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Assignee: star
>Priority: Major
>
> Recently, our hadoop namenode shut down when switching the active namenode, just 
> because of a missing paxos directory. It is created in the default /tmp path 
> and deleted by the OS after 7 days of no activity. We can avoid this by moving 
> the journal directory to a non-tmp dir, but it's better to make sure the namenode 
> works well with the default config.
> The issue throws an exception similar to HDFS-10659, also caused by a missing 
> paxos directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14424) NN failover failed because of losing paxos directory

2019-04-13 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14424:

Description: 
Recently, our hadoop namenode shut down when switching the active namenode, just 
because of a missing paxos directory. It is created in the default /tmp path and 
deleted by the OS after 7 days of no activity. We can avoid this by moving the 
journal directory to a non-tmp dir, but it's better to make sure the namenode 
works well with the default config.

The issue throws an exception similar to HDFS-10659, also caused by a missing paxos 
directory.

 

> NN failover failed because of losing paxos directory
> -
>
> Key: HDFS-14424
> URL: https://issues.apache.org/jira/browse/HDFS-14424
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Priority: Major
>
> Recently, our hadoop namenode shut down when switching the active namenode, just 
> because of a missing paxos directory. It is created in the default /tmp path 
> and deleted by the OS after 7 days of no activity. We can avoid this by moving 
> the journal directory to a non-tmp dir, but it's better to make sure the namenode 
> works well with the default config.
> The issue throws an exception similar to HDFS-10659, also caused by a missing 
> paxos directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14424) NN start because of losing paxos directory

2019-04-13 Thread star (JIRA)
star created HDFS-14424:
---

 Summary: NN start because of losing paxos directory
 Key: HDFS-14424
 URL: https://issues.apache.org/jira/browse/HDFS-14424
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: star






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14424) NN failover failed because of losing paxos directory

2019-04-13 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14424:

Summary: NN failover failed because of losing paxos directory  (was: NN 
start because of losing paxos directory)

> NN failover failed because of losing paxos directory
> -
>
> Key: HDFS-14424
> URL: https://issues.apache.org/jira/browse/HDFS-14424
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: star
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby

2019-04-10 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815013#comment-16815013
 ] 

star commented on HDFS-10477:
-

[~jojochuang], uploaded patch for branch-2.8. 

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, 
> HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch, 
> HDFS-10477.007.patch, HDFS-10477.branch-2.8.patch, HDFS-10477.branch-2.patch, 
> HDFS-10477.patch
>
>
> In our cluster, when we stop decommissioning a rack which has 46 DataNodes, 
> it locked the Namesystem for about 7 minutes, as the log below shows:
> {code}
> 2016-05-26 20:11:41,697 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.27:1004
> 2016-05-26 20:11:51,171 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning
> 2016-05-26 20:11:51,171 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.118:1004
> 2016-05-26 20:11:59,972 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning
> 2016-05-26 20:11:59,972 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.113:1004
> 2016-05-26 20:12:09,007 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 294307 over-replicated blocks on 10.142.27.113:1004 during recommissioning
> 2016-05-26 20:12:09,008 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.117:1004
> 2016-05-26 20:12:18,055 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 314381 over-replicated blocks on 10.142.27.117:1004 during recommissioning
> 2016-05-26 20:12:18,056 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.130:1004
> 2016-05-26 20:12:25,938 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 272779 over-replicated blocks on 10.142.27.130:1004 during recommissioning
> 2016-05-26 20:12:25,939 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.121:1004
> 2016-05-26 20:12:34,134 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 287248 over-replicated blocks on 10.142.27.121:1004 during recommissioning
> 2016-05-26 20:12:34,134 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.33:1004
> 2016-05-26 20:12:43,020 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 299868 over-replicated blocks on 10.142.27.33:1004 during recommissioning
> 2016-05-26 20:12:43,020 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.137:1004
> 2016-05-26 20:12:52,220 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 303914 over-replicated blocks on 10.142.27.137:1004 during recommissioning
> 2016-05-26 20:12:52,220 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.51:1004
> 2016-05-26 20:13:00,362 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 281175 over-replicated blocks on 10.142.27.51:1004 during recommissioning
> 2016-05-26 20:13:00,362 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.12:1004
> 2016-05-26 20:13:08,756 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 274880 over-replicated blocks on 10.142.27.12:1004 during recommissioning
> 2016-05-26 20:13:08,757 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.15:1004
> 2016-05-26 20:13:17,185 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 286334 over-replicated blocks on 10.142.27.15:1004 during recommissioning
> 2016-05-26 20:13:17,185 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.14:1004
> 2016-05-26 20:13:25,369 INFO 
> 

[jira] [Updated] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby

2019-04-10 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-10477:

Attachment: HDFS-10477.branch-2.8.patch

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, 
> HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch, 
> HDFS-10477.007.patch, HDFS-10477.branch-2.8.patch, HDFS-10477.branch-2.patch, 
> HDFS-10477.patch
>
>
> In our cluster, when we stop decommissioning a rack which has 46 DataNodes, 
> it locked the Namesystem for about 7 minutes, as the log below shows:
> {code}
> 2016-05-26 20:11:41,697 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.27:1004
> 2016-05-26 20:11:51,171 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning
> 2016-05-26 20:11:51,171 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.118:1004
> 2016-05-26 20:11:59,972 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning
> 2016-05-26 20:11:59,972 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.113:1004
> 2016-05-26 20:12:09,007 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 294307 over-replicated blocks on 10.142.27.113:1004 during recommissioning
> 2016-05-26 20:12:09,008 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.117:1004
> 2016-05-26 20:12:18,055 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 314381 over-replicated blocks on 10.142.27.117:1004 during recommissioning
> 2016-05-26 20:12:18,056 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.130:1004
> 2016-05-26 20:12:25,938 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 272779 over-replicated blocks on 10.142.27.130:1004 during recommissioning
> 2016-05-26 20:12:25,939 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.121:1004
> 2016-05-26 20:12:34,134 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 287248 over-replicated blocks on 10.142.27.121:1004 during recommissioning
> 2016-05-26 20:12:34,134 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.33:1004
> 2016-05-26 20:12:43,020 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 299868 over-replicated blocks on 10.142.27.33:1004 during recommissioning
> 2016-05-26 20:12:43,020 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.137:1004
> 2016-05-26 20:12:52,220 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 303914 over-replicated blocks on 10.142.27.137:1004 during recommissioning
> 2016-05-26 20:12:52,220 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.51:1004
> 2016-05-26 20:13:00,362 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 281175 over-replicated blocks on 10.142.27.51:1004 during recommissioning
> 2016-05-26 20:13:00,362 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.12:1004
> 2016-05-26 20:13:08,756 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 274880 over-replicated blocks on 10.142.27.12:1004 during recommissioning
> 2016-05-26 20:13:08,757 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.15:1004
> 2016-05-26 20:13:17,185 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 286334 over-replicated blocks on 10.142.27.15:1004 during recommissioning
> 2016-05-26 20:13:17,185 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop 
> Decommissioning 10.142.27.14:1004
> 2016-05-26 20:13:25,369 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 
> 280219 over-replicated 

[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2019-04-10 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814975#comment-16814975
 ] 

star commented on HDFS-13596:
-

Sorry for my mistake. Indeed, the length field is not behaving the way I 
expected. It just skips the 4 bytes of checksum:
{quote}IOUtils.skipFully(in, 4 + 8); // skip length and txid
op.readFields(in, logVersion);
// skip over the checksum, which we validated above.
IOUtils.skipFully(in, CHECKSUM_LENGTH);{quote}
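
For concreteness, a minimal sketch of that read flow (stand-in types and names only, not the actual FSEditLogOp reader): because the 4-byte length is merely skipped together with the txid, the reader never uses it to bound the op body, so unknown trailing fields from a newer writer would misalign the stream rather than be ignored.
{code:java}
import java.io.DataInputStream;
import java.io.IOException;

/** Illustrative sketch only; not the real edit log reader. */
class RecordReadSketch {
  private static final int CHECKSUM_LENGTH = 4;

  interface OpBody {
    void readFields(DataInputStream in, int logVersion) throws IOException;
  }

  /** Mirrors the quoted flow: skip length+txid, read known fields, skip checksum. */
  static void readOneRecord(DataInputStream in, OpBody op, int logVersion)
      throws IOException {
    skipFully(in, 4 + 8);            // skip length and txid
    op.readFields(in, logVersion);   // reads exactly the fields this version knows
    skipFully(in, CHECKSUM_LENGTH);  // skip the already-validated checksum
  }

  private static void skipFully(DataInputStream in, long n) throws IOException {
    while (n > 0) {
      long skipped = in.skip(n);
      if (skipped <= 0) {
        throw new IOException("unexpected end of stream");
      }
      n -= skipped;
    }
  }
}
{code}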
 

 

 

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Critical
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch
>
>
> After a rollingUpgrade of the NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> 

[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2019-04-09 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813327#comment-16813327
 ] 

star commented on HDFS-13596:
-

New properties will be ignored, as I checked the code. There is a length field 
in every edit log record. Writing new properties after the older ones is not just 
about this issue, but a rule for future extension. If not, there is a high risk 
that HDFS metadata gets broken after finalizing, because only from that point on 
are the new properties written to the edit logs, and that path is not verified 
during the rolling upgrade. Metadata will not be recoverable at that time. Maybe 
I'm wrong on this; feel free to correct me.
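
For concreteness, a purely hypothetical sketch of the length-bounded reading this argument assumes (not current HDFS behavior; as the 2019-04-10 follow-up notes, the real reader only skips the length field):
{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical layout and reader; names and field order are illustrative. */
class LengthBoundedReadSketch {
  static void readKnownFieldsAndSkipRest(DataInputStream in) throws IOException {
    int recordLength = in.readInt();            // total bytes in this record's body
    byte[] body = new byte[recordLength];
    in.readFully(body);                         // bound the record explicitly
    DataInputStream record =
        new DataInputStream(new ByteArrayInputStream(body));
    long txid = record.readLong();              // a field this version knows
    byte storagePolicyId = record.readByte();   // another known field
    // Newer, unknown trailing properties simply stay unread inside 'body',
    // so the outer stream remains aligned on the next record.
  }
}
{code}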

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Critical
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch
>
>
> After a rollingUpgrade of the NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> 

[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue

2019-04-07 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811877#comment-16811877
 ] 

star commented on HDFS-14403:
-

[~xkrogen], Impressive results.

Can you give detailed information about how many listStatus calls were issued on a 
directory with one subdirectory and on a directory with 1000 subdirectories 
in your benchmark tests? Is it possible that the scheduler with LockCostProvider 
schedules more low-cost operations, like listStatus on a directory with one 
subdirectory, which results in a lower queue time? Or does the improvement genuinely 
come from LockCostProvider? 
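
For readers following the question, a rough sketch of what lock-time-based costing means (interface and names are made up for illustration; this is not the HDFS-14403 API): the longer a call holds the namesystem lock, the more the FairCallQueue charges its caller, so many cheap listStatus calls on small directories are scheduled more readily than a few expensive ones.
{code:java}
// Illustrative only; not the actual CostProvider interface from the patch.
interface CallCostSketch {
  long cost(long lockHeldNanos);
}

class LockTimeCostSketch implements CallCostSketch {
  @Override
  public long cost(long lockHeldNanos) {
    // One base unit so even lock-free calls count, plus roughly one unit per
    // millisecond the call held the namesystem lock.
    return 1 + lockHeldNanos / 1_000_000L;
  }
}
{code}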

> Cost-Based RPC FairCallQueue
> 
>
> Key: HDFS-14403
> URL: https://issues.apache.org/jira/browse/HDFS-14403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc, namenode
>Reporter: Erik Krogen
>Assignee: Christopher Gregorian
>Priority: Major
>  Labels: qos, rpc
> Attachments: CostBasedFairCallQueueDesign_v0.pdf, HDFS-14403.001.patch
>
>
> HADOOP-15016 initially described extensions to the Hadoop FairCallQueue 
> encompassing both cost-based analysis of incoming RPCs, as well as support 
> for reservations of RPC capacity for system/platform users. This JIRA intends 
> to track the former, as HADOOP-15016 was repurposed to more specifically 
> focus on the reservation portion of the work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2019-04-06 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811584#comment-16811584
 ] 

star edited comment on HDFS-13596 at 4/6/19 2:35 PM:
-

Maybe another option: we can write new properties after the older ones, so that 
on restart they can be ignored. EC requests will still be accepted and 
persisted in the edit log before finalizing the rolling upgrade. 

In this case, it would be like this.

from 
{quote}FSImageSerialization.writeByte(storagePolicyId, out);
 FSImageSerialization.writeByte(erasureCodingPolicyId, out);
 writeRpcIds(rpcClientId, rpcCallId, out);
{quote}
to 
{quote}writeRpcIds(rpcClientId, rpcCallId, out);

FSImageSerialization.writeByte(storagePolicyId, out);
 FSImageSerialization.writeByte(erasureCodingPolicyId, out);
{quote}


was (Author: starphin):
Maybe another option: we can write new properties after the older ones, so that 
on restart they can be ignored. EC requests will still be accepted and 
persisted in the edit log. 

In this case, it would be like this.

from 
{quote}FSImageSerialization.writeByte(storagePolicyId, out);
 FSImageSerialization.writeByte(erasureCodingPolicyId, out);
 writeRpcIds(rpcClientId, rpcCallId, out);
{quote}
to 
{quote}writeRpcIds(rpcClientId, rpcCallId, out);

FSImageSerialization.writeByte(storagePolicyId, out);
 FSImageSerialization.writeByte(erasureCodingPolicyId, out);
{quote}

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Critical
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch
>
>
> After a rollingUpgrade of the NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> 

[jira] [Comment Edited] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2019-04-06 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811584#comment-16811584
 ] 

star edited comment on HDFS-13596 at 4/6/19 2:34 PM:
-

Maybe another option: we can write new properties after the older ones, so that 
on restart they can be ignored. EC requests will still be accepted and 
persisted in the edit log. 

In this case, it would be like this.

from 
{quote}FSImageSerialization.writeByte(storagePolicyId, out);
 FSImageSerialization.writeByte(erasureCodingPolicyId, out);
 writeRpcIds(rpcClientId, rpcCallId, out);
{quote}
to 
{quote}writeRpcIds(rpcClientId, rpcCallId, out);

FSImageSerialization.writeByte(storagePolicyId, out);
 FSImageSerialization.writeByte(erasureCodingPolicyId, out);
{quote}


was (Author: starphin):
Maybe another option: we can write new properties after the older ones, so that 
on restart they can be ignored. EC requests will still be accepted. 

In this case, it would be like this.

from 
{quote}FSImageSerialization.writeByte(storagePolicyId, out);
 FSImageSerialization.writeByte(erasureCodingPolicyId, out);
 writeRpcIds(rpcClientId, rpcCallId, out);
{quote}
to 
{quote}writeRpcIds(rpcClientId, rpcCallId, out);

FSImageSerialization.writeByte(storagePolicyId, out);
 FSImageSerialization.writeByte(erasureCodingPolicyId, out);
{quote}

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Critical
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch
>
>
> After a rollingUpgrade of the NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip 

[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2019-04-06 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811584#comment-16811584
 ] 

star commented on HDFS-13596:
-

Maybe another option: we can write new properties after the older ones, so that 
on restart they can be ignored. EC requests will still be accepted. 

In this case, it would be like this.

from 
{quote}FSImageSerialization.writeByte(storagePolicyId, out);
 FSImageSerialization.writeByte(erasureCodingPolicyId, out);
 writeRpcIds(rpcClientId, rpcCallId, out);
{quote}
to 
{quote}writeRpcIds(rpcClientId, rpcCallId, out);

FSImageSerialization.writeByte(storagePolicyId, out);
 FSImageSerialization.writeByte(erasureCodingPolicyId, out);
{quote}

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Critical
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch
>
>
> After a rollingUpgrade of the NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> 

[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-02 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807719#comment-16807719
 ] 

star commented on HDFS-14378:
-

Initial patch for review. [~xkrogen], would you like to review the 
patch? 

I don't have too much time these days. Maybe there are better solutions than this 
patch; I'd like to get some suggestions on it.

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: star
>Assignee: star
>Priority: Minor
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It handles a scenario in which the SNN will not upload 
> the fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; I will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simple with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select a SNN to download checkpoint.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-02 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

Attachment: HDFS-14378-trunk.002.patch

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: star
>Assignee: star
>Priority: Minor
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It handles a scenario in which the SNN will not upload 
> the fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; I will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simple with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select a SNN to download checkpoint.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14370) Edit log tailing fast-path should allow for backoff

2019-04-01 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807341#comment-16807341
 ] 

star commented on HDFS-14370:
-

[~xkrogen], agreed. Server-side handlers are a limited resource and should not be 
blocked. 
{quote}This is similar to how the NN, when it wants a client to backoff due to 
load, throws a backoff exception and expects the client to act accordingly.
{quote}
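
A hedged sketch of the client-side backoff being agreed with here (illustrative names and constants, not the HDFS-14370 patch): the tailer waits on its own side when an RPC returns no edits, so no NameNode or JournalNode handler thread is ever parked.
{code:java}
import java.util.function.IntSupplier;

/** Illustrative client-side backoff loop for edit log tailing. */
final class BackoffTailerSketch {
  private final IntSupplier tailOnce; // stand-in for one tailing RPC; returns #txns fetched

  BackoffTailerSketch(IntSupplier tailOnce) {
    this.tailOnce = tailOnce;
  }

  void run() throws InterruptedException {
    long sleepMs = 1;               // assumed initial backoff
    final long maxSleepMs = 1_000;  // assumed cap
    while (!Thread.currentThread().isInterrupted()) {
      if (tailOnce.getAsInt() == 0) {
        Thread.sleep(sleepMs);      // the client waits; no server handler is held
        sleepMs = Math.min(sleepMs * 2, maxSleepMs);
      } else {
        sleepMs = 1;                // reset after useful work
      }
    }
  }
}
{code}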

> Edit log tailing fast-path should allow for backoff
> ---
>
> Key: HDFS-14370
> URL: https://issues.apache.org/jira/browse/HDFS-14370
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, qjm
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> As part of HDFS-13150, in-progress edit log tailing was changed to use an 
> RPC-based mechanism, thus allowing the edit log tailing frequency to be 
> turned way down, and allowing standby/observer NameNodes to be only a few 
> milliseconds stale as compared to the Active NameNode.
> When there is a high volume of transactions on the system, each RPC fetches 
> transactions and takes some time to process them, self-rate-limiting how 
> frequently an RPC is submitted. In a lightly loaded cluster, however, most of 
> these RPCs return an empty set of transactions, consuming a high 
> (de)serialization overhead for very little benefit. This was reported by 
> [~jojochuang] in HDFS-14276 and I have also seen it on a test cluster where 
> the SbNN was submitting 8000 RPCs per second that returned empty.
> I propose we add some sort of backoff to the tailing, so that if an empty 
> response is received, it will wait a longer period of time before submitting 
> a new RPC.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14276) [SBN read] Reduce tailing overhead

2019-04-01 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806918#comment-16806918
 ] 

star commented on HDFS-14276:
-

Maybe we can just sleep 10 ms on the RPC server side if there are no edit logs. As a 
consequence, that RPC handler thread will be blocked for 10 ms.

> [SBN read] Reduce tailing overhead
> --
>
> Key: HDFS-14276
> URL: https://issues.apache.org/jira/browse/HDFS-14276
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, namenode
>Affects Versions: 3.3.0
> Environment: Hardware: 4-node cluster, each node has 4 core, Xeon 
> 2.5Ghz, 25GB memory.
> Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, 
> RPC encryption + Data Transfer Encryption.
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-14276.000.patch, Screen Shot 2019-02-12 at 10.51.41 
> PM.png, Screen Shot 2019-02-14 at 11.50.37 AM.png
>
>
> When Observer sets {{dfs.ha.tail-edits.period}} = {{0ms}}, it tails edit log 
> continuously in order to fetch the latest edits, but there is a lot of 
> overhead in doing so.
> Critically, edit log tailer should _not_ update NameDirSize metric every 
> time. It has nothing to do with fetching edits, and it involves lots of 
> directory space calculation.
> Profiler suggests a non-trivial chunk of time is spent for nothing.
> Other than this, the biggest overhead is in the communication to 
> serialize/deserialize messages to/from JNs. I am looking for ways to reduce 
> the cost because it's burning 30% of my CPU time even when the cluster is 
> idle.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-01 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

Attachment: (was: HDFS-14378-trunk.002.patch)

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: star
>Assignee: star
>Priority: Minor
> Attachments: HDFS-14378-trunk.001.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It handles a scenario in which the SNN will not upload 
> the fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; I will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simple with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select a SNN to download checkpoint.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-04-01 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

Attachment: HDFS-14378-trunk.002.patch

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: star
>Assignee: star
>Priority: Minor
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It handles a scenario in which the SNN will not upload 
> the fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; I will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simple with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select a SNN to download checkpoint.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-03-30 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806045#comment-16806045
 ] 

star commented on HDFS-14378:
-

First step: make the ANN roll its own edit log.

Last step: make the ANN download a fsimage from a randomly chosen SNN. This will 
be added later.
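
As a rough illustration of the first step (a sketch under assumed wiring, not the HDFS-14378 patch itself), the active NN could run a small daemon that rolls its own edit log on a fixed period instead of waiting for a Standby to ask:
{code:java}
/** Illustrative sketch of an active-NN-driven periodic log roll. */
final class ActiveLogRollerSketch implements Runnable {
  private final Runnable rollEditLog; // stand-in for a call into FSNamesystem#rollEditLog
  private final long periodMs;        // e.g. derived from DFS_HA_LOGROLL_PERIOD_KEY

  ActiveLogRollerSketch(Runnable rollEditLog, long periodMs) {
    this.rollEditLog = rollEditLog;
    this.periodMs = periodMs;
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(periodMs);
        rollEditLog.run();            // the ANN rolls; Standbys only tail and checkpoint
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } catch (RuntimeException e) {
        // tolerate transient roll failures; real code would log and retry
      }
    }
  }
}
{code}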

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: star
>Assignee: star
>Priority: Minor
> Attachments: HDFS-14378-trunk.001.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It handles a scenario in which the SNN will not upload 
> the fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; I will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simple with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select a SNN to download checkpoint.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-03-30 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

Attachment: HDFS-14378-trunk.001.patch

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: star
>Assignee: star
>Priority: Minor
> Attachments: HDFS-14378-trunk.001.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
> implements a first-writer-win policy to avoid duplicated fsimage downloading. 
> Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
> which SNN will provide fsimage for ANN next time. Then we have three roles in 
> NN cluster: ANN, one primary SNN, one or more normal SNN.
>       Since HDFS-12248, there may be more than two primary SNN shortly after 
> an exception occurred. It handles a scenario in which the SNN will not upload 
> the fsimage on IOE and Interrupted exceptions. Though it will not cause any 
> further functional issues, it is inconsistent. 
>       Furthermore, the edit log may be rolled more frequently than necessary with 
> multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; I will 
> verify by unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simple with the following 
> changes:
>  * There are only two roles: ANN, SNN
>  * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * ANN will select a SNN to download checkpoint.
> SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
> downloading as normal. SNN will not try to roll edit log or send checkpoint 
> request to ANN.
> In a word, ANN will be more active. Suggestions are welcomed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-03-18 Thread star (JIRA)
star created HDFS-14378:
---

 Summary: Simplify the design of multiple NN and both logic of edit 
log roll and checkpoint
 Key: HDFS-14378
 URL: https://issues.apache.org/jira/browse/HDFS-14378
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: star
Assignee: star


      HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
implements a first-writer-win policy to avoid duplicated fsimage downloading. 
Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
which SNN will provide fsimage for ANN next time. Then we have three roles in 
NN cluster: ANN, one primary SNN, one or more normal SNN.

      Since HDFS-12248, there may be more than two primary SNNs shortly after an 
exception occurred. It handles a scenario in which the SNN will not upload the 
fsimage on IOE and Interrupted exceptions. Though it will not cause any further 
functional issues, it is inconsistent. 

      Furthermore, the edit log may be rolled more frequently than necessary with 
multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; I will 
verify by unit tests or any could point it out.)

      Above all, I'm wondering if we could make it simple with the following 
changes:
 * There are only two roles: ANN, SNN
 * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
 * ANN will select a SNN to download checkpoint.

SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
downloading as normal. SNN will not try to roll edit log or send checkpoint 
request to ANN.

In a word, ANN will be more active. Suggestions are welcomed.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-03-18 Thread star (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

star updated HDFS-14378:

Description: 
      HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
implements a first-writer-win policy to avoid duplicated fsimage downloading. 
Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
which SNN will provide fsimage for ANN next time. Then we have three roles in 
NN cluster: ANN, one primary SNN, one or more normal SNN.

      Since HDFS-12248, there may be more than two primary SNNs shortly after an 
exception occurred. It handles a scenario in which the SNN will not upload the 
fsimage on IOE and Interrupted exceptions. Though it will not cause any further 
functional issues, it is inconsistent. 

      Furthermore, the edit log may be rolled more frequently than necessary with 
multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; I will 
verify by unit tests, or anyone could point it out.)

      Above all, I'm wondering if we could make it simple with the following 
changes:
 * There are only two roles: ANN, SNN
 * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
 * ANN will select a SNN to download checkpoint.

SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
downloading as normal. SNN will not try to roll edit log or send checkpoint 
request to ANN.

In a word, ANN will be more active. Suggestions are welcomed.

 

  was:
      HDFS-6440 introduced a mechanism to support more than 2 NNs. It 
implements a first-writer-win policy to avoid duplicated fsimage downloading. 
Variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with 
which SNN will provide fsimage for ANN next time. Then we have three roles in 
NN cluster: ANN, one primary SNN, one or more normal SNN.

      Since HDFS-12248, there may be more than two primary SNNs shortly after an 
exception occurred. It handles a scenario in which the SNN will not upload the 
fsimage on IOE and Interrupted exceptions. Though it will not cause any further 
functional issues, it is inconsistent. 

      Furthermore, the edit log may be rolled more frequently than necessary with 
multiple Standby NameNodes, HDFS-14349. (I'm not so sure about this; I will 
verify by unit tests or any could point it out.)

      Above all, I'm wondering if we could make it simple with the following 
changes:
 * There are only two roles: ANN, SNN
 * ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
 * ANN will select a SNN to download checkpoint.

SNN will just do logtail and checkpoint. Then provide a servlet for fsimage 
downloading as normal. SNN will not try to roll edit log or send checkpoint 
request to ANN.

In a word, ANN will be more active. Suggestions are welcomed.

 


> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: star
>Assignee: star
>Priority: Minor
>
>       HDFS-6440 introduced a mechanism to support more than two NNs. It 
> implements a first-writer-wins policy to avoid duplicated fsimage uploading: 
> the variable 'isPrimaryCheckPointer' holds the first-writer state, and the SNN 
> holding it will provide the fsimage for the ANN next time. So there are three 
> roles in the NN cluster: the ANN, one primary SNN, and one or more normal SNNs.
>       Since HDFS-12248, there may be more than two primary SNNs shortly after 
> an exception occurs: that change handles a scenario in which the SNN will not 
> upload the fsimage on IOException or InterruptedException. Though this does 
> not cause any further functional issues, it is inconsistent.
>       Furthermore, the edit log may be rolled more frequently than necessary 
> with multiple Standby NameNodes (HDFS-14349). (I'm not so sure about this; I 
> will verify it with unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make this simpler with the 
> following changes:
>  * There are only two roles: ANN and SNN.
>  * The ANN rolls its own edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * The ANN selects an SNN from which to download the checkpoint.
> The SNN will just tail the edit log and checkpoint, then provide a servlet for 
> fsimage downloading as usual. The SNN will not try to roll the edit log or 
> send checkpoint requests to the ANN.
> In a word, the ANN will be more active. Suggestions are welcome.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes

2019-03-18 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795553#comment-16795553
 ] 

star commented on HDFS-14349:
-

Yes, it seems that the normal edit log roll will indeed be triggered by multiple 
SNNs. I am writing unit tests to verify this behavior.
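
As a back-of-the-envelope check of that suspicion, a toy simulation like the one 
below (plain Java, not Hadoop code; all names are invented) can model N standbys 
that each independently ask the active to roll once their own period elapses. 
With three standbys, the active ends up rolling roughly three times per period, 
which matches the concern raised in this issue.
{code:java}
import java.util.ArrayList;
import java.util.List;

public class RollTriggerModel {

  // Crude stand-in for a standby deciding it has waited long enough.
  static boolean tooLongSinceLastLoad(long nowMs, long lastLoadMs, long periodMs) {
    return nowMs - lastLoadMs > periodMs;
  }

  public static void main(String[] args) {
    int standbys = 3;               // number of SNNs
    long periodMs = 120_000L;       // roll period per standby, e.g. the HA log-roll period
    int rollsOnActive = 0;

    List<Long> lastLoad = new ArrayList<>();
    for (int i = 0; i < standbys; i++) {
      lastLoad.add((long) i * 10_000L);  // standbys start tailing at slightly different times
    }

    // Simulate 10 minutes in 1-second ticks.
    for (long now = 0; now <= 600_000L; now += 1_000L) {
      for (int i = 0; i < standbys; i++) {
        if (tooLongSinceLastLoad(now, lastLoad.get(i), periodMs)) {
          rollsOnActive++;          // this standby asks the active to roll
          lastLoad.set(i, now);     // and then tails the fresh segment
        }
      }
    }

    System.out.println("rolls triggered on the active: " + rollsOnActive
        + " (vs ~" + (600_000L / periodMs) + " if only one node triggered them)");
  }
}
{code}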

> Edit log may be rolled more frequently than necessary with multiple Standby 
> nodes
> -
>
> Key: HDFS-14349
> URL: https://issues.apache.org/jira/browse/HDFS-14349
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs, qjm
>Reporter: Erik Krogen
>Assignee: Ekanth Sethuramalingam
>Priority: Major
>
> When HDFS-14317 was fixed, we tackled the problem that in a cluster with 
> in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the 
> edit logs, which can eventually cause data loss.
> Unfortunately, in the process, it was made so that if there are multiple 
> Standby NameNodes, they will all roll the edit logs at their specified 
> frequency, so the edit log will be rolled X times more frequently than they 
> should be (where X is the number of Standby NNs). This is not as bad as the 
> original bug since rolling frequently does not affect correctness or data 
> availability, but may degrade performance by creating more edit log segments 
> than necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes

2019-03-16 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794201#comment-16794201
 ] 

star commented on HDFS-14349:
-

The auto-roll operation is executed by the ANN's own service; it is not 
triggered by the SNNs.

So it will not degrade NN performance as more SNNs are added.

Related code in FSNamesystem:

 
{code:java}
// FSNamesystem#startActiveServices(): the edit log roller daemon runs only on the Active NN
void startActiveServices() throws IOException {
...
nnEditLogRoller = new Daemon(new NameNodeEditLogRoller(
editLogRollerThreshold, editLogRollerInterval));
nnEditLogRoller.start();
...
}

// NameNodeEditLogRoller: the ANN rolls its own log when edits in the open segment exceed rollThreshold
long numEdits = getCorrectTransactionsSinceLastLogRoll();
if (numEdits > rollThreshold) {
  FSNamesystem.LOG.info("NameNode rolling its own edit log because"
  + " number of edits in open segment exceeds threshold of "
  + rollThreshold);
  rollEditLog();
}

{code}
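
For reference, here is a rough sketch (an assumption about the usual wiring, not 
the exact Hadoop code) of how a rollThreshold like the one used above is 
typically derived from configuration. The key names are the standard HDFS ones, 
but treat the defaults shown as approximate.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class AutoRollThresholdSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Checkpoint transaction count and auto-roll multiplier drive the threshold.
    long checkpointTxns = conf.getLong("dfs.namenode.checkpoint.txns", 1_000_000L);
    float multiplier = conf.getFloat(
        "dfs.namenode.edit.log.autoroll.multiplier.threshold", 2.0f);
    long rollThreshold = (long) (multiplier * checkpointTxns);

    // The roller thread wakes up this often to compare numEdits against the threshold.
    long checkIntervalMs = conf.getLong(
        "dfs.namenode.edit.log.autoroll.check.interval.ms", 300_000L);

    System.out.println("roll when open-segment edits exceed " + rollThreshold
        + ", checked every " + checkIntervalMs + " ms");
  }
}
{code}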
 

> Edit log may be rolled more frequently than necessary with multiple Standby 
> nodes
> -
>
> Key: HDFS-14349
> URL: https://issues.apache.org/jira/browse/HDFS-14349
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs, qjm
>Reporter: Erik Krogen
>Assignee: Ekanth Sethuramalingam
>Priority: Major
>
> When HDFS-14317 was fixed, we tackled the problem that in a cluster with 
> in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the 
> edit logs, which can eventually cause data loss.
> Unfortunately, in the process, it was made so that if there are multiple 
> Standby NameNodes, they will all roll the edit logs at their specified 
> frequency, so the edit log will be rolled X times more frequently than they 
> should be (where X is the number of Standby NNs). This is not as bad as the 
> original bug since rolling frequently does not affect correctness or data 
> availability, but may degrade performance by creating more edit log segments 
> than necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


