[jira] [Created] (HDFS-15464) ViewFsOverloadScheme should work when -fs option pointing to remote cluster without mount links

2020-07-09 Thread Uma Maheswara Rao G (Jira)
Uma Maheswara Rao G created HDFS-15464:
--

 Summary: ViewFsOverloadScheme should work when -fs option pointing 
to remote cluster without mount links
 Key: HDFS-15464
 URL: https://issues.apache.org/jira/browse/HDFS-15464
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: viewfsOverloadScheme
Affects Versions: 3.2.1
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G


When users try to connect to a remote cluster from an environment where 
ViewFSOverloadScheme is enabled, fs initialization expects at least one mount 
link to be configured in order to succeed.
Unfortunately, you might not have configured any mount links for that remote 
cluster in your current environment; you may have configured mount points only 
for your local clusters.
In this case fs initialization will fail because no mount points are configured 
in the mount table for that remote cluster URI's authority.

One idea is that, when there are no mount links configured for that authority, 
we should just treat the target as the default cluster; this can be achieved by 
automatically treating it as the fallback link.
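For reference, the manual workaround available today is to declare a fallback 
link for that remote authority yourself. A minimal sketch, assuming an 
illustrative nameservice name {{remotecluster}} and the existing ViewFS 
fallback-link support ({{ConfigUtil.addLinkFallback}}):
{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.viewfs.ConfigUtil;

public class RemoteClusterFallbackSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Route the hdfs scheme through ViewFileSystemOverloadScheme.
    conf.set("fs.hdfs.impl",
        "org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme");
    conf.set("fs.viewfs.overload.scheme.target.hdfs.impl",
        "org.apache.hadoop.hdfs.DistributedFileSystem");
    // Manual workaround today: give the remote authority's mount table a
    // fallback link pointing at the remote cluster itself, so that fs init
    // succeeds even though no mount links are configured for it.
    ConfigUtil.addLinkFallback(conf, "remotecluster",
        URI.create("hdfs://remotecluster/"));

    FileSystem fs =
        FileSystem.get(URI.create("hdfs://remotecluster/"), conf);
    System.out.println(fs.exists(new Path("/")));
  }
}
{code}
The proposal above would make this fallback behaviour automatic whenever the 
mount table for the target authority has no links.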




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-09 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155075#comment-17155075
 ] 

Xiaoqiao He commented on HDFS-14498:


Hi [~weichiu] and all watchers, any other review comments here?
I would like to wait a couple of days and then commit.
[~sodonnell] I just checked all active branches; it seems all of them have the 
issue, so we may need to backport to them. Fortunately, [^HDFS-14498.002.patch] 
cherry-picks cleanly locally. Would you like to double-check?

> LeaseManager can loop forever on the file for which create has failed 
> --
>
> Key: HDFS-14498
> URL: https://issues.apache.org/jira/browse/HDFS-14498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0
>Reporter: Sergey Shelukhin
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-14498.001.patch, HDFS-14498.002.patch
>
>
> The logs from file creation are long gone due to infinite lease logging, 
> however it presumably failed... the client who was trying to write this file 
> is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file . Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
>  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file . 
> Committed blocks are waiting to be minimally replicated. Try again later.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>   at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager not 
> log so much, in case if there are more bugs like this...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15462) Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml

2020-07-09 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15462:
---
Fix Version/s: 3.1.5
   3.2.2

> Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml
> -
>
> Key: HDFS-15462
> URL: https://issues.apache.org/jira/browse/HDFS-15462
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: configuration, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
>
> HDFS-15394 added the existing impls in core-default.xml except ofs. Let's add 
> ofs to core-default here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15462) Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml

2020-07-09 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15462:
---
Fix Version/s: 3.3.1

> Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml
> -
>
> Key: HDFS-15462
> URL: https://issues.apache.org/jira/browse/HDFS-15462
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: configuration, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> HDFS-15394 added the existing impls in core-default.xml except ofs. Let's add 
> ofs to core-default here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15463) Add a tool to validate FsImage

2020-07-09 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDFS-15463:
--
Attachment: FsImageValidation20200709.patch

> Add a tool to validate FsImage
> --
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: FsImageValidation20200709.patch
>
>
> Due to some snapshot related bugs, a fsimage may become corrupted.  Using a 
> corrupted fsimage may further result in data loss.
> In some cases, we found that reference counts are incorrect in some corrupted 
> FsImage.  One of the goals of the validation tool is to check  reference 
> counts.
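To illustrate what checking reference counts means, here is a simplified, 
self-contained sketch; it does not use the actual FsImage classes, and the names 
are hypothetical. It counts how many references actually point at a shared node 
and compares that to the count stored on the node:
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RefCountCheckSketch {
  // Hypothetical stand-ins for a referred node and a reference to it.
  static class Referred {
    final String name;
    int storedRefCount;
    Referred(String name, int storedRefCount) {
      this.name = name;
      this.storedRefCount = storedRefCount;
    }
  }
  static class Reference {
    final Referred target;
    Reference(Referred target) { this.target = target; }
  }

  public static void main(String[] args) {
    Referred shared = new Referred("dirInSnapshot", 3);  // stored count is wrong on purpose
    List<Reference> refs = List.of(new Reference(shared), new Reference(shared));

    // Count the references that actually point at each referred node.
    Map<Referred, Integer> actual = new HashMap<>();
    for (Reference r : refs) {
      actual.merge(r.target, 1, Integer::sum);
    }
    // Report mismatches between stored and actual counts.
    actual.forEach((node, count) -> {
      if (count != node.storedRefCount) {
        System.out.printf("Mismatch for %s: stored=%d actual=%d%n",
            node.name, node.storedRefCount, count);
      }
    });
  }
}
{code}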



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15463) Add a tool to validate FsImage

2020-07-09 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDFS-15463:
--
Description: 
Due to some snapshot related bugs, a fsimage may become corrupted.  Using a 
corrupted fsimage may further result in data loss.

In some cases, we found that reference counts are incorrect in some corrupted 
FsImage.  One of the goals of the validation tool is to check  reference counts.

  was:


In some cases, we found that reference counts in some corrupted FsImage


> Add a tool to validate FsImage
> --
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>
> Due to some snapshot related bugs, a fsimage may become corrupted.  Using a 
> corrupted fsimage may further result in data loss.
> In some cases, we found that reference counts are incorrect in some corrupted 
> FsImage.  One of the goals of the validation tool is to check  reference 
> counts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15463) Add a tool to validate FsImage

2020-07-09 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDFS-15463:
--
Attachment: FsImageValidation20200708.patch

> Add a tool to validate FsImage
> --
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>
> In some cases, we found that reference counts in some corrupted FsImage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15463) Add a tool to validate FsImage

2020-07-09 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDFS-15463:
--
Description: 


In some cases, we found that reference counts in some corrupted FsImage

> Add a tool to validate FsImage
> --
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>
> In some cases, we found that reference counts in some corrupted FsImage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15463) Add a tool to validate FsImage

2020-07-09 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDFS-15463:
--
Attachment: (was: FsImageValidation20200708.patch)

> Add a tool to validate FsImage
> --
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>
> In some cases, we found that reference counts in some corrupted FsImage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15463) Add a tool to validate FsImage

2020-07-09 Thread Tsz-wo Sze (Jira)
Tsz-wo Sze created HDFS-15463:
-

 Summary: Add a tool to validate FsImage
 Key: HDFS-15463
 URL: https://issues.apache.org/jira/browse/HDFS-15463
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Reporter: Tsz-wo Sze
Assignee: Tsz-wo Sze






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15462) Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml

2020-07-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154941#comment-17154941
 ] 

Hudson commented on HDFS-15462:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18423 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18423/])
HDFS-15462. Add fs.viewfs.overload.scheme.target.ofs.impl to (github: rev 
0e694b20b9d59cc46882df506dcea386020b1e4d)
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestCommonConfigurationFields.java
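Once this entry is in core-default.xml, the default should be visible through 
Configuration without any user configuration. A minimal check; the expected 
value is assumed here to be Ozone's rooted filesystem class, so verify it 
against the committed core-default.xml:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class OfsOverloadTargetCheck {
  public static void main(String[] args) {
    // Configuration loads core-default.xml (and core-site.xml) from the classpath.
    Configuration conf = new Configuration();
    // Expected (assumption): org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
    System.out.println(conf.get("fs.viewfs.overload.scheme.target.ofs.impl"));
  }
}
{code}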


> Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml
> -
>
> Key: HDFS-15462
> URL: https://issues.apache.org/jira/browse/HDFS-15462
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: configuration, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Fix For: 3.4.0
>
>
> HDFS-15394 added the existing impls in core-default.xml except ofs. Let's add 
> ofs to core-default here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15462) Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml

2020-07-09 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDFS-15462:
--
Fix Version/s: 3.4.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml
> -
>
> Key: HDFS-15462
> URL: https://issues.apache.org/jira/browse/HDFS-15462
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: configuration, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Fix For: 3.4.0
>
>
> HDFS-15394 added the existing impls in core-default.xml except ofs. Let's add 
> ofs to core-default here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux

2020-07-09 Thread Daniel Howard (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154055#comment-17154055
 ] 

Daniel Howard edited comment on HDFS-13082 at 7/9/20, 6:16 PM:
---

I found a comment that implies that Linux doesn't handle cookie verification:[1]
{quote}This discussion comes up pretty much every time someone writes a new NFS 
server and/or filesystem. The thing that neither RFC1813, RFC3530, or RFC5661 
have done is come up with sane semantics for how a NFS client is supposed to 
recover from the above scenario. What do I do with things like 
telldir()/seekdir() cookies? How do I recover my 'current position' in the 
readdir() stream?

IOW: how do I fake up POSIX semantics to the applications?

Until the recovery question is answered, the Linux client will continue to 
ignore the whole "cookie verifier" junk...
{quote}
[1]: 
[https://linuxlists.cc/l/17/linux-nfs/t/2933109/readdir_from_linux_nfs4_client_when_cookieverf_is_no_longer_valid]

Here is a reference to cookieverf being removed from an Android kernel:
 
[https://gitlab.incom.co/CM-Shield/android_kernel_nvidia_shieldtablet/commit/c3f52af3e03013db5237e339c817beaae5ec9e3a]


was (Author: dannyman):
I found a comment that implies that Linux doesn't handle cookie verification:[1]
bq. This discussion comes up pretty much every time someone writes a new NFS 
server and/or filesystem. The thing that neither RFC1813, RFC3530, or RFC5661 
have done is come up with sane semantics for how a NFS client is supposed to 
recover from the above scenario. What do I do with things like 
telldir()/seekdir() cookies? How do I recover my 'current position' in the 
readdir() stream?
bq. IOW: how do I fake up POSIX semantics to the applications?
bq. 
bq. Until the recovery question is answered, the Linux client will continue to 
ignore the whole "cookie verifier" junk...

[1]: 
https://linuxlists.cc/l/17/linux-nfs/t/2933109/readdir_from_linux_nfs4_client_when_cookieverf_is_no_longer_valid

Here is a reference to cookieverf being removed from an Android kernel:
https://gitlab.incom.co/CM-Shield/android_kernel_nvidia_shieldtablet/commit/c3f52af3e03013db5237e339c817beaae5ec9e3a

> cookieverf mismatch error over NFS gateway on Linux
> ---
>
> Key: HDFS-13082
> URL: https://issues.apache.org/jira/browse/HDFS-13082
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3
>Reporter: Dan Moraru
>Priority: Minor
>
> Running 'ls' on some directories over an HDFS-NFS gateway sometimes fails to 
> list the contents of those directories.  Running 'ls' on those same 
> directories mounted via FUSE works.  The NFS gateway logs errors like the 
> following:
> 2018-01-29 11:53:01,130 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: 
> cookieverf mismatch. request cookieverf: 1513390944415 dir cookieverf: 
> 1516920857335
> Reviewing 
> hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
>  suggested that  these errors can be avoided by setting 
> nfs.aix.compatibility.mode.enabled=true, and that is indeed the case.  The 
> documentation lists https://issues.apache.org/jira/browse/HDFS-6549 as a 
> known issue, but also goes on to say that "regular, non-AIX clients should 
> NOT enable AIX compatibility mode. The work-arounds implemented by AIX 
> compatibility mode effectively disable safeguards to ensure that listing of 
> directory contents via NFS returns consistent results, and that all data sent 
> to the NFS server can be assured to have been committed."   Server and client 
> is this case are one and the same, running Scientific Linux 7.4.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154657#comment-17154657
 ] 

Wei-Chiu Chuang commented on HDFS-14498:


It looks very similar to HDFS-8406 / HDFS-11817. I thought this class of lease 
leakage bugs was settled. Surprised to see there are still corner cases!

> LeaseManager can loop forever on the file for which create has failed 
> --
>
> Key: HDFS-14498
> URL: https://issues.apache.org/jira/browse/HDFS-14498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0
>Reporter: Sergey Shelukhin
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-14498.001.patch, HDFS-14498.002.patch
>
>
> The logs from file creation are long gone due to infinite lease logging, 
> however it presumably failed... the client who was trying to write this file 
> is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file . Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
>  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file . 
> Committed blocks are waiting to be minimally replicated. Try again later.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>   at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager not 
> log so much, in case if there are more bugs like this...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15425) Review Logging of DFSClient

2020-07-09 Thread Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang reassigned HDFS-15425:


Assignee: Hongbing Wang

> Review Logging of DFSClient
> ---
>
> Key: HDFS-15425
> URL: https://issues.apache.org/jira/browse/HDFS-15425
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Reporter: Hongbing Wang
>Assignee: Hongbing Wang
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: HDFS-15425.001.patch, HDFS-15425.002.patch, 
> HDFS-15425.003.patch
>
>
> Review use of SLF4J for DFSClient.LOG. 
> Make the code more concise and readable. 
> Less is more !
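As an illustration of the kind of cleanup such a review typically produces 
(generic SLF4J usage, not actual DFSClient code):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Slf4jStyleExample {
  private static final Logger LOG =
      LoggerFactory.getLogger(Slf4jStyleExample.class);

  void reportProgress(String src, long bytes) {
    // Verbose pre-SLF4J style: explicit guard plus string concatenation.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Wrote " + bytes + " bytes to " + src);
    }
    // Concise SLF4J style: parameterized message, formatted only when enabled.
    LOG.debug("Wrote {} bytes to {}", bytes, src);
  }
}
{code}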



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15425) Review Logging of DFSClient

2020-07-09 Thread wangzhaohui (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangzhaohui reassigned HDFS-15425:
--

Assignee: (was: Hongbing Wang)

> Review Logging of DFSClient
> ---
>
> Key: HDFS-15425
> URL: https://issues.apache.org/jira/browse/HDFS-15425
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Reporter: Hongbing Wang
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: HDFS-15425.001.patch, HDFS-15425.002.patch, 
> HDFS-15425.003.patch
>
>
> Review use of SLF4J for DFSClient.LOG. 
> Make the code more concise and readable. 
> Less is more !



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14396) Failed to load image from FSImageFile when downgrade from 3.x to 2.x

2020-07-09 Thread fengwu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154478#comment-17154478
 ] 

fengwu edited comment on HDFS-14396 at 7/9/20, 12:10 PM:
-

Hi, in my test of a rolling downgrade from 3.1.3 to 2.7.2, the namenode 
downgraded successfully, but the datanode failed (downgrading to 2.8+ works), 
because the datanode layout version differs: it is -56 in HDFS 2.7.

So, is there a way to make a datanode rolling downgrade from layout version -57 
to -56 work?

 
{code:java}
2020-07-06 14:45:01,313 WARN org.apache.hadoop.hdfs.server.common.Storage: 
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory /data/hadoop/dfs. Reported: -57. Expecting = -56. 
2020-07-06 14:45:01,315 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock 
on /data/hadoop/dfs/in_use.lock acquired by nodename 21258@test-v03 2020-07-06 
14:45:01,315 WARN org.apache.hadoop.hdfs.server.common.Storage: 
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory /data/hadoop/dfs. Reported: -57. Expecting = -56. 
2020-07-06 14:45:01,315 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for Block pool  (Datanode Uuid unassigned) 
service to test-v01/10.110.228.21:8020. Exiting. java.io.IOException: All 
specified directories are failed to load. at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
 at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1358) 
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1323)
 at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
 at java.lang.Thread.run(Thread.java:748)
{code}
 


was (Author: fengwu99):
Hi,   Found in my test roll downgrade from 3.1.3 to 2.7.2, namenode  successful 
, but datanode failed (2.8+ successfully ).  because  different datanode  
layout versions is -56 in hdfs 2.7.  

So,  Is there a way to solve datanode roll downgrade from layout version -57 to 
-56 ?
// code placeholder2020-07-06 14:45:01,313 WARN 
org.apache.hadoop.hdfs.server.common.Storage: 
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory /data/hadoop/dfs. Reported: -57. Expecting = -56.
2020-07-06 14:45:01,315 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock 
on /data/hadoop/dfs/in_use.lock acquired by nodename 21258@test-v03
2020-07-06 14:45:01,315 WARN org.apache.hadoop.hdfs.server.common.Storage: 
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory /data/hadoop/dfs. Reported: -57. Expecting = -56.
2020-07-06 14:45:01,315 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for Block pool  (Datanode Uuid unassigned) 
service to test-v01/10.110.228.21:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1358)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1323)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
at java.lang.Thread.run(Thread.java:748)

> Failed to load image from FSImageFile when downgrade from 3.x to 2.x
> 
>
> Key: HDFS-14396
> URL: https://issues.apache.org/jira/browse/HDFS-14396
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14396.001.patch, HDFS-14396.002.patch
>
>
> After fixing HDFS-13596, try to downgrade from 3.x to 2.x. But namenode can't 
> start because exception occurs. The message follows
> {code:java}
> 2019-01-23 17:22:18,730 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Failed to load image from 
> FSImageFile(file=/data1/hadoopdata/hadoop-namenode/current/fsimage_0025310,
>  cpktTxId=00
> 25310)
> java.lang.NullPointerException
> at 
> 

[jira] [Commented] (HDFS-14396) Failed to load image from FSImageFile when downgrade from 3.x to 2.x

2020-07-09 Thread fengwu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154478#comment-17154478
 ] 

fengwu commented on HDFS-14396:
---

Hi, in my test of a rolling downgrade from 3.1.3 to 2.7.2, the namenode 
downgraded successfully, but the datanode failed (downgrading to 2.8+ works), 
because the datanode layout version differs: it is -56 in HDFS 2.7.

So, is there a way to make a datanode rolling downgrade from layout version -57 
to -56 work?
{noformat}
2020-07-06 14:45:01,313 WARN 
org.apache.hadoop.hdfs.server.common.Storage: 
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory /data/hadoop/dfs. Reported: -57. Expecting = -56.
2020-07-06 14:45:01,315 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock 
on /data/hadoop/dfs/in_use.lock acquired by nodename 21258@test-v03
2020-07-06 14:45:01,315 WARN org.apache.hadoop.hdfs.server.common.Storage: 
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory /data/hadoop/dfs. Reported: -57. Expecting = -56.
2020-07-06 14:45:01,315 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for Block pool  (Datanode Uuid unassigned) 
service to test-v01/10.110.228.21:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1358)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1323)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
at java.lang.Thread.run(Thread.java:748)
{noformat}

> Failed to load image from FSImageFile when downgrade from 3.x to 2.x
> 
>
> Key: HDFS-14396
> URL: https://issues.apache.org/jira/browse/HDFS-14396
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14396.001.patch, HDFS-14396.002.patch
>
>
> After fixing HDFS-13596, try to downgrade from 3.x to 2.x. But namenode can't 
> start because exception occurs. The message follows
> {code:java}
> 2019-01-23 17:22:18,730 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Failed to load image from 
> FSImageFile(file=/data1/hadoopdata/hadoop-namenode/current/fsimage_0025310,
>  cpktTxId=00
> 25310)
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:179)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:885)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:869)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:742)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:673)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:998)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:700)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:612)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:672)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:839)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1517)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1583)
> 2019-01-23 17:22:19,023 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: Failed to load FSImage file, see error(s) above for more 
> info.
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:688)
> at 
> 

[jira] [Comment Edited] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-09 Thread Vinayakumar B (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154413#comment-17154413
 ] 

Vinayakumar B edited comment on HDFS-15098 at 7/9/20, 11:19 AM:


Thanks [~zZtai] for the contribution

Overall the changes look good. Following are my comments; please check.

 
 1. Adding this provider should be configurable, and the documentation should be 
updated as required.
 As already mentioned by [~lindongdong], there is no need to add it to the JDK 
dirs. Maybe the issue description can be updated.

So, the following addition of the provider needs to be done only if it is 
configured, because directly adding {{BouncyCastleProvider}} seems to change the 
existing default behavior in some cases. For example, 
{{TestKeyShell#createInvalidKeySize()}} is supposed to fail with key size 56, but 
it passes when the provider is BC. So it should be used only on the user's 
demand, and making it configurable would be the wise choice.
{code:java}
+  Security.addProvider(new BouncyCastleProvider());
{code}
In KeyProvider.java it can be added as below.
{code:java}
  String jceProvider = conf.get(HADOOP_SECURITY_CRYPTO_JCE_PROVIDER_KEY);
  if (BouncyCastleProvider.PROVIDER_NAME.equals(jceProvider)) {
Security.addProvider(new BouncyCastleProvider());
  }
{code}
In JceSm4CtrCryptoCodec.java, it should be added in setConf() instead of the 
constructor.
{code:java}
provider = conf.get(HADOOP_SECURITY_CRYPTO_JCE_PROVIDER_KEY,
BouncyCastleProvider.PROVIDER_NAME);
final String secureRandomAlg = conf.get(
HADOOP_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_KEY,
HADOOP_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_DEFAULT);
if (BouncyCastleProvider.PROVIDER_NAME.equals(provider)) {
  Security.addProvider(new BouncyCastleProvider());
}
{code}
2. With the above change, {{TestKeyShell#testInvalidKeySize()}} will not fail 
anymore, as the BC provider will not be added by default. So the changes in 
{{TestKeyShell}} can be reverted.

3. In {{TestCryptoCodec.java}}
 Remove these lines from every test.
{code:java}
try {
  KeyGenerator keyGenerator = KeyGenerator.getInstance("SM4");
} catch (Exception e) {
  Assume.assumeTrue(false);
}
{code}
4. In {{TestCryptoCodec#testJceSm4CtrCryptoCodec}} change this config as below.
{code:java}
conf.set(HADOOP_SECURITY_CRYPTO_CODEC_CLASSES_SM4_CTR_NOPADDING_KEY,
JceSm4CtrCryptoCodec.class.getName());{code}
Uncomment following lines
{code:java}
//cryptoCodecTest(conf, seed, count,
//jceSm4CodecClass, opensslSm4CodecClass, iv);
{code}
{code:java}
//cryptoCodecTest(conf, seed, count,
//jceSm4CodecClass, opensslSm4CodecClass, iv);
{code}
5. Avoid import statements with * in all classes. Import only the required 
classes directly.

6. {{HdfsKMSUtil.getCryptoCodec()}} is not logging {{JceSm4CTRCodec}}. Maybe it 
can log all class names when it is not null, without checking instanceof?

7. I can see that a lot of code is the same between the AES and SM4 codecs, 
except for the class names and algorithm names. Maybe refactoring would help 
reduce the duplicate code.

8. I think in {{hdfs.proto}} SM4 enum value can be changed to 3 directly.
{code}enum CipherSuiteProto {
UNKNOWN = 1;
AES_CTR_NOPADDING = 2;
SM4_CTR_NOPADDING = 3;
}{code}

9. In {{OpenSecureRandom.c}} the following functions' declarations and 
definitions can be kept within the {{OPENSSL_VERSION_NUMBER < 0x1010L}} block, 
i.e. the following functions should be used only when {{OPENSSL_VERSION_NUMBER < 
0x1010L}} is true:
  {code}
  static void locks_setup(void)
  static void locks_cleanup(void)
  static void pthreads_locking_callback(int mode, int type, char *file, int 
line)
  static unsigned long pthreads_thread_id(void)
  {code}


was (Author: vinayrpet):
Thanks [~zZtai] for the contribution

Overall changes looks good. Following are my comments. Please check.

 
 1. Adding this provider should be configurable. And update the document as 
required.
 As already mentioned by [~lindongdong] no need to add to JDK dirs. May be 
Issue descreption can be updated.

so, following addition of Provider needs to be done only if its configured.  
Because direct adding of {{BounctCatleProvider}} seems to change the existing 
default behavior in some cases. Ex: {{TestKeyShell#createInvalidKeySize()}} 
suppose to fail with keysize 56. But it passes when provider is BC. So it 
should be used only on user's demand. So making it configurable would be wise 
choise.
{code:java}
+  Security.addProvider(new BouncyCastleProvider());
{code}
In KeyProvider.java it can be added as below.
{code:java}
  String jceProvider = conf.get(HADOOP_SECURITY_CRYPTO_JCE_PROVIDER_KEY);
  if (BouncyCastleProvider.PROVIDER_NAME.equals(jceProvider)) {
Security.addProvider(new BouncyCastleProvider());
  }
{code}
In JceSm4CtrCryptoCodec.java should add on setConf() instead of constructor.

[jira] [Commented] (HDFS-15392) DistrbutedFileSystem#concat api can create large number of small blocks

2020-07-09 Thread jianghua zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154422#comment-17154422
 ] 

jianghua zhu commented on HDFS-15392:
-

[~weichiu] , I very much agree with your suggestion.

 

> DistrbutedFileSystem#concat api can create large number of small blocks
> ---
>
> Key: HDFS-15392
> URL: https://issues.apache.org/jira/browse/HDFS-15392
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Priority: Major
>
> DistrbutedFileSystem#concat moves blocks from source to target. If the api is 
> repeatedly used on small files it can create large number of small blocks in 
> the target file. The Jira aims to optimize the api to avoid the issue of 
> small blocks.
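For reference, a minimal sketch of how the API in question is typically called 
(the paths and the fs.defaultFS address are illustrative). Each call moves the 
source files' blocks into the target, which is how repeated calls on small files 
accumulate many small blocks:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcatExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");  // illustrative address

    FileSystem fs = FileSystem.get(conf);
    Path target = new Path("/data/merged");
    Path[] sources = { new Path("/data/part-0"), new Path("/data/part-1") };

    // Moves the blocks of the sources into the target file; the source files
    // are removed afterwards.
    fs.concat(target, sources);
    System.out.println("target length: " + fs.getFileStatus(target).getLen());
  }
}
{code}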



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2020-07-09 Thread fengwu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154414#comment-17154414
 ] 

fengwu commented on HDFS-13596:
---

[~_ph], at first I had the same view as you, when I saw this comment: 
[https://issues.apache.org/jira/browse/HDFS-13596?focusedCommentId=16911102=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16911102]

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, 
> HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, 
> HDFS-13596.009.patch, HDFS-13596.010.patch
>
>
> After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> 

[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-07-09 Thread Vinayakumar B (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154413#comment-17154413
 ] 

Vinayakumar B commented on HDFS-15098:
--

Thanks [~zZtai] for the contribution

Overall the changes look good. Following are my comments; please check.

 
 1. Adding this provider should be configurable, and the documentation should be 
updated as required.
 As already mentioned by [~lindongdong], there is no need to add it to the JDK 
dirs. Maybe the issue description can be updated.

So, the following addition of the provider needs to be done only if it is 
configured, because directly adding {{BouncyCastleProvider}} seems to change the 
existing default behavior in some cases. For example, 
{{TestKeyShell#createInvalidKeySize()}} is supposed to fail with key size 56, but 
it passes when the provider is BC. So it should be used only on the user's 
demand, and making it configurable would be the wise choice.
{code:java}
+  Security.addProvider(new BouncyCastleProvider());
{code}
In KeyProvider.java it can be added as below.
{code:java}
  String jceProvider = conf.get(HADOOP_SECURITY_CRYPTO_JCE_PROVIDER_KEY);
  if (BouncyCastleProvider.PROVIDER_NAME.equals(jceProvider)) {
Security.addProvider(new BouncyCastleProvider());
  }
{code}
In JceSm4CtrCryptoCodec.java, it should be added in setConf() instead of the 
constructor.
{code:java}
provider = conf.get(HADOOP_SECURITY_CRYPTO_JCE_PROVIDER_KEY,
BouncyCastleProvider.PROVIDER_NAME);
final String secureRandomAlg = conf.get(
HADOOP_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_KEY,
HADOOP_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_DEFAULT);
if (BouncyCastleProvider.PROVIDER_NAME.equals(provider)) {
  Security.addProvider(new BouncyCastleProvider());
}
{code}
2. With the above change, {{TestKeyShell#testInvalidKeySize()}} will not fail 
anymore, as the BC provider will not be added by default. So the changes in 
{{TestKeyShell}} can be reverted.

3. In {{TestCryptoCodec.java}}
 Remove these lines from every test.
{code:java}
try {
  KeyGenerator keyGenerator = KeyGenerator.getInstance("SM4");
} catch (Exception e) {
  Assume.assumeTrue(false);
}
{code}
4. In {{TestCryptoCodec#testJceSm4CtrCryptoCodec}} change this config as below.
{code:java}
conf.set(HADOOP_SECURITY_CRYPTO_CODEC_CLASSES_SM4_CTR_NOPADDING_KEY,
JceSm4CtrCryptoCodec.class.getName());{code}
Uncomment following lines
{code:java}
//cryptoCodecTest(conf, seed, count,
//jceSm4CodecClass, opensslSm4CodecClass, iv);
{code}
{code:java}
//cryptoCodecTest(conf, seed, count,
//jceSm4CodecClass, opensslSm4CodecClass, iv);
{code}
5. Avoid import statements with * in all classes. Import only the required 
classes directly.

6. {{HdfsKMSUtil.getCryptoCodec()}} is not logging {{JceSm4CTRCodec}}. Maybe it 
can log all class names when it is not null, without checking instanceof?

7. I can see that a lot of code is the same between the AES and SM4 codecs, 
except for the class names and algorithm names. Maybe refactoring would help 
reduce the duplicate code.
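A rough sketch of the kind of refactoring point 7 suggests; the class names here 
are hypothetical and only the transformation string varies per subclass (this is 
not the actual Hadoop codec hierarchy):
{code:java}
import javax.crypto.Cipher;

public class CodecRefactorSketch {
  // Shared JCE plumbing lives in one place; subclasses only supply the
  // transformation string. Provider selection, SecureRandom setup and stream
  // creation would also be pulled up into the base class.
  abstract static class AbstractJceCtrCodec {
    abstract String transformation();

    Cipher newCipher() throws Exception {
      return Cipher.getInstance(transformation());
    }
  }

  static class JceAesCtrCodec extends AbstractJceCtrCodec {
    @Override String transformation() { return "AES/CTR/NoPadding"; }
  }

  static class JceSm4CtrCodec extends AbstractJceCtrCodec {
    // SM4 requires a provider such as BouncyCastle to be registered.
    @Override String transformation() { return "SM4/CTR/NoPadding"; }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(new JceAesCtrCodec().newCipher().getAlgorithm());
  }
}
{code}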

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch
>
>
> SM4 (formerly SMS4)is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
>  SM4 was a cipher proposed to for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use sm4 on hdfs as follows:*
> 1.download Bouncy Castle Crypto APIs from bouncycastle.org
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2.Configure JDK
> Place bcprov-ext-jdk15on-165.jar in $JAVA_HOME/jre/lib/ext directory,
> add "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to $JAVA_HOME/jre/lib/security/java.security file
> 3.Configure Hadoop KMS
> 4.test HDFS sm4
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *requires:*
> 1.openssl version >=1.1.1
> 2.configure Bouncy Castle Crypto on JDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2020-07-09 Thread Yuriy Malygin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154401#comment-17154401
 ] 

Yuriy Malygin commented on HDFS-13596:
--

[~fengwu99] downgrade is not possible between major versions, because the layout 
version changes (in your case, layout versions -57/-56).

Note from official documentation:
{quote}
A newer release is downgradable to the pre-upgrade release only if both the 
namenode layout version and the datanode layout version are not changed between 
these two releases.
{quote}
URL: 
https://hadoop.apache.org/docs/r3.1.3/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#Downgrade


> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, 
> HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, 
> HDFS-13596.009.patch, HDFS-13596.010.patch
>
>
> After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> 

[jira] [Comment Edited] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2020-07-09 Thread fengwu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154336#comment-17154336
 ] 

fengwu edited comment on HDFS-13596 at 7/9/20, 9:12 AM:


[~_ph]   Thanks.

In your case, you used rollback, not downgrade. Rollback will lose data; I want 
a way to do a rolling downgrade. Do you think it would work?


was (Author: fengwu99):
[~_ph]   Thanks.

In your case, used rollback not downgrade .  Rollback will lost data,  I want  
a way to roll downgrade . 

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, 
> HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, 
> HDFS-13596.009.patch, HDFS-13596.010.patch
>
>
> After rollingUpgrade NN from 2.x and 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> 
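To make the failure mode in the quoted report above concrete, here is a small, 
self-contained sketch of version-gated serialization. This is not Hadoop's 
FSEditLogOp code; the layout numbers and field names are made-up assumptions. 
It shows how a record written in the new format, but carried under the old 
layout-version header, gets parsed into the wrong fields:

{code:java}
import java.io.*;

// Toy illustration (not Hadoop code) of the mismatch described above: the
// writer emits a new field, but the reader decides whether to read it from a
// layout-version header that still advertises the old format.
public class LayoutVersionSketch {
  static final int OLD_LAYOUT = -63;   // hypothetical pre-upgrade layout version
  static final int NEW_LAYOUT = -64;   // hypothetical layout that adds ecPolicyId

  static byte[] writeOp(int headerLayout, boolean writeEcField) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeInt(headerLayout);        // header claims this layout version
    out.writeLong(1001L);              // made-up "inodeId" field
    if (writeEcField) {
      out.writeByte(2);                // "ecPolicyId": only present in the new format
    }
    out.writeShort(0755);              // made-up "permission" field follows
    return bos.toByteArray();
  }

  static void readOp(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int layout = in.readInt();
    long inodeId = in.readLong();
    if (layout <= NEW_LAYOUT) {        // only parse ecPolicyId for new-enough layouts
      in.readByte();                   // (layout versions grow more negative over time)
    }
    short perm = in.readShort();       // reads the wrong bytes when formats disagree
    System.out.println("layout=" + layout + " inodeId=" + inodeId + " perm=" + perm);
  }

  public static void main(String[] args) throws IOException {
    readOp(writeOp(NEW_LAYOUT, true)); // consistent header and body: parses correctly
    readOp(writeOp(OLD_LAYOUT, true)); // old header, new body: the reader never
                                       // consumes ecPolicyId, so later fields are misread
  }
}
{code}

In the real report the same kind of misread cascades into invalid clientId and 
inode-id values, which is what the quoted stack traces show.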

[jira] [Commented] (HDFS-15444) mkdir should not create dir in fallback if the dir already in mount Path

2020-07-09 Thread jianghua zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154339#comment-17154339
 ] 

jianghua zhu commented on HDFS-15444:
-

[~umamaheswararao], can you tell me in which version this issue occurs?
Is there a more detailed scenario?

 

> mkdir should not create dir in fallback if the dir already in mount Path
> 
>
> Key: HDFS-15444
> URL: https://issues.apache.org/jira/browse/HDFS-15444
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2020-07-09 Thread fengwu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154336#comment-17154336
 ] 

fengwu commented on HDFS-13596:
---

[~_ph]   Thanks.

In your case you used rollback, not downgrade. Rollback will lose data, so I 
want a way to do a rolling downgrade instead.

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, 
> HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, 
> HDFS-13596.009.patch, HDFS-13596.010.patch
>
>
> After rollingUpgrade NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> 

[jira] [Commented] (HDFS-15452) Dynamically initialize the capacity of BlocksMap

2020-07-09 Thread jianghua zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154329#comment-17154329
 ] 

jianghua zhu commented on HDFS-15452:
-

OK.
Thank you very much for your suggestion, [~hexiaoqiao].

> Dynamically initialize the capacity of BlocksMap
> 
>
> Key: HDFS-15452
> URL: https://issues.apache.org/jira/browse/HDFS-15452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.3
>Reporter: jianghua zhu
>Assignee: jianghua zhu
>Priority: Major
> Attachments: HDFS-15452.001.patch
>
>
> The default value for initializing BlocksMap in the BlockManager class is 2. 
> This can be set to a dynamic value.
> BlockManager#BlockManager() {
> ..
> // Compute the map capacity by allocating 2% of total memory
> blocksMap = new BlocksMap(
>  LightWeightGSet.computeCapacity(2.0, "BlocksMap"));
> ..
> }
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2020-07-09 Thread Yuriy Malygin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154318#comment-17154318
 ] 

Yuriy Malygin edited comment on HDFS-13596 at 7/9/20, 8:21 AM:
---

[~fengwu99] yes, in my case 
(https://issues.apache.org/jira/browse/HDFS-13596?focusedCommentId=16826162=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16826162
 + 
https://issues.apache.org/jira/browse/HDFS-13596?focusedCommentId=16826939=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16826939)
 downgrade to 2.7.3 was successful from both the 3.1.2 and 3.3.0-SNAPSHOT 
versions.


was (Author: _ph):
[~fengwu99] yes, in my case 
(https://issues.apache.org/jira/browse/HDFS-13596?focusedCommentId=16826162=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16826162)
 downgrade to 2.7.3 was successful from both the 3.1.2 and 3.3.0-SNAPSHOT 
versions.

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, 
> HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, 
> HDFS-13596.009.patch, HDFS-13596.010.patch
>
>
> After rollingUpgrade NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> 

[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2020-07-09 Thread Yuriy Malygin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154318#comment-17154318
 ] 

Yuriy Malygin commented on HDFS-13596:
--

[~fengwu99] yes, in my case 
(https://issues.apache.org/jira/browse/HDFS-13596?focusedCommentId=16826162=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16826162)
 downgrade to 2.7.3 was successful from both the 3.1.2 and 3.3.0-SNAPSHOT 
versions.

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, 
> HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, 
> HDFS-13596.009.patch, HDFS-13596.010.patch
>
>
> After rollingUpgrade NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> 

[jira] [Commented] (HDFS-15452) Dynamically initialize the capacity of BlocksMap

2020-07-09 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154314#comment-17154314
 ] 

Xiaoqiao He commented on HDFS-15452:


Thanks for involving me here. Sorry, I don't quite follow which issue you are 
running into. IIUC, tuning the percentage of heap reserved for this map will 
obviously affect performance (GSet collision ratio and heap occupancy). I have 
not traced why it is 2% rather than some other percentage, since this is very 
old, historical logic.
About patch [^HDFS-15452.001.patch]:
a. it would be better to validate the configured value against an allowed 
range, rather than accepting any value as the patch does now.
b. do `INodeMap`, `RetryCache` and `cachedBlocks` also need to be configurable?
I prefer to keep this a static value; in my experience it runs well even with 
over 600GB of heap, and I have not seen any heap-occupancy cost or performance 
issue.
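For reference, the existing logic sizes the map from a fixed 2% of the JVM 
heap via LightWeightGSet.computeCapacity(2.0, "BlocksMap"). A minimal, 
self-contained sketch of that idea is below; it assumes roughly 8 bytes per 
table slot and power-of-two rounding, and mirrors the intent of 
computeCapacity rather than its exact implementation:

{code:java}
// Minimal sketch (not the Hadoop implementation): size a hash table from a
// percentage of the JVM heap, assuming ~8 bytes per table slot and rounding
// the result down to a power of two so indices can be computed with a mask.
public final class CapacitySketch {

  static int computeCapacity(double percentage, String mapName) {
    long maxMemory = Runtime.getRuntime().maxMemory();     // heap limit (-Xmx) in bytes
    double bytesForMap = maxMemory * percentage / 100.0;   // e.g. 2% of the heap
    long slots = (long) (bytesForMap / 8);                  // assumed 8 bytes per slot
    int capped = (int) Math.min(slots, Integer.MAX_VALUE / 2);
    int capacity = Integer.highestOneBit(Math.max(capped, 1));
    System.out.println(mapName + " capacity = " + capacity
        + " (" + percentage + "% of " + maxMemory + " bytes)");
    return capacity;
  }

  public static void main(String[] args) {
    computeCapacity(2.0, "BlocksMap");
  }
}
{code}

With this scheme, making the percentage configurable mainly trades heap 
footprint against GSet collision rate, which is the trade-off discussed above.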

> Dynamically initialize the capacity of BlocksMap
> 
>
> Key: HDFS-15452
> URL: https://issues.apache.org/jira/browse/HDFS-15452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.3
>Reporter: jianghua zhu
>Assignee: jianghua zhu
>Priority: Major
> Attachments: HDFS-15452.001.patch
>
>
> The default value for initializing BlocksMap in the BlockManager class is 2. 
> This can be set to a dynamic value.
> BlockManager#BlockManager() {
> ..
> // Compute the map capacity by allocating 2% of total memory
> blocksMap = new BlocksMap(
>  LightWeightGSet.computeCapacity(2.0, "BlocksMap"));
> ..
> }
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2020-07-09 Thread fengwu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154238#comment-17154238
 ] 

fengwu edited comment on HDFS-13596 at 7/9/20, 7:02 AM:


Hi, [~_ph]! Yes, the datanodes were downgraded before the namenodes, and the 
upgrade was not finalized.

Have you tested downgrading from 3.x to 2.x? Did the datanodes downgrade 
successfully?


was (Author: fengwu99):
Hi, [~_ph]! Yes, the datanodes were downgraded before the namenodes, and the 
upgrade was not finalized.

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, 
> HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, 
> HDFS-13596.009.patch, HDFS-13596.010.patch
>
>
> After rollingUpgrade NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> 

[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2020-07-09 Thread fengwu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154238#comment-17154238
 ] 

fengwu commented on HDFS-13596:
---

Hi, [~_ph]! Yes, the datanodes were downgraded before the namenodes, and the 
upgrade was not finalized.

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, 
> HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, 
> HDFS-13596.009.patch, HDFS-13596.010.patch
>
>
> After rollingUpgrade NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> 

[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2020-07-09 Thread Yuriy Malygin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154228#comment-17154228
 ] 

Yuriy Malygin commented on HDFS-13596:
--

Hi [~fengwu99]! Did you downgrade datanodes before namenodes?

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Fei Hui
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, 
> HDFS-13596.003.patch, HDFS-13596.004.patch, HDFS-13596.005.patch, 
> HDFS-13596.006.patch, HDFS-13596.007.patch, HDFS-13596.008.patch, 
> HDFS-13596.009.patch, HDFS-13596.010.patch
>
>
> After rollingUpgrade NN from 2.x to 3.x, if the NN is restarted, it fails 
> while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
>