[jira] [Commented] (HDFS-14737) Writing data through the HDFS nfs3 service is very slow, and timeout occurs while mount directories on the nfs client
[ https://issues.apache.org/jira/browse/HDFS-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249994#comment-17249994 ] Daniel Howard commented on HDFS-14737:
--
On 3.2.1 I'm seeing the NFS server wedge up with messages like these:

{{2020-12-15 14:35:49,558 WARN org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got overwrite [2661285888-2662334464) smaller than current offset 2792085880, drop the request.}}
{{2020-12-15 14:35:49,609 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Perfect overwrite has same content, updating the mtime, then return success}}
{{2020-12-15 14:35:49,701 WARN org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got overwrite [2662334464-2663383040) smaller than current offset 2792085880, drop the request.}}
{{2020-12-15 14:35:49,753 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Perfect overwrite has same content, updating the mtime, then return success}}

> Writing data through the HDFS nfs3 service is very slow, and timeout occurs while mount directories on the nfs client
> ---
> Key: HDFS-14737
> URL: https://issues.apache.org/jira/browse/HDFS-14737
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.7.2
> Reporter: liying
> Assignee: liying
> Priority: Major
>
> The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client's local file system. I start the portmap and nfs3server on the Hadoop node, and mount with the command (mount -t nfs -o vers=3,proto=tcp,nolock nfs3serverIP:/ /hdfs). It works well. Then I use this command to mount on many clients. When I used Linux cp to copy files to the mounted dir (/hdfs), I found the speed was very slow. There is a lot of information like the following in the log:
> 2019-08-14 15:36:28,347 INFO org.apache.hadoop.security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:37:03,093 INFO org.apache.hadoop.security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:37:03,850 INFO org.apache.hadoop.security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:37:14,780 INFO org.apache.hadoop.security.ShellBasedIdMapping: Can't map group supergroup. Use its string hashcode:-1710818332
> [the same "Can't map group supergroup" message repeats every few seconds through 2019-08-14 15:40:22]
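The {{ShellBasedIdMapping}} messages above mean the NFS gateway host has no local UNIX group named {{supergroup}}, so the gateway falls back to hashing the group name on every lookup. A hedged sketch of a common workaround (assumption: creating the group locally on the gateway host is acceptable in your environment; the gid 5000 is an arbitrary example):

```
# On the NFS gateway host: define the HDFS supergroup locally so
# ShellBasedIdMapping can resolve it to a real gid (pick an unused gid).
sudo groupadd -g 5000 supergroup

# On each client: mount the gateway as described in the report.
sudo mkdir -p /hdfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock nfs3serverIP:/ /hdfs
```

This removes the repeated group-lookup log noise; whether it also resolves the throughput problem reported here is not established by this thread.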
[jira] [Issue Comment Deleted] (HDFS-14737) Writing data through the HDFS nfs3 service is very slow, and timeout occurs while mount directories on the nfs client
[ https://issues.apache.org/jira/browse/HDFS-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Howard updated HDFS-14737:
--
Comment: was deleted (was: the "On 3.2.1 I'm seeing the NFS server wedge up..." comment quoted in full above)
[jira] [Commented] (HDFS-12109) "fs" java.net.UnknownHostException when HA NameNode is used
[ https://issues.apache.org/jira/browse/HDFS-12109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235106#comment-17235106 ] Daniel Howard commented on HDFS-12109:
--
PS: thank you, [~luigidifraia], for documenting this issue, and [~surendrasingh] for the suggested fix. I am setting up HA right now and I made the same copy-paste error, putting {{dfs.client.failover.proxy.provider.mycluster}} into my configuration!

> "fs" java.net.UnknownHostException when HA NameNode is used
> ---
> Key: HDFS-12109
> URL: https://issues.apache.org/jira/browse/HDFS-12109
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.8.0
> Environment:
> [hadoop@namenode01 ~]$ cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> [hadoop@namenode01 ~]$ uname -a
> Linux namenode01 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> [hadoop@namenode01 ~]$ java -version
> java version "1.8.0_131"
> Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
> Reporter: Luigi Di Fraia
> Priority: Major
>
> After setting up an HA NameNode configuration, the following invocation of "fs" fails:
> [hadoop@namenode01 ~]$ /usr/local/hadoop/bin/hdfs dfs -ls /
> -ls: java.net.UnknownHostException: saccluster
> It works if properties are defined as per below:
> /usr/local/hadoop/bin/hdfs dfs -Ddfs.nameservices=saccluster -Ddfs.client.failover.proxy.provider.saccluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider -Ddfs.ha.namenodes.saccluster=namenode01,namenode02 -Ddfs.namenode.rpc-address.saccluster.namenode01=namenode01:8020 -Ddfs.namenode.rpc-address.saccluster.namenode02=namenode02:8020 -ls /
> These properties are defined in /usr/local/hadoop/etc/hadoop/hdfs-site.xml as per below:
> dfs.nameservices = saccluster
> dfs.ha.namenodes.saccluster = namenode01,namenode02
> dfs.namenode.rpc-address.saccluster.namenode01 = namenode01:8020
> dfs.namenode.rpc-address.saccluster.namenode02 = namenode02:8020
> dfs.namenode.http-address.saccluster.namenode01 = namenode01:50070
> dfs.namenode.http-address.saccluster.namenode02 = namenode02:50070
> dfs.namenode.shared.edits.dir = qjournal://namenode01:8485;namenode02:8485;datanode01:8485/saccluster
> dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> In /usr/local/hadoop/etc/hadoop/core-site.xml the default FS is defined as per below:
> fs.defaultFS = hdfs://saccluster
> In /usr/local/hadoop/etc/hadoop/hadoop-env.sh the following export is defined:
> export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"
> Is "fs" trying to read these properties from somewhere else, such as a separate client configuration file?
> Apologies if I am missing something obvious here.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
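The configuration above contains the actual bug: the failover proxy provider property is keyed by {{mycluster}} while every other property (and {{fs.defaultFS}}) uses the nameservice {{saccluster}}, so the client finds no proxy provider for {{saccluster}} and fails to resolve it as a host. The working {{-D}} override in the report confirms the correct property name; the corresponding hdfs-site.xml entry would look like:

```xml
<!-- The property suffix must match the dfs.nameservices value ("saccluster");
     otherwise the HDFS client cannot resolve the logical nameservice URI. -->
<property>
  <name>dfs.client.failover.proxy.provider.saccluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```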
[jira] [Comment Edited] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux
[ https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140885#comment-17140885 ] Daniel Howard edited comment on HDFS-13082 at 7/10/20, 7:28 PM:
--
I am running into this as well on Ubuntu 20.04. I can confirm that setting *{{nfs.aix.compatibility.mode.enabled}}* to *{{true}}* resolves this problem. For example:

{{0-15:58 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{# No files listed}}
{{0-16:01 djh@c24-03-06 ~> *touch /hadoop/wxxxs/data/foo*}}
{{0-16:01 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{foo packed-hbfs/ raw/ tmp/}}
{{0-16:01 djh@c24-03-06 ~> *rm /hadoop/wxxxs/data/foo*}}
{{0-16:01 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{packed-hbfs/ raw/ tmp/}}

Writing to this directory forced the NFS server to return the correct directory contents. I have a bunch of this in the log:

{{2020-06-19 16:01:35,281 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 1592428367587}}
{{2020-06-19 16:01:35,287 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 1592428367587}}
{{2020-06-19 16:01:35,454 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 1592428367587}}

If AIX compatibility is enabled, the log messages change from
{{ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf mismatch.[...]}}
to
{{WARN org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: AIX compatibility mode enabled...}}

Judging by the code in {{RpcProgramNfs3.java}}, if the {{cookieverf}} does not match a directory's {{mtime}}, the Nfs3 server normally returns an error to the client. In AIX compatibility mode, the Nfs3 server instead logs a warning and then constructs the response it would have constructed had there been no {{cookieverf}} mismatch. What does this all mean? I don't know, but I am working to see if I can trigger an empty-directory situation with AIX compat enabled.

was (Author: dannyman): the same comment, except that "I can confirm that setting *{{nfs.aix.compatibility.mode.enabled}}* to *{{true}}* resolves this problem." previously read "I am in the process of testing the AIX compatibility mode."
> cookieverf mismatch error over NFS gateway on Linux > --- > > Key: HDFS-13082 > URL: https://issues.apache.org/jira/browse/HDFS-13082 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3 >Reporter: Dan Moraru >Priority: Minor > > Running 'ls' on some directories over an HDFS-NFS gateway sometimes fails to > list the contents of those directories. Running 'ls' on those same > directories mounted via FUSE works. The NFS gateway logs errors like the > following: > 2018-01-29 11:53:01,130 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: > cookieverf mismatch. request cookieverf: 1513390944415 dir cookieverf: > 1516920857335 > Reviewing >
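The verifier behavior Daniel describes above can be sketched as follows. This is a simplified illustration of the decision logic only, not the actual RpcProgramNfs3 code; the class and method names are hypothetical, and the error constant value is taken from RFC 1813 (NFS3ERR_BAD_COOKIE = 10003):

```java
/**
 * Sketch of the READDIR cookie-verifier check described in the comment:
 * the gateway uses the directory's mtime as the cookieverf; on mismatch it
 * rejects the request unless AIX compatibility mode is enabled, in which
 * case it proceeds as if the verifier had matched.
 */
public class CookieverfCheck {
    public static final int NFS3_OK = 0;
    public static final int NFS3ERR_BAD_COOKIE = 10003; // per RFC 1813

    /** Returns the NFS3 status the server would use for a READDIR request. */
    public static int checkCookieverf(long requestVerf, long dirMtime, boolean aixCompatMode) {
        if (requestVerf != dirMtime && !aixCompatMode) {
            // Normal mode: stale verifier, reject the request (logged as ERROR).
            return NFS3ERR_BAD_COOKIE;
        }
        // AIX compat mode: warn (omitted here) and answer as if there were no mismatch.
        return NFS3_OK;
    }

    public static void main(String[] args) {
        // Mismatched verifier, normal mode: rejected.
        System.out.println(checkCookieverf(1591897331315L, 1592428367587L, false) == NFS3ERR_BAD_COOKIE); // true
        // Same mismatch with AIX compat mode: request proceeds.
        System.out.println(checkCookieverf(1591897331315L, 1592428367587L, true) == NFS3_OK); // true
    }
}
```

This also makes clear why the Linux client's habit of replaying old verifiers (discussed in the next message) trips the normal-mode path.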
[jira] [Comment Edited] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux
[ https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154055#comment-17154055 ] Daniel Howard edited comment on HDFS-13082 at 7/9/20, 6:16 PM:
--
I found a comment that implies that the Linux NFS client doesn't handle cookie verification: [1]
{quote}This discussion comes up pretty much every time someone writes a new NFS server and/or filesystem. The thing that neither RFC1813, RFC3530, or RFC5661 have done is come up with sane semantics for how a NFS client is supposed to recover from the above scenario. What do I do with things like telldir()/seekdir() cookies? How do I recover my 'current position' in the readdir() stream? IOW: how do I fake up POSIX semantics to the applications? Until the recovery question is answered, the Linux client will continue to ignore the whole "cookie verifier" junk...{quote}
[1]: [https://linuxlists.cc/l/17/linux-nfs/t/2933109/readdir_from_linux_nfs4_client_when_cookieverf_is_no_longer_valid]

Here is a reference to cookieverf being removed from an Android kernel: [https://gitlab.incom.co/CM-Shield/android_kernel_nvidia_shieldtablet/commit/c3f52af3e03013db5237e339c817beaae5ec9e3a]

was (Author: dannyman): the same comment, with the quotation in bq. formatting rather than a {quote} block.

> cookieverf mismatch error over NFS gateway on Linux
> ---
> Key: HDFS-13082
> URL: https://issues.apache.org/jira/browse/HDFS-13082
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.7.3
> Reporter: Dan Moraru
> Priority: Minor
>
> Running 'ls' on some directories over an HDFS-NFS gateway sometimes fails to list the contents of those directories. Running 'ls' on those same directories mounted via FUSE works. The NFS gateway logs errors like the following:
> 2018-01-29 11:53:01,130 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf mismatch. request cookieverf: 1513390944415 dir cookieverf: 1516920857335
> Reviewing hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java suggested that these errors can be avoided by setting nfs.aix.compatibility.mode.enabled=true, and that is indeed the case. The documentation lists https://issues.apache.org/jira/browse/HDFS-6549 as a known issue, but also goes on to say that "regular, non-AIX clients should NOT enable AIX compatibility mode. The work-arounds implemented by AIX compatibility mode effectively disable safeguards to ensure that listing of directory contents via NFS returns consistent results, and that all data sent to the NFS server can be assured to have been committed." Server and client in this case are one and the same, running Scientific Linux 7.4.
[jira] [Commented] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux
[ https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154055#comment-17154055 ] Daniel Howard commented on HDFS-13082:
--
(Original posting, at 7/9/20, of the comment above: the linux-nfs quotation about the Linux client ignoring cookie verifiers, plus the two links.)
[jira] [Comment Edited] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux
[ https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140885#comment-17140885 ] Daniel Howard edited comment on HDFS-13082 at 7/8/20, 10:25 PM:
--
(This edit revised the comment posted below: "I am running into this as well, but the AIX compatibility trick did not help." became "I am running into this as well on Ubuntu 20.04. I am in the process of testing the AIX compatibility mode.", and the closing line about being tempted to fiddle with _dfs.namenode.accesstime.precision_ was dropped. The shell transcript and log excerpts were unchanged.)
[jira] [Commented] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux
[ https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140885#comment-17140885 ] Daniel Howard commented on HDFS-13082:
--
I am running into this as well, but the AIX compatibility trick did not help. For example:

{{0-15:58 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{# No files listed}}
{{0-16:01 djh@c24-03-06 ~> *touch /hadoop/wxxxs/data/foo*}}
{{0-16:01 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{foo packed-hbfs/ raw/ tmp/}}
{{0-16:01 djh@c24-03-06 ~> *rm /hadoop/wxxxs/data/foo*}}
{{0-16:01 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{packed-hbfs/ raw/ tmp/}}

Writing to this directory forced the NFS server to return the correct directory contents. I have a bunch of this in the log:

{{2020-06-19 16:01:35,281 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 1592428367587}}
{{2020-06-19 16:01:35,287 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 1592428367587}}
{{2020-06-19 16:01:35,454 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 1592428367587}}

I am tempted to fiddle with _dfs.namenode.accesstime.precision_ but .. ?!