[jira] [Commented] (HDFS-14737) Writing data through the HDFS nfs3 service is very slow, and timeout occurs while mount directories on the nfs client

2020-12-15 Thread Daniel Howard (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249994#comment-17249994
 ] 

Daniel Howard commented on HDFS-14737:
--

On 3.2.1 I'm seeing the NFS server wedge up with messages like these:

{{2020-12-15 14:35:49,558 WARN org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got 
overwrite [2661285888-2662334464) smaller than current offset 2792085880, drop 
the request.}}
{{2020-12-15 14:35:49,609 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
Perfect overwrite has same content, updating the mtime, then return success}}
{{2020-12-15 14:35:49,701 WARN org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got 
overwrite [2662334464-2663383040) smaller than current offset 2792085880, drop 
the request.}}
{{2020-12-15 14:35:49,753 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
Perfect overwrite has same content, updating the mtime, then return success}}

> Writing data through the HDFS nfs3 service is very slow, and timeout occurs 
> while mount directories on the nfs client
> -
>
> Key: HDFS-14737
> URL: https://issues.apache.org/jira/browse/HDFS-14737
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: liying
>Assignee: liying
>Priority: Major
>
> We know the NFS Gateway supports NFSv3 and allows HDFS to be mounted as part 
> of the client’s local file system. I started the portmap and nfs3 server on a 
> Hadoop node and used the command (mount -t nfs -o vers=3,proto=tcp,nolock 
> nfs3serverIP:/ /hdfs) to mount it. That worked well. I then used the same 
> command to mount on many clients and used Linux cp to copy files to the 
> mounted directory (/hdfs), but found the speed was very slow. The log contains 
> a lot of repeated messages like the following:
> 2019-08-14 15:36:28,347 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> 2019-08-14 15:37:03,093 INFO org.apache.hadoop.security.ShellBasedIdMapping: 
> Can't map group supergroup. Use its string hashcode:-1710818332
> [the same message repeats every few seconds for the next several minutes]

[jira] [Issue Comment Deleted] (HDFS-14737) Writing data through the HDFS nfs3 service is very slow, and timeout occurs while mount directories on the nfs client

2020-12-15 Thread Daniel Howard (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Howard updated HDFS-14737:
-
Comment: was deleted

(was: [the comment quoted in full in the message above])


[jira] [Commented] (HDFS-12109) "fs" java.net.UnknownHostException when HA NameNode is used

2020-11-18 Thread Daniel Howard (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235106#comment-17235106
 ] 

Daniel Howard commented on HDFS-12109:
--

PS: thank you, [~luigidifraia], for documenting this issue and [~surendrasingh] 
for the suggested fix. I am setting up HA right now and committed the same 
error, copy-pasting {{dfs.client.failover.proxy.provider.mycluster}} into my 
configuration!
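
For anyone else who lands here, a minimal sketch of the corrected hdfs-site.xml 
entry, assuming (as in the report below) that the nameservice is named 
{{saccluster}}. The property suffix has to match the {{dfs.nameservices}} 
value, not the {{mycluster}} placeholder from the docs:

{code:xml}
<!-- Sketch: the suffix after dfs.client.failover.proxy.provider. must be the
     actual nameservice name ("saccluster" here), or clients cannot resolve
     the logical URI hdfs://saccluster. -->
<property>
  <name>dfs.client.failover.proxy.provider.saccluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
{code}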

> "fs" java.net.UnknownHostException when HA NameNode is used
> ---
>
> Key: HDFS-12109
> URL: https://issues.apache.org/jira/browse/HDFS-12109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.8.0
> Environment: [hadoop@namenode01 ~]$ cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> [hadoop@namenode01 ~]$ uname -a
> Linux namenode01 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
> [hadoop@namenode01 ~]$ java -version
> java version "1.8.0_131"
> Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
>Reporter: Luigi Di Fraia
>Priority: Major
>
> After setting up an HA NameNode configuration, the following invocation of 
> "fs" fails:
> [hadoop@namenode01 ~]$ /usr/local/hadoop/bin/hdfs dfs -ls /
> -ls: java.net.UnknownHostException: saccluster
> It works if properties are defined as per below:
> /usr/local/hadoop/bin/hdfs dfs -Ddfs.nameservices=saccluster 
> -Ddfs.client.failover.proxy.provider.saccluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
>  -Ddfs.ha.namenodes.saccluster=namenode01,namenode02 
> -Ddfs.namenode.rpc-address.saccluster.namenode01=namenode01:8020 
> -Ddfs.namenode.rpc-address.saccluster.namenode02=namenode02:8020 -ls /
> These properties are defined in /usr/local/hadoop/etc/hadoop/hdfs-site.xml as 
> per below:
> <property>
>   <name>dfs.nameservices</name>
>   <value>saccluster</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.saccluster</name>
>   <value>namenode01,namenode02</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.saccluster.namenode01</name>
>   <value>namenode01:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.saccluster.namenode02</name>
>   <value>namenode02:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.saccluster.namenode01</name>
>   <value>namenode01:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.saccluster.namenode02</name>
>   <value>namenode02:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.shared.edits.dir</name>
>   <value>qjournal://namenode01:8485;namenode02:8485;datanode01:8485/saccluster</value>
> </property>
> <property>
>   <name>dfs.client.failover.proxy.provider.mycluster</name>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> In /usr/local/hadoop/etc/hadoop/core-site.xml the default FS is defined as 
> per below:
> <property>
>   <name>fs.defaultFS</name>
>   <value>hdfs://saccluster</value>
> </property>
> In /usr/local/hadoop/etc/hadoop/hadoop-env.sh the following export is defined:
> export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"
> Is "fs" trying to read these properties from somewhere else, such as a 
> separate client configuration file?
> Apologies if I am missing something obvious here.






[jira] [Comment Edited] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux

2020-07-10 Thread Daniel Howard (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140885#comment-17140885
 ] 

Daniel Howard edited comment on HDFS-13082 at 7/10/20, 7:28 PM:


I am running into this as well on Ubuntu 20.04. I can confirm that setting 
*{{nfs.aix.compatibility.mode.enabled}}* to *{{true}}* resolves this problem 
(see the config sketch at the end of this comment).

For example:

{{0-15:58 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
 {{# No files listed}}
 {{0-16:01 djh@c24-03-06 ~> *touch /hadoop/wxxxs/data/foo*}}
 {{0-16:01 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
 {{foo packed-hbfs/ raw/ tmp/}}
 {{0-16:01 djh@c24-03-06 ~> *rm /hadoop/wxxxs/data/foo*}}
 {{0-16:01 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
 {{packed-hbfs/ raw/ tmp/}}

Writing to this directory forced the NFS server to return the correct directory 
contents.

I have a bunch of this in the log:

{{2020-06-19 16:01:35,281 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: 
cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 
1592428367587}}
 {{2020-06-19 16:01:35,287 ERROR 
org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf mismatch. request 
cookieverf: 1591897331315 dir cookieverf: 1592428367587}}
 {{2020-06-19 16:01:35,454 ERROR 
org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf mismatch. request 
cookieverf: 1591897331315 dir cookieverf: 1592428367587}}

If AIX compatibility is enabled, the log messages change FROM
 {{ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: cookieverf 
mismatch.[...]}}
 TO
 {{WARN org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: AIX compatibility mode 
enabled...}}

Judging by the code in {{RpcProgramNfs3.java}}, if the {{cookieverf}} does not 
match a directory's {{mtime}}, the Nfs3 server normally returns an error to the 
client. In AIX compatibility mode, the Nfs3 server instead logs a warning and 
then constructs the response it would have constructed had there been no 
{{cookieverf}} mismatch. What does this all mean? I don't know, but I am 
working to see if I can trigger an empty-directory situation with AIX compat 
enabled.
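
To spell out the workaround from the top of this comment, here is a minimal 
sketch of the entry I set, assuming your NFS gateway reads its {{nfs.*}} 
settings from hdfs-site.xml. Note the documented caveats quoted in the issue 
description about AIX compatibility mode disabling consistency safeguards:

{code:xml}
<!-- Sketch: enables AIX compatibility mode on the HDFS NFS gateway so that a
     cookieverf mismatch is logged as a warning instead of returned as an
     error to the client. -->
<property>
  <name>nfs.aix.compatibility.mode.enabled</name>
  <value>true</value>
</property>
{code}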


was (Author: dannyman):
I am running into this as well on Ubuntu 20.04. I am in the process of testing 
the AIX compatibility mode.

[... the rest of the previous revision is unchanged from the edited comment 
above ...]

> cookieverf mismatch error over NFS gateway on Linux
> ---
>
> Key: HDFS-13082
> URL: https://issues.apache.org/jira/browse/HDFS-13082
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3
>Reporter: Dan Moraru
>Priority: Minor
>
> Running 'ls' on some directories over an HDFS-NFS gateway sometimes fails to 
> list the contents of those directories.  Running 'ls' on those same 
> directories mounted via FUSE works.  The NFS gateway logs errors like the 
> following:
> 2018-01-29 11:53:01,130 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: 
> cookieverf mismatch. request cookieverf: 1513390944415 dir cookieverf: 
> 1516920857335
> Reviewing 
> hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
> suggested that these errors can be avoided by setting 
> nfs.aix.compatibility.mode.enabled=true, and that is indeed the case. The 
> documentation lists https://issues.apache.org/jira/browse/HDFS-6549 as a 
> known issue, but also goes on to say that "regular, non-AIX clients should 
> NOT enable AIX compatibility mode. The work-arounds implemented by AIX 
> compatibility mode effectively disable safeguards to ensure that listing of 
> directory contents via NFS returns consistent results, and that all data sent 
> to the NFS server can be assured to have been committed." Server and client 
> in this case are one and the same, running Scientific Linux 7.4.

[jira] [Comment Edited] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux

2020-07-09 Thread Daniel Howard (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154055#comment-17154055
 ] 

Daniel Howard edited comment on HDFS-13082 at 7/9/20, 6:16 PM:
---

I found a comment that implies that Linux doesn't handle cookie verification:[1]
{quote}This discussion comes up pretty much every time someone writes a new NFS 
server and/or filesystem. The thing that neither RFC1813, RFC3530, or RFC5661 
have done is come up with sane semantics for how a NFS client is supposed to 
recover from the above scenario. What do I do with things like 
telldir()/seekdir() cookies? How do I recover my 'current position' in the 
readdir() stream?

IOW: how do I fake up POSIX semantics to the applications?

Until the recovery question is answered, the Linux client will continue to 
ignore the whole "cookie verifier" junk...
{quote}
[1]: 
[https://linuxlists.cc/l/17/linux-nfs/t/2933109/readdir_from_linux_nfs4_client_when_cookieverf_is_no_longer_valid]

Here is a reference to cookieverf being removed from an Android kernel:
 
[https://gitlab.incom.co/CM-Shield/android_kernel_nvidia_shieldtablet/commit/c3f52af3e03013db5237e339c817beaae5ec9e3a]


was (Author: dannyman):
[... unchanged from the edited comment above, apart from the quote markup ...]







[jira] [Commented] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux

2020-07-08 Thread Daniel Howard (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154055#comment-17154055
 ] 

Daniel Howard commented on HDFS-13082:
--

I found a comment that implies that Linux doesn't handle cookie verification:[1]
[... the quote and links are identical to the edited comment above ...]







[jira] [Comment Edited] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux

2020-07-08 Thread Daniel Howard (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140885#comment-17140885
 ] 

Daniel Howard edited comment on HDFS-13082 at 7/8/20, 10:25 PM:


I am running into this as well on Ubuntu 20.04. I am in the process of testing 
the AIX compatibility mode.

[... the rest is identical to the 7/10 revision of this comment, quoted 
above ...]


was (Author: dannyman):
I am running into this as well, but the AIX compatibility trick did not help.

[... the rest is identical to the original comment, quoted in full in the 
message below ...]






[jira] [Commented] (HDFS-13082) cookieverf mismatch error over NFS gateway on Linux

2020-06-19 Thread Daniel Howard (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140885#comment-17140885
 ] 

Daniel Howard commented on HDFS-13082:
--

I am running into this as well, but the AIX compatibility trick did not help.

For example:

{{0-15:58 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{# No files listed}}
{{0-16:01 djh@c24-03-06 ~> *touch /hadoop/wxxxs/data/foo*}}
{{0-16:01 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{foo packed-hbfs/ raw/ tmp/}}
{{0-16:01 djh@c24-03-06 ~> *rm /hadoop/wxxxs/data/foo*}}
{{0-16:01 djh@c24-03-06 ~> *ls /hadoop/wxxxs/data/*}}
{{packed-hbfs/ raw/ tmp/}}

Writing to this directory forced the NFS server to return the correct directory 
contents.

I have a bunch of this in the log:

{{2020-06-19 16:01:35,281 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: 
cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 
1592428367587}}
{{2020-06-19 16:01:35,287 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: 
cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 
1592428367587}}
{{2020-06-19 16:01:35,454 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: 
cookieverf mismatch. request cookieverf: 1591897331315 dir cookieverf: 
1592428367587}}

I am tempted to fiddle with _dfs.namenode.accesstime.precision_ but .. ?!




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org