[Kernel-packages] [Bug 2042363] Re: AIX 7.3 NFS client frequently returns an EIO error to an application when reading or writing to a file that has been locked with fcntl() on a Ubuntu 20.04 NFSV4 ser

2024-04-25 Thread GuoqingJiang
Per comment #23, the IPs of the AIX 7.2 clients are:

9.20.120.127 name = adia.v6.hursley.ibm.com -- Primary
9.20.121.46 name = amberjack.v6.hursley.ibm.com -- Partner


I searched the trace again for the above IPs. It looks like socket
cc6f0db2 is the connection between 9.20.120.127 and the NFS server, yet
recvfrom on it also hits EAGAIN (-11).

duckseason kernel: [13254.724411] svc: socket cc6f0db2 sendto([8485f39d 72... ], 72) = 72 (addr 9.20.120.127, port=1022)
...
duckseason kernel: [13254.724734] svc: socket cc6f0db2(inet c831762e), busy=0
duckseason kernel: [13254.724759] svc: server 728e82a2, pool 0, transport cc6f0db2, inuse=2
duckseason kernel: [13254.724761] svc: tcp_recv cc6f0db2 data 1 conn 0 close 0
duckseason kernel: [13254.724765] svc: socket cc6f0db2 recvfrom(b6708704, 4) = 4
duckseason kernel: [13254.724766] svc: TCP record, 168 bytes
duckseason kernel: [13254.724769] svc: socket cc6f0db2 recvfrom(57dbced3, 4096) = 168
duckseason kernel: [13254.724771] svc: TCP final record (168 bytes)
duckseason kernel: [13254.724775] svc: svc_authenticate (1)
duckseason kernel: [13254.724779] svc: server ee62a401, pool 0, transport cc6f0db2, inuse=3
duckseason kernel: [13254.724780] svc: tcp_recv cc6f0db2 data 1 conn 0 close 0
duckseason kernel: [13254.724783] svc: socket cc6f0db2 recvfrom(b6708704, 4) = -11

The same happens on socket 3497acd5, which is the connection between
9.20.121.46 and the NFS server.

duckseason kernel: [13254.802249] svc: socket 3497acd5 sendto([86e5a045 72... ], 72) = 72 (addr 9.20.121.46, port=1020)
...
duckseason kernel: [13254.802533] svc: socket 3497acd5(inet 72c9551d), busy=0
duckseason kernel: [13254.802571] svc: server 728e82a2, pool 0, transport 3497acd5, inuse=2
duckseason kernel: [13254.802573] svc: tcp_recv 3497acd5 data 1 conn 0 close 0
duckseason kernel: [13254.802578] svc: socket 3497acd5 recvfrom(77f9cf7c, 4) = 4
duckseason kernel: [13254.802579] svc: TCP record, 164 bytes
duckseason kernel: [13254.802583] svc: socket 3497acd5 recvfrom(57dbced3, 4096) = 164
duckseason kernel: [13254.802585] svc: TCP final record (164 bytes)
duckseason kernel: [13254.802590] svc: svc_authenticate (1)
duckseason kernel: [13254.802596] svc: server ee62a401, pool 0, transport 3497acd5, inuse=3
duckseason kernel: [13254.802597] svc: tcp_recv 3497acd5 data 1 conn 0 close 0
duckseason kernel: [13254.802599] svc: socket 3497acd5 recvfrom(77f9cf7c, 4) = -11
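
The manual matching done above (find the socket id in a sendto line that
carries the client address, then look for recvfrom = -11 on the same
socket) could be automated roughly as follows. This is only a sketch: it
assumes each log entry sits on a single line (not wrapped as in this
mail), that a socket id maps to one client for the whole capture, and the
function name eagain_by_client is made up for illustration.

```python
import re
from collections import Counter

# Entries copied from the trace above, one entry per line.
SAMPLE = """\
duckseason kernel: [13254.724411] svc: socket cc6f0db2 sendto([8485f39d 72... ], 72) = 72 (addr 9.20.120.127, port=1022)
duckseason kernel: [13254.724783] svc: socket cc6f0db2 recvfrom(b6708704, 4) = -11
duckseason kernel: [13254.802249] svc: socket 3497acd5 sendto([86e5a045 72... ], 72) = 72 (addr 9.20.121.46, port=1020)
duckseason kernel: [13254.802599] svc: socket 3497acd5 recvfrom(77f9cf7c, 4) = -11
"""

# sendto lines are the ones that reveal which client a socket id belongs to.
SENDTO = re.compile(r"svc: socket (\w+) sendto\(.*\(addr ([\d.]+), port=\d+\)")
# recvfrom lines ending in -11 are the EAGAIN cases.
RECV_EAGAIN = re.compile(r"svc: socket (\w+) recvfrom\(\w+, \d+\) = -11\b")


def eagain_by_client(lines):
    """Count recvfrom() = -11 entries per client address."""
    sock_to_addr = {}
    per_sock = Counter()
    for line in lines:
        m = SENDTO.search(line)
        if m:
            sock_to_addr[m.group(1)] = m.group(2)
            continue
        m = RECV_EAGAIN.search(line)
        if m:
            per_sock[m.group(1)] += 1
    per_client = Counter()
    for sock, n in per_sock.items():
        # Sockets never seen in a sendto line are reported as "unknown".
        per_client[sock_to_addr.get(sock, "unknown")] += n
    return per_client


counts = eagain_by_client(SAMPLE.splitlines())
for addr, n in sorted(counts.items()):
    print(addr, n)
```

Run against a full syslog, this would give a per-client EAGAIN count
instead of one total per file, which makes the 7.2 and 7.3 connections
directly comparable.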

But since the AIX 7.2 client works with the same server according to the
bug description, I am curious why the 7.2 connection also shows EAGAIN,
just like the 7.3 client. What am I missing?

Some questions and suggestions:

1. Did the AIX 7.3 NFS client work with an earlier kernel? If so, run
"git bisect" to find the commit that introduced the issue.
2. Is it possible to try the latest 5.4 stable kernel, as suggested in
comment #1? Please also try the latest upstream kernel (6.9-rc5 at this time).
3. Does increasing the lease time make a difference?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2042363

Title:
  AIX 7.3 NFS client frequently returns an EIO error to an application
  when reading or writing to a file that has been locked with fcntl() on
  a Ubuntu 20.04 NFSV4 server

Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---
  AIX 7.3 NFS client frequently returns an EIO error to an application when
  reading or writing to a file that has been locked with fcntl(). NFS server is
  Ubuntu 20.04.6 LTS, GNU/Linux 5.4.0-139-generic x86_64. The problem does not
  appear to affect other combinations of NFS client (including AIX 7.2) with this
  NFS server.

  The AIX team have indicated that the cause of the EIO is triggered by the NFS
  server returning a BAD_SEQID error which leads to the AIX NFS client
  incorrectly zeroing the stateid, which then leads to the NFS server returning a
  BAD_STATEID error and the NFS client then returns the EIO error. The AIX team
  would like to understand why the BAD_SEQID has been returned.
   
  ---uname output---
  Linux duckseason 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 
2023 x86_64 x86_64 x86_64 GNU/Linux
   
  Machine Type = VMware ESXi Server 7.0 4 x Intel(R) Xeon(R) Gold 6348H CPU @ 
2.30GHz  

  ---Steps to Reproduce---
  We cannot offer a simple way to recreate the problem as it involves IBM MQ
  running on two primary machines (AIX) using the Ubuntu server for its HA NFSv4
  storage.

  However, we can provide any requested trace or dumps from any or all
  of the involved machines.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2042363/+subscriptions


-- 
Mailing list: 

[Kernel-packages] [Bug 2042363] Re: AIX 7.3 NFS client frequently returns an EIO error to an application when reading or writing to a file that has been locked with fcntl() on a Ubuntu 20.04 NFSV4 ser

2024-04-17 Thread GuoqingJiang
** Attachment added: "RENEW packets between 9.20.32.85 (server) and 
9.20.120.127 (7.2 client)"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2042363/+attachment/5767206/+files/7.2nfs.png

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2042363] Re: AIX 7.3 NFS client frequently returns an EIO error to an application when reading or writing to a file that has been locked with fcntl() on a Ubuntu 20.04 NFSV4 ser

2024-04-17 Thread GuoqingJiang
** Attachment added: "packets for 9.20.32.85 (server) and 9.20.120.112 (7.3 
client)"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2042363/+attachment/5767207/+files/7.3nfs.png



[Kernel-packages] [Bug 2042363] Re: AIX 7.3 NFS client frequently returns an EIO error to an application when reading or writing to a file that has been locked with fcntl() on a Ubuntu 20.04 NFSV4 ser

2024-04-17 Thread GuoqingJiang
Sorry, I can't tell which parts of the logs in the attachments
(comment #11, #12 and #13) belong to the connection from the working 7.2
client and which to the non-working 7.3 client. All the attachments
contain "TCP recvfrom got EAGAIN", which should come from the 7.3
connection.

$ grep "TCP recvfrom got EAGAIN" syslog_16042024_amaliada_primary_adamsongrunter_partner_both_aix73_part1.log -r|wc -l
213127
$ grep "TCP recvfrom got EAGAIN" syslog_16042024_amaliada_primary_adamsongrunter_partner_both_aix73_part2.log -r|wc -l
226005
$ grep "TCP recvfrom got EAGAIN" syslog_17042024_adia_primary_amberjack_partner_both_aix72.log -r|wc -l
20233


May I suggest collecting those logs in two separate files, one from 7.2
and another from 7.3, instead of mixing them together?

I am not a network expert, but I see some NFS RENEW packets between
9.20.32.85 (server) and 9.20.120.127 (7.2 client) in
tcp_dump17_04_2024_09H_10M, and no such RENEW packets between 9.20.32.85
(server) and 9.20.120.112 (7.3 client) in tcpdump16_04_2024_14H_03M.
Given that NFSv4 is a stateful filesystem based on leases, if the client
does not send an operation to renew the lease, it is possible for the
server to return EAGAIN. Please check whether the 7.3 client differs
from the 7.2 client with regard to lease renewal.
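
To make the lease argument concrete, here is a toy model of a lease that
expires unless it is renewed in time. It is not how nfsd tracks leases;
the class name Lease and the 90 second value are assumptions for
illustration only (the real value should be read from the server, for
example from /proc/fs/nfsd/nfsv4leasetime). Also note that in NFSv4.0
other operations that carry a stateid can renew the lease implicitly, so
missing RENEW packets alone do not prove that the lease was not renewed.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    lease_time: float    # seconds; assumed value, check the server setting
    last_renewed: float  # timestamp of the last renewing operation

    def renew(self, now: float) -> None:
        """Record a renewing operation (explicit RENEW or an implicit one)."""
        self.last_renewed = now

    def expired(self, now: float) -> bool:
        """True once more than lease_time has passed since the last renewal."""
        return now - self.last_renewed > self.lease_time


lease = Lease(lease_time=90.0, last_renewed=0.0)
lease.renew(now=60.0)
print(lease.expired(now=120.0))  # 60 s since the last renewal
print(lease.expired(now=200.0))  # 140 s since the last renewal
```

With these numbers the first check is still within the lease and the
second one is past it, which is the situation a client runs into if it
stops renewing.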



[Kernel-packages] [Bug 2042363] Re: AIX 7.3 NFS client frequently returns an EIO error to an application when reading or writing to a file that has been locked with fcntl() on a Ubuntu 20.04 NFSV4 ser

2024-04-15 Thread GuoqingJiang
From the following excerpt of the trace file:

Nov 30 11:13:40 duckseason kernel: [1291756.354728] nfsd_dispatch: vers 4 proc 1
Nov 30 11:13:40 duckseason kernel: [1291756.354731] svc: server 7c7e7536, pool 0, transport 3fd86d34, inuse=3
Nov 30 11:13:40 duckseason kernel: [1291756.354732] process_renew(6554b87b/4ab45507): starting
Nov 30 11:13:40 duckseason kernel: [1291756.354734] svc: tcp_recv 3fd86d34 data 1 conn 0 close 0
Nov 30 11:13:40 duckseason kernel: [1291756.354736] svc: socket 3fd86d34 recvfrom(03fecffb, 4) = -11
Nov 30 11:13:40 duckseason kernel: [1291756.354737] RPC: TCP recv_record got -11
Nov 30 11:13:40 duckseason kernel: [1291756.354737] RPC: TCP recvfrom got EAGAIN

we can see -11 (EAGAIN) on the NFS server, which can be reached through
the following call path:

svc_recv -> svc_handle_xprt
  -> xprt->xpt_ops->xpo_recvfrom
     svc_tcp_recvfrom
     -> svc_recvfrom
        -> sock_recvmsg which probably triggers sock_recvmsg_nosec -> ... -> tcp_recvmsg

As mentioned in the recvfrom manpage:

ERRORS
   The recvfrom() function shall fail if:
   EAGAIN or EWOULDBLOCK
      The socket's file descriptor is marked O_NONBLOCK and no data is
      waiting to be received; or MSG_OOB is set and no out-of-band
      data is available and either the socket's file descriptor is
      marked O_NONBLOCK or the socket does not support blocking to
      await out-of-band data.
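
The behaviour described in the manpage can be reproduced in user space
with a small sketch. It uses a local socket pair rather than an NFS
connection, and the helper name recv_nonblocking is made up for this
example; it only demonstrates the manpage semantics and says nothing
about what nfsd or the AIX client actually do.

```python
import errno
import socket


def recv_nonblocking(sock, nbytes):
    """Return (data, None) on success, or (None, errno) if the read would block."""
    try:
        return sock.recv(nbytes), None
    except BlockingIOError as exc:
        return None, exc.errno


a, b = socket.socketpair()
a.setblocking(False)  # equivalent to marking the descriptor O_NONBLOCK

first = recv_nonblocking(a, 4)   # nothing has been sent yet
b.sendall(b"data")
second = recv_nonblocking(a, 4)  # four bytes are queued now
third = recv_nonblocking(a, 4)   # the queue has been drained again

print(first, second, third)
print("EAGAIN =", errno.EAGAIN)

a.close()
b.close()
```

The first and third reads fail with EAGAIN/EWOULDBLOCK simply because no
data is waiting, and the read in between succeeds. On Linux EAGAIN is 11,
which matches the -11 in the kernel trace.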

I am not sure whether the 7.3 NFS client opened a non-blocking socket and
there was no data to read on that socket.
So I would like to check whether the 7.3 client sent something different
from the 7.2 client that caused the server to return BAD_SEQID to the
AIX 7.3 client.
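
For reference when comparing the two clients, the seqid rule for
NFSv4.0 state-changing operations can be sketched as below. This follows
my reading of the protocol (the next request must carry the last seqid
plus one, a repeat of the last seqid is treated as a retransmission, and
anything else gets NFS4ERR_BAD_SEQID); it is not copied from the nfsd
source, it ignores seqid wraparound, and the names SeqidResult and
check_seqid are hypothetical.

```python
from enum import Enum


class SeqidResult(Enum):
    OK = "process the request"
    REPLAY = "resend the cached reply"
    BAD_SEQID = "NFS4ERR_BAD_SEQID"


def check_seqid(last_seqid: int, incoming: int) -> SeqidResult:
    """Classify an incoming seqid against the last one the server accepted."""
    if incoming == last_seqid + 1:
        return SeqidResult.OK
    if incoming == last_seqid:
        return SeqidResult.REPLAY
    return SeqidResult.BAD_SEQID


# Last accepted seqid is 7; try a retransmission, the expected value, and a gap.
for incoming in (7, 8, 9):
    print(incoming, check_seqid(last_seqid=7, incoming=incoming).name)
```

If this reading is right, a client that skips ahead or falls behind by
more than one request on the same owner would get BAD_SEQID, so the
seqid values in the 7.3 packet capture just before the error are worth
comparing with those from 7.2.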

Please also collect the relevant trace log from the server side while it
is connected to a 7.2 client, so that we can compare the good case with
the bad one.

If possible, maybe you can try with the latest 5.4 stable (5.4.274) and
upstream version (6.9-rc4).



[Kernel-packages] [Bug 2042363] Re: AIX 7.3 NFS client frequently returns an EIO error to an application when reading or writing to a file that has been locked with fcntl() on a Ubuntu 20.04 NFSV4 ser

2024-02-01 Thread Frank Heimes
I did a screening of the traces, but couldn't really find any suspicious entries.
I'm now looking for someone else's view and opinion...



[Kernel-packages] [Bug 2042363] Re: AIX 7.3 NFS client frequently returns an EIO error to an application when reading or writing to a file that has been locked with fcntl() on a Ubuntu 20.04 NFSV4 ser

2023-11-10 Thread Frank Heimes
Hi, reading the bug description, one of the first things that caught my
attention is that the kernel (5.4.0-156) seems to be a bit outdated, since the
latest (as of today) is 5.4.0.166.
Would you mind retrying this with the latest kernel (ideally also with the
latest userspace, for example after running 'apt update' and 'apt
full-upgrade'), since there will be a difference of hundreds of patches
(including upstream stable) between these?

On top of that, I believe it would probably be very helpful to have rpcdebug
enabled on the NFS server, like:
rpcdebug -m nfsd -s all
rpcdebug -m nlm -s all
rpcdebug -m rpc -s all

Btw. it would also be interesting to know if this also happens with a
bare-metal install of the NFS server, that is, without VMware in
between (avoiding any potential flaws in VMware virtual network
components, like virtual switches).

I think technically this is not a Launchpad bug for the Ubuntu on IBM
Power project, since here Ubuntu runs on amd64 (and on VMware), but we
may still try to figure out what's going on.
