> -----Original Messages-----
> From: "Stephan Wiesand" <stephan.wies...@desy.de>
> Sent Time: 2019-12-18 18:56:12 (Wednesday)
> To: openafs-info@openafs.org
> Cc: 
> Subject: Re: [OpenAFS] AFS client hanged
> 
> Hi Andreas,
> 
> > On 18. Dec 2019, at 10:48, Andreas Ladanyi <andreas.lada...@kit.edu> wrote:
> > 
> > Hi,
> > 
> >>> kernel-2.6.32-696.20.1.el6.x86_64. After we upgrade to the new linux 
> >>> kernel and install the default openafs client version using yum(the 
> >>> version we used listed in the following), we have the hang issue. That's 
> >>> why I suspect the version compatibility.
> >>> AFS client-SL7: 1.6.23
> >>> [root@bws0825 ~]# rpm -qa|grep openafs
> >>> openafs-1.6-sl-client-1.6.23-289.sl7.x86_64
> >>> openafs-1.6-sl-authlibs-1.6.23-289.sl7.x86_64
> >>> openafs-1.6-sl-devel-1.6.23-289.sl7.x86_64
> >>> openafs-1.6-sl-module-tools-1.6.23-289.sl7.x86_64
> >>> openafs-1.6-sl-krb5-1.6.23-289.sl7.x86_64
> >>> openafs-1.6-sl-1.6.23-289.sl7.x86_64
> >>> openafs-1.6-sl-authlibs-devel-1.6.23-289.sl7.x86_64
> >>> kmod-openafs-1.6-sl-957-1.6.23-289.sl7.957.x86_64
> >>> 
> >>> AFS client-SL6: 1.6.23
> >>> openafs-krb5-1.6.23-289.sl6.x86_64
> >>> openafs-client-1.6.23-289.sl6.x86_64
> >>> openafs-1.6.23-289.sl6.x86_64
> >>> openafs-kpasswd-1.6.23-289.sl6.x86_64
> >>> openafs-module-tools-1.6.23-289.sl6.x86_64
> >>> openafs-kernel-source-1.6.23-289.sl6.x86_64
> >>> openafs-firstboot-1.6-1.sl6.noarch
> >>> openafs-authlibs-1.6.23-289.sl6.x86_64
> >>> kmod-openafs-1.6.22.3-1.SL610.el6.noarch
> >>> openafs-compat-1.6.23-289.sl6.x86_64
> >>> 
> > What i could see here is a version difference between kmod-openafs 1.6.22 
> > and openafs-client 1.6.23
> 
> While 1.6.23 was a security update and yes, this looks kind of manual, it 
> shouldn't matter.
> 
> > Does the issue appear on one client only or all clients which are upgraded ?

The issue appears on many clients in the cluster, but not at the same time. Every 
morning when we check the status of the compute nodes, we find some of them hung.

Also, we found that some clients failed to access certain files located in AFS. 
Such a file looks fine with "ls", but reading it fails.

I used strace to trace the cp command and found that the operation got stuck 
after reading several KB. It's weird. On the server side, tcpdump shows that 
the packets sent to the server contain only version information, with no data. 
It seems the client thinks the cached data is the latest. Actually the cache 
is dirty, but the data access didn't trigger a data fetch.

17:40:07.167220 IP lhws136.ihep.ac.cn.afs3-callback > 202.122.35.133.afs3-fileserver:  rx version (29)
17:40:07.171580 IP bws0666.ihep.ac.cn.afs3-callback > 202.122.35.133.afs3-fileserver:  rx version (29)
17:40:07.196307 IP acc-ap18.ihep.ac.cn.afs3-callback > 202.122.35.133.afs3-fileserver:  rx data fs call setlock fid 536887265/30/43 (48)
17:40:07.222616 IP lxslc604.ihep.ac.cn.afs3-callback > 202.122.35.133.afs3-fileserver:  rx data fs call setlock fid 536887265/30/43 (48)
17:40:07.240442 IP bws0420.ihep.ac.cn.afs3-callback > 202.122.35.133.afs3-fileserver:  rx version (29)
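
To make it easier to spot which clients only ever send "rx version" packets 
and never any "rx data" calls, a capture like the one above could be tallied 
per client. This is only a rough sketch: the summarize and 
version_only_clients helpers and the regular expression are illustrative 
assumptions based on the tcpdump lines shown here, not part of any OpenAFS 
tool, and it assumes tcpdump prints each packet on a single line.

```python
import re
from collections import Counter

# Matches tcpdump's one-line rx summary for client -> fileserver traffic, e.g.
# "17:40:07.167220 IP host.afs3-callback > server.afs3-fileserver:  rx version (29)"
# (pattern is an assumption derived from the sample lines in this mail).
LINE_RE = re.compile(
    r"^\S+ IP (?P<client>\S+)\.afs3-callback > \S+\.afs3-fileserver:\s+"
    r"rx (?P<kind>\w+)(?P<detail>.*?)\s*\(\d+\)$"
)


def summarize(lines):
    """Count rx packet kinds per client; returns {client: Counter}."""
    summary = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a client -> fileserver rx line
        summary.setdefault(m["client"], Counter())[m["kind"]] += 1
    return summary


def version_only_clients(summary):
    """Clients that sent rx version packets but no rx data packets."""
    return sorted(
        client
        for client, kinds in summary.items()
        if kinds["version"] > 0 and kinds["data"] == 0
    )


# The five packets from the capture above, one per line.
SAMPLE = [
    "17:40:07.167220 IP lhws136.ihep.ac.cn.afs3-callback > 202.122.35.133.afs3-fileserver:  rx version (29)",
    "17:40:07.171580 IP bws0666.ihep.ac.cn.afs3-callback > 202.122.35.133.afs3-fileserver:  rx version (29)",
    "17:40:07.196307 IP acc-ap18.ihep.ac.cn.afs3-callback > 202.122.35.133.afs3-fileserver:  rx data fs call setlock fid 536887265/30/43 (48)",
    "17:40:07.222616 IP lxslc604.ihep.ac.cn.afs3-callback > 202.122.35.133.afs3-fileserver:  rx data fs call setlock fid 536887265/30/43 (48)",
    "17:40:07.240442 IP bws0420.ihep.ac.cn.afs3-callback > 202.122.35.133.afs3-fileserver:  rx version (29)",
]

if __name__ == "__main__":
    for client in version_only_clients(summarize(SAMPLE)):
        print(client)
```

A client showing up in this list within a short capture is not proof of a 
problem by itself; it only narrows down which hosts to look at more closely 
with a longer capture.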


> 
> We run these packages on a lot of SL6 and SL7 systems, and the issue reported 
> here at least isn't common. We seem to have a project with a usage pattern 
> able to provoke hangs though. That has yet to be investigated. It was about 
> something like using zsh tab completion in a git repo...
> 
> 
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
