Dear all,

Recently I have been running into an AFS problem.

The AFS client hung with the following log messages. When this happens, AFS on
the node is blocked and jobs cannot access any files located in AFS. I have to
reboot the worker node to restore service.

Dec  6 15:03:18 bws0825 kernel: INFO: task afs_callback:19124 blocked for more 
than 120 seconds.
Dec  6 15:03:18 bws0825 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  6 15:03:18 bws0825 kernel: afs_callback    D ffff9860d826e180     0 19124  
    2 0x00000000
Dec  6 15:03:18 bws0825 kernel: Call Trace:
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] 
schedule_preempt_disabled+0x29/0x70
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] 
__mutex_lock_slowpath+0xc7/0x1d0
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc084dff4>] 
SRXAFSCB_InitCallBackState+0x34/0x470 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc0898047>] ? afs_xdr_vector+0x57/0x90 
[openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc084f19e>] 
SRXAFSCB_InitCallBackState3+0xe/0x10 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08b6f43>] 
RXAFSCB_ExecuteRequest+0x6f3/0x8a0 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1b028ae>] ? getnstimeofday64+0xe/0x30
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08ae589>] ? afs_mutex_exit+0x29/0x40 
[openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08a6a5d>] rxi_ServerProc+0xcd/0x1e0 
[openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? 
afs_shutdown_pagecopy+0x20/0x20 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08af017>] rx_ServerProc+0x87/0xe0 
[openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc084eedd>] 
afs_RXCallBackServer+0x3d/0x50 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c76a5>] afsd_thread+0x1e5/0x730 
[openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? 
afs_shutdown_pagecopy+0x20/0x20 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? 
insert_kthread_work+0x40/0x40
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] 
ret_from_fork_nospec_begin+0x7/0x21
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? 
insert_kthread_work+0x40/0x40
Dec  6 15:03:18 bws0825 kernel: INFO: task afs_rxevent:19127 blocked for more 
than 120 seconds.
Dec  6 15:03:18 bws0825 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  6 15:03:18 bws0825 kernel: afs_rxevent     D ffff9860cbbf6180     0 19127  
    2 0x00000000
Dec  6 15:03:18 bws0825 kernel: Call Trace:
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1aaa2d2>] ? del_timer_sync+0x52/0x60
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] 
schedule_preempt_disabled+0x29/0x70
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] 
__mutex_lock_slowpath+0xc7/0x1d0
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08bdb58>] 
afs_osi_TimedSleep+0x118/0x210 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ad6b60>] ? wake_up_state+0x20/0x20
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08bdce8>] afs_osi_Wait+0x98/0xd0 
[openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? 
afs_shutdown_pagecopy+0x20/0x20 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08af575>] 
afs_rxevent_daemon+0x95/0x140 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c7af6>] afsd_thread+0x636/0x730 
[openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? 
afs_shutdown_pagecopy+0x20/0x20 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? 
insert_kthread_work+0x40/0x40
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] 
ret_from_fork_nospec_begin+0x7/0x21
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? 
insert_kthread_work+0x40/0x40
Dec  6 15:03:18 bws0825 kernel: INFO: task afs_checkserver:19870 blocked for 
more than 120 seconds.
Dec  6 15:03:18 bws0825 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  6 15:03:18 bws0825 kernel: afs_checkserver D ffff9860c7811040     0 19870  
    2 0x00000000
Dec  6 15:03:18 bws0825 kernel: Call Trace:
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1aaa2d2>] ? del_timer_sync+0x52/0x60
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] 
schedule_preempt_disabled+0x29/0x70
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] 
__mutex_lock_slowpath+0xc7/0x1d0
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08bdb58>] 
afs_osi_TimedSleep+0x118/0x210 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ad6b60>] ? wake_up_state+0x20/0x20
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08bdce8>] afs_osi_Wait+0x98/0xd0 
[openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc0853b08>] 
afs_CheckServerDaemon+0x118/0x1a0 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? 
afs_shutdown_pagecopy+0x20/0x20 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c7930>] afsd_thread+0x470/0x730 
[openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? 
afs_shutdown_pagecopy+0x20/0x20 [openafs]
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? 
insert_kthread_work+0x40/0x40
Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] 
ret_from_fork_nospec_begin+0x7/0x21
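
For anyone who wants to condense traces like the ones above, here is a minimal sketch in Python. It lists each task reported as blocked together with the first reliable stack frame (one not marked with "?") inside the openafs module. The function name summarize_hung_tasks and the regular expressions are illustrative assumptions based only on the log format shown here; they are not part of OpenAFS or of any kernel tool.

```python
import re

# Hung-task header, e.g. "INFO: task afs_callback:19124 blocked for more"
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for")

# One stack frame inside the openafs module, e.g.
# "[<ffffffffc084dff4>] SRXAFSCB_InitCallBackState+0x34/0x470 [openafs]"
# Group 1 captures the optional "? " marker that flags an unreliable frame.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[openafs\]"
)


def summarize_hung_tasks(log_text):
    """Return a list of (task, pid, first_reliable_openafs_symbol) tuples."""
    # Mail clients wrap long syslog lines, so collapse all whitespace first;
    # a frame split across two lines then matches as a single frame.
    flat = " ".join(log_text.split())
    headers = list(HUNG_RE.finditer(flat))
    results = []
    for i, header in enumerate(headers):
        # A report runs from its header to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(flat)
        section = flat[header.end():end]
        symbol = None
        for frame in FRAME_RE.finditer(section):
            if frame.group(1) is None:  # skip frames marked with "?"
                symbol = frame.group(2)
                break
        results.append((header.group(1), int(header.group(2)), symbol))
    return results
```

Calling summarize_hung_tasks() on the text of /var/log/messages would return one tuple per hung-task report, which makes it easier to see whether the blocked tasks are all waiting in the same place.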


The OpenAFS packages installed in the AFS cluster at IHEP are listed below.

AFS client (SL7):

[root@bws0825 ~]# rpm -qa|grep openafs
openafs-1.6-sl-client-1.6.23-289.sl7.x86_64
openafs-1.6-sl-authlibs-1.6.23-289.sl7.x86_64
openafs-1.6-sl-devel-1.6.23-289.sl7.x86_64
openafs-1.6-sl-module-tools-1.6.23-289.sl7.x86_64
openafs-1.6-sl-krb5-1.6.23-289.sl7.x86_64
openafs-1.6-sl-1.6.23-289.sl7.x86_64
openafs-1.6-sl-authlibs-devel-1.6.23-289.sl7.x86_64
kmod-openafs-1.6-sl-1.6.23-1.SL76.el7.noarch
kmod-openafs-1.6-sl-957-1.6.23-289.sl7.957.x86_64

AFS client (SL6):
openafs-krb5-1.6.23-289.sl6.x86_64
openafs-client-1.6.23-289.sl6.x86_64
openafs-1.6.23-289.sl6.x86_64
openafs-kpasswd-1.6.23-289.sl6.x86_64
openafs-module-tools-1.6.23-289.sl6.x86_64
openafs-kernel-source-1.6.23-289.sl6.x86_64
openafs-firstboot-1.6-1.sl6.noarch
openafs-authlibs-1.6.23-289.sl6.x86_64
kmod-openafs-696-1.6.20-257.sl6.696.x86_64
kmod-openafs-1.6.22.3-1.SL610.el6.noarch
openafs-compat-1.6.23-289.sl6.x86_64

AFS server side:

openafs-authlibs-1.6.20-256.sl6.x86_64
openafs-server-1.6.11-1.el6.x86_64
openafs-kernel-source-1.6.11-1.el6.x86_64
openafs-docs-1.6.11-1.el6.x86_64
openafs-authlibs-devel-1.6.20-256.sl6.x86_64
openafs-1.6.11-1.el6.x86_64
openafs-compat-1.6.11-1.el6.x86_64
openafs-devel-1.6.20-256.sl6.x86_64
openafs-client-1.6.11-1.el6.x86_64
kmod-openafs-1.6.11-1.2.6.32_504.3.3.el6.x86_64
openafs-krb5-1.6.11-1.el6.x86_64


Is OpenAFS 1.6.23 incompatible with this Linux kernel or with the AFS server
version?

Any information you can provide would be appreciated. Thanks.


Regards,
Qiulan

====================================================================
Computing Center, Institute of High Energy Physics, CAS, China
Qiulan Huang                       Tel: (+86) 10 8823 6087
P.O. Box 918-7                       Fax: (+86) 10 8823 6839
Beijing 100049  P.R. China           Email: huan...@ihep.ac.cn
===================================================================
