Hi,

> Dear all,
>
> Recently, I have been stuck with some AFS issues.
>
> The AFS client hung with the following log messages. In this case,
> the AFS instance blocked and jobs failed to access any files
> located in AFS. I have to reboot the worker node to recover service.
>
> Dec 6 15:03:18 bws0825 kernel: INFO: task afs_callback:19124 blocked for more than 120 seconds.
> Dec 6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 6 15:03:18 bws0825 kernel: afs_callback D ffff9860d826e180 0 19124 2 0x00000000
> Dec 6 15:03:18 bws0825 kernel: Call Trace:
> Dec 6 15:03:18 bws0825 kernel: afs_callback D ffff9860d826e180 0 19124 2 0x00000000
> Dec 6 15:03:18 bws0825 kernel: Call Trace:
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc084dff4>] SRXAFSCB_InitCallBackState+0x34/0x470 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc0898047>] ? afs_xdr_vector+0x57/0x90 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc084f19e>] SRXAFSCB_InitCallBackState3+0xe/0x10 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08b6f43>] RXAFSCB_ExecuteRequest+0x6f3/0x8a0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1b028ae>] ? getnstimeofday64+0xe/0x30
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08ae589>] ? afs_mutex_exit+0x29/0x40 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08a6a5d>] rxi_ServerProc+0xcd/0x1e0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08af017>] rx_ServerProc+0x87/0xe0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc084eedd>] afs_RXCallBackServer+0x3d/0x50 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c76a5>] afsd_thread+0x1e5/0x730 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: INFO: task afs_rxevent:19127 blocked for more than 120 seconds.
> Dec 6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 6 15:03:18 bws0825 kernel: afs_rxevent D ffff9860cbbf6180 0 19127 2 0x00000000
> Dec 6 15:03:18 bws0825 kernel: Call Trace:
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1aaa2d2>] ? del_timer_sync+0x52/0x60
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdb58>] afs_osi_TimedSleep+0x118/0x210 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ad6b60>] ? wake_up_state+0x20/0x20
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdce8>] afs_osi_Wait+0x98/0xd0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08af575>] afs_rxevent_daemon+0x95/0x140 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c7af6>] afsd_thread+0x636/0x730 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: INFO: task afs_checkserver:19870 blocked for more than 120 seconds.
> Dec 6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 6 15:03:18 bws0825 kernel: afs_checkserver D ffff9860c7811040 0 19870 2 0x00000000
> Dec 6 15:03:18 bws0825 kernel: Call Trace:
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1aaa2d2>] ? del_timer_sync+0x52/0x60
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdb58>] afs_osi_TimedSleep+0x118/0x210 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ad6b60>] ? wake_up_state+0x20/0x20
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdce8>] afs_osi_Wait+0x98/0xd0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc0853b08>] afs_CheckServerDaemon+0x118/0x1a0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c7930>] afsd_thread+0x470/0x730 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21

Is there an I/O-intensive process running in the background?
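One way to look into this on a node that is still responsive is to list every process sitting in uninterruptible sleep (state `D`, the same state the hung-task messages report). The following is only a minimal sketch, assuming a Linux `/proc` filesystem; the helper names `parse_stat` and `blocked_tasks` are made up for illustration and are not part of OpenAFS or any existing tool.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat record into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited during the scan, or unreadable record
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Tasks that show up here repeatedly over several runs, outside the `afs_*` threads from the log, would be candidates for the I/O load in question. A single snapshot proves little, since short `D` states are normal.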
Is there a process which uses too much RAM?

> Is 1.6.23 not compatible with the Linux kernel or the AFS
> server version?

SL7 has kernel 3.10, which is supported since AFS 1.6.4.
SL6 has kernel 2.6, which was already supported before AFS 1.6.
Since AFS 1.6.22.4, kernel support up to 4.18 is included.

> Any information you can provide would be appreciated. Thanks.
>
> Regards,
> Qiulan
>
> ------------------------------------------------------------------------
> huangql
> ====================================================================
> Computing center, the Institute of High Energy Physics, CAS, China
> Qiulan Huang                 Tel: (+86) 10 8823 6087
> P.O. Box 918-7               Fax: (+86) 10 8823 6839
> Beijing 100049 P.R. China    Email: [email protected]
> ===================================================================
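The version statements in the answer above can be checked mechanically. The sketch below is illustrative only: it encodes just the figures quoted in this thread (client 1.6.22.4 or later, kernels up to 4.18) and is not an authoritative OpenAFS support matrix. The function names and the sample kernel string `3.10.0` are assumptions chosen for the example.

```python
import re


def version_tuple(text):
    """Turn the leading dotted number of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no leading version number in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


# Figures quoted in this thread, not an official compatibility table.
CLIENT_WITH_4_18_SUPPORT = version_tuple("1.6.22.4")
NEWEST_KERNEL_COVERED = (4, 18)


def covered_by_thread_claim(client, kernel):
    """True if the client/kernel pair falls inside the range stated in the thread."""
    client_ok = version_tuple(client) >= CLIENT_WITH_4_18_SUPPORT
    # Compare only major.minor so that e.g. 4.18.5 still counts as "up to 4.18".
    kernel_ok = version_tuple(kernel)[:2] <= NEWEST_KERNEL_COVERED
    return client_ok and kernel_ok


if __name__ == "__main__":
    # Client 1.6.23 on an SL7-style 3.10 kernel.
    print(covered_by_thread_claim("1.6.23", "3.10.0"))  # prints True
```

By these numbers, 1.6.23 on a 3.10 kernel is inside the stated range, so a plain client/kernel version mismatch looks unlikely as the cause of the hang. It says nothing about server-side compatibility.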
