Hi,

>     Dear all,
>
>     Recently I have been stuck with some AFS issues.
>
>     The AFS client hung with the following log messages. When this
>     happens, the AFS instance blocks and jobs fail to access any files
>     located in AFS. I have to reboot the worker node to restore service.
>
>     Dec  6 15:03:18 bws0825 kernel: INFO: task afs_callback:19124 blocked for more than 120 seconds.
>     Dec  6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>     Dec  6 15:03:18 bws0825 kernel: afs_callback    D ffff9860d826e180     0 19124      2 0x00000000
>     Dec  6 15:03:18 bws0825 kernel: Call Trace:
>     Dec  6 15:03:18 bws0825 kernel: afs_callback    D ffff9860d826e180     0 19124      2 0x00000000
>     Dec  6 15:03:18 bws0825 kernel: Call Trace:
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc084dff4>] SRXAFSCB_InitCallBackState+0x34/0x470 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc0898047>] ? afs_xdr_vector+0x57/0x90 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc084f19e>] SRXAFSCB_InitCallBackState3+0xe/0x10 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08b6f43>] RXAFSCB_ExecuteRequest+0x6f3/0x8a0 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1b028ae>] ? getnstimeofday64+0xe/0x30
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08ae589>] ? afs_mutex_exit+0x29/0x40 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08a6a5d>] rxi_ServerProc+0xcd/0x1e0 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08af017>] rx_ServerProc+0x87/0xe0 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc084eedd>] afs_RXCallBackServer+0x3d/0x50 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c76a5>] afsd_thread+0x1e5/0x730 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
>     Dec  6 15:03:18 bws0825 kernel: INFO: task afs_rxevent:19127 blocked for more than 120 seconds.
>     Dec  6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>     Dec  6 15:03:18 bws0825 kernel: afs_rxevent     D ffff9860cbbf6180     0 19127      2 0x00000000
>     Dec  6 15:03:18 bws0825 kernel: Call Trace:
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1aaa2d2>] ? del_timer_sync+0x52/0x60
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08bdb58>] afs_osi_TimedSleep+0x118/0x210 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ad6b60>] ? wake_up_state+0x20/0x20
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08bdce8>] afs_osi_Wait+0x98/0xd0 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08af575>] afs_rxevent_daemon+0x95/0x140 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c7af6>] afsd_thread+0x636/0x730 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
>     Dec  6 15:03:18 bws0825 kernel: INFO: task afs_checkserver:19870 blocked for more than 120 seconds.
>     Dec  6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>     Dec  6 15:03:18 bws0825 kernel: afs_checkserver D ffff9860c7811040     0 19870      2 0x00000000
>     Dec  6 15:03:18 bws0825 kernel: Call Trace:
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1aaa2d2>] ? del_timer_sync+0x52/0x60
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08bdb58>] afs_osi_TimedSleep+0x118/0x210 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ad6b60>] ? wake_up_state+0x20/0x20
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08bdce8>] afs_osi_Wait+0x98/0xd0 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc0853b08>] afs_CheckServerDaemon+0x118/0x1a0 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c7930>] afsd_thread+0x470/0x730 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
>     Dec  6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21
>
Is there an I/O-intensive process running in the background?

Is there a process that uses too much RAM?
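
One way to check both points on the affected worker node is to scan /proc. The following is only a rough sketch, assuming a Linux /proc layout with per-process "status" files; the function names parse_status and scan_proc are made up for illustration and are not part of OpenAFS. It lists tasks in uninterruptible sleep (state D, which often indicates waiting on I/O or on a lock) and the processes with the largest resident memory.

```python
import os


def parse_status(text):
    """Parse the text of a /proc/<pid>/status file into a small dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    name = fields.get("Name", "?")
    # "State" looks like "D (disk sleep)"; keep only the one-letter code.
    state = fields.get("State", "?")[:1]
    # Kernel threads have no VmRSS line; count them as 0 kB.
    rss_kb = int(fields.get("VmRSS", "0 kB").split()[0])
    return {"name": name, "state": state, "rss_kb": rss_kb}


def scan_proc(proc_root="/proc"):
    """Return one dict per process found under proc_root."""
    procs = []
    if not os.path.isdir(proc_root):
        return procs
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                info = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        info["pid"] = int(entry)
        procs.append(info)
    return procs


if __name__ == "__main__":
    procs = scan_proc()
    print("Tasks in uninterruptible sleep (state D):")
    for p in procs:
        if p["state"] == "D":
            print(f"  {p['pid']:>7} {p['name']}")
    print("Top 5 processes by resident memory:")
    for p in sorted(procs, key=lambda p: p["rss_kb"], reverse=True)[:5]:
        print(f"  {p['pid']:>7} {p['name']:<16} {p['rss_kb']} kB")
```

Running it while the node is still responsive, and again once the hang starts, may show whether a particular job is stuck in state D or is consuming most of the memory. Tools such as top or iotop give the same kind of information interactively.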


>
>
>     Is 1.6.23 incompatible with the Linux kernel or the AFS server
>     version?
>

SL7 has kernel 3.10, which has been supported since AFS 1.6.4.

SL6 has kernel 2.6, which was already supported before AFS 1.6.

Since AFS 1.6.22.4, support for kernels up to 4.18 is included.
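
As a quick sanity check of a client version against the thresholds mentioned here, a small sketch follows. It assumes plain dotted numeric version strings and takes the version numbers from this mail at face value; the SUPPORT table and the function names are illustrative only.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.6.22.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# (kernel series, first AFS release stated in this mail to support it)
SUPPORT = [("3.10", "1.6.4"), ("4.18", "1.6.22.4")]


def client_covers(client, minimum):
    """True if the client version is at least the minimum version."""
    return parse_version(client) >= parse_version(minimum)


if __name__ == "__main__":
    client = "1.6.23"
    for kernel, minimum in SUPPORT:
        verdict = "ok" if client_covers(client, minimum) else "too old"
        print(f"kernel {kernel}: needs AFS >= {minimum}, client {client} is {verdict}")
```

By this comparison, 1.6.23 is newer than both thresholds, so the client version by itself does not look like the problem.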

>
>     Any information you can provide would be appreciated. Thanks.
>
>
>     Regards,
>     Qiulan
>
>
>     ------------------------------------------------------------------------
>     huangql
>     ====================================================================
>     Computing center,the Institute of High Energy Physics, CAS, China
>     Qiulan Huang                       Tel: (+86) 10 8823 6087
>     P.O. Box 918-7                       Fax: (+86) 10 8823 6839
>     Beijing 100049  P.R. China           Email: [email protected]
>     ===================================================================
>
