Hi all

I'm fighting a problem with an OpenIndiana 148 server and NFSv3 mounts from 
Linux clients. A simple cron job moves some data files from another server to 
the OI box. This runs well for a while, until at some point the client hangs 
and reports an NFS server connection failure. The call trace from Linux is:

[  484.712558] INFO: task mv:2353 blocked for more than 120 seconds.
[  484.712562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.712566] mv            D 0000000100000a8b     0  2353   2352 0x00000001
[  484.712573]  ffff880234b75ba8 0000000000000086 ffff880234b75b18 0000000000015980
[  484.712579]  ffff880234b75fd8 0000000000015980 ffff880234b75fd8 ffff8802349896e0
[  484.712584]  0000000000015980 0000000000015980 ffff880234b75fd8 0000000000015980
[  484.712589] Call Trace:
[  484.712599]  [<ffffffff81100d60>] ? sync_page+0x0/0x50
[  484.712606]  [<ffffffff8159e053>] io_schedule+0x73/0xc0
[  484.712610]  [<ffffffff81100d9d>] sync_page+0x3d/0x50
[  484.712614]  [<ffffffff8159e6cf>] __wait_on_bit+0x5f/0x90
[  484.712618]  [<ffffffff81100f53>] wait_on_page_bit+0x73/0x80
[  484.712623]  [<ffffffff8107f250>] ? wake_bit_function+0x0/0x40
[  484.712628]  [<ffffffff8110b975>] ? pagevec_lookup_tag+0x25/0x40
[  484.712632]  [<ffffffff8110141d>] filemap_fdatawait_range+0x10d/0x1a0
[  484.712637]  [<ffffffff811014db>] filemap_fdatawait+0x2b/0x30
[  484.712640]  [<ffffffff811017e4>] filemap_write_and_wait+0x44/0x50
[  484.712660]  [<ffffffffa038dfcc>] nfs_setattr+0x14c/0x160 [nfs]
[  484.712666]  [<ffffffff8116c55b>] notify_change+0x16b/0x310
[  484.712671]  [<ffffffff8117b15c>] utimes_common+0xdc/0x1b0
[  484.712675]  [<ffffffff8117b2d1>] do_utimes+0xa1/0xf0
[  484.712678]  [<ffffffff8117b3e3>] sys_utimensat+0x33/0x90
[  484.712684]  [<ffffffff8100a307>] tracesys+0xd9/0xde
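
To spot tasks stuck like this without waiting for the hung-task message, here is a minimal sketch that lists processes in uninterruptible sleep (state D) by reading /proc on the Linux client. The helper names parse_stat and blocked_tasks are my own, and it assumes Python 3 is available on the client.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Run in a loop while the cron job is active, this should show the mv process as soon as it blocks.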

When I strace the mv job on the client, it hangs on utimensat():

read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
read(3, "\0\0\6q7\17\\\30\3L\342\0\277\2\16\355!\33\362\366\22\201\223\1h\201\16\355\22\n\227\340"..., 1048576) = 848404
write(4, "\0\0\6q7\17\\\30\3L\342\0\277\2\16\355!\33\362\366\22\201\223\1h\201\16\355\22\n\227\340"..., 848404) = 848404
read(3, "", 1048576)                    = 0
utimensat(4, NULL, {{1300591624, 0}, {1300508167, 0}}, 0

This server has been working well for over a year, and it normally still does, 
but in this case we see repeated hangs. The affected clients hang with 100% 
"wio" on one core, and the only temporary fix I've found is to reboot the 
client. I can't find anything in the server logs, but since the problem occurs 
on both an elderly Fedora box and an updated Ubuntu 10.04.2 machine, and the 
setup had been working well for quite some time, I guess the upgrade to OI may 
be to blame.

Does anyone know how I can debug this further?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
[email protected]
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. 
It is an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.

_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss
