Hi all
I'm fighting a problem with an OpenIndiana 148 server and NFSv3 mounts from
Linux clients. A simple cron job moves some data files from another server to
the OI box. This runs well for a while, until at some point the client hangs
and reports an NFS server connection failure. The call trace from Linux is:
[ 484.712558] INFO: task mv:2353 blocked for more than 120 seconds.
[ 484.712562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.712566] mv D 0000000100000a8b 0 2353 2352 0x00000001
[ 484.712573] ffff880234b75ba8 0000000000000086 ffff880234b75b18 0000000000015980
[ 484.712579] ffff880234b75fd8 0000000000015980 ffff880234b75fd8 ffff8802349896e0
[ 484.712584] 0000000000015980 0000000000015980 ffff880234b75fd8 0000000000015980
[ 484.712589] Call Trace:
[ 484.712599] [<ffffffff81100d60>] ? sync_page+0x0/0x50
[ 484.712606] [<ffffffff8159e053>] io_schedule+0x73/0xc0
[ 484.712610] [<ffffffff81100d9d>] sync_page+0x3d/0x50
[ 484.712614] [<ffffffff8159e6cf>] __wait_on_bit+0x5f/0x90
[ 484.712618] [<ffffffff81100f53>] wait_on_page_bit+0x73/0x80
[ 484.712623] [<ffffffff8107f250>] ? wake_bit_function+0x0/0x40
[ 484.712628] [<ffffffff8110b975>] ? pagevec_lookup_tag+0x25/0x40
[ 484.712632] [<ffffffff8110141d>] filemap_fdatawait_range+0x10d/0x1a0
[ 484.712637] [<ffffffff811014db>] filemap_fdatawait+0x2b/0x30
[ 484.712640] [<ffffffff811017e4>] filemap_write_and_wait+0x44/0x50
[ 484.712660] [<ffffffffa038dfcc>] nfs_setattr+0x14c/0x160 [nfs]
[ 484.712666] [<ffffffff8116c55b>] notify_change+0x16b/0x310
[ 484.712671] [<ffffffff8117b15c>] utimes_common+0xdc/0x1b0
[ 484.712675] [<ffffffff8117b2d1>] do_utimes+0xa1/0xf0
[ 484.712678] [<ffffffff8117b3e3>] sys_utimensat+0x33/0x90
[ 484.712684] [<ffffffff8100a307>] tracesys+0xd9/0xde
When I strace the mv job on the client, it hangs in utimensat():
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
read(3, "\0\0\6q7\17\\\30\3L\342\0\277\2\16\355!\33\362\366\22\201\223\1h\201\16\355\22\n\227\340"..., 1048576) = 848404
write(4, "\0\0\6q7\17\\\30\3L\342\0\277\2\16\355!\33\362\366\22\201\223\1h\201\16\355\22\n\227\340"..., 848404) = 848404
read(3, "", 1048576) = 0
utimensat(4, NULL, {{1300591624, 0}, {1300508167, 0}}, 0
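For anyone who wants to try reproducing this without mv, here is a minimal sketch of the same sequence as in the strace output: copy in 1 MiB chunks, then set the timestamps on the still-open destination descriptor. The function name and the paths in the usage note are made up for illustration. I'm assuming that Python's os.utime() on a file descriptor ends up in the same utimensat(fd, NULL, ...) call on Linux, which is how glibc implements futimens() as far as I know.

```python
import os
import time

CHUNK = 1024 * 1024  # 1 MiB, the read/write size seen in the strace output


def copy_then_set_times(src, dst):
    """Copy src to dst in CHUNK-sized pieces, then set dst's atime/mtime
    on the still-open descriptor. Returns the seconds spent in the
    timestamp call, which is where the hang was observed."""
    st = os.stat(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(CHUNK)
            if not buf:
                break
            fout.write(buf)
        # Hand any buffered data to the kernel. There is no fsync here,
        # so dirty pages may still be waiting for writeback, as with mv.
        fout.flush()
        start = time.monotonic()
        os.utime(fout.fileno(), ns=(st.st_atime_ns, st.st_mtime_ns))
        return time.monotonic() - start
```

Calling something like copy_then_set_times("/data/in/file.dat", "/mnt/oi/file.dat") in a loop (both paths hypothetical) and logging the returned time might show whether the timestamp call slows down before it hangs outright. If it does hang, the call simply never returns, just like mv.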
This server has been working well for well over a year, and it normally still
does, but in this case we see repeated hangs. The affected clients hang with
100% "wio" on one core, and the only temporary fix I've found is to reboot the
client. I can't find anything in the server logs, but since the problem shows
up on both an elderly Fedora box and an updated Ubuntu 10.04.2 machine, and
since the setup worked well for quite some time, I guess the upgrade to OI may
be to blame.
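To see which tasks are stuck on a hung client before rebooting it, something along these lines could help. It is only a sketch: it walks /proc, picks out tasks in uninterruptible sleep (state "D", as for mv in the trace above), and prints the kernel symbol each is waiting in. The function names are made up, and /proc/<pid>/wchan may be unreadable or uninformative depending on kernel configuration and permissions.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first '(' and the last ')'."""
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    state = line[rpar + 2]  # single state letter after ") "
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """List (pid, comm, wchan) for every task currently in state D."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        base = os.path.join(proc, name)
        try:
            with open(os.path.join(base, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited while scanning, or unexpected format
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "wchan"), errors="replace") as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # not readable on this system
        found.append((pid, comm, wchan))
    return found
```

Printing the result of blocked_tasks() every few seconds from a second shell would show whether only the mv process is blocked or whether other tasks pile up behind it.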
Does anyone know how I can debug this further?
Best regards
roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
[email protected]
http://blogg.karlsbakk.net/
--
In all pedagogy, it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases, adequate and
relevant synonyms exist in Norwegian.
_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss