Brent Jones wrote:
On Fri, Jun 5, 2009 at 3:25 PM, Ian Collins <i...@ianshome.com> wrote:
Brent Jones wrote:
On the sending side, I CAN kill the ZFS send process, but the remote
side leaves its processes going, and I CANNOT kill -9 them. I also
cannot reboot the receiving system: at init 6, the system just
hangs trying to unmount the file systems.
I have to physically cut power to the server, but a couple of days later,
the issue occurs again.
I have seen this on Solaris 10. Something appears to break with a pool or
filesystem, causing zfs receive to hang in the kernel. Once this happens,
any zfs command that changes the state of the pool/filesystem will hang,
including a zpool detach or an init 6.
Can you get truss -p or mdb -p to work on the stuck process?
I cannot.
# truss -p 11308
truss: unanticipated system error: 11308
(r...@pdxfilu02)-(06:29 PM Fri Jun 05)-(log)
# mdb -p 11308
mdb: cannot debug 11308: unanticipated system error
mdb: failed to initialize target: No such file or directory
Same as me...
All the hung zfs receive PIDs have '1' as their PPID.
Is it safe to truss PID 1? :)
When you saw this, how did you escape it? I've found only pulling the
plug will fix it.
I'm several miles away from the boxes, so I had to resort to a hard
reset through the ILOM.
I have yet to identify the root cause; all I know is that the problem happens
"sometimes". Over the past few days I have sent several tens of thousands of
snapshots to the last system that hung, without incident.
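
For anyone wanting to check for the same symptom (zfs receive processes
reparented to PID 1), here is a rough sketch of a filter over ps output.
The function names are made up for illustration, and it assumes the command
shows up in the process table as "zfs receive" or "zfs recv". It only lists
candidates; it cannot tell a hung receive from a healthy orphaned one.

```shell
# Hypothetical helper: read "PID PPID ARGS..." lines on stdin and print the
# PIDs of zfs receive/recv processes whose parent is init (PPID 1).
orphaned_zfs_recv() {
    awk '$2 == 1 &&
         ($3 == "zfs" || $3 ~ /\/zfs$/) &&
         ($4 == "receive" || $4 == "recv") { print $1 }'
}

# Hypothetical wrapper: feed the live process table through the filter.
# Uses POSIX ps options; the empty "=" headers suppress the header line.
list_orphaned_zfs_recv() {
    ps -e -o pid= -o ppid= -o args= | orphaned_zfs_recv
}
```

Running list_orphaned_zfs_recv on the receiving host prints one PID per
line, or nothing if no receive has been reparented to init.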
--
Ian.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss