Brent Jones wrote:
On Fri, Jun 5, 2009 at 3:25 PM, Ian Collins <i...@ianshome.com> wrote:
Brent Jones wrote:
On the sending side, I CAN kill the ZFS send process, but the remote
side leaves its processes going, and I CANNOT kill -9 them. I also
cannot reboot the receiving system; at init 6, the system will just
hang trying to unmount the file systems.
I have to physically cut power to the server, but a couple days later,
this issue will occur again.


I have seen this on Solaris 10.  Something appears to break with a pool or
filesystem causing zfs receive to hang in the kernel.  Once this happens,
any zfs command that changes the state of the pool/filesystem will hang,
including a zpool detach or an init 6.

Can you get truss -p or mdb -p to work on the stuck process?

I cannot.

# truss -p 11308
truss: unanticipated system error: 11308
(r...@pdxfilu02)-(06:29 PM Fri Jun 05)-(log)
# mdb -p 11308
mdb: cannot debug 11308: unanticipated system error
mdb: failed to initialize target: No such file or directory

Same as me...
All the hung zfs receive processes have '1' as their PPID.
Is it safe to truss PID 1?  :)

When you saw this, how did you escape it? I've found only pulling the
plug will fix it.

I'm several miles away from the boxes, so I had to resort to a hard reset through the ILOM.

I have yet to identify the root cause; all I know is that the problem happens "sometimes". Over the past few days, I have sent several tens of thousands of snapshots to the last system that hung, without incident.

--
Ian.
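
The PPID observation above (hung zfs receive processes ending up with PID 1 as their parent) can be checked with a small script. This is only a sketch: the function name orphaned_receives and the "zfs receive" match string are illustrative choices, and it assumes a ps that accepts -e and -o pid,ppid,args. It only lists candidate processes; it does nothing to unstick a receive that is hung in the kernel.

```python
import subprocess


def orphaned_receives(ps_output, needle="zfs receive"):
    """Return (pid, args) for processes whose parent is PID 1 and whose
    command line contains `needle`. The name and default are illustrative."""
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)         # pid, ppid, rest of the command line
        if len(fields) < 3:
            continue
        pid, ppid, args = fields
        if not (pid.isdigit() and ppid.isdigit()):
            continue
        if int(ppid) == 1 and needle in args:
            hits.append((int(pid), args.strip()))
    return hits


if __name__ == "__main__":
    # Assumption: ps supports -e and -o pid,ppid,args on this system.
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, args in orphaned_receives(out):
        print(pid, args)
```

Run from cron on the receiving host, something like this could flag a stuck receive before the next send piles up behind it.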

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
