And some more info: I've found somebody far more knowledgeable than I am hitting exactly the same problem in April this year:
http://web.ivy.net/~carton/oneNightOfWork/20061119-carton.html

Check out the update at the bottom dated 2007-04-03. I quote:

"10.100.100.135 is the iSCSI target. When it's down, connect() from the Solaris initiator will take a while to time out. I added its address as an alias on some other box's interface, so Solaris would get a TCP reset immediately. Now zpool status is fast again"

and

"error handling is the most important part of any RAID implementation. In this case, among the more obvious and immediately inconvenient problems we have a fundamentally serious one: iSCSI's not returning errors fast enough is pushing us up against a timeout in the svc subsystem, so one broken disk can potentially cascade into breaking a huge swath of the SVM subsystem."
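For anyone who wants to see the difference between the two failure modes described in that quote, here is a rough Python sketch of my own (it is not from the linked page and has nothing to do with how the Solaris initiator is actually implemented). It just times a plain TCP connect(). The `probe` helper, the 5 second timeout and the use of port 3260 (the registered iSCSI port) are my assumptions for illustration. A host that is up but has nothing listening answers with a reset and fails almost instantly, whereas an address nobody answers for leaves connect() waiting until the timeout expires.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Attempt a TCP connect and return (outcome, seconds_elapsed).

    Hypothetical helper for illustration only; the timeout value is an
    arbitrary choice, not anything taken from the iSCSI initiator.
    """
    start = time.monotonic()
    try:
        # create_connection() performs the connect() with the given timeout.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # The peer answered with a TCP reset: the failure is reported at once.
        outcome = "refused"
    except socket.timeout:
        # Nobody answered at all: we waited out the full timeout.
        outcome = "timed out"
    except OSError as exc:
        # Anything else, e.g. "no route to host".
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    # Assumption: 127.0.0.1 stands in for the target address here.
    # Substitute the address of your own iSCSI target to compare timings.
    outcome, elapsed = probe("127.0.0.1", 3260)
    print(f"{outcome} after {elapsed:.3f}s")
```

Run against a powered-off target, this should sit there for the whole timeout. Run against a live box with no listener on that port, it should come back as "refused" straight away, which is the effect the alias trick in the quote relies on.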
