Re: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:
In message [EMAIL PROTECTED], Matthew Jacob writes:
> [1]: by 'sleep', I mean if I do *my* locking right, I should be able to yield the processor and wait for an event (an interrupt in this case).

Not so when your device driver is entered through the devsw->strategy() function, since that [cw]ould deadlock the entire disk-I/O system until you return back up.

Ideally, devsw->strategy() should just queue the request and return immediately. In a world where context switches were free, scheduling a task_queue to run foostart() (if necessary) would be the way to do things.

Most drivers call their own foostart() from strategy(), and as long as foostart() does not go on long-term vacation, this is also OK: poking a few registers or doing a bit of BUSDMA work is acceptable.

But sleeping is not OK, mostly because a lot of sleeps may not properly terminate in case of a memory shortage. The way we clear up a memory shortage is to page something out, and to page something out we need to issue disk I/O, but somebody is holding that hostage by sleeping in a driver...

I will concede that there is a certain small class of legitimate sleeps that could be performed. Unfortunately we cannot automatically tell an OK sleep from a not-OK sleep, and therefore the decision was to ban all sleeps until such time as we had a case of something that could not be (sensibly) done without the ability to sleep.

I realize that in this case, you're sitting below CAM and scsi_??.c and have very little say in what happens between devsw->strategy() and your driver.

As I read the original report, this is a case of error handling. Considering how long error handling often takes and how imperfect the results are a lot of the time, error handling in strategy() should never sleep or take a long time, since that eliminates the chance that the system can flush dirty data to other devices as part of a shutdown or panic.

At some point soon, I plan to start measuring how much time drivers spend in their strategy() routine and any offenders will be put on notice because this is a brilliant way to hose our overall system performance. I'm not going to set any specific performance goal at this time, but I think we can agree that more than one millisecond is way over the line.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]
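
The "queue the request and return immediately" advice in the message above can be illustrated with a small userspace sketch. This is not the isp(4) driver or any real FreeBSD code: `struct req`, `struct softc`, `foo_strategy()`, `foo_start()` and `foo_intr()` are hypothetical names, the `busy` flag stands in for real hardware state, and locking is left out entirely. It only shows the shape: strategy() enqueues and kicks the hardware without ever waiting, and the completion path starts the next request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request type; stands in for an I/O request. */
struct req {
    int blkno;
    int done;
    struct req *next;
};

/* Hypothetical per-device state: a FIFO of pending requests. */
struct softc {
    struct req *head;
    struct req *tail;
    int busy;       /* nonzero while the "hardware" owns a request */
    int started;    /* how many requests were handed to the hardware */
};

/* Hand the head of the queue to the hardware if it is idle. Never waits. */
static void foo_start(struct softc *sc)
{
    if (sc->busy || sc->head == NULL)
        return;
    sc->busy = 1;
    sc->started++;
}

/* strategy(): append the request, kick the hardware, return at once. */
static void foo_strategy(struct softc *sc, struct req *r)
{
    r->done = 0;
    r->next = NULL;
    if (sc->tail != NULL)
        sc->tail->next = r;
    else
        sc->head = r;
    sc->tail = r;
    foo_start(sc);
}

/* Completion path (an interrupt handler in a real driver). */
static void foo_intr(struct softc *sc)
{
    struct req *r = sc->head;

    if (!sc->busy || r == NULL)
        return;
    sc->head = r->next;
    if (sc->head == NULL)
        sc->tail = NULL;
    r->done = 1;
    sc->busy = 0;
    foo_start(sc);  /* start the next queued request, if any */
}
```

In this sketch a second call to `foo_strategy()` while the device is busy only appends to the queue; the waiting happens implicitly, by returning to the caller, rather than by sleeping inside the driver.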
Re: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:
In message [EMAIL PROTECTED], Matthew Jacob writes:
> Well, I don't agree with the design here, but it is what it is. I'll make the change that you've added a requirement for.

This is nothing new, but it is new that we can and do enforce it.

--
Poul-Henning Kamp
RE: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:
So? How about some details and context?

I thought I was told that being able to use locks in HBAs is fine. I had them on for a while, and then had them off. I turned them on again over a month ago. I'm somewhat surprised to see that a problem shows up now.

*I* do the right thing with locks, IMO. I hold them in my module when I enter and release them if/when I leave. Seeing a lock held by some random caller cause me to blow up seems, to me, to be a hole in the architecture, but I'd be the first to admit that I am hardly up to date on what the rules of the road are now, so such an opinion is ill-informed.

Comment out ISP_SMPLOCK in isp_freebsd.h. If the problem goes away, we'll make the change back again.

-matt

p.s.: you have *way* more issues here than locking- you've a bad disk. Anyway, isn't alpha desupported?

-----Original Message-----
From: Kris Kennaway [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 21, 2003 9:06 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:

This happened to me just now on alpha:

(da0:isp0:0:0:0): Retrying Command (per Sense Data)
(da0:isp0:0:0:0): READ(06). CDB: 8 5 f 30 10 0
(da0:isp0:0:0:0): CAM Status: SCSI Status Error
(da0:isp0:0:0:0): SCSI Status: Check Condition
(da0:isp0:0:0:0): MEDIUM ERROR info:50f30 asc:11,0
(da0:isp0:0:0:0): Unrecovered read error field replaceable unit: e4 actual retry count: 257
(da0:isp0:0:0:0): Retrying Command (per Sense Data)
swap_pager: indefinite wait buffer: device: da0b, blkno: 331568, size: 8192
(da0:isp0:0:0:0): READ(06). CDB: 8 5 f 30 10 0
(da0:isp0:0:0:0): CAM Status: SCSI Status Error
(da0:isp0:0:0:0): SCSI Status: Check Condition
(da0:isp0:0:0:0): MEDIUM ERROR info:50f30 asc:11,0
(da0:isp0:0:0:0): Unrecovered read error field replaceable unit: e4 actual retry count: 257
(da0:isp0:0:0:0): Retries Exhausted
swap_pager: I/O error - pagein failed; blkno 331568,size 8192, error 5
vm_fault: pager read error, pid 90537 (aspell)
Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:
exclusive sleep mutex g_xdown r = 0 (0xfe0006bfdc78) locked @ /a/asami/portbuild/alpha/src-client/sys/geom/geom_io.c:345
witness_warn
Stopped at Debugger+0x38: zapnot v0,#0xf,v0 v0=0x0
db> trace
Debugger() at Debugger+0x38
witness_warn() at witness_warn+0x284
msleep() at msleep+0xa8
isp_mbox_wait_complete() at isp_mbox_wait_complete+0x94
isp_mboxcmd() at isp_mboxcmd+0x258
isp_update_bus() at isp_update_bus+0x2f0
isp_update() at isp_update+0x54
isp_start() at isp_start+0x208
isp_action() at isp_action+0x1bc
xpt_run_dev_sendq() at xpt_run_dev_sendq+0x23c
xpt_action() at xpt_action+0x2a0
dastart() at dastart+0x220
xpt_run_dev_allocq() at xpt_run_dev_allocq+0xf0
xpt_schedule() at xpt_schedule+0xc0
dastrategy() at dastrategy+0x70
g_disk_start() at g_disk_start+0x1ec
g_io_schedule_down() at g_io_schedule_down+0x234
g_down_procbody() at g_down_procbody+0x5c
fork_exit() at fork_exit+0x100
exception_return() at exception_return
--- root of call graph ---

Kris
Re: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:
In message [EMAIL PROTECTED], Matthew Jacob writes:
> So? How about some details and context?
>
> I thought I was told that being able to use locks in HBAs is fine. I had them on for a while, and then had them off. I turned them on again over a month ago. I'm somewhat surprised to see that a problem shows up now.
>
> *I* do the right thing with locks, IMO. I hold them in my module when I enter and release them if/when I leave. Seeing a lock held by some random caller cause me to blow up seems, to me, to be a hole in the architecture, but I'd be the first to admit that I am hardly up to date on what the rules of the road are now, so such an opinion is ill-informed.

The lock held in this case does not belong to some random caller: it is a mutex held specifically to expose device drivers which try to sleep in their ->strategy() function.

You cannot sleep in the strategy() function because that would hold up I/O, and therefore likely lead to deadlock.

--
Poul-Henning Kamp
RE: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:
Well, I don't agree with the design here, but it is what it is. I'll make the change that you've added a requirement for.

-----Original Message-----
From: Poul-Henning Kamp [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 21, 2003 2:46 PM
To: [EMAIL PROTECTED]
Cc: 'Kris Kennaway'; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:
Re: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:
On Tue, Oct 21, 2003 at 02:30:21PM -0700, Matthew Jacob wrote:
> So? How about some details and context?

Um, what more details and context do you need? I provided the log of the system activity (specifically, media errors and swap read failure) leading up to the panic, and the ddb backtrace.

> I thought I was told that being able to use locks in HBAs is fine. I had them on for a while, and then had them off. I turned them on again over a month ago. I'm somewhat surprised to see that a problem shows up now.

This was apparently triggered by the disk failure, which is not a commonly exercised code path.

> *I* do the right thing with locks, IMO. I hold them in my module when I enter and release them if/when I leave. Seeing a lock held by some random caller cause me to blow up seems, to me, to be a hole in the architecture, but I'd be the first to admit that I am hardly up to date on what the rules of the road are now, so such an opinion is ill-informed.
>
> Comment out ISP_SMPLOCK in isp_freebsd.h. If the problem goes away, we'll make the change back again.

I'll do what I can.

> -matt
>
> p.s.: you have *way* more issues here than locking- you've a bad disk.

I know, but the system shouldn't blow up with a lock assertion in this failure mode.

> Anyway, isn't alpha desupported?

No.

Kris