Re: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:

2003-10-23 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Matthew Jacob writes:

> [1]: by 'sleep', I mean if I do *my* locking right, I should be able to
> yield the processor and wait for an event (an interrupt in this case).

Not so when your device driver is entered through the devsw->strategy()
function, since that [cw]ould deadlock the entire disk-I/O system until
you return back up.

Ideally, devsw->strategy() should just queue the request and return
immediately.  In a world where context switches were free, scheduling
a task_queue to run foostart() (if necessary) would be the way to
do things.
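
A minimal sketch of the queue-and-return pattern described above, written
as a user-space C model rather than kernel code. The names foo_req,
foo_softc and foo_schedule_start() are hypothetical stand-ins invented for
illustration; a real driver would use the kernel's own request structure
and a real deferred-task mechanism instead of the start_pending flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request and per-device state; not the real kernel types. */
struct foo_req {
	int             blkno;
	struct foo_req *next;
};

struct foo_softc {
	struct foo_req *head;          /* oldest queued request */
	struct foo_req *tail;          /* newest queued request */
	int             queued;        /* requests waiting for the hardware */
	int             started;       /* requests handed to the hardware */
	int             start_pending; /* deferred foostart() already requested? */
};

/* Stand-in for scheduling a task: here we only note that a run is needed. */
static void
foo_schedule_start(struct foo_softc *sc)
{
	sc->start_pending = 1;
}

/* strategy(): append the request and return immediately; never wait here. */
static void
foostrategy(struct foo_softc *sc, struct foo_req *req)
{
	req->next = NULL;
	if (sc->tail != NULL)
		sc->tail->next = req;
	else
		sc->head = req;
	sc->tail = req;
	sc->queued++;
	if (!sc->start_pending)
		foo_schedule_start(sc);
}

/* Deferred foostart(): runs later, outside the strategy() call path. */
static int
foostart(struct foo_softc *sc)
{
	int n = 0;

	while (sc->head != NULL) {
		struct foo_req *req = sc->head;

		sc->head = req->next;
		if (sc->head == NULL)
			sc->tail = NULL;
		req->next = NULL;
		sc->queued--;
		sc->started++;	/* real code would program the hardware here */
		n++;
	}
	assert(sc->queued == 0);
	sc->start_pending = 0;
	return (n);
}
```

In this model foostrategy() does a bounded amount of work per request, and
anything that might need to wait belongs in the deferred foostart() context.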

Most drivers call their own foostart() from strategy(), and as long
as foostart() does not go on long-term vacation, this is also OK:
poking a few registers and doing a bit of BUSDMA work is acceptable.

But sleeping is not OK, mostly because a lot of sleeps may not
properly terminate in case of a memory shortage, and the way we
clear up a memory shortage is to page something out, and to page
something out we need to issue disk I/O, but somebody is holding
that hostage by sleeping in a driver...

I will concede that there is a certain small class of legitimate
sleeps that could be performed. Unfortunately we cannot automatically
tell an OK sleep from a not-OK sleep, and therefore the decision
was to ban all sleeps until such time as we had a case of something
that could not be (sensibly) done without the ability to sleep.

I realize that in this case, you're sitting below CAM and scsi_??.c
and have very little say in what happens between devsw->strategy()
and your driver.

As I read the original report, this is a case of error handling.
Considering how long error handling often takes and how imperfect
the results often are, error handling should never sleep or take
a long time in strategy(), since that eliminates the chance that
the system can flush dirty data to other devices as part of a
shutdown or panic.

At some point soon, I plan to start measuring how much time
drivers spend in their strategy() routine and any offenders
will be put on notice because this is a brilliant way to hose
our overall system performance.  I'm not going to set any
specific performance goal at this time, but I think we can
agree that more than one millisecond is way over the line.
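
One way such a measurement could be kept is sketched below as a user-space
C model. The structure strategy_stats and both functions are hypothetical
names invented for illustration, and the one-millisecond constant is only
the "way over the line" figure mentioned above, not an agreed goal. The
caller is assumed to supply entry and exit timestamps in nanoseconds from
whatever monotonic clock is available.

```c
#include <assert.h>
#include <stdint.h>

#define STRATEGY_BUDGET_NS 1000000ULL	/* one millisecond, per the text */

/* Hypothetical per-driver statistics; not an existing kernel structure. */
struct strategy_stats {
	uint64_t calls;		/* strategy() invocations recorded */
	uint64_t total_ns;	/* summed time spent in strategy() */
	uint64_t max_ns;	/* slowest single call */
	uint64_t over;		/* calls that exceeded the budget */
};

/* Record one call given its entry and exit timestamps in nanoseconds. */
static void
strategy_record(struct strategy_stats *st, uint64_t enter_ns, uint64_t exit_ns)
{
	uint64_t elapsed;

	assert(exit_ns >= enter_ns);	/* timestamps must not run backwards */
	elapsed = exit_ns - enter_ns;
	st->calls++;
	st->total_ns += elapsed;
	if (elapsed > st->max_ns)
		st->max_ns = elapsed;
	if (elapsed > STRATEGY_BUDGET_NS)
		st->over++;
}

/* A driver counts as an offender if any recorded call went over budget. */
static int
strategy_is_offender(const struct strategy_stats *st)
{
	return (st->over > 0);
}
```

Keeping the maximum alongside the total matters here: a driver that is
usually fast but occasionally stalls in error handling would look fine on
average and still be flagged by max_ns and over.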

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:

2003-10-22 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Matthew Jacob writes:

> Well, I don't agree with the design here, but it is what it is. I'll
> make the change that you've added a requirement for.

This is nothing new, but it is new that we can and do enforce it.



RE: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:

2003-10-21 Thread Matthew Jacob
So? How about some details and context?

I thought I was told that being able to use locks in HBAs is fine. I had
them on for a while, and then had them off. I turned them on again over
a month ago. I'm somewhat surprised to see that a problem shows up now.

*I* do the right thing with locks, IMO. I hold them in my module when I
enter and release them if/when I leave. Seeing a lock held by some
random caller causing me to blow up to me seems to be a hole in the
architecture, but I'd be the first to admit that I hardly am up to date
on what the rules of the road are now so such an opinion is
ill-informed.

Comment out ISP_SMPLOCK in isp_freebsd.h. If the problem goes away,
we'll make the change back again.

-matt

p.s.: you have *way* more issues here than locking- you've a bad disk.
Anyway, isn't alpha desupported?

-Original Message-
From: Kris Kennaway [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 21, 2003 9:06 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Sleeping on isp_mboxwaiting with the following
non-sleepablelocks held:


This happened to me just now on alpha:

(da0:isp0:0:0:0): Retrying Command (per Sense Data)
(da0:isp0:0:0:0): READ(06). CDB: 8 5 f 30 10 0
(da0:isp0:0:0:0): CAM Status: SCSI Status Error
(da0:isp0:0:0:0): SCSI Status: Check Condition
(da0:isp0:0:0:0): MEDIUM ERROR info:50f30 asc:11,0
(da0:isp0:0:0:0): Unrecovered read error field replaceable unit: e4
actual retry count: 257
(da0:isp0:0:0:0): Retrying Command (per Sense Data)
swap_pager: indefinite wait buffer: device: da0b, blkno: 331568, size: 8192
(da0:isp0:0:0:0): READ(06). CDB: 8 5 f 30 10 0
(da0:isp0:0:0:0): CAM Status: SCSI Status Error
(da0:isp0:0:0:0): SCSI Status: Check Condition
(da0:isp0:0:0:0): MEDIUM ERROR info:50f30 asc:11,0
(da0:isp0:0:0:0): Unrecovered read error field replaceable unit: e4
actual retry count: 257
(da0:isp0:0:0:0): Retries Exhausted
swap_pager: I/O error - pagein failed; blkno 331568,size 8192, error 5
vm_fault: pager read error, pid 90537 (aspell)
Sleeping on isp_mboxwaiting with the following non-sleepablelocks
held: exclusive sleep mutex g_xdown r = 0 (0xfe0006bfdc78) locked @
/a/asami/portbuild/alpha/src-client/sys/geom/geom_io.c:345
witness_warn
Stopped at  Debugger+0x38:  zapnot  v0,#0xf,v0  v0=0x0
db> trace
Debugger() at Debugger+0x38
witness_warn() at witness_warn+0x284
msleep() at msleep+0xa8
isp_mbox_wait_complete() at isp_mbox_wait_complete+0x94
isp_mboxcmd() at isp_mboxcmd+0x258
isp_update_bus() at isp_update_bus+0x2f0
isp_update() at isp_update+0x54
isp_start() at isp_start+0x208
isp_action() at isp_action+0x1bc
xpt_run_dev_sendq() at xpt_run_dev_sendq+0x23c
xpt_action() at xpt_action+0x2a0
dastart() at dastart+0x220
xpt_run_dev_allocq() at xpt_run_dev_allocq+0xf0
xpt_schedule() at xpt_schedule+0xc0
dastrategy() at dastrategy+0x70
g_disk_start() at g_disk_start+0x1ec
g_io_schedule_down() at g_io_schedule_down+0x234
g_down_procbody() at g_down_procbody+0x5c
fork_exit() at fork_exit+0x100
exception_return() at exception_return
--- root of call graph ---

Kris



Re: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:

2003-10-21 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Matthew Jacob writes:
> So? How about some details and context?
>
> I thought I was told that being able to use locks in HBAs is fine. I had
> them on for a while, and then had them off. I turned them on again over
> a month ago. I'm somewhat surprised to see that a problem shows up now.
>
> *I* do the right thing with locks, IMO. I hold them in my module when I
> enter and release them if/when I leave. Seeing a lock held by some
> random caller causing me to blow up to me seems to be a hole in the
> architecture, but I'd be the first to admit that I hardly am up to date
> on what the rules of the road are now so such an opinion is
> ill-informed.

The lock held in this case is not held by some random caller; it is a
mutex held specifically to expose device drivers which try to sleep
in their ->strategy() function.

You cannot sleep in the strategy() function because that would hold
up I/O, and therefore likely lead to deadlock.
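
The idea of holding a lock purely to catch sleeping drivers can be modelled
in a few lines of user-space C. This is only a sketch under stated
assumptions: thread_model, guard_lock(), model_sleep() and the other names
are hypothetical, and the real check is the one visible in the reported
trace, where witness_warn() fires from msleep() while g_xdown is held.

```c
#include <assert.h>

/* Hypothetical per-thread bookkeeping, standing in for the real checker. */
struct thread_model {
	int nonsleepable_held;	/* count of non-sleepable locks currently held */
	int sleep_warnings;	/* sleeps attempted while such a lock was held */
};

/* Caller side: take the guard lock around the call into the driver. */
static void
guard_lock(struct thread_model *td)
{
	td->nonsleepable_held++;
}

static void
guard_unlock(struct thread_model *td)
{
	assert(td->nonsleepable_held > 0);
	td->nonsleepable_held--;
}

/* Sleep primitive: returns 0 if the sleep is allowed, -1 if it is flagged. */
static int
model_sleep(struct thread_model *td)
{
	if (td->nonsleepable_held > 0) {
		td->sleep_warnings++;	/* the kernel would print a warning here */
		return (-1);
	}
	return (0);
}

/* Hypothetical driver entry point that (wrongly) sleeps in strategy(). */
static int
bad_strategy(struct thread_model *td)
{
	return (model_sleep(td));
}

/* The I/O layer calls into the driver with the guard lock held. */
static int
call_strategy(struct thread_model *td, int (*strategy)(struct thread_model *))
{
	int error;

	guard_lock(td);
	error = strategy(td);
	guard_unlock(td);
	return (error);
}
```

The driver's own locking can be perfectly correct and still trip the check,
because the test is on what the calling thread holds, not on what the
driver took.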



RE: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:

2003-10-21 Thread Matthew Jacob

Well, I don't agree with the design here, but it is what it is. I'll
make the change that you've added a requirement for. 

-Original Message-
From: Poul-Henning Kamp [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 21, 2003 2:46 PM
To: [EMAIL PROTECTED]
Cc: 'Kris Kennaway'; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Sleeping on isp_mboxwaiting with the following
non-sleepablelocks held: 


In message [EMAIL PROTECTED], Matthew Jacob writes:
> So? How about some details and context?
>
> I thought I was told that being able to use locks in HBAs is fine. I had
> them on for a while, and then had them off. I turned them on again over
> a month ago. I'm somewhat surprised to see that a problem shows up now.
>
> *I* do the right thing with locks, IMO. I hold them in my module when I
> enter and release them if/when I leave. Seeing a lock held by some
> random caller causing me to blow up to me seems to be a hole in the
> architecture, but I'd be the first to admit that I hardly am up to date
> on what the rules of the road are now so such an opinion is
> ill-informed.

The lock held in this case is not held by some random caller; it is a
mutex held specifically to expose device drivers which try to sleep
in their ->strategy() function.

You cannot sleep in the strategy() function because that would hold
up I/O, and therefore likely lead to deadlock.



Re: Sleeping on isp_mboxwaiting with the following non-sleepablelocks held:

2003-10-21 Thread Kris Kennaway
On Tue, Oct 21, 2003 at 02:30:21PM -0700, Matthew Jacob wrote:
> So? How about some details and context?

Um, what more details and context do you need?  I provided the log
of the system activity (specifically, media errors and swap read
failure) leading up to the panic, and the ddb backtrace.

> I thought I was told that being able to use locks in HBAs is fine. I had
> them on for a while, and then had them off. I turned them on again over
> a month ago. I'm somewhat surprised to see that a problem shows up now.

This was apparently triggered by the disk failure, which is not a
commonly exercised code path.

> *I* do the right thing with locks, IMO. I hold them in my module when I
> enter and release them if/when I leave. Seeing a lock held by some
> random caller causing me to blow up to me seems to be a hole in the
> architecture, but I'd be the first to admit that I hardly am up to date
> on what the rules of the road are now so such an opinion is
> ill-informed.
>
> Comment out ISP_SMPLOCK in isp_freebsd.h. If the problem goes away,
> we'll make the change back again.

I'll do what I can.

> -matt
>
> p.s.: you have *way* more issues here than locking- you've a bad disk.

I know, but the system shouldn't blow up with a lock assertion in this
failure mode.

> Anyway, isn't alpha desupported?

No.

Kris

