Re: Linux Cluster using shared scsi
Doug Ledford wrote:
> ...
>
> If told to hold a reservation, then resend your reservation request once
> every 2 seconds (this actually has very minimal CPU/BUS usage and isn't as
> big a deal as requesting a reservation every 2 seconds might sound). The
> first time the reservation is refused, consider the reservation stolen by
> another machine and exit (or optionally, reboot).

I agree that the resend of the reservation is not all that big, but there is also the proverbial straw that broke the camel's back. When there is enough activity, logic could be added to avoid sending reservations. In all cases, when a reservation is forcefully removed, the device will return a UNIT ATTENTION (well, I guess I know that is the behavior on parallel SCSI; is this true for FC?). So the host should know, with the next command issued, that it lost the reservation (not necessarily that someone else has stolen it, but that for some reason the device just lost it).

So you could check whether, within the last 2 seconds, (a) an IO has completed and (b) every IO that completed in that 2-second span completed without any "error". By "error" I mean without incident, such as a check condition. In that case the reservation resend is not needed, as you know nothing has happened to cause the reservation to be lost. Perhaps in this heavy-load situation you could even add logic to reissue the reservation as soon as the mid-layer is aware that the reservation was broken, maybe saving a second or so?

I see this as an enhancement that could be added later; perhaps keep it in mind so that your initial development does not make it more difficult to implement.

Eddie

- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]
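[Editor's sketch] Eddie's heuristic above — skip the periodic resend when clean IO within the resend window already proves the reservation is intact — can be modeled as a small decision function. This is an illustrative simulation only; the function name and the `(completion_time, ok)` event shape are assumptions, not part of Doug's package.

```python
RESEND_INTERVAL = 2.0  # seconds between reservation resends

def needs_resend(now, last_resend, recent_ios):
    """Decide whether the hold loop must resend the SCSI reservation.

    recent_ios: list of (completion_time, ok) tuples for IOs on this
    device. If at least one IO completed inside the resend window and
    *every* IO in that window completed without a check condition, the
    reservation is demonstrably still held and the resend can be
    skipped; otherwise resend as usual.
    """
    if now - last_resend < RESEND_INTERVAL:
        return False  # not due yet
    window = [ok for (t, ok) in recent_ios if now - t <= RESEND_INTERVAL]
    if window and all(window):
        return False  # clean IO in the window proves the reservation holds
    return True
```

A UNIT ATTENTION or check condition would show up as `ok=False`, forcing the resend (or an immediate re-reservation, per Eddie's second suggestion).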
Re: Linux Cluster using shared scsi
Pavel Machek writes:
> Hi!
>
> > > > ...
> > > >
> > > > If told to hold a reservation, then resend your reservation request
> > > > once every 2 seconds (this actually has very minimal CPU/BUS usage
> > > > and isn't as big a deal as requesting a reservation every 2 seconds
> > > > might sound). The first time the reservation is refused, consider the
> > > > reservation stolen by another machine and exit (or optionally,
> > > > reboot).
> > >
> > > Umm. Reboot? What do you think this is? Windoze?
> >
> > It's the *only* way to guarantee that the drive is never touched by more
> > than one machine at a time (notice, I've not been talking about a shared
> > use drive; only one machine in the cluster "owns" the drive at a time,
> > and it isn't for single transactions that it owns the drive, it owns the
> > drive for as long as it is alive, this is a limitation of the filesystems
> > currently available in mainstream kernels). The reservation conflict and
> > subsequent reboot also *only* happen when a reservation has been
> > forcefully stolen from a machine.
>
> I do not believe reboot from the kernel is the right approach. Tell init
> with a special signal, maybe; but do not reboot forcefully. This is
> policy; doing a reboot might be the right answer in 90% of cases; that
> does not mean you should do it always. ...

However distasteful it sounds, there is precedent for the behavior that Doug is proposing in commercial clustering implementations. My recollection is that both Compaq TruCluster and HP Service Guard have logic that will panic the kernel when a disk is "stolen" from under a running service and there is a "network partition" in the cluster. A network partition occurs when multiple machines in the cluster are running, but the HA software agents on two nodes can't communicate via the network to arbitrate which node should be the owner of the disk.

-Eric

--
Eric Z. Ayers            Lead Software Engineer
Computer Generation, Incorporated    an Intec Telecom Systems Company
Phone: +1 404-705-2864   Fax: +1 404-705-2805
Web: http://www.intec-telecom-systems.com/
Email: [EMAIL PROTECTED]
Postal: Bldg G 4th Floor, 5775 Peachtree-Dunwoody Rd, Atlanta, GA 30342 USA
Re: Linux Cluster using shared scsi
There is another nasty in multi-port arrays that I should perhaps point out: a bus reset isn't supposed to drop the reservation if it was taken on another port. A device or LUN reset, on the other hand, will drop reservations on all ports. This behaviour, although clearly mandated by SCSI-3 SPC, is rather patchily implemented in arrays, and I have seen some multi-port arrays that will, illegally, drop reservations on all ports on receipt of a bus reset.

Unfortunately, most Linux SCSI drivers won't issue device resets on command; they'll only issue bus resets. So it is possible to get into a situation where you cannot break a reservation belonging to a dead machine, if you set up a point-to-point cluster rather than a true shared-scsi one.

James
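[Editor's sketch] The spec-mandated behaviour James describes — a bus reset only clears a reservation taken through the port that saw the reset, while a device/LUN reset clears it on every port — can be captured in a toy model. This is purely illustrative (a hypothetical class, not a driver API), assuming a single-reservation SCSI-2 style device.

```python
class MultiPortArray:
    """Toy model of reset semantics on a multi-port array per SCSI-3 SPC.

    A reservation is held "through" one port. A bus reset arriving on a
    port clears the reservation only if it was taken on that same port;
    a device (or LUN) reset clears it regardless of port. Arrays that
    drop the reservation on a bus reset from *any* port are, as James
    notes, out of spec.
    """
    def __init__(self):
        self.reservation_port = None  # port the reservation was taken on

    def reserve(self, port):
        if self.reservation_port is None:
            self.reservation_port = port
            return True
        return False  # reservation conflict

    def bus_reset(self, port):
        # Only drops the reservation if it was taken on this port.
        if self.reservation_port == port:
            self.reservation_port = None

    def device_reset(self):
        # Drops the reservation no matter which port holds it.
        self.reservation_port = None
```

The failure mode James warns about falls out directly: if the dead machine's reservation sits on port 0 and you can only issue bus resets on port 1, `bus_reset(1)` never frees the drive.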
Re: Linux Cluster using shared scsi
Max TenEyck Woodbury wrote:
> Doug Ledford wrote:
> >
> > Max TenEyck Woodbury wrote:
> > >
> > > Umm. Reboot? What do you think this is? Windoze?
> >
> > It's the *only* way to guarantee that the drive is never touched by more
> > than one machine at a time (notice, I've not been talking about a shared
> > use drive; only one machine in the cluster "owns" the drive at a time,
> > and it isn't for single transactions that it owns the drive, it owns
> > the drive for as long as it is alive, this is a limitation of the
> > filesystems currently available in mainstream kernels). The reservation
> > conflict and subsequent reboot also *only* happen when a reservation
> > has been forcefully stolen from a machine. In that case, you are talking
> > about a machine that has been failed over against its will, and the
> > absolute safest thing to do in order to make sure the failed-over
> > machine doesn't screw the whole cluster up is to make it reboot itself
> > and re-enter the cluster as a new failover slave node instead of as a
> > master node. If you want a shared common-access device with
> > write-locking semantics, as you seem to be suggesting later on, then you
> > need a different method of locking than what I put in this; I knew that
> > as I wrote it, and that was intentional.
>
> That was partly meant to be a joke, but it was also meant to make you stop
> and think about what you are doing. From what little context I read, you
> seem to be looking for a high availability solution. Rebooting a system,
> even if there is a hot backup, should only be used as a last resort.

This is something that only happens when a machine has been forcefully failed over against its will. I guess you would need to see the code to tell what I'm talking about, but in the description I gave of the code, if it doesn't get a reservation, it exits. The way the code is intended to be used is something like this: given machine A as cluster master and machine B as a cluster slave.
Machine A starts the reservation program with something like this as the command line:

    reserve --reserve --hold /dev/sdc

This will result in the program grabbing a reservation on drive sdc (or exiting with a non-0 status on failure) and then sitting in a loop where it re-issues the reservation every 2 seconds. Under normal operation, the reserve program is not started at all on machine B. However, machine B does use the normal heartbeat method (be that the heartbeat package or something similar, but not reservations) to check that machine A is still alive.

Given a failure in the communications between machine B and machine A, which would typically mean it is time to fail over the cluster, machine B can test the status of machine A by throwing a reset to the drive to break any existing reservations, waiting 4 seconds, then trying to take its own reservation. This can be accomplished with the command:

    reserve --reset --reserve --hold /dev/sdc

If the program fails to get the reservation, then that means machine A was able to resend its reservation. Obviously then, machine A isn't dead. Machine B can then decide that the heartbeat link is dead but machine A is still fine and not try any further failover actions, or it could decide that machine A has a working reserve program but key services or network connectivity may be dead, in which case a forced failover would be in order. To accomplish that, machine B can issue this command:

    reserve --preempt --hold /dev/sdc

This will break machine A's reservation and take the drive over from machine A. It's at this point, and this point only, that machine A will see a reservation conflict.
It has been forcefully failed over, so resetting/rebooting the machine is a perfectly fine alternative (and the reason it is recommended is because at this point in time, machine B may already be engaged in recovering the filesystem on the shared drive, and machine A may still have buffers it is trying to flush to the same drive, so in order to make sure machine A doesn't let some dirty buffer get through a break in machine B's reservation caused by something as inane as another machine on the bus starting up and throwing an initial reset, we should reset machine A *as soon as we know it has been forcefully failed over and is no longer allowed to write to the drive*). Arguments with this can be directed to Stephen Tweedie, who is largely responsible for beating me into doing it this way ;-) > Another problem is that reservations do *not* guarantee ownership over > the long haul. There are too many mechanisms that break reservations to > build a complete strategy on them. See above about the reason for needing to reset the machine ;-) The overall package is cooperative in nature, so we don't rely on reservations except for the actual failover. However, due to this very issue, we need to kill the machine that was failed over as soon as possible after the failover to avoid any possible races with open wi
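[Editor's sketch] Machine B's decision procedure described above — heartbeat lost, probe with a reset, back off if A re-reserves, optionally preempt — can be condensed into one function. The boolean inputs are simplifications of the real probes, and only the `reserve` command strings are taken from the mail; everything else is illustrative.

```python
def plan_failover(heartbeat_alive, reserve_after_reset_succeeds, force):
    """Return the reserve(8)-style command machine B should run, or None.

    heartbeat_alive: the normal (non-reservation) heartbeat to machine A
        still works.
    reserve_after_reset_succeeds: after a reset and a 4-second wait,
        machine B's own reservation attempt went through, i.e. machine A
        did not re-assert its reservation and is presumed dead.
    force: the operator/HA logic judges A's services dead even though
        its reserve program is still re-reserving.
    """
    if heartbeat_alive:
        return None  # machine A is fine; nothing to do
    if reserve_after_reset_succeeds:
        # A did not re-reserve within the wait: take over normally.
        return "reserve --reset --reserve --hold /dev/sdc"
    # A re-asserted its reservation: only the heartbeat link is dead.
    if force:
        # Forced failover: steal the drive; A will see a reservation
        # conflict and reboot itself out of the cluster.
        return "reserve --preempt --hold /dev/sdc"
    return None  # leave machine A alone
```

Note the asymmetry Doug insists on: only the `--preempt` branch ever causes machine A to see a conflict, which is exactly when rebooting A is safe and necessary.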
Re: Linux Cluster using shared scsi
Doug Ledford wrote:
> Max TenEyck Woodbury wrote:
> >
> > Umm. Reboot? What do you think this is? Windoze?
>
> It's the *only* way to guarantee that the drive is never touched by more
> than one machine at a time (notice, I've not been talking about a shared
> use drive; only one machine in the cluster "owns" the drive at a time,
> and it isn't for single transactions that it owns the drive, it owns
> the drive for as long as it is alive, this is a limitation of the
> filesystems currently available in mainstream kernels). The reservation
> conflict and subsequent reboot also *only* happen when a reservation
> has been forcefully stolen from a machine. In that case, you are talking
> about a machine that has been failed over against its will, and the
> absolute safest thing to do in order to make sure the failed-over machine
> doesn't screw the whole cluster up is to make it reboot itself and
> re-enter the cluster as a new failover slave node instead of as a master
> node. If you want a shared common-access device with write-locking
> semantics, as you seem to be suggesting later on, then you need a
> different method of locking than what I put in this; I knew that as I
> wrote it, and that was intentional.

That was partly meant to be a joke, but it was also meant to make you stop and think about what you are doing. From what little context I read, you seem to be looking for a high availability solution. Rebooting a system, even if there is a hot backup, should only be used as a last resort.

Another problem is that reservations do *not* guarantee ownership over the long haul. There are too many mechanisms that break reservations to build a complete strategy on them. Unfortunately, this ground was covered during the 'cluster wars' between IBM and DEC, and the field is strewn with patents, so finding an open source solution may be tough.

> > ...
> >
> > In other words, the reservation acts as a spin-lock to make sure updates
> > occur atomically.
> Apples to oranges, as I described above. This is for a failover cluster,
> not a shared data, load balancing cluster.

Load balancing clusters do need a good locking method, but so do failover clusters. It's been 12 years since I did much with this, but I did do a fine-tooth analysis of parts of DEC's clustering code looking for ways it could fail. (The results were a few internal SPRs. I'm not at liberty to discuss the details, even if I could remember them. However, I can say without giving anything away that there were places where a hardware locking mechanism, like reservation, would have simplified the code and improved performance.) It was precisely the kinds of things associated with hardware failover that led DEC to do clusters in the first place. The load balancing stuff was secondary, even if it did sell more systems in the long run.

You may be able to get through the patent minefield by retrofitting the load balancing lock mechanisms to failover. It may be that you can make it work, but the tested solution requires software to back up the hardware. Good luck; you'll probably need a lot of it. (No sarcasm intended.)

[EMAIL PROTECTED]
Re: Linux Cluster using shared scsi
Mike Anderson wrote:
>
> Doug,
>
> I guess I worded my question poorly. My question was around multi-path
> devices in combination with SCSI-2 reserve vs SCSI-3 persistent reserve,
> which has not always been easy, but is more difficult if you use a name
> space that can slip or can have multiple entries for the same physical
> device you want to reserve.

The software is independent on each machine, so it is entirely possible that the same disk will be in two different name spaces on two different machines and everything will work just fine. For example, maybe it's /dev/sdc on machine A and /dev/sdf on machine B. That's fine; you simply tell the software on machine A to grab /dev/sdc and tell the software on machine B to grab /dev/sdf, and all will work properly.

Now, as to mixing SCSI-2 and SCSI-3 Persistent Reservations on the same drive: not a chance. The software will automatically use the best alternative available, so it won't fall back to SCSI-2 LUN reservations with SCSI-3 Persistent Reservations available (and if you force it to do so, then you have no one to blame but yourself ;-)

> But here is a second try.
>
> If this is a failover cluster then node A will need to reserve all disks
> in shareable space using sg, or only a subset if node A has sync'd his sd
> name space with the other node and they both wish to do work in disjoint
> pools of disks.
>
> In the scenario of grabbing all the disks: if sda and sdb are the same
> device, then I can only reserve one of them and ensure IO only goes down
> through the one I reserved, otherwise I could get a reservation conflict.

Correct. If you hold a reservation on a device for which you have multiple paths, you have to use the correct path.

> This goes along with your previous patch on supporting multi-path at "md"
> and translating this into the proper device to reserve.

The md multipath driver doesn't currently allow the proper ioctls for us to do reservations at the md level.
We could only do them by going in and doing reservations on /dev/sg entries behind the back of the md layer, which would be risky at best.

> I guess it is up to the caller of your service to handle this case,
> correct??

For now, yes. And the best method to do so is to configure your failover software to know that a device is a multipath device, only attempt to reserve or mount one path, and fail back on the second path if the first path goes away by issuing a bus reset on the secondary path, then reserving the secondary path, then mounting the secondary path. However, as you will have lost data due to the failed writes on the primary path, I think this is of dubious value. Right now, as I see it, multipath and failover simply don't mix well. There is more work needed to make it work well.

> If this is not any clearer than my last mail I will just wait to see the
> code :-).
>
> Thanks,
>
> -Mike
>
> Doug Ledford [[EMAIL PROTECTED]] wrote:
> > To: Mike Anderson <[EMAIL PROTECTED]>
> > cc: [EMAIL PROTECTED], James Bottomley <[EMAIL PROTECTED]>,
> >     "Roets, Chris" <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
> >     [EMAIL PROTECTED]
> >
> > Mike Anderson wrote:
> > >
> > > Doug,
> > >
> > > A question on clarification.
> > >
> > > Does the configuration you are testing have both FC adapters going to
> > > the same port of the storage device (multi-path) or to different ports
> > > of the storage device (multi-port)?
> > >
> > > The reason I ask is that I thought if you are using SCSI-2 reserves
> > > that the reserve was on a per initiator basis. How does one know which
> > > path has the reserve?
> >
> > Reservations are global in nature in that a reservation with a device
> > will block access to that device from all other initiators, including
> > across different ports on multiport devices (or else they are broken
> > and need a firmware update).
> >
> > > On a side note. I thought the GFS project had up-leveled their
> > > locking/fencing into an API called a locking harness to support
> > > different kinds of fencing methods. Any thoughts if this capability
> > > could be plugged into this service so that users could reduce
> > > recoding depending on which fencing support they selected.
> >
> > I wouldn't know about that.
> >
> > --
> > Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
> > Please check my web site for aic7xxx updates/answers before
> > e-mailing me about problems
>
> --
> Michael Anderson
> [EMAIL PROTECTED]
>
> IBM Linux Technology Center - Storage IO
> Phone (503) 578-4466
> Tie Line: 775-4466

--
Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before e-mailing me about problems
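[Editor's sketch] Doug's multipath fallback procedure — reserve and mount only one path; if it dies, bus-reset the secondary path, reserve it, then mount it, in that order — is really an ordered recipe. The sketch below just emits the steps as strings; the device name and step labels are illustrative, not real commands from the package.

```python
def secondary_path_recovery(primary_dead, secondary_dev="/dev/sdf"):
    """Return the ordered recovery steps for the surviving path.

    Order matters: the bus reset must come first to break any stale
    reservation held through the dead primary path, the new reservation
    must be taken before any mount, and only then is the filesystem
    mounted (rule 1 of the locking scheme: never touch the drive
    without holding a reservation).
    """
    if not primary_dead:
        return []  # primary path healthy; keep using it exclusively
    return [
        "bus-reset " + secondary_dev,  # break stale reservation
        "reserve "   + secondary_dev,  # acquire before any IO
        "mount "     + secondary_dev,  # only now touch the filesystem
    ]
```

As Doug notes, even done in this order the in-flight writes lost on the primary path make the recovery of dubious value; the sketch only encodes the sequencing constraint, not a data-integrity guarantee.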
Re: Linux Cluster using shared scsi
Doug,

I guess I worded my question poorly. My question was around multi-path devices in combination with SCSI-2 reserve vs SCSI-3 persistent reserve, which has not always been easy, but is more difficult if you use a name space that can slip or can have multiple entries for the same physical device you want to reserve.

But here is a second try.

If this is a failover cluster then node A will need to reserve all disks in shareable space using sg, or only a subset if node A has sync'd his sd name space with the other node and they both wish to do work in disjoint pools of disks.

In the scenario of grabbing all the disks: if sda and sdb are the same device, then I can only reserve one of them and ensure IO only goes down through the one I reserved, otherwise I could get a reservation conflict. This goes along with your previous patch on supporting multi-path at "md" and translating this into the proper device to reserve. I guess it is up to the caller of your service to handle this case, correct??

If this is not any clearer than my last mail I will just wait to see the code :-).

Thanks,

-Mike

Doug Ledford [[EMAIL PROTECTED]] wrote:
> To: Mike Anderson <[EMAIL PROTECTED]>
> cc: [EMAIL PROTECTED], James Bottomley <[EMAIL PROTECTED]>,
>     "Roets, Chris" <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
>     [EMAIL PROTECTED]
>
> Mike Anderson wrote:
> >
> > Doug,
> >
> > A question on clarification.
> >
> > Does the configuration you are testing have both FC adapters going to
> > the same port of the storage device (multi-path) or to different ports
> > of the storage device (multi-port)?
> >
> > The reason I ask is that I thought if you are using SCSI-2 reserves that
> > the reserve was on a per initiator basis. How does one know which path
> > has the reserve?
>
> Reservations are global in nature in that a reservation with a device will
> block access to that device from all other initiators, including across
> different ports on multiport devices (or else they are broken and need a
> firmware update).
>
> > On a side note. I thought the GFS project had up-leveled their
> > locking/fencing into an API called a locking harness to support
> > different kinds of fencing methods. Any thoughts if this capability
> > could be plugged into this service so that users could reduce recoding
> > depending on which fencing support they selected.
>
> I wouldn't know about that.
>
> --
> Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
> Please check my web site for aic7xxx updates/answers before
> e-mailing me about problems

--
Michael Anderson
[EMAIL PROTECTED]

IBM Linux Technology Center - Storage IO
Phone (503) 578-4466
Tie Line: 775-4466
Re: Linux Cluster using shared scsi
Mike Anderson wrote:
>
> Doug,
>
> A question on clarification.
>
> Does the configuration you are testing have both FC adapters going to the
> same port of the storage device (multi-path) or to different ports of the
> storage device (multi-port)?
>
> The reason I ask is that I thought if you are using SCSI-2 reserves that
> the reserve was on a per initiator basis. How does one know which path has
> the reserve?

Reservations are global in nature in that a reservation with a device will block access to that device from all other initiators, including across different ports on multiport devices (or else they are broken and need a firmware update).

> On a side note. I thought the GFS project had up-leveled their
> locking/fencing into an API called a locking harness to support different
> kinds of fencing methods. Any thoughts if this capability could be plugged
> into this service so that users could reduce recoding depending on which
> fencing support they selected.

I wouldn't know about that.

--
Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before e-mailing me about problems
Re: Linux Cluster using shared scsi
Max TenEyck Woodbury wrote:
>
> Doug Ledford wrote:
> >
> > ...
> >
> > If told to hold a reservation, then resend your reservation request once
> > every 2 seconds (this actually has very minimal CPU/BUS usage and isn't
> > as big a deal as requesting a reservation every 2 seconds might sound).
> > The first time the reservation is refused, consider the reservation
> > stolen by another machine and exit (or optionally, reboot).
>
> Umm. Reboot? What do you think this is? Windoze?

It's the *only* way to guarantee that the drive is never touched by more than one machine at a time (notice, I've not been talking about a shared use drive; only one machine in the cluster "owns" the drive at a time, and it isn't for single transactions that it owns the drive, it owns the drive for as long as it is alive, this is a limitation of the filesystems currently available in mainstream kernels). The reservation conflict and subsequent reboot also *only* happen when a reservation has been forcefully stolen from a machine. In that case, you are talking about a machine that has been failed over against its will, and the absolute safest thing to do in order to make sure the failed-over machine doesn't screw the whole cluster up is to make it reboot itself and re-enter the cluster as a new failover slave node instead of as a master node. If you want a shared common-access device with write-locking semantics, as you seem to be suggesting later on, then you need a different method of locking than what I put in this; I knew that as I wrote it, and that was intentional.

> Really, you can NOT do clustering well if you don't have a consistent
> locking mechanism. The use of a hardware locking method like 'reservation'
> may be a good way to avoid race conditions, but it should be backed up by
> the appropriate exchange of messages to make sure everybody has the same
> view of the system. For example, you might use it like this:
>
> 1. Examine the lock list for conflicts. If a conflict is found, the lock
>    request fails.
>
> 2. Reserve the device with the lock on it. If the reservation fails, delay
>    a short amount of time and return to 1.
>
> 3. Update the lock list for the device.
>
> 4. When the list update is complete, release the reservation.
>
> In other words, the reservation acts as a spin-lock to make sure updates
> occur atomically.

Apples to oranges, as I described above. This is for a failover cluster, not a shared data, load balancing cluster.

--
Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before e-mailing me about problems
Re: Linux Cluster using shared scsi
Doug Ledford wrote:
>
> ...
>
> If told to hold a reservation, then resend your reservation request once
> every 2 seconds (this actually has very minimal CPU/BUS usage and isn't as
> big a deal as requesting a reservation every 2 seconds might sound). The
> first time the reservation is refused, consider the reservation stolen by
> another machine and exit (or optionally, reboot).

Umm. Reboot? What do you think this is? Windoze?

Really, you can NOT do clustering well if you don't have a consistent locking mechanism. The use of a hardware locking method like 'reservation' may be a good way to avoid race conditions, but it should be backed up by the appropriate exchange of messages to make sure everybody has the same view of the system. For example, you might use it like this:

1. Examine the lock list for conflicts. If a conflict is found, the lock request fails.

2. Reserve the device with the lock on it. If the reservation fails, delay a short amount of time and return to 1.

3. Update the lock list for the device.

4. When the list update is complete, release the reservation.

In other words, the reservation acts as a spin-lock to make sure updates occur atomically.

[EMAIL PROTECTED]
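[Editor's sketch] Max's four-step scheme — hardware reservation as a spin-lock guarding an on-disk lock list — can be written out directly. The `ToyDevice` class below is a hypothetical stand-in for the shared disk (single-host simulation only; a real implementation would read and write the list on the device itself).

```python
import time

class ToyDevice:
    """Stand-in for a shared disk holding an on-disk lock list."""
    def __init__(self):
        self.lock_list = set()   # the message/lock state everyone agrees on
        self._reserved = False   # models the SCSI reservation

    def reserve(self):
        if self._reserved:
            return False  # reservation conflict
        self._reserved = True
        return True

    def release(self):
        self._reserved = False

def update_lock_list(device, lock_request, max_tries=10, delay=0.01):
    """Max's steps 1-4: the reservation is only a short-lived spin-lock
    around the lock-list update, not a long-term claim of ownership."""
    for _ in range(max_tries):
        if lock_request in device.lock_list:    # 1. conflict check
            return False                        #    request fails
        if not device.reserve():                # 2. reservation = spin-lock
            time.sleep(delay)                   #    busy: back off, retry
            continue
        try:
            if lock_request in device.lock_list:
                return False                    #    re-check under the lock
            device.lock_list.add(lock_request)  # 3. update the lock list
            return True
        finally:
            device.release()                    # 4. release the reservation
```

The re-check inside the reservation is the detail Max's step ordering implies: another node may have updated the list between steps 1 and 2.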
Re: Linux Cluster using shared scsi
Doug,

A question on clarification.

Does the configuration you are testing have both FC adapters going to the same port of the storage device (multi-path) or to different ports of the storage device (multi-port)?

The reason I ask is that I thought if you are using SCSI-2 reserves that the reserve was on a per initiator basis. How does one know which path has the reserve?

On a side note. I thought the GFS project had up-leveled their locking/fencing into an API called a locking harness to support different kinds of fencing methods. Any thoughts if this capability could be plugged into this service so that users could reduce recoding depending on which fencing support they selected.

Thanks,

-Mike

Doug Ledford [[EMAIL PROTECTED]] wrote:
> To: [EMAIL PROTECTED]
> cc: James Bottomley <[EMAIL PROTECTED]>, "Roets, Chris"
>     <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED]
>
> "Eric Z. Ayers" wrote:
> >
> > Doug Ledford writes:
> > (James Bottomley commented about the need for SCSI reservation kernel
> > patches)
> >
> > > I agree. It's something that needs to be fixed in general; your
> > > software needs it as well, and I've written (about 80% done at this
> > > point) some open source software geared towards getting/holding
> > > reservations that also requires the same kernel patches (plus one
> > > more to be fully functional: an ioctl to allow a SCSI reservation to
> > > do a forced reboot of a machine). I'll be releasing that package in
> > > the short term (once I get back from my vacation anyway).
> >
> > Hello Doug,
> >
> > Does this package also tell the kernel to "re-establish" a reservation
> > for all devices after a bus reset, or at least inform a user level
> > program? Finding out when there has been a bus reset has been a
> > stumbling block for me.
>
> It doesn't have to. The kernel changes are minimal (basically James' SCSI
> reset patch that he's been carrying around, the scsi reservation conflict
> patch, and I need to write a third patch that makes the system optionally
> reboot immediately on a reservation conflict and which is controlled by an
> ioctl, but I haven't done that patch yet). All of the rest is implemented
> in user space via the /dev/sg entries. As such, it doesn't have any more
> information about bus resets than you do. However, because of the policy
> enacted in the code, it doesn't need to. Furthermore, because there are so
> many ways to lose a reservation silently, it's foolhardy to try and keep
> reservation consistency any way other than something similar to what I
> outline below.
>
> The package is meant to be a sort of "scsi reservation" library. The
> application that uses the library is responsible for setting policy. I
> wrote a small, simple application that actually does a decent job of
> implementing policy on the system. The policy it does implement is simple:
>
> If told to get a reservation, then attempt to get it. If the attempt is
> blocked by an existing reservation and we aren't supposed to reset the
> drive, then exit. If it's blocked and we are supposed to reset the drive,
> then send a device reset, then wait 5 seconds, then try to get the
> reservation. If we again fail, then the other machine is still alive (as
> proven by the fact that it re-established its reservation after the reset)
> and we exit; else we now have the reservation.
>
> If told to forcefully get a reservation, then attempt to get it. If the
> attempt fails, then reset the device and try again immediately (no 5
> second wait); if it fails again, then exit.
>
> If told to hold a reservation, then resend your reservation request once
> every 2 seconds (this actually has very minimal CPU/BUS usage and isn't as
> big a deal as requesting a reservation every 2 seconds might sound). The
> first time the reservation is refused, consider the reservation stolen by
> another machine and exit (or optionally, reboot).
>
> The package is meant to lock against itself (in other words, a malicious
> user with write access to the /dev/sg entries could confuse this locking
> mechanism, but it will work cooperatively with other copies of itself
> running on other machines); the requirements for the locking to be safe
> are as follows:
>
> 1) A machine is not allowed to mount or otherwise use a drive in any way
> shape or form until it has successfully acquired a reservation.
>
> 2) Once a machine has a reservation, it is not allowed to ever take any
> action to break another machine's reservation, so that if the reservation
> is stolen, this machine is required to "gracefully" step away from the
> drive (rebooting is the best way to accomplish this since even the act of
> unmounting the drive will attempt to write to it).
>
> 3) The timeouts in the program must be honored (resend your reservation,
> when you hold it, every 2 seconds so that a passive attempt to steal the
> rese
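[Editor's sketch] Doug's acquisition policy quoted above ("get", with and without reset permission, and "forcefully get") reduces to one small function. The callables below are hypothetical stand-ins for the real sg ioctl calls; Doug's actual library API is not shown in this thread, so names and signatures here are assumptions.

```python
def acquire(try_reserve, reset_device, allow_reset=False, force=False,
            wait=5.0, sleep=lambda s: None):
    """Sketch of the described policy.

    try_reserve(): attempt the reservation, True on success.
    reset_device(): issue a device reset to break any reservation.
    allow_reset: plain "get" that may reset and wait 5 seconds.
    force: "forcefully get" - reset and retry immediately, no wait.
    """
    if try_reserve():
        return True
    if not (allow_reset or force):
        return False          # blocked and not allowed to reset: exit
    reset_device()
    if not force:
        sleep(wait)           # give a live owner time to re-reserve
    # A second failure means the other machine re-established its
    # reservation after the reset, i.e. it is still alive: exit.
    return try_reserve()
```

The 5-second wait is what turns the reservation into a passive liveness probe: a live owner's 2-second hold loop is guaranteed at least two chances to re-assert itself before we conclude it is dead.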
Re: Linux Cluster using shared scsi
Hi Doug, Great to hear your progress on this. As I had not heard anything about this effort since this time last year I had assumed you put this project on the shelf. I will be happy to test these interfaces when they are ready. Eddie > "Eric Z. Ayers" wrote: > > > > Doug Ledford writes: > > (James Bottomley commented about the need for SCSI reservation kernel patches) > > > > > > I agree. It's something that needs fixed in general, your software needs it > > > as well, and I've written (about 80% done at this point) some open source > > > software geared towards getting/holding reservations that also requires the > > > same kernel patches (plus one more to be fully functional, an ioctl to allow a > > > SCSI reservation to do a forced reboot of a machine). I'll be releasing that > > > package in the short term (once I get back from my vacation anyway). > > > > > > > Hello Doug, > > > > Does this package also tell the kernel to "re-establish" a > > reservation for all devices after a bus reset, or at least inform a > > user level program? Finding out when there has been a bus reset has > > been a stumbling block for me. > > It doesn't have to. The kernel changes are minimal (basically James' SCSI > reset patch that he's been carrying around, the scsi reservation conflict > patch, and I need to write a third patch that makes the system optionally > reboot immediately on a reservation conflict and which is controlled by an > ioctl, but I haven't done that patch yet). All of the rest is implemented in > user space via the /dev/sg entries. As such, it doesn't have any more > information about bus resets than you do. However, because of the policy > enacted in the code, it doesn't need to. Furthermore, because there are so > many ways to loose a reservation silently, it's foolhardy to try and keep > reservation consistency any way other than something similar to what I outline > below. > > The package is meant to be a sort of "scsi reservation" library. 
The > application that uses the library is responsible for setting policy. I wrote > a small, simple application that actually does a decent job of implementing > policy on the system. The policy it does implement is simple: > > If told to get a reservation, then attempt to get it. If the attempt is > blocked by an existing reservation and we aren't supposed to reset the drive, > then exit. If it's blocked and we are supposed to reset the drive, then send > a device reset, then wait 5 seconds, then try to get the reservation. If we > again fail, then the other machine is still alive (as proven by the fact that > it re-established its reservation after the reset) and we exit, else we now > have the reservation. > > If told to forcefully get a reservation, then attempt to get it. If the > attempt fails, then reset the device and try again immediately (no 5 second > wait); if it fails again, then exit. > > If told to hold a reservation, then resend your reservation request once every > 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a > deal as requesting a reservation every 2 seconds might sound). The first time > the reservation is refused, consider the reservation stolen by another machine > and exit (or optionally, reboot). > > The package is meant to lock against itself (in other words, a malicious user > with write access to the /dev/sg entries could confuse this locking mechanism, > but it will work cooperatively with other copies of itself running on other > machines). The requirements for the locking to be safe are as follows: > > 1) A machine is not allowed to mount or otherwise use a drive in any way > shape or form until it has successfully acquired a reservation. 
> > 2) Once a machine has a reservation, it is not allowed to ever take any > action to break another machine's reservation, so that if the reservation is > stolen, this machine is required to "gracefully" step away from the drive > (rebooting is the best way to accomplish this since even the act of unmounting > the drive will attempt to write to it). > > 3) The timeouts in the program must be honored (resend your reservation, when > you hold it, every 2 seconds so that a passive attempt to steal the > reservation will see you are still alive within the 5 second timeout and leave > you be, which is a sort of heartbeat in and of itself). > > Anyway, as I said in my previous email, it's about 80% complete. It currently > is up and running on SCSI-2 LUN based reservations. There is code to do > SCSI-2 and SCSI-3 extent based reservations but it hasn't been tested due to > lack of devices that support extent based reservations (my test bed is a > multipath FC setup, so I'm doing all my testing on FC drives over two FC > controllers in the same machine). I've still got to add the SCSI-3 Persistent > Reservation code to the library (again, I'm lacking test drives for this > scenario). The library itself requires that the program treat all reservations > as extent/persistent reservations and it silently falls back to LUN > reservations when neither of those two are available.
Re: Linux Cluster using shared scsi
"Eric Z. Ayers" wrote: > > Doug Ledford writes: > (James Bottomley commented about the need for SCSI reservation kernel patches) > > > > I agree. It's something that needs fixed in general, your software needs it > > as well, and I've written (about 80% done at this point) some open source > > software geared towards getting/holding reservations that also requires the > > same kernel patches (plus one more to be fully functional, an ioctl to allow a > > SCSI reservation to do a forced reboot of a machine). I'll be releasing that > > package in the short term (once I get back from my vacation anyway). > > > > Hello Doug, > > Does this package also tell the kernel to "re-establish" a > reservation for all devices after a bus reset, or at least inform a > user level program? Finding out when there has been a bus reset has > been a stumbling block for me. It doesn't have to. The kernel changes are minimal (basically James' SCSI reset patch that he's been carrying around, the scsi reservation conflict patch, and I need to write a third patch that makes the system optionally reboot immediately on a reservation conflict and which is controlled by an ioctl, but I haven't done that patch yet). All of the rest is implemented in user space via the /dev/sg entries. As such, it doesn't have any more information about bus resets than you do. However, because of the policy enacted in the code, it doesn't need to. Furthermore, because there are so many ways to loose a reservation silently, it's foolhardy to try and keep reservation consistency any way other than something similar to what I outline below. The package is meant to be a sort of "scsi reservation" library. The application that uses the library is responsible for setting policy. I wrote a small, simple application that actually does a decent job of implementing policy on the system. The policy it does implement is simple: If told to get a reservation, then attempt to get it. 
If the attempt is blocked by an existing reservation and we aren't supposed to reset the drive, then exit. If it's blocked and we are supposed to reset the drive, then send a device reset, then wait 5 seconds, then try to get the reservation. If we again fail, then the other machine is still alive (as proven by the fact that it re-established its reservation after the reset) and we exit, else we now have the reservation. If told to forcefully get a reservation, then attempt to get it. If the attempt fails, then reset the device and try again immediately (no 5 second wait); if it fails again, then exit. If told to hold a reservation, then resend your reservation request once every 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a deal as requesting a reservation every 2 seconds might sound). The first time the reservation is refused, consider the reservation stolen by another machine and exit (or optionally, reboot). The package is meant to lock against itself (in other words, a malicious user with write access to the /dev/sg entries could confuse this locking mechanism, but it will work cooperatively with other copies of itself running on other machines). The requirements for the locking to be safe are as follows: 1) A machine is not allowed to mount or otherwise use a drive in any way shape or form until it has successfully acquired a reservation. 2) Once a machine has a reservation, it is not allowed to ever take any action to break another machine's reservation, so that if the reservation is stolen, this machine is required to "gracefully" step away from the drive (rebooting is the best way to accomplish this since even the act of unmounting the drive will attempt to write to it). 
3) The timeouts in the program must be honored (resend your reservation, when you hold it, every 2 seconds so that a passive attempt to steal the reservation will see you are still alive within the 5 second timeout and leave you be, which is a sort of heartbeat in and of itself). Anyway, as I said in my previous email, it's about 80% complete. It currently is up and running on SCSI-2 LUN based reservations. There is code to do SCSI-2 and SCSI-3 extent based reservations but it hasn't been tested due to lack of devices that support extent based reservations (my test bed is a multipath FC setup, so I'm doing all my testing on FC drives over two FC controllers in the same machine). I've still got to add the SCSI-3 Persistent Reservation code to the library (again, I'm lacking test drives for this scenario). The library itself requires that the program treat all reservations as extent/persistent reservations and it silently falls back to LUN reservations when neither of those two are available. My simple program that goes with the application just makes extent reservations of the whole disc, so it acts like a LUN reservation regardless, but there is considerably more flexibility in the library if a person wishes to program to it. -- Doug L
Re: Linux Cluster using shared scsi
> reserved. But if you did such a hot swap you would have "bigger > fish to fry" in a HA application... I mean, none of your data would be > there! You need to realise this has happened and do the right thing. Since it could be an md raid array, the hot swap is not necessarily fatal. If it is fatal, you need to realise it promptly, both so that (if possible) you avoid damaging the contents of a disk inserted in error, and so that the HA system can take countermeasures. > if the kernel (by this I mean the scsi midlayer) was maintaining > reservations, that there would be some logic activated to "handle" > this problem, whether it be re-reserving the device, or the ability to Suppose the cluster nodes don't agree on the reservation table? > Bus resets in the Linux drivers also tend to happen frequently when a > disk is failing, which has tended to leave the system in a somewhat > functional but often an unusable state, (but that's a different story...) The new scsi EH code in 2.4, for the drivers that use it, is a lot better. Real problem. - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]
Re: Linux Cluster using shared scsi
Alan Cox writes: > > Does this package also tell the kernel to "re-establish" a > > reservation for all devices after a bus reset, or at least inform a > > user level program? Finding out when there has been a bus reset has > > been a stumbling block for me. > > You cannot rely on a bus reset. Imagine hot swap disks on an FC fabric. I > suspect the controller itself needs to call back for problem events > I'm not a SCSI expert by any stretch of the imagination. I think that what you are saying is that you cannot rely on a bus reset being the only thing that will remove a reservation. For example, if a device is 'hot replaced', the device will (clearly) no longer be reserved. But if you did such a hot swap you would have "bigger fish to fry" in a HA application... I mean, none of your data would be there! My understanding is that, specifically, when a bus reset occurs, all SCSI reservations for devices on that bus are lost. I was hoping that if the kernel (by this I mean the scsi midlayer) was maintaining reservations, there would be some logic activated to "handle" this problem, whether it be re-reserving the device, or the ability to pass notification of a reset (or another problem event, as you point out) up to the application that's handling reservations. In my experience, the most common reason for a bus reset in parallel SCSI is that a peer host on the bus is rebooting. Since this happens under normal operation and well in advance of any attempt to access the device, it would be nice if there were some sort of asynchronous notification instead of a polling process with an interval of 2-3 minutes, where it's conceivable that the peer system could have booted and attempted to take over the disk out from under a running system. Bus resets in the Linux drivers also tend to happen frequently when a disk is failing, which has tended to leave the system in a somewhat functional but often unusable state (but that's a different story...) 
James Bottomley <[EMAIL PROTECTED]> writes: > Essentially, there are many conditions which cause a quiet loss of a SCSI-2 > reservation. Even in parallel SCSI: Reservations can be silently lost because > of LUN reset, device reset or even simply powering off the device. ... James mentions that even handling a bus reset still leaves a window where a peer could grab the reservation out from underneath an unsuspecting host. I agree that this could happen, and the old host might perform writes to an 'unreserved' disk, but once the second system succeeded in obtaining the reservation, any read/write commands from the "old" host would return SCSI errors (this is my layman's understanding - the commands would complete with a RESERVATION CONFLICT status), so I believe you would have the desired behavior in this kind of cluster - only one machine in the cluster can access the disk at the same time. The data on the disk should be in a state where the second system in the cluster could start a recovery task and begin to provide the service hosted on the disk. -Eric.
--
Eric Z. Ayers
Lead Software Engineer
Computer Generation, Incorporated (an Intec Telecom Systems Company)
Phone: +1 404-705-2864
Fax: +1 404-705-2805
Web: http://www.intec-telecom-systems.com/
Email: [EMAIL PROTECTED]
Postal: Bldg G 4th Floor, 5775 Peachtree-Dunwoody Rd, Atlanta, GA 30342 USA
Re: Linux Cluster using shared scsi
[EMAIL PROTECTED] said: > Does this package also tell the kernel to "re-establish" a reservation > for all devices after a bus reset, or at least inform a user level > program? Finding out when there has been a bus reset has been a > stumbling block for me. [EMAIL PROTECTED] said: > You cannot rely on a bus reset. Imagine hot swap disks on an FC > fabric. I suspect the controller itself needs to call back for > problem events Essentially, there are many conditions which cause a quiet loss of a SCSI-2 reservation. Even in parallel SCSI: Reservations can be silently lost because of LUN reset, device reset or even simply powering off the device. The way we maintain reservations for LifeKeeper is to have a user level daemon ping the device with a reservation command every few minutes. If you get a RESERVATION_CONFLICT return you know that something else stole your reservation, otherwise you maintain it. There is a window in this scheme where the device may be accessible by other initiators, but that's the price you pay for using SCSI-2 reservations instead of the more cluster-friendly SCSI-3 ones. In a kernel scheme, you may get early notification of reservation loss by putting a hook into the processing of CHECK_CONDITION/UNIT_ATTENTION, but it won't close the window entirely. James
Re: Linux Cluster using shared scsi
> Does this package also tell the kernel to "re-establish" a > reservation for all devices after a bus reset, or at least inform a > user level program? Finding out when there has been a bus reset has > been a stumbling block for me. You cannot rely on a bus reset. Imagine hot swap disks on an FC fabric. I suspect the controller itself needs to call back for problem events
Re: Linux Cluster using shared scsi
Doug Ledford writes: (James Bottomley commented about the need for SCSI reservation kernel patches) > > I agree. It's something that needs fixing in general, your software needs it > as well, and I've written (about 80% done at this point) some open source > software geared towards getting/holding reservations that also requires the > same kernel patches (plus one more to be fully functional, an ioctl to allow a > SCSI reservation to do a forced reboot of a machine). I'll be releasing that > package in the short term (once I get back from my vacation anyway). > Hello Doug, Does this package also tell the kernel to "re-establish" a reservation for all devices after a bus reset, or at least inform a user level program? Finding out when there has been a bus reset has been a stumbling block for me. -Eric.
--
Eric Z. Ayers
Lead Software Engineer
Computer Generation, Incorporated (an Intec Telecom Systems Company)
Phone: +1 404-705-2864
Fax: +1 404-705-2805
Web: http://www.intec-telecom-systems.com/
Email: [EMAIL PROTECTED]
Postal: Bldg G 4th Floor, 5775 Peachtree-Dunwoody Rd, Atlanta, GA 30342 USA
Re: Linux Cluster using shared scsi
James Bottomley wrote: > > [EMAIL PROTECTED] said: > > So, will Linux ever support the scsi reservation mechanism as standard? > > That's not within my gift. I can merely write the code that corrects the > behaviour. I can't force anyone else to accept it. I think it will be standard before not too much longer (I hope anyway, I'm tired of carrying the patches forward all the time so I'll lend my support to getting it into the mainstream kernel ;-) > [EMAIL PROTECTED] said: > > Isn't there a standard that says if you scsi reserve a disk, no one > > else should be able to access this disk, or is this a "steeleye/ > > Compaq" standard. > > Use of reservations is laid out in the SCSI-2 and SCSI-3 standards (which can > be downloaded from the T10 site www.t10.org) which are international in scope. > I think the implementation issues come because the reservations part is > really only relevant to a multi-initiator clustered environment, which isn't an > everyday configuration for most Linux users. Obviously, as Linux moves into > the SAN arena this type of configuration will become a lot more common, at > which time the various problems associated with multiple initiators should > rise in prominence. I agree. It's something that needs fixing in general, your software needs it as well, and I've written (about 80% done at this point) some open source software geared towards getting/holding reservations that also requires the same kernel patches (plus one more to be fully functional, an ioctl to allow a SCSI reservation to do a forced reboot of a machine). I'll be releasing that package in the short term (once I get back from my vacation anyway). -- Doug Ledford <[EMAIL PROTECTED]> http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems
Re: Linux Cluster using shared scsi
[EMAIL PROTECTED] said: > So, will Linux ever support the scsi reservation mechanism as standard? That's not within my gift. I can merely write the code that corrects the behaviour. I can't force anyone else to accept it. [EMAIL PROTECTED] said: > Isn't there a standard that says if you scsi reserve a disk, no one > else should be able to access this disk, or is this a "steeleye/ > Compaq" standard. Use of reservations is laid out in the SCSI-2 and SCSI-3 standards (which can be downloaded from the T10 site www.t10.org) which are international in scope. I think the implementation issues come because the reservations part is really only relevant to a multi-initiator clustered environment, which isn't an everyday configuration for most Linux users. Obviously, as Linux moves into the SAN arena this type of configuration will become a lot more common, at which time the various problems associated with multiple initiators should rise in prominence. James
RE: Linux Cluster using shared scsi
So, will Linux ever support the scsi reservation mechanism as standard? Isn't there a standard that says if you scsi reserve a disk, no one else should be able to access this disk, or is this a "steeleye/Compaq" standard? Chris

-Original Message-
From: James Bottomley [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 27, 2001 5:12 PM
To: Roets, Chris
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Linux Cluster using shared scsi

I've copied linux SCSI and quoted the entire message below so they can follow. Your assertion that this works in 2.2.16 is incorrect; the patch to fix the linux reservation conflict handler has never been added to the official tree. I suspect you actually don't have vanilla 2.2.16 but instead have a redhat or other distribution patched version. Most distributions include the Steeleye SCSI clustering patches, which correct reservation handling. I've attached the complete patch, which fixes both the old and the new error handlers in the 2.2 kernel; it applies against 2.2.18. James Bottomley

> Problem :
> install two Linux systems with a shared scsi-bus and storage on that shared bus.
> suppose :
> system one : SCSI ID 7
> system two : SCSI ID 6
> shared disk : SCSI ID 4
>
> By default, you can mount the disk on both systems. This is normal behavior, but
> may cause data corruption.
> To prevent this, you can SCSI-reserve a disk on one system. If the other system
> would try to access this device, the system should return an i/o error due
> to the reservation.
> This is a common technique used in
> - Traditional Tru64 Unix ase clustering
> - Tru64 Unix V5 Clustering to accomplish i/o barriers
> - Windows-NT Clusters
> - Steel-eye clustering
> The reservation can be done using a standard tool like scu
>
> scu -f /dev/sdb
> scu> reserve device
>
> On Linux, this works fine under Kernel version 2.2.16. 
> Below is the code that accomplishes this
> /usr/src/linux/drivers/scsi/scsi_obsolete.c in routine scsi_old_done
>
>     case RESERVATION_CONFLICT:
>         printk("scsi%d (%d,%d,%d) : RESERVATION CONFLICT\n",
>                SCpnt->host->host_no, SCpnt->channel,
>                SCpnt->device->id, SCpnt->device->lun);
>         status = CMD_FINISHED; /* returns I/O error */
>         break;
>     default:
>
> As of kernel version 2.2.18, this code has changed. If a scsi reserve error
> occurs, the device driver does a scsi reset. This way the scsi reservation is
> gone, and the device can be accessed.
>
> /usr/src/linux/drivers/scsi/scsi_obsolete.c in routine scsi_old_done
>
>     case RESERVATION_CONFLICT:
>         printk("scsi%d, channel %d : RESERVATION CONFLICT performing"
>                " reset.\n", SCpnt->host->host_no, SCpnt->channel);
>         scsi_reset(SCpnt, SCSI_RESET_SYNCHRONOUS);
>         status = REDO;
>         break;
>
> Fix : delete the scsi reset in the kernel code
>
>     case RESERVATION_CONFLICT:
>         /* Deleted Chris Roets
>         printk("scsi%d, channel %d : RESERVATION CONFLICT performing"
>                " reset.\n", SCpnt->host->host_no, SCpnt->channel);
>         scsi_reset(SCpnt, SCSI_RESET_SYNCHRONOUS);
>         status = REDO;
>         next four lines added */
>         printk("scsi%d (%d,%d,%d) : RESERVATION CONFLICT\n",
>                SCpnt->host->host_no, SCpnt->channel,
>                SCpnt->device->id, SCpnt->device->lun);
>         status = CMD_FINISHED; /* returns I/O error */
>         break;
>
> and rebuild the kernel.
>
> This should allow the customer to continue.
>
> Questions :
> - why is this scsi reset done/added as of kernel version 2.2.18
> - as we are talking about an obsolete routine, how is this accomplished
>   in the new code and how is it activated.
Re: Linux Cluster using shared scsi
I've copied linux SCSI and quoted the entire message below so they can follow. Your assertion that this works in 2.2.16 is incorrect; the patch to fix the linux reservation conflict handler has never been added to the official tree. I suspect you actually don't have vanilla 2.2.16 but instead have a redhat or other distribution patched version. Most distributions include the Steeleye SCSI clustering patches, which correct reservation handling. I've attached the complete patch, which fixes both the old and the new error handlers in the 2.2 kernel; it applies against 2.2.18. James Bottomley

> Problem :
> install two Linux systems with a shared scsi-bus and storage on that shared bus.
> suppose :
> system one : SCSI ID 7
> system two : SCSI ID 6
> shared disk : SCSI ID 4
>
> By default, you can mount the disk on both systems. This is normal behavior, but
> may cause data corruption.
> To prevent this, you can SCSI-reserve a disk on one system. If the other system
> would try to access this device, the system should return an i/o error due
> to the reservation.
> This is a common technique used in
> - Traditional Tru64 Unix ase clustering
> - Tru64 Unix V5 Clustering to accomplish i/o barriers
> - Windows-NT Clusters
> - Steel-eye clustering
> The reservation can be done using a standard tool like scu
>
> scu -f /dev/sdb
> scu> reserve device
>
> On Linux, this works fine under Kernel version 2.2.16.
> Below is the code that accomplishes this
> /usr/src/linux/drivers/scsi/scsi_obsolete.c in routine scsi_old_done
>
>     case RESERVATION_CONFLICT:
>         printk("scsi%d (%d,%d,%d) : RESERVATION CONFLICT\n",
>                SCpnt->host->host_no, SCpnt->channel,
>                SCpnt->device->id, SCpnt->device->lun);
>         status = CMD_FINISHED; /* returns I/O error */
>         break;
>     default:
>
> As of kernel version 2.2.18, this code has changed. If a scsi reserve error
> occurs, the device driver does a scsi reset. This way the scsi reservation is
> gone, and the device can be accessed. 
> /usr/src/linux/drivers/scsi/scsi_obsolete.c in routine scsi_old_done
>
>     case RESERVATION_CONFLICT:
>         printk("scsi%d, channel %d : RESERVATION CONFLICT performing"
>                " reset.\n", SCpnt->host->host_no, SCpnt->channel);
>         scsi_reset(SCpnt, SCSI_RESET_SYNCHRONOUS);
>         status = REDO;
>         break;
>
> Fix : delete the scsi reset in the kernel code
>
>     case RESERVATION_CONFLICT:
>         /* Deleted Chris Roets
>         printk("scsi%d, channel %d : RESERVATION CONFLICT performing"
>                " reset.\n", SCpnt->host->host_no, SCpnt->channel);
>         scsi_reset(SCpnt, SCSI_RESET_SYNCHRONOUS);
>         status = REDO;
>         next four lines added */
>         printk("scsi%d (%d,%d,%d) : RESERVATION CONFLICT\n",
>                SCpnt->host->host_no, SCpnt->channel,
>                SCpnt->device->id, SCpnt->device->lun);
>         status = CMD_FINISHED; /* returns I/O error */
>         break;
>
> and rebuild the kernel.
>
> This should allow the customer to continue.
>
> Questions :
> - why is this scsi reset done/added as of kernel version 2.2.18
> - as we are talking about an obsolete routine, how is this accomplished
>   in the new code and how is it activated. 
Index: linux/2.2/drivers/scsi/scsi.c
diff -u linux/2.2/drivers/scsi/scsi.c:1.1.1.9 linux/2.2/drivers/scsi/scsi.c:1.1.1.9.2.4
--- linux/2.2/drivers/scsi/scsi.c:1.1.1.9	Thu Feb 15 12:53:35 2001
+++ linux/2.2/drivers/scsi/scsi.c	Fri Mar  2 18:04:40 2001
@@ -198,7 +198,13 @@
  */
 extern void scsi_old_done (Scsi_Cmnd *SCpnt);
 extern void scsi_old_times_out (Scsi_Cmnd * SCpnt);
+extern int scsi_old_reset(Scsi_Cmnd *SCpnt, unsigned int flag);
+/*
+ * private interface into the new error handling code
+ */
+extern int scsi_new_reset(Scsi_Cmnd *SCpnt, unsigned int flag);
+
 #if CONFIG_PROC_FS
 extern int (* dispatch_scsi_info_ptr)(int ino, char *buffer, char **start,
                                       off_t offset, int length, int inout);
@@ -724,7 +730,7 @@
     SCSI_LOG_SCAN_BUS(3,print_hostbyte(SCpnt->result));
     SCSI_LOG_SCAN_BUS(3,printk("\n"));
 
-    if (SCpnt->result) {
+    if (SCpnt->result && status_byte(SCpnt->result) != RESERVATION_CONFLICT) {
         if (((driver_byte (SCpnt->result) & DRIVER_SENSE) ||
              (status_byte (SCpnt->result) & CHECK_CONDITION)) &&
             ((SCpnt->sense_buffer[0] & 0x70) >> 4) == 7) {
@@ -2180,6 +2186,87 @@
     printk("\n");
 }
 
+/* Dummy done routine.  We don't want the bogus command used for the
+ * bus/device reset to find its way into the mid-layer so we intercept
+ * it here */
+static void
+scsi_reset_provider_done_command(Scsi_Cmnd *SCpnt)
+{
+    /* Empty function.  Some low level drivers will call scsi_done
+     * (and en