Re: Linux Cluster using shared scsi

2001-05-04 Thread Eddie Williams

Doug Ledford wrote:
>
> ...
>
> If told to hold a reservation, then resend your reservation request once every
> 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a
> deal as requesting a reservation every 2 seconds might sound).  The first time
> the reservation is refused, consider the reservation stolen by another machine
> and exit (or optionally, reboot).

I agree that the resend of the reservation is not all that big, but there is 
also the proverbial "straw that broke the camel's back."  When there is enough 
activity, logic could be added to avoid sending reservations.  In all cases, 
when a reservation is forcefully removed, the device will return a UNIT 
ATTENTION (well, I know that is the behavior on parallel SCSI; is this true 
for FC?).  So the host should know, with the next command issued, that it lost 
the reservation (not necessarily that someone else has stolen it, just that 
for some reason the device lost it).  So you could check whether, within the 
last 2 seconds, (a) an IO has completed and (b) every IO that completed in 
that 2-second span completed without any "error".  By "error" I mean without 
incident, such as a check condition.  In that case the reservation resend is 
not needed, as you know nothing has happened to cause the reservation to be 
lost.  Perhaps in this heavy-load situation you could even add logic to 
re-issue the reservation as soon as the mid-layer is aware that the 
reservation was broken, maybe saving a second or so?
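[Editorial sketch: the skip-the-resend heuristic described above could look roughly like this. This is a simulation only, not code from the package under discussion; the function name and data shapes are illustrative.]

```python
RESEND_INTERVAL = 2.0  # seconds, the resend period under discussion

def need_resend(now, completions):
    """Return True if the periodic reservation resend is still required.

    `completions` holds (finish_time, clean) pairs for recently completed
    IOs; `clean` is False for anything that ended in a check condition.
    The resend is redundant only if (a) at least one IO completed within
    the last interval and (b) every IO in that window completed cleanly,
    since a lost reservation would have surfaced as a UNIT ATTENTION.
    """
    recent = [clean for t, clean in completions if now - t <= RESEND_INTERVAL]
    if recent and all(recent):
        return False  # clean traffic proves the reservation is still held
    return True       # idle bus, or an error seen: resend to be safe
```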

I see this as an enhancement that could be added later.  Perhaps keep it in 
mind so that your initial development does not make it more difficult to 
implement later.

Eddie


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: Linux Cluster using shared scsi

2001-05-03 Thread Eric Z. Ayers

Pavel Machek writes:
 > Hi!
 > 
 > > > > ...
 > > > >
 > > > > If told to hold a reservation, then resend your reservation request once every
 > > > > 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a
 > > > > deal as requesting a reservation every 2 seconds might sound).  The first time
 > > > > the reservation is refused, consider the reservation stolen by another machine
 > > > > and exit (or optionally, reboot).
 > > > 
 > > > Umm. Reboot? What do you think this is? Windoze?
 > > 
 > > It's the *only* way to guarantee that the drive is never touched by more than
 > > one machine at a time (notice, I've not been talking about a shared use drive,
 > > only one machine in the cluster "owns" the drive at a time, and it isn't for
 > > single transactions that it owns the drive, it owns the drive for as long as
 > > it is alive, this is a limitation of the filesystems currently available in
 > > mainstream kernels).  The reservation conflict and subsequent reboot also
 > > *only* happens when a reservation has been forcefully stolen from a
 > > machine.
 > 
 > I do not believe reboot from kernel is right approach. Tell init with
 > special signal, maybe; but do not reboot forcefully. This is policy;
 > doing reboot might be right answer in 90% cases; that does not mean
 > you should do it always.
...

However distasteful it sounds, there is precedent for the
behavior that Doug is proposing in commercial clustering
implementations.  My recollection is that both Compaq TruCluster and
HP Service Guard have logic that will panic the kernel when a disk is
"stolen" from under a running service and there is a "network
partition" in the cluster.

A network partition occurs when multiple machines in the cluster are
running, but the HA software agents on two nodes can't communicate via
the network to arbitrate which node should be the owner of the disk. 

-Eric
--
Eric Z. Ayers Lead Software Engineer
Phone:  +1 404-705-2864Computer Generation, Incorporated
Fax:+1 404-705-2805 an Intec Telecom Systems Company
Web:http://www.intec-telecom-systems.com/
Email:  [EMAIL PROTECTED]
Postal: Bldg G 4th Floor, 5775 Peachtree-Dunwoody Rd, Atlanta, GA 30342 USA



Re: Linux Cluster using shared scsi

2001-05-03 Thread James Bottomley

There is another nasty in multi-port arrays that I should perhaps point out:  
a bus reset isn't supposed to drop the reservation if it was taken on another 
port.  A device or LUN reset will drop reservations on all ports.  This 
behaviour, although clearly mandated by SCSI-3 SPC, is rather patchily 
implemented in arrays and I have seen some multi-port arrays that will, 
illegally, drop reservations on all ports on receipt of a bus reset.

Unfortunately, most Linux SCSI drivers won't issue device resets on command, 
they'll only issue bus resets, so it is possible to get into a situation where 
you cannot break a reservation belonging to a dead machine, if you set up a 
point-to-point cluster rather than a true shared-scsi one.

James





Re: Linux Cluster using shared scsi

2001-05-02 Thread Doug Ledford

Max TenEyck Woodbury wrote:
> 
> Doug Ledford wrote:
> >
> > Max TenEyck Woodbury wrote:
> >>
> >> Umm. Reboot? What do you think this is? Windoze?
> >
> > It's the *only* way to guarantee that the drive is never touched by more
> > than one machine at a time (notice, I've not been talking about a shared
> > use drive, only one machine in the cluster "owns" the drive at a time,
> > and it isn't for single transactions that it owns the drive, it owns
> > the drive for as long as it is alive, this is a limitation of the
> > filesystems currently available in mainstream kernels).  The reservation
> > conflict and subsequent reboot also *only* happens when a reservation
> > has been forcefully stolen from a machine. In that case, you are talking
> > about a machine that has been failed over against its will, and the
> > absolute safest thing to do in order to make sure the failed over machine
> > doesn't screw the whole cluster up, is to make it reboot itself and
> > re-enter the cluster as a new failover slave node instead of as a master
> > node.  If you want a shared common access device with write locking
> > semantics, as you seem to be suggesting later on, then you need a
> > different method of locking than what I put in this, I knew that as I
> > wrote it and that was intentional.
> 
> That was partly meant to be a joke, but it was also meant to make you stop
> and think about what you are doing. From what little context I read, you
> seem to be looking for a high availability solution. Rebooting a system,
> even if there is a hot backup, should only be used as a last resort.

This is something that only happens when a machine has been forcefully failed
over against its will.  I guess you would need to see the code to tell what
I'm talking about, but in the description I gave of the code, if it doesn't
get a reservation, it exits.  The way the code is intended to be used is
something like this:

Given machine A as cluster master and machine B as a cluster slave.  Machine A
starts the reservation program with something like this as the command line:

reserve --reserve --hold /dev/sdc

This will result in the program grabbing a reservation on drive sdc (or
exiting with a non-0 status on failure) and then sitting in a loop where it
re-issues the reservation every 2 seconds.
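[Editorial sketch: the hold loop behaves roughly like this. It is a stand-alone simulation; `try_reserve` is a placeholder for the real SCSI RESERVE issued through /dev/sg, and the exit codes are illustrative.]

```python
import sys
import time

def hold_reservation(try_reserve, interval=2.0):
    """Grab a reservation, then re-issue it every `interval` seconds.

    `try_reserve()` returns True if the RESERVE succeeded.  The first
    refusal means the reservation was stolen by another machine, so we
    exit (the real program may optionally reboot instead, as discussed).
    """
    if not try_reserve():
        sys.exit(1)         # could not obtain the initial reservation
    while True:
        time.sleep(interval)
        if not try_reserve():
            sys.exit(2)     # reservation stolen: step away from the drive
```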

Under normal operation, the reserve program is not started at all on machine
B.  However, machine B does use the normal heartbeat method (be that the
heartbeat package or something similar, but not reservations) to check that
machine A is still alive.  Given a failure in the communications between
machine B and machine A, which would typically mean it is time to fail over
the cluster, machine B can test the status of machine A by throwing a reset to
the drive to break any existing reservations, waiting 4 seconds, then trying
to run its own reservation.  This can be accomplished with the command:

reserve --reset --reserve --hold /dev/sdc

If the program fails to get the reservation then that means machine A was able
to resend its reservation.  Obviously then, machine A isn't dead.  Machine B
can then decide that the heartbeat link is dead but machine A is still fine
and not try any further failover actions, or it could decide that machine A
has a working reserve program but key services or network connectivity may be
dead, in which case a forced failover would be in order.  To accomplish that,
machine B can issue this command:

reserve --preempt --hold /dev/sdc

This will break machine A's reservation and take the drive over from machine
A.  It's at this point, and this point only, that machine A will see a
reservation conflict.  It has been forcefully failed over, so
resetting/rebooting the machine is a perfectly fine alternative.  (The reason
it is recommended is that, at this point in time, machine B may already be
engaged in recovering the filesystem on the shared drive, and machine A may
still have buffers it is trying to flush to the same drive; so, in order to
make sure machine A doesn't let some dirty buffer get through a break in
machine B's reservation caused by something as inane as another machine on
the bus starting up and throwing an initial reset, we should reset machine A
*as soon as we know it has been forcefully failed over and is no longer
allowed to write to the drive*.)  Arguments with this can be directed to
Stephen Tweedie, who is largely responsible for beating me into doing it
this way ;-)
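[Editorial sketch: machine B's probe-then-decide sequence reduces to something like this. Again a simulation; `reset` and `try_reserve` stand in for what the hypothetical `reserve` tool does against the drive.]

```python
import time

def probe_master(reset, try_reserve, wait=4.0):
    """Test whether a silent master still owns the drive.

    A reset drops the SCSI-2 reservation.  A live master re-issues its
    reservation within its 2-second hold loop, so after waiting 4 seconds
    our own attempt fails and we know only the heartbeat link is down; if
    our attempt succeeds, the master is dead and failover is safe.
    """
    reset()
    time.sleep(wait)
    return "master-dead" if try_reserve() else "master-alive"
```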


> Another problem is that reservations do *not* guarantee ownership over
> the long haul. There are too many mechanisms that break reservations to
> build a complete strategy on them.

See above about the reason for needing to reset the machine ;-)  The overall
package is cooperative in nature, so we don't rely on reservations except for
the actual failover.  However, due to this very issue, we need to kill the
machine that was failed over as soon as possible after the failover to avoid
any possible races with open wi

Re: Linux Cluster using shared scsi

2001-05-02 Thread Max TenEyck Woodbury

Doug Ledford wrote:
> 
> Max TenEyck Woodbury wrote:
>>
>> Umm. Reboot? What do you think this is? Windoze?
> 
> It's the *only* way to guarantee that the drive is never touched by more
> than one machine at a time (notice, I've not been talking about a shared
> use drive, only one machine in the cluster "owns" the drive at a time,
> and it isn't for single transactions that it owns the drive, it owns
> the drive for as long as it is alive, this is a limitation of the
> filesystems currently available in mainstream kernels).  The reservation
> conflict and subsequent reboot also *only* happens when a reservation
> has been forcefully stolen from a machine. In that case, you are talking
> about a machine that has been failed over against its will, and the
> absolute safest thing to do in order to make sure the failed over machine
> doesn't screw the whole cluster up, is to make it reboot itself and
> re-enter the cluster as a new failover slave node instead of as a master
> node.  If you want a shared common access device with write locking
> semantics, as you seem to be suggesting later on, then you need a
> different method of locking than what I put in this, I knew that as I
> wrote it and that was intentional.

That was partly meant to be a joke, but it was also meant to make you stop
and think about what you are doing. From what little context I read, you
seem to be looking for a high availability solution. Rebooting a system,
even if there is a hot backup, should only be used as a last resort.

Another problem is that reservations do *not* guarantee ownership over 
the long haul. There are too many mechanisms that break reservations to 
build a complete strategy on them. Unfortunately, this ground was covered
during the 'cluster wars' between IBM and DEC and the field is strewn
with patents, so finding an open source solution may be tough.

>> ...
>> 
>> In other words, the reservation acts as a spin-lock to make sure updates
>> occur atomically.
> 
> Apples to oranges, as I described above.  This is for a failover cluster, not
> a shared data, load balancing cluster.

Load balancing clusters do need a good locking method, but so do failover
clusters. 

It's been 12 years since I did much with this, but I did do a fine-tooth
analysis of parts of DEC's clustering code looking for ways it could fail.
(The results were a few internal SPRs. I'm not at liberty to discuss the
details, even if I could remember them. However, I can say without giving
anything away that there were places where a hardware locking mechanism,
like reservation, would have simplified the code and improved performance.)

It was precisely the kinds of things associated with hardware failover
that led DEC to do clusters in the first place. The load balancing stuff
was secondary, even if it did sell more systems in the long run. You may
be able to get through the patent minefield by retrofitting the load 
balancing lock mechanisms to failover.

It may be that you can make it work, but the tested solution requires
software to back up the hardware. Good luck; you'll probably need a
lot of it (no sarcasm intended).

[EMAIL PROTECTED]



Re: Linux Cluster using shared scsi

2001-05-02 Thread Doug Ledford

Mike Anderson wrote:
> 
> Doug,
> 
> I guess I worded my question poorly. My question was around multi-path
> devices in combination with SCSI-2 reserve vs SCSI-3 persistent reserve which
> has not always been easy, but is more difficult if you use a name space that
> can slip or can have multiple entries for the same physical device you want
> to reserve.

The software is independent on each machine, so it is entirely possible that
the same disk will be in two different name spaces on two different machines
and everything will work just fine.  For example, maybe it's /dev/sdc on machine A
and /dev/sdf on machine B.  That's fine, you simply tell the software on
machine A to grab /dev/sdc and tell the software on machine B to grab /dev/sdf
and all will work properly.  Now, as to mixing SCSI-2 and SCSI-3 Persistent
Reservations on the same drive, not a chance.  The software will automatically
use the best alternative available, so it won't fall back to SCSI-2 LUN
reservations with SCSI-3 Persistent Reservations available (and if you force
it to do so, then you have no one to blame but yourself ;-)

> But here is a second try.
> 
> If this is a failover cluster then node A will need to reserve all disks in
> shareable space using sg or only a subset if node A has sync'd his sd name
> space with the other node and they both wish to do work in disjoint pools of
> disks.
> 
> In the scenario of grabbing all the disks. If sda and sdb are the same device
> then I can only reserve one of them and ensure IO only goes down through the
> one I reserved; otherwise I could get a reservation conflict.

Correct, if you hold a reservation on a device for which you have multiple
paths, you have to use the correct path.

> This goes along with your previous patch on supporting multi-path at "md"
> and translating this into the proper device to reserve.

The md multipath driver doesn't currently allow the proper ioctls for us to do
reservations at the md level.  We could only do them by going in and doing
reservations on /dev/sg entries behind the back of the md layer, which would
be risky at best.

> I guess it is up to the caller of
> your service to handle this case correct??

For now, yes.  And the best method to do so is to configure your failover
software to know that a device is a multipath device, only attempt to reserve
or mount one path, and fail over to the second path if the first path goes
away: issue a bus reset on the secondary path, then reserve the secondary
path, then mount the secondary path.  However, as you will have lost data due
to the failed writes on the primary path, I think this is of dubious value.
Right now, as I see it, multipath and failover simply don't mix well.  More
work is needed to make them work together.
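[Editorial sketch: the secondary-path recovery sequence just described, as a simulation; the three callables stand in for the real operations on the surviving path.]

```python
def fail_to_secondary(bus_reset, reserve, mount):
    """Recover on the secondary path: reset, then reserve, then mount.

    The bus reset clears the reservation that was taken via the now-dead
    primary path; only after holding our own reservation on the secondary
    path is it safe to mount.  Note the caveat above: writes already lost
    on the primary path limit the value of this approach.
    """
    bus_reset()
    if not reserve():
        return False   # another initiator owns the device now
    mount()
    return True
```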

> If this not any clearer than my last mail I will just wait to see the code
> :-).
> 
> Thanks,
> 
> -Mike
> 
> Doug Ledford [[EMAIL PROTECTED]] wrote:
> >
> >
> >
> > To:   Mike Anderson <[EMAIL PROTECTED]>
> > cc:   [EMAIL PROTECTED], James Bottomley
> >   <[EMAIL PROTECTED]>, "Roets, Chris"
> >   <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
> >   [EMAIL PROTECTED]
> >
> >
> >
> >
> >
> > Mike Anderson wrote:
> > >
> > > Doug,
> > >
> > > A question on clarification.
> > >
> > > Does the configuration you are testing have both FC adapters going to
> > > the same port of the storage device (multi-path) or to different ports
> > > of the storage device (multi-port)?
> > >
> > > The reason I ask is that I thought if you are using SCSI-2 reserves
> > > that the reserve was on a per initiator basis. How does one know which
> > > path has the reserve?
> >
> > Reservations are global in nature in that a reservation with a device will
> > block access to that device from all other initiators, including across
> > different ports on multiport devices (or else they are broken and need a
> > firmware update).
> >
> > > On a side note, I thought the GFS project had up-leveled their locking /
> > > fencing into an API called a locking harness to support different kinds
> > > of fencing methods. Any thoughts if this capability could be plugged
> > > into this service so that users could reduce recoding depending on
> > > which fencing support they selected?
> >
> > I wouldn't know about that.
> >
> > --
> >
> >  Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
> >   Please check my web site for aic7xxx updates/answers before
> >   e-mailing me about problems
> 
> --
> Michael Anderson
> [EMAIL PROTECTED]
> 
> IBM Linux Technology Center - Storage IO
> Phone (503) 578-4466
> Tie Line: 775-4466

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems



Re: Linux Cluster using shared scsi

2001-05-02 Thread Mike Anderson

Doug,

I guess I worded my question poorly. My question was around multi-path
devices in combination with SCSI-2 reserve vs SCSI-3 persistent reserve, which 
has not always been easy, but is more difficult if you use a name space that 
can slip or can have multiple entries for the same physical device you want
to reserve.

But here is a second try.

If this is a failover cluster, then node A will need to reserve all disks in 
the shareable space using sg, or only a subset if node A has sync'd his sd
name space with the other node and they both wish to do work in disjoint
pools of disks.

In the scenario of grabbing all the disks: if sda and sdb are the same device, 
then I can only reserve one of them and must ensure IO only goes down through
the one I reserved; otherwise I could get a reservation conflict. This goes 
along with your previous patch on supporting multi-path at "md" and
translating this into the proper device to reserve. I guess it is up to the
caller of your service to handle this case, correct?

If this is not any clearer than my last mail, I will just wait to see the code
:-).

Thanks,

-Mike

Doug Ledford [[EMAIL PROTECTED]] wrote:
> 
> 
> 
> To:   Mike Anderson <[EMAIL PROTECTED]>
> cc:   [EMAIL PROTECTED], James Bottomley
>   <[EMAIL PROTECTED]>, "Roets, Chris"
>   <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
>   [EMAIL PROTECTED]
> 
> 
> 
> 
> 
> Mike Anderson wrote:
> >
> > Doug,
> >
> > A question on clarification.
> >
> > Does the configuration you are testing have both FC adapters going to the
> > same port of the storage device (multi-path) or to different ports of the
> > storage device (multi-port)?
> >
> > The reason I ask is that I thought if you are using SCSI-2 reserves that
> > the reserve was on a per initiator basis. How does one know which path has
> > the reserve?
> 
> Reservations are global in nature in that a reservation with a device will
> block access to that device from all other initiators, including across
> different ports on multiport devices (or else they are broken and need a
> firmware update).
> 
> > On a side note, I thought the GFS project had up-leveled their locking /
> > fencing into an API called a locking harness to support different kinds of
> > fencing methods. Any thoughts if this capability could be plugged into this
> > service so that users could reduce recoding depending on which fencing
> > support they selected?
> 
> I wouldn't know about that.
> 
> --
> 
>  Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
>   Please check my web site for aic7xxx updates/answers before
>   e-mailing me about problems

-- 
Michael Anderson
[EMAIL PROTECTED]

IBM Linux Technology Center - Storage IO
Phone (503) 578-4466
Tie Line: 775-4466




Re: Linux Cluster using shared scsi

2001-05-02 Thread Doug Ledford

Mike Anderson wrote:
> 
> Doug,
> 
> A question on clarification.
> 
> Does the configuration you are testing have both FC adapters going to the same
> port of the storage device (multi-path) or to different ports of the storage
> device (multi-port)?
> 
> The reason I ask is that I thought if you are using SCSI-2 reserves that the
> reserve was on a per initiator basis. How does one know which path has the
> reserve?

Reservations are global in nature in that a reservation with a device will
block access to that device from all other initiators, including across
different ports on multiport devices (or else they are broken and need a
firmware update).

> On a side note, I thought the GFS project had up-leveled their locking / fencing
> into an API called a locking harness to support different kinds of fencing
> methods. Any thoughts if this capability could be plugged into this service so
> that users could reduce recoding depending on which fencing support they
> selected?

I wouldn't know about that.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems



Re: Linux Cluster using shared scsi

2001-05-02 Thread Doug Ledford

Max TenEyck Woodbury wrote:
> 
> Doug Ledford wrote:
> >
> > ...
> >
> > If told to hold a reservation, then resend your reservation request once every
> > 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a
> > deal as requesting a reservation every 2 seconds might sound).  The first time
> > the reservation is refused, consider the reservation stolen by another machine
> > and exit (or optionally, reboot).
> 
> Umm. Reboot? What do you think this is? Windoze?

It's the *only* way to guarantee that the drive is never touched by more than
one machine at a time (notice, I've not been talking about a shared use drive,
only one machine in the cluster "owns" the drive at a time, and it isn't for
single transactions that it owns the drive, it owns the drive for as long as
it is alive, this is a limitation of the filesystems currently available in
mainstream kernels).  The reservation conflict and subsequent reboot also
*only* happens when a reservation has been forcefully stolen from a machine. 
In that case, you are talking about a machine that has been failed over
against its will, and the absolute safest thing to do in order to make sure
the failed over machine doesn't screw the whole cluster up, is to make it
reboot itself and re-enter the cluster as a new failover slave node instead of
as a master node.  If you want a shared common access device with write
locking semantics, as you seem to be suggesting later on, then you need a
different method of locking than what I put in this, I knew that as I wrote it
and that was intentional.

> Really, You can NOT do clustering well if you don't have a consistent locking
> mechanism. The use of a hardware locking method like 'reservation' may be a
> good way to avoid race conditions, but it should be backed up by the
> appropriate exchange of messages to make sure everybody has the same view of
> the system. For example, you might use it like this:
> 
> 1. Examine the lock list for conflicts. If a conflict is found, the lock
>request fails.
> 
> 2. Reserve the device with the lock on it. If the reservation fails, delay
>a short amount of time and return to 1.
> 
> 3. Update the lock list for the device.
> 
> 4. When the list update is complete, release the reservation.
> 
> In other words, the reservation acts as a spin-lock to make sure updates
> occur atomically.

Apples to oranges, as I described above.  This is for a failover cluster, not
a shared data, load balancing cluster.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems



Re: Linux Cluster using shared scsi

2001-05-02 Thread Max TenEyck Woodbury

Doug Ledford wrote:
> 
> ...
> 
> If told to hold a reservation, then resend your reservation request once every
> 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a
> deal as requesting a reservation every 2 seconds might sound).  The first time
> the reservation is refused, consider the reservation stolen by another machine
> and exit (or optionally, reboot).

Umm. Reboot? What do you think this is? Windoze?

Really, you can NOT do clustering well if you don't have a consistent locking
mechanism. The use of a hardware locking method like 'reservation' may be a
good way to avoid race conditions, but it should be backed up by the 
appropriate exchange of messages to make sure everybody has the same view of
the system. For example, you might use it like this:

1. Examine the lock list for conflicts. If a conflict is found, the lock
   request fails.

2. Reserve the device with the lock on it. If the reservation fails, delay
   a short amount of time and return to 1.

3. Update the lock list for the device.

4. When the list update is complete, release the reservation.

In other words, the reservation acts as a spin-lock to make sure updates
occur atomically.
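[Editorial sketch: the four steps above read like this in executable form. A simulation only; `reserve`/`release` stand in for the hardware RESERVE/RELEASE, and the lock list is just a shared Python list here.]

```python
import time

def add_lock(lock_list, entry, conflicts, reserve, release,
             retry_delay=0.05, max_tries=20):
    """Use the device reservation as a spin-lock around lock-list updates.

    Step 1: check the lock list for conflicts and fail on a conflict.
    Step 2: reserve the device; on failure, delay and return to step 1.
    Step 3: update the lock list while holding the reservation.
    Step 4: release the reservation once the update is complete.
    """
    for _ in range(max_tries):
        if conflicts(lock_list, entry):
            return False                 # step 1: conflicting lock found
        if reserve():                    # step 2: hardware spin-lock taken
            try:
                lock_list.append(entry)  # step 3: atomic list update
            finally:
                release()                # step 4: drop the reservation
            return True
        time.sleep(retry_delay)          # reservation busy: spin and retry
    return False
```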

[EMAIL PROTECTED]



Re: Linux Cluster using shared scsi

2001-05-02 Thread Mike Anderson

Doug,

A question on clarification.

Does the configuration you are testing have both FC adapters going to the same
port of the storage device (multi-path) or to different ports of the storage 
device (multi-port)?

The reason I ask is that I thought if you are using SCSI-2 reserves that the
reserve was on a per-initiator basis. How does one know which path has the
reserve?

On a side note, I thought the GFS project had up-leveled their locking / fencing
into an API called a locking harness to support different kinds of fencing
methods. Any thoughts if this capability could be plugged into this service so
that users could reduce recoding depending on which fencing support they
selected?


Thanks,

-Mike

Doug Ledford [[EMAIL PROTECTED]] wrote:
> 
> 
> 
> To:   [EMAIL PROTECTED]
> cc:   James Bottomley <[EMAIL PROTECTED]>, "Roets, Chris"
>   <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
>   [EMAIL PROTECTED]
> 
> 
> 
> 
> 
> "Eric Z. Ayers" wrote:
> >
> > Doug Ledford writes:
> > (James Bottomley commented about the need for SCSI reservation kernel
> > patches)
> >  >
> >  > I agree.  It's something that needs fixed in general, your software
> >  > needs it as well, and I've written (about 80% done at this point) some
> >  > open source software geared towards getting/holding reservations that
> >  > also requires the same kernel patches (plus one more to be fully
> >  > functional, an ioctl to allow a SCSI reservation to do a forced reboot
> >  > of a machine).  I'll be releasing that package in the short term (once
> >  > I get back from my vacation anyway).
> >  >
> >
> > Hello Doug,
> >
> > Does this package also tell the kernel to "re-establish" a
> > reservation for all devices after a bus reset, or at least inform a
> > user level program?  Finding out when there has been a bus reset has
> > been a stumbling block for me.
> 
> It doesn't have to.  The kernel changes are minimal (basically James' SCSI
> reset patch that he's been carrying around, the scsi reservation conflict
> patch, and I need to write a third patch that makes the system optionally
> reboot immediately on a reservation conflict and which is controlled by an
> ioctl, but I haven't done that patch yet).  All of the rest is implemented
> in user space via the /dev/sg entries.  As such, it doesn't have any more
> information about bus resets than you do.  However, because of the policy
> enacted in the code, it doesn't need to.  Furthermore, because there are so
> many ways to lose a reservation silently, it's foolhardy to try to keep
> reservation consistency any way other than something similar to what I
> outline below.
> 
> The package is meant to be a sort of "scsi reservation" library.  The
> application that uses the library is responsible for setting policy.  I
> wrote a small, simple application that actually does a decent job of
> implementing policy on the system.  The policy it does implement is simple:
> 
> If told to get a reservation, then attempt to get it.  If the attempt is
> blocked by an existing reservation and we aren't supposed to reset the
> drive, then exit.  If it's blocked and we are supposed to reset the drive,
> then send a device reset, then wait 5 seconds, then try to get the
> reservation.  If we again fail, then the other machine is still alive (as
> proven by the fact that it re-established its reservation after the reset)
> and we exit, else we now have the reservation.
> 
> If told to forcefully get a reservation, then attempt to get it.  If the
> attempt fails, then reset the device and try again immediately (no 5 second
> wait); if it fails again, then exit.
> 
> If told to hold a reservation, then resend your reservation request once
> every 2 seconds (this actually has very minimal CPU/BUS usage and isn't as
> big a deal as requesting a reservation every 2 seconds might sound).  The
> first time the reservation is refused, consider the reservation stolen by
> another machine and exit (or optionally, reboot).
> 
> The package is meant to lock against itself (in other words, a malicious
> user with write access to the /dev/sg entries could confuse this locking
> mechanism, but it will work cooperatively with other copies of itself
> running on other machines); the requirements for the locking to be safe
> are as follows:
> 
> 1)  A machine is not allowed to mount or otherwise use a drive in any way
> shape or form until it has successfully acquired a reservation.
> 
> 2)  Once a machine has a reservation, it is not allowed to ever take any
> action to break another machine's reservation, so that if the reservation
> is stolen, this machine is required to "gracefully" step away from the
> drive (rebooting is the best way to accomplish this since even the act of
> unmounting the drive will attempt to write to it).
> 
> 3)  The timeouts in the program must be honored (resend your reservation,
> when you hold it, every 2 seconds so that a passive attempt to steal the
> rese

Re: Linux Cluster using shared scsi

2001-05-02 Thread Eddie Williams


Hi Doug,

Great to hear your progress on this.  As I had not heard anything about this 
effort since this time last year I had assumed you put this project on the 
shelf.  I will be happy to test these interfaces when they are ready.

Eddie

> "Eric Z. Ayers" wrote:
> > 
> > Doug Ledford writes:
> > (James Bottomley commented about the need for SCSI reservation kernel patches)
> >  >
> >  > I agree.  It's something that needs fixing in general, your software needs it
> >  > as well, and I've written (about 80% done at this point) some open source
> >  > software geared towards getting/holding reservations that also requires the
> >  > same kernel patches (plus one more to be fully functional, an ioctl to allow a
> >  > SCSI reservation to do a forced reboot of a machine).  I'll be releasing that
> >  > package in the short term (once I get back from my vacation anyway).
> >  >
> > 
> > Hello Doug,
> > 
> > Does this package also tell the kernel to "re-establish" a
> > reservation for all devices after a bus reset, or at least inform a
> > user level program?  Finding out when there has been a bus reset has
> > been a stumbling block for me.
> 
> It doesn't have to.  The kernel changes are minimal (basically James' SCSI
> reset patch that he's been carrying around, the scsi reservation conflict
> patch, and I need to write a third patch that makes the system optionally
> reboot immediately on a reservation conflict and which is controlled by an
> ioctl, but I haven't done that patch yet).  All of the rest is implemented in
> user space via the /dev/sg entries.  As such, it doesn't have any more
> information about bus resets than you do.  However, because of the policy
> enacted in the code, it doesn't need to.  Furthermore, because there are so
> many ways to lose a reservation silently, it's foolhardy to try and keep
> reservation consistency any way other than something similar to what I outline
> below.
> 
> The package is meant to be a sort of "scsi reservation" library.  The
> application that uses the library is responsible for setting policy.  I wrote
> a small, simple application that actually does a decent job of implementing
> policy on the system.  The policy it does implement is simple:
> 
> If told to get a reservation, then attempt to get it.  If the attempt is
> blocked by an existing reservation and we aren't supposed to reset the drive,
> then exit.  If it's blocked and we are supposed to reset the drive, then send
> a device reset, then wait 5 seconds, then try to get the reservation.  If we
> again fail, then the other machine is still alive (as proven by the fact that
> it re-established its reservation after the reset) and we exit, else we now
> have the reservation.
> 
> If told to forcefully get a reservation, then attempt to get it.  If the
> attempt fails, then reset the device and try again immediately (no 5 second
> wait), if it fails again, then exit.
> 
> If told to hold a reservation, then resend your reservation request once every
> 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a
> deal as requesting a reservation every 2 seconds might sound).  The first time
> the reservation is refused, consider the reservation stolen by another machine
> and exit (or optionally, reboot).
> 
> The package is meant to lock against itself (in other words, a malicious user
> with write access to the /dev/sg entries could confuse this locking mechanism,
> but it will work cooperatively with other copies of itself running on other
> machines), the requirements for the locking to be safe are as follows:
> 
> 1)  A machine is not allowed to mount or otherwise use a drive in any way
> shape or form until it has successfully acquired a reservation.
> 
> 2)  Once a machine has a reservation, it is not allowed to ever take any
> action to break another machine's reservation, so that if the reservation is
> stolen, this machine is required to "gracefully" step away from the drive
> (rebooting is the best way to accomplish this since even the act of unmounting
> the drive will attempt to write to it).
> 
> 3)  The timeouts in the program must be honored (resend your reservation, when
> you hold it, every 2 seconds so that a passive attempt to steal the
> reservation will see you are still alive within the 5 second timeout and leave
> you be, which is a sort of heartbeat in and of itself).
> 
> Anyway, as I said in my previous email, it's about 80% complete.  It currently
> is up and running on SCSI-2 LUN based reservations.  There is code to do
> SCSI-2 and SCSI-3 extent based reservations but it hasn't been tested due to
> lack of devices that support extent based reservations (my test bed is a
> multipath FC setup, so I'm doing all my testing on FC drives over two FC
> controllers in the same machine).  I've still got to add the SCSI-3 Persistent
> Reservation code to the library (again, I'm lacking test drives for this
> scenario).  The library itself requires that the p

Re: Linux Cluster using shared scsi

2001-05-02 Thread Doug Ledford

"Eric Z. Ayers" wrote:
> 
> Doug Ledford writes:
> (James Bottomley commented about the need for SCSI reservation kernel patches)
>  >
>  > I agree.  It's something that needs fixing in general, your software needs it
>  > as well, and I've written (about 80% done at this point) some open source
>  > software geared towards getting/holding reservations that also requires the
>  > same kernel patches (plus one more to be fully functional, an ioctl to allow a
>  > SCSI reservation to do a forced reboot of a machine).  I'll be releasing that
>  > package in the short term (once I get back from my vacation anyway).
>  >
> 
> Hello Doug,
> 
> Does this package also tell the kernel to "re-establish" a
> reservation for all devices after a bus reset, or at least inform a
> user level program?  Finding out when there has been a bus reset has
> been a stumbling block for me.

It doesn't have to.  The kernel changes are minimal (basically James' SCSI
reset patch that he's been carrying around, the scsi reservation conflict
patch, and I need to write a third patch that makes the system optionally
reboot immediately on a reservation conflict and which is controlled by an
ioctl, but I haven't done that patch yet).  All of the rest is implemented in
user space via the /dev/sg entries.  As such, it doesn't have any more
information about bus resets than you do.  However, because of the policy
enacted in the code, it doesn't need to.  Furthermore, because there are so
many ways to lose a reservation silently, it's foolhardy to try and keep
reservation consistency any way other than something similar to what I outline
below.

The package is meant to be a sort of "scsi reservation" library.  The
application that uses the library is responsible for setting policy.  I wrote
a small, simple application that actually does a decent job of implementing
policy on the system.  The policy it does implement is simple:

If told to get a reservation, then attempt to get it.  If the attempt is
blocked by an existing reservation and we aren't supposed to reset the drive,
then exit.  If it's blocked and we are supposed to reset the drive, then send
a device reset, then wait 5 seconds, then try to get the reservation.  If we
again fail, then the other machine is still alive (as proven by the fact that
it re-established its reservation after the reset) and we exit, else we now
have the reservation.

If told to forcefully get a reservation, then attempt to get it.  If the
attempt fails, then reset the device and try again immediately (no 5 second
wait), if it fails again, then exit.

If told to hold a reservation, then resend your reservation request once every
2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a
deal as requesting a reservation every 2 seconds might sound).  The first time
the reservation is refused, consider the reservation stolen by another machine
and exit (or optionally, reboot).

The package is meant to lock against itself (in other words, a malicious user
with write access to the /dev/sg entries could confuse this locking mechanism,
but it will work cooperatively with other copies of itself running on other
machines), the requirements for the locking to be safe are as follows:

1)  A machine is not allowed to mount or otherwise use a drive in any way
shape or form until it has successfully acquired a reservation.

2)  Once a machine has a reservation, it is not allowed to ever take any
action to break another machine's reservation, so that if the reservation is
stolen, this machine is required to "gracefully" step away from the drive
(rebooting is the best way to accomplish this since even the act of unmounting
the drive will attempt to write to it).

3)  The timeouts in the program must be honored (resend your reservation, when
you hold it, every 2 seconds so that a passive attempt to steal the
reservation will see you are still alive within the 5 second timeout and leave
you be, which is a sort of heartbeat in and of itself).

Anyway, as I said in my previous email, it's about 80% complete.  It currently
is up and running on SCSI-2 LUN based reservations.  There is code to do
SCSI-2 and SCSI-3 extent based reservations but it hasn't been tested due to
lack of devices that support extent based reservations (my test bed is a
multipath FC setup, so I'm doing all my testing on FC drives over two FC
controllers in the same machine).  I've still got to add the SCSI-3 Persistent
Reservation code to the library (again, I'm lacking test drives for this
scenario).  The library itself requires that the program treat all
reservations as extent/persistent reservations and it silently falls back to
LUN reservations when neither of those two are available.  My simple program
that goes with the application just makes extent reservations of the whole
disk, so it acts like a LUN reservation regardless, but there is considerably
more flexibility in the library if a person wishes to program to it.


-- 

 Doug L

Re: Linux Cluster using shared scsi

2001-05-01 Thread Alan Cox

> reserved.  But if you did such a hot swap you would have "bigger
> fish to fry" in a HA application... I mean, none of your data would be
> there! 

You need to realise this has happened and do the right thing. Since
it could be an md raid array the hotswap is not fatal.

If it is fatal, you need to realise it promptly, before you damage
the disk contents inserted in error (if possible), so that the HA
system can take countermeasures


> if the kernel (by this I mean the scsi midlayer) was maintaining
> reservations, that there would be some logic activated to "handle"
> this problem, whether it be re-reserving the device, or the ability to

Suppose the cluster nodes don't agree on the reservation table?

> Bus resets in the Linux drivers also tend to happen frequently when a
> disk is failing, which has tended to leave the system in a somewhat
> functional but often an unusable state, (but that's a different story...)

The new scsi EH code in 2.4, for the drivers that use it, is a lot better.  But
yes, it's a real problem.


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: Linux Cluster using shared scsi

2001-05-01 Thread Eric Z. Ayers

Alan Cox writes:
 > > Does this package also tell the kernel to "re-establish" a
 > > reservation for all devices after a bus reset, or at least inform a
 > > user level program?  Finding out when there has been a bus reset has
 > > been a stumbling block for me.
 > 
 > You cannot rely on a bus reset. Imagine hot swap disks on an FC fabric. I 
 > suspect the controller itself needs to call back for problem events
 > 

I'm not an SCSI expert by any stretch of the imagination.  I think
that what you are saying is that you cannot rely on a bus reset being
the only thing that will remove a reservation.  For example, if a
device is 'hot replaced', the device will (clearly) no longer be
reserved.  But if you did such a hot swap you would have "bigger
fish to fry" in a HA application... I mean, none of your data would be
there! 

My understanding is that specifically, when a bus reset occurs,  all
SCSI reservations for devices on that bus are lost.  I was hoping that
if the kernel (by this I mean the scsi midlayer) was maintaining
reservations, that there would be some logic activated to "handle"
this problem, whether it be re-reserving the device, or the ability to
pass notification of a reset (or another problem event as you point
out) up to the application that's handling reservations. 

In my experience, the most common reason for a bus reset in parallel
SCSI is that a peer host on the bus is rebooting.  Since this happens
under normal operation and well in advance of any attempt to access the
device, it would be nice if there were some sort of asynchronous
notification instead of a polling process with an interval of 2-3
minutes, where it's conceivable that the peer system could have booted
and attempted to take-over the disk out from under a running system.  

Bus resets in the Linux drivers also tend to happen frequently when a
disk is failing, which has tended to leave the system in a somewhat
functional but often an unusable state, (but that's a different story...)

James Bottomley <[EMAIL PROTECTED]> writes:
 > Essentially, there are many conditions which cause a quiet loss of a SCSI-2 
 > reservation.  Even in parallel SCSI: Reservations can be silently lost because 
 > of LUN reset, device reset or even simply powering off the device.
...

James mentions that even handling a bus reset still leaves a window
where a peer could grab the reservation out from underneath an
un-suspecting host.  I agree that this could happen, and the old host
might perform writes to an 'unreserved' disk,  but once the second
system succeeded in obtaining the reservation, any read/write commands
from the "old" host would return SCSI errors (this is my layman's
understanding - the commands would return a UNIT_RESERVED error) , so
I believe you would have the desired behavior in this kind of cluster
- only one machine in the cluster can access the disk at the same
time.  The data on the disk should be in a state where the second
system in the cluster could start a recovery task and begin to provide
the service hosted on the disk. 

-Eric.
--
Eric Z. Ayers                          Lead Software Engineer
Phone:  +1 404-705-2864                Computer Generation, Incorporated
Fax:    +1 404-705-2805                an Intec Telecom Systems Company
Web:http://www.intec-telecom-systems.com/
Email:  [EMAIL PROTECTED]
Postal: Bldg G 4th Floor, 5775 Peachtree-Dunwoody Rd, Atlanta, GA 30342 USA



Re: Linux Cluster using shared scsi

2001-05-01 Thread James Bottomley

[EMAIL PROTECTED] said:
> Does this package also tell the kernel to "re-establish" a reservation
> for all devices after a bus reset, or at least inform a user level
> program?  Finding out when there has been a bus reset has been a
> stumbling block for me. 

[EMAIL PROTECTED] said:
> You cannot rely on a bus reset. Imagine hot swap disks on an FC
> fabric. I  suspect the controller itself needs to call back for
> problem events 

Essentially, there are many conditions which cause a quiet loss of a SCSI-2 
reservation.  Even in parallel SCSI: Reservations can be silently lost because 
of LUN reset, device reset or even simply powering off the device.

The way we maintain reservations for LifeKeeper is to have a user level daemon 
ping the device with a reservation command every few minutes.  If you get a 
RESERVATION_CONFLICT return you know that something else stole your 
reservation, otherwise you maintain it.  There is a window in this scheme 
where the device may be accessible by other initiators but that's the price 
you pay for using SCSI-2 reservations instead of the more cluster friendly 
SCSI-3 ones.  In a kernel scheme, you may get early notification of 
reservation loss by putting a hook into the processing of 
CHECK_CONDITION/UNIT_ATTENTION, but it won't close the window entirely.

James







Re: Linux Cluster using shared scsi

2001-05-01 Thread Alan Cox

> Does this package also tell the kernel to "re-establish" a
> reservation for all devices after a bus reset, or at least inform a
> user level program?  Finding out when there has been a bus reset has
> been a stumbling block for me.

You cannot rely on a bus reset. Imagine hot swap disks on an FC fabric. I 
suspect the controller itself needs to call back for problem events




Re: Linux Cluster using shared scsi

2001-05-01 Thread Eric Z. Ayers

Doug Ledford writes:
(James Bottomley commented about the need for SCSI reservation kernel patches)
 > 
 > I agree.  It's something that needs fixing in general, your software needs it
 > as well, and I've written (about 80% done at this point) some open source
 > software geared towards getting/holding reservations that also requires the
 > same kernel patches (plus one more to be fully functional, an ioctl to allow a
 > SCSI reservation to do a forced reboot of a machine).  I'll be releasing that
 > package in the short term (once I get back from my vacation anyway).
 > 

Hello Doug,

Does this package also tell the kernel to "re-establish" a
reservation for all devices after a bus reset, or at least inform a
user level program?  Finding out when there has been a bus reset has
been a stumbling block for me.

-Eric.
--
Eric Z. Ayers                          Lead Software Engineer
Phone:  +1 404-705-2864                Computer Generation, Incorporated
Fax:    +1 404-705-2805                an Intec Telecom Systems Company
Web:http://www.intec-telecom-systems.com/
Email:  [EMAIL PROTECTED]
Postal: Bldg G 4th Floor, 5775 Peachtree-Dunwoody Rd, Atlanta, GA 30342 USA



Re: Linux Cluster using shared scsi

2001-05-01 Thread Doug Ledford

James Bottomley wrote:
> 
> [EMAIL PROTECTED] said:
> > So, will Linux ever support the scsi reservation mechanism as standard?
> 
> That's not within my gift.  I can merely write the code that corrects the
> behaviour.  I can't force anyone else to accept it.

I think it will be standard before not too much longer (I hope anyway, I'm
tired of carrying the patches forward all the time so I'll lend my support to
getting it into the mainstream kernel ;-)

> [EMAIL PROTECTED] said:
> > Isn't there a standard that says if you scsi reserve a disk, no one
> > else should be able to access this disk, or is this a "steeleye/
> > Compaq" standard.
> 
> Use of reservations is laid out in the SCSI-2 and SCSI-3 standards (which can
> be downloaded from the T10 site www.t10.org) which are international in scope.
>  I think the implementation issues come because the reservations part is
> really only relevant to a multi-initiator clustered environment which isn't an
> everyday configuration for most Linux users.  Obviously, as Linux moves into
> the SAN arena this type of configuration will become a lot more common, at
> which time the various problems associated with multiple initiators should
> rise in prominence.

I agree.  It's something that needs fixing in general, your software needs it
as well, and I've written (about 80% done at this point) some open source
software geared towards getting/holding reservations that also requires the
same kernel patches (plus one more to be fully functional, an ioctl to allow a
SCSI reservation to do a forced reboot of a machine).  I'll be releasing that
package in the short term (once I get back from my vacation anyway).

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems



Re: Linux Cluster using shared scsi

2001-05-01 Thread James Bottomley

[EMAIL PROTECTED] said:
> So, will Linux ever support the scsi reservation mechanism as standard? 

That's not within my gift.  I can merely write the code that corrects the 
behaviour.  I can't force anyone else to accept it.

[EMAIL PROTECTED] said:
> Isn't there a standard that says if you scsi reserve a disk, no one
> else should be able to access this disk, or is this a "steeleye/
> Compaq" standard. 

Use of reservations is laid out in the SCSI-2 and SCSI-3 standards (which can 
be downloaded from the T10 site www.t10.org) which are international in scope. 
 I think the implementation issues come because the reservations part is 
really only relevant to a multi-initiator clustered environment which isn't an 
everyday configuration for most Linux users.  Obviously, as Linux moves into 
the SAN arena this type of configuration will become a lot more common, at 
which time the various problems associated with multiple initiators should 
rise in prominence.

James





RE: Linux Cluster using shared scsi

2001-05-01 Thread Roets, Chris

So, will Linux ever support the scsi reservation mechanism as standard?
Isn't there a standard that says if you scsi reserve a disk, no one
else should be able to access this disk, or is this a "steeleye/Compaq"
standard.

Chris

-Original Message-
From: James Bottomley [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 27, 2001 5:12 PM
To: Roets, Chris
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Linux Cluster using shared scsi


I've copied linux SCSI and quoted the entire message below so they can
follow.

Your assertion that this works in 2.2.16 is incorrect, the patch to fix the 
linux reservation conflict handler has never been added to the official tree.  
I suspect you actually don't have vanilla 2.2.16 but instead have a redhat or 
other distribution patched version.  Most distributions include the Steeleye 
SCSI clustering patches which correct reservation handling.

I've attached the complete patch, which fixes both the old and the new error 
handlers in the 2.2 kernel; it applies against 2.2.18.

James Bottomley


> Problem :
> install two Linux-system with a shared scsi-bus and storage on that shared
> bus.
> suppose :
> system one : SCSI ID 7
> system two : SCSI ID 6
> shared disk : SCSI ID 4
> 
> By default, you can mount the disk on both systems.  This is normal
> behavior, but
> may cause data corruption.
> To prevent this, you can SCSI-reserve a disk on one system.  If the other
> system
> would try to access this device, the system should return an i/o error due
> to the reservation.
> This is a common technique used in
> - Traditional Tru64 Unix ase clustering
> - Tru64 Unix V5 Clustering to accomplish i/o barriers
> - Windows-NT Clusters
> - Steel-eye clustering
> The reservation can be done using a standard tool like scu
> 
> scu -f /dev/sdb
> scu > reserve device
> 
> On Linux, this works fine under Kernel version 2.2.16.
> Below is the code that accomplishes this
> /usr/src/linux/drivers/scsi/scsi_obsolete.c in routine scsi_old_done
> case RESERVATION_CONFLICT:
> printk("scsi%d (%d,%d,%d) : RESERVATION CONFLICT\n",
>SCpnt->host->host_no, SCpnt->channel,
>SCpnt->device->id, SCpnt->device->lun);
> status = CMD_FINISHED; /* returns I/O error */
> break;
> default:
> As of kernel version 2.2.18, this code has changed.  If a scsi reserve
> error
> occurs, the device driver does a scsi reset.  This way the scsi
> reservation is
> gone, and the device can be accessed.
> /usr/src/linux/drivers/scsi/scsi_obsolete.c in routine scsi_old_done 
> case RESERVATION_CONFLICT:
> printk("scsi%d, channel %d : RESERVATION CONFLICT
> performing"
>" reset.\n", SCpnt->host->host_no, SCpnt->channel);
> scsi_reset(SCpnt, SCSI_RESET_SYNCHRONOUS);
> status = REDO;
> break;
> 
> Fix : delete the scsi reset in the kernel code
> case RESERVATION_CONFLICT:
> /* Deleted Chris Roets
> printk("scsi%d, channel %d : RESERVATION CONFLICT
> performing"
>" reset.\n", SCpnt->host->host_no, SCpnt->channel);
> scsi_reset(SCpnt, SCSI_RESET_SYNCHRONOUS);
> status = REDO;
> next four lines added */
> printk("scsi%d (%d,%d,%d) : RESERVATION CONFLICT\n",
>SCpnt->host->host_no, SCpnt->channel,
>SCpnt->device->id, SCpnt->device->lun);
> status = CMD_FINISHED; /* returns I/O error */
> break;
> 
> and rebuild the kernel.
> 
> This should allow the customer to continue
> 
> Questions:
> - why  is this scsi reset done/added as of kernel version 2.2.18
> - as we are talking about an obsolete routine, how is this accomplished 
>  in the new code and how is it activated.  
>



Re: Linux Cluster using shared scsi

2001-04-27 Thread James Bottomley

I've copied linux SCSI and quoted the entire message below so they can follow.

Your assertion that this works in 2.2.16 is incorrect, the patch to fix the 
linux reservation conflict handler has never been added to the official tree.  
I suspect you actually don't have vanilla 2.2.16 but instead have a redhat or 
other distribution patched version.  Most distributions include the Steeleye 
SCSI clustering patches which correct reservation handling.

I've attached the complete patch, which fixes both the old and the new error 
handlers in the 2.2 kernel; it applies against 2.2.18.

James Bottomley


> Problem :
> install two Linux-system with a shared scsi-bus and storage on that shared
> bus.
> suppose :
> system one : SCSI ID 7
> system two : SCSI ID 6
> shared disk : SCSI ID 4
> 
> By default, you can mount the disk on both systems.  This is normal
> behavior, but
> may cause data corruption.
> To prevent this, you can SCSI-reserve a disk on one system.  If the other
> system
> would try to access this device, the system should return an i/o error due
> to the reservation.
> This is a common technique used in
> - Traditional Tru64 Unix ase clustering
> - Tru64 Unix V5 Clustering to accomplish i/o barriers
> - Windows-NT Clusters
> - Steel-eye clustering
> The reservation can be done using a standard tool like scu
> 
> scu -f /dev/sdb
> scu > reserve device
> 
> On Linux, this works fine under Kernel version 2.2.16.
> Below is the code that accomplishes this
> /usr/src/linux/drivers/scsi/scsi_obsolete.c in routine scsi_old_done
> case RESERVATION_CONFLICT:
> printk("scsi%d (%d,%d,%d) : RESERVATION CONFLICT\n",
>SCpnt->host->host_no, SCpnt->channel,
>SCpnt->device->id, SCpnt->device->lun);
> status = CMD_FINISHED; /* returns I/O error */
> break;
> default:
> As of kernel version 2.2.18, this code has changed.  If a scsi reserve
> error
> occurs, the device driver does a scsi reset.  This way the scsi
> reservation is
> gone, and the device can be accessed.
> /usr/src/linux/drivers/scsi/scsi_obsolete.c in routine scsi_old_done 
> case RESERVATION_CONFLICT:
> printk("scsi%d, channel %d : RESERVATION CONFLICT
> performing"
>" reset.\n", SCpnt->host->host_no, SCpnt->channel);
> scsi_reset(SCpnt, SCSI_RESET_SYNCHRONOUS);
> status = REDO;
> break;
> 
> Fix : delete the scsi reset in the kernel code
> case RESERVATION_CONFLICT:
> /* Deleted Chris Roets
> printk("scsi%d, channel %d : RESERVATION CONFLICT
> performing"
>" reset.\n", SCpnt->host->host_no, SCpnt->channel);
> scsi_reset(SCpnt, SCSI_RESET_SYNCHRONOUS);
> status = REDO;
> next four lines added */
> printk("scsi%d (%d,%d,%d) : RESERVATION CONFLICT\n",
>SCpnt->host->host_no, SCpnt->channel,
>SCpnt->device->id, SCpnt->device->lun);
> status = CMD_FINISHED; /* returns I/O error */
> break;
> 
> and rebuild the kernel.
> 
> This should allow the customer to continue
> 
> Questions:
> - why  is this scsi reset done/added as of kernel version 2.2.18
> - as we are talking about an obsolete routine, how is this accomplished 
>  in the new code and how is it activated.  
>


Index: linux/2.2/drivers/scsi/scsi.c
diff -u linux/2.2/drivers/scsi/scsi.c:1.1.1.9 linux/2.2/drivers/scsi/scsi.c:1.1.1.9.2.4
--- linux/2.2/drivers/scsi/scsi.c:1.1.1.9   Thu Feb 15 12:53:35 2001
+++ linux/2.2/drivers/scsi/scsi.c   Fri Mar  2 18:04:40 2001
@@ -198,7 +198,13 @@
  */
 extern void scsi_old_done (Scsi_Cmnd *SCpnt);
 extern void scsi_old_times_out (Scsi_Cmnd * SCpnt);
+extern int scsi_old_reset(Scsi_Cmnd *SCpnt, unsigned int flag);
 
+/* 
+ * private interface into the new error handling code
+ */
+extern int scsi_new_reset(Scsi_Cmnd *SCpnt, unsigned int flag);
+
 #if CONFIG_PROC_FS
 extern int (* dispatch_scsi_info_ptr)(int ino, char *buffer, char **start,
  off_t offset, int length, int inout);
@@ -724,7 +730,7 @@
   SCSI_LOG_SCAN_BUS(3,print_hostbyte(SCpnt->result));
   SCSI_LOG_SCAN_BUS(3,printk("\n"));
 
-  if (SCpnt->result) {
+  if (SCpnt->result && status_byte(SCpnt->result) != RESERVATION_CONFLICT) {
 if (((driver_byte (SCpnt->result) & DRIVER_SENSE) ||
  (status_byte (SCpnt->result) & CHECK_CONDITION)) &&
 ((SCpnt->sense_buffer[0] & 0x70) >> 4) == 7) {
@@ -2180,6 +2186,87 @@
printk("\n");
 }
 
+/* Dummy done routine.  We don't want the bogus command used for the
+ * bus/device reset to find its way into the mid-layer so we intercept
+ * it here */
+static void
+scsi_reset_provider_done_command(Scsi_Cmnd *SCpnt) {
+/* Empty function.  Some low level drivers will call scsi_done
+ * (and en