Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-14 Thread Daniel Egger
On 14.04.2005, at 19:25, Ross Biro wrote:

> Just to be clear, we can have two users A and B with the exact same
> hardware.  A setting of =y will screw user A and a setting of =n will
> screw user B.  Ideally, they would both get better hardware, but that
> is not always an option.

You tell me a better[1] 32-bit GigE PCI adapter than the Intel E1000
and I will surely do this. It's pretty interesting to see that those
who buy some not-so-cheap hardware are being screwed in this
case; it should be in Intel's best interest to help fix this
issue ASAP and permanently for all users.

[1] better performance at less CPU utilization + good diagnostics
and negotiation capabilities

Servus,
  Daniel


Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-14 Thread Dave Jones
On Thu, Apr 14, 2005 at 08:02:02PM +0200, Andi Kleen wrote:
 > > What if it was always on, except when the command line option was passed
 > > (eliminate the CONFIG option)?  Really 'leet hacks could tweak a #define
 > > if they don't like the command line option..
 > 
 > That is basically what I suggested. But test it for a month
 > in -mm* first and figure out if it needs more black/whitelisting

Indeed. I'm in full agreement with Andi's suggestion.

Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-14 Thread Andi Kleen
> What if it was always on, except when the command line option was passed
> (eliminate the CONFIG option)?  Really 'leet hacks could tweak a #define
> if they don't like the command line option..

That is basically what I suggested. But test it for a month
in -mm* first and figure out if it needs more black/whitelisting

-Andi


Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-14 Thread Tim Hockin
On 4/13/05, Dave Jones <[EMAIL PROTECTED]> wrote:

> If we have a situation where we screw a subset of users with the
> config option =y and a different subset with =n, how is this improving
> the situation any over what we have today ?

Dave,

What's a good alternative?  Do we need to keep a whitelist of hardware
that is known to work?  A blacklist is pretty risky, since this is a very
hard problem to find.

What if it was always on, except when the command line option was passed
(eliminate the CONFIG option)?  Really 'leet hacks could tweak a #define
if they don't like the command line option..
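
As a rough illustration of the "always on unless the command line says
otherwise" idea, the following Python sketch models the decision in user
space. It is not kernel code, and the option name `pci_master_abort` and
its accepted values are assumptions made up for this example; the thread
does not name the actual boot parameter.

```python
# Hypothetical option name; the thread does not name the real boot parameter.
OPTION = "pci_master_abort"


def master_abort_enabled(cmdline: str, default: bool = True) -> bool:
    """Return the effective setting: `default` unless the command line overrides it."""
    enabled = default
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key != OPTION:
            continue
        if value in ("off", "0"):
            enabled = False
        elif value in ("on", "1"):
            enabled = True
        # Unrecognized values are ignored and leave the current setting untouched.
    return enabled
```

With no option present the default (on) wins, so only users who hit a
problem would ever need to touch their boot loader configuration.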



Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-14 Thread Ross Biro
On 4/13/05, Dave Jones <[EMAIL PROTECTED]> wrote:

> If we have a situation where we screw a subset of users with the
> config option =y and a different subset with =n, how is this improving
> the situation any over what we have today ?

This is exactly the case, and it is better than what we have today
because it makes it easy to choose =y or =n. Rather than making
things work for subset 1 and screwing subset 2, each distro can choose
which subset to screw by default and make it easy for them to unscrew
themselves.

Just to be clear, we can have two users A and B with the exact same
hardware.  A setting of =y will screw user A and a setting of =n will
screw user B.  Ideally, they would both get better hardware, but that
is not always an option.

Ross


Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-13 Thread Dave Jones
On Wed, Apr 13, 2005 at 07:00:06PM -0400, Ross Biro wrote:

 > > If you take a look at quirks.c and DMI options you will see we have quite
 > > a lot of workarounds for various hardware bugs. Just imagine there were
 > > CONFIG options for all of this. It would be a big mess!
 > 
 >  The config option is for distro maintainers to use to set a policy
 > for their particular distribution.  The boot line option is for end
 > users to adjust it.  Last I heard, most distro makers compile their
 > own kernels and select options appropriately.  I really don't think
 > it's too much to ask an end user to adjust their grub.conf or
 > lilo.conf file to work around a bug in their hardware, especially
 > since there is *no way* to work around the bug in all cases without
 > user intervention.

The thing is, most users won't have a clue about this option,
and that is a good thing. They just want stuff to work, not have
to poke random bits and pieces.

 > As I said before, the quirks routines cannot handle it since there is
 > no way to know what the correct setting is unless you know what
 > application is going to be run and what the user's tolerance to
 > particular problems is.  In a perfect world, master abort mode would
 > always be set to on, but that is not practical in the real world.  If
 > you are suggesting that something in the quirks file stop the boot and
 > ask the user some questions about how they intend to use the system
 > and what their tolerance for certain types of errors is, then I think
 > you are suggesting an even bigger mess.

You don't need to ask the user anything (they won't know the answers anyway).
You already mentioned that E1000s cause this problem, so you have the
basis for the beginning of a blacklist.  A patch to explicitly enable
this feature in -mm for a while will probably shake out most of the
common problematic hardware pretty quickly.
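
A blacklist of the kind described here could be a simple table keyed by
PCI vendor and device ID. The sketch below is a user-space Python model,
not kernel quirk code. The vendor ID 0x8086 is Intel's; the single device
ID listed is only a placeholder for an E1000 part and is not taken from
the thread. The function name is likewise hypothetical.

```python
# (vendor, device) pairs known to misbehave with master abort mode on.
# 0x8086 is Intel's PCI vendor ID; the device ID is an illustrative
# placeholder for an E1000 part, not a list confirmed in the thread.
BLACKLIST = {
    (0x8086, 0x100E),
}


def bridge_may_enable_master_abort(devices_behind_bridge):
    """Return True if no device behind the bridge is on the blacklist."""
    return not any(
        (vendor, device) in BLACKLIST
        for vendor, device in devices_behind_bridge
    )
```

Entries would be added as reports come in from wider testing, which is
the reason for running the feature in -mm first.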

 > Someone creating a distro for enterprise use would most likely compile
 > the kernel with master abort mode enabled to prevent silent data loss.
 >  Someone building the system for desktop use would choose either
 > default or disabled, to prevent spurious error messages, or hardware
 > lock ups. 

So it's OK for enterprise use to spew error msgs and have hardware lockups?
See the problem with setting it either on/off? We need to take
additional factors into consideration, or we're left with something
that's essentially useless.
 
 > If users report problems that look like they are caused by
 > the master abort mode setting, a tech support person could easily ask
 > the end user to add a boot time command line option to see if the
 > problem goes away.  The end user would then have the *option* of
 > adjusting the config file, or just using the boot time option.

A lock-up could be caused by any number of problems, and I'll put money
on even the best support guys not knowing about this option 6 months
after it got merged. Obscure toggles for esoteric features like this
get forgotten about quickly. It's more likely the support bod would
chase down other avenues before ever hitting upon this.

 > I would agree with you if it were not for the fact that the correct
 > setting of this bit is really a judgement call, so it must be simple
 > for anyone who needs to make the call to be able to.  The people
 > building distros will need to be able to change the default setting
 > easily at compile time and the end user needs to be able to change the
 > setting at boot time or run time.

As someone who builds distro kernels I disagree.
End users need things to 'just work'. 99% of end-users don't know, or care
about quirks in their hardware.  If we start expecting the bulk of
them to have to go editing their grub/lilo/etc configs, we've lost.

 > Someone on the PCI mailing list has suggested that it is enough to let
 > the distro maintainer edit the header file and adjust the setting
 > there.  To do so would mean that many distro maintainers would have to
 > maintain an additional patch for very little reason.  Perhaps the
 > correct solution is to keep it as a config option and add a
 > CONFIG_OBSCURE so that most people don't ever see the option, but the few
 > that need to can.

If we have a situation where we screw a subset of users with the
config option =y and a different subset with =n, how is this improving
the situation any over what we have today ?

Dave



Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-13 Thread Ross Biro
On 13 Apr 2005 20:37:25 +0200, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Your argument that no one can make sense of such options is totally off
> > base. Once you are having a problem, it's pretty easy to see if it's related
> 
> I don't think it is in any way helpful to put such highly obscure
> things into Config. Nearly nobody can make any sense of it.
> 
> If you take a look at quirks.c and DMI options you will see we have quite
> a lot of workarounds for various hardware bugs. Just imagine there were
> CONFIG options for all of this. It would be a big mess!

 The config option is for distro maintainers to use to set a policy
for their particular distribution.  The boot line option is for end
users to adjust it.  Last I heard, most distro makers compile their
own kernels and select options appropriately.  I really don't think
it's too much to ask an end user to adjust their grub.conf or
lilo.conf file to work around a bug in their hardware, especially
since there is *no way* to work around the bug in all cases without
user intervention.

As I said before, the quirks routines cannot handle it since there is
no way to know what the correct setting is unless you know what
application is going to be run and what the user's tolerance to
particular problems is.  In a perfect world, master abort mode would
always be set to on, but that is not practical in the real world.  If
you are suggesting that something in the quirks file stop the boot and
ask the user some questions about how they intend to use the system
and what their tolerance for certain types of errors is, then I think
you are suggesting an even bigger mess.

Someone creating a distro for enterprise use would most likely compile
the kernel with master abort mode enabled to prevent silent data loss.
 Someone building the system for desktop use would choose either
default or disabled, to prevent spurious error messages, or hardware
lock ups.  If users report problems that look like they are caused by
the master abort mode setting, a tech support person could easily ask
the end user to add a boot time command line option to see if the
problem goes away.  The end user would then have the *option* of
adjusting the config file, or just using the boot time option.

I would agree with you if it were not for the fact that the correct
setting of this bit is really a judgement call, so it must be simple
for anyone who needs to make the call to be able to.  The people
building distros will need to be able to change the default setting
easily at compile time and the end user needs to be able to change the
setting at boot time or run time.

Someone on the PCI mailing list has suggested that it is enough to let
the distro maintainer edit the header file and adjust the setting
there.  To do so would mean that many distro maintainers would have to
maintain an additional patch for very little reason.  Perhaps the
correct solution is to keep it as a config option and add a
CONFIG_OBSCURE so that most people don't ever see the option, but the few
that need to can.

Ross


Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-13 Thread Andi Kleen
On Tue, Apr 12, 2005 at 10:52:55AM -0400, Ross Biro wrote:
> On Apr 10, 2005 9:29 AM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > 
> > 
> > The right way to do this would be to have sysfs knobs that allow
> > to change these bits, and then let a user space tool change
> > it depending on PCI-ID. If the issue is critical enough
> > that it happens very often then it should be added to kernel
> > pci quirks - but again be unconditional.
> 
> 
> Using user space knobs has advantages, but nothing can depend on just the 
> hardware configuration. The application the machine is being used for also 
> matters. Imagine you have one of the bad NICs and an IDE controller behind the
> same bridge. Then you have to choose between silent data corruption and the
> NIC locking up for up to a few minutes once in a while. The correct choice 
> depends on the application. 
> 
> For the way we use machines, we are better off with a compile time option 
> and no boot line override. That's clearly wrong for general use.

That is definitely wrong for general use. In fact the Linux kernel
has been moving away from the old "put weird workarounds into CONFIG"
for quite some time now. One big reason is that actually most 
users use binary kernels these days, but even for us who recompile
kernels regularly it is inconvenient to recompile kernels just for
such things.

If you want it compiled in for your use case I would recommend
that you add a local patch or add a patch for a compiled-in kernel
command line in config (some non-i386 archs have this already).

> 
> Your argument that no one can make sense of such options is totally off
> base. Once you are having a problem, it's pretty easy to see if it's related

I don't think it is in any way helpful to put such highly obscure
things into Config. Nearly nobody can make any sense of it.

If you take a look at quirks.c and DMI options you will see we have quite a lot
of workarounds for various hardware bugs. Just imagine there were
CONFIG options for all of this. It would be a big mess!

> to a wrong master abort mode setting. If you see data that is all 0xff's
> somewhere it shouldn't be, for example on a hard drive sector (it usually
> occurs in the file system meta data and not in the data itself) you need to
> force master abort mode on. If you have a misbehaving PCI device and,
> every time it misbehaves, the "saw target abort" bit is set, then you need to
> force master abort mode off. First line tech support people should be able
> to tell users to use these settings.

Yeah, but that is impossible if it is a CONFIG - they would need
to explain to the users first how to recompile a kernel, which would
be totally wasted time because it can be set fine without any recompilation
if done properly.
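
The two symptoms quoted above can be turned into a small check. The
Python sketch below is only a model of that rule of thumb. It assumes
that "saw target abort" means the Received Target Abort bit (bit 12) of
the 16-bit PCI status register at config offset 0x06, and that the
caller already holds a copy of a suspect disk sector and of the device's
config space. The function names and the returned strings are made up
for this example.

```python
PCI_STATUS = 0x06                      # offset of the 16-bit status register
PCI_STATUS_REC_TARGET_ABORT = 0x1000   # Received Target Abort (bit 12)


def sector_is_all_ones(sector: bytes) -> bool:
    """True if every byte of a non-empty sector is 0xff."""
    return len(sector) > 0 and all(b == 0xFF for b in sector)


def received_target_abort(config: bytes) -> bool:
    """True if the status register in a config space copy has bit 12 set."""
    status = int.from_bytes(config[PCI_STATUS:PCI_STATUS + 2], "little")
    return bool(status & PCI_STATUS_REC_TARGET_ABORT)


def suggest_setting(sector: bytes, config: bytes) -> str:
    """Map the two symptoms to a suggested master abort mode setting."""
    corrupt = sector_is_all_ones(sector)
    aborted = received_target_abort(config)
    if corrupt and aborted:
        # Both symptoms at once: the thread says the choice then depends
        # on the application, so no setting is suggested here.
        return "conflict"
    if corrupt:
        return "on"
    if aborted:
        return "off"
    return "unchanged"
```

The "conflict" case is the one argued about in this thread: a bad NIC
and an IDE controller behind the same bridge.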

> 
> I actually don't see any reason you would ever want master abort mode off, 
> other than you have buggy hardware. Unfortunately when you are working with 
> PCs you have to assume you always have buggy hardware. I don't have much
> experience with other platforms, so I'll assume they are better (those of 
> you with experience, please do not disillusion me.)

Probably yes. 

What you could do is to put an experimental patch that forces this always
into -mm* for a few weeks and see if there are any bad reports.

-Andi


Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-10 Thread Andi Kleen
Ross Biro <[EMAIL PROTECTED]> writes:
>
> I even have a single motherboard with both a device that cannot handle
> the target abort and an IDE controller that can handle the target
> abort behind the same bridge.  For this motherboard, I have to choose
> the lesser of two evils, network hiccups or potential data corruption.
> For the record, I have seen both occur.  Other people may wish to
> make a different choice than we did, hence this patch allows the user
> to choose the mode at runtime.

I think it is totally wrong to make these Config and boot options.
Nobody can do anything with such obscure boot configurations
and it is bad to require kernel recompiles for such things.

The right way to do this would be to have sysfs knobs that allow
changing these bits, and then let a user space tool change
it depending on PCI-ID. If the issue is critical enough
that it happens very often then it should be added to kernel
pci quirks - but again be unconditional.
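
A minimal sketch of the policy half of such a user space tool, in C. It only decides what the Bridge Control value should be for one bridge, given the PCI IDs of the devices behind it. The table, the struct and the function names are hypothetical, and the rule "set the bit unless a listed device sits behind the bridge" is an assumption for illustration. The only ID listed is the 8086:1076 E1000 variant mentioned in this thread. How the value gets written back (a sysfs knob, config space access) is deliberately left out, since no such interface is defined here.

```c
#include <stddef.h>
#include <stdint.h>

/* Master abort mode bit in the 16-bit Bridge Control register. */
#define BRIDGE_CTL_MASTER_ABORT 0x20

/* Hypothetical table entry: a device that cannot cope with target aborts. */
struct abort_quirk {
    uint16_t vendor;
    uint16_t device;
};

/* Illustrative list only; 8086:1076 is the chip named in this thread. */
static const struct abort_quirk no_target_abort[] = {
    { 0x8086, 0x1076 },
};

/* Return 1 if the device is listed as unable to handle target aborts. */
static int needs_master_abort_off(uint16_t vendor, uint16_t device)
{
    size_t i;

    for (i = 0; i < sizeof(no_target_abort) / sizeof(no_target_abort[0]); i++)
        if (no_target_abort[i].vendor == vendor &&
            no_target_abort[i].device == device)
            return 1;
    return 0;
}

/*
 * Compute the Bridge Control value for one bridge.  If any device behind
 * the bridge is listed, clear the master abort bit; otherwise set it.
 * All other bits of bctl are preserved.
 */
static uint16_t bridge_control_for(uint16_t bctl,
                                   const struct abort_quirk *devs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (needs_master_abort_off(devs[i].vendor, devs[i].device))
            return bctl & (uint16_t)~BRIDGE_CTL_MASTER_ABORT;
    return bctl | BRIDGE_CTL_MASTER_ABORT;
}
```

Keeping the decision in a table keyed on PCI ID means a new problem device is a data change rather than a new boot option, which is the point of the suggestion above.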

-Andi


Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-06 Thread Daniel Egger
On 05.04.2005, at 21:33, Ross Biro wrote:

> The problem with always setting the bit is that some PCI hardware,
> notably some Intel E-1000 chips (Ethernet controller: Intel
> Corporation: Unknown device 1076) cannot properly handle the target
> abort bit.  In the case of the E-1000 chip, the driver must reset the
> chip to recover. This usually leads to the machine being off the
> network for several seconds, or sometimes even minutes, which can be
> bad for servers.

This sounds *exactly* like my problem since I swapped
motherboards. I'll see whether there's some option in
the BIOS that fixes it and if not bite the bullet and
compile a generic kernel.

Thanks a lot for investigating this.

Servus,
  Daniel




Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-06 Thread Ross Biro
Randy.Dunlap wrote:

> Is this related (or could it be -- or should it be) at all to the
> current discussion on the linux-pci mailing list
> ([EMAIL PROTECTED]) about "PCI Error Recovery
> API Proposal" ?

I'm not familiar with the proposal, but this is not related to error
recovery since master aborts are a way of life on the PCI bus and things
just need to deal.  The only question is how.

> > the master.  This can only happen when the system is heavily loaded.
>
> or a PCI device isn't playing nicely?

Yes, but at least in that case you could blame the device.

[ style and grammar comments noted ]

One thing I did fail to mention in my original post is that all of this
could be done by rc scripts from user space, but that seems unclean to me.

Ross


Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-05 Thread Randy.Dunlap
Ross Biro wrote:
> Currently Linux 2.6 assumes the BIOS (or firmware) sets the master abort
> mode flag on PCI bridge chips in a coherent fashion.  This is not always
> the case, and getting this flag wrong can cause hardware failures or
> silent data corruption.  This patch lets the user override the BIOS
> master abort setting at boot time and lets the distro maintainer set a
> default according to their target audience.
>
> The comments in the patch are probably a bit too verbose, but I think it
> is a good patch to start discussions around.  If it is decided that
> something should be done about this problem, this patch could be
> included in a -mm release and migrate into Linus's kernel as appropriate.

The comments were helpful to me.

> This incarnation of the patch has had minimal testing.  For our internal
> kernels, we always force the master abort mode to 1 and then let the
> device drivers for hardware we know can't handle target aborts switch
> the master abort mode to 0. This does not seem appropriate for general
> release.
>
> Some background for those who do not spend most of their waking hours
> exploring buses and what can go wrong.

Is this related (or could it be -- or should it be) at all to the
current discussion on the linux-pci mailing list
([EMAIL PROTECTED]) about "PCI Error Recovery
API Proposal" ?

> The master abort flag tells a PCI bridge what to do when a bus master
> behind the bridge requests the bus and the bridge is unable to get the
> bus.  With the flag clear, for master reads the bridge returns all
> 0xff's (hence silent data corruption) and for master writes, it throws
> the data away.  With the bit set, the bridge sends a target abort to the
> master.  This can only happen when the system is heavily loaded.

or a PCI device isn't playing nicely?

> The problem with always setting the bit is that some PCI hardware,
> notably some Intel E-1000 chips (Ethernet controller: Intel Corporation:
> Unknown device 1076) cannot properly handle the target abort bit.  In
> the case of the E-1000 chip, the driver must reset the chip to recover.
> This usually leads to the machine being off the network for several
> seconds, or sometimes even minutes, which can be bad for servers.
>
> I even have a single motherboard with both a device that cannot handle
> the target abort and an IDE controller that can handle the target abort
> behind the same bridge.  For this motherboard, I have to choose the
> lesser of two evils, network hiccups or potential data corruption.
> For the record, I have seen both occur.  Other people may wish to
> make a different choice than we did, hence this patch allows the user to
> choose the mode at runtime.
>
> Ross
>
> diff -ur linux-2.6.11/drivers/pci/Kconfig linux-2.6.11-new/drivers/pci/Kconfig
> --- linux-2.6.11/drivers/pci/Kconfig	2005-03-01 23:37:51.0 -0800
> +++ linux-2.6.11-new/drivers/pci/Kconfig	2005-04-01 07:19:32.0 -0800
> @@ -47,3 +47,38 @@
>
>  	  When in doubt, say Y.
>
> +choice
> +	prompt "Enable PCI Master Abort Mode"
> +	depends on PCI
> +	default PCI_MASTER_ABORT_DEFAULT
> +	help
> +	  On PCI systems, when a bus is unavailable to a bus master, a
> +	  master abort occurs.  Older bridges satisfy the master request
> +	  with all 0xFF's.  This can lead to silent data corruption.  Newer
> +	  bridges can send a target abort to the bus master.  Some PCI
> +	  hardware cannot handle the target abort.  Some x86 BIOSes configure
> +  the buses in a suboptimal way.  This option allows you to override

    ^^^ extra spaces

> +	  the BIOS setting.  If unsure chose default.  This choice can be

                                       choose

> + overridden at boot time with the pci_enable_master_abort={default,
> + enable, disable}

                      boot option.

> +
> +config PCI_MASTER_ABORT_DEFAULT
> +   bool "Default"
> +   help
> + Choose this option if you are unsure, or believe your
> + firmware does the right thing.
> +
> +config PCI_MASTER_ABORT_ENABLE
> +   bool "Enable"
> +   help
> + Choose this option if it is more important for you to prevent
> + silent data loss than to have more hardware configurations work.

 ??

> +
> +
> +config PCI_MASTER_ABORT_DISABLE
> +   bool "Disable"
> +   help
> + Choose this option if it is more important for you to have more

The phrase "have more hardware configurations work" needs something.
Maybe add something like:  "Some devices are known not to work with
PCI Master Aborts.  If you have one of these devices, you probably
want to Disable this option."

> +	  hardware configurations work than to prevent silent data loss.
> +
> +endchoice
> diff -ur linux-2.6.11/drivers/pci/probe.c linux-2.6.11-new/drivers/pci/probe.c
> --- linux-2.6.11/drivers/pci/probe.c	2005-03-01 23:38:13.0 -0800
> +++

[RFC/Patch 2.6.11] Take control of PCI Master Abort Mode

2005-04-05 Thread Ross Biro
Currently Linux 2.6 assumes the BIOS (or firmware) sets the master abort
mode flag on PCI bridge chips in a coherent fashion.  This is not always
the case, and getting this flag wrong can cause hardware failures or
silent data corruption.  This patch lets the user override the BIOS
master abort setting at boot time and lets the distro maintainer set a
default according to their target audience.

The comments in the patch are probably a bit too verbose, but I think it 
is a good patch to start discussions around.  If it is decided that 
something should be done about this problem, this patch could be 
included in a -mm release and migrate into Linus's kernel as appropriate.

This incarnation of the patch has had minimal testing.  For our internal 
kernels, we always force the master abort mode to 1 and then let the 
device drivers for hardware we know can't handle target aborts switch 
the master abort mode to 0. This does not seem appropriate for general 
release.

Some background for those who do not spend most of their waking hours 
exploring buses and what can go wrong.

The master abort flag tells a PCI bridge what to do when a bus master 
behind the bridge requests the bus and the bridge is unable to get the 
bus.  With the flag clear, for master reads the bridge returns all 
0xff's (hence silent data corruption) and for master writes, it throws 
the data away.  With the bit set, the bridge sends a target abort to the 
master.  This can only happen when the system is heavily loaded.
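
The two behaviours can be modelled in a few lines of C. This is a toy model of the description above, not kernel code: the enum and the function name are made up for illustration, and only the bit value (0x20 in the Bridge Control register) is the real hardware definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Master abort mode bit in the 16-bit Bridge Control register. */
#define BRIDGE_CTL_MASTER_ABORT 0x20

/* Hypothetical outcome of a master read the bridge could not complete. */
enum bridge_result {
    BRIDGE_FAKE_DATA,     /* read "succeeds" with 0xff filler */
    BRIDGE_TARGET_ABORT   /* master is told the read failed */
};

/*
 * Model a master read that the bridge cannot forward.
 * Flag clear: the buffer is filled with 0xff and the master never
 * learns anything went wrong (the silent corruption case).
 * Flag set: the buffer is left alone and a target abort is signalled.
 */
static enum bridge_result failed_master_read(uint16_t bctl,
                                             uint8_t *buf, size_t len)
{
    if (bctl & BRIDGE_CTL_MASTER_ABORT)
        return BRIDGE_TARGET_ABORT;

    memset(buf, 0xff, len);
    return BRIDGE_FAKE_DATA;
}
```

The write case is the mirror image: with the flag clear the data is dropped and nothing is reported, with the flag set the master again sees a target abort.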

The problem with always setting the bit is that some PCI hardware, 
notably some Intel E-1000 chips (Ethernet controller: Intel Corporation: 
Unknown device 1076) cannot properly handle the target abort bit.  In 
the case of the E-1000 chip, the driver must reset the chip to recover. 
This usually leads to the machine being off the network for several 
seconds, or sometimes even minutes, which can be bad for servers.

I even have a single motherboard with both a device that cannot handle 
the target abort and an IDE controller that can handle the target abort 
behind the same bridge.  For this motherboard, I have to choose the 
lesser of two evils, network hiccups or potential data corruption.
For the record, I have seen both occur.  Other people may wish to
make a different choice than we did, hence this patch allows the user to 
choose the mode at runtime.
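
As a rough sketch of what the three settings mean for a bridge, here is stand-alone C that maps an option string to a mode and applies it to a Bridge Control value. The enum and function names are hypothetical and are not taken from the patch. The strings "default", "enable" and "disable" are the ones the Kconfig help text lists for pci_enable_master_abort, and treating anything unrecognised as "default" is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Master abort mode bit in the 16-bit Bridge Control register. */
#define BRIDGE_CTL_MASTER_ABORT 0x20

/* Hypothetical names for the three settings. */
enum master_abort_mode {
    MODE_DEFAULT,   /* leave whatever the BIOS programmed */
    MODE_ENABLE,    /* set the bit on every bridge */
    MODE_DISABLE    /* clear the bit on every bridge */
};

/* Map an option string to a mode; unknown strings fall back to default. */
static enum master_abort_mode parse_master_abort(const char *arg)
{
    if (strcmp(arg, "enable") == 0)
        return MODE_ENABLE;
    if (strcmp(arg, "disable") == 0)
        return MODE_DISABLE;
    return MODE_DEFAULT;
}

/* Return the Bridge Control value after applying the chosen mode. */
static uint16_t apply_master_abort(uint16_t bctl, enum master_abort_mode mode)
{
    switch (mode) {
    case MODE_ENABLE:
        return bctl | BRIDGE_CTL_MASTER_ABORT;
    case MODE_DISABLE:
        return bctl & (uint16_t)~BRIDGE_CTL_MASTER_ABORT;
    default:
        return bctl;   /* trust the firmware */
    }
}
```

Note that the setting is global: every bridge gets the same treatment, which is exactly why the mixed motherboard described above has no good answer.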

Ross



diff -ur linux-2.6.11/drivers/pci/Kconfig linux-2.6.11-new/drivers/pci/Kconfig
--- linux-2.6.11/drivers/pci/Kconfig	2005-03-01 23:37:51.0 -0800
+++ linux-2.6.11-new/drivers/pci/Kconfig	2005-04-01 07:19:32.0 -0800
@@ -47,3 +47,38 @@
 
  When in doubt, say Y.
 
+choice
+   prompt "Enable PCI Master Abort Mode"
+   depends on PCI
+   default PCI_MASTER_ABORT_DEFAULT
+   help
+ On PCI systems, when a bus is unavailable to a bus master, a 
+ master abort occurs.  Older bridges satisfy the master request
+ with all 0xFF's.  This can lead to silent data corruption.  Newer
+ bridges can send a target abort to the bus master.  Some PCI
+ hardware cannot handle the target abort.  Some x86 BIOSes configure
+  the buses in a suboptimal way.  This option allows you to override
+ the BIOS setting.  If unsure chose default.  This choice can be
+ overridden at boot time with the pci_enable_master_abort={default,
+ enable, disable}
+
+config PCI_MASTER_ABORT_DEFAULT
+   bool "Default"
+   help
+ Choose this option if you are unsure, or believe your
+ firmware does the right thing.
+
+config PCI_MASTER_ABORT_ENABLE
+   bool "Enable"
+   help
+ Choose this option if it is more important for you to prevent
+ silent data loss than to have more hardware configurations work.
+
+
+config PCI_MASTER_ABORT_DISABLE
+   bool "Disable"
+   help
+ Choose this option if it is more important for you to have more
+ hardware configurations work than to prevent silent data loss.
+
+endchoice
diff -ur linux-2.6.11/drivers/pci/probe.c linux-2.6.11-new/drivers/pci/probe.c
--- linux-2.6.11/drivers/pci/probe.c	2005-03-01 23:38:13.0 -0800
+++ linux-2.6.11-new/drivers/pci/probe.c	2005-04-05 12:07:53.0 -0700
@@ -28,6 +28,15 @@
 
 LIST_HEAD(pci_devices);
 
+/* used to force master abort mode on or off at runtime.
+   PCI_MASTER_ABORT_DEFAULT means leave alone, the BIOS got it correct.
+   PCI_MASTER_ABORT_ENABLE means turn it on everywhere.
+   PCI_MASTER_ABORT_DISABLE means turn it off everywhere.
+*/
+
+static int pci_enable_master_abort=PCI_MASTER_ABORT_VAL;
+
+
 #ifdef HAVE_PCI_LEGACY
 /**
  * pci_create_legacy_files - create legacy I/O port and memory files
@@ -429,6 +438,20 @@
pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
  bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
 
+   /* Some BIOSes disable master 
