Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-12-27 Thread Alex_Gagniuc
On 11/8/18 2:09 PM, Bjorn Helgaas wrote:
> 
> [EXTERNAL EMAIL]
> Please report any suspicious attachments, links, or requests for sensitive 
> information.
> 
> 
> [+cc Jonathan, Greg, Lukas, Russell, Sam, Oliver for discussion about
> PCI error recovery in general]
> 
> On Wed, Nov 07, 2018 at 05:42:57PM -0600, Bjorn Helgaas wrote:
>> On Tue, Sep 18, 2018 at 05:15:00PM -0500, Alexandru Gagniuc wrote:
>>> When a PCI device is gone, we don't want to send IO to it if we can
>>> avoid it. We expose functionality via the irq_chip structure. As
>>> users of that structure may not know about the underlying PCI device,
>>> it's our responsibility to guard against removed devices.
>>>
>>> .irq_write_msi_msg() is already guarded inside __pci_write_msi_msg().
>>> .irq_mask/unmask() are not. Guard them for completeness.
>>>
>>> For example, surprise removal of a PCIe device triggers teardown. This
>>> touches the irq_chips ops some point to disable the interrupts. I/O
>>> generated here can crash the system on firmware-first machines.
>>> Not triggering the IO in the first place greatly reduces the
>>> possibility of the problem occurring.
>>>
>>> Signed-off-by: Alexandru Gagniuc 
>>
>> Applied to pci/misc for v4.21, thanks!
> 
> I'm having second thoughts about this. 

Do we have a verdict on this? If you don't like this approach, then I'll 
have to fix the problem in some other way, but the problem still needs 
to be fixed.

Alex


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-27 Thread Alex_Gagniuc
On 11/20/2018 04:08 PM, Sinan Kaya wrote:
> I followed the ASWG thread yesterday. There will be a meeting next week to
> discuss this.

Any updates on the meeting?



Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-20 Thread Alex_Gagniuc
On 11/20/2018 04:08 PM, Sinan Kaya wrote:
> One version is:
> "if HEST table is present, ignore _OSC"
> or
> Another version is:
> "if HEST table is present, make sure that FW sets _OSC bit for AER to
> false. Otherwise, warn like crazy that this BIOS is broken and needs
> an update and can cause all sorts of trouble"

ACPI 6.2 Ch 6.2.11.3, Table 6-197
PCI Express Advanced Error Reporting control

The firmware sets this bit to 1 to grant control over PCI
Express Advanced Error Reporting.
If firmware allows the OS control of this feature, then in the
context of the _OSC method it must ensure that error messages
are routed to device interrupts as described in the PCI
Express Base Specification. Additionally, after control is
transferred to the OS, firmware must not modify the Advanced
Error Reporting Capability. If control of this feature was
requested and denied or was not requested, firmware returns
this bit set to 0.


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-20 Thread Alex_Gagniuc
On 11/20/2018 03:02 PM, Sinan Kaya wrote:
> On 11/20/2018 3:44 PM, alex_gagn...@dellteam.com wrote:
>> I'd prefer "sure" instead of "think". "I think it breaks some system I'm
>> not telling you about" doesn't help much in figuring out how not to
>> break said system(s).:)
> 
> Sorry, I thought I mentioned why it would break but let me repeat.

Why, yes, but bets are still being placed on the systems allegedly 
suffering from this.


> The systems I have seen rely on the HEST table presence as an indicator
> to the OS that firmware first is enabled. If you go look at the _OSC bits
> on such systems, it still says OS owns the AER service.
> 
> The assumption here is that HEST table has precedence over the _OSC bits.
> That's what needs to be clarified in the UEFI forum.
> 
> If this code is to go in and ignore the HEST table presence, then firmware
> will think that it owns AER service and OS will think that it owns AER
> service too.

So this seems like exactly the scenario we were hypothesizing.

  * System boots up with FFS enabled. Everything is fine so far.
  * OSPM requests control of AER (set bit 3 in _OSC)
  * FW grants OS control of AER (set bit 3 in _OSC reply)

That's how things are designed to work.


Now, let's assume, for the sake of argument, that the firmware on those 
system's is broken, and it didn't intend to give the OS control of AER. 
OSPM checking HEST instead of _OSC is still wrong, according to the 
spec. Two wrongs don't make a right, they just don't crash.

I think the correct way is to identify those broken systems, and add 
quirks for them. Continuing to have inconsistent and over-complicated 
logic that is not spec compliant is not any better.

Alex


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-20 Thread Alex_Gagniuc
On 11/19/2018 07:54 PM, Sinan Kaya wrote:
> On 11/19/2018 6:49 PM, alex_gagn...@dellteam.com wrote:
>> On 11/19/2018 02:33 PM, Sinan Kaya wrote:
>>> However; table assumes governance about for which entities firmware first
>>> should be enabled. There is no cross reference to _OSC or permission
>>> negotiation like _OST.
>>
>> Well, from an OSPM perspective, is FFS something that can be enabled or
>> disabled? FFS seems to be static to OSPM, which would change the sort of
>> assumptions we can reasonably make here.
> 
> IMO, it can be enabled/disabled in BIOS. I have seen this implementation 
> before.
> If the trigger is the presence of a statically compiled ACPI HEST table (as 
> the
> current code does); presence of FFS would be static from OSPM perspective.
> BIOS could patch this table or hide it during boot.
> 
> If FFS were to be negotiated via _OSC as indirectly implied in this series, 
> then
> same BIOS could patch the ACPI table to return different values for the _OSC
> return.

It is theoretically possible to have proprietary BIOS settings to 
disable FFS. The platform vendors that I've spoken to do not offer this 
option. Though even if, hypothetically, BIOS clears the FFS bit in HEST, 
it won't stop it from commandeering the CPU and doing whatever it wants.

Although, I'm not quite sure why we'd want to negotiate FFS itself. FFS 
is too big of a can of worms (goes far beyond AER error reporting), when 
what we really care about is if OS can use a specific feature or not.
>> Cool. While the UEFI Secret Society debates, can we figure out if/how
>> [patch 1/2] breaks those systems, or is it only [patch 2/2] of this
>> series that we suspect?
> 
> I went back and looked at both patches. Both of them are removing references 
> to
> HEST table. I think both patches are impacted by this discussion.

I'd prefer "sure" instead of "think". "I think it breaks some system I'm 
not telling you about" doesn't help much in figuring out how not to 
break said system(s). :)

Alex


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Alex_Gagniuc
On 11/19/2018 02:33 PM, Sinan Kaya wrote:
> True. I was trying to get it out in a rush. I omitted words.

Sounds like you'd make an top notch spec writer! :p

> However; table assumes governance about for which entities firmware first
> should be enabled. There is no cross reference to _OSC or permission
> negotiation like _OST.

Well, from an OSPM perspective, is FFS something that can be enabled or 
disabled? FFS seems to be static to OSPM, which would change the sort of 
assumptions we can reasonably make here.


>>> As I said in my previous email, the right place to talk about this is UEFI
>>> forum.
>>
>> The way I would present the problem to he spec writers is that, although
>> the spec appears to be consistent, we've seen firmware vendors that made
>> the wrong assumptions about HEST/_OSC. Instead of describing AER
>> ownership with _OSC, they attempted to do it with HEST. So we should add
>> an implementation note, or clarification about this.
> 
> I agree.

Cool. While the UEFI Secret Society debates, can we figure out if/how 
[patch 1/2] breaks those systems, or is it only [patch 2/2] of this 
series that we suspect?

Alex


Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-19 Thread Alex_Gagniuc
On 11/19/2018 01:32 PM, Sinan Kaya wrote:
> ACPI 6.2:
> 
> 18.3.2.4 PCI Express Root Port AER Structure
> 
> Flags:
> 
> Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system
> firmware will handle errors from this source first.
> Bit [1] - GLOBAL: If set, indicates that the settings contained in this
> structure apply globally to all PCI Express Devices.
> All other bits must be set to zero.
> 
> It doesn't say shall, may or might. It says will.

It says "system firmware will handle errors". It does not say "system 
firmware owns AER registers". In absence on any descriptor text on the 
meaning of these tables, this really looks to me like it should be 
interpreted as a descriptor of APEI error sources, not a mutex on who 
writes to certain bits-- AER in this case.

I don't think that is contradictory or inconsistent.
I also wasn't able to find any reference to HEST in UEFI 2.7, only in 
ACPI spec.

> I think It depends on your PCI topology.
> 
> For other topologies with multiple PCI root complexes, I can see this being
> used per root complex flag to indicate which root complex needs firmware first
> and which one doesn't.

_OSC is per root bus, so it's already granular enough, right? Why would 
it depend on PCI topology?


>> I'd like see how exactly we break one of those elusive systems with _OSC. I
>> suspect _OSC and HEST end up having the same information, and that's why we
>> didn't see any real-life issue with mixing the approaches.
> 
> I'm already aware of two systems that rely on HEST table to pass information 
> to
> the OS that firmware first is enabled. Both of the systems do not change their
> _OSC bits during this assuming HEST table has priority over _OSC for firmware
> first.

Are those hax86 systems?
It seems like the systems have broken firmware. I see several ways to 
handle broken systems like those:
  - Parse both HEST and _OSC, and decide AER ownership with root bridge 
granularity. i.e. host_bridge->native_aer is authoritative, but is 
derived from both HEST and _OSC
  - Add quirks for the broken systems
  - Keep doing what we're doing until current code breaks a new system

> If we add this patch, OS will try to claim the AER address space while 
> firmware
> wants exclusive access.

Yay! FFS wants exclusive access, but does not claim it. Oh, FFS!


> As I said in my previous email, the right place to talk about this is UEFI
> forum.

The way I would present the problem to he spec writers is that, although 
the spec appears to be consistent, we've seen firmware vendors that made 
the wrong assumptions about HEST/_OSC. Instead of describing AER 
ownership with _OSC, they attempted to do it with HEST. So we should add 
an implementation note, or clarification about this.

Alex


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-15 Thread Alex_Gagniuc
+ Borislav (ACPI guy, maybe he can answer more about HEST)

On 11/15/2018 12:24 AM, Bjorn Helgaas wrote:
> On Wed, Nov 14, 2018 at 07:22:04PM +, alex_gagn...@dellteam.com wrote:
>> On 11/14/2018 12:00 AM, Bjorn Helgaas wrote:
>>> On Tue, Nov 13, 2018 at 10:39:15PM +, alex_gagn...@dellteam.com wrote:
>> ACPI 6.2 - 6.2.11.3, Table 6-197:
>>
>> PCI Express Advanced Error Reporting control:
>>* The firmware sets this bit to 1 to grant control over PCI Express
>> Advanced Error Reporting. If firmware allows the OS control of this
>> feature, then in the context of the _OSC method it must ensure that
>> error messages are routed to device interrupts as described in the PCI
>> Express Base Specification[...]
> 
> The PCIe Base Spec is pretty big, so I wish this reference were a
> little more explicit.  I *guess* maybe it's referring to PCIe r4.0,
> figure 6-3 in sec 6.2.6, where PCIe ERR_* Messages can be routed to
> "INTx or MSI Error Interrupts" and/or "platform-specific System Error"
> interrupts.

It's engineering slang for "I don't know what the spec says, but do what 
the spec says." What it means is that FW should set up HW in such a way, 
that PCIe-compliant OS which enables interrupts will get interrupts the 
way it expects. For example, some FW might set the root port to trigger 
an SMI instead of firing up the proper MSI vector. In this case FW must 
ensure that AER interrupts fire up the MSI vector instead of triggering SMI.

> "Device interrupts" seems like it refers to the "INTx or MSI"
> interrupts, not the platform-specific System Errors, so I would read
> that as saying "if firmware grants OS control of AER via _OSC,
> firmware must set the AER Reporting Enables in the AER Root Error
> Command register."  But that seems a little silly because the OS now
> *owns* the AER capability and it can set the AER Root Error Command
> register itself if it wants to.

When OS takes control of AER, it is responsible for setting things up, 
at least as far as standard PCIe registers are concerned. So setting up 
the the root command register would still be the OS's responsibility. 
Proprietary registers, like routing to SMI/SCI/MSI would have to be done 
by FW.


> And I still don't see the connection here with Firmware-First.  I'm
> pretty sure firmware could not be notified via INTx or MSI interrupts
> because those are totally managed by OSPM.

The connection with FFS is that FFS needs to be notified by some other 
method. FW can't commandeer interrupt vectors willie-nillie. FW gets 
notified by platform-specific mechanisms. In my case, it happens to be 
SMM, but it could be a number of other proprietary mechanisms.


>>> The closest I can find is the "Enabled" field in the HEST PCIe
>>> AER structures (ACPI v6.2, sec 18.3.2.4, .5, .6), where it says:
>>>[...] 
>>> AFAICT, Linux completely ignores the Enabled field in these
>>> structures.
>>
>> I don't think ignoring the field is a problem:
>>* With FFS, OS should ignore it.
>>* Without FFS, we have control, and we get to make the decisions anyway.
>> In the latter case we decide whether to use AER, independent of the crap
>> in ACPI. I'm not even sure why "Enabled" matters in native AER handling.
> 
> It seems like these HEST structures are "here's how firmware thinks
> you should set up AER on this device".  But I agree, I have no idea
> how to interpret "Enabled".  The rest of the HEST fields cover all the
> useful AER registers, including the Reporting Enables in the AER Root
> Error Command register *and* the Error Reporting Enables in the Device
> Control register.  So I don't know what the "Enabled" field adds to
> all that.  

"This table contains information platform firmware supplies to OSPM for 
configuring AER support on a given PCI Express device."

I don't fully get the point of the HEST structures for PCIe. I'm hoping 
Borislav can shed enlightment on this. I think we can ignore them in PCI 
core, with a note to revisit them if something ever goes horribly wrong. 
I've patched some changes to reduce reliance on HEST [1].

> What a mess.

It's ACPI. What else did you expect?

>>> For firmware-first to work, firmware has to get control.  How does
>>> it get control?  How does OSPM know to either set up that
>>> mechanism or keep its mitts off something firmware set up before
>>> handoff?
>>
>> My understanding is that, if FW keeps control of AER in _OSC, then
>> it will have set things up to get notified instead of the OS. OSPM
>> not touching AER bits is to make sure it doesn't mess up FW's setup.
>> I think there are some proprietary bits in the root port to route
>> interrupts to SMIs instead of the AER vectors.
> 
> It makes good sense that if OSPM doesn't have AER control, firmware
> does all AER handling, including any setup for firmware-first
> notification.  If we can assume that firmware-first notification is
> done in some way the OS doesn't know about and can't mess up, that
> would be awesome.

I think that's a 

Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-14 Thread Alex_Gagniuc
On 11/14/2018 02:27 PM, Keith Busch wrote:
> On Wed, Nov 14, 2018 at 07:22:04PM +, alex_gagn...@dellteam.com wrote:
>> On 11/14/2018 12:00 AM, Bjorn Helgaas wrote:
>>> Just to make sure we're on the same page, can you point me to this
>>> rule?  I do see that OSPM must request control of AER using _OSC
>>> before it touches the AER registers.  What I don't see is the
>>> connection between firmware-first and the AER registers.
>>
>> ACPI 6.2 - 6.2.11.3, Table 6-197:
>>[...]
>> Maybe Keith knows better why we're doing it this way. From ACPI text, it
>> doesn't seem that control of AER would be tied to HEST entries, although
>> in practice, it is.
> 
> I'm not sure, that predates me.  HEST does have a FIRMWARE_FIRST flag, but
> spec does not say anymore on relation to _OSC control or AER capability.
> Nothing in PCIe spec either.

Speaking to one of the PCIe (and _HPX type 3) spec authors, ownership of 
AER should be determined by _OSC. period. The result of _OSC applies to 
every device under the root port. This crap we do with checking HEST is 
crap.

If I'm not stepping on anyone toes, and there's no known unintended 
consequences, I can look at patching this up. I'm not promising a patch, 
though, but it's exactly the sort of thing I like to fix.

> I also don't know why Linux disables the AER driver if only one
> device has a FIRMWARE_FIRST HEST. Shouldn't that just be a per-device
> decision?

I think the logic is if one HEST entry has both FFS and GLOBAL flags 
set, then then disable AER services for all devices. It works in 
practice better than it works in theory. I think _OSC should be the 
determining factor here, not HEST.

>>> The closest I can find is the "Enabled" field in the HEST PCIe
>>> AER structures (ACPI v6.2, sec 18.3.2.4, .5, .6), where it says:
>>> [...]
>>> AFAICT, Linux completely ignores the Enabled field in these
>>> structures.
>>
>> I don't think ignoring the field is a problem:
>>* With FFS, OS should ignore it.
>>* Without FFS, we have control, and we get to make the decisions anyway.
>> In the latter case we decide whether to use AER, independent of the crap
>> in ACPI. I'm not even sure why "Enabled" matters in native AER handling.
>> Probably one of the check-boxes in "Binary table designer's handbook"?
> 
> And why doesn't Linux do anything with _OSC response other than logging
> it? If OS control wasn't granted, shouldn't that take priority over HEST?

But it does in portdrv_core.c:

if (dev->aer_cap && pci_aer_available() &&
(pcie_ports_native || host->native_aer)) {
services |= PCIE_PORT_SERVICE_AER;

That flag later creates a pcie device that allows aerdrv to attach to.

Alex


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-14 Thread Alex_Gagniuc
On 11/14/2018 12:00 AM, Bjorn Helgaas wrote:
> On Tue, Nov 13, 2018 at 10:39:15PM +, alex_gagn...@dellteam.com wrote:
>> On 11/12/2018 11:02 PM, Bjorn Helgaas wrote:
>>>
>>> [EXTERNAL EMAIL]
>>> Please report any suspicious attachments, links, or requests for sensitive 
>>> information.
> 
> It looks like Dell's email system adds the above in such a way that the
> email quoting convention suggests that *I* wrote it, when I did not.

I was wondering why you thought I was suspicious. It's a recent 
(server-side) change. You used to be able to disable these sort of 
notices. I'm told back in the day people were asked to delete emails 
before reading them.

>> ...
>>> Do you think Linux observes the rule about not touching AER bits on
>>> FFS?  I'm not sure it does.  I'm not even sure what section of the
>>> spec is relevant.
>>
>> I haven't found any place where linux breaks this rule. I'm very
>> confident that, unless otherwise instructed, we follow this rule.
> 
> Just to make sure we're on the same page, can you point me to this
> rule?  I do see that OSPM must request control of AER using _OSC
> before it touches the AER registers.  What I don't see is the
> connection between firmware-first and the AER registers.

ACPI 6.2 - 6.2.11.3, Table 6-197:

PCI Express Advanced Error Reporting control:
  * The firmware sets this bit to 1 to grant control over PCI Express 
Advanced Error Reporting. If firmware allows the OS control of this 
feature, then in the context of the _OSC method it must ensure that 
error messages are routed to device interrupts as described in the PCI 
Express Base Specification[...]

Now I'm confused too:
  * HEST -> __aer_firmware_first
This is used for touching/not touching AER bits
  * _OSC -> bridge->native_aer
Used to enable/not enable AER portdrv service
Maybe Keith knows better why we're doing it this way. From ACPI text, it 
doesn't seem that control of AER would be tied to HEST entries, although 
in practice, it is.

> The closest I can find is the "Enabled" field in the HEST PCIe
> AER structures (ACPI v6.2, sec 18.3.2.4, .5, .6), where it says:
> 
>If the field value is 1, indicates this error source is
>to be enabled.
> 
>If the field value is 0, indicates that the error source
>is not to be enabled.
> 
>If FIRMWARE_FIRST is set in the flags field, the Enabled
>field is ignored by the OSPM.
> 
> AFAICT, Linux completely ignores the Enabled field in these
> structures.

I don't think ignoring the field is a problem:
  * With FFS, OS should ignore it.
  * Without FFS, we have control, and we get to make the decisions anyway.
In the latter case we decide whether to use AER, independent of the crap 
in ACPI. I'm not even sure why "Enabled" matters in native AER handling. 
Probably one of the check-boxes in "Binary table designer's handbook"?

> These structures also contain values the OS is apparently supposed to
> write to Device Control and several AER registers (in struct
> acpi_hest_aer_common).  Linux ignores these as well.
> 
> These seem like fairly serious omissions in Linux.

I think HPX carries the same sort of information (except for Root 
Command reg). FW is supposed to program those registers anyway, so even 
if OS doesn't touch them, I'd expect things to just work.

>>> The whole issue of firmware-first, the mechanism by which firmware
>>> gets control, the System Error enables in Root Port Root Control
>>> registers, etc., is very murky to me.  Jon has a sort of similar issue
>>> with VMD where he needs to leave System Errors enabled instead of
>>> disabling them as we currently do.
>>
>> Well, OS gets control via _OSC method, and based on that it should
>> touch/not touch the AER bits.
> 
> I agree so far.
> 
>> The bits that get set/cleared come from _HPX method,
> 
> _HPX tells us about some AER registers, Device Control, Link Control,
> and some bridge registers.  It doesn't say anything about the Root
> Control register that Jon is concerned with.

_HPX type 3 (yay!!!) got approved recently, and that will have more 
fine-grained control. It will be able to handle root control reg.

> For firmware-first to work, firmware has to get control.  How does it
> get control?  How does OSPM know to either set up that mechanism or
> keep its mitts off something firmware set up before handoff?

My understanding is that, if FW keeps control of AER in _OSC, then it 
will have set things up to get notified instead of the OS. OSPM not 
touching AER bits is to make sure it doesn't mess up FW's setup. I think 
there are some proprietary bits in the root port to route interrupts to 
SMIs instead of the AER vectors.

> In Jon's
> VMD case, I think firmware-first relies on the System Error controlled
> by the Root Control register.  Linux thinks it owns that, and I don't
> know how to learn otherwise.

Didn't Keith say the root port is not visible to the OS?

Alex


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-13 Thread Alex_Gagniuc
On 11/13/2018 04:56 PM, Keith Busch wrote:
> On Tue, Nov 13, 2018 at 10:39:15PM +, alex_gagn...@dellteam.com wrote:
>> On 11/12/2018 11:02 PM, Bjorn Helgaas wrote:
>>> The whole issue of firmware-first, the mechanism by which firmware
>>> gets control, the System Error enables in Root Port Root Control
>>> registers, etc., is very murky to me.  Jon has a sort of similar issue
>>> with VMD where he needs to leave System Errors enabled instead of
>>> disabling them as we currently do.
>>
>> Well, OS gets control via _OSC method, and based on that it should
>> touch/not touch the AER bits. The bits that get set/cleared come from
>> _HPX method, and there's a more about the FFS described in ACPI spec. It
>> seems that if platform, wants to enable VMD, it should pass the correct
>> bits via _HPX. I'm curious to know in what new twisted way FFS doesn't
>> work as intended.
> 
> When VMD is enabled, the platform sees a VMD endpoint. It doesn't see
> any of the root ports on that domain, so ACPI can't provide policies for
> them nor AER registers for the platform to consider controlling.

I'm not understanding the interdependency between RP AER settings and 
VMD. My understanding of VMD is quite rudimentary though, so I'll take 
your word for it.

Alex



Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-13 Thread Alex_Gagniuc
On 11/12/2018 11:02 PM, Bjorn Helgaas wrote:
> 
> [EXTERNAL EMAIL]
> Please report any suspicious attachments, links, or requests for sensitive 
> information.
> 
> 
> [+cc Jon, for related VMD firmware-first error enable issue]
> 
> On Mon, Nov 12, 2018 at 08:05:41PM +, alex_gagn...@dellteam.com wrote:
>> On 11/11/2018 11:50 PM, Oliver O'Halloran wrote:
>>> On Thu, 2018-11-08 at 23:06 +, alex_gagn...@dellteam.com wrote:
> 
 But it's not the firmware that crashes. It's linux as a result of a
 fatal error message from the firmware. And we can't fix that because FFS
 handling requires that the system reboots [1].
>>>
>>> Do we know the exact circumsances that result in firmware requesting a
>>> reboot? If it happen on any PCIe error I don't see what we can do to
>>> prevent that beyond masking UEs entirely (are we even allowed to do
>>> that on FFS systems?).
>>
>> Pull a drive out at an angle, push two drives in at the same time, pull
>> out a drive really slow. If an error is even reported to the OS depends
>> on PD state, and proprietary mechanisms and logic in the HW and FW. OS
>> is not supposed to mask errors (touch AER bits) on FFS.
> 
> PD?

Presence Detect

> Do you think Linux observes the rule about not touching AER bits on
> FFS?  I'm not sure it does.  I'm not even sure what section of the
> spec is relevant.

I haven't found any place where linux breaks this rule. I'm very 
confident that, unless otherwise instructed, we follow this rule.

> The whole issue of firmware-first, the mechanism by which firmware
> gets control, the System Error enables in Root Port Root Control
> registers, etc., is very murky to me.  Jon has a sort of similar issue
> with VMD where he needs to leave System Errors enabled instead of
> disabling them as we currently do.

Well, OS gets control via _OSC method, and based on that it should 
touch/not touch the AER bits. The bits that get set/cleared come from 
_HPX method, and there's a more about the FFS described in ACPI spec. It 
seems that if platform, wants to enable VMD, it should pass the correct 
bits via _HPX. I'm curious to know in what new twisted way FFS doesn't 
work as intended.

Alex

> Bjorn
> 
> [1] 
> https://lore.kernel.org/linux-pci/20181029210651.gb13...@bhelgaas-glaptop.roam.corp.google.com
> 



Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-12 Thread Alex_Gagniuc
On 11/11/2018 11:50 PM, Oliver O'Halloran wrote:
> 
> [EXTERNAL EMAIL]
> Please report any suspicious attachments, links, or requests for sensitive 
> information.
> 
> 
> On Thu, 2018-11-08 at 23:06 +, alex_gagn...@dellteam.com wrote:
>> On 11/08/2018 04:51 PM, Greg KH wrote:
>>> On Thu, Nov 08, 2018 at 10:49:08PM +, alex_gagn...@dellteam.com wrote:
 In the case that we're trying to fix, this code executing is a result of
 the device being gone, so we can guarantee race-free operation. I agree
 that there is a race, in the general case. As far as checking the result
 for all F's, that's not an option when firmware crashes the system as a
 result of the mmio read/write. It's never pretty when firmware gets
 involved.
>>>
>>> If you have firmware that crashes the system when you try to read from a
>>> PCI device that was hot-removed, that is broken firmware and needs to be
>>> fixed.  The kernel can not work around that as again, you will never win
>>> that race.
>>
>> But it's not the firmware that crashes. It's linux as a result of a
>> fatal error message from the firmware. And we can't fix that because FFS
>> handling requires that the system reboots [1].
> 
> Do we know the exact circumsances that result in firmware requesting a
> reboot? If it happen on any PCIe error I don't see what we can do to
> prevent that beyond masking UEs entirely (are we even allowed to do
> that on FFS systems?).

Pull a drive out at an angle, push two drives in at the same time, pull 
out a drive really slow. If an error is even reported to the OS depends 
on PD state, and proprietary mechanisms and logic in the HW and FW. OS 
is not supposed to mask errors (touch AER bits) on FFS.

Sadly, with FFS, behavior can and does change from BIOS version to BIOS 
version. On one product, for example, we eliminated a lot of crashes by 
simply not reporting some classes of PCIe errors to the OS.

Alex

>> If we're going to say that we don't want to support FFS because it's a
>> separate code path, and different flow, that's fine. I am myself, not a
>> fan of FFS. But if we're going to continue supporting it, I think we'll
>> continue to have to resolve these sort of unintended consequences.
>>
>> Alex
>>
>> [1] ACPI 6.2, 18.1 - Hardware Errors and Error Sources
> 
> 



Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-08 Thread Alex_Gagniuc
On 11/08/2018 04:51 PM, Greg KH wrote:
> On Thu, Nov 08, 2018 at 10:49:08PM +, alex_gagn...@dellteam.com wrote:
>> In the case that we're trying to fix, this code executing is a result of
>> the device being gone, so we can guarantee race-free operation. I agree
>> that there is a race, in the general case. As far as checking the result
>> for all F's, that's not an option when firmware crashes the system as a
>> result of the mmio read/write. It's never pretty when firmware gets
>> involved.
> 
> If you have firmware that crashes the system when you try to read from a
> PCI device that was hot-removed, that is broken firmware and needs to be
> fixed.  The kernel can not work around that as again, you will never win
> that race.

But it's not the firmware that crashes. It's linux as a result of a 
fatal error message from the firmware. And we can't fix that because FFS 
handling requires that the system reboots [1].

If we're going to say that we don't want to support FFS because it's a 
separate code path, and different flow, that's fine. I am myself, not a 
fan of FFS. But if we're going to continue supporting it, I think we'll 
continue to have to resolve these sort of unintended consequences.

Alex

[1] ACPI 6.2, 18.1 - Hardware Errors and Error Sources


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-08 Thread Alex_Gagniuc
On 11/08/2018 04:43 PM, Greg Kroah-Hartman wrote:
> 
> [EXTERNAL EMAIL]
> Please report any suspicious attachments, links, or requests for sensitive 
> information.
> 
> 
> On Thu, Nov 08, 2018 at 03:32:58PM -0700, Keith Busch wrote:
>> On Thu, Nov 08, 2018 at 02:01:17PM -0800, Greg Kroah-Hartman wrote:
>>> On Thu, Nov 08, 2018 at 02:09:17PM -0600, Bjorn Helgaas wrote:
 I'm having second thoughts about this.  One thing I'm uncomfortable
 with is that sprinkling pci_dev_is_disconnected() around feels ad hoc
 instead of systematic, in the sense that I don't know how we convince
 ourselves that this (and only this) is the correct place to put it.
>>>
>>> I think my stance always has been that this call is not good at all
>>> because once you call it you never really know if it is still true as
>>> the device could have been removed right afterward.
>>>
>>> So almost any code that relies on it is broken, there is no locking and
>>> it can and will race and you will loose.
>>
>> AIUI, we're not trying to create code to rely on this. This more about
>> reducing reliance on hardware. If the software misses the race once and
>> accesses disconnected device memory, that's usually not a big deal to
>> let hardware sort it out, but the point is not to push our luck.
> 
> Then why even care about this call at all?  If you need to really know
> if the read worked, you have to check the value.  If the value is FF
> then you have a huge hint that the hardware is now gone.  And you can
> rely on it being gone, you can never rely on making the call to the
> function to check if the hardware is there to be still valid any point
> in time after the call returns.

In the case that we're trying to fix, this code executing is a result of 
the device being gone, so we can guarantee race-free operation. I agree 
that there is a race, in the general case. As far as checking the result 
for all F's, that's not an option when firmware crashes the system as a 
result of the mmio read/write. It's never pretty when firmware gets 
involved.

Alex


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-08 Thread Alex_Gagniuc
On 11/08/2018 02:09 PM, Bjorn Helgaas wrote:
> 
> [EXTERNAL EMAIL]
> Please report any suspicious attachments, links, or requests for sensitive 
> information.
> 
> 
> [+cc Jonathan, Greg, Lukas, Russell, Sam, Oliver for discussion about
> PCI error recovery in general]

Has anyone seen seen the ECRs in the PCIe base spec and ACPI that have 
been floating around the past few months? -- HPX, SFI, CER. Without 
divulging too much before publication, I'm curious on opinions on how 
well (or not well) those flows would work in general, and in linux.

> On Wed, Nov 07, 2018 at 05:42:57PM -0600, Bjorn Helgaas wrote:
> I'm having second thoughts about this.  One thing I'm uncomfortable
> with is that sprinkling pci_dev_is_disconnected() around feels ad hoc
> instead of systematic, in the sense that I don't know how we convince
> ourselves that this (and only this) is the correct place to put it. >
> Another is that the only place we call pci_dev_set_disconnected() is
> in pciehp and acpiphp, so the only "disconnected" case we catch is if
> hotplug happens to be involved.  Every MMIO read from the device is an
> opportunity to learn whether it is reachable (a read from an
> unreachable device typically returns ~0 data), but we don't do
> anything at all with those.
 >
> The config accessors already check pci_dev_is_disconnected(), so this
> patch is really aimed at MMIO accesses.  I think it would be more
> robust if we added wrappers for readl() and writel() so we could
> notice read errors and avoid future reads and writes.

I wouldn't expect anything less than  complete scrutiny and quality 
control of unquestionable moral integrity :). In theory ~0 can be a 
great indicator that something may be wrong. Though I think it's about 
as ad-hoc as pci_dev_is_disconnected().

I slightly like the idea of wrapping the MMIO accessors. There's still 
memcpy and DMA that cause the same MemRead/Wr PCIe transactions, and the 
same sort of errors in PCIe land, and it would be good to have more 
testing on this. Since this patch is tested and confirmed to fix a known 
failure case, I would keep it, and the look at fixing the problem in a 
more generic way.

BTW, a lot of the problems we're fixing here come courtesy of 
firmware-first error handling. Do we reach a point where we draw a line 
in handling new problems introduced by FFS? So, if something is a 
problem with FFS, but not native handling, do we commit to supporting it?

Alex