On Mon, Jul 04, 2022 at 07:34:47PM +0800, G.R. wrote:
> On Mon, Jul 4, 2022 at 5:53 PM Roger Pau Monné <roger....@citrix.com> wrote:
> >
> > On Sun, Jul 03, 2022 at 01:43:11AM +0800, G.R. wrote:
> > > Hi everybody,
> > >
> > > I ran into problems passing through an SN570 NVMe SSD to an HVM guest.
> > > So far I have no idea if the problem is with this specific SSD or with
> > > the CPU + motherboard combination or the SW stack.
> > > Looking for some suggestions on troubleshooting.
> > >
> > > List of build info:
> > > CPU+motherboard: E-2146G + Gigabyte C246N-WU2
> > > XEN version: 4.14.3
> >
> > Are you using a debug build of Xen? (if not it would be helpful to do
> > so).
> It's a release version at this moment. I can switch to a debug version
> later when I get my hands free.
> BTW, I got a DEBUG build of the xen_pciback driver to see how it plays
> with 'xl pci-assignable-xxx' commands.
> You can find this in my 2nd email in the chain.
> 
> >
> > > Dom0: Linux Kernel 5.10 (built from Debian 11.2 kernel source package)
> > > The SN570 SSD sits here in the PCI tree:
> > >            +-1d.0-[05]----00.0  Sandisk Corp Device 501a
> >
> > Could be helpful to post the output with -vvv so we can see the
> > capabilities of the device.
> Sure, please find the -vvv output in the attachment.
> This one is just to indicate the connection in the PCI tree.
> I.e. 05:00.0 is attached under 00:1d.0.
> 
> >
> > > Syndromes observed:
> > > With ASPM enabled, pciback has problems seizing the device.
> > >
> > > Jul  2 00:36:54 gaia kernel: [    1.648270] pciback 0000:05:00.0:
> > > xen_pciback: seizing device
> > > ...
> > > Jul  2 00:36:54 gaia kernel: [    1.768646] pcieport 0000:00:1d.0:
> > > AER: enabled with IRQ 150
> > > Jul  2 00:36:54 gaia kernel: [    1.768716] pcieport 0000:00:1d.0:
> > > DPC: enabled with IRQ 150
> > > Jul  2 00:36:54 gaia kernel: [    1.768717] pcieport 0000:00:1d.0:
> > > DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+
> > > SwTrigger+ RP PIO Log 4, DL_ActiveErr+
> >
> > Is there a device reset involved here?  It's possible the device
> > doesn't reset properly and hence the Uncorrectable Error Status
> > Register ends up with inconsistent bits set.
> 
> xen_pciback appears to force a FLR whenever it attempts to seize a
> capable device.
> As shown in pciback_dbg_xl-pci_assignable_XXX.log attached in my 2nd mail.
> [  323.448115] xen_pciback: wants to seize 0000:05:00.0
> [  323.448136] pciback 0000:05:00.0: xen_pciback: probing...
> [  323.448137] pciback 0000:05:00.0: xen_pciback: seizing device
> [  323.448162] pciback 0000:05:00.0: xen_pciback: pcistub_device_alloc
> [  323.448162] pciback 0000:05:00.0: xen_pciback: initializing...
> [  323.448163] pciback 0000:05:00.0: xen_pciback: initializing config
> [  323.448344] pciback 0000:05:00.0: xen_pciback: enabling device
> [  323.448425] xen: registering gsi 16 triggering 0 polarity 1
> [  323.448428] Already setup the GSI :16
> [  323.448497] pciback 0000:05:00.0: xen_pciback: save state of device
> [  323.448642] pciback 0000:05:00.0: xen_pciback: resetting (FLR, D3,
> etc) the device
> [  323.448707] pcieport 0000:00:1d.0: DPC: containment event,
> status:0x1f11 source:0x0000
> [  323.448730] pcieport 0000:00:1d.0: DPC: unmasked uncorrectable error 
> detected
> [  323.448760] pcieport 0000:00:1d.0: PCIe Bus Error:
> severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver
> ID)
> [  323.448786] pcieport 0000:00:1d.0:   device [8086:a330] error
> status/mask=00200000/00010000
> [  323.448813] pcieport 0000:00:1d.0:    [21] ACSViol                (First)
> [  324.690979] pciback 0000:05:00.0: not ready 1023ms after FLR;
> waiting  <============ HERE
> [  325.730706] pciback 0000:05:00.0: not ready 2047ms after FLR; waiting
> [  327.997638] pciback 0000:05:00.0: not ready 4095ms after FLR; waiting
> [  332.264251] pciback 0000:05:00.0: not ready 8191ms after FLR; waiting
> [  340.584320] pciback 0000:05:00.0: not ready 16383ms after FLR;
> waiting
> [  357.010896] pciback 0000:05:00.0: not ready 32767ms after FLR; waiting
> [  391.143951] pciback 0000:05:00.0: not ready 65535ms after FLR; giving up
> [  392.249252] pciback 0000:05:00.0: xen_pciback: reset device
> [  392.249392] pciback 0000:05:00.0: xen_pciback:
> xen_pcibk_error_detected(bus:5,devfn:0)
> [  392.249393] pciback 0000:05:00.0: xen_pciback: device is not found/assigned
> [  392.397074] pciback 0000:05:00.0: xen_pciback:
> xen_pcibk_error_resume(bus:5,devfn:0)
> [  392.397080] pciback 0000:05:00.0: xen_pciback: device is not found/assigned
> [  392.397284] pcieport 0000:00:1d.0: AER: device recovery successful
> Note that I only see this on the first FLR attempt.
> And my SATA controller which doesn't support FLR appears to pass
> through just fine...
> 
> >
> > > ...
> > > Jul  2 00:36:54 gaia kernel: [    1.770039] xen: registering gsi 16
> > > triggering 0 polarity 1
> > > Jul  2 00:36:54 gaia kernel: [    1.770041] Already setup the GSI :16
> > > Jul  2 00:36:54 gaia kernel: [    1.770314] pcieport 0000:00:1d.0:
> > > DPC: containment event, status:0x1f11 source:0x0000
> > > Jul  2 00:36:54 gaia kernel: [    1.770315] pcieport 0000:00:1d.0:
> > > DPC: unmasked uncorrectable error detected
> > > Jul  2 00:36:54 gaia kernel: [    1.770320] pcieport 0000:00:1d.0:
> > > PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction
> > > Layer, (Receiver ID)
> > > Jul  2 00:36:54 gaia kernel: [    1.770371] pcieport 0000:00:1d.0:
> > > device [8086:a330] error status/mask=00200000/00010000
> > > Jul  2 00:36:54 gaia kernel: [    1.770413] pcieport 0000:00:1d.0:
> > > [21] ACSViol                (First)
> > > Jul  2 00:36:54 gaia kernel: [    1.770466] pciback 0000:05:00.0:
> > > xen_pciback: device is not found/assigned
> > > Jul  2 00:36:54 gaia kernel: [    1.920195] pciback 0000:05:00.0:
> > > xen_pciback: device is not found/assigned
> > > Jul  2 00:36:54 gaia kernel: [    1.920260] pcieport 0000:00:1d.0:
> > > AER: device recovery successful
> > > Jul  2 00:36:54 gaia kernel: [    1.920263] pcieport 0000:00:1d.0:
> > > DPC: containment event, status:0x1f01 source:0x0000
> > > Jul  2 00:36:54 gaia kernel: [    1.920264] pcieport 0000:00:1d.0:
> > > DPC: unmasked uncorrectable error detected
> > > Jul  2 00:36:54 gaia kernel: [    1.920267] pciback 0000:05:00.0:
> > > xen_pciback: device is not found/assigned
> >
> > That's from a different device (05:00.0).
> 00:1d.0 is the bridge port that 05:00.0 attaches to.
> 
> 
> > >
> > > After the 'xl pci-assignable-list' appears to be self-consistent,
> > > creating VM with the SSD assigned still leads to a guest crash:
> > > From qemu log:
> > > [00:06.0] xen_pt_region_update: Error: create new mem mapping failed! 
> > > (err: 1)
> > > qemu-system-i386: terminating on signal 1 from pid 1192 (xl)
> > >
> > > From the 'xl dmesg' output:
> > > (XEN) d1: GFN 0xf3078 (0xa2616,0,5,7) -> (0xa2504,0,5,7) not permitted
> >
> > Seems like QEMU is attempting to remap a p2m_mmio_direct region.
> >
> > Can you paste the full output of `xl dmesg`? (as that will contain the
> > memory map).
> Attached.
> 
> >
> > Would also be helpful if you could get the RMRR regions from that
> > box. Booting with `iommu=verbose` on the Xen command line should print
> > those.
> Coming in my next reply...

> 00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port 
> #9 (rev f0) (prog-if 00 [Normal decode])
>       Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx+
>       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
> <MAbort- >SERR- <PERR- INTx-
>       Latency: 0, Cache Line Size: 64 bytes
>       Interrupt: pin A routed to IRQ 126
>       IOMMU group: 10
>       Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
>       I/O behind bridge: 0000f000-00000fff [disabled]
>       Memory behind bridge: a2600000-a26fffff [size=1M]
>       Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff 
> [disabled]
>       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
> <MAbort+ <SERR- <PERR-
>       BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
>               PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>       Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
>               DevCap: MaxPayload 256 bytes, PhantFunc 0
>                       ExtTag- RBE+
>               DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
>                       RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                       MaxPayload 256 bytes, MaxReadReq 128 bytes
>               DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ 
> TransPend-
>               LnkCap: Port #9, Speed 8GT/s, Width x4, ASPM L0s L1, Exit 
> Latency L0s <1us, L1 <16us
>                       ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
>               LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
>                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>               LnkSta: Speed 8GT/s (ok), Width x4 (ok)
>                       TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
>               SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- 
> Surprise-
>                       Slot #12, PowerLimit 25.000W; Interlock- NoCompl+
>               SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- 
> LinkChg-
>                       Control: AttnInd Unknown, PwrInd Unknown, Power- 
> Interlock-
>               SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ 
> Interlock-
>                       Changed: MRL- PresDet- LinkState+
>               RootCap: CRSVisible-
>               RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- 
> CRSVisible-
>               RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>               DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- 
> LTR+
>                        10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- 
> EETLPPrefix-
>                        EmergencyPowerReduction Not Supported, 
> EmergencyPowerReductionInit-
>                        FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- 
> ARIFwd+
>                        AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
>               DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 
> OBFF Disabled, ARIFwd-
>                        AtomicOpsCtl: ReqEn- EgressBlck-
>               LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 
> 2Retimers- DRS-
>               LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>                        Transmit Margin: Normal Operating Range, 
> EnterModifiedCompliance- ComplianceSOS-
>                        Compliance De-emphasis: -6dB
>               LnkSta2: Current De-emphasis Level: -3.5dB, 
> EqualizationComplete+ EqualizationPhase1+
>                        EqualizationPhase2+ EqualizationPhase3+ 
> LinkEqualizationRequest-
>                        Retimer- 2Retimers- CrosslinkRes: unsupported
>       Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>               Address: fee002b8  Data: 0000
>       Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Cannon Lake 
> PCH PCI Express Root Port
>       Capabilities: [a0] Power Management version 3
>               Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>               Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>       Capabilities: [100 v1] Advanced Error Reporting
>               UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>               UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ 
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>               UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>               CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- 
> AdvNonFatalErr-
>               CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- 
> AdvNonFatalErr+
>               AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- 
> ECRCChkCap- ECRCChkEn-
>                       MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>               HeaderLog: 00000000 00000000 00000000 00000000
>               RootCmd: CERptEn+ NFERptEn+ FERptEn+
>               RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
>                        FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
>               ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
>       Capabilities: [140 v1] Access Control Services
>               ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- 
> EgressCtrl- DirectTrans-
>               ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd- 
> EgressCtrl- DirectTrans-
>       Capabilities: [150 v1] Precision Time Measurement
>               PTMCap: Requester:- Responder:+ Root:+
>               PTMClockGranularity: 4ns
>               PTMControl: Enabled:+ RootSelected:+
>               PTMEffectiveGranularity: Unknown
>       Capabilities: [200 v1] L1 PM Substates
>               L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ 
> L1_PM_Substates+
>                         PortCommonModeRestoreTime=40us PortTPowerOnTime=44us
>               L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
>                          T_CommonMode=40us LTR1.2_Threshold=65536ns
>               L1SubCtl2: T_PwrOn=44us
>       Capabilities: [220 v1] Secondary PCI Express
>               LnkCtl3: LnkEquIntrruptEn- PerformEqu-
>               LaneErrStat: 0
>       Capabilities: [250 v1] Downstream Port Containment
>               DpcCap: INT Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 
> 4, DL_ActiveErr+
>               DpcCtl: Trigger:1 Cmpl- INT+ ErrCor- PoisonedTLP- SwTrigger- 
> DL_ActiveErr-
>               DpcSta: Trigger- Reason:00 INT- RPBusy- TriggerExt:00 RP PIO 
> ErrPtr:1f
>               Source: 0000
>       Kernel driver in use: pcieport
> 
> 05:00.0 Non-Volatile memory controller: Sandisk Corp Device 501a (prog-if 02 
> [NVM Express])
>       Subsystem: Sandisk Corp Device 501a
>       Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx+
>       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
> <MAbort- >SERR- <PERR- INTx-
>       Latency: 0, Cache Line Size: 64 bytes
>       Interrupt: pin A routed to IRQ 16
>       NUMA node: 0
>       IOMMU group: 13
>       Region 0: Memory at a2600000 (64-bit, non-prefetchable) [size=16K]
>       Region 4: Memory at a2604000 (64-bit, non-prefetchable) [size=256]

I'm slightly confused. The overlap is reported at:

(XEN) d1: GFN 0xf3078 (0xa2616,0,5,7) -> (0xa2504,0,5,7) not permitted

So the MFNs involved are 0xa2616 and 0xa2504, yet neither of them is
in the BAR ranges of this device (Region 0 at a2600000, 16K, and
Region 4 at a2604000, 256 bytes).
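
For reference, here is a rough sketch of that check in Python. It assumes
4K pages and that the first field of each tuple in the Xen message is the
MFN. The helper names (bar_mfns, mfn_in_ranges) are made up for
illustration and are not part of any Xen tooling. The BAR and bridge
window values are copied from the lspci output quoted above.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages (x86)


def bar_mfns(base, size):
    """Return the inclusive (first, last) MFN range covered by a region."""
    first = base >> PAGE_SHIFT
    last = (base + size - 1) >> PAGE_SHIFT
    return first, last


def mfn_in_ranges(mfn, ranges):
    """True if mfn falls inside any of the inclusive (first, last) ranges."""
    return any(first <= mfn <= last for first, last in ranges)


# BARs of 05:00.0 (Region 0: 16K, Region 4: 256 bytes).
bars = [bar_mfns(0xA2600000, 16 * 1024), bar_mfns(0xA2604000, 256)]

# "Memory behind bridge" window of the 00:1d.0 root port (1M).
bridge_window = [bar_mfns(0xA2600000, 0x100000)]

for mfn in (0xA2616, 0xA2504):
    print(
        hex(mfn),
        "in device BARs:", mfn_in_ranges(mfn, bars),
        "| in bridge window:", mfn_in_ranges(mfn, bridge_window),
    )
```

By this arithmetic 0xa2616 sits inside the root port's memory window but
outside both BARs, and 0xa2504 is outside the window altogether.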

Can you paste the lspci -vvv output for any other device you are also
passing through to this guest?

Thanks, Roger.
