Re: PCIe coherency in spec (was: [RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available)

2024-07-03 Thread Icenowy Zheng
On Wed, 2024-07-03 at 23:11 -0700, Christoph Hellwig wrote:
> On Thu, Jul 04, 2024 at 10:00:52AM +0800, Icenowy Zheng wrote:
> > So I want to ask a question here as an individual hacker: what's the
> > policy of linux-pci towards these non-coherent PCIe implementations?
> > 
> > If Christian's statements are right, these implementations are simply
> > out of spec. Should they be purged from the kernel, or should the
> > kernel at least warn that some hardware won't work because of a
> > nonconformant implementation?
> 
> Nothing in the PCIe specifications mandates a programming model.
> Non-coherent DMA is extremely common in lower-end devices and, despite
> all the issues it causes, is well supported in Linux.
> 
> What are you trying to solve?

Currently, the DRM TTM subsystem (and the GPU drivers using it) assumes
coherency and fails on these non-coherent systems with cryptic error
messages (such as `[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx
test failed (-110)`) that never mention coherency at all.

My original patch set tries to solve this problem by making the TTM
subsystem aware of the coherency status (and preventing CPU-side cached
mappings on non-coherent systems), but the TTM maintainer objected,
saying that TTM's ignorance of non-coherent systems is intentional.
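For readers less familiar with TTM, the downgrade policy described above can be sketched as a small userspace C model. This is a hypothetical illustration, not code from the RFC patch: the enum only mirrors the three CPU caching modes TTM distinguishes (`ttm_cached`, `ttm_write_combined`, `ttm_uncached`), and the helper name `effective_caching` and its `dma_is_coherent` flag are assumptions (in the kernel such a flag might come from a query like `dev_is_dma_coherent()`).

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the three CPU caching modes TTM distinguishes
 * (ttm_uncached, ttm_write_combined, ttm_cached); redefined here
 * so this hypothetical sketch compiles outside the kernel. */
enum caching_mode {
    CACHING_UNCACHED,
    CACHING_WRITE_COMBINED,
    CACHING_CACHED,
};

/* Hypothetical helper: pick the CPU mapping mode for a buffer that a
 * device will also access via DMA.
 *
 * If device accesses do not snoop the CPU caches, a cached CPU mapping
 * can leave dirty lines the device never sees, or stale lines the CPU
 * reads after the device has written. A cached request is therefore
 * downgraded to write-combined, which keeps the data out of the CPU
 * cache. Uncached and write-combined requests are returned unchanged. */
static enum caching_mode effective_caching(enum caching_mode requested,
                                           bool dma_is_coherent)
{
    assert(requested >= CACHING_UNCACHED && requested <= CACHING_CACHED);

    if (requested == CACHING_CACHED && !dma_is_coherent)
        return CACHING_WRITE_COMBINED;
    return requested;
}
```

Write-combined is a natural fallback here because, unlike a fully uncached mapping, it still lets the CPU batch its writes while avoiding cache lines that the device cannot snoop.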

> 




2024-07-03 Thread Icenowy Zheng
On Wed, 2024-07-03 at 16:08 -0500, Bjorn Helgaas wrote:
> On Wed, Jul 03, 2024 at 04:52:30PM +0800, Jiaxun Yang wrote:
> > On Jul 2, 2024 at 6:03 PM, Jiaxun Yang wrote:
> > > On Jul 2, 2024 at 5:27 PM, Christian König wrote:
> > > > On 02.07.24 at 11:06, Icenowy Zheng wrote:
> > > > > [SNIP] However, I don't think the definition in the AGP spec
> > > > > can apply to all PCI(e) implementations. The AGP spec itself
> > > > > doesn't apply to implementations that don't implement AGP
> > > > > (which is most PCI(e) implementations today), and it isn't in
> > > > > the reference list of the PCIe spec, so it doesn't help in
> > > > > this context.
> > > > No, exactly that is not correct.
> > > > 
> > > > As I explained, the No-Snoop extension to PCIe was created to
> > > > help with AGP support and was later merged into the base PCIe
> > > > specification.
> > > > 
> > > > So the AGP spec is now part of the PCIe spec.
> > 
> > Hi Bjorn & linux-pci folks,
> > 
> > It seems we have a dispute over the interpretation of the PCIe
> > specification.
> > 
> > We are seeking your expertise on this question: Does the PCIe
> > specification mandate cache coherency via snooping?
> 
> I'm not qualified to opine on this.  I'd say it's a question for the
> PCI SIG protocol workgroup.  https://forum.pcisig.com/ is a place to
> start.

Sorry for the disturbance.

As an individual hacker, I am not eligible to become a PCI-SIG member
and join the discussion there.

So I want to ask a question here as an individual hacker: what's the
policy of linux-pci towards these non-coherent PCIe implementations?

If Christian's statements are right, these implementations are simply
out of spec. Should they be purged from the kernel, or should the
kernel at least warn that some hardware won't work because of a
nonconformant implementation?

> 
> Bjorn




2024-07-03 Thread Bjorn Helgaas
On Wed, Jul 03, 2024 at 04:52:30PM +0800, Jiaxun Yang wrote:
> On Jul 2, 2024 at 6:03 PM, Jiaxun Yang wrote:
> > On Jul 2, 2024 at 5:27 PM, Christian König wrote:
> >> On 02.07.24 at 11:06, Icenowy Zheng wrote:
> >>> [SNIP] However, I don't think the definition in the AGP spec can
> >>> apply to all PCI(e) implementations. The AGP spec itself doesn't
> >>> apply to implementations that don't implement AGP (which is most
> >>> PCI(e) implementations today), and it isn't in the reference list
> >>> of the PCIe spec, so it doesn't help in this context.
> >> No, exactly that is not correct.
> >>
> >> As I explained, the No-Snoop extension to PCIe was created to help
> >> with AGP support and was later merged into the base PCIe specification.
> >>
> >> So the AGP spec is now part of the PCIe spec.
> 
> Hi Bjorn & linux-pci folks,
> 
> It seems we have a dispute over the interpretation of the PCIe
> specification.
> 
> We are seeking your expertise on this question: Does the PCIe
> specification mandate cache coherency via snooping?

I'm not qualified to opine on this.  I'd say it's a question for the
PCI SIG protocol workgroup.  https://forum.pcisig.com/ is a place to
start.

Bjorn