RE: [patch 00/39] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 1 cleanups

2022-11-17 Thread Tian, Kevin
> From: Thomas Gleixner 
> Sent: Friday, November 11, 2022 9:54 PM
> 
> Enough of history and theory. Here comes part 1:
> 
> This is just a cleanup and a reorganisation of the PCI/MSI code which
> became quite an unreadable mess over time. There is no intentional
> functional change in this series.
> 
> It's just a separate step to make the subsequent changes in the
> infrastructure easier both to implement and to review.
> 
> Thanks,
> 
>   tglx

The entire series looks good to me except for a couple of nits replied
to in the individual patches:

Reviewed-by: Kevin Tian 


[patch 00/39] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 1 cleanups

2022-11-11 Thread Thomas Gleixner
Hi!

This is a three-part series which provides support for per device MSI
interrupt domains. This solves conceptual problems of the current PCI/MSI
design which stand in the way of supporting PCI/MSI[-X] and the upcoming
PCI/IMS mechanism on the same device.

IMS (Interrupt Message Store) is a new specification which allows device
manufacturers to provide implementation defined storage for MSI messages,
in contrast to the uniform, specification defined storage mechanisms for
PCI/MSI and PCI/MSI-X. IMS not only allows the size limitations of the
MSI-X table to be overcome, it also gives the device manufacturer the
freedom to store the message in arbitrary places, even in host memory
which is shared with the device.
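
To make that concrete, here is a minimal sketch of what an IMS style
message write could look like. All example_* names are made up for
illustration; only struct msi_msg, struct irq_data and the irq_chip
.irq_write_msi_msg() callback type are the real generic interfaces:

  #include <linux/irq.h>
  #include <linux/io.h>
  #include <linux/msi.h>

  /* Device defined slot layout - not the architected MSI-X table */
  struct example_ims_slot {
          u32 address_lo;
          u32 address_hi;
          u32 data;
  };

  /* Would be hooked up as the irq_chip .irq_write_msi_msg() callback */
  static void example_ims_write_msg(struct irq_data *data, struct msi_msg *msg)
  {
          struct example_ims_slot __iomem *slot =
                  irq_data_get_irq_chip_data(data);

          /*
           * Where and how the message is stored is entirely up to the
           * device; it could equally live in shared host memory.
           */
          writel(msg->address_lo, &slot->address_lo);
          writel(msg->address_hi, &slot->address_hi);
          writel(msg->data, &slot->data);
  }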

There have been several attempts to glue this into the current MSI code,
but after lengthy discussions in various threads:

  https://lore.kernel.org/all/20211126230957.239391...@linutronix.de
  https://lore.kernel.org/all/mwhpr11mb188603d0d809c1079f5817dc8c...@mwhpr11mb1886.namprd11.prod.outlook.com
  https://lore.kernel.org/all/160408357912.912050.17005584526266191420.st...@djiang5-desk3.ch.intel.com

it turned out that there is a fundamental design problem in the current
PCI/MSI-X implementation. This needs some historical background.

When PCI/MSI[-X] support was added around 2003, interrupt management was
completely different from what we have today in the actively developed
architectures. Interrupt management was completely architecture specific,
and while there were attempts to create common infrastructure, the
commonalities were rudimentary, providing just shared data structures and
interfaces so that drivers could be written in an architecture agnostic
way.

The initial PCI/MSI[-X] support obviously plugged into this model, which
resulted in some basic shared infrastructure in the PCI core code for
setting up MSI descriptors. These descriptors are a pure software
construct for holding data relevant to a particular MSI interrupt, but
the actual association with Linux interrupts was completely architecture
specific. This model is still supported today to keep museum
architectures and notorious stragglers alive.

In 2013 Intel tried to add support for hotpluggable IO/APICs to the
kernel, which created yet another architecture specific mechanism and
resulted in an unholy mess on top of the existing horrors of x86
interrupt handling. The x86 interrupt management code was already an
incomprehensible maze of indirections between the CPU vector management,
interrupt remapping and the actual IO/APIC and PCI/MSI[-X]
implementations.

At roughly the same time, ARM struggled with the ever-growing SoC
specific extensions which were glued on top of the architected GIC
interrupt controller.

This resulted in a fundamental redesign of interrupt management and
provided the now prevailing concept of hierarchical interrupt domains.
This made it possible to disentangle the interactions between the x86
vector domain and interrupt remapping, and also allowed ARM to handle
the zoo of SoC specific interrupt components in a sane way.

The concept of hierarchical interrupt domains aims to encapsulate the
functionality of particular IP blocks which are involved in interrupt
delivery so that they become extensible and pluggable. The x86
encapsulation looks like this:

                                       |--- device 1
  [Vector]---[Remapping]---[PCI/MSI]---|...
                                       |--- device N

where the remapping domain is an optional component: when it is not
available, the PCI/MSI[-X] domains have the vector domain as their
parent. This reduced the required interaction between the domains pretty
much to the initialization phase, where it is obviously required to
establish the proper parent relationship between the components of the
hierarchy.
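
As an illustration of that initialization step, a hedged sketch of how
one level of the hierarchy is stacked on its parent using the generic
irq_domain_create_hierarchy() helper. The function and its signature are
real; the example_ wrapper, ops and fwnode arguments are placeholders,
not the actual x86 setup code:

  #include <linux/irqdomain.h>

  static struct irq_domain *
  example_create_child(struct irq_domain *parent,
                       struct fwnode_handle *fwnode,
                       const struct irq_domain_ops *ops)
  {
          /*
           * The child delegates everything it cannot handle itself,
           * e.g. CPU vector allocation, to its parent in the hierarchy.
           */
          return irq_domain_create_hierarchy(parent, 0, 0, fwnode, ops,
                                             NULL);
  }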

While in most cases the model strictly represents the chain of IP blocks
and abstracts them so they can be plugged together to form a hierarchy,
the design stopped short on PCI/MSI[-X]. Looking at the hardware, it's
clear that the actual PCI/MSI[-X] interrupt controller is not a global
entity, but strictly a per PCI device entity.

Here we took a shortcut on the hierarchical model and went for the easy
solution of providing "global" PCI/MSI domains, which was possible
because the PCI/MSI[-X] handling is uniform across devices. This also
made it possible to keep the existing PCI/MSI[-X] infrastructure mostly
unchanged, which in turn made it simple to keep the existing
architecture specific management alive.
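
For reference, a rough sketch of that "global" shortcut, assuming the
existing pci_msi_create_irq_domain() interface. The msi_domain_info
contents are elided and the example_ names are made up:

  #include <linux/irqdomain.h>
  #include <linux/msi.h>

  static struct msi_domain_info example_pci_msi_info = {
          /* .flags, .chip and .ops are filled in by the real code */
  };

  /*
   * Created once per interrupt controller and then shared by all PCI
   * devices below it - not a per device instance.
   */
  static struct irq_domain *
  example_init_pci_msi(struct fwnode_handle *fwnode,
                       struct irq_domain *parent)
  {
          return pci_msi_create_irq_domain(fwnode, &example_pci_msi_info,
                                           parent);
  }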

A similar problem was created in the ARM world with support for IP block
specific message storage. Instead of going all the way and stacking an
IP block specific domain on top of the generic MSI domain, this ended up
as a construct which provides a "global" platform MSI domain and allows
the irq_write_msi_msg() callback to be overridden per allocation.
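
A hedged sketch of that construct, using the current
platform_msi_domain_alloc_irqs() interface; the example_ names and the
vector count are made up for illustration:

  #include <linux/msi.h>
  #include <linux/platform_device.h>

  static void example_write_msi_msg(struct msi_desc *desc,
                                    struct msi_msg *msg)
  {
          /* Store the message in the IP block specific location */
  }

  static int example_probe(struct platform_device *pdev)
  {
          /*
           * Allocate 4 MSIs from the "global" platform MSI domain. The
           * write callback is overridden per allocation instead of
           * being a property of an IP block specific domain.
           */
          return platform_msi_domain_alloc_irqs(&pdev->dev, 4,
                                                example_write_msi_msg);
  }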

In the course of the lengthy discussions we identified other abuses of
the MSI infrastructure in wireless drivers, NTB, etc.