Re: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device
On 2014/8/1 21:52, Arnd Bergmann wrote: On Wednesday 30 July 2014, Yijing Wang wrote: On 2014/7/29 22:08, Arnd Bergmann wrote: On Saturday 26 July 2014 11:08:37 Yijing Wang wrote: The new data struct for generic MSI driver. struct msi_irqs { u8 msi_enabled:1; /* Enable flag */ u8 msix_enabled:1; struct list_head msi_list; /* MSI desc list */ void *data; /* help to find the MSI device */ struct msi_ops *ops; /* MSI device specific hook */ }; struct msi_irqs is used to manage MSI related informations. Every device supports MSI should contain this data struct and allocate it. I think you should have a stronger association with the 'struct device' here. Can you replace the 'void *data' with 'struct device *dev'? Actually, I used the struct device *dev in my first draft, finally, I replaced it with void *data, because some MSI devices don't have a struct device *dev, like the existing hpet device, dmar msi device, and OF device, like the ARM consolidator. Of course, we can make the MSI devices have their own struct device, and register to device tree, eg, add a class device named MSI_DEV. But I'm not sure whether it is appropriate. It doesn't have to be in the (OF) device tree, but I think it absolutely makes sense to use the 'struct device' infrastructure here, as almost everything uses a device, and the ones that don't do that today can be easily changed. I will try to use struct device infrastructure, thanks for your suggestion. :) The other part I'm not completely sure about is how you want to have MSIs map into normal IRQ descriptors. At the moment, all MSI users are based on IRQ numbers, but this has known scalability problems. Hmmm, I still use the IRQ number to map the MSIs to IRQ description. I'm sorry, I don't understand you meaning. What are the scalability problems you mentioned ? For device drivers, they always process interrupt in two steps. If irq is the legacy interrupt, drivers will first use the irq_of_parse_and_map() or pci_enable_device() to parse and get the IRQ number. Then drivers will call the request_irq() to register the interrupt handler. If irq is MSIs, first call pci_enable_msi/x() to get the IRQ number and then call request_irq() to register interrupt handler. The method you describe here makes sense for PCI devices that are required to support legacy interrupts and may or may not support MSI on a given system, but not so much for platform devices for which we know exactly whether we want to use MSI or legacy interrupts. In particular if you have a device that can only do MSI, the entire pci_enable_msi step is pointless: all we need to do is program the correct MSI target address/message pair into the device and register the handler. Yes, I almost agree if we won't change the existing hundreds of drivers, what I worried about is some drivers may want to know the IRQ numbers, and use the IRQ number to process different things, as I mentioned in another reply. But we can also provide the interface which integrate MSI configuration and request_irq(), if most drivers don't care the IRQ number. I wonder if we can do the interface in a way that hides the interrupt number from generic device drivers and just passes a 'struct irq_desc'. Note that there are long-term plans to get rid of IRQ numbers entirely, but those plans have existed for a long time already without anybody seriously addressing the device driver interfaces so far, so it might never really happen. Maybe this is a huge work, now hundreds drivers use the IRQ number, so maybe we can consider this in a separate title. Sorry for being unclear here: I did suggest changing all drivers now. What I meant is that we use a different API for non-PCI devices that works without IRQ numbers. I don't think we should touch the PCI interfaces at this point. OK, I got it. What I'd envision as the API from the device driver perspective is something as simple like this: struct msi_desc *msi_request(struct msi_chip *chip, irq_handler_t handler, unsigned long flags, const char *name, struct device *dev); which would get an msi descriptor that is valid for this device (dev) connected to a particular msi_chip, and associate a handler function with it. The device driver can call that function and retrieve the address/message pair from the msi_desc in order to store it in its own device specific registers. The request_irq() can be handled internally to msi_request(). This is a huge change for device drivers, and some device drivers don't know which msi_chip their MSI irq deliver to. I'm reworking the msi_chip, and try to use msi_chip to eliminate all arch_msi_xxx() under every arch in kernel. And the important point is how to create the binding for the MSI device to the target msi_chip. Which drivers are you thinking of? Again, I wouldn't expect to
Re: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device
On Monday 04 August 2014, Yijing Wang wrote: I have another question is some drivers will request more than one MSI/MSI-X IRQ, and the driver will use them to process different things. Eg. network driver generally uses one of them to process trivial network thins, and others to transmit/receive data. So, in this case, it seems to driver need to touch the IRQ numbers. wr-linux:~ # cat /proc/interrupts CPU0 CPU1 CPU2 CPU17 CPU18 CPU19 CPU20 CPU21 CPU22 CPU23 .. 100: 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0 101: 2 0 0 0 0 0 302830488 0 0 0 IR-PCI-MSI-edge eth0-TxRx-0 102:110 0 0 0 0 360675897 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-1 103:109 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-2 104:107 0 0 9678933 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-3 105:107 0 0 0 357838258 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-4 106:115 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-5 107:114 0 0 0 0 0 0 337866096 0 0 IR-PCI-MSI-edge eth0-TxRx-6 108: 373801199 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-7 I think in this example, you just need to request eight interrupts, and pass a different data pointer each time, pointing to the napi_struct of each of the NIC queues. The driver has no need to deal with the IRQ number at all, and I would be surprised if it cared today. Arnd ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device
On Monday 04 August 2014, Yijing Wang wrote: On 2014/8/1 21:52, Arnd Bergmann wrote: On Wednesday 30 July 2014, Yijing Wang wrote: On 2014/7/29 22:08, Arnd Bergmann wrote: The other part I'm not completely sure about is how you want to have MSIs map into normal IRQ descriptors. At the moment, all MSI users are based on IRQ numbers, but this has known scalability problems. Hmmm, I still use the IRQ number to map the MSIs to IRQ description. I'm sorry, I don't understand you meaning. What are the scalability problems you mentioned ? For device drivers, they always process interrupt in two steps. If irq is the legacy interrupt, drivers will first use the irq_of_parse_and_map() or pci_enable_device() to parse and get the IRQ number. Then drivers will call the request_irq() to register the interrupt handler. If irq is MSIs, first call pci_enable_msi/x() to get the IRQ number and then call request_irq() to register interrupt handler. The method you describe here makes sense for PCI devices that are required to support legacy interrupts and may or may not support MSI on a given system, but not so much for platform devices for which we know exactly whether we want to use MSI or legacy interrupts. In particular if you have a device that can only do MSI, the entire pci_enable_msi step is pointless: all we need to do is program the correct MSI target address/message pair into the device and register the handler. Yes, I almost agree if we won't change the existing hundreds of drivers, what I worried about is some drivers may want to know the IRQ numbers, and use the IRQ number to process different things, as I mentioned in another reply. But we can also provide the interface which integrate MSI configuration and request_irq(), if most drivers don't care the IRQ number. The driver would still have the option of getting the IRQ number for now: With the interface I imagine, you would get a 'struct msi_desc' pointer, from which you can look up the 'struct irq_desc' pointer (either embedded in msi_desc, or using a pointer from a member of msi_desc), and you can already get the interrupt number from the irq_desc. My point was that a well-written driver already does not care about the interrupt number: the only information a driver needs in the interrupt handler is a pointer to its own context, which we already derive from the irq_desc. The main interface that currently requires the irq number is free_irq(), but I would argue that we can just add a wrapper that takes the msi_desc pointer as its first argument so the driver does not have to worry about it. We can add additional wrappers like that as needed. What I'd envision as the API from the device driver perspective is something as simple like this: struct msi_desc *msi_request(struct msi_chip *chip, irq_handler_t handler, unsigned long flags, const char *name, struct device *dev); which would get an msi descriptor that is valid for this device (dev) connected to a particular msi_chip, and associate a handler function with it. The device driver can call that function and retrieve the address/message pair from the msi_desc in order to store it in its own device specific registers. The request_irq() can be handled internally to msi_request(). This is a huge change for device drivers, and some device drivers don't know which msi_chip their MSI irq deliver to. I'm reworking the msi_chip, and try to use msi_chip to eliminate all arch_msi_xxx() under every arch in kernel. And the important point is how to create the binding for the MSI device to the target msi_chip. Which drivers are you thinking of? Again, I wouldn't expect to change any PCI drivers, but only platform drivers that do native MSI, so we only have to change drivers that do not support any MSI at all yet and that need to be changed anyway in order to add support. I mean platform device drivers, because we can find the target msi_chip by some platform interfaces(like the existing of_pci_find_msi_chip_by_node()). So we no need to explicitly provide the msi_chip as the function argument. Right, that works too. I was thinking we might need an interface that allows us to pick a particular msi_chip if there are several alternatives (e.g. one in the GIC and one in the PCI host), but you are right: we should normally be able to hardwire that information in DT or elsewhere, and just need the 'struct device pointer' which should probably be the first argument here. As you pointed out, it's common to have multiple MSIs for a single device, so we also need a context to pass around, so my suggestion would become something like: struct msi_desc *msi_request(struct device *dev, irq_handler_t handler, unsigned long flags, const char *name, void *data); It's possible that we have to add one or two more arguments
RE: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device
Hi Yijing -Original Message- From: Yijing Wang [mailto:wangyij...@huawei.com] Sent: Saturday, July 26, 2014 8:39 AM To: linux-ker...@vger.kernel.org Cc: Xinwei Hu; Wuyun; Bjorn Helgaas; linux-...@vger.kernel.org; paul.mu...@huawei.com; James E.J. Bottomley; Marc Zyngier; linux-arm- ker...@lists.infradead.org; Russell King; linux-a...@vger.kernel.org; Basu Arnab-B45036; virtualization@lists.linux-foundation.org; Hanjun Guo; Yijing Wang Subject: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device Hi all, The series is a draft of generic MSI driver that supports PCI and Non-PCI device which have MSI capability. If you're not interested it, sorry for the noise. Thanks for sending out these patches, I have some (very basic) questions. The series is based on Linux-3.16-rc1. MSI was introduced in PCI Spec 2.2. Currently, kernel MSI driver codes are bonding with PCI device. Because MSI has a lot advantages in design. More and more non-PCI devices want to use MSI as their default interrupt. The existing MSI device include HPET. HPET driver provide its own MSI code to initialize and process MSI interrupts. In the latest GIC v3 spec, legacy device can deliver MSI by the help of a relay device named consolidator. Consolidator can translate the legacy interrupts connected to it to MSI/MSI-X. And new non-PCI device will be designed to support MSI in future. So make the MSI driver code be generic will help the non-PCI device use MSI more simply. As per my understanding the GICv3 provides a service that will convert writes to a specified address to IRQs delivered to the core and as you mention above MSIs are part of the PCI spec. So I can see a strong case for non-PCI devices to want MSI like functionality without being fully compliant with the requirements of the MSI spec. My question is do we necessarily want to rework so much of the PCI-MSI layer to support non PCI devices? Or will it be sufficient to create a framework to allow non PCI devices to hook up with a device that can convert their writes to an IRQ to the core. As I understand it, the msi_chip is (almost) such a framework. The only problem being that it makes some PCI specific assumptions (such as PCI specific writes from within msi_chip functions). Won't it be sufficient to make the msi_chip framework generic enough to be used by non-PCI devices and let each bus/device manage any additional requirements (such as configuration flow, bit definitions etc) that it places on message based interrupts? Thanks Arnab ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device
The method you describe here makes sense for PCI devices that are required to support legacy interrupts and may or may not support MSI on a given system, but not so much for platform devices for which we know exactly whether we want to use MSI or legacy interrupts. In particular if you have a device that can only do MSI, the entire pci_enable_msi step is pointless: all we need to do is program the correct MSI target address/message pair into the device and register the handler. Yes, I almost agree if we won't change the existing hundreds of drivers, what I worried about is some drivers may want to know the IRQ numbers, and use the IRQ number to process different things, as I mentioned in another reply. But we can also provide the interface which integrate MSI configuration and request_irq(), if most drivers don't care the IRQ number. The driver would still have the option of getting the IRQ number for now: With the interface I imagine, you would get a 'struct msi_desc' pointer, from which you can look up the 'struct irq_desc' pointer (either embedded in msi_desc, or using a pointer from a member of msi_desc), and you can already get the interrupt number from the irq_desc. My point was that a well-written driver already does not care about the interrupt number: the only information a driver needs in the interrupt handler is a pointer to its own context, which we already derive from the irq_desc. Agree, I will try to introduce this similar interface in next version, thanks! The main interface that currently requires the irq number is free_irq(), but I would argue that we can just add a wrapper that takes the msi_desc pointer as its first argument so the driver does not have to worry about it. We can add additional wrappers like that as needed. OK This is a huge change for device drivers, and some device drivers don't know which msi_chip their MSI irq deliver to. I'm reworking the msi_chip, and try to use msi_chip to eliminate all arch_msi_xxx() under every arch in kernel. And the important point is how to create the binding for the MSI device to the target msi_chip. Which drivers are you thinking of? Again, I wouldn't expect to change any PCI drivers, but only platform drivers that do native MSI, so we only have to change drivers that do not support any MSI at all yet and that need to be changed anyway in order to add support. I mean platform device drivers, because we can find the target msi_chip by some platform interfaces(like the existing of_pci_find_msi_chip_by_node()). So we no need to explicitly provide the msi_chip as the function argument. Right, that works too. I was thinking we might need an interface that allows us to pick a particular msi_chip if there are several alternatives (e.g. one in the GIC and one in the PCI host), but you are right: we should normally be able to hardwire that information in DT or elsewhere, and just need the 'struct device pointer' which should probably be the first argument here. As you pointed out, it's common to have multiple MSIs for a single device, so we also need a context to pass around, so my suggestion would become something like: struct msi_desc *msi_request(struct device *dev, irq_handler_t handler, unsigned long flags, const char *name, void *data); It's possible that we have to add one or two more arguments here. Good suggestion, thanks! A degenerate case of this would be a system where a PCI device sends its MSI into the host controller, that generates a legacy interrupt and that in turn gets sent to an irqchip which turns it back into an MSI for the GICv3. This would of course be very inefficient, but I think we should be able to express this with both the binding and the in-kernel framework just to be on the safe side. Yes, the best way to tell the kernel which msi_chip should deliver to is describe the binding in DTS file. If a real degenerate case found, we can update the platform interface which is responsible for getting the match msi_chip in future. Ok. Arnd . -- Thanks! Yijing ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 00/11] Refactor MSI to support Non-PCI device
On 2014/8/4 22:45, Arnd Bergmann wrote: On Monday 04 August 2014, Yijing Wang wrote: I have another question is some drivers will request more than one MSI/MSI-X IRQ, and the driver will use them to process different things. Eg. network driver generally uses one of them to process trivial network thins, and others to transmit/receive data. So, in this case, it seems to driver need to touch the IRQ numbers. wr-linux:~ # cat /proc/interrupts CPU0 CPU1 CPU2 CPU17 CPU18 CPU19 CPU20 CPU21 CPU22 CPU23 .. 100: 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0 101: 2 0 0 0 0 0 302830488 0 0 0 IR-PCI-MSI-edge eth0-TxRx-0 102:110 0 0 0 0 360675897 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-1 103:109 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-2 104:107 0 0 9678933 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-3 105:107 0 0 0 357838258 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-4 106:115 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-5 107:114 0 0 0 0 0 0 337866096 0 0 IR-PCI-MSI-edge eth0-TxRx-6 108: 373801199 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-7 I think in this example, you just need to request eight interrupts, and pass a different data pointer each time, pointing to the napi_struct of each of the NIC queues. The driver has no need to deal with the IRQ number at all, and I would be surprised if it cared today. Yes, you are right, this is not a stumbling block. :) Arnd . -- Thanks! Yijing ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization